Metadata
Metadata APIs
LibCST ships with a metadata interface that defines a standardized way to
associate nodes in a CST with arbitrary metadata while maintaining the immutability
of the tree. The metadata interface is designed to be declarative and type safe.
Here’s a quick example of using the metadata interface to get line and column
numbers of nodes through the PositionProvider:
import libcst as cst

class NamePrinter(cst.CSTVisitor):
    METADATA_DEPENDENCIES = (cst.metadata.PositionProvider,)

    def visit_Name(self, node: cst.Name) -> None:
        pos = self.get_metadata(cst.metadata.PositionProvider, node).start
        print(f"{node.value} found at line {pos.line}, column {pos.column}")

wrapper = cst.metadata.MetadataWrapper(cst.parse_module("x = 1"))
result = wrapper.visit(NamePrinter())  # should print "x found at line 1, column 0"
More examples of using the metadata interface can be found in the Metadata Tutorial.
Accessing Metadata
To work with metadata you need to wrap a module with a MetadataWrapper.
The wrapper provides a resolve() function and a
resolve_many() function to generate metadata.
- class libcst.metadata.MetadataWrapper
A wrapper around a Module that stores associated metadata for that module.
When a MetadataWrapper is constructed over a module, the wrapper will store a deep copy of the original module. This means MetadataWrapper(module).module == module is False.
This copying operation ensures that a node will never appear twice (by identity) in the same tree. This allows us to uniquely look up metadata for a node based on a node’s identity.
- __init__(module: Module, unsafe_skip_copy: bool = False, cache: Mapping[ProviderT, object] = {}) -> None
- Parameters:
module – The module to wrap. This is deeply copied by default.
unsafe_skip_copy – When true, this skips the deep cloning of the module. This can provide a small performance benefit, but you should only use this if you know that there are no duplicate nodes in your tree (e.g. this module came from the parser).
cache – A precomputed cache (e.g. from FullRepoManager.get_cache_for_path()) for the wrapper to use when resolving metadata.
- property module: Module
The module that’s wrapped by this MetadataWrapper. By default, this is a deep copy of the passed in module.
mw = MetadataWrapper(module)
# Because `mw.module is not module`, you probably want to run your visitor and do
# your analysis on `mw.module`, not `module`.
mw.module.visit(DoSomeAnalysisVisitor())  # DoSomeAnalysisVisitor: placeholder for your own visitor
- resolve(provider: Type[BaseMetadataProvider[_T]]) -> Mapping[CSTNode, _T]
Returns a copy of the metadata mapping computed by provider.
- resolve_many(providers: Collection[ProviderT]) -> Mapping[ProviderT, Mapping[CSTNode, object]]
Returns a copy of the map of metadata mappings computed by each provider in providers.
The returned map does not contain any metadata from undeclared metadata dependencies that providers have.
- visit(visitor: CSTVisitorT) -> Module
Convenience method to resolve metadata before performing a traversal over self.module with visitor. See visit().
- visit_batched(visitors: Iterable[BatchableCSTVisitor], before_visit: Callable[[CSTNode], None] | None = None, after_leave: Callable[[CSTNode], None] | None = None) -> CSTNode
Convenience method to resolve metadata before performing a traversal over self.module with visitors. See visit_batched().
If you’re working with visitors, which extend MetadataDependent,
metadata dependencies will be automatically computed when visited by a
MetadataWrapper and are accessible through
get_metadata().
- class libcst.MetadataDependent
The low-level base class for all classes that declare required metadata dependencies.
CSTVisitor and CSTTransformer extend this class.
- METADATA_DEPENDENCIES: ClassVar[Collection[ProviderT]] = ()
The set of metadata dependencies declared by this class.
- metadata: Mapping[ProviderT, Mapping[CSTNode, object]]
A cached copy of metadata computed by resolve(). Prefer using get_metadata() over accessing this attribute directly.
- classmethod get_inherited_dependencies() -> Collection[ProviderT]
Returns all metadata dependencies declared by classes in the MRO of cls that subclass this class.
Recursively searches the MRO of the subclass for metadata dependencies.
- resolve(wrapper: MetadataWrapper) -> Iterator[None]
Context manager that resolves all metadata dependencies declared by self (using get_inherited_dependencies()) on wrapper and caches it on self for use with get_metadata().
Upon exiting this context manager, the metadata cache on self is cleared.
- get_metadata(key: Type[BaseMetadataProvider[_T]], node: CSTNode, default: _T = _UNDEFINED_DEFAULT) -> _T
Returns the metadata provided by the key if it is accessible from this visitor. Metadata is accessible in a subclass of this class if key is declared as a dependency by any class in the MRO of this class.
Providing Metadata
Metadata is generated through provider classes that can be passed to
MetadataWrapper.resolve() or
declared as a dependency of a MetadataDependent. These
providers are then resolved automatically using methods provided by
MetadataWrapper.
In most cases, you should extend
BatchableMetadataProvider when writing a provider,
unless you have a particular reason not to use a batchable visitor. Only
extend from BaseMetadataProvider if your provider does
not use the visitor pattern for computing metadata for a tree.
- class libcst.BaseMetadataProvider
The low-level base class for all metadata providers. This class should be extended for metadata providers that are not visitor-based.
This class is generic. A subclass of BaseMetadataProvider[T] will provide metadata of type T.
- gen_cache: Callable[[Path, List[str], int], Mapping[str, object]] | None = None
Implement gen_cache to indicate that the metadata provider depends on a cache from an external system. This function will be called by FullRepoManager to compute the required cache object per file path.
- set_metadata(node: CSTNode, value: LazyValue[_ProvidedMetadataT] | _ProvidedMetadataT) -> None
Record a metadata value value for node.
- get_metadata(key: Type[BaseMetadataProvider[_MetadataT]], node: CSTNode, default: LazyValue[_ProvidedMetadataT] | _ProvidedMetadataT | Type[_UNDEFINED_DEFAULT] = _UNDEFINED_DEFAULT) -> _T
The same method as get_metadata() except metadata is accessed from self._computed in addition to self.metadata. See get_metadata().
- class libcst.metadata.BatchableMetadataProvider
The low-level base class for all batchable visitor-based metadata providers. Batchable providers should be preferred when possible as they are more efficient to run compared to non-batchable visitor-based providers. Inherits from BatchableCSTVisitor.
This class is generic. A subclass of BatchableMetadataProvider[T] will provide metadata of type T.
- class libcst.metadata.VisitorMetadataProvider
The low-level base class for all non-batchable visitor-based metadata providers. Inherits from CSTVisitor.
This class is generic. A subclass of VisitorMetadataProvider[T] will provide metadata of type T.
Metadata Providers
PositionProvider,
ByteSpanPositionProvider,
WhitespaceInclusivePositionProvider,
ExpressionContextProvider,
ScopeProvider,
QualifiedNameProvider,
ParentNodeProvider, and
TypeInferenceProvider
are currently provided. Each metadata provider may have its own custom data structure.
Position Metadata
There are two types of position metadata available. They both track the same position concept, but differ in terms of representation. One represents position with line and column numbers, while the other outputs byte offset and length pairs.
Line and column numbers are available through the metadata interface by
declaring one of PositionProvider or
WhitespaceInclusivePositionProvider. For
most cases, PositionProvider is what you probably
want.
Node positions are represented with CodeRange
objects. See the above example.
- class libcst.metadata.PositionProvider
Generates line and column metadata.
These positions are defined by the start and ending bounds of a node ignoring most instances of leading and trailing whitespace when it is not syntactically significant.
The positions provided by this provider should eventually match the positions used by Pyre for equivalent nodes.
- class libcst.metadata.WhitespaceInclusivePositionProvider
Generates line and column metadata.
The start and ending bounds of the positions produced by this provider include all whitespace owned by the node.
- class libcst.metadata.CodeRange
- start: CodePosition
Starting position of a node (inclusive).
- end: CodePosition
Ending position of a node (exclusive).
Byte offset and length pairs can be accessed using
ByteSpanPositionProvider. This provider represents
positions using CodeSpan, which will contain the
byte offsets of a CSTNode from the start of the file, and
its length (also in bytes).
- class libcst.metadata.ByteSpanPositionProvider
Generates offset and length metadata for nodes’ positions.
For each CSTNode this provider generates a CodeSpan that contains the byte offset of the node from the start of the file, and its length (also in bytes). The whitespace owned by the node is not included in this length.
Note: offset and length measure bytes, not characters (which is significant, for example, in the case of Unicode characters encoded in more than one byte).
Expression Context Metadata
- class libcst.metadata.ExpressionContextProvider
Provides ExpressionContext metadata (mimics the expr_context in ast) for the following node types: Attribute, Subscript, StarredElement, List, Tuple and Name. Note that a Name may not always have context because of the differences between ast and LibCST. E.g. attr is a Name in LibCST but a str in ast. To honor the ast implementation, we don’t assign context to attr.
Three context types ExpressionContext.STORE, ExpressionContext.LOAD and ExpressionContext.DEL are provided.
- class libcst.metadata.ExpressionContext
Used in ExpressionContextProvider to represent the context of a variable reference.
- LOAD = 1
Load the value of a variable reference.
>>> libcst.MetadataWrapper(libcst.parse_module("a")).resolve(libcst.metadata.ExpressionContextProvider)
mappingproxy({Name(
    value='a',
    lpar=[],
    rpar=[],
): <ExpressionContext.LOAD: 1>})
- STORE = 2
Store a value to a variable reference by Assign (=), AugAssign (e.g. +=, -=, etc.), or AnnAssign.
>>> libcst.MetadataWrapper(libcst.parse_module("a = b")).resolve(libcst.metadata.ExpressionContextProvider)
mappingproxy({Name(
    value='a',
    lpar=[],
    rpar=[],
): <ExpressionContext.STORE: 2>, Name(
    value='b',
    lpar=[],
    rpar=[],
): <ExpressionContext.LOAD: 1>})
- DEL = 3
Delete the value of a variable reference by del.
>>> libcst.MetadataWrapper(libcst.parse_module("del a")).resolve(libcst.metadata.ExpressionContextProvider)
mappingproxy({Name(
    value='a',
    lpar=[],
    rpar=[],
): <ExpressionContext.DEL: 3>})
Scope Metadata
Scopes contain and separate variables from each other. Scopes enforce that a local variable name bound inside of a function is not available outside of that function.
While many programming languages are “block-scoped”, Python is function-scoped. New scopes are created for classes, functions, and comprehensions. Other block constructs like conditional statements, loops, and try…except don’t create their own scope.
There are five different types of scope in Python:
BuiltinScope,
GlobalScope,
ClassScope,
FunctionScope, and
ComprehensionScope.
LibCST allows you to inspect these scopes to see what local variables are assigned or accessed within.
Note
Import statements bring new symbols into scope that are declared in other files.
As such, they are represented by Assignment for scope
analysis purposes. Dotted imports (e.g. import a.b.c) generate multiple
Assignment objects — one for each module. When analyzing
references, only the most specific access is recorded.
For example, the above import a.b.c statement generates three
Assignment objects: one for a, one for a.b, and
one for a.b.c. A reference for a.b.c records an access only for the last
assignment, while a reference for a.d only records an access for the
Assignment representing a.
- class libcst.metadata.ScopeProvider
ScopeProvider traverses the entire module and creates the scope inheritance structure. It provides the scope of name assignments and accesses. It is useful for more advanced static analysis. E.g. given a FunctionDef node, we can check the type of its Scope to figure out whether it is a method defined in a class (ClassScope) or a regular function (GlobalScope).
Scope metadata is available for most node types other than formatting information nodes (whitespace, parentheses, etc.).
- METADATA_DEPENDENCIES: ClassVar[Collection['ProviderT']] = (<class 'libcst.metadata.expression_context_provider.ExpressionContextProvider'>,)
The set of metadata dependencies declared by this class.
- class libcst.metadata.BaseAssignment
Abstract base class of Assignment and BuiltinAssignment.
- property references: Collection[Access]
Return all accesses of the assignment.
- class libcst.metadata.Access
An Access records an access of an assignment.
Note
This scope analysis only analyzes access via a Name or a Name node embedded in another node like Call or Attribute. It doesn’t support type annotations using a SimpleString literal for forward references. E.g. in this example, the "Tree" isn’t parsed as an access:
class Tree:
    def __new__(cls) -> "Tree":
        ...
- node: Name | Attribute | BaseString
The node of the access. A name is an access when the expression context is ExpressionContext.LOAD. This is usually the name node representing the access, except for: 1) dotted imports, when it might be the attribute that represents the most specific part of the imported symbol; and 2) string annotations, when it is the entire string literal.
- scope: Scope
The scope of the access. Note that an access could be in a child scope of its assignment.
- property referents: Collection[BaseAssignment]
Return all assignments of the access.
- record_assignment(assignment: BaseAssignment) -> None
- class libcst.metadata.Assignment
An assignment records the name, CSTNode and its accesses.
- node: CSTNode
The node of assignment; it could be an Import, ImportFrom, Name, FunctionDef, or ClassDef.
- get_qualified_names_for(full_name: str) -> Set[QualifiedName]
- class libcst.metadata.BuiltinAssignment
A BuiltinAssignment represents a value provided by Python as a builtin, including functions, constants, and types.
- get_qualified_names_for(full_name: str) -> Set[QualifiedName]
- class libcst.metadata.Scope
Base class of all scope classes. A Scope object stores assignments from imports, variable assignments, function definitions, or class definitions. A scope has a parent scope which represents the inheritance relationship. That means an assignment in the parent scope is viewable to the child scope, and the child scope may override the assignment by using the same name.
Use name in scope to check whether a name is viewable in the scope. Use scope[name] to retrieve all viewable assignments in the scope.
Note
This scope analysis module only analyzes local variable names and it doesn’t handle attribute names; for example, given a.b.c = 1, local variable name a is recorded as an assignment instead of c or a.b.c. To analyze the assignment/access of arbitrary object attributes, we leave the job to a type inference metadata provider coming in the future.
- globals: GlobalScope
Refers to the GlobalScope.
- abstract __contains__(name: str) -> bool
Check if the name str exists in the current scope by name in scope.
- abstract __getitem__(name: str) -> Set[BaseAssignment]
Get assignments given a name str by scope[name].
Note
Why does it return a set of assignments given a name instead of just one assignment?
Many programming languages differentiate variable declaration and assignment. Further, those programming languages often disallow duplicate declarations within the same scope, and will often hoist the declaration (without its assignment) to the top of the scope. These design decisions make static analysis much easier, because it’s possible to match a name against its single declaration for a given scope.
As an example, the following code would be valid in JavaScript:
function fn() {
  console.log(value); // value is defined here, because the declaration is hoisted, but is currently 'undefined'.
  var value = 5; // A function-scoped declaration.
}
fn(); // prints 'undefined'.
In contrast, Python’s declaration and assignment are identical and are not hoisted:
if conditional_value:
    value = 5
elif other_conditional_value:
    value = 10
print(value)  # possibly valid, depending on conditional execution
This code may throw a NameError if both conditional values are falsy. It also means that depending on the codepath taken, the original declaration could come from either value = ... assignment node. As a result, instead of returning a single declaration, we’re forced to return a collection of all of the assignments we think could have defined a given name by the time a piece of code is executed. For the above example, value would resolve to a set of both assignments.
- get_qualified_names_for(node: str | CSTNode) -> Collection[QualifiedName]
Get all QualifiedName in the current scope given a CSTNode. The source of a qualified name can be either QualifiedNameSource.IMPORT, QualifiedNameSource.BUILTIN or QualifiedNameSource.LOCAL. Given the following example, c has qualified name a.b.c with source IMPORT, f has qualified name Cls.f with source LOCAL, a has qualified name Cls.f.<locals>.a, i has qualified name Cls.f.<locals>.<comprehension>.i, and the builtin int has qualified name builtins.int with source BUILTIN:
from a.b import c

class Cls:
    def f(self) -> "c":
        c()
        a = int("1")
        [i for i in c()]
We extend PEP 3155 (which defines __qualname__ for classes and functions only; a function namespace is followed by <locals>) to provide qualified names for all CSTNodes recorded by Assignment and Access. The namespace of a comprehension (ListComp, SetComp, DictComp) is represented with <comprehension>.
An imported name may be used for type annotation with SimpleString, and resolving the qualified name given a SimpleString is currently not supported, considering it could be a complex type annotation in the string which is hard to resolve, e.g. List[Union[int, str]].
- property assignments: Assignments
Return an Assignments object containing all assignments in the current scope.
- class libcst.metadata.BuiltinScope
A BuiltinScope represents Python builtin declarations. See https://docs.python.org/3/library/builtins.html
- class libcst.metadata.GlobalScope
A GlobalScope is the scope of a module. All module-level assignments are recorded in GlobalScope.
- class libcst.metadata.FunctionScope
When a function is defined, it creates a FunctionScope.
- class libcst.metadata.ComprehensionScope
Comprehensions and generator expressions create their own scope. For example, in
[i for i in range(10)]
the variable i is only viewable within the ComprehensionScope.
- class libcst.metadata.Assignments
A container to provide all assignments in a scope.
- __iter__() -> Iterator[BaseAssignment]
Iterate through all assignments by for i in scope.assignments.
- __getitem__(node: str | CSTNode) -> Collection[BaseAssignment]
Get assignments given a name str or CSTNode by scope.assignments[node].
Qualified Name Metadata
A qualified name provides an unambiguous name to locate the definition of a variable; it was
introduced for classes and functions in PEP 3155.
QualifiedNameProvider provides possible QualifiedName given a
CSTNode.
We don’t call it a fully qualified name because the name is relative to the current module and doesn’t consider the hierarchy of the code repository.
For fully qualified names, there’s FullyQualifiedNameProvider
which is similar to the above but takes the current module’s location (relative to some
Python root folder, usually the repository’s root) into account.
- class libcst.metadata.QualifiedName
- name: str
The qualified name, e.g. a.b.c or Cls.f.<locals>.a.
- source: QualifiedNameSource
Source of the name, either QualifiedNameSource.IMPORT, QualifiedNameSource.BUILTIN or QualifiedNameSource.LOCAL.
- class libcst.metadata.QualifiedNameProvider
Compute possible qualified names of a variable CSTNode (extends PEP 3155). It uses get_qualified_names_for() under the hood to get qualified names. Multiple qualified names may be returned, such as when we have conditional imports or an import shadows another. E.g., the provider finds a.b, d.e and f.g as possible qualified names of c:
>>> wrapper = MetadataWrapper(
...     cst.parse_module(dedent(
...         '''
...         if something:
...             from a import b as c
...         elif otherthing:
...             from d import e as c
...         else:
...             from f import g as c
...         c()
...         '''
...     ))
... )
>>> call = wrapper.module.body[1].body[0].value
>>> wrapper.resolve(QualifiedNameProvider)[call]
{
    QualifiedName(name="a.b", source=QualifiedNameSource.IMPORT),
    QualifiedName(name="d.e", source=QualifiedNameSource.IMPORT),
    QualifiedName(name="f.g", source=QualifiedNameSource.IMPORT),
}
For the qualified name of a variable in a function or a comprehension, please refer to
get_qualified_names_for() for more detail.
- METADATA_DEPENDENCIES: ClassVar[Collection['ProviderT']] = (<class 'libcst.metadata.scope_provider.ScopeProvider'>,)
The set of metadata dependencies declared by this class.
- static has_name(visitor: MetadataDependent, node: CSTNode, name: str | QualifiedName) -> bool
Check if any of the qualified names has the str name or QualifiedName name.
- class libcst.metadata.FullyQualifiedNameProvider
Provide fully qualified names for CST nodes. Like QualifiedNameProvider, but the provided QualifiedName instances have absolute identifier names instead of names local to the current module.
This provider is initialized with the current module’s fully qualified name, and can be used with FullRepoManager. The module’s fully qualified name itself is stored as metadata of the Module node. Compared to QualifiedNameProvider, it also resolves relative imports.
Example usage:
>>> mgr = FullRepoManager(".", {"dir/a.py"}, {FullyQualifiedNameProvider})
>>> wrapper = mgr.get_metadata_wrapper_for_path("dir/a.py")
>>> fqnames = wrapper.resolve(FullyQualifiedNameProvider)
>>> {type(k): v for (k, v) in fqnames.items()}
{<class 'libcst._nodes.module.Module'>: {QualifiedName(name='dir.a', source=<QualifiedNameSource.LOCAL: 3>)}}
- METADATA_DEPENDENCIES: ClassVar[Collection['ProviderT']] = (<class 'libcst.metadata.name_provider.QualifiedNameProvider'>,)
The set of metadata dependencies declared by this class.
Parent Node Metadata
A CSTNode only has attributes linking to its child nodes, and thus only top-down
tree traversal is possible. Sometimes users may want to access the parent CSTNode
for more information or to traverse in a bottom-up manner.
We provide ParentNodeProvider for those use cases.
File Path Metadata
This provides the absolute file path on disk for any module being visited.
Requires an active FullRepoManager when using this provider.
- class libcst.metadata.FilePathProvider
Provides the path to the current file on disk as metadata for the root Module node. Requires a FullRepoManager. The returned path will always be resolved to an absolute path using pathlib.Path.resolve().
Example usage:
import pathlib

import libcst
from libcst import CSTVisitor
from libcst.metadata import FilePathProvider

class CustomVisitor(CSTVisitor):
    METADATA_DEPENDENCIES = [FilePathProvider]

    path: pathlib.Path

    def visit_Module(self, node: libcst.Module) -> None:
        self.path = self.get_metadata(FilePathProvider, node)
>>> mgr = FullRepoManager(".", {"libcst/_types.py"}, {FilePathProvider})
>>> wrapper = mgr.get_metadata_wrapper_for_path("libcst/_types.py")
>>> {type(k): v for k, v in wrapper.resolve(FilePathProvider).items()}
{<class 'libcst._nodes.module.Module'>: PosixPath('/home/user/libcst/_types.py')}
Type Inference Metadata
Type inference automatically infers
the data types of expressions for a deeper understanding of source code.
In Python, type checkers like Mypy or
Pyre analyze type annotations
and infer types for expressions.
TypeInferenceProvider is provided by the Pyre Query API,
which requires setting up watchman for incremental type checking.
FullRepoManager is built to manage the inter-process communication with Pyre.
- class libcst.metadata.TypeInferenceProvider
Access inferred type annotations through the Pyre Query API. It requires setting up watchman and starting the Pyre server by running the pyre command. The inferred type is a string of type annotation. E.g. typing.List[libcst._nodes.expression.Name] is the inferred type of name n in expression n = [cst.Name("")]. All name references use the fully qualified name regardless of how the names are imported (e.g. import libcst; libcst.Name and import libcst as cst; cst.Name refer to the same name). Pyre infers the type of Name, Attribute and Call nodes. The inter-process communication to the Pyre server is managed by FullRepoManager.
- METADATA_DEPENDENCIES: ClassVar[Collection['ProviderT']] = (<class 'libcst.metadata.position_provider.PositionProvider'>,)
The set of metadata dependencies declared by this class.
- class libcst.metadata.FullRepoManager
- __init__(repo_root_dir: str | PurePath, paths: Collection[str], providers: Collection[ProviderT], timeout: int = 5) -> None
Given a project root directory with Pyre and watchman set up, FullRepoManager handles the inter-process communication to read the required full repository cache data for metadata providers like TypeInferenceProvider.
- Parameters:
paths – a collection of paths to access full repository data.
providers – a collection of metadata provider classes that require accessing full repository data; currently supports TypeInferenceProvider and FullyQualifiedNameProvider.
timeout – number of seconds. Raises TimeoutExpired on timeout.
- property cache: Dict[ProviderT, Mapping[str, object]]
The full repository cache data for all metadata providers passed in the providers parameter when constructing FullRepoManager. Each provider is mapped to a mapping of path to cache.
- resolve_cache() -> None
Resolve cache for all providers that require it. Normally this is called by get_cache_for_path() so you do not need to call it manually. However, if you intend to do a single cache resolution pass before forking, it is a good idea to call this explicitly to control when cache resolution happens.
- get_cache_for_path(path: str) -> Mapping[ProviderT, object]
Retrieve cache for a source file. The file needs to appear in the paths parameter when constructing FullRepoManager.
manager = FullRepoManager(".", {"a.py", "b.py"}, {TypeInferenceProvider})
MetadataWrapper(module, cache=manager.get_cache_for_path("a.py"))
- get_metadata_wrapper_for_path(path: str) -> MetadataWrapper
Create a MetadataWrapper given a source file path. The path needs to be relative to the project root directory. The source code is read and parsed as a Module for the MetadataWrapper.
manager = FullRepoManager(".", {"a.py", "b.py"}, {TypeInferenceProvider})
wrapper = manager.get_metadata_wrapper_for_path("a.py")