JavaScript is not currently enabled, but is required for full CodeSonar manual search and browse functionality.

If you are viewing this file in your hub's Web GUI, enable JavaScript in your browser: you will also need it for GUI functionality.

If you opened this file directly from disk, your browser may be directly suppressing JavaScript functionality: certain browsers perform this suppression on local files (but not files delivered by web servers) for security reasons.

CodeSonar® 9.2p0 CONFIDENTIAL CodeSecure Inc
C and C++
Binaries

AST: Patterns

Patterns are constructs that describe AST properties. They are used for testing whether a specific AST conforms to those properties ("matches the pattern"), particularly in AST traversal.



Overview

Patterns describe ASTs using an extended version of AST syntax. The following table summarizes the forms patterns may take. For further details on any form, select the corresponding link in the table.

  Pattern Description
Wildcard ?- Matches anything.
Wild child ?# Matches any child.
Wild attribute ?@ Matches any attribute.
Pattern variable ?symbol Matches anything; binds symbol to the matched entity. If ?symbol occurs multiple times with the same symbol in the same pattern, they must all match the same entity.
Boolean (and pat1 ... patk),
(or pat1 ... patk),
(not subpat)
The Boolean operations AND, OR, and NOT, respectively.
Closure C++, Python: not supported.

C: %P
(pointers to the desired function and any additional arguments are passed separately)

Matches unless the closure returns FALSE when passed the target of the match.
Primitive primitive Many primitive values are patterns.
AST (pat0 :f1 pat1 ... :fk patk) Matches an AST of class pat0 in which each AST field fi matches pattern pati.
Unwrap (unwrap patwrap patmatch) Matches anything that matches patmatch, and anything that would match patmatch if not for syntactic differences described by patwrap.

Use

The use of AST patterns with the C++, Python, and C APIs will typically involve three steps:

  1. Given a string specification (as described in the overview, and in more detail below), obtain an AST pattern object. Note that AST classes in pattern specification strings should be referred to by their canonical names.
    API Language Pattern object type Obtain mypattern from string specification patstr:
    C++ cs::ast_pattern
    cs::ast_pattern *mypattern = new cs::ast_pattern(patstr); // constructor invocation
    
    Python cs.ast_pattern
    mypattern = cs.ast_pattern(patstr)                        # constructor invocation
    
    C cs_ast_pattern
    r = cs_ast_pattern_compile(patstr, &mypattern,...);
    /* then check r */
    
  2. Attempt to match one or more ASTs against the resulting AST pattern object. Some applications will make use of the variable bindings imposed by the match, others will be interested only in whether or not a match was found.
    API Language Binding object type Check: does mypattern match AST a? Match mypattern against AST a, with result abind:
    C++ cs::ast_bindings
    if (mypattern->match(a)){
        // ...
    
    cs::ast_bindings abind = mypattern->match_with_bindings(a); 
    
    Python Dictionary {str: ast | class:symbol | bool | int | str}, or False if there is no match.
    if mypattern.match(a):
        # ...
    
    abind =  mypattern.match(a)
    
    C cs_ast_binding
    r = cs_ast_match(a, &mypattern, NULL, 0, &needed);
    if (r == CS_SUCCESS || r == CS_TRUNCATED){ 
        /* ... */
    
    cs_ast_binding *abind = NULL;
    cs_size_t needed;
    r = cs_ast_match(a, &mypattern, NULL, 0, &needed);
    if (r == CS_SUCCESS || r == CS_TRUNCATED){
        abind = malloc(needed);
        if (abind){     /* only proceed if can allocate enough space for abind */  
            cs_ast_match(a, &mypattern, abind, needed, &needed);
        }
    }
    
    You can use AST traversal in conjunction with pattern matching to search a tree for a specific AST. This technique is illustrated in the AST tutorial.
  3. Once the AST pattern object is no longer required, clean it up.
    API Language To dispose of mypattern:
    C++
    delete mypattern;   // destructor invocation
    
    Python Generally no need to explicitly invoke destructor.
    C
    r = cs_ast_pattern_close(mypattern);
    /* then check r */
    
    Note that this will also free the variable name strings in any bindings created against mypattern.

The (ast-dynamic-*) versions of these functions are for use in cases where the patterns are not statically known: their pattern arguments must be quoted to prevent early evaluation

Examples

Listed below are some examples of patterns and a description of which ASTs they match:

Pattern String Representation Pattern Specification ASTs matched by pattern
(c:=
  :1 (c:variable :1 "x")
  :2 (c:integer-value :1 1))
"(c:=
  :1 (c:variable :1 \"x\")
  :2 (c:integer-value :1 1))"
The assignment "x = 1".
(c:=
  :1 (c:variable :1 "x"))
"(c:=
  :1 (c:variable :1 \"x\"))"
Any assignment where "x" is the left operand.
(c:=)
"(c:=)"
Any assignment.
(c:= :1 ?x)
"(c:= :1 ?x)"
Any assignment. The left operand is bound to the symbol x.
(c:= :1 ?x :2 ?x)
"(c:= :1 ?x :2 ?x)"
Any assignment where the left and right operands are identical. The operand is bound to the symbol x.
(c:=
  :2 (c:+
    :1 (c:integer-value
      :1 1)))
"(c:=
  :2 (c:+
    :1 (c:integer-value
      :1 1)))"
Assignment where the right operand is an addition, and the left operand of the addition is the integer literal 1. Note that the left hand side of the assignment expression is not mentioned and thus does not prohibit a match.
(c:=
  :1 ?x
  :2 (c:+
    :1 ?y
    :2 ?y))
"(c:=
  :1 ?x
  :2 (c:+
    :1 ?y
    :2 ?y))"
Any assignment where the right operand is an addition, and the addition's operands are identical. The left operand of the assignment is bound to the symbol x. The operands of the addition are bound to the symbol y.

Details

Wildcard

String Specification "?-"
Pattern ?-
Matches Anything whatsoever (though it cannot appear as a field specifier (like :foo), only a field value).
Notes Matches any AST, AST_CLASS, or primitive

Wild Child

String Specification "?#"
Pattern ?#
Matches Any child.

Wild Attribute

String Specification "?@"
Pattern ?@
Matches Any attribute.

Pattern Variable

String Specification "?symbol" (where symbol is any valid Scheme identifier except "-", "#", "@", or "").
Pattern ?symbol (where symbol is any valid Scheme identifier except "-", "#", "@", or "").
Matches The first (leftmost) occurrence of this subpattern in a pattern matches anything whatsoever (AST, AST_CLASS, or primitive). Subsequent occurrences match only if the target is equal to the value already bound to symbol.
Examples
pattern... ...matches... ...and binds...
?foo
anything: a primitive, an AST, or an AST_CLASS the matched entity to foo
(?foo)
any AST the AST's AST_CLASS to foo.
(cc:integer-value 
  :value (?cl :value ?foo))
a cc:integer-value object the object's integer representation to foo.
(c:= 
  :1 ?foo)
a c:= object the object's left hand side (usually a c:lvalue object) to foo.
Notes A binding is made only once but may be used any number of times.

Boolean

String Specification "(not subpattern)" matches if and only if subpattern does not match.
"(or pat1 ... patk)" matches if any subpattern pati matches. (k must be at least 2).
"(and pat1 ... patk)" matches if every subpattern pati matches. (k must be at least 2).
Patterns (not subpattern)
(or pat1 ... patk)
(and pat1 ... patk)
Example Pattern
(not (not ?b))
matches anything, and binds the matched entity to b (just as pattern ?b would).
Notes
  • The and and or patterns use short circuiting. So, for example:
    (and pat1 ... patk) if pat1 does not match then the entire pattern does not match, and pat2, pat3, ... patk will never be tested.
    (or pat1 ... patk) if pat1 matches, then the entire pattern matches, and pat2, pat3, ... patk will never be tested.
    Handling for pattern variables that are unbound because of short circuiting (or for other reasons) depends on the API implementation.
    • C++ API: The ast_bindings object returned by ast_pattern::match_with_bindings() omits unbound variables entirely.
    • C API: function cs_ast_match() always sets up an out_bindings array entry for every variable in the pattern. Unbound variables will have the .f.type field of their array entry set to csft_null.
    • Python API: The bindings dictionary returned by ast_pattern.match() omits unbound variables entirely.
  • These are patterns, and so may appear only where patterns may appear. For example,
    (c:+ (or :1 :2) (c:+))
    
    is not a valid pattern. Two valid patterns that match addition expressions in which one operand is also an addition expression are:
    (c:+ :?- (c:+))
    
    and
    (or 
      (c:+ :1 (c:+)) 
      (c:+ :2 (c:+)))
    

Closure

String Specification "%P"
(C API only: not supported in C++ and Python APIs)
Pattern (? closure)
Matches Anything, unless the closure returns FALSE when passed the the target of the match.
C Example
cs_boolean my_closure(cs_ast_field *fld, void *args){
    cs_ast_class fcl;
    cs_ast_class strcl;

    if (fld->type != csft_ast_class) return cs_false;
    fcl = fld->_.ast_class;
    if (cs_ast_class_lookup("c:+", &strcl) != CS_SUCCESS) return cs_false;

    return cs_ast_class_is_subclass(fcl,strcl);
}
/* ... */
csres = cs_ast_pattern_compile("(%P :1 (c:+))", &pattern, &patprob, &paterr, my_closure,NULL);
is equivalent to
/* ... */
csres = cs_ast_pattern_compile("(c:+ :1 (c:+))", &pattern, &patprob, &paterr);
Notes
  • Pattern variables are not bound inside closure patterns.
  • The first argument to the function is always a cs_ast_field, which might encapsulate an AST, an AST_CLASS, or a primitive.

Primitive

String Specification "prim"
where prim is #t, #f, something that strtol() can parse, something that strtod() can parse, or a string literal.

String literals must be surrounded with double quotes, and which can use any of the following escape sequences:

  • \b, \n, \r, \t, \e,
  • octal character codes of the form \0123,
  • \x for any character x (example: \").

If you are specifying patterns in a string literal, you will need to double-escape:

cs_ast_pattern_compile( "(c:string :value \"\\"Hello,\\" he said.\")", ... );

Alternately, you can use single quotes for string literals to reduce the backslashes:

cs_ast_pattern_compile( "(c:string :value '\"Hello,\" he said.')", ... ); 
Pattern Any NUMBER, SYMBOL, BOOLEAN, CHARACTER, or STRING.
Matches A primitive that is equal to it.
Example "b" is a primitive, and is also a pattern matching that primitive.

AST

String Specification "(o :f1 pat1 ... :fk patk)"
Pattern (o :f1 pat1 ... :fk patk)
Matches An AST A, where:
o is a pattern matching A's AST_CLASS. Note that this can be any class of which A is an instance, not just the most specific one.
each :fi is the name of a child or attribute of A, or is one of the special keywords :?-, :?#, :?@.
each pi is a pattern, such that
  • If :fi names a child or attribute of A, then that child or attribute matches pi.
  • If :fi is the special keyword :?- then either
    • there exists some child or attribute of A that matches the pattern pi ,
      or
    • A has no children and no attributes.
  • If :fi is the special keyword :?# then either
    • there exists some child of A that matches the pattern pi (it must be a child, not an attribute),
      or
    • A has no children.
  • If :fi is the special keyword :?@ then either
    • there exists some attribute of A that matches the pattern pi (it must be an attribute, not a child),
      or
    • A has no attributes.
k is at least zero.
Notes
  • Any number of children and attributes may be omitted from the pattern without preventing a match.
  • Patterns of the form (o :f 1 p1 ... :fk pk) can only match a full-fledged AST.

Unwrap

Pattern (unwrap patwrap patmatch)
Matches An AST A, where:
patwrap is a pattern that has ?wrapped-ast as a subpattern.
patmatch is a pattern that matches the result of unwrapping A zero or more times.
OR:
patwrap is a pattern that has ?wrapped-ast as a subpattern.
patmatch is a pattern that does not match the result of unwrapping A any number of times.

We define the result of unwrapping A to be the AST bound to wrapped-ast by patwrap. An unwrapping is therefore only possible if patwrap matches A.

Examples
pattern... ...matches...
(c:= :1 (?- :type (unwrap 
  (cc:typeref :type ?wrapped-ast)
  (cc:pointer))))
an assignment to any pointer type, even if that type is defined through a C typedef (or a chain of typedefs).
(c:= :2 (unwrap 
  (c:cast :2 ?wrapped-ast) 
  (c:integer :1 0)))
an assignment to integer 0, where 0 may be wrapped in any number of casts. For example,
x = (char *)(void *)(int **)0;
or
y = 0x0;
Notes
  • This pattern does not match any target that is not an AST.

More AST Sections

General information:

ASTs Overview of Abstract Syntax Trees in CodeSonar.
AST Class Index Index page for the AST class descriptions.
Examples of Normalized C/C++ ASTs Some sample C statements and their corresponding ASTs.
Tutorial Annotated example plug-ins (in each supported API language) that use AST patterns to implement a custom check.

Specific API languages:

C++ classes ast, ast_class, ast_pattern, ast_ordinal, ast_iterator
Python classes ast, ast_class, ast_pattern, ast_ordinal, ast_iterator
C C API: AST Functions and Types
 

To report problems with this documentation, please visit https://support.codesecure.com/.