.. _expressions:

# RiskScape expression language

RiskScape uses an expression language to let you customize models, for example to filter or aggregate datasets.
The expression language is similar to a spreadsheet formula or to the expressions found in SQL.

An expression can:

* filter a dataset - `building.height_m > 10`
* compute new values - `round(hazard_intensity * 0.24 * road.replacement_cost)`
* apply a risk modelling function - `damage_ratio(building, tsunami)`
* group values for aggregation - `region, risk`
* apply an aggregation function - `sum(loss)`

You can experiment with expressions using the `riskscape expression` command.
For example, `riskscape expression eval 'ceil(1 + 0.7)'` prints `2` to the console.

.. tip::
   This page describes the syntax and semantics of the RiskScape expression language.
   You may prefer to work through the :ref:`expression_tutorial` tutorial first, to learn about expressions through practical examples.

## Language definition

### Constants

The language supports declaring various simple constants, such as:

* Integer - `456` - mapped internally to Java's `Long` type, a 64-bit signed integer.
* Floating - `0.212321` - mapped internally to Java's `Double` type, a 64-bit floating point number.
  Can be entered in scientific notation, e.g. `5.27e-10`.
* Text - `'this is some text'` - a string of text of arbitrary length, surrounded by single quotes.
  A single quote can be inserted into a string by escaping it with a backslash, e.g. `'I don\'t like quotes'`.

### Identifiers

Identifiers are a special kind of string used to identify various objects in an expression.
An identifier is any unquoted word that begins with a letter and contains only letters, numbers and underscores (`_`).
Identifiers that contain other characters, or that match keywords (such as `and`, `or` or `as`), are valid, but must be quoted with double quotes, e.g.
`"My interesting thing (from space)"` and `"foo:bar"` are valid identifiers.

Depending on how and where they are used, identifiers can:

* Identify an attribute on the input data, e.g. `asset.cost`, where `asset` and `cost` are both identifiers.
  In this example the dot operator `.` accesses the nested attribute `cost` that belongs to `asset`.
* Identify a function - `my_function()`
* Identify a named argument within a function call - `calc_risk(x: 12, mean: 52)`, where `x` and `mean` are identifiers

### Lists

A fixed-length list can be declared to create an ordered list of values.
Some functions take lists as input, or you may want to produce a list in your output.

A list is declared in an expression like so: `[1, 1, 2, 3, 5, 8, 13]`.
Elements are surrounded by square brackets and separated by commas.
Whitespace between the elements is not required, but improves readability.

A list can contain anything you like, but the type of the list depends on what you put in it.
The type of the elements inside the list is referred to as the contained type, e.g.:

* `[1.0, 1.1, 2.1]` has type `List(Floating)`
* `[1, 2.0, 'hello', cost, my_function(1, impact)]` has type `List(Anything)`

NB: If any or all elements of the list are nullable, then the contained type will be nullable.

### Tuples

A tuple in RiskScape is an ordered, named, typed list of values.
You can think of a tuple as a row in a database table.
Members of a tuple are accessed using identifiers, e.g. `cost`.

Note that all expressions are evaluated against a tuple: the particular row of the dataset that is being evaluated at that point.
For example:

* When setting up a bookmark, `map-attribute` expressions are evaluated against "raw" rows of the dataset being bookmarked.
* When filtering rows, the filter expression is evaluated against whatever data is in the pipeline at that point, e.g.
  `(asset.cost_dollars > 100000) and (loss.total_loss > 0)`

As well as being evaluated against a tuple, an expression can also declare a new tuple.
Tuple expressions can use either `name: value` or `value as name` syntax, e.g.:

* `{height: asset.height, count: 0, cost_dollars: asset.cost_cents / 100}`
* `{asset.height as height, 0 as count, asset.cost_cents / 100 as cost_dollars}`

### Functions

As you have seen in the previous examples, expressions can call RiskScape functions by referring to their ID within your project.
By default, RiskScape comes with some common functions for dealing with numbers, text and geometry.

You can list the available built-in functions using the `riskscape function list` command.
To see all functions, use `riskscape function list --all`.
To view a particular category of functions, such as all `maths` functions, use `riskscape function list --category maths`.
For more help, see `riskscape function list --help`.

An expression calls a function by giving its ID as an identifier, followed by a bracketed list of arguments, e.g. a function that:

* Takes some arguments - `min(1, 2)`
* Takes no arguments - `rand()`
* Takes named arguments - `norm_cdf(mean: 1.2, stddev: 0.2, x: hazard.intensity)`
* Takes the result of another function as an argument - `min(1, round(damage_ratio * 10))`

User-defined functions can be used in your expressions just like the built-in functions.
See [functions](functions.html) for more information on user-defined RiskScape functions.

### Operators

Operators are symbols like `+`, `-` and `<`.
Each represents a mathematical operation whose behaviour depends on the types of the values being operated on.
RiskScape comes with some default operators and rules for applying them in expressions, but note that this behaviour can be affected by third-party plugins.

RiskScape only supports binary operators, that is, operators that apply to two inputs.
For example, in `1 + 2`, `1` and `2` are the inputs and `+` is the operator.
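As a rough illustration, the Python sketch below models binary operators as a lookup keyed on the operator name and the types of its two operands. It is a toy, not how RiskScape itself implements operators: `OPERATORS`, `type_of` and `apply_operator` are hypothetical names, only `PLUS` and `TIMES` on numbers are modelled, and it applies the number rules this section describes (mixed `Integer`/`Floating` operands are converted to floating point, a null operand gives a null result, and an unsupported pair of types is an error whose message imitates RiskScape's).

```python
# A toy model of binary operator dispatch. OPERATORS, type_of and
# apply_operator are hypothetical names, not part of RiskScape.
from typing import Any, Callable, Dict, Optional, Tuple

# (operator, left operand type, right operand type) -> implementation
OPERATORS: Dict[Tuple[str, str, str], Callable[[Any, Any], Any]] = {
    ("PLUS", "Integer", "Integer"): lambda a, b: a + b,
    ("PLUS", "Floating", "Floating"): lambda a, b: a + b,
    ("TIMES", "Integer", "Integer"): lambda a, b: a * b,
    ("TIMES", "Floating", "Floating"): lambda a, b: a * b,
}


def type_of(value: Any) -> str:
    """Return a RiskScape-like type name for a Python value (toy mapping)."""
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Floating"
    if isinstance(value, list):
        inner = type_of(value[0]) if value else "Anything"
        return f"List[{inner}]"
    return "Anything"


def apply_operator(op: str, left: Any, right: Any) -> Optional[Any]:
    # If either operand is null, the result is null.
    if left is None or right is None:
        return None
    lt, rt = type_of(left), type_of(right)
    # Mixed Integer/Floating operands: convert the integer to floating point.
    if {lt, rt} == {"Integer", "Floating"}:
        left, right = float(left), float(right)
        lt = rt = "Floating"
    impl = OPERATORS.get((op, lt, rt))
    if impl is None:
        # No operator function for these types: the expression is invalid.
        raise TypeError(
            f"Could not find an operator function for operation '{op}' "
            f"for types '[{lt}, {rt}]'"
        )
    return impl(left, right)


print(apply_operator("PLUS", 1, 2))     # 3
print(apply_operator("PLUS", 1, 2.5))   # 3.5
print(apply_operator("PLUS", 1, None))  # None
```

Keying the table on both operand types is what lets the same symbol mean different things for different types, and why a plugin could add behaviour simply by registering new entries.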
By default, all operators are supported for the number types (`Floating` and `Integer`).
If these types are mixed (e.g. `1 + 2.3`), then the integer is converted to a floating point number.

If either operand has a nullable type, then the result of the expression is also nullable.
If either operand is null, then the expression returns null.

If an operation is not supported for the given input types, then the expression is not valid and cannot be evaluated.
For example, `1 + [1]` gives an error like `Could not find an operator function for operation 'PLUS' for types '[Integer, List[Integer]]'`.

Internally, an operator is a function with a more convenient syntax, so all the [Function Resolution](#functionresolution) rules apply to operators too.

### Operator precedence

RiskScape applies operators with the following precedence, from highest to lowest:

* bracketed expressions
* exponentiation
* division
* multiplication
* addition/subtraction
* numeric comparisons (`<`, `>`, ...)
* logical operators (`and`, `or`)

For example, the following expressions are equivalent (both result in `16.0`):

* `3 * 10 / 5 + 10`
* `(3 * (10 / 5)) + 10`

## Interaction with the type system

The RiskScape expression language is strongly typed, with type inference.
That is, each piece of data flowing through your model has a type, and functions and operators only evaluate if the given arguments can be made to match the types that the function or operator supports.
Type inference means that, most of the time, you do not need to state the types of things in your expressions, as they are inferred automatically.

### Type inference and realization

When a user declares an expression, it is not yet 'realized' with any type information.
Only once the expression is realized against an input type does type inference happen, at which point RiskScape can check whether the expression can be evaluated.
This realization typically happens when a model is validated, right before execution starts.
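To illustrate the idea, here is a minimal Python sketch of realization, assuming an expression is stored as an untyped tree and an input type is a nested mapping from attribute names to type names. `Constant`, `Attribute`, `Plus` and `realize` are hypothetical names, not RiskScape APIs, and only `+` on numbers is modelled. What it shows is that the same expression has no type until it is checked against a particular input type, and that a missing attribute or an unsupported operand type is only detected at that point.

```python
# A toy model of realizing an expression against an input type.
# Constant, Attribute, Plus and realize are hypothetical, not RiskScape APIs.
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# Input type: attribute name -> type name, or a nested mapping for structs.
InputType = Dict[str, Union[str, "InputType"]]


@dataclass
class Constant:
    value: object


@dataclass
class Attribute:
    path: Tuple[str, ...]  # ("asset", "cost") stands for asset.cost


@dataclass
class Plus:
    left: object
    right: object


def realize(expr: object, input_type: InputType) -> str:
    """Infer the type of expr against input_type, or raise if it can't be evaluated."""
    if isinstance(expr, Constant):
        # The type of a literal is known without any input type.
        if isinstance(expr.value, int):
            return "Integer"
        if isinstance(expr.value, float):
            return "Floating"
        return "Text"
    if isinstance(expr, Attribute):
        # Look the attribute up in the input type; it may not exist.
        dotted = ".".join(expr.path)
        current: object = input_type
        for name in expr.path:
            if not isinstance(current, dict) or name not in current:
                raise KeyError(f"{dotted} not found in input type")
            current = current[name]
        return current if isinstance(current, str) else "Struct"
    if isinstance(expr, Plus):
        # Realize both operands first, then check that an operator exists.
        lt = realize(expr.left, input_type)
        rt = realize(expr.right, input_type)
        if lt in ("Integer", "Floating") and rt in ("Integer", "Floating"):
            return "Integer" if lt == rt == "Integer" else "Floating"
        raise TypeError(f"no PLUS operator for types [{lt}, {rt}]")
    raise TypeError(f"unknown expression node: {expr!r}")


# 1 + (asset.cost) is declared once, without any type information...
expr = Plus(Constant(1), Attribute(("asset", "cost")))
# ...and only gets a type when realized against a particular input type.
print(realize(expr, {"asset": {"cost": "Integer"}}))   # Integer
print(realize(expr, {"asset": {"cost": "Floating"}}))  # Floating
```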
As an example, consider the following expression:

```none
1 + (asset.cost)
```

While we know `1` is an integer, when RiskScape first sees this expression it doesn't know the type of `asset.cost`, nor whether that attribute exists in the dataset the expression will ultimately be evaluated against.

When the expression is brought together into a model with input data and realized, RiskScape infers the type of the expression by "filling in the gaps".
In this example, that means looking up `asset.cost` to see whether it exists in the input type, and then using its type to determine whether `+` is supported (more on this in the next section).
If `asset.cost` is an integer, then the expression with its inferred types would look like:

```none
(1):Integer + (asset.cost):Integer
```

An operator exists for adding two integers, so realization succeeds.

.. _functionresolution:

### Function resolution

When a function call in an expression is realized, RiskScape does the following to resolve the function against the types of the given arguments:

* First, the function is looked up in your project using its identifier, e.g. `my_risk_function(asset)` looks for a function with the ID `my_risk_function`.
  If none exists, realization fails.
* If the argument types match the function's parameter types exactly, then the function matches.
* If the types that the function accepts are broader than the types of the given arguments (also known as covariance), then the function still matches.
  For example, `Anything` is broader than `Text`, and `Text` is broader than `WithinSet(Text, 'cat', 'dog', 'pig')`.
  So a function that accepts type `Anything` will accept `'cat'` as an argument.
* If the function requires a `Floating` argument and an `Integer` is provided, then the function matches and the `Integer` argument is converted to `Floating`.
* If there are missing arguments but they are optional (i.e. they are nullable), then the function matches.
* If any of the given arguments are nullable, but the function does not accept nullable arguments, then the function matches, but its return type is adjusted to be nullable.
  If the function is then called with a null value for one of those arguments, the function is not applied and null is returned.
* If none of these apply, the function does not match and can't be evaluated.

#### Overloaded functions

Some RiskScape functions are "overloaded", meaning there are multiple versions of the same function that each accept a different set of types.
For example, a `length` function might be overloaded so that it works with both `List` and `Text` types.
Overloaded functions follow the same function resolution steps, except that each alternative is checked against the resolution steps listed above, in order, until one matches.

https://gitlab.catalyst.net.nz/riskscape/riskscape/issues/71

#### Realizable functions

A realizable function is one that can adapt to the list of argument types it is given in order to calculate a return type.
The argument types a realizable function advertises are not really used; they exist for documentation purposes only.
It is up to each function to adapt itself to the given input types and return an implementation of the function that best fits them.
Once this is done, the function is matched using the steps above.

https://gitlab.catalyst.net.nz/riskscape/riskscape/issues/71

See :ref:`types` for a more detailed explanation of RiskScape's type system.
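The function resolution steps, including trying each alternative of an overloaded function in order, can be sketched in Python as follows. This is a simplified model under stated assumptions, not how RiskScape implements resolution: types are plain strings, a trailing `?` marks a nullable type (a notation made up for this sketch), the "broader than" relation only covers the `Anything`, `Text` and `WithinSet(Text, ...)` examples from this page plus `Integer` to `Floating` conversion, and `FUNCTIONS`, the `scale` function, `accepts`, `match` and `resolve` are all hypothetical.

```python
# A simplified model of function resolution. Types are plain strings and a
# trailing "?" marks a nullable type (a notation invented for this sketch).
# FUNCTIONS, accepts, match and resolve are hypothetical, not RiskScape APIs.
from typing import Dict, List, Optional, Sequence, Tuple

Signature = Tuple[List[str], str]  # (parameter types, return type)

# Function ID -> alternatives, tried in order (more than one = overloaded).
FUNCTIONS: Dict[str, List[Signature]] = {
    "length": [(["List(Anything)"], "Integer"), (["Text"], "Integer")],
    "scale": [(["Floating", "Floating?"], "Floating")],  # 2nd arg optional
}


def base(t: str) -> str:
    """Strip the nullable marker from a type name."""
    return t[:-1] if t.endswith("?") else t


def accepts(param: str, arg: str) -> bool:
    """True if a parameter of type `param` accepts an argument of type `arg`."""
    p, a = base(param), base(arg)
    if p == a or p == "Anything":  # exact match, or Anything accepts everything
        return True
    if p == "Text" and a.startswith("WithinSet(Text"):
        return True  # Text is broader than WithinSet(Text, ...)
    return p == "Floating" and a == "Integer"  # Integer converted to Floating


def match(sig: Signature, args: Sequence[str]) -> Optional[str]:
    """Return the (possibly nullable) return type if sig matches args, else None."""
    params, ret = sig
    if len(args) > len(params):
        return None
    # Missing trailing arguments are only allowed for nullable (optional) params.
    if any(not p.endswith("?") for p in params[len(args):]):
        return None
    if not all(accepts(p, a) for p, a in zip(params, args)):
        return None
    # A nullable argument for a non-nullable parameter makes the result nullable.
    lifted = any(a.endswith("?") and not p.endswith("?") for p, a in zip(params, args))
    return ret + "?" if lifted and not ret.endswith("?") else ret


def resolve(function_id: str, args: Sequence[str]) -> str:
    """Look up the function by ID, then return the first alternative that matches."""
    if function_id not in FUNCTIONS:
        raise LookupError(f"no function with ID '{function_id}'")
    for sig in FUNCTIONS[function_id]:
        result = match(sig, args)
        if result is not None:
            return result
    raise TypeError(f"{function_id} does not accept argument types {list(args)}")


print(resolve("length", ["Text"]))    # Integer
print(resolve("length", ["Text?"]))   # Integer?
print(resolve("scale", ["Integer"]))  # Floating
```

This toy does not model list element types, named arguments, or realizable functions, which adapt themselves to the argument types before matching.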