RiskScape expression language

RiskScape uses an expression language to allow models to be customized, for example for filtering or aggregating datasets. Expressions resemble spreadsheet formulas, or the individual clauses that make up an SQL statement.

An expression can:

filter a dataset - building.height_m > 10
compute new values - round(hazard_intensity * 0.24 * road.replacement_cost)
apply a risk modelling function - damage_ratio(building, tsunami)
be used to group values for aggregation - region, risk
be used to apply an aggregation function - sum(loss)

You can experiment with expressions using the riskscape expression command. For example, riskscape expression eval 'ceil(1 + 0.7)' will print 2 to the console.

Tip

This page describes the syntax and semantics of the RiskScape expression language. You may prefer to go through the How to write RiskScape expressions tutorial first, to learn about expressions through practical examples.

Language definition

Constants

The language supports declaring various simple constants, such as:

Integer - 456 - mapped internally to Java’s Long type - a 64-bit signed integer
Floating - 0.212321 - mapped internally to Java’s Double type - a 64-bit floating point number. Can be entered with scientific notation, e.g. 5.27e-10.
Text - 'this is some text' - an arbitrary length string of text, surrounded by single quotes. Single quotes can be inserted into a string by escaping them with a backslash, e.g. 'I don\'t like quotes'
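
The constant rules above can be illustrated with a minimal Python sketch. It is not part of RiskScape; the helper names fits_integer and text_constant are hypothetical, and only the single-quote escaping rule is taken from the text (how other characters, such as a backslash itself, are treated is not covered here).

```python
# Range of a 64-bit signed integer, which is what Java's Long can hold.
INT_MIN, INT_MAX = -2**63, 2**63 - 1


def fits_integer(n: int) -> bool:
    """True if n is within the 64-bit signed range (hypothetical helper)."""
    return INT_MIN <= n <= INT_MAX


def text_constant(value: str) -> str:
    """Write value as a single-quoted text constant, escaping single quotes
    with a backslash. Only the quote rule from the text is modelled."""
    return "'" + value.replace("'", "\\'") + "'"


print(fits_integer(456))
print(text_constant("I don't like quotes"))  # 'I don\'t like quotes'
```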

Identifiers

Identifiers are a special kind of string that are used to identify various objects in an expression. An identifier is any unquoted word that begins with a letter and contains only letters, numbers and underscores (_).

Identifiers that contain other characters, or that match a keyword (such as and, or, and as), are valid, but must be quoted with double quotes, e.g. "My interesting thing (from space)" and "foo:bar" are valid identifiers.
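
As a rough illustration of the quoting rule, the following Python sketch decides whether a name can be written unquoted. It is an assumption-laden model rather than RiskScape's own parser: the helper quote_identifier is hypothetical, letters are assumed to be ASCII, the keyword set contains only the three keywords named above, and names that themselves contain a double quote are not handled.

```python
import re

# Unquoted identifiers: a letter first, then letters, digits or underscores.
# ASCII letters are an assumption; the text does not say which alphabets count.
UNQUOTED = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Only the keywords mentioned in the text; the real list may be longer.
KEYWORDS = {"and", "or", "as"}


def quote_identifier(name: str) -> str:
    """Return name as it could appear in an expression, quoting only if needed."""
    if UNQUOTED.fullmatch(name) and name not in KEYWORDS:
        return name
    return '"' + name + '"'


print(quote_identifier("cost_dollars"))  # cost_dollars
print(quote_identifier("foo:bar"))       # "foo:bar"
print(quote_identifier("as"))            # "as"
```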

Depending on how and where identifiers are used, they can:

Identify an attribute on the input data, e.g. asset.cost - asset and cost are both identifiers. In this example the dot operator . is used to access the nested attribute cost that belongs to asset
Identify a function - my_function()
Identify a named argument within a function - calc_risk(x: 12, mean: 52) where x and mean are identifiers

Lists

A fixed length list can be declared to create an ordered list of values. Some functions use lists as input, or you may want to produce a list in your output. A list is declared in an expression like so - [1, 1, 2, 3, 5, 8, 13]. Elements are surrounded by square brackets and separated by commas. Whitespace between the elements is not necessary, but improves readability.

A list can be filled with anything you like, but the type of the list will depend on what you fill it with. The type of thing inside the list is referred to as the contained type.

[1.0, 1.1, 2.1] has type List(Floating)
[1, 2.0, 'hello', cost, my_function(1, impact)] has type List(Anything)

NB: If any or all elements of the list are nullable, then the contained type will be nullable.
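
The way a contained type follows from the elements can be sketched in Python. This is a simplified model under stated assumptions, not RiskScape's implementation: element types are plain strings, the helpers contained_type and list_type are hypothetical, the Nullable(...) notation is assumed, and any mix of different types (including Integer with Floating) is treated as Anything, which may not match what RiskScape does for numbers.

```python
def contained_type(elements):
    """elements is a list of (type_name, is_nullable) pairs, one per list member."""
    names = {name for name, _ in elements}
    # One distinct type keeps that type; anything else falls back to Anything.
    base = names.pop() if len(names) == 1 else "Anything"
    # A single nullable member makes the whole contained type nullable.
    if any(is_nullable for _, is_nullable in elements):
        return f"Nullable({base})"
    return base


def list_type(elements):
    return f"List({contained_type(elements)})"


print(list_type([("Floating", False)] * 3))                  # List(Floating)
print(list_type([("Integer", False), ("Text", False)]))      # List(Anything)
print(list_type([("Floating", False), ("Floating", True)]))  # List(Nullable(Floating))
```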

Tuples

A tuple in RiskScape is an ordered, named, typed list of values. You can think of a tuple as a row in a database table. Members of tuples in RiskScape are accessed using identifiers, e.g. cost. It is worth noting that all expressions are evaluated against a tuple, with the tuple being a particular row in the dataset that is being evaluated at a particular point. For example:

When setting up a bookmark, map-attribute expressions are evaluated against “raw” rows of the dataset being bookmarked
When filtering rows, the filter expression is evaluated against whatever data is in the pipeline at that point, e.g. (asset.cost_dollars > 100000) and (loss.total_loss > 0)
As well as being evaluated by an expression, a tuple can be declared by an expression. Tuple expressions can use either keyword syntax (name: value) or as syntax (value as name), e.g.
- {height: asset.height, count: 0, cost_dollars: asset.cost_cents / 100}
- {asset.height as height, 0 as count, asset.cost_cents / 100 as cost_dollars}
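
To make the "row in, tuple out" idea concrete, here is a Python sketch that mirrors the tuple declaration above using dictionaries. It is only an analogy: the sample row values are invented for illustration, the helper declare_tuple is hypothetical, and Python's / always produces a floating point result, which is not necessarily how RiskScape divides two integers.

```python
# A made-up input row; in RiskScape this would be the tuple being evaluated.
row = {"asset": {"height": 4.5, "cost_cents": 1250000}}


def declare_tuple(row):
    """Build a new named, ordered tuple from the input row, like
    {height: asset.height, count: 0, cost_dollars: asset.cost_cents / 100}."""
    asset = row["asset"]
    return {
        "height": asset["height"],
        "count": 0,
        "cost_dollars": asset["cost_cents"] / 100,
    }


result = declare_tuple(row)
print(result)  # {'height': 4.5, 'count': 0, 'cost_dollars': 12500.0}
```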

Functions

As you have seen in the previous examples, expressions can call RiskScape functions by referring to their ID within your project. RiskScape comes by default with some common functions for dealing with numbers, text and geometry.

You can query the built-in functions available using the riskscape function list command. To see all functions, use riskscape function list --all. To view a particular category of function, such as all maths functions, use riskscape function list --category maths. For more help, see riskscape function list --help.

An expression can call a function by giving its ID as an identifier, followed by a bracketed list of arguments, e.g. a function that:

Takes some arguments - min(1, 2)
Takes no arguments - rand()
Takes named arguments - norm_cdf(mean: 1.2, stddev: 0.2, x: hazard.intensity)
Takes the result of another function as an argument - min(1, round(damage_ratio * 10))

Any user-defined functions can be used in your expressions just like the built-in functions. See functions for more information on user-defined RiskScape functions.

Operators

Operators are things like +, - and <. They represent an abstract mathematical operation whose behaviour depends on the types of the things being operated on. RiskScape comes with some default operators and rules for applying them to expressions, but note that these behaviours can be affected by third-party plugins. RiskScape only supports binary operators, that is, operators that apply to two inputs, for example 1 + 2 where 1 and 2 are the inputs and + is the operator.

By default, all operators are supported for the number types (Floating and Integer). If these types are mixed, (e.g. 1 + 2.3) then the integer is converted to a floating number. If either operand is of nullable type, then the result of the expression is also nullable. If either operand is null then the expression will return null.
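
The promotion and null rules described above can be modelled in a few lines of Python. This is a sketch of the described behaviour, not RiskScape code: the helper apply_operator is hypothetical and Python's None stands in for null.

```python
import operator


def apply_operator(op, left, right):
    """Apply a binary operator with the null and number-mixing rules from the text."""
    # If either operand is null, the result is null.
    if left is None or right is None:
        return None
    # Mixed Integer/Floating operands: the integer is converted to floating.
    if isinstance(left, int) and isinstance(right, float):
        left = float(left)
    elif isinstance(left, float) and isinstance(right, int):
        right = float(right)
    return op(left, right)


print(apply_operator(operator.add, 1, 2))      # 3
print(apply_operator(operator.add, 1, 2.3))    # a floating result
print(apply_operator(operator.mul, None, 4))   # None
```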

If an operation is not supported for the given input types, then the expression will not be valid and cannot be evaluated, e.g. 1 + [1] would give an error like Could not find an operator function for operation 'PLUS' for types '[Integer, List[Integer]]'.

Technically an operator is actually a function with a more convenient expression format. The Function Resolution rules all apply to operators.

Operator precedence

RiskScape applies mathematical operators with the following precedence:

bracketed expressions
exponentiation
division
multiplication
addition/subtraction
numeric comparisons (<,>…)
binary comparisons (and/or)

For example the following expressions are equivalent (resulting in: 16.0):

3 * 10 / 5 + 10
(3 * (10 / 5)) + 10
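
The two groupings can be checked numerically in Python. Note the assumption: Python gives * and / equal precedence and evaluates them left to right, so it groups the first line as (3 * 10) / 5 rather than following the ordering listed above. For these values both groupings give the same answer, so this only confirms the arithmetic, not RiskScape's parsing.

```python
implicit = 3 * 10 / 5 + 10        # Python groups this as ((3 * 10) / 5) + 10
explicit = (3 * (10 / 5)) + 10    # the fully bracketed form from the text

print(implicit, explicit)  # 16.0 16.0

# Arithmetic binds tighter than comparisons, which bind tighter than and/or.
check = 1 + 2 < 4 and 2 * 3 > 5   # read as ((1 + 2) < 4) and ((2 * 3) > 5)
print(check)  # True
```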

Interaction with the type system

The RiskScape expression language is strongly typed with type inference. That is, each bit of data flowing through your model is associated with a type, and functions and operators will only evaluate if the given arguments can be made to match the types supported by the function or operator. Type inference means that, most of the time, you do not need to state the types of things in your expressions; they are calculated dynamically.

Type inference and realization

When a user declares an expression, it is not yet ‘realized’ with any type information. Only once the expression is ‘realized’ with an input type does type inference happen, and only then can RiskScape check whether the expression can be evaluated. This realization typically happens when a model is validated, right before execution starts.

As an example, consider the following expression:

1 + (asset.cost)

While we know 1 is an integer, when RiskScape sees this expression, it doesn’t know the type of asset.cost, nor does it know if those attributes exist in the dataset we are ultimately going to be evaluating this expression against. When this expression is brought together into a model with input data and realized, we infer the type of the expression by “filling in the gaps” - in the example given this means looking up asset.cost to see if it exists in the input type, and then using its type to determine whether plus is supported - more on this in the next sections.

If, in our example, asset.cost is an integer, then the expression with its inferred types would look like:

(1):Integer + (asset.cost):Integer

An operator exists for adding two integers, and so realization succeeds.
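
The realization of 1 + (asset.cost) can be sketched in Python as two lookups: first the attribute in the input type, then an operator for the resulting pair of types. Everything here is an illustrative assumption: the input type, the operator table, and the helpers lookup and realize_plus are hypothetical and much simpler than RiskScape's real type system.

```python
# A made-up input type: attribute names mapped to type names (nested for asset).
INPUT_TYPE = {"asset": {"cost": "Integer", "name": "Text"}}

# A tiny operator table: (operation, left type, right type) -> result type.
OPERATORS = {
    ("PLUS", "Integer", "Integer"): "Integer",
    ("PLUS", "Floating", "Floating"): "Floating",
}


def lookup(path, input_type):
    """Follow a dotted path such as asset.cost through the input type."""
    current = input_type
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            raise LookupError(f"attribute '{part}' does not exist in the input type")
        current = current[part]
    return current


def realize_plus(left_type, attribute_path, input_type):
    """Infer the result type of <left> + <attribute>, or fail realization."""
    right_type = lookup(attribute_path, input_type)
    result = OPERATORS.get(("PLUS", left_type, right_type))
    if result is None:
        raise TypeError(f"no operator for 'PLUS' with types [{left_type}, {right_type}]")
    return result


print(realize_plus("Integer", "asset.cost", INPUT_TYPE))  # Integer
```

In this model, realizing 1 + asset.name fails at the operator lookup, while 1 + asset.height fails earlier because the attribute is missing from the input type.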

Function resolution

When a function call is realized in an expression, RiskScape does the following things to resolve the function against the expected argument types:

First, a function is looked up from your project using the identifier, e.g. my_risk_function(asset) will look for a function with the ID my_risk_function. If none exists, realization will fail.
If the arguments to the function match exactly, then the function matches.
If the types that the function takes are broader than the types of the given arguments (also known as covariance), then the function still matches. For example, Anything is broader than Text, and Text is broader than WithinSet(Text, 'cat', 'dog', 'pig'). So a function that takes type Anything will accept 'cat' as an argument.
If the function requires a Floating argument and an Integer is provided, then the function matches and the Integer argument is converted to Floating.
If there are missing arguments but they are optional (they are nullable), then the function matches.
If any of the given arguments are Nullable, but the function does not accept nullable arguments, then the function matches, but the return type is adjusted to be nullable. If the function is called with missing arguments, then the function won’t apply and nothing is returned.
If none of these apply, the function does not match and won’t be evaluated.
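
The matching steps above can be approximated with the following Python sketch. It is a simplified model under stated assumptions, not RiskScape's resolver: types are plain strings, the PARENT table encodes only the "broader than" relationships mentioned in the text, the Nullable(...) notation is assumed, and Param, Arg and resolve are hypothetical names. Looking the function up by its ID is left out.

```python
from dataclasses import dataclass

# Each type maps to the next broader type; only relationships from the text.
PARENT = {
    "WithinSet(Text, 'cat', 'dog', 'pig')": "Text",
    "Text": "Anything",
    "Integer": "Anything",
    "Floating": "Anything",
}


def accepts(expected, given):
    """True if expected equals given or is broader than it."""
    current = given
    while current is not None:
        if current == expected:
            return True
        current = PARENT.get(current)
    return False


@dataclass(frozen=True)
class Param:
    type: str
    nullable: bool = False  # nullable parameters are treated as optional


@dataclass(frozen=True)
class Arg:
    type: str
    nullable: bool = False


def resolve(params, return_type, args):
    """Return the (possibly nullable) return type, or None if there is no match."""
    if len(args) > len(params):
        return None
    result_nullable = False
    for index, param in enumerate(params):
        if index >= len(args):
            if not param.nullable:  # a missing argument must be optional
                return None
            continue
        arg = args[index]
        coerced = param.type == "Floating" and arg.type == "Integer"
        if not (accepts(param.type, arg.type) or coerced):
            return None
        if arg.nullable and not param.nullable:
            result_nullable = True  # the function matches, the result becomes nullable
    return f"Nullable({return_type})" if result_nullable else return_type


print(resolve([Param("Floating")], "Floating", [Arg("Integer")]))                 # Floating
print(resolve([Param("Floating")], "Floating", [Arg("Floating", nullable=True)]))  # Nullable(Floating)
print(resolve([Param("Text")], "Text", [Arg("Integer")]))                          # None
```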

Overloaded functions

Some RiskScape functions are said to be “overloaded” - this means there are multiple versions of the same function that accept a different set of types.

For example, a length function might be overloaded if it works with both List and Text types. Overloaded functions follow the same function resolution steps, except that each alternative is checked against the resolution steps listed above in order until one matches.
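
Checking each alternative in order can be sketched as a simple loop. This Python model is illustrative only: the alternative names length_of_list and length_of_text are hypothetical, and an exact comparison of type names stands in for the full resolution steps.

```python
# Alternatives of a hypothetical overloaded length function, in declaration order.
LENGTH_ALTERNATIVES = [
    ("length_of_list", ("List",)),
    ("length_of_text", ("Text",)),
]


def pick_alternative(alternatives, arg_types):
    """Return the name of the first alternative whose parameters match, else None."""
    for name, param_types in alternatives:
        # Exact equality is a stand-in for the resolution steps listed above.
        if tuple(arg_types) == param_types:
            return name
    return None


print(pick_alternative(LENGTH_ALTERNATIVES, ["Text"]))     # length_of_text
print(pick_alternative(LENGTH_ALTERNATIVES, ["Integer"]))  # None
```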

https://gitlab.catalyst.net.nz/riskscape/riskscape/issues/71

Realizable functions

A realizable function is one that can adapt to the list of argument types it is given to calculate a return type.

The argument types a realizable function advertises are not really used - they exist for documentation reasons only. It is up to each function to attempt to adapt itself to the given input types and return an implementation of the function that best fits the inputs. Once this is done, the function is then matched using the steps above.
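
One way to picture a realizable function is as a factory that inspects the argument types and hands back a return type together with a matching implementation. The Python sketch below assumes a hypothetical function named largest that works on numbers; it is not a real RiskScape function and the mechanism is simplified.

```python
def realize_largest(arg_types):
    """Adapt a hypothetical 'largest' function to the given argument types.

    Returns (return_type, implementation), or None if it cannot adapt."""
    if not arg_types or any(t not in ("Integer", "Floating") for t in arg_types):
        return None
    if all(t == "Integer" for t in arg_types):
        # All integers in, integer out.
        return "Integer", lambda *values: max(values)
    # Any floating argument makes the result floating.
    return "Floating", lambda *values: float(max(values))


return_type, largest = realize_largest(["Integer", "Floating"])
print(return_type, largest(1, 2.5))   # Floating 2.5
print(realize_largest(["Text"]))      # None
```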


See Types for a more detailed explanation of RiskScape’s Type system.

Advanced

A function call is a type of RiskScape expression. However, you can also turn a RiskScape expression into your very own function. This is somewhat like an expression ‘macro’ - you can take a verbose expression that may be repeated a lot, and turn it into a function that’s easier to call from your pipeline code. See Expression language functions for more details.