.. _expressions:

# RiskScape expression language

RiskScape uses an expression language to allow models to be customized, for example for
filtering or aggregating datasets.  The expression language is similar to a spreadsheet
formula or to the clauses that make up an SQL query.

An expression can:

* filter a dataset - `building.height_m > 10`
* compute new values - `round(hazard_intensity * 0.24 * road.replacement_cost)`
* apply a risk modelling function - `damage_ratio(building, tsunami)`
* be used to group values for aggregation - `region, risk`
* be used to apply an aggregation function - `sum(loss)`

You can play with expressions using the `riskscape expression` command.  For example,
`riskscape expression eval 'ceil(1 + 0.7)'` will print 2 to the console.

.. tip::
    This page describes the syntax and semantics of the RiskScape expression language.
    You may prefer to go through the :ref:`expression_tutorial` tutorial first,
    to learn about expressions through practical examples.

## Language definition

### Constants

The language supports declaring various simple constants, such as:

* Integer - `456` - mapped internally to Java's Long type - a 64-bit signed integer
* Floating - `0.212321` - mapped internally to Java's Double type - a 64-bit
  floating point number. Can be entered with scientific notation, e.g. `5.27e-10`.
* Text - `'this is some text'` - an arbitrary length string of text, surrounded by
  single quotes.  Single quotes can be inserted into a string by escaping them with a backslash,
  e.g. `'I don\'t like quotes'`
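
As a rough illustration of how the three literal forms differ, the following Python sketch classifies a constant by its shape. The regular expressions and the `classify_constant` helper are assumptions made for this example; they are not RiskScape's actual grammar and may accept or reject inputs that RiskScape treats differently.

```python
import re

# Hypothetical patterns, written for illustration only.
INTEGER = re.compile(r"\d+")
FLOATING = re.compile(r"\d+\.\d+(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+")
# Single-quoted text; a backslash escapes the character that follows it.
TEXT = re.compile(r"'(?:[^'\\]|\\.)*'")


def classify_constant(source):
    """Return the name of the constant type that `source` looks like."""
    if INTEGER.fullmatch(source):
        return "Integer"
    if FLOATING.fullmatch(source):
        return "Floating"
    if TEXT.fullmatch(source):
        return "Text"
    raise ValueError(f"not a simple constant: {source}")


print(classify_constant("456"))                      # Integer
print(classify_constant("5.27e-10"))                 # Floating
print(classify_constant(r"'I don\'t like quotes'"))  # Text
```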

### Identifiers

Identifiers are a special kind of string that are used to identify
various objects in an expression.  An identifier is any unquoted word that begins with a
letter and contains only letters, numbers and underscores (`_`).

Identifiers with other characters, or ones that match keywords (such as `and`, `or` or `as`)
are valid, but must be quoted with double quotes, e.g.
`"My interesting thing (from space)"` and `"foo:bar"` are valid identifiers.
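
The quoting rule can be sketched in Python as follows. The helper name `quote_if_needed` is hypothetical, the keyword set contains only the three keywords named above (the real list may be longer), and "letter" is assumed to mean an ASCII letter. Identifiers that themselves contain a double quote are not handled.

```python
import re

# Keywords named in the text above; the real list may well be longer.
KEYWORDS = {"and", "or", "as"}
# Assumption: a plain identifier is an ASCII letter followed by letters, digits or underscores.
PLAIN_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def quote_if_needed(name):
    """Wrap `name` in double quotes unless it is a plain, non-keyword word."""
    if PLAIN_IDENTIFIER.fullmatch(name) and name not in KEYWORDS:
        return name
    return f'"{name}"'


print(quote_if_needed("cost"))     # cost
print(quote_if_needed("foo:bar"))  # "foo:bar"
print(quote_if_needed("as"))       # "as"
```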

Depending on how and where identifiers are used, they can:

* Identify an attribute on the input data, e.g. `asset.cost` - `asset` and `cost` are
  both identifiers.  In this example the dot operator `.` is used to access the nested
  attribute `cost` that belongs to `asset`
* Identify a function -  `my_function()`
* Identify a named argument within a function - `calc_risk(x: 12, mean: 52)` where `x` and `mean`
  are identifiers

### Lists

A fixed length list can be declared to create an ordered list of values.  Some functions
use lists as input, or you may want to produce a list in your output.  A list is declared
in an expression like so - `[1, 1, 2, 3, 5, 8, 13]`.  Elements are surrounded by square brackets
and separated by commas.  Whitespace between the elements is not necessary,
but improves readability.

A list can be filled with anything you like, but the type of the
list will depend on what you fill it with.  The type of thing inside the list is referred
to as the contained type.

 * `[1.0, 1.1, 2.1]` has type `List(Floating)`
 * `[1, 2.0, 'hello', cost, my_function(1, impact)]` has type `List(Anything)`

NB: If any or all elements of the list are nullable, then the contained type will be nullable.
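
A minimal Python sketch of this rule is shown below. It assumes that a list whose elements all share one type keeps that type and that any mixture falls back to `Anything`. The `Nullable(...)` notation, the `contained_type` helper and the treatment of an empty list are assumptions for illustration; RiskScape may apply further rules (for example when mixing number types) that are not modelled here.

```python
def contained_type(element_types):
    """Work out a list's type from the types of its elements.

    Each element type is a (name, nullable) pair, e.g. ("Integer", False).
    """
    names = {name for name, _ in element_types}
    nullable = any(is_nullable for _, is_nullable in element_types)
    # One shared type is kept; anything else falls back to Anything.
    name = names.pop() if len(names) == 1 else "Anything"
    return f"List(Nullable({name}))" if nullable else f"List({name})"


print(contained_type([("Floating", False)] * 3))
# List(Floating)
print(contained_type([("Integer", False), ("Floating", False), ("Text", False)]))
# List(Anything)
print(contained_type([("Integer", False), ("Integer", True)]))
# List(Nullable(Integer))
```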

### Tuples

A tuple in RiskScape is an ordered, named, typed list of values.  You can think of a tuple as a row
in a database table.  Members of tuples in RiskScape are accessed using identifiers, e.g. `cost`.
It is worth noting that all expressions are evaluated against a tuple, with the tuple being a particular row
in the dataset that is being evaluated at a particular point.  For example:

* When setting up a bookmark, `map-attribute` expressions are evaluated against "raw" rows of the dataset
  being bookmarked

* When filtering rows, the filter expression is evaluated against whatever data is in the pipeline
  at that point, e.g. `(asset.cost_dollars > 100000) and (loss.total_loss > 0)`

* As well as being evaluated by an expression, a tuple can be declared by an expression.
  Tuple expressions can use `keyword` or `as` syntax e.g.

  * `{height: asset.height, count: 0, cost_dollars: asset.cost_cents / 100}`
  * `{asset.height as height, 0 as count, asset.cost_cents / 100 as cost_dollars}`
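
For readers more familiar with general-purpose languages, the tuple declaration above is loosely analogous to building an ordered mapping from an input row. The Python sketch below uses made-up sample values and only mirrors the shape of the expression; RiskScape's typing and division behaviour are not modelled.

```python
# A made-up input row: the tuple that the expression is evaluated against.
row = {"asset": {"height": 12.5, "cost_cents": 250000}}

# Rough analogue of {height: asset.height, count: 0, cost_dollars: asset.cost_cents / 100}
declared = {
    "height": row["asset"]["height"],
    "count": 0,
    "cost_dollars": row["asset"]["cost_cents"] / 100,
}

# Members are named and keep their declared order.
print(list(declared))            # ['height', 'count', 'cost_dollars']
print(declared["cost_dollars"])  # 2500.0
```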

### Functions

As you have seen in the previous examples, expressions can call RiskScape functions by referring to their ID within your
project.  RiskScape comes by default with some common functions for dealing with numbers, text and geometry.

You can query the built-in functions available using the `riskscape function list` command.
To see all functions, use `riskscape function list --all`.
To view a particular category of function, such as all `maths` functions, use `riskscape function list --category maths`.
For more help, see `riskscape function list --help`.

An expression can call a function by giving its ID as an identifier, followed by a bracketed list of arguments, e.g.  a
function that:

* Takes some arguments - `min(1, 2)`
* Takes no arguments - `rand()`
* Takes named arguments - `norm_cdf(mean: 1.2, stddev: 0.2, x: hazard.intensity)`
* Takes the result of another function as an argument - `min(1, round(damage_ratio * 10))`

Any user-defined functions can be used in your expressions just like the built-in functions.  See [functions](functions.html)
for more information on user-defined RiskScape functions.

### Operators

Operators are things like `+`, `-` and `<`.  They represent some abstract mathematical operation that depends on the
type of the things being operated on.  RiskScape comes with some default operators and rules for applying them to
expressions, but note that these behaviours can be affected by 3rd party plugins.  RiskScape only supports binary
operators, that is, operators that apply to two inputs, for example `1 + 2` where `1` and `2` are the inputs and `+` is
the operator.

By default, all operators are supported for the number types (Floating and Integer).  If these types are mixed,
(e.g. `1 + 2.3`) then the integer is converted to a floating number. If either operand is of nullable type, then the
result of the expression is also nullable. If either operand is null then the expression will return null.
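
The promotion and null rules can be sketched in Python for the `+` operator. This is an illustration only: the `add` helper is hypothetical, Python's `None` stands in for null, and Python integers are unbounded rather than 64-bit.

```python
def add(left, right):
    """Add two numbers using the promotion and null rules described above."""
    # A null operand makes the whole expression null.
    if left is None or right is None:
        return None
    # Mixed Integer/Floating operands: the integer is converted to floating.
    if isinstance(left, float) or isinstance(right, float):
        return float(left) + float(right)
    return left + right


print(add(1, 2))     # 3
print(add(1, 2.5))   # 3.5
print(add(1, None))  # None
```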

If an operation is not supported for the given input types, then the expression is not valid and cannot be
evaluated, e.g. `1 + [1]` would give an error like `Could not find an operator function for operation 'PLUS' for types
'[Integer, List[Integer]]'`.

Technically an operator is actually a function with a more convenient expression format. The
[Function Resolution](#functionresolution) rules all apply to operators.


### Operator precedence

RiskScape applies mathematical operators with the following precedence:

* bracketed expressions
* exponentiation
* division
* multiplication
* addition/subtraction
* numeric comparisons (`<`, `>`, ...)
* boolean operators (`and`/`or`)

For example, the following expressions are equivalent (both result in 16.0):

* `3 * 10 / 5 + 10`
* `(3 * (10 / 5)) + 10`
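
To make the grouping explicit, the Python sketch below implements a small precedence-climbing parser that adds the implied brackets to an arithmetic expression. It covers only `+`, `-`, `*` and `/`, takes the ordering from the list above (division binds tighter than multiplication), and assumes operators of equal precedence group left to right. It is not RiskScape's parser.

```python
import re

# Higher numbers bind tighter; the ordering follows the precedence list above.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 3}


def bracket(text):
    """Return `text` with every binary operation wrapped in brackets."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*/()]", text)
    pos = 0

    def parse_primary():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            inner = parse_expression(1)
            pos += 1  # skip the closing ")"
            return inner
        return token

    def parse_expression(min_precedence):
        nonlocal pos
        left = parse_primary()
        while (pos < len(tokens)
               and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_precedence):
            operator = tokens[pos]
            pos += 1
            # Only tighter-binding operators may join the right-hand side.
            right = parse_expression(PRECEDENCE[operator] + 1)
            left = f"({left} {operator} {right})"
        return left

    return parse_expression(1)


print(bracket("3 * 10 / 5 + 10"))  # ((3 * (10 / 5)) + 10)
```

The division is grouped first, then the multiplication, and the addition is applied last.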

## Interaction with the type system

The RiskScape expression language is strongly typed with type inference.  That is, each bit of data flowing through your
model is associated with a type, and functions and operators will only evaluate if the given arguments can be made to
match the types supported by the function or operator.  Type inference means that, most of the time, you do not need to
state the types of things in your expressions; they are calculated dynamically.

### Type inference and realization

When an expression is declared by a user, it is not yet 'realized' with any type information.  Only once the expression
is 'realized' with an input type does type inference happen, and only then can RiskScape check whether the expression
can be evaluated.  This realization typically happens when a model is validated, right before execution starts.

As an example, consider the following expression:

```none
1 + (asset.cost)
```

While we know 1 is an integer, when RiskScape sees this expression, it doesn't know the type of `asset.cost`, nor does
it know if those attributes exist in the dataset we are ultimately going to be evaluating this expression against.  When
this expression is brought together into a model with input data and realized, we infer the type of the expression by
"filling in the gaps" - in the example given this means looking up `asset.cost` to see if it exists in the input type,
and then using its type to determine whether plus is supported - more on this in the next sections.

If `asset.cost` in our example is an integer, then the expression with its inferred types would look like:

```none
(1):Integer + (asset.cost):Integer
```

An operator exists for adding two integers, and so realization succeeds.
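
The two lookups involved in realizing this example can be sketched in Python. The input type, the operator table and the `realize_plus` helper are all hypothetical stand-ins chosen for illustration; they do not reflect RiskScape's internal data structures.

```python
# Hypothetical input type: attribute paths mapped to type names.
INPUT_TYPE = {"asset.cost": "Integer", "asset.name": "Text"}

# Hypothetical operator table: (operator, left type, right type) -> result type.
OPERATORS = {
    ("+", "Integer", "Integer"): "Integer",
    ("+", "Floating", "Floating"): "Floating",
}


def realize_plus(constant_type, attribute, input_type):
    """Infer the result type of `<constant> + <attribute>`, or raise if invalid."""
    # Step 1: does the attribute exist in the input type?
    if attribute not in input_type:
        raise LookupError(f"no attribute {attribute} in the input type")
    attribute_type = input_type[attribute]
    # Step 2: is there a plus operator for these two types?
    key = ("+", constant_type, attribute_type)
    if key not in OPERATORS:
        raise TypeError(f"no '+' operator for {constant_type} and {attribute_type}")
    return OPERATORS[key]


print(realize_plus("Integer", "asset.cost", INPUT_TYPE))  # Integer
```

In this sketch, realizing the same expression against `asset.name` raises a `TypeError`, and an unknown attribute raises a `LookupError`, mirroring the two ways realization can fail.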

.. _functionresolution:

### Function resolution

When a function call is realized in an expression, RiskScape does the following things to resolve the
function against the expected argument types:

* First, a function is looked up from your project using the identifier, e.g. `my_risk_function(asset)`
  will look for a function with the ID `my_risk_function`.  If none exists, realization will fail.
* If the arguments to the function match exactly, then the function matches.
* If the types that the function takes are `broader` than the types of the given arguments (also known as covariance),
  then the function still matches.  For example, `Anything` is broader than `Text`, and `Text` is broader than
  `WithinSet(Text, 'cat', 'dog', 'pig')`. So a function that takes type `Anything` will accept `'cat'` as
  an argument.
* If the function requires a `Floating` argument and an `Integer` is provided, then the function matches
  and the `Integer` argument is converted to `Floating`.
* If there are missing arguments but they are optional (they are nullable), then the function matches.
* If any of the given arguments are Nullable, but the function does not accept nullable arguments, then
  the function matches, but the return type is adjusted to be nullable. If the function is then called
  with one of those arguments missing (null), the function is not applied and nothing is returned.
* If none of these apply, the function does not match and won't be evaluated.
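
The matching steps above can be approximated in Python for a single function signature. This is a sketch under stated assumptions: types are plain strings, the `BROADER` table holds a hand-written "broader than" relation, `WithinSet(Text)` abbreviates the set type from the example, `Nullable(...)` is illustrative notation, and the `accepts` and `resolve` helpers are hypothetical. The lookup by function ID (the first step) is left out.

```python
# Hypothetical "broader than" relation: each type maps to the next broader type.
BROADER = {
    "WithinSet(Text)": "Text",
    "Text": "Anything",
    "Integer": "Anything",
    "Floating": "Anything",
}


def accepts(param_type, given_type):
    """True if a parameter of `param_type` accepts an argument of `given_type`."""
    if param_type == "Floating" and given_type == "Integer":
        return True  # the Integer argument would be converted to Floating
    while given_type is not None:
        if given_type == param_type:
            return True
        given_type = BROADER.get(given_type)
    return False


def resolve(params, return_type, given):
    """Match one signature against the given argument types.

    `params` and `given` are lists of (type, nullable) pairs.  Returns the
    (possibly adjusted) return type, or None if the function does not match.
    """
    if len(given) > len(params):
        return None
    result_nullable = False
    for index, (param_type, param_nullable) in enumerate(params):
        if index >= len(given):
            if not param_nullable:
                return None  # a required argument is missing
            continue  # a missing optional (nullable) argument is fine
        given_type, given_nullable = given[index]
        if not accepts(param_type, given_type):
            return None
        # Nullable argument, non-nullable parameter: still a match,
        # but the return type becomes nullable.
        if given_nullable and not param_nullable:
            result_nullable = True
    return f"Nullable({return_type})" if result_nullable else return_type


print(resolve([("Anything", False)], "Text", [("WithinSet(Text)", False)]))  # Text
print(resolve([("Floating", False)], "Floating", [("Floating", True)]))      # Nullable(Floating)
print(resolve([("Integer", False)], "Integer", [("Text", False)]))           # None
```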

#### Overloaded functions

Some RiskScape functions are said to be "overloaded" - this means there are multiple versions of the same function
that each accept a different set of types.

For example, a `length` function might be overloaded if it works with
both `List` and `Text` types.  Overloaded functions follow the same function resolution steps, except that
each alternative is checked against the resolution steps listed above in order until one matches.
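
A minimal Python sketch of trying each alternative in order is shown below. The two `length` alternatives, their `Integer` return type and the helper names are assumptions for illustration, and `matches` is a deliberately simplified stand-in for the full resolution steps (it only checks for an exact match).

```python
# Hypothetical alternatives for an overloaded `length` function:
# (parameter types, return type), checked in declaration order.
LENGTH_ALTERNATIVES = [
    (["List"], "Integer"),
    (["Text"], "Integer"),
]


def matches(param_types, given_types):
    """Stand-in for the resolution steps; here it only checks for an exact match."""
    return param_types == given_types


def resolve_overloaded(alternatives, given_types):
    """Return the first alternative whose parameters match, or None."""
    for param_types, return_type in alternatives:
        if matches(param_types, given_types):
            return param_types, return_type
    return None


print(resolve_overloaded(LENGTH_ALTERNATIVES, ["Text"]))     # (['Text'], 'Integer')
print(resolve_overloaded(LENGTH_ALTERNATIVES, ["Integer"]))  # None
```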

https://gitlab.catalyst.net.nz/riskscape/riskscape/issues/71

#### Realizable functions

A realizable function is one that can adapt to the list of argument types it is given to calculate a return type.

The argument types a realizable function advertises are not really used - they exist for documentation reasons
only.  It is up to each function to attempt to adapt itself to the given input types and return an implementation of
the function that best fits the inputs.  Once this is done, the function is then matched using the steps above.
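
As an illustration, the Python sketch below models a made-up realizable function that returns the first element of a list. Given the argument types of a call, it calculates a concrete signature whose return type depends on the list's contained type. The function, its name and the `(kind, contained type)` representation are all hypothetical.

```python
def realize_first_element(argument_types):
    """Adapt a hypothetical "first element of a list" function to a call.

    `argument_types` is a list of (kind, contained type) pairs.  Returns the
    concrete (parameter types, return type) for this call, or None if the
    function cannot adapt to the given arguments.
    """
    if len(argument_types) != 1:
        return None
    kind, contained = argument_types[0]
    if kind != "List":
        return None
    # The return type is calculated from the contained type of the list argument.
    return [("List", contained)], contained


print(realize_first_element([("List", "Floating")]))
# ([('List', 'Floating')], 'Floating')
print(realize_first_element([("Text", None)]))
# None
```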

See :ref:`types` for a more detailed explanation of RiskScape's Type system.