.. _glossary:
# Glossary

This glossary is organized into sections, so that you can see how a specific term fits into the bigger picture of how RiskScape operates.
In particular, the sections cover:

- Terms used to describe the model workflow in RiskScape.
- RiskScape-specific terms that describe more generic functionality, such as what a RiskScape function actually is, or how RiskScape is configured.
- How general risk-modelling terms map to the terms used in RiskScape.
- Generic computing and GIS terms that might be helpful for understanding the domain in which RiskScape operates.

## RiskScape terms

Risk assessment is '_an assessment of the nature and extent of risk by analysing potential hazards and evaluating existing conditions of
exposure and vulnerability to determine likely consequences_'
([NEMA definition](https://www.civildefence.govt.nz/assets/Uploads/publications/National-Disaster-Resilience-Strategy/National-Disaster-Resilience-Strategy-10-April-2019.pdf)).

RiskScape is a software application designed to model these likely consequences.

The software workflow is based around: _Exposure_ + _Hazard_ = _Consequence_. We define these terms as:

.. glossary::

Consequence
  The outcome of hazards interacting with exposed elements at risk.
  The consequence will vary depending on the intensity of the hazard and the vulnerability of the element impacted.
  For example, consequences might be building damage or monetary loss.
  The consequence is derived by applying a vulnerability or fragility function (called the _Consequence Function_ in RiskScape)
  that is defined by the modeller.

[Exposure](https://www.undrr.org/terminology/exposure)
  The elements at risk. These may be people, infrastructure, buildings, land use or other tangible or intangible
  assets located in hazard-prone areas. Measures of exposure can include the number of people or types of assets
  in an area and their attributes (e.g. construction types, replacement values).

[Hazard](https://www.undrr.org/terminology/hazard)
  A process, phenomenon or human activity that may cause loss of life, injury or other health impacts,
  property damage, social and economic disruption or environmental degradation.
  A hazard affects an _Exposure_ to produce a _Consequence_.

.. _model_workflow:

## RiskScape model workflow

The overall RiskScape modelling workflow can be conceptualized as the following processing phases
(each phase is described in further detail below):

.. mermaid::

    graph TD
      RD["Input Risk Data<br/><br/>The layers used by the model<br/>(exposure, hazard, etc)"];
      RD --> PRE["Geoprocessing<br /><br/>Apply geometry operations<br/> (e.g. cutting) to the exposure<br/>layer elements before any<br/>risk analysis takes place"];
      PRE --> SS["Spatial Sampling<br/><br/>Match the exposure-layer<br/>elements against data values<br/>(e.g. hazard intensity) in the<br/>other layer(s)"];
      SS --> CI["Consequence Analysis<br/><br/>Apply a function to each<br/>exposure-layer element and<br/>hazard intensity to derive<br/>a consequence"];
      CI --> POST["Post-Processing<br/><br/>For example, calculate the<br/>AEP for probabilistic models<br/>or produce loss curves"];
      POST --> REP["Reporting<br/><br/>Collate and summarize the<br/>results, select which attributes<br/> (columns) to include, and <br/>then save to file"];

The term we use for this overall process is:

Model Pipeline
  A data processing pipeline, built around a pre-defined but customizable framework, that
  represents the overall processing workflow in RiskScape.
  Executing (or 'running') the pipeline performs risk modelling analysis of the input risk data.

The following sections define the phases in the model pipeline, along with any associated terminology, in more detail.

### Input risk data

This phase loads the input data files provided by the user into RiskScape.
These input data layers are generally specified as RiskScape _Bookmarks_.
(Imagine your file system is a book: your 'bookmarks' tell RiskScape what to use and how to use it.)

Area Layer
  An optional geographical dataset representing areas of interest.
  The area information gets associated with _Exposure layer_ elements for use in the _Model Pipeline_ or for reporting _Consequence_ information.
  Example area layers might be suburb or district boundaries or census areas.

Attribute
  If you think of your input data as a table in a spreadsheet, each column represents an attribute.
  Attributes give the input data its structure.
  They have a type associated with them, such as string, integer, or geometry.

Exposure layer
  A geographical dataset representing the elements at risk being modelled and their attributes.
  This may be people, infrastructure, buildings, land use and other tangible or intangible human assets.

Hazard layer
  A geographical dataset representing a _Hazard_ footprint of varying intensity.

Resource layer
  An optional geographical dataset representing information that is not represented as either a _Hazard_ or _Exposure_.
  For example, supplementary soil type information might be supplied to the _Consequence Function_ via a separate _Resource layer_.

Tuple
  If you think of your input data as a table in a spreadsheet, each row represents a tuple.
  A tuple is an ordered, named, typed list of values.
  RiskScape goes through the _Exposure layer_ input data and processes it one tuple at a time.
  The tuple is then transformed as it moves through the data processing pipeline - attributes from
  other input layers are added, values may be manipulated, and so on.

### Geoprocessing

Optional processing that uses _Geoprocessing Functions_ to transform the input data geometry in some way.
This processing essentially produces a new _Exposure layer_, and is applied before any risk analysis takes place.
For example, you could take roading data and cut the roads into 10-metre pieces before applying your risk analysis.

Geoprocessing Functions
  _Built-in functions_ that perform common geometry processing operations.
  For example, the `segment` function cuts a line or polygon into smaller pieces,
  whereas `buffer` expands a point, line or polygon by a defined distance.
  For more details on the functions available, refer to :ref:`geoprocessing_functions`.
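To illustrate the kind of operation `segment` performs on a line, the sketch below cuts a polyline into consecutive pieces of at most a given length. It is a simplified, hypothetical `segment_line` helper written in plain Python, not RiskScape's implementation: it assumes projected (planar) coordinates in metres and handles lines only, not polygons.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def segment_line(coords: List[Point], length: float) -> List[List[Point]]:
    """Cut a polyline into consecutive pieces no longer than `length`.

    Hypothetical helper. Assumes planar coordinates (e.g. metres), so that
    Euclidean distance is meaningful.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    pieces: List[List[Point]] = []
    current: List[Point] = [coords[0]]
    remaining = length  # distance left before the current piece is full
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        travelled = 0.0  # distance already consumed along this segment
        while seg_len - travelled >= remaining:
            travelled += remaining
            t = travelled / seg_len
            cut = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            current.append(cut)
            pieces.append(current)  # the current piece is now full
            current = [cut]
            remaining = length
        remaining -= seg_len - travelled
        current.append((x1, y1))
    # Keep the leftover tail, unless the last cut landed exactly on the end.
    if len(current) > 1 and remaining < length:
        pieces.append(current)
    return pieces
```

For example, `segment_line([(0, 0), (25, 0)], 10)` returns three pieces that are 10, 10 and 5 metres long, much like cutting a 25-metre road into 10-metre pieces.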

### Spatial Sampling

The process of determining the intensity or intensities (if any) of the _Hazard_ for each element at risk in the _Exposure layer_,
based on its geospatial location.

[Coverage](https://en.wikipedia.org/wiki/Coverage_data)
  A coverage is part of the inner workings of how RiskScape matches an element at risk against another geospatial layer.
  RiskScape takes an underlying data source, such as the _Hazard layer_, and turns it into an index
  that it can look up by any given geometry.
  Because the indexed layer could be _Raster_, _Vector_, or some other kind of geospatial data source,
  the underlying data may be stored in a number of different ways.
  Turning the layer into a coverage gives RiskScape a common way of accessing any spatial data.

Hazard Intensity Measure
  A characteristic of a hazardous process, phenomenon or human activity.
  This is a value representing the intensity of a _Hazard_ at a specific geospatial location.
  It may be a single value (e.g. 'flood depth') or a composite value (e.g. 'flood depth' _and_ 'flood velocity').

Sampling
  The geospatial process of determining the _Hazard Intensity Measure(s)_ that correspond to the location of a given
  _Exposure layer_ element.
  RiskScape builds what is called a _Coverage_ around the _Hazard layer_, which allows RiskScape to evaluate
  the _Hazard layer_ data against a given point (e.g. the centroid of the exposure feature),
  or against a given geometry (e.g. finding all intersecting geometry between the exposure and the hazard layer).
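The following sketch shows the idea behind a _Coverage_ and _Sampling_ for a simple _Raster_ hazard layer: a grid of hazard values that can be looked up by a point (such as an exposure feature's centroid) or by an area. The `GridCoverage` class and `polygon_centroid` function are hypothetical plain-Python illustrations that assume a north-up grid in the same CRS as the exposure data. RiskScape's own coverages are more general than this, and here a bounding box stands in for true geometry intersection.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


class GridCoverage:
    """A minimal raster coverage: hazard values that can be looked up by location.

    Hypothetical. Assumes a north-up grid whose top-left corner is
    (origin_x, origin_y), with square cells `cell_size` map units wide.
    None means "no data".
    """

    def __init__(self, values: List[List[Optional[float]]],
                 origin_x: float, origin_y: float, cell_size: float):
        self.values = values
        self.origin_x = origin_x
        self.origin_y = origin_y
        self.cell_size = cell_size

    def _cell(self, x: float, y: float) -> Tuple[int, int]:
        # Columns increase eastwards; rows increase southwards from the top edge.
        col = math.floor((x - self.origin_x) / self.cell_size)
        row = math.floor((self.origin_y - y) / self.cell_size)
        return row, col

    def sample_point(self, x: float, y: float) -> Optional[float]:
        """Value of the cell containing (x, y), or None if outside or no data."""
        row, col = self._cell(x, y)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[row]):
            return self.values[row][col]
        return None

    def sample_bbox(self, min_x: float, min_y: float,
                    max_x: float, max_y: float) -> List[float]:
        """All non-empty values from the cells that a bounding box overlaps."""
        top, left = self._cell(min_x, max_y)
        bottom, right = self._cell(max_x, min_y)
        found = []
        for row in range(max(top, 0), min(bottom, len(self.values) - 1) + 1):
            for col in range(max(left, 0), min(right, len(self.values[row]) - 1) + 1):
                value = self.values[row][col]
                if value is not None:
                    found.append(value)
        return found


def polygon_centroid(vertices: Sequence[Point]) -> Point:
    """Centroid of a simple polygon (shoelace formula); first vertex not repeated."""
    area2 = cx = cy = 0.0
    for i, (x0, y0) in enumerate(vertices):
        x1, y1 = vertices[(i + 1) % len(vertices)]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


# Illustrative 2 x 2 flood-depth grid (metres) with 10 m cells.
flood = GridCoverage([[0.5, 1.2], [None, 2.0]], origin_x=100.0, origin_y=200.0, cell_size=10.0)
building = [(102.0, 192.0), (108.0, 192.0), (108.0, 198.0), (102.0, 198.0)]
depth_at_centroid = flood.sample_point(*polygon_centroid(building))
depths_in_area = flood.sample_bbox(104.0, 186.0, 116.0, 198.0)
```

Here the building's centroid falls in the top-left cell, so `depth_at_centroid` is 0.5, while sampling by area returns every non-empty value the box overlaps.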

### Consequence Analysis

To undertake consequence analysis, the user specifies the _Consequence Function_ to apply for their analysis.
The _Consequence Function_ is applied to each _Exposure layer_ element, along with any
corresponding _Hazard Intensity Measure_ and _Resource layer_ data. This produces a _Consequence layer_.

Consequence Function
  A _Function_ that estimates consequences (such as severity of damage or monetary loss) based on the magnitude
  of the hazard that the element is exposed to, and the element's capacity (or lack of capacity) to resist impact.
  Vulnerability functions, damage functions, fragility curves and thresholds are all examples of consequence functions.
  The consequence function is generally user-defined and will be specific to what is being modelled.
  This function may use other _Maths functions_ to calculate the consequence.

Consequence Layer
  A combined layer that contains attributes of the _Exposure_, _Hazard_, _Area_, and _Resource layers_,
  as well as the resulting _Consequences_.

Maths Functions
  _Built-in functions_ that perform common statistical operations.
  For more details on the available functions, refer to :ref:`builtin_maths_functions`.
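As an illustration of a _Consequence Function_, the sketch below implements a simple depth-damage (vulnerability) curve in plain Python. It linearly interpolates a damage ratio from flood depth and multiplies it by a building's replacement cost. The curve points, the `damage_ratio` and `flood_consequence` names, and the `replacement_cost` attribute are all hypothetical. Real curves are specific to what is being modelled, and in RiskScape such a function would be a _User-defined Function_ rather than this standalone Python.

```python
from bisect import bisect_right

# Made-up depth-damage curve: flood depth (m) -> damage ratio (0 = none, 1 = total loss).
DEPTHS = [0.0, 0.5, 1.0, 2.0, 3.0]
RATIOS = [0.0, 0.15, 0.35, 0.7, 1.0]


def damage_ratio(depth_m):
    """Linearly interpolate the damage ratio for a flood depth in metres."""
    if depth_m is None or depth_m <= DEPTHS[0]:
        return 0.0  # not exposed, or no water at this location
    if depth_m >= DEPTHS[-1]:
        return RATIOS[-1]  # cap at the last point on the curve
    i = bisect_right(DEPTHS, depth_m)  # DEPTHS[i - 1] <= depth_m < DEPTHS[i]
    d0, d1 = DEPTHS[i - 1], DEPTHS[i]
    r0, r1 = RATIOS[i - 1], RATIOS[i]
    return r0 + (r1 - r0) * (depth_m - d0) / (d1 - d0)


def flood_consequence(building, depth_m):
    """Combine an exposure element and its hazard intensity into a consequence."""
    ratio = damage_ratio(depth_m)
    return {"damage_ratio": ratio, "loss": ratio * building["replacement_cost"]}
```

For example, `flood_consequence({"replacement_cost": 200000}, 1.5)` gives a damage ratio of about 0.525 and a loss of about 105,000.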

### Post-processing

Depending on what is being modelled, RiskScape may perform additional processing on the _Consequence layer_.
For example, when the _Hazard_ is a probabilistic dataset, RiskScape may also produce an Annual Exceedance Probability (AEP) table.
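One common way to build such a table is to sort event losses from largest to smallest, accumulate the annual occurrence rates of all events with at least that loss, and convert each cumulative rate λ to a probability with AEP = 1 - e^(-λ). That conversion assumes events occur independently as Poisson processes. The `aep_table` helper below is a plain-Python sketch of this approach with made-up event rates and losses; it is not necessarily how RiskScape computes its AEP tables.

```python
import math


def aep_table(events):
    """Build (loss, AEP) rows from (annual_rate, loss) pairs, largest loss first.

    Assumes each event occurs independently as a Poisson process, so the
    probability of at least one exceeding event in a year is 1 - exp(-rate).
    """
    ordered = sorted(events, key=lambda event: event[1], reverse=True)
    rows = []
    cumulative_rate = 0.0
    for i, (rate, loss) in enumerate(ordered):
        cumulative_rate += rate  # total rate of events with a loss >= this loss
        # Emit one row per distinct loss, once every tied event has been counted.
        if i + 1 < len(ordered) and ordered[i + 1][1] == loss:
            continue
        rows.append((loss, 1.0 - math.exp(-cumulative_rate)))
    return rows


# Illustrative events: (annual rate of occurrence, modelled loss in dollars).
events = [(0.01, 5_000_000), (0.1, 1_000_000), (0.02, 3_000_000)]
table = aep_table(events)
```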

### Reporting

The results in the _Consequence layer_ can be filtered, aggregated, or sorted before the
_Output Risk Data_ file(s) are saved.

Aggregation
  RiskScape allows the user to collate and summarize the results of the _Consequence Analysis_.
  Examples of aggregation might be total loss by suburb (_Area layer_), or maximum damage for different construction types.
  The aggregation process groups together rows of data (_Tuples_) in the _Consequence layer_ by a set of attributes or by a _RiskScape Expression_.
  An _Aggregate Function_, such as `sum` or `max`, is then applied across the grouped rows.
  If you have used SQL before, this is similar to the `GROUP BY` clause.

Aggregate Function
  A special type of RiskScape _Function_ that can be applied across _Tuples_ (rows).
  Aggregate functions produce a summary result for a given attribute.
  Examples of aggregate functions are `sum` or `count`.
  Refer to :ref:`aggregate_functions` for a full list.

Filter
  A filter can remove unwanted _Tuples_ (rows) from the _Consequence layer_ data.
  Only the tuples for which a given boolean _RiskScape Expression_ evaluates to true are retained.

Output Risk Data
  The final data that gets saved to your file system.
  This is the _Consequence layer_ data after any required filtering, sorting, and aggregation operations have been applied.
  The output risk data may be saved as a CSV file or in Shapefile format, depending on the data it contains and how the
  user wishes to view it (CSV loads easily into spreadsheet applications, whereas Shapefiles load easily into other GIS applications).

Sort
  Orders the data in the _Consequence layer_ based on a given set of attributes or by a _RiskScape Expression_.

Select
  Manipulates the _Attributes_ (columns) in the _Consequence layer_ data.
  This can remove unwanted attributes from the final output, modify or rename existing attributes, or
  even add new attributes to the results.
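The reporting operations above can be sketched in plain Python over a list of consequence-layer rows, with each tuple represented as a dictionary. The attribute names and values are hypothetical, and the code mirrors the concepts only: in RiskScape these operations are expressed as _Pipeline Steps_ using _RiskScape Expressions_, not Python.

```python
from collections import defaultdict

# Illustrative consequence-layer rows (tuples); attribute names and values are made up.
consequences = [
    {"id": 1, "suburb": "Suburb A", "construction": "timber", "loss": 120000.0},
    {"id": 2, "suburb": "Suburb A", "construction": "brick", "loss": 0.0},
    {"id": 3, "suburb": "Suburb B", "construction": "timber", "loss": 45000.0},
    {"id": 4, "suburb": "Suburb B", "construction": "concrete", "loss": 300000.0},
]

# Filter: keep only the tuples for which the boolean condition is true.
damaged = [row for row in consequences if row["loss"] > 0]

# Aggregation: group by suburb, then apply aggregate functions (sum, max, count)
# across each group, similar to SQL's GROUP BY.
groups = defaultdict(list)
for row in damaged:
    groups[row["suburb"]].append(row["loss"])
summary = [
    {"suburb": suburb, "total_loss": sum(losses),
     "max_loss": max(losses), "count": len(losses)}
    for suburb, losses in groups.items()
]

# Sort: order the results by an expression (total loss, largest first).
summary.sort(key=lambda row: row["total_loss"], reverse=True)

# Select: rename, drop, or add attributes in the final output.
output = [
    {"area": row["suburb"], "total_loss": row["total_loss"],
     "mean_loss": row["total_loss"] / row["count"]}
    for row in summary
]
```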

## General RiskScape terms

The following terms describe _generic_ RiskScape functionality that is used to
configure and run _Model Pipelines_.

Bookmark
  Bookmarks tell RiskScape where your input data sources are located and how to use them.
  They can also associate additional _Metadata_ with the data.
  For example, a bookmark can specify the CRS for the underlying data, or describe how the data is structured (i.e. its _Type_).
  Bookmarks allow data sources to be easily used by RiskScape _Model Pipelines_.
  Note that the underlying data source is usually a file, but this is not always the case
  (the data could be hosted by a WFS web service, for example).

Built-in Function
  RiskScape provides built-in functions for common tasks, such as geoprocessing and maths operations.

Function
  A RiskScape function is a self-contained piece of code used to execute a small processing task
  as part of a larger _Pipeline_ or _Model Pipeline_ data processing workflow.
  RiskScape functions are user-facing: users can list all the functions available to them.
  A function can be applied to given data from any _RiskScape Expression_ - in programming terms
  this is referred to as _calling_ the function, and looks similar to using a spreadsheet
  formula, e.g. `square_root(9)`.

Home directory
  An alternative to _Project files_. Instead of putting all the _Project_ information that RiskScape
  should use in a single _INI file_, a home directory spreads this information across multiple INI files.

Project
  A project is a collection of all the information RiskScape needs in order to perform the desired risk analysis.
  This may include:

  - _Types_ that describe how the risk data is structured.
  - _User-defined Functions_ that are used by the risk analysis.
  - _Bookmarks_ for the available input data files.
  - _Models_ that define a specific set of :ref:`model_workflow` operations.

Project file
  A single _INI file_ that contains all the RiskScape _Project_ information.

Pipeline
  A pipeline defines a set of data processing steps, where data (_Tuples_) are passed between the steps
  and transformed along the way. 
  RiskScape pipelines are generic and can technically be used for any type of data processing, but
  are most commonly used as _Model Pipelines_.

Pipeline Step
  An individual data-processing action within a _Pipeline_.
  Steps have a pre-defined action, such as _Sort_, and accept parameters that can customize their behaviour, such as the `by` expression to use.
  A step processes every _Tuple_ that flows through the pipeline.
  The output tuples from one step become the input tuples for the next step.
  Note that each _phase_ in the :ref:`model_workflow` can correspond to one or many pipeline steps.

RiskScape
  A software application for multi-hazard impact and risk analysis.
  The RiskScape application consists of a core _RiskScape Engine_ as well as additional _RiskScape Plugins_.

RiskScape Engine
  The core RiskScape software component that performs _Model Pipeline_ execution. 

RiskScape Plugins
  Java software components that can be incorporated into RiskScape at runtime to extend its modelling functionality.
  For example, plugins can add additional _Functions_ or _Pipeline Step_ actions suitable to a specific modelling domain.
  Plugins make RiskScape extensible, and allow users to customize its behaviour by writing their own Java components
  that 'plug in' to RiskScape.

RiskScape Expression Language
  A custom language used as the 'glue' when building _Model Pipelines_ and _Classifier Format_ functions.
  The RiskScape Expression Language is similar to the formulas that can be entered into spreadsheet cells or used in SQL statements.
  A RiskScape expression is a simple statement that can be used to:

  - perform a basic maths operation, e.g. `(hazard_intensity / 4) * road.replacement_cost`
  - call a _RiskScape Function_, e.g. `log(512, base: 2)`
  - declare a new _Type_, such as a _Struct_: `{ foo: 12.3, bar: 'xyz' }`
  - evaluate a boolean condition, e.g. `building.height_m > 10.0`

Struct
  A set of attributes or information that can represent an entity such as an _Exposure_, _Hazard Intensity Measure_, _Consequence_, or _Resource_.

Type
  RiskScape types define how the underlying data should be interpreted as it passes between _Pipeline Steps_ or _Functions_.
  Types define the attributes or information representing an _Exposure_, _Hazard Intensity Measure_, _Consequence_, or _Resource_.
  Types are used to define what arguments a _Function_ expects.
  RiskScape provides built-in types, such as Integer or Text, that can be used to build up more complex types, such as _Structs_.

User-defined Function
  RiskScape users can define their own _Functions_ using either _Jython_ or _Classifier Format_.
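To make the _Pipeline_ and _Pipeline Step_ ideas concrete, the sketch below chains Python generator functions so that tuples flow through the steps one at a time and each step's output becomes the next step's input. The step names (`add_hazard`, `keep_exposed`, `add_loss`) and the loss rule are hypothetical illustrations, not RiskScape step actions. The dictionary lookup in `add_hazard` stands in for real spatial sampling.

```python
from functools import reduce
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Step = Callable[[Iterable[Row]], Iterator[Row]]


def run_pipeline(source: Iterable[Row], steps: List[Step]) -> Iterator[Row]:
    """Chain the steps: each step's output tuples become the next step's input."""
    return reduce(lambda stream, step: step(stream), steps, iter(source))


def add_hazard(depth_by_id: Dict[int, float]) -> Step:
    """Hypothetical step: attach a hazard value (a stand-in for sampling)."""
    def step(rows):
        for row in rows:
            yield {**row, "depth": depth_by_id.get(row["id"])}
    return step


def keep_exposed(rows):
    """Hypothetical filter step: drop tuples that have no hazard value."""
    for row in rows:
        if row["depth"] is not None:
            yield row


def add_loss(rows):
    """Hypothetical step: made-up loss rule, capped at the full value."""
    for row in rows:
        yield {**row, "loss": row["value"] * min(row["depth"] / 3.0, 1.0)}


buildings = [
    {"id": 1, "value": 300000.0},
    {"id": 2, "value": 500000.0},
    {"id": 3, "value": 200000.0},
]
results = list(run_pipeline(
    buildings, [add_hazard({1: 1.5, 3: 4.0}), keep_exposed, add_loss]))
```

Because each step is a generator, no step holds the whole dataset: a tuple is pulled through every step before the next tuple is read, which mirrors how the exposure data is processed one tuple at a time.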

## Comparison to general risk modelling terms

If you come from a risk-modelling background, this section maps the terms
you are familiar with to the terms used in RiskScape _Model Pipelines_.

Deterministic Modelling
  A model that determines the risk from a single hazard event or scenario where no randomness is involved.

Exposure Function
  A _Consequence Function_ that simply determines whether or not a given _Exposure_ was affected by the _Hazard_.
  A simple example is the `is_exposed` _Built-in Function_.

Fragility Function
  A _Consequence Function_ that estimates the probability (P)
  that a damage or impact state for an _Exposure_ will be reached or exceeded for a given _Hazard Intensity Measure_.

Impact
  In this context, equivalent to _Consequence_.
  This is what is modelled for each event or scenario.

Probabilistic Model
  A _Model Pipeline_ that determines the risk from multiple _Hazard_ events or scenarios where randomness is involved.

Stochastic Model
  A _Model Pipeline_ that determines the risk from a _Hazard_ event or scenario where randomness is involved,
  e.g. a volcanic eruption of a single magnitude might be simulated for multiple environmental conditions.

Multi-Hazard Model
  A _Model Pipeline_ that determines risk when multiple _Hazard layers_ are involved.
  A _Hazard intensity measure_ is sampled from each _Hazard layer_ and the combined set of values is passed to the _Consequence Function or Functions_.

Cascading Multi-Hazard Model
  A _Model Pipeline_ that determines _Consequences_ when multiple _Hazard layers_ are involved,
  each one representing a successive hazard event scenario.
  RiskScape treats this the same as a _Multi-Hazard Model_ - how the successive hazards influence
  the _Consequence_ is controlled by how you write your _Consequence Functions and Pipeline_.

Vulnerability
  The conditions that increase the susceptibility of an _Exposure_ to direct or indirect _Impact_ from a _Hazard_.
  These conditions can be encapsulated by your _Consequence Function_:
  how you model vulnerability is determined by how you write your _Consequence Function_.
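Fragility functions are often expressed as lognormal curves, where the probability of reaching or exceeding a damage state at intensity x is Φ(ln(x/θ)/β), with θ the median intensity and β the logarithmic standard deviation. The sketch below assumes that form, with made-up parameters for three hypothetical damage states, and derives the probability of ending up in each state by differencing the exceedance curves. It is plain Python for illustration only; because consequence functions are user-defined, the curve shape is up to the modeller.

```python
import math

# Illustrative fragility parameters (name, median intensity, log standard deviation),
# ordered from least to most severe. Made up for this sketch, not calibrated curves.
DAMAGE_STATES = [
    ("slight", 0.3, 0.6),
    ("moderate", 1.0, 0.6),
    ("complete", 2.5, 0.6),
]


def p_exceed(im, median, beta):
    """P(damage state reached or exceeded | intensity measure = im), lognormal form."""
    if im <= 0:
        return 0.0
    z = math.log(im / median) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def damage_state_probabilities(im):
    """Probability of ending up in each damage state, including 'none'."""
    exceed = [p_exceed(im, median, beta) for _, median, beta in DAMAGE_STATES]
    probs = {"none": 1.0 - exceed[0]}
    for i, (name, _, _) in enumerate(DAMAGE_STATES):
        # P(exactly this state) = P(>= this state) - P(>= next, more severe state)
        next_exceed = exceed[i + 1] if i + 1 < len(exceed) else 0.0
        probs[name] = exceed[i] - next_exceed
    return probs
```

At an intensity equal to a curve's median, `p_exceed` returns 0.5, which is a quick sanity check on the parameters.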

## General GIS and computing terms

These are general terms that don't hold any special meaning within RiskScape,
but may provide useful context for understanding how the RiskScape software operates.

CLI
  Command Line Interface.

CI (Continuous Integration)
  A development practice, used by the RiskScape development team, where code is automatically built and tested whenever changes are made.

CRS (Coordinate Reference System)
  A coordinate reference system (or spatial reference system) is a coordinate-based local, regional
  or global system used to locate geographical entities.
  A CRS defines a specific map projection, as well as transformations between different spatial reference systems. 

CSV (Comma-Separated Values)
  A comma-separated values file (`.csv`) is a delimited text file that uses a comma to separate values.

GeoTools
  An [Open Source Java library](https://docs.geotools.org/) used by RiskScape for geoprocessing operations.

GeoTIFF
  GeoTIFF is an OGC Implementation Standard, based on the TIFF format, that is used as an interchange format for georeferenced raster imagery. GeoTIFF files typically use the `.tif` extension.

GIS (Geographic Information System)
  A framework that provides the ability to capture, manage and analyse spatial and geographic data.

GitLab
  A [platform](https://about.gitlab.com/what-is-gitlab) for managing software projects,
  including CI, code review, and issue tracking.

INI file
  An [INI file](https://en.wikipedia.org/wiki/INI_file) is a text-based file format that is commonly used to store basic configuration information.

[Java](https://www.java.com)
  An object-oriented, high-level, general-purpose programming language. 

[JSON](https://en.wikipedia.org/wiki/JSON)
  JavaScript Object Notation. A text-based file format commonly used to serialize data.

[Jython](https://www.jython.org)
  A Java implementation of the Python language. Jython allows seamless integration between Java and Python code.

Metadata
  A set of data that describes and gives information about other data.

OGC
  The [Open Geospatial Consortium](https://www.ogc.org) standards organization.

[Python](https://www.python.org)
  An interpreted, high-level, general-purpose programming language. 

Raster
  A matrix of cells (or pixels) organized into rows and columns (or a grid) where each cell contains a value representing information.

Raster file
  A GIS file that represents geographical data as a _Raster_.

Relation
  A set of _Tuples_. Whereas a Tuple is a row in a database table, a Relation is the entire table.

Shapefile
  The shapefile (`.shp`) format is a geospatial _Vector_ data format for GIS software, including RiskScape.

SRS (Spatial Reference System)
  See CRS.

Swift
  [OpenStack Object Storage](https://wiki.openstack.org/wiki/Swift). A way of loading data files into RiskScape from an OpenStack Cloud.

Vector
  A graphical representation of vertices and paths.

Vector file
  A GIS file that represents geographical data as vertices and paths, displayed as points, lines and polygons (areas) that depict real-world features.

WFS (Web Feature Service)
  The OGC Interface Standard that allows requests for geographical features across the web using platform-independent calls.

WCS (Web Coverage Service)
  The OGC Interface Standard for retrieving coverage data (e.g. rasters) across the web using platform-independent calls.
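As a brief illustration of the _INI file_ format, Python's standard-library `configparser` module can read INI text made up of `[section]` headers and `key = value` pairs. The section and key names below are hypothetical and only loosely resemble a _Project file_; they are not a reference for RiskScape's own project settings.

```python
import configparser

# Hypothetical INI content: [section] headers followed by key = value pairs.
PROJECT_INI = """
[bookmark buildings]
description = Building footprints for the study area
location = data/buildings.shp

[bookmark flood]
location = data/flood_depth.tif
"""

config = configparser.ConfigParser()
config.read_string(PROJECT_INI)

for section in config.sections():
    # Each section behaves like a dictionary of its key/value pairs.
    print(section, "->", config[section]["location"])
```

This prints one line per section, e.g. `bookmark flood -> data/flood_depth.tif`.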