# Probabilistic modelling

RiskScape support for probabilistic modelling is currently limited to Pipelines.

Unlike deterministic modelling, there is no wizard to help you generate a pipeline. Instead, we have included some documented examples of writing probabilistic pipeline models.

These examples are aimed at users who already have a solid understanding of pipelines. You should complete the How to write advanced pipelines tutorial first, if you have not done so already.

This documentation is also aimed at researchers who already have a sound understanding of applying probabilistic concepts to hazard modelling.

Note

This documentation covers producing an event-loss table that contains a single *total* loss
for each *event*. In contrast, deterministic modelling in RiskScape looks at
*individual* losses for each *element-at-risk*. By changing the RiskScape pipeline, it is
possible to produce finer-grained loss outputs for probabilistic models. However, these pipelines
are generally much more memory-intensive to run.

## Terminology

In general, probabilistic modelling refers to a loss model that deals with some uncertainty present
in the model. In RiskScape terminology, we will use the terms *probabilistic model* and *scenario
model* to describe two different kinds of probabilistic modelling.

*Probabilistic model*: A model where a loss is calculated for *many independent* events, in order to derive probabilistic outputs, such as annualized losses and exceedance curves.

*Scenario model*: A model where a loss is calculated for a *single theoretical* event, where there is uncertainty in how the event ‘plays out’, e.g. how ground motion will spread from the epicentre of an earthquake, or how various ground conditions on the day of the event will affect the way inundation spreads from a broken stop bank.

## Building a model

There are two main parts to a probabilistic model pipeline:

Generating the event-loss table, i.e. determining a single total loss for each event.

Calculating the probabilistic results, such as the AEP (Annual Exceedance Probability).

How you structure each part of your probabilistic model pipeline depends a lot on your hazard event dataset.

### Generating an event loss table

Central to any probabilistic model in RiskScape is an output called an event-loss table, which records the total loss from each event.
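As a rough illustration of what an event-loss table contains, the following Python sketch sums per-element losses into one total per event. This is not RiskScape pipeline code, and the function name, tuple layout, and sample values are all hypothetical. It only shows the shape of the aggregation.

```python
from collections import defaultdict


def event_loss_table(element_losses):
    """Sum per-element losses into a single total loss per event.

    element_losses: iterable of (event_id, element_id, loss) tuples.
    The tuple layout is an assumption made for this sketch.
    """
    totals = defaultdict(float)
    for event_id, _element_id, loss in element_losses:
        totals[event_id] += loss
    return dict(totals)


# Hypothetical per-element losses for two events and two buildings.
rows = [
    (1, "building-a", 1000.0),
    (1, "building-b", 250.0),
    (2, "building-a", 0.0),
    (2, "building-b", 4000.0),
]
table = event_loss_table(rows)
```

Keeping only the per-event totals is what lets the table stay small. Retaining the individual element losses for every event is the finer-grained, more memory-intensive alternative.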

A probabilistic model typically involves a larger number of calculations than a deterministic one. Because of this, there is no one-size-fits-all approach to generating an event-loss table; it depends somewhat on the input data you are using. For example, a directory with 100 GeoTIFF files will need to be processed differently from a NetCDF file with 10,000 hazard intensity readings at each site.

Pick one of the following approaches that best suits your data:

Site-based hazard data: hazard intensities for *all* events are organized around specific sites (i.e. fixed geospatial points). This is generally the case for NetCDF or HDF5 data, where the hazard data is grid-based and a single hazard file covers all the events.

Multi-file hazard data: hazard intensities for *individual* events are grouped together, so it makes sense to process each event one at a time. Use this approach when the events are spread out over multiple hazard files, such as a set of GeoTIFF files.

Note

Both these approaches require that RiskScape can load your entire exposure-layer into memory all at once, as RiskScape needs to build an index from your exposure-layer data. This means if you have a large exposure-layer, you may be constrained by the system RAM you have available.

Once you have a pipeline that produces an event-loss table, you can then use it to calculate the probabilistic results, such as the AEP (Annual Exceedance Probability).

### Calculating the probabilistic results

Choose the approach below that best matches your input dataset, and click on the link for more details.

Event-based: each event in the input dataset is treated as having an equal probability within the model itself. This is sometimes called a Monte-Carlo simulation.

Weighted event-based: each event in the input dataset already has an event probability or occurrence rate associated with it. A weighted event-based model provides good coverage of the range of possible events, without requiring the sheer number of events of a Monte-Carlo simulation.

Hazard-based: the input dataset contains a smaller set of events that all relate to the same hazard scenario or area of interest. Each event has a rate of occurrence (or return period) and is already ranked by monotonically increasing losses. For example, the hazard input files might be a 10-year flood, 50-year flood, 100-year flood, etc.
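To make the three approaches more concrete, here is a generic Python sketch of the underlying arithmetic. It is not RiskScape's implementation, and every function name here is hypothetical. It assumes that event occurrences follow a Poisson process (so an exceedance rate converts to an AEP as `1 - exp(-rate)`), that an event-based dataset represents a known number of simulated years, and that a hazard-based annual loss can be approximated with the trapezoidal rule over the sampled return periods only.

```python
import math


def equal_rate_events(losses, simulated_years):
    """Event-based: give every event the same annual rate.

    Assumes the dataset represents `simulated_years` years of events.
    """
    rate = 1.0 / simulated_years
    return [(loss, rate) for loss in losses]


def exceedance_curve(events):
    """Weighted event-based: events are (loss, annual_rate) pairs.

    Returns (loss, exceedance_rate, aep) rows, largest loss first.
    The AEP conversion assumes Poisson-distributed occurrences.
    """
    curve = []
    cumulative_rate = 0.0
    for loss, rate in sorted(events, key=lambda e: e[0], reverse=True):
        cumulative_rate += rate
        curve.append((loss, cumulative_rate, 1.0 - math.exp(-cumulative_rate)))
    return curve


def average_annual_loss(events):
    """Rate-weighted sum of event losses."""
    return sum(loss * rate for loss, rate in events)


def trapezoid_annual_loss(points):
    """Hazard-based: points are (return_period_years, loss) pairs.

    Integrates loss over exceedance probability (1 / return period)
    with the trapezoidal rule. Losses outside the sampled return
    periods are ignored, so this underestimates the full integral.
    """
    pts = sorted((1.0 / rp, loss) for rp, loss in points)
    total = 0.0
    for (p0, l0), (p1, l1) in zip(pts, pts[1:]):
        total += (p1 - p0) * (l0 + l1) / 2.0
    return total


# Hypothetical inputs for each approach.
weighted = [(100.0, 0.1), (500.0, 0.01)]
curve = exceedance_curve(weighted)
aal = average_annual_loss(weighted)
simulated = equal_rate_events([100.0, 500.0], simulated_years=1000)
hazard_aal = trapezoid_annual_loss([(10, 100.0), (100, 1000.0)])
```

The event-based and weighted cases share the same exceedance calculation and differ only in where the per-event rate comes from. The hazard-based case works from a handful of return periods instead, which is why it integrates between points.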

Tip

You can also use the union step to combine different results together.
For example, if you model the probabilistic loss for the *same* exposure-layer against several
*different* hazard sources (e.g. flood, cyclone, sea-level rise), then you could produce
a combined AEP across all hazards.
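The idea of combining results can be sketched in plain Python as follows. This is not the RiskScape union step itself; it is a minimal illustration that assumes each hazard source yields an event-loss table of `(loss, annual_rate)` rows, that the hazard sources are independent, and that occurrences are Poisson-distributed. The function name and sample values are hypothetical.

```python
import math


def combined_aep(*event_loss_tables):
    """Pool event-loss tables and compute one exceedance curve.

    Each table is a list of (loss, annual_rate) rows for one hazard.
    Returns (loss, aep) rows, largest loss first.
    """
    # Pool the rows from every hazard source into one table.
    pooled = [row for table in event_loss_tables for row in table]
    pooled.sort(key=lambda row: row[0], reverse=True)

    curve = []
    cumulative_rate = 0.0
    for loss, rate in pooled:
        cumulative_rate += rate
        curve.append((loss, 1.0 - math.exp(-cumulative_rate)))
    return curve


# Hypothetical event-loss tables for the same exposure-layer.
flood = [(200.0, 0.02)]
cyclone = [(900.0, 0.005), (50.0, 0.1)]
combined = combined_aep(flood, cyclone)
```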

### Worked pipeline examples

Here is a recap of the pages available to help you build a probabilistic pipeline, based on the type of hazard data and probabilistic model you are using.