Hazard-based probabilistic models

Background

For a hazard-based model in RiskScape, the exceedance probability (or return period) of each event in the model is already known (or estimated). For example, you have a set of hazard files for a range of 10-year to 1000-year flood events.

This differs from other probabilistic models, such as event-based models, where we derive the exceedance probability from running the model itself.

The main benefit of the hazard-based approach is that you only need a small number of events to create a probabilistic output. This is useful when producing the many thousands of hazard-layers required for an event-based model is not feasible.

A hazard-based probabilistic model requires that:

  • each event in the model has a known recurrence interval (or return period), which is present in the hazard-layer data (or metadata).

  • the model’s loss function produces monotonically increasing losses with respect to decreasing hazard frequency. In other words, less frequent events must produce a more severe loss for each and every element-at-risk that is exposed to the hazard.

Return periods and exceedance probabilities

A hazard-layer in a hazard-based model may represent the probability of an event using either a return period or an exceedance probability. This guide prefers exceedance probabilities because the maths is clearer, but it is straightforward to convert between the two.

An exceedance probability is assumed to be given as a decimal number between 0 and 1. This is interpreted as the probability of an event of this intensity or greater occurring in any given time period (typically one year). An exceedance probability can be related to a return period using a probability mass function, and for this example we assume the function for the Poisson distribution is appropriate.

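When using Poisson, we use the reciprocal of the return period as our rate, a.k.a. lambda (λ). For example, a 1-in-100-year event has a rate of 1/100. With T as the time period, a return period can be converted to an exceedance probability (EP) with the following formula:

\[EP = 1 - e^{-\lambda \times T}\]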

We can compute this in a RiskScape pipeline like so:

# NB this input() step is here for illustrative purposes only. You only
# need the select() step below when adding this to your own pipeline
input(value: {
    # despite its name, this attribute holds the rate (1 / return period), i.e. lambda
    return_period: 1 / 10000, # 1 in 10,000 year event
    time_period: 1 # this is an annual return period
})
  ->
select({
    *,
    exceedance_probability: 1 - exp((0 - return_period) * time_period)
})
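The same conversion can be sketched outside of RiskScape. The following Python snippet is only an illustration of the formula above; the function name `exceedance_probability` and its arguments are assumptions made for this example and are not part of RiskScape.

```python
import math


def exceedance_probability(return_period_years, time_period_years=1.0):
    """Poisson-based exceedance probability for a given return period.

    Hypothetical helper for illustration only (not a RiskScape function).
    """
    rate = 1.0 / return_period_years  # lambda: expected events per year
    return 1.0 - math.exp(-rate * time_period_years)


# A 1-in-100-year event over one year, and over a 50-year time period
ep_annual = exceedance_probability(100)
ep_50yr = exceedance_probability(100, 50)
print(f"{ep_annual:.5f}")
print(f"{ep_50yr:.5f}")
```

Note that the annual probability comes out slightly below 1/100, which is the difference discussed under Reciprocal ARI.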

Reciprocal ARI

Often, exceedance probabilities are calculated from a return period (RP) with a simple approximation:

\[EP \approx \frac{1}{RP}\]

This is less accurate than the Poisson-based calculation above. The following table compares the average recurrence interval (ARI) that each method associates with a given probability, and shows how the reciprocal method gets less accurate as the probability increases.

probability    reciprocal ARI    Poisson ARI
0.001%         100000            99999.50
0.1%           1000              999.50
25%            4                 3.48
50%            2                 1.44
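The difference between the two conversions can be checked with a short Python sketch. It inverts the Poisson formula to get a recurrence interval from a probability, assuming a time period of one year. The function names are hypothetical and chosen for this example only.

```python
import math


def reciprocal_ari(probability):
    """Recurrence interval using the simple 1 / probability approximation."""
    return 1.0 / probability


def poisson_ari(probability, time_period=1.0):
    """Recurrence interval from inverting EP = 1 - exp(-T / ARI)."""
    return -time_period / math.log(1.0 - probability)


# Compare both methods across a range of exceedance probabilities
for p in (0.00001, 0.001, 0.01, 0.25, 0.5):
    print(f"{p:<8} {reciprocal_ari(p):>12.2f} {poisson_ari(p):>12.2f}")
```

For rare events the two values are almost identical, while for frequent events the gap grows to a noticeable fraction of the interval.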

Model parameters

A hazard-based model has three key parameters that affect its probabilistic outputs:

  • time_period: The time period used to relate probabilities to return periods. This won’t always be used.

  • max_loss: An upper bound on the maximum loss from an event. This is used for AAL calculations.

  • iterations: The number of trapezoid slices to construct when estimating the annualized average loss (AAL).

Event loss table

Refer to Generating an event loss table for how to produce a total loss for each event, based on the hazard data you are using.

Your event-loss data should also contain an attribute that represents the return period or exceedance probability of the event. We have assumed this attribute is called exceedance_probability in our pipeline examples. For example, the group() step that produces the event-loss table might look like this:

group(
    select: {
      sum(loss) as total_loss,
      event.id  as eventid,
      event.ep  as exceedance_probability
    },
    by: event
) as event_loss_table

Note

The point just before the event_loss_table group step is a good place to check that your loss function is monotonic. If you sort the results at this point by exposure and then by loss, and save them, the rows for each exposure should also be in order of increasing return period (that is, decreasing exceedance probability). If this is not true, then your loss function is not monotonic with respect to the hazard layer and the model will not give sensible results.
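If you save the per-exposure results to a file, the same check can be automated. The following Python sketch assumes each row has been reduced to an (exposure id, exceedance probability, loss) tuple. That layout and the function name `non_monotonic_exposures` are assumptions for this example, not something RiskScape produces directly. It flags any exposure whose loss decreases as events become less frequent, and treats equal losses as acceptable.

```python
from collections import defaultdict


def non_monotonic_exposures(rows):
    """Return the ids of exposures whose loss drops as events get rarer.

    rows: iterable of (exposure_id, exceedance_probability, loss) tuples.
    """
    by_exposure = defaultdict(list)
    for exposure_id, ep, loss in rows:
        by_exposure[exposure_id].append((ep, loss))

    flagged = []
    for exposure_id, pairs in by_exposure.items():
        # Order from most frequent (highest EP) to least frequent event
        pairs.sort(key=lambda pair: pair[0], reverse=True)
        losses = [loss for _, loss in pairs]
        # A drop between consecutive events breaks monotonicity
        if any(later < earlier for earlier, later in zip(losses, losses[1:])):
            flagged.append(exposure_id)
    return sorted(flagged)


# Made-up data: building-2 loses less in the 0.01 event than in the 0.1 event
rows = [
    ("building-1", 0.1, 100.0),
    ("building-1", 0.01, 500.0),
    ("building-1", 0.001, 900.0),
    ("building-2", 0.1, 200.0),
    ("building-2", 0.01, 150.0),
    ("building-2", 0.001, 800.0),
]
print(non_monotonic_exposures(rows))
```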

Annual Exceedance Probability

Given that the exceedance probability is already present from the hazard layer, you can change the previous example to sort the pipeline by exceedance_probability instead of by eventid and this becomes your Annual Exceedance Probability table.

Annualized Average Loss

With a hazard-based probabilistic model, there are not enough data points to use the same numerical method for computing an AAL as we did for an event-based or weighted-event-based model. Instead, this method relies on using the trapezoid rule for estimating the area under the ‘curve’ created by plotting each event’s exceedance probability and the loss calculated for that event.

For example, given this EP table:

event    exceedance probability    loss
1        0.1                       $1000
2        0.01                      $10000
3        0.001                     $100000

From this, we can plot the following curve. Using the trapezoid method, we can estimate the area under the curve, which represents our annualized average loss.

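[Figure: aep-trapezoid-integration.svg. Exceedance probability plotted against loss on log-log axes, with the area under the curve shaded and a single trapezoid slice shown in blue.]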

The graph above shows the shaded area (in log-log scale) under the ‘curve’ plotted from the data points in the table above. Taking a ‘slice’ between two points on the curve results in a trapezoid shape. In the diagram above a single slice is shown in blue.

If we take multiple slices between the minimum loss (i.e. $1) and the maximum loss (i.e. the max_loss model parameter), it will cover the entire area under the curve (which is the AAL). So to calculate the AAL, we simply sum the area of all the trapezoid slices.

In our model, the number of trapezoid slices is specified by the iterations parameter. Smaller slices (i.e. a larger iterations value) will match the curve more closely, producing a more accurate result. However, you may be limited by the number of data points (i.e. events) that are plotted on the curve.
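The slicing idea can be sketched in Python using the example EP table above. This is a minimal illustration and not RiskScape's implementation: the helper names are hypothetical, the curve is assumed to be a straight line between neighbouring data points, and values outside the data are assumed to take the nearest end value. To avoid relying on that last assumption, the example only integrates between the smallest and largest losses in the table rather than from $1 to max_loss.

```python
from bisect import bisect_right


def make_linear_curve(points):
    """Build f(loss) -> exceedance probability by linear interpolation.

    points: iterable of (loss, exceedance_probability) pairs.
    """
    pts = sorted(points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]

    def curve(x):
        # Outside the data, hold the nearest end value (an assumption)
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return curve


def trapezoid_area(f, a, b, iterations):
    """Sum the areas of `iterations` equal-width trapezoid slices of f on [a, b]."""
    width = (b - a) / iterations
    total = 0.0
    for k in range(iterations):
        left = a + k * width
        right = left + width
        total += (f(left) + f(right)) / 2 * width
    return total


ep_table = [(1000.0, 0.1), (10000.0, 0.01), (100000.0, 0.001)]
curve = make_linear_curve(ep_table)
aal_estimate = trapezoid_area(curve, 1000.0, 100000.0, 99)
print(round(aal_estimate, 2))
```

With 99 slices, each slice is $1000 wide and the slice boundaries line up with the three data points, so the estimate matches the exact area under the straight-line segments (about 990 over this range). With fewer slices, some slices straddle a data point and the estimate drifts away from that value.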

Pipeline

To perform this annualized average loss calculation in RiskScape, we first use fit_curve to build a continuous function from our event loss dataset, and then use the trapz function to integrate the curve.

Note

Strictly speaking, we are not fitting a curve, we are building a continuous function from the data points, but this is done in the same manner as some of the true curve-fitting methods, such as power law or linear.

group(
  select: {
    loss_curve: fit_curve(
      x-value: loss,
      y-value: exceedance_probability,
      # a continuous fit doesn't do any fitting per-se, but returns a continuous function
      # using the points produced by our event loss data - this is good enough for our example
      fitters: {'continuous'}
    )
  }
)
->
select({
  aal: trapz(
    function: loss_curve,
    a: 1.0,
    b: $max_loss,
    iterations: $iterations
  )
})
->
save('average-loss', format: 'csv')

In this example we need to know the maximum loss ahead of time, so that it can be supplied as a model parameter (i.e. $max_loss). This can be computed from your event loss table with the following pipeline:

event_loss_table
->
group(select: {max(loss)})