Hazard-based probabilistic models

Background

For a hazard-based model in RiskScape, the exceedance probability (or return period) of each event in the model is already known (or estimated). For example, you have a set of hazard files for a range of 10-year to 1000-year flood events.

This differs from other probabilistic models, such as event-based models, where we derive the exceedance probability from running the model itself.

The main benefit of the hazard-based approach is that you only need a small number of events to create a probabilistic output. This is useful when producing the many thousands of hazard-layers required for an event-based model is not feasible.

A hazard-based probabilistic model requires that:

each event in the model has a known recurrence interval (or return period), which is present in the hazard-layer data (or metadata).
the model’s loss function must produce monotonically increasing losses with respect to decreasing hazard frequency. In other words, less frequent events must produce a more severe loss for each and every element-at-risk that is exposed to the hazard.

Return periods and exceedance probabilities

A hazard-layer in a hazard-based model may represent the probability of an event using either a return period or an exceedance probability. This guide prefers exceedance probabilities, as the maths is clearer, but the two can be converted from one to the other with some simple maths.

An exceedance probability is assumed to be given as a decimal number between 0 and 1. It is interpreted as the probability of an event of this intensity or greater occurring in any given time period (typically one year). An exceedance probability can be related to a return period using a probability mass function, and for this example we assume the Poisson distribution is appropriate.

When using Poisson, the rate, a.k.a. lambda (λ), is the reciprocal of the return period, and T is the time period of interest (typically one year). A return period can be converted to an exceedance probability (EP) with the following formula:

\[EP = 1 - e^{-\lambda \times T}\]

We can compute this in a RiskScape pipeline like so:

# NB this input() step is here for illustrative purposes only. You only
# need the select() step that follows when adding this to your own pipeline
input(value: {
    return_period: 1 / 10000, # 1 in 10,000 year event
    time_period: 1 # this is an annual return period
})
  ->
select({
    *,
    exceedance_probability: 1 - exp((0 - return_period) * time_period)
})
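
To check the conversion outside of RiskScape, the same Poisson formula can be sketched in Python. The function name and arguments below are illustrative only and are not part of RiskScape.

```python
import math

def exceedance_probability(return_period_years, time_period_years=1.0):
    """Poisson probability of at least one event within the time period."""
    rate = 1.0 / return_period_years  # lambda: expected number of events per year
    return 1.0 - math.exp(-rate * time_period_years)

# Annual exceedance probability of a 1 in 10,000 year event
print(exceedance_probability(10000))

# Probability of at least one 1 in 100 year event over a 50-year period
print(exceedance_probability(100, time_period_years=50))
```

The second call shows the effect of the time period T: over 50 years the probability is far higher than the annual value for the same event.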

Reciprocal ARI

Often, exceedance probabilities are calculated from a return period (RP), also called an average recurrence interval (ARI), with a simple approximation:

\[EP \approx \frac{1}{RP}\]

This is a poorer estimate than the Poisson-based calculation above. The following table shows, for a given probability, the ARI implied by each method. The reciprocal approximation gets less accurate as the probability increases.

probability	reciprocal ARI	Poisson ARI
0.001%	100000	99999.49
0.1%	1000	999
25%	4	3.48
50%	2	1.44
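
The comparison in the table can be reproduced with a short Python sketch. The two helper functions are hypothetical names used only for this illustration. The Poisson version inverts the EP formula for a one-year time period.

```python
import math

def reciprocal_ari(ep):
    """Simple approximation: ARI = 1 / EP."""
    return 1.0 / ep

def poisson_ari(ep):
    """Invert EP = 1 - exp(-1 / ARI), i.e. a one-year time period."""
    return -1.0 / math.log(1.0 - ep)

for ep in (0.00001, 0.001, 0.25, 0.5):
    print(f"{ep:.3%}  {reciprocal_ari(ep):>10.2f}  {poisson_ari(ep):>10.2f}")
```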

Event loss table

Refer to Generating an event loss table for how to produce a total loss for each event, based on the hazard data you are using.

Your event-loss data should also contain an attribute that represents the return period or exceedance probability of the event. We have assumed this attribute is called exceedance_probability in our pipeline examples. For example, the group() step that produces the event-loss table might look like this:

group(
    select: {
      sum(loss) as total_loss,
      event.id  as eventid,
      event.ep  as exceedance_probability
    },
    by: event
) as event_loss_table

Note

The point just before the event_loss_table group step is a good place to check that your loss function is monotonic. Sorting the results at this point by exposure and then by loss, and saving them, should show that the rows for each exposure are also sorted by increasing return period (i.e. decreasing exceedance probability). If this is not true, then your loss function is not monotonic with respect to the hazard layer and the model will not give sensible results.
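
If the per-exposure results have been saved, the same monotonic check can be automated. The following Python sketch assumes the saved rows can be read as (exposure id, exceedance probability, loss) tuples. The record layout, the function name, and the sample data are all hypothetical. It treats equal losses for neighbouring events as acceptable, which is an assumption you may want to tighten.

```python
from collections import defaultdict

def non_monotonic_exposures(records):
    """Return the exposure ids whose loss decreases as events get rarer.

    records: iterable of (exposure_id, exceedance_probability, loss) tuples.
    """
    by_exposure = defaultdict(list)
    for exposure_id, ep, loss in records:
        by_exposure[exposure_id].append((ep, loss))

    bad = []
    for exposure_id, points in by_exposure.items():
        # Most frequent event first, rarest event last
        points.sort(key=lambda p: p[0], reverse=True)
        losses = [loss for _, loss in points]
        if any(later < earlier for earlier, later in zip(losses, losses[1:])):
            bad.append(exposure_id)
    return bad

# Hypothetical sample data: building-2 loses less in the 0.01 event than in the 0.1 event
records = [
    ("building-1", 0.1, 1000), ("building-1", 0.01, 5000), ("building-1", 0.001, 9000),
    ("building-2", 0.1, 2000), ("building-2", 0.01, 1500), ("building-2", 0.001, 8000),
]
print(non_monotonic_exposures(records))
```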

Annual Exceedance Probability

Given that the exceedance probability is already present from the hazard layer, you can change the previous example to sort the pipeline by exceedance_probability instead of by eventid and this becomes your Annual Exceedance Probability table.

Annualized Average Loss

With a hazard-based probabilistic model, there are not enough data points to use the same numerical method for computing an AAL as we did for an event-based or weighted-event-based model. Instead, this method is based on the trapezoid rule for estimating the area under the ‘curve’ created by plotting each event’s exceedance probability and the loss calculated for that event.

Warning

The quality of the AAL calculated using the trapezoid method depends on the event data you have available, in particular the number of events and the upper and lower limits of the AEP range they cover. Refer to the Discussion for more details.

For example, given this EP table:

event	exceedance probability	loss
1	0.1	$1000
2	0.01	$10000
3	0.001	$100000

From this, we can plot the following curve, and using the trapezoid method we can estimate the area under the curve, which represents the AAL.

../../_images/aep-trapezoid-integration.svg

The graph above shows the shaded area (in log-log scale) under the ‘curve’ plotted from the data points in the table above. Taking a ‘slice’ between two points on the curve results in a trapezoid shape. In the diagram above a single slice is shown in blue.

The trapezoid method simply takes multiple slices between the minimum EP (i.e. 0.001) and the maximum EP (i.e. 0.1) and sums them together. This gives us the area under the curve between our first and last event data-points.

However, we are still missing two parts of the curve - the left-hand side between EP 0.0 and 0.001, and the right-hand side between EP 0.1 and 1.0 (or whenever the loss becomes $0). We cannot do anything about the missing right-hand side of the curve - this relies on having hazard data that is as close to a zero-loss event as possible. For the missing left-hand side, we know the losses are monotonically increasing, so the losses between 0.0 and 0.001 must be greater than or equal to the maximum known loss ($100,000). So essentially, we can add an extra data-point (EP=0.0, loss=$100,000) to our curve.

The trapezoid method simply involves taking the difference in x value (exceedance probability) between two data points and multiplying it by the average y value (loss). The following table shows trapezoid integration applied to our example three events, plus the additional EP=0.0 data-point we have added:

EP difference	loss average	trapezoid slice
0.1 - 0.01	($10,000 + $1,000) / 2	0.09 * 5500 = 495.0
0.01 - 0.001	($100,000 + $10,000) / 2	0.009 * 55000 = 495.0
0.001 - 0.0	($100,000 + $100,000) / 2	0.001 * 100000 = 100.0

The sum of the trapezoid slices is $495 + $495 + $100 = $1090.
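
The slice-by-slice arithmetic in the table can also be written out by hand. This is a minimal Python sketch, not RiskScape's implementation. The function name is hypothetical, and it assumes the losses are monotonic so that the rarest event carries the maximum loss.

```python
def trapezoid_slices(eps, losses):
    """Return the area of each trapezoid slice, rarest event first."""
    points = sorted(zip(eps, losses))      # ascending exceedance probability
    points.insert(0, (0.0, points[0][1]))  # extra EP=0.0 point at the maximum known loss
    return [
        (ep_hi - ep_lo) * (loss_lo + loss_hi) / 2
        for (ep_lo, loss_lo), (ep_hi, loss_hi) in zip(points, points[1:])
    ]

slices = trapezoid_slices([0.1, 0.01, 0.001], [1000, 10000, 100000])
print(slices)       # one area per slice
print(sum(slices))  # the AAL estimate
```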

Because the EP=0.0 ‘trapezoid’ slice is actually a rectangle, it can be simplified to just the maximum loss multiplied by the minimum EP. With this additional EP=0.0 term, the AAL calculation of the loss curve area is equal to:

\[AAL = f_{a}L_{a} + \sum_{RP_{n}=a}^{N} \Delta{f} \frac{L_{RP_{n}}+L_{RP_{n+1}}}{2}\]

You can see the equivalent equation being used as the first AAL implementation method in the article Improved building-specific flood risk assessment and implications of depth-damage function selection.

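The following Python code produces the same result by using the numpy trapz() function, which NumPy 2.0 and later provide under the name trapezoid(). Note we explicitly add in the extra EP=0.0 term here.

import numpy
losses = [100000, 10000, 1000]
EPs = [0.001, 0.01, 0.1]
# Use trapezoid() where it exists (NumPy 2.0 onwards), otherwise fall back
# to the older trapz() name
trapezoid = getattr(numpy, "trapezoid", None) or numpy.trapz
print(trapezoid(y=[max(losses)]+losses, x=[0.0]+EPs))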

Note

The more event data points you have, the more accurate the AAL will become. Three events are used in this example purely for simplicity, but an AAL from only three data points is very likely to either under- or over-estimate the true annual cost.

Pipeline

To perform this annualized average loss calculation in RiskScape, we simply use the aal_trapz() aggregation function.

group(
  select: {
    aal_trapz(loss: loss, exceedance-probability: exceedance_probability) as AAL
  })
->
save('average-loss', format: 'csv')

Discussion

Using trapezoid integration to calculate the AAL essentially takes selected data-points on a loss/EP curve and ‘connects the dots’. So the accuracy of the resulting AAL is very case-dependent. It depends on the number of AEP data-points you have and the shape of the underlying loss/EP curve.

For example, if the loss/EP curve is linear, then you do not need as many points. For a non-linear loss/EP curve, the more data-points you have, the better it approximates the shape of the curve. Having more event data points should result in a more accurate AAL, as more of the area under the curve is known.

Ideally, you want a few data-points near the loss/EP pairing where loss goes above zero. This is because the higher-probability events can have a large effect on the overall AAL.

Note

RiskScape will only do trapezoid integration between EP=0.0 and the maximum EP (i.e. the most frequent event). RiskScape does not attempt to extrapolate beyond the minimum loss at all, i.e. between the minimum loss and $0.

The following table contains some example event data and shows what effect using different subsets of events can have on the overall AAL.

EP	Example 1 losses	Example 2 losses	Example 3 losses	Example 4 losses	Example 5 losses
0.4	$1m	-	$1m	-	-
0.2	$7m	$7m	-	-	-
0.1	$11m	-	$11m	-	$11m
0.05	$15m	$15m	-	-	-
0.02	$19m	-	$19m	$19m	$19m
0.01	$24m	$24m	-	$24m	$24m
0.005	$31m	-	$31m	-	-
0.002	$42m	$42m	-	$42m	-
0.001	$49m	-	$49m	$49m	$49m
AAL	$3.42m	$2.78m	$3.58m	$0.57m	$1.79m
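
The AAL row of the table above can be reproduced with a small Python sketch. The helper below is a hypothetical stand-in for trapezoid integration with the extra EP=0.0 point added. It is not RiskScape's aal_trapz() code. Losses are in millions of dollars.

```python
def aal_trapezoid(points):
    """points: {exceedance_probability: loss}. Adds the EP=0.0 data-point."""
    pts = sorted(points.items())     # ascending exceedance probability
    pts.insert(0, (0.0, pts[0][1]))  # maximum known loss extended to EP=0.0
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Losses in $m for every event in the table
all_events = {0.4: 1, 0.2: 7, 0.1: 11, 0.05: 15, 0.02: 19,
              0.01: 24, 0.005: 31, 0.002: 42, 0.001: 49}

# The subset of events (by EP) used in each example column
examples = {
    "Example 1": list(all_events),
    "Example 2": [0.2, 0.05, 0.01, 0.002],
    "Example 3": [0.4, 0.1, 0.02, 0.005, 0.001],
    "Example 4": [0.02, 0.01, 0.002, 0.001],
    "Example 5": [0.1, 0.02, 0.01, 0.001],
}

aals = {name: aal_trapezoid({ep: all_events[ep] for ep in eps})
        for name, eps in examples.items()}
for name, aal in aals.items():
    print(f"{name}: ${aal:.2f}m")
```

Examples 4 and 5 drop the most frequent events, which is why their AAL is so much lower than Example 1.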

Over- and under-estimating the AAL

Trapezoid integration approximates the ‘real’ loss/EP curve from a limited number of data-points, and so it may underestimate or overestimate the real AAL. It depends somewhat on the hazard, the data-points available, and what the real loss/EP curve looks like.

Typically the AAL will be underestimated if your event data-points miss out the start of the loss/EP curve (i.e. the lowest losses). For example, if the first real loss occurs for a 10-year event, but your hazard data only starts at a 100-year event, then a significant part of the loss/EP curve will not be included in the AAL.

Even when your data-points have good coverage of the extent of the loss/EP curve, they can still misrepresent the shape of the curve, which can then over-estimate the AAL. This happens because the event data-points are connected by straight lines, rather than by a smooth curve. When the ‘real’ curve is convex (it bows downwards between the data-points), each straight line sits above it, so there is more area under the line, and thus a higher AAL.
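
The over-estimation effect can be illustrated with a made-up loss curve. The Python sketch below assumes a purely hypothetical curve, loss = 1000 / EP, chosen only because it curves upwards and has a known exact integral. It compares the area between EP 0.001 and 0.1 using three data-points, many data-points, and the exact value. The area below EP 0.001 is ignored here.

```python
import math

def trapezoid(xs, ys):
    """Plain trapezoid rule over paired x and y values."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))

def loss(ep):
    return 1000.0 / ep  # hypothetical loss/EP curve, for illustration only

lo, hi = 0.001, 0.1
exact = 1000.0 * math.log(hi / lo)  # analytic area under 1000/EP between lo and hi

# Three data-points, as in the earlier worked example
coarse_eps = [0.001, 0.01, 0.1]
coarse = trapezoid(coarse_eps, [loss(ep) for ep in coarse_eps])

# Many evenly spaced data-points over the same range
n = 10000
fine_eps = [lo + (hi - lo) * i / n for i in range(n + 1)]
fine = trapezoid(fine_eps, [loss(ep) for ep in fine_eps])

print(coarse, fine, exact)
```

With only three points the estimate is roughly double the exact area, while the densely sampled estimate is very close to it.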

Plotting the loss/EP curve

When applying the trapezoid method to hazard-based probabilistic data, RiskScape treats the loss as the y value and the EP as the x value. Sometimes, the converse approach is used, i.e. y is EP and x is loss. This can produce a similar AAL when you have a large number of events.

However, when you have a small number of events, the RiskScape approach of using the EP as x seems to produce more consistent AAL values. For example, say the loss was the probability of one or more fatalities occurring, and you had losses 0.99, 0.99, 0.99 for exceedance probabilities 0.001, 0.01, 0.02. Plotting loss as y and EP as x gives you an AAL of 0.0198, whereas flipping x and y (i.e. loss is x and EP is y) gives you an AAL of zero.
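
The fatality example above can be checked with a few lines of Python. The trapezoid helper is a hand-written illustration rather than RiskScape code, and the EP=0.0 data-point is only added for the EP-as-x orientation.

```python
def trapezoid(xs, ys):
    """Plain trapezoid rule over paired x and y values."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))

eps = [0.001, 0.01, 0.02]
losses = [0.99, 0.99, 0.99]

# EP as x and loss as y, with the extra EP=0.0 data-point added
aal_ep_as_x = trapezoid([0.0] + eps, [max(losses)] + losses)

# Flipped axes: loss as x and EP as y. Every slice has zero width
# because the loss never changes.
aal_loss_as_x = trapezoid(losses, eps)

print(aal_ep_as_x, aal_loss_as_x)
```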