Hazard-based probabilistic models
Background
For a hazard-based model in RiskScape, the exceedance probability (or return period) of each event in the model is already known (or estimated). For example, you have a set of hazard files for a range of 10-year to 1000-year flood events.
This differs from other probabilistic models, such as event-based models, where we derive the exceedance probability from running the model itself.
The main benefit of the hazard-based approach is that you only need a small number of events to create a probabilistic output. This is useful when producing the many thousands of hazard-layers required for an event-based model is not feasible.
A hazard-based probabilistic model requires that:
each event in the model has a known recurrence interval (or return period), which is present in the hazard-layer data (or metadata).
the model’s loss function produces monotonically increasing losses with respect to decreasing hazard frequency. In other words, less frequent events must produce a more severe loss for each and every element-at-risk that is exposed to the hazard.
Return periods and exceedance probabilities
A hazard-layer in a hazard-based model may represent the probability of an event using either a return period or an exceedance probability. This guide prefers exceedance probabilities, as the maths is clearer, but the two can be converted using some simple maths.
An exceedance probability is assumed to be given as a decimal number between 0 and 1. This is interpreted as the probability of an event of this intensity or greater occurring in any given period (typically a year). An exceedance probability can be related to a return period using a probability mass function, and for this example we assume the function for the Poisson distribution is appropriate.
When using Poisson, the rate, a.k.a. lambda (λ), is the reciprocal of the return period (RP), i.e. λ = 1 / RP. A return period can be converted to an exceedance probability (EP) over a time period t with the following formula:

$$EP = 1 - e^{-\lambda t}$$
We can compute this in a RiskScape pipeline like so:
# NB this input() step is here for illustrative purposes only. You only
# need the select() step that follows when adding this to your own pipeline
input(value: {
    return_period: 1 / 10000, # 1 in 10,000 year event, expressed as a yearly rate
    time_period: 1 # this is an annual return period
})
->
select({
    *,
    exceedance_probability: 1 - exp((0 - return_period) * time_period)
})
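The same conversion can be sketched outside RiskScape. The following Python snippet is only a minimal illustration that assumes the Poisson relationship described above. The function names `return_period_to_ep` and `ep_to_return_period` are hypothetical and are not part of RiskScape.

```python
import math

def return_period_to_ep(return_period, time_period=1.0):
    """Poisson exceedance probability for an event with the given return period."""
    rate = 1.0 / return_period  # lambda: expected number of events per unit time
    return 1.0 - math.exp(-rate * time_period)

def ep_to_return_period(ep, time_period=1.0):
    """Inverse of return_period_to_ep(); ep must lie strictly between 0 and 1."""
    return -time_period / math.log(1.0 - ep)

print(return_period_to_ep(10000))  # slightly less than 1 / 10000
print(ep_to_return_period(0.5))    # roughly 1.44 years
```

For rare events the result is very close to 1 / RP, and the two diverge as the event becomes more frequent.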
Reciprocal ARI
Often, exceedance probabilities are calculated from a return period (RP) with a simple approximation:

$$EP \approx \frac{1}{RP}$$

But this is a poorer approximation than the Poisson-based calculation above. This table shows how the approximation method gets less accurate as the probability increases.
| Probability | Reciprocal ARI (years) | Poisson ARI (years) |
|---|---|---|
| 0.001% | 100000 | 99999.49 |
| 0.1% | 1000 | 999 |
| 25% | 4 | 3.48 |
| 50% | 2 | 1.44 |
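The comparison in the table can be reproduced with a few lines of Python. This is an illustrative sketch only, assuming an annual time period. `reciprocal_ari` and `poisson_ari` are hypothetical helper names, and the printed values should agree with the table to within rounding or truncation of the last digits.

```python
import math

def reciprocal_ari(probability):
    """Return period from the simple 1 / probability approximation."""
    return 1.0 / probability

def poisson_ari(probability):
    """Return period implied by the Poisson relationship EP = 1 - exp(-1 / RP)."""
    # log1p(-p) computes ln(1 - p) accurately even for very small p
    return -1.0 / math.log1p(-probability)

for p in (0.00001, 0.001, 0.25, 0.5):
    print(f"{p:.3%}  reciprocal={reciprocal_ari(p):.2f}  poisson={poisson_ari(p):.2f}")
```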
Event loss table
Refer to Generating an event loss table for how to produce a total loss for each event, based on the hazard data you are using.
Your event-loss data should also contain an attribute that represents the return period or exceedance probability of the event. We have assumed this attribute is called exceedance_probability in our pipeline examples.
For example, the group() step that produces the event-loss table might look like this:
group(
select: {
sum(loss) as total_loss,
event.id as eventid,
event.ep as exceedance_probability
},
by: event
) as event_loss_table
Note
Before the event_loss_table group step is a good point to check that your loss function is monotonic. Sorting the results at this point by exposure and loss and saving them should show that they are also sorted by (decreasing) return period. If this is not true, then your loss function is not monotonic with respect to the hazard layer and the model will not give sensible results.
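The check described in this note could also be automated on saved results. The sketch below is a hypothetical Python helper, not a RiskScape feature. It assumes each saved row can be read as an (exposure id, exceedance probability, loss) tuple, and it treats equal losses for neighbouring events as acceptable.

```python
from collections import defaultdict

def non_monotonic_exposures(rows):
    """Return the ids of exposures whose loss decreases as events get rarer.

    rows: iterable of (exposure_id, exceedance_probability, loss) tuples.
    """
    by_exposure = defaultdict(list)
    for exposure_id, ep, loss in rows:
        by_exposure[exposure_id].append((ep, loss))

    bad = []
    for exposure_id, points in by_exposure.items():
        # Order from the most frequent event (highest EP) to the rarest (lowest EP)
        points.sort(key=lambda point: point[0], reverse=True)
        losses = [loss for _, loss in points]
        # The loss must never decrease as the EP decreases (ties are allowed here)
        if any(later < earlier for earlier, later in zip(losses, losses[1:])):
            bad.append(exposure_id)
    return bad

rows = [
    ("building-1", 0.1, 1000), ("building-1", 0.01, 5000), ("building-1", 0.001, 9000),
    ("building-2", 0.1, 2000), ("building-2", 0.01, 1500), ("building-2", 0.001, 8000),
]
print(non_monotonic_exposures(rows))  # only building-2 breaks the rule
```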
Annual Exceedance Probability
Given that the exceedance probability is already present from the hazard layer, you can change the previous example to sort the pipeline by exceedance_probability instead of by eventid and this becomes your Annual Exceedance Probability table.
Annualized Average Loss
With a hazard-based probabilistic model, there are not enough data points to use the same numerical method for computing an AAL as we did for an event-based or weighted-event-based model. Instead, this method is based on the trapezoid rule for estimating the area under the ‘curve’ created by plotting each event’s exceedance probability and the loss calculated for that event.
Warning
The quality of the AAL calculated using the trapezoid method is dependent on the event data you have available. In particular, the number of events, and the upper and lower limit of AEP used. Refer to the Discussion for more details.
For example, given this EP table:
| Event | Exceedance probability | Loss |
|---|---|---|
| 1 | 0.1 | $1000 |
| 2 | 0.01 | $10000 |
| 3 | 0.001 | $100000 |
From this, we can plot the following curve, and using the trapezoid method we can estimate the area under the curve, which represents the AAL.
The graph above shows the shaded area (in log-log scale) under the ‘curve’ plotted from the data points in the table above. Taking a ‘slice’ between two points on the curve results in a trapezoid shape. In the diagram above a single slice is shown in blue.
The trapezoid method simply takes multiple slices between the minimum EP (i.e. 0.001) and the maximum EP (i.e. 0.1) and sums them together. This gives us the area under the curve between our first and last event datapoints.
However, we are still missing two parts of the curve: the left-hand side between EP 0.0 and 0.001, and the right-hand side between EP 0.1 and 1.0 (or wherever the loss becomes $0). We cannot do anything about the missing right-hand side of the curve, as this relies on having hazard data that is as close to a zero-loss event as possible. For the missing left-hand side, we know the losses are monotonically increasing, so the losses between 0.0 and 0.001 must be greater than or equal to the maximum known loss ($100,000). So essentially, we can add an extra datapoint (EP=0.0, loss=$100,000) to our curve.
The trapezoid method simply involves taking the difference in x value (exceedance probability) between two data points and multiplying it by the average y value (loss).
The following table shows trapezoid integration applied to our example three events, plus the additional EP=0.0 datapoint we have added:
| EP difference | Loss average | Trapezoid slice |
|---|---|---|
| 0.1 - 0.01 | ($10,000 + $1,000) / 2 | 0.09 * 5500 = 495.0 |
| 0.01 - 0.001 | ($100,000 + $10,000) / 2 | 0.009 * 55000 = 495.0 |
| 0.001 - 0.0 | ($100,000 + $100,000) / 2 | 0.001 * 100000 = 100.0 |
The sum of the trapezoid slices is $495 + $495 + $100 = $1090.
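Because the EP=0.0 ‘trapezoid’ slice is actually a rectangle, it can be simplified to just the maximum loss multiplied by the minimum EP. With this additional EP=0.0 term, and with the n events ordered from the most frequent (i = 1) to the least frequent (i = n), the AAL calculation of the loss curve area is equal to:

$$AAL = \sum_{i=1}^{n-1} \left(EP_i - EP_{i+1}\right) \frac{L_i + L_{i+1}}{2} + EP_n L_n$$

Here L_i is the loss for event i, so EP_n is the minimum EP and L_n is the maximum loss.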
You can see the equivalent equation being used as the first AAL implementation method in the article Improved building-specific flood risk assessment and implications of depth-damage function selection.
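To make the arithmetic concrete, here is a minimal pure-Python sketch of the calculation described above: a rectangle between EP=0.0 and the minimum EP, plus one trapezoid slice between each pair of neighbouring events. The function name `aal_trapezoid` is hypothetical and this is not the RiskScape implementation.

```python
def aal_trapezoid(eps, losses):
    """Estimate the AAL from paired exceedance probabilities and losses."""
    # Sort from the rarest event (lowest EP) to the most frequent (highest EP)
    points = sorted(zip(eps, losses))
    # Rectangle between EP=0.0 and the minimum EP, using the maximum known loss
    min_ep = points[0][0]
    max_loss = max(loss for _, loss in points)
    total = min_ep * max_loss
    # Trapezoid slices between each pair of neighbouring events
    for (ep_a, loss_a), (ep_b, loss_b) in zip(points, points[1:]):
        total += (ep_b - ep_a) * (loss_a + loss_b) / 2
    return total

print(aal_trapezoid([0.1, 0.01, 0.001], [1000, 10000, 100000]))  # approximately 1090
```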
The following Python code produces the same result by using the numpy trapz() function. Note we explicitly add in the extra EP=0.0 term here.
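import numpy

losses = [100000, 10000, 1000]
EPs = [0.001, 0.01, 0.1]
# numpy.trapz() was renamed numpy.trapezoid() in NumPy 2.0; use whichever exists
trapz = getattr(numpy, "trapezoid", None) or numpy.trapz
# Prepend the extra (EP=0.0, loss=maximum loss) datapoint before integrating
print(trapz(y=[max(losses)] + losses, x=[0.0] + EPs))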
Note
The more event data points you have, the more accurate the AAL will become. Three events are used in this example purely for simplicity, but an AAL from only three data points is very likely to either under- or overestimate the true annual cost.
Pipeline
To perform this annualized average loss calculation in RiskScape, we simply use the aal_trapz() aggregation function.
group(
    select: {
        aal_trapz(loss: loss, exceedanceprobability: exceedance_probability) as AAL
    })
->
save('averageloss', format: 'csv')
Discussion
Using trapezoid integration to calculate the AAL essentially takes selected datapoints on a loss/EP curve and ‘connects the dots’. So the accuracy of the resulting AAL is very case-dependent. It depends on the number of AEP datapoints you have and the shape of the underlying loss/EP curve.
For example, if the loss/EP curve is linear, then you do not need as many points. For a nonlinear loss/EP curve, the more datapoints you have, the better it approximates the shape of the curve. Having more event data points should result in a more accurate AAL, as more of the area under the curve is known.
Ideally, you want a few datapoints near the loss/EP pairing where loss goes above zero. This is because the higher-probability events can have a large effect on the overall AAL.
Note
RiskScape will only do trapezoid integration between EP=0.0 and the maximum EP (i.e. the most frequent event). RiskScape does not attempt to extrapolate beyond the minimum loss at all, i.e. between the minimum loss and $0.
The following table contains some example event data and shows what effect using different subsets of events can have on the overall AAL.
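| EP | Example 1 losses | Example 2 losses | Example 3 losses | Example 4 losses | Example 5 losses |
|---|---|---|---|---|---|
| 0.4 | $1m | | $1m | | |
| 0.2 | $7m | $7m | | | |
| 0.1 | $11m | | $11m | | $11m |
| 0.05 | $15m | $15m | | | |
| 0.02 | $19m | | $19m | $19m | $19m |
| 0.01 | $24m | $24m | | $24m | $24m |
| 0.005 | $31m | | $31m | | |
| 0.002 | $42m | $42m | | $42m | |
| 0.001 | $49m | | $49m | $49m | $49m |
| AAL | $3.42m | $2.78m | $3.58m | $0.57m | $1.79m |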
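The AAL row can be checked with a short script. The sketch below is illustrative only: it assumes the trapezoid method with the added EP=0.0 datapoint described earlier, uses a hypothetical helper named `aal_trapezoid`, and lists for each example the EPs of the events that example includes.

```python
# Full event set: exceedance probability -> loss in $ millions
LOSSES = {0.4: 1, 0.2: 7, 0.1: 11, 0.05: 15, 0.02: 19,
          0.01: 24, 0.005: 31, 0.002: 42, 0.001: 49}

# The events (identified by EP) that each example includes
EXAMPLES = {
    "Example 1": [0.4, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, 0.002, 0.001],
    "Example 2": [0.2, 0.05, 0.01, 0.002],
    "Example 3": [0.4, 0.1, 0.02, 0.005, 0.001],
    "Example 4": [0.02, 0.01, 0.002, 0.001],
    "Example 5": [0.1, 0.02, 0.01, 0.001],
}

def aal_trapezoid(eps):
    """AAL in $ millions for the subset of events with the given EPs."""
    points = sorted((ep, LOSSES[ep]) for ep in eps)  # rarest event first
    total = points[0][0] * points[0][1]  # rectangle from EP=0.0 to the minimum EP
    for (ep_a, loss_a), (ep_b, loss_b) in zip(points, points[1:]):
        total += (ep_b - ep_a) * (loss_a + loss_b) / 2
    return total

for name, eps in EXAMPLES.items():
    print(f"{name}: ${aal_trapezoid(eps):.2f}m")
```

Examples 4 and 5 both omit the most frequent events, which is why their AAL is so much lower than Example 1, where all nine events are used.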
Over- and underestimating the AAL
Trapezoid integration approximates the ‘real’ loss/EP curve from a limited number of datapoints, and so it may underestimate or overestimate the real AAL. It depends somewhat on the hazard, the datapoints available, and what the real loss/EP curve looks like.
Typically the AAL will be underestimated if your event datapoints miss out the start of the loss/EP curve (i.e. the lowest losses). For example, if the first real loss occurs for a 10-year event, but your hazard data only starts at a 100-year event, then a significant part of the loss/EP curve will not be included in the AAL.
When your datapoints have good coverage of the extent of the loss/EP curve, they can still misrepresent the shape of the curve, which would then overestimate the AAL. This happens because the event datapoints are connected by straight lines, rather than by a smooth curve. When the ‘real’ curve is convex (it bows below the straight line drawn between any two of its points), the straight line sits above it, so there is more area under the line, and thus a higher AAL.
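Both effects can be demonstrated with a made-up curve. The sketch below assumes a purely hypothetical ‘real’ loss/EP curve, loss = -ln(EP), chosen only because it is smooth, reaches zero loss at EP=1, and has an exact area of 1.0. It is not derived from any real hazard data, and `aal_trapezoid` is a hypothetical helper.

```python
import math

def true_loss(ep):
    """Hypothetical smooth loss/EP curve: zero loss at EP=1, rising as EP falls."""
    return -math.log(ep)

TRUE_AAL = 1.0  # exact area under -ln(EP) between EP=0 and EP=1

def aal_trapezoid(eps):
    """Trapezoid AAL using events sampled from the hypothetical curve."""
    points = sorted((ep, true_loss(ep)) for ep in eps)  # rarest event first
    total = points[0][0] * points[0][1]  # rectangle from EP=0.0 to the minimum EP
    for (ep_a, loss_a), (ep_b, loss_b) in zip(points, points[1:]):
        total += (ep_b - ep_a) * (loss_a + loss_b) / 2
    return total

full_coverage = aal_trapezoid([1.0, 0.1, 0.01, 0.001])  # spans the whole curve
rare_only = aal_trapezoid([0.01, 0.001])                # misses the frequent events

print(f"true AAL: {TRUE_AAL:.3f}")
print(f"four events, full coverage: {full_coverage:.3f}")  # higher than the true AAL
print(f"rare events only: {rare_only:.3f}")                # far lower than the true AAL
```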
Plotting the loss/EP curve
When applying the trapezoid method to hazard-based probabilistic data, RiskScape treats the loss as the y value and the EP as the x value. Sometimes, the converse approach is used, i.e. y is EP and x is loss. This can produce a similar AAL when you have a large number of events.
However, when you have a small number of events, the RiskScape approach of using the EP as x seems to produce more consistent AAL values. For example, say the loss was the probability of one or more fatalities occurring, and you had losses 0.99, 0.99, 0.99 for exceedance probabilities 0.001, 0.01, 0.02. Plotting loss as y and EP as x gives you an AAL of 0.0198, whereas flipping x and y (i.e. loss is x and EP is y) gives you an AAL of zero.
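The fatality example can be reproduced with a plain trapezoid rule. This is a minimal sketch in which `trapezoid` is a hypothetical helper, and the EP=0.0 datapoint is added explicitly, as in the earlier AAL calculation.

```python
def trapezoid(x, y):
    """Plain trapezoid rule over paired x/y samples, in the order given."""
    pairs = list(zip(x, y))
    return sum((x_b - x_a) * (y_a + y_b) / 2
               for (x_a, y_a), (x_b, y_b) in zip(pairs, pairs[1:]))

eps = [0.0, 0.001, 0.01, 0.02]     # includes the extra EP=0.0 datapoint
losses = [0.99, 0.99, 0.99, 0.99]  # probability of one or more fatalities

ep_as_x = trapezoid(x=eps, y=losses)    # the RiskScape orientation
loss_as_x = trapezoid(x=losses, y=eps)  # the flipped orientation

print(ep_as_x)    # approximately 0.0198
print(loss_as_x)  # 0.0, because every slice has zero width
```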