Event-based probabilistic models

Background

An event-based probabilistic model is one where the events are treated as having an equal-probability within the model itself. The model relies on there being enough events in this dataset to model the uncertainty to arrive at a loss curve. This is sometimes described as a Monte-Carlo simulation and relies on ‘brute-force’.

Of particular importance to event-based probabilistic models is the number of notional event sets that the events belong to. Typically this is the number of time periods, say, ‘years’, being modelled and is required for calculating annualized losses. We call this the Stochastic Event Set (SES).

Model parameters

The pipeline examples on this page will assume your model defines two parameters:

$num_ses: the size of the SES, required for Annualized Average Loss calculations.
$loss_levels: the loss-levels you are interested in for reporting the Annual Exceedance Probability.

An example project.ini definition might look like the following. Here the default values for the parameters match our examples.

[model event-based]
framework = pipeline
location = pipeline.txt
# the size of our SES, i.e. 5 years
param.num_ses = 5
# we are interested in whether any given event exceeds the following loss bands or not
param.loss_levels = [100, 250, 500, 750, 1000]

Event loss table

Refer to Generating an event loss table for how to produce a total loss for each event, based on the hazard data you are using.

Stochastic Event Sets

Once you have a total loss for each event, you also need to ensure that each event is assigned to an Stochastic Event Set (SES). This essentially allocates each event to a calendar year that it occurred in.

The SES is important for calculating the Annualized Average Loss, as some calendar years may have multiple events, whereas other years may have no events.

The total number of SES, or years, that the probabilistic model spans should be passed to the pipeline via the num_ses parameter. This might be 100 years, 1000 years, etc.

The following step can be added after your event-loss table is produced, to assign an SES ID at random to each event, i.e. it adds an ses_id attribute to the event-loss data.

event_loss_table
  ->
# assign an arbitrary SES ID (i.e. year) to each event
select({
        *,
        round(random_uniform(0, $num_ses - 1)) as ses_id
})

Note

If your event data already has a SES assigned to each event, you could use this rather than assigning a new SES.

Annualized Average Loss

We have now grouped the losses by event. To calculate the Annualized Average Loss (AAL) we need to group the events by SES, i.e. the calendar year they occurred in.

Note that if no events occurred in any given SES (i.e. year), then that SES ID will not appear in the results. Therefore we have to make sure we have a zero loss present for any years that are missing.

To do this, we use a list because it makes the results slightly easier to work with. First we turn our per-year losses into a list, then we add back any missing years by adding zeroes to the list. The exact year that is missing does not matter for calculating the average, so long as we have the correct number of total years.

RiskScape supports list-based aggregation operations, such as mean() or stddev(), that we can then apply to the list.

Worked example

As a simple example, let’s say we had a 5-year SES and we had loss events for three of those years: [1100, 500, 800]. We will use the riskscape expr eval command so that you can see the working as we go.

First, we work out have many years are missing.

riskscape expr eval "5 - length([1100, 500, 800])"
2

We then need to create list items (of zero) for the two missing years.

riskscape expr eval "map(range(0, 2), x -> 0)"
[0, 0]

Then concatenate the missing years onto our list of losses.

riskscape expr eval "concat([1100, 500, 800], [0, 0])"
[1100, 500, 800, 0, 0]

Finally, we take the standard deviation (and mean) of the resulting list of numbers.

riskscape expr eval "stddev([1100, 500, 800, 0, 0])"
{mean=480.0, stddev=486.8264577855234}

Pipeline

The following pipeline code will calculate the AAL for an event-based probabilistic model.

event_loss_table
  ->
# sum the losses by their SES ID to get yearly losses
group(
    by: ses_id,
    select: {
        sum(total_loss) as per_year_loss
    })
  ->
# turn the yearly losses into a list, so it's easier to work with
group(
    select: {
        to_list(per_year_loss) as yearly_losses
    })
  ->
# work out how many SES IDs we are missing, i.e. years with no losses
select({*,
        $num_ses - length(yearly_losses) as missing_years
    })
  ->
# pad out the list with zero losses to match the total number of SES
select({
        concat(yearly_losses, map(range(0, missing_years), x -> 0)) as losses_all_years
    })
  ->
# use list-based aggregation to get the stddev (this returns the mean too)
select({
        stddev(losses_all_years) as AAL,
    })
  ->
save('average-loss', format: 'csv')

Annual Exceedance Probability

To produce a loss curve, we rank the losses from highest to lowest, then assign a probability of rank divided by number of event sets.

Worked example

Given this event loss table, from an event dataset that models 5 event sets:

event	event set	loss
1	1	$1100
2	3	$500
3	4	$600
4	4	$200

We then count the losses that exceed defined loss levels.

loss level	count
100	4
250	3
500	2
750	1
1000	1

And from that, we compute the rate of exceedance (λ) for a single time period (1).

loss level	count	calculation	λ
100	4	`4 / (5 * 1)`	`0.8`
250	3	`3 / (5 * 1)`	`0.6`
500	2	`2 / (5 * 1)`	`0.4`
750	1	`1 / (5 * 1)`	`0.2`
1000	1	`1 / (5 * 1)`	`0.2`

Assuming that the events are independent, we can use a Poisson distribution to calculate the Annual Exceedance Probability (AEP) by substituting the rate (λ) and time (T) period in to 1 - exp(-λ * T).

loss level	count	λ	calculation	AEP
100	4	`0.8`	`1 - exp(-0.8 * 1)`	0.55
250	3	`0.6`	`1 - exp(-0.6 * 1)`	0.45
500	2	`0.4`	`1 - exp(-0.4 * 1)`	0.33
750	1	`0.2`	`1 - exp(-0.2 * 1)`	0.18
1000	1	`0.2`	`1 - exp(-0.2 * 1)`	0.18

A return period, or Average Recurrence Interval (ARI), can also be calculated from the rate.

loss level	count	λ	calculation	ARI
100	4	`0.8`	`1 / 0.8`	1.25
250	3	`0.6`	`1 / 0.6`	1.66
500	2	`0.4`	`1 / 0.4`	2.5
750	1	`0.2`	`1 / 0.2`	5
1000	1	`0.2`	`1 / 0.2`	5

Pipeline

The following pipeline code will produce an AEP table for an event-based probabilistic model.

First, we match every loss-level we are interested in to every event-loss result.

We then count how many events exceed each loss-level, aggregating back to the original set of loss-levels.

We then use the rate_of_exceedance to calculate the AEP and return period.

event_loss_table
  ->
# match up event-loss to every loss-level (as a list, which we then unroll)
select({*, $loss_levels as loss_level})
  ->
unnest([loss_level])
  ->
# aggregate by loss-level across all events and count how many events exceeded the loss-level
group(
    by: loss_level,
    select: {
      loss_level,
      count(total_loss >= loss_level) as count,
      count(total_loss >= loss_level) / $num_ses as rate_of_exceedance,
   })
  ->
# convert the rate of exceedance to a probability
select({*,
        annual_exceedance_probability: 1 - exp((0.0 - rate_of_exceedance) * 1),
        return_period: 1 / rate_of_exceedance
     })
  ->
sort('loss_level')
  ->
save('exceedance-table', format: 'csv')