.. _event_based_probabilistic:

# Event-based probabilistic models

## Background

An event-based probabilistic model is one where the events are treated as having
an equal-probability within the model itself. The model relies on there being enough events
in this dataset to model the uncertainty to arrive at a loss curve.  This is sometimes described
as a Monte-Carlo simulation and relies on 'brute-force'.

Of particular importance to event-based probabilistic models is the number of notional event sets that the events belong to.  Typically this
is the number of time periods, say, 'years', being modelled and is required for calculating annualized losses.
We call this the Stochastic Event Set (SES).

### Model parameters

The pipeline examples on this page will assume your model defines two parameters: 
- `$num_ses`: the size of the SES, required for Annualized Average Loss calculations.
- `$loss_levels`: the loss-levels you are interested in for reporting the Annual Exceedance Probability.

An example `project.ini` definition might look like the following.
Here the default values for the parameters match our examples.

```ini

[model event-based]
framework = pipeline
location = pipeline.txt
# the size of our SES, i.e. 5 years
param.num_ses = 5
# we are interested in whether any given event exceeds the following loss bands or not 
param.loss_levels = [100, 250, 500, 750, 1000]
```

## Event loss table

Refer to :ref:`probabilistic_event_loss` for how to produce a total loss for each event,
based on the hazard data you are using.

### Stochastic Event Sets

Once you have a total loss for each event, you also need to ensure that each event
is assigned to an Stochastic Event Set (SES).
This essentially allocates each event to a calendar year that it occurred in.

The SES is important for calculating the Annualized Average Loss, as some calendar years
may have multiple events, whereas other years may have no events.

The total number of SES, or years, that the probabilistic model spans should be passed to the
pipeline via the `num_ses` parameter. This might be 100 years, 1000 years, etc.

The following step can be added after your event-loss table is produced,
to assign an SES ID at random to each event, i.e. it adds an `ses_id` attribute to the event-loss data.

```none
event_loss_table
  ->
# assign an arbitrary SES ID (i.e. year) to each event
select({
        *,
        round(random_uniform(0, $num_ses - 1)) as ses_id
})
```

.. note::
    If your event data already has a SES assigned to each event, you could use this rather
    than assigning a new SES. 

## Annualized Average Loss

We have now grouped the losses by event. 
To calculate the Annualized Average Loss (AAL) we need to group the events by SES, i.e. the calendar year they occurred in.

Note that if _no_ events occurred in any given SES (i.e. year), then that SES ID will not appear in the results.
Therefore we have to make sure we have a _zero_ loss present for any years that are missing.

To do this, we use a list because it makes the results slightly easier to work with.
First we turn our per-year losses into a list, then we add back any missing years by adding zeroes to the list.
The _exact_ year that is missing does not matter for calculating the average, so long as we have the correct number of total years.

RiskScape supports list-based aggregation operations, such as `mean()` or `stddev()`, that we can then apply to the list.

### Worked example

As a simple example, let's say we had a 5-year SES and we had loss events for three of those years: `[1100, 500, 800]`.
We will use the `riskscape expr eval` command so that you can see the working as we go.

First, we work out have many years are missing.

```none
riskscape expr eval "5 - length([1100, 500, 800])"
2
```

We then need to create list items (of zero) for the two missing years.

```none
riskscape expr eval "map(range(0, 2), x -> 0)"
[0, 0]
```

Then concatenate the missing years onto our list of losses.

```none
riskscape expr eval "concat([1100, 500, 800], [0, 0])"
[1100, 500, 800, 0, 0]
```

Finally, we take the standard deviation (and mean) of the resulting list of numbers.

```none
riskscape expr eval "stddev([1100, 500, 800, 0, 0])"
{mean=480.0, stddev=486.8264577855234}
```

### Pipeline

The following pipeline code will calculate the AAL for an event-based probabilistic model.

```none
event_loss_table
  ->
# sum the losses by their SES ID to get yearly losses
group(
    by: ses_id,
    select: {
        sum(total_loss) as per_year_loss
    })
  ->
# turn the yearly losses into a list, so it's easier to work with
group(
    select: {
        to_list(per_year_loss) as yearly_losses
    })
  ->
# work out how many SES IDs we are missing, i.e. years with no losses
select({*,
        $num_ses - length(yearly_losses) as missing_years
    })
  ->
# pad out the list with zero losses to match the total number of SES
select({
        concat(yearly_losses, map(range(0, missing_years), x -> 0)) as losses_all_years
    })
  ->
# use list-based aggregation to get the stddev (this returns the mean too)
select({
        stddev(losses_all_years) as AAL,
    })
  ->
save('average-loss', format: 'csv')
```

.. _event_based_AEP:

## Annual Exceedance Probability

To produce a loss curve, we rank the losses from highest to lowest,
then assign a probability of rank divided by number of event sets.

### Worked example

Given this event loss table, from an event dataset that models 5 event sets:

| event | event set | loss  |
| ----- | --------- | ----  |
| 1     | 1         | $1100 |
| 2     | 3         | $500  |
| 3     | 4         | $600  |
| 4     | 4         | $200  |

We then count the losses that exceed defined loss levels.

| loss level | count |
| ---------- | ----- |
| 100        | 4     |
| 250        | 3     |
| 500        | 2     |
| 750        | 1     |
| 1000       | 1     |

And from that, we compute the rate of exceedance (λ) for a single time period (1).

| loss level | count | calculation         | λ       |
| ---------- | ----- | ------------------- | ------- |
| 100        | 4     | `4 / (5 * 1)`       | `0.8`   |
| 250        | 3     | `3 / (5 * 1)`       | `0.6`   |
| 500        | 2     | `2 / (5 * 1)`       | `0.4`   |
| 750        | 1     | `1 / (5 * 1)`       | `0.2`   |
| 1000       | 1     | `1 / (5 * 1)`       | `0.2`   |

Assuming that the events are independent,
we can use a Poisson distribution to calculate the Annual Exceedance Probability (AEP) by substituting
the rate (λ) and time (T) period in to  `1 - exp(-λ * T)`.

| loss level | count | λ       | calculation          | AEP  |
| ---------- | ----- | ------- | -------------------- | ---  |
| 100        | 4     | `0.8`   | `1 - exp(-0.8 * 1)`  | 0.55 |
| 250        | 3     | `0.6`   | `1 - exp(-0.6 * 1)`  | 0.45 |
| 500        | 2     | `0.4`   | `1 - exp(-0.4 * 1)`  | 0.33 |
| 750        | 1     | `0.2`   | `1 - exp(-0.2 * 1)`  | 0.18 |
| 1000       | 1     | `0.2`   | `1 - exp(-0.2 * 1)`  | 0.18 |

A return period, or Average Recurrence Interval (ARI), can also be calculated from the rate.

| loss level | count | λ       | calculation          | ARI  |
| ---------- | ----- | ------- | -------------------- | ---  |
| 100        | 4     | `0.8`   | `1 / 0.8`            | 1.25 |
| 250        | 3     | `0.6`   | `1 / 0.6`            | 1.66 |
| 500        | 2     | `0.4`   | `1 / 0.4`            | 2.5  |
| 750        | 1     | `0.2`   | `1 / 0.2`            | 5    |
| 1000       | 1     | `0.2`   | `1 / 0.2`            | 5    |

### Pipeline

The following pipeline code will produce an AEP table for an event-based probabilistic model.

First, we match every loss-level we are interested in to every event-loss result.

We then count how many events exceed each loss-level, aggregating back to the original set of loss-levels.

We then use the `rate_of_exceedance` to calculate the AEP and return period.

```none
event_loss_table
  -> 
# match up event-loss to every loss-level (as a list, which we then unroll)
select({*, $loss_levels as loss_level})
  ->
unnest([loss_level])
  ->
# aggregate by loss-level across all events and count how many events exceeded the loss-level
group(
    by: loss_level,
    select: {
      loss_level,
      count(total_loss >= loss_level) as count,
      count(total_loss >= loss_level) / $num_ses as rate_of_exceedance,
   })
  ->
# convert the rate of exceedance to a probability
select({*,
        annual_exceedance_probability: 1 - exp((0.0 - rate_of_exceedance) * 1),
        return_period: 1 / rate_of_exceedance
     })
  ->
sort('loss_level')
  ->
save('exceedance-table', format: 'csv')
```