.. _event_based_probabilistic: # Event-based probabilistic models ## Background An event-based probabilistic model is one where the events are treated as having an equal-probability within the model itself. The model relies on there being enough events in this dataset to model the uncertainty to arrive at a loss curve. This is sometimes described as a Monte-Carlo simulation and relies on 'brute-force'. Of particular importance to event-based probabilistic models is the number of notional event sets that the events belong to. Typically this is the number of time periods, say, 'years', being modelled and is required for calculating annualized losses. We call this the Stochastic Event Set (SES). ### Model parameters The pipeline examples on this page will assume your model defines two parameters: - `$num_ses`: the size of the SES, required for Annualized Average Loss calculations. - `$loss_levels`: the loss-levels you are interested in for reporting the Annual Exceedance Probability. An example `project.ini` definition might look like the following. Here the default values for the parameters match our examples. ```ini [model event-based] framework = pipeline location = pipeline.txt # the size of our SES, i.e. 5 years param.num_ses = 5 # we are interested in whether any given event exceeds the following loss bands or not param.loss_levels = [100, 250, 500, 750, 1000] ``` ## Event loss table Refer to :ref:`probabilistic_event_loss` for how to produce a total loss for each event, based on the hazard data you are using. ### Stochastic Event Sets Once you have a total loss for each event, you also need to ensure that each event is assigned to an Stochastic Event Set (SES). This essentially allocates each event to a calendar year that it occurred in. The SES is important for calculating the Annualized Average Loss, as some calendar years may have multiple events, whereas other years may have no events. The total number of SES, or years, that the probabilistic model spans should be passed to the pipeline via the `num_ses` parameter. This might be 100 years, 1000 years, etc. The following step can be added after your event-loss table is produced, to assign an SES ID at random to each event, i.e. it adds an `ses_id` attribute to the event-loss data. ```none event_loss_table -> # assign an arbitrary SES ID (i.e. year) to each event select({ *, round(random_uniform(0, $num_ses - 1)) as ses_id }) ``` .. note:: If your event data already has a SES assigned to each event, you could use this rather than assigning a new SES. ## Annualized Average Loss We have now grouped the losses by event. To calculate the Annualized Average Loss (AAL) we need to group the events by SES, i.e. the calendar year they occurred in. Note that if _no_ events occurred in any given SES (i.e. year), then that SES ID will not appear in the results. Therefore we have to make sure we have a _zero_ loss present for any years that are missing. To do this, we use a list because it makes the results slightly easier to work with. First we turn our per-year losses into a list, then we add back any missing years by adding zeroes to the list. The _exact_ year that is missing does not matter for calculating the average, so long as we have the correct number of total years. RiskScape supports list-based aggregation operations, such as `mean()` or `stddev()`, that we can then apply to the list. ### Worked example As a simple example, let's say we had a 5-year SES and we had loss events for three of those years: `[1100, 500, 800]`. We will use the `riskscape expr eval` command so that you can see the working as we go. First, we work out have many years are missing. ```none riskscape expr eval "5 - length([1100, 500, 800])" 2 ``` We then need to create list items (of zero) for the two missing years. ```none riskscape expr eval "map(range(0, 2), x -> 0)" [0, 0] ``` Then concatenate the missing years onto our list of losses. ```none riskscape expr eval "concat([1100, 500, 800], [0, 0])" [1100, 500, 800, 0, 0] ``` Finally, we take the standard deviation (and mean) of the resulting list of numbers. ```none riskscape expr eval "stddev([1100, 500, 800, 0, 0])" {mean=480.0, stddev=486.8264577855234} ``` ### Pipeline The following pipeline code will calculate the AAL for an event-based probabilistic model. ```none event_loss_table -> # sum the losses by their SES ID to get yearly losses group( by: ses_id, select: { sum(total_loss) as per_year_loss }) -> # turn the yearly losses into a list, so it's easier to work with group( select: { to_list(per_year_loss) as yearly_losses }) -> # work out how many SES IDs we are missing, i.e. years with no losses select({*, $num_ses - length(yearly_losses) as missing_years }) -> # pad out the list with zero losses to match the total number of SES select({ concat(yearly_losses, map(range(0, missing_years), x -> 0)) as losses_all_years }) -> # use list-based aggregation to get the stddev (this returns the mean too) select({ stddev(losses_all_years) as AAL, }) -> save('average-loss', format: 'csv') ``` .. _event_based_AEP: ## Annual Exceedance Probability To produce a loss curve, we rank the losses from highest to lowest, then assign a probability of rank divided by number of event sets. ### Worked example Given this event loss table, from an event dataset that models 5 event sets: | event | event set | loss | | ----- | --------- | ---- | | 1 | 1 | $1100 | | 2 | 3 | $500 | | 3 | 4 | $600 | | 4 | 4 | $200 | We then count the losses that exceed defined loss levels. | loss level | count | | ---------- | ----- | | 100 | 4 | | 250 | 3 | | 500 | 2 | | 750 | 1 | | 1000 | 1 | And from that, we compute the rate of exceedance (λ) for a single time period (1). | loss level | count | calculation | λ | | ---------- | ----- | ------------------- | ------- | | 100 | 4 | `4 / (5 * 1)` | `0.8` | | 250 | 3 | `3 / (5 * 1)` | `0.6` | | 500 | 2 | `2 / (5 * 1)` | `0.4` | | 750 | 1 | `1 / (5 * 1)` | `0.2` | | 1000 | 1 | `1 / (5 * 1)` | `0.2` | Assuming that the events are independent, we can use a Poisson distribution to calculate the Annual Exceedance Probability (AEP) by substituting the rate (λ) and time (T) period in to `1 - exp(-λ * T)`. | loss level | count | λ | calculation | AEP | | ---------- | ----- | ------- | -------------------- | --- | | 100 | 4 | `0.8` | `1 - exp(-0.8 * 1)` | 0.55 | | 250 | 3 | `0.6` | `1 - exp(-0.6 * 1)` | 0.45 | | 500 | 2 | `0.4` | `1 - exp(-0.4 * 1)` | 0.33 | | 750 | 1 | `0.2` | `1 - exp(-0.2 * 1)` | 0.18 | | 1000 | 1 | `0.2` | `1 - exp(-0.2 * 1)` | 0.18 | A return period, or Average Recurrence Interval (ARI), can also be calculated from the rate. | loss level | count | λ | calculation | ARI | | ---------- | ----- | ------- | -------------------- | --- | | 100 | 4 | `0.8` | `1 / 0.8` | 1.25 | | 250 | 3 | `0.6` | `1 / 0.6` | 1.66 | | 500 | 2 | `0.4` | `1 / 0.4` | 2.5 | | 750 | 1 | `0.2` | `1 / 0.2` | 5 | | 1000 | 1 | `0.2` | `1 / 0.2` | 5 | ### Pipeline The following pipeline code will produce an AEP table for an event-based probabilistic model. First, we match every loss-level we are interested in to every event-loss result. We then count how many events exceed each loss-level, aggregating back to the original set of loss-levels. We then use the `rate_of_exceedance` to calculate the AEP and return period. ```none event_loss_table -> # match up event-loss to every loss-level (as a list, which we then unroll) select({*, $loss_levels as loss_level}) -> unnest([loss_level]) -> # aggregate by loss-level across all events and count how many events exceeded the loss-level group( by: loss_level, select: { loss_level, count(total_loss >= loss_level) as count, count(total_loss >= loss_level) / $num_ses as rate_of_exceedance, }) -> # convert the rate of exceedance to a probability select({*, annual_exceedance_probability: 1 - exp((0.0 - rate_of_exceedance) * 1), return_period: 1 / rate_of_exceedance }) -> sort('loss_level') -> save('exceedance-table', format: 'csv') ```