.. _intermediate_models:

# Going beyond simple RiskScape models

## Before we start

This tutorial is aimed at users who are familiar with building simple RiskScape models and want to explore more complex modelling concepts. We expect that you:

- Have completed _all_ the introductory tutorials on using RiskScape, that is :ref:`getting-started`, :ref:`wizard-how-to`, and :ref:`project-tutorial`.
- Have a basic understanding of geospatial data and risk analysis.
- Have a GIS application (e.g. QGIS) installed that you can use to view shapefiles.

.. note::
   Parts of this tutorial require that you have the CPython plugin enabled in your ``settings.ini`` file. Refer to :ref:`these instructions ` for more details on how to set up CPython.

The aim of this tutorial is to understand some of the intricacies of modelling polygon and line-string geometry, so that you can build accurate loss models in RiskScape.

## Getting started

### Setup

Click [here](../intermediate-modelling.zip) to download the example project we will use in this guide. Unzip the file into a new directory, e.g. `C:\RiskScape_Projects\`.

Open a command prompt and `cd` to the directory where you unzipped the files, e.g.

```none
cd C:\RiskScape_Projects\intermediate-modelling
```

You will use this command prompt to run the RiskScape commands in this tutorial.

The unzipped project contains a few sub-directories:

- `intermediate-modelling\data` contains the input data files we will use in this tutorial. This data is similar to the :ref:`Upolu tsunami data ` that we used in the previous tutorials.
- `intermediate-modelling\functions` contains the Python functions we will use in our models.

.. note::
   This input data was provided by `NIWA `_ and is based on published research papers. The data files have been adapted slightly for this tutorial.

The `intermediate-modelling` directory also contains a `project.ini` file with some pre-existing models that we will use to demonstrate different modelling techniques.

## Background

### Model workflow

The following diagram provides a detailed view of the RiskScape model workflow (click to enlarge the image).

.. image:: ../diagrams/model_framework.svg
   :target: ../_images/model_framework.svg
   :alt: Flowchart showing the different processing phases of the RiskScape model workflow

Hopefully the _phases_ of the model workflow look familiar to you by now. We can see from this diagram that there are several choices in each phase that affect how the model will behave.

.. note::
   This diagram introduces a new *Risk Analysis* phase. This phase mainly applies to probabilistic models, which currently are not supported in the wizard. For advanced users, there are documented walkthroughs that show you how to write a probabilistic model from scratch.

In this tutorial we will look more at how choices around _Spatial Sampling_ and _Consequence Analysis_ can affect the results that you get out of your model.

## Spatial sampling

As you may recall, the spatial sampling phase geospatially combines the various input data layers in the RiskScape model. For any given element-at-risk, the spatial sampling determines the intensity (if any) of the hazard that affects it, or the region that the element falls into.

In the previous :ref:`wizard-how-to` tutorial, our choices in the wizard were simple because we were only dealing with the centroid _point_ of the buildings. When we have polygon or line-string exposure-layer geometry, things become slightly more complicated. The hazard-layer may intersect the same element-at-risk multiple times, so we may get multiple hazard intensity measurements that correspond to the same building. We have to make a choice about which hazard intensity measurement we want to use, and this choice will affect the results that come out of our model.

.. note::
   The following examples will focus on spatially sampling a hazard-layer grid, or *raster* data. However, the same concepts also apply when your hazard-layer is *vector* data, for example if you are using a shapefile for your hazard-layer rather than a GeoTIFF.
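To make this concrete, here is a minimal sketch of the problem in plain Python (not RiskScape code). It assumes you have the `shapely` package installed, and the footprint coordinates and flood depths are invented purely for illustration:

```python
from shapely.geometry import Polygon, box

# an invented building footprint that spans several 10m hazard grid cells
footprint = Polygon([(2, 2), (16, 2), (16, 12), (2, 12)])

# four 10m grid cells paired with illustrative flood depths (in metres)
cells = [
    (box(0, 0, 10, 10), 0.4),
    (box(10, 0, 20, 10), 1.2),
    (box(0, 10, 10, 20), 0.0),
    (box(10, 10, 20, 20), 2.1),
]

# every grid cell the footprint touches contributes a candidate measurement
samples = [depth for cell, depth in cells if cell.intersects(footprint)]
print(samples)  # [0.4, 1.2, 0.0, 2.1] -- four measurements for one building
```

The sampling techniques described below are different strategies for choosing which of these candidate measurements the model should use.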
### Centroid sampling

The following diagram shows the grey outline of a building footprint and where it overlaps with a flood inundation grid. The blue squares represent the hazard grid, where the colour denotes a varying intensity of flood depth.

.. image:: ../diagrams/sample_centroid.svg
   :target: ../_images/sample_centroid.svg
   :width: 75%
   :alt: Here, the hazard intensity measure is taken from the exact centroid, or centre-point, of the building footprint.

In this case we have taken the hazard intensity measure at the _centroid_ of the building.

Let's try using centroid sampling in a simple model that counts the total number of buildings exposed to tsunami inundation, as well as the average and maximum inundation depth. Run the following command:

```none
riskscape model run sample-centroid
```

It will produce an `event-impact.csv` results file that should look like this:

```none
Results,Number_Exposed,Total_buildings,Max_hazard,Average_hazard
Total,1483,6260,5.786825180053711,1.7566953226270823
```

Using the centroid of the geometry is a simple, efficient, and predictable sampling technique. However, its main limitation is that it cannot detect cases where the hazard intersects part of the building but misses the centroid.

.. note::
   Centroid sampling uses the *geometric* centroid of the element-at-risk. If you have a curved line or a multi-polygon, then the geometric centroid may actually fall *outside* the polygon or line-string.
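You can see this effect for yourself with a small sketch (again assuming `shapely`; the L-shaped footprint here is an invented example of a concave shape that behaves much like the curved lines and multi-polygons the note mentions):

```python
from shapely.geometry import Polygon

# an L-shaped (concave) footprint: two rectangles joined at a corner
l_shape = Polygon([(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)])

centroid = l_shape.centroid
print(centroid)                    # POINT (1.357... 1.357...)
print(l_shape.contains(centroid))  # False -- the centroid misses the shape entirely
```

For a building like this, a centroid-sampled model would take its hazard measurement from a point that is not even inside the building.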
### Any intersection (closest)

A useful sampling approach is to detect _any_ intersection between the hazard-layer grid or geometry and the element-at-risk. For example, the following diagram shows a building footprint that intersects the hazard grid, but _not_ at the centroid.

.. image:: ../diagrams/sample_closest.svg
   :target: ../_images/sample_closest.svg
   :width: 75%
   :alt: Here, the closest available hazard intensity measure to the building's centroid is used.

In this case we have taken the hazard intensity measure that is _closest_ to the element-at-risk's centroid. We call this approach 'closest' sampling.

Your project file contains a `sample-closest` model, which is exactly the same as the `sample-centroid` model, except for the sampling technique it uses. Run the following command to see how closest sampling compares to centroid sampling:

```none
riskscape model run sample-closest
```

The `event-impact.csv` results file that it produces should look like this:

```none
Results,Number_Exposed,Total_buildings,Max_hazard,Average_hazard
Total,1568,6260,5.786825180053711,1.6731144939454234
```

You can see that more buildings are now detected as being exposed to the tsunami. Also, notice that the `Max_hazard` value hasn't changed at all, because we're still sampling from the centroid location for around 95% of the buildings.

Using the closest intersecting geometry is a simple approach and a good default to use. Its main limitation is that it picks the hazard intensity measure somewhat _arbitrarily_. For example, the hazard intensity at the building's centroid will not necessarily be the _maximum_ hazard intensity that impacts the building.

### All intersections

The final sampling technique RiskScape provides is returning _all_ the hazard intensity measurements that intersect the building. In our building footprint example, returning _all_ the hazard intensity measures would look like this:

.. image:: ../diagrams/sample_intersections.svg
   :target: ../_images/sample_intersections.svg
   :width: 75%
   :alt: Here, all available hazard intensity measures that overlap the building's footprint are used.

.. note::
   The other sampling techniques have so far all returned a *single* hazard intensity measure for each element-at-risk, whereas this approach means we potentially have *multiple* hazard intensity measures for a single building.

Let's say we are interested in the *maximum* hazard intensity measure, i.e. the highest inundation depth. The `sample-all-intersections` model uses 'all-intersections' sampling and then picks out the maximum hazard value. Try running the model using the following command:

```none
riskscape model run sample-all-intersections
```

The `event-impact.csv` results file that it produces should look like this:

```none
Results,Number_Exposed,Total_buildings,Max_hazard,Average_hazard
Total,1568,6260,7.131937503814697,1.9376521151208757
```

You may notice that this model detects the exact same number of exposed buildings as the `sample-closest` model did. However, the `Max_hazard` value is now considerably higher - this highlights that the hazard value at the building's centroid is not necessarily the worst hazard value to impact the building.

Picking the maximum hazard intensity measure is a form of _aggregation_. In RiskScape terms, we say the model has 'aggregated the hazard'.

.. note::
   This model actually applies aggregation to the hazard value twice. First, the model takes the *max* hazard intensity measurement that impacts *each* building. Then, it finds the ``Max_hazard`` and ``Average_hazard`` values across *all* buildings.

Instead of using the *maximum* inundation depth to impact each building, you could instead use the average (mean) depth, the median, or a percentile such as the 95th. This decision depends a lot on what you are modelling.

#### Examining the raw results

Using _all-intersections_ sampling introduces more complexity to the model. RiskScape can save the raw results from the _Consequence Analysis_ phase in order to 'show its workings'. Try running the `sample-all-intersections` model again using the following command:

```none
riskscape model run sample-all-intersections -p "analysis.save-raw-results=true"
```

This time it saves the raw results to a `raw-analysis.shp` output file. Open this file in QGIS.

The `raw-analysis.shp` file shows you every intersection between buildings in the exposure-layer and the hazard-layer. It includes the sampled hazard intensity for every building intersection (i.e. the `sampled` attribute), as well as the actual `hazard` value used in the final results. Only the portions of the exposed buildings that intersect the hazard are included in the shapefile.

.. note::
   Although the hazard-layer effectively has a 10m grid resolution, the pixel size in the GeoTIFF is actually 5m. So the building segments correspond to a 5m grid here, rather than a 10m grid.
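If you prefer to inspect the raw results programmatically rather than in QGIS, a short script along these lines can help. This is a sketch only: it assumes you have the `geopandas` package installed and that the shapefile landed in an `output` directory, so adjust the path to match where your model actually wrote its results:

```python
import geopandas as gpd

# load every building/hazard intersection the model recorded
# (adjust the path to wherever your output directory actually is)
raw = gpd.read_file("output/raw-analysis.shp")

print(raw.head())                             # one row per building/hazard intersection
print(raw[["sampled", "hazard"]].describe())  # per-intersection samples vs final hazard values
```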
### Buffer distance

The _closest_ and _all-intersections_ sampling techniques also allow you to specify a _buffer_ distance when spatially matching. This effectively applies a margin of error to the spatial sampling operation. It behaves as if you buffered (i.e. enlarged) the exposure-layer geometry by a given distance in metres.

For example, we can detect buildings that do not directly intersect the tsunami hazard grid, but fall within one metre of the inundation. Try this out by running the following command:

```none
riskscape model run sample-closest -p "sample.hazards-buffer=1"
```

The additional `-p` parameter here tells RiskScape to use a buffer distance of 1 metre around the building's geometry when sampling the hazard-layer. This should produce the following `event-impact.csv` results file:

```none
Results,Number_Exposed,Total_buildings,Max_hazard,Average_hazard
Total,1580,6260,5.786825180053711,1.6614069632337063
```

You can see that the model has now detected an extra 12 buildings that were within one metre of the tsunami inundation.

We can also use a buffer distance with the 'all-intersections' sampling technique. Try running the following command.

```none
riskscape model run sample-all-intersections -p "sample.hazards-buffer=1"
```

This should produce the following `event-impact.csv` results file:

```none
Results,Number_Exposed,Total_buildings,Max_hazard,Average_hazard
Total,1580,6260,7.141936302185059,1.9916315742685824
```

Again, you can see the buffering has detected an additional 12 exposed buildings. Note that the maximum and average hazard values have also increased, as we are now using a larger (buffered) building footprint to sample against the hazard-layer.

.. tip::
   You can also use a buffer when a building falls *outside* the regional boundaries in your area-layer (i.e. ``sample.areas-buffer``). In general you will want to use a small buffer distance with your hazard-layer (e.g. 1m) and a large buffer with your area-layer (e.g. 1km).

## Consequence analysis

When we use the _all-intersections_ sampling technique, each element-at-risk can have _multiple_ hazard intensity measures associated with it. However, our Event Impact Table needs a _single_ consequence for each element-at-risk. During the consequence analysis phase of the model workflow, we have a choice over how we condense multiple hazard intensities down into a single consequence.

In order to look at how this choice can influence our model's results, we are going to use the following `Samoa_Building_Fragility_DS5` function in our model:

```python
def function(building, hazard_depth):
    DS_5_Prob = 0.0
    construction = building["Cons_Frame"]
    if hazard_depth is not None and hazard_depth > 0:
        if construction in ['Masonry', 'Steel']:
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.39, 0.4)
        elif construction in ['Reinforced_Concrete', 'Reinforced Concrete']:
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.86, 0.94)
        else:
            # 'Timber' or unknown
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.1, 0.28)
    return DS_5_Prob

def log_normal_cdf(x, mean, stddev):
    # this uses the built-in RiskScape 'lognorm_cdf' function
    return functions.get('lognorm_cdf').call(x, mean, stddev)
```

This function is similar to the case study function we looked at in the :ref:`project-tutorial` tutorial. However, instead of calculating a range of damage states, this function only calculates the probability that the building will be in the worst damage state (`DS5`, or complete structural collapse).
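Note that `functions.get('lognorm_cdf')` only resolves inside RiskScape, so the snippet above is not runnable on its own. If you want to sanity-check the fragility numbers outside RiskScape, the standard log-normal CDF is easy to write in plain Python. This sketch assumes the built-in function matches the standard definition, which the CPython section later in this tutorial bears out:

```python
from math import erf, log, sqrt

def log_normal_cdf(x, mean, stddev):
    # standard log-normal CDF: Phi((ln(x) - mean) / stddev)
    return 0.5 * (1 + erf((log(x) - mean) / (stddev * sqrt(2))))

# DS5 probability for a timber building (mean=0.1, stddev=0.28) in 1m of water
print(log_normal_cdf(1.0, 0.1, 0.28))  # ~0.36
```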
### Aggregating by hazard

As we saw in the earlier `sample-all-intersections` model, one approach is to aggregate the hazard intensity measures for each building. This means we end up with a single hazard intensity (e.g. the max) that we can then pass to our Python function.

Try running the `aggregate-hazard` model using the following command:

```none
riskscape model run aggregate-hazard
```

When a building has multiple hazard intensity measures associated with it, this model uses the average, or _mean_, inundation depth for that building. The model then aggregates the results (the probability of being in `DS5`) for all exposed buildings and reports the percentiles (25, 50, and 75).

The `event-impact.csv` file that gets produced should look like this:

```none
Results,Number_Exposed,DS5_Percentile.P25,DS5_Percentile.P50,DS5_Percentile.P75
Total,1568,0.03664936501736071,0.5600526961725251,0.955723150400208
```

This shows that the bottom quartile of exposed buildings has a probability of 0.037 or less of complete structural collapse, whereas the top quartile has a probability of 0.956 or greater.

### Aggregating by consequence

When we aggregate the hazard intensity measures, we always end up with a _single_ hazard intensity, so we only call our Python function _once_ for each building. An alternative approach is to pass _every_ hazard intensity measure to our Python function and aggregate the consequences we get back. In other words, we may call our Python function _multiple_ times for the same building.

For example, if we had hazard intensity measurements of 0.8m, 1.0m, and 1.1m for a timber building, then the `DS5` consequences we get back would be 0.12, 0.36, and 0.49. RiskScape would then aggregate the consequence values, for example, to take the max (0.49) or the mean (0.32).

Try running the `aggregate-consequence` model now using the following command:

```none
riskscape model run aggregate-consequence
```

The model should produce an `event-impact.csv` file that looks like this:

```none
Results,Number_Exposed,DS5_Percentile.P25,DS5_Percentile.P50,DS5_Percentile.P75
Total,1568,0.056198939562588174,0.544483500859628,0.9462498383368769
```

Notice the results are different from our `aggregate-hazard` model. By averaging the consequence (damage state) for each building, we have dragged the bottom quartile up and the top quartile down.

### Which approach is better?

Whether the 'aggregate by hazard' or 'aggregate by consequence' approach is better for your model depends somewhat on what you are modelling. If you are taking the _maximum_ value in either case (i.e. the max hazard intensity or the max damage), then there may be no difference between the two approaches. It depends on how your Python function behaves.

For example, try running the following two commands so that the models aggregate using the _max_ value rather than the _mean_.

```none
riskscape model run aggregate-hazard -p "analysis.aggregate-hazards-function=max"
riskscape model run aggregate-consequence -p "analysis.aggregate-consequences-function=max"
```

Both the `event-impact.csv` files produced should now look identical:

```none
Results,Number_Exposed,DS5_Percentile.P25,DS5_Percentile.P50,DS5_Percentile.P75
Total,1568,0.10388536291081885,0.7276381000219061,0.9777036072350695
```

This is because our `Samoa_Building_Fragility_DS5` function is deterministic and _monotonic_: a deeper inundation never produces a lower damage probability. So it does not matter here whether we take the max hazard intensity first, or pass all the hazard intensities to the function and then take the max damage state for each building.
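You can convince yourself of this commutation with a few lines of standalone Python. This sketch uses `scipy` and mirrors the timber curve from the fragility function; it is not RiskScape code:

```python
from math import exp
from scipy.stats import lognorm

def ds5_timber(depth):
    # timber fragility curve: log-normal CDF with mean=0.1, stddev=0.28
    return lognorm(s=0.28, scale=exp(0.1)).cdf(depth)

depths = [0.8, 1.0, 1.1]
print(ds5_timber(max(depths)))             # aggregate the hazard first:  ~0.49
print(max(ds5_timber(d) for d in depths))  # aggregate the consequences:  ~0.49
```

If the function were not monotonic (say, a repair-cost curve that plateaus and then drops once a building becomes a total write-off), the two orderings could give different answers.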
.. tip::
   If you are unsure which approach to use, 'aggregate by hazard' is generally the conceptually simpler one to understand. 'Aggregate by consequence' can be useful for modelling damage to a larger element-at-risk, such as farmland or roads.

### Limitations

The 'aggregate by hazard' approach does not work well when the sampled hazard-layer contains multiple different hazard readings, such as flood depth _and_ flood velocity. In this case, the wizard model cannot tell if you are interested in the maximum _depth_ or the maximum _velocity_.

The 'aggregate by consequence' approach does not work well when you have a complex consequence, i.e. your Python function returns a struct. Again, when there are multiple attributes present in a struct, the model does not know _which_ attribute it should use for the maximum.

.. tip::
   You can work around these limitations by creating your own custom *pipeline* models. We will look at what a pipeline model is shortly.

### Scaling losses

One benefit of the 'aggregate by consequence' approach is that it lets you _scale_ the losses produced by your model based on how much of the element-at-risk was exposed to the hazard.

Generally, your Python function will calculate a loss for the _whole_ element-at-risk. However, for a large element-at-risk, such as farmland or a road, not all of it may have been exposed to the hazard.

For example, say we have farmland that was _partially_ exposed to tsunami inundation. Our Python function can calculate a loss based on the hazard intensity and the total cost of the farmland, but it does not know the _total area_ of the farmland that was damaged. Our Python function just has to assume that the whole farmland was damaged to the same extent.

Let's say that for a single farm we had the following hazard intensities and our Python code calculated the following losses.

| hazard intensity | loss    | exposed_ratio |
| ---------------- | ------- | ------------- |
| 1.0              | $100000 | 0.25          |
| 2.0              | $200000 | 0.1           |

We could use 'aggregate by consequence' and take the _mean_ loss, and end up with a $150,000 loss for that farm. However, only 35% of the total farmland was exposed to the hazard. Instead, RiskScape could scale the losses for us, based on the area exposed, e.g.

| hazard intensity | loss    | exposed_ratio | scaled loss |
| ---------------- | ------- | ------------- | ----------- |
| 1.0              | $100000 | 0.25          | $25000      |
| 2.0              | $200000 | 0.1           | $20000      |

Based on that, we could then take the _total_ (sum) loss of $45,000 for the farm.
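The arithmetic behind those tables is simple enough to sketch in a few lines of Python (using the illustrative numbers from the example above, not a RiskScape API):

```python
losses = [100_000, 200_000]
exposed_ratios = [0.25, 0.10]

# unscaled mean treats the whole farm as damaged at the average loss
mean_loss = sum(losses) / len(losses)  # 150000

# scaling each loss by its exposed area ratio before summing
scaled_losses = [loss * ratio for loss, ratio in zip(losses, exposed_ratios)]
total_loss = sum(scaled_losses)        # 25000 + 20000 = 45000
```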
.. tip::
   An alternative way to handle this situation can be to cut the element-at-risk into smaller pieces *before* your model runs. RiskScape has *geoprocessing* features in the wizard to do this, which we will learn about in a later tutorial.

## CPython

RiskScape supports two different Python implementations: :ref:`Jython and CPython `.

.. tip::
   CPython is what most people consider regular Python. If you have used Python before on your computer, then you were most likely using CPython.

Jython comes built-in as part of the RiskScape software, so it gets used by default. Alternatively, RiskScape can execute your model function using CPython (i.e. regular Python), but this requires that Python is already installed on your system.

The main difference between Jython and CPython is how they import other Python packages. We have been mostly dealing with simple Python functions in our examples, so this hasn't mattered too much. However, the `Samoa_Building_Fragility_DS5` function used in our `aggregate-hazard` model calculates the [log-normal CDF](https://en.wikipedia.org/wiki/Log-normal_distribution) using a Jython-specific approach.

Instead, let's try using the following `CPython_Building_Fragility_DS5` function, which uses the `scipy` Python package to calculate the log-normal CDF.

```python
from scipy.stats import lognorm
import math

def function(building, hazard_depth):
    DS_5_Prob = 0.0
    construction = building["Cons_Frame"]
    if hazard_depth is not None and hazard_depth > 0:
        if construction in ['Masonry', 'Steel']:
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.39, 0.4)
        elif construction in ['Reinforced_Concrete', 'Reinforced Concrete']:
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.86, 0.94)
        else:
            # 'Timber' or unknown
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.1, 0.28)
    return DS_5_Prob

def log_normal_cdf(x, mean, stddev):
    return lognorm(s=stddev, scale=math.exp(mean)).cdf(x)
```

.. note::
   In order to run this next example, you need to have set up RiskScape to use CPython (as per :ref:`these instructions `), and you need the ``scipy`` Python package installed. Alternatively, you could skip this section if you are not interested in using CPython.

Now, try using the `CPython_Building_Fragility_DS5` function in a model using the following command:

```none
riskscape model run aggregate-hazard -p "analysis.function=CPython_Building_Fragility_DS5"
```

The `event-impact.csv` file produced should contain the exact same results as when you previously ran the model using Jython:

```none
Results,Number_Exposed,DS5_Percentile.P25,DS5_Percentile.P50,DS5_Percentile.P75
Total,1568,0.03664936501736079,0.560052696172525,0.9557231504002079
```

This is because the log-normal CDF calculation is exactly the same. The only difference is that this time we used the `scipy` Python package to calculate it, whereas previously we used the built-in `lognorm_cdf` RiskScape function.

.. tip::
   Using CPython can make it easier to test your Python code, as you can execute your Python function directly, outside of the RiskScape model, by using the ``if __name__ == '__main__':`` `Python idiom `_.
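For example, a minimal `if __name__ == '__main__':` block appended to the fragility function's file might look like this (a sketch only; the version shipped in `functions/CPython_Building_Fragility_DS5.py` may differ):

```python
if __name__ == '__main__':
    # exercise the fragility function directly, outside of RiskScape,
    # e.g. with: python CPython_Building_Fragility_DS5.py
    building = {"Cons_Frame": "Timber"}
    for depth in [0.8, 1.0, 1.1]:
        print(depth, function(building, depth))
```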
### Being explicit about the Python implementation

If you plan on sharing your RiskScape Python functions with others, it pays to be explicit if you know that the function will only work with CPython, or only work with Jython. If you look at the `project.ini` file you can see we explicitly use `framework = jython` and `framework = cpython` when we define the functions.

```ini
[function Samoa_Building_Fragility_DS5]
description = determines the probability of a building being in Damage State 5 (complete structural collapse)
location = functions/Samoa_Building_Fragility_DS5.py
argument-types = [ building: struct(Cons_Frame: text), hazard_depth: nullable(floating) ]
return-type = floating
framework = jython

[function CPython_Building_Fragility_DS5]
description = Same as Samoa_Building_Fragility_DS5, but in CPython
location = functions/CPython_Building_Fragility_DS5.py
argument-types = [ building: struct(Cons_Frame: text), hazard_depth: nullable(floating) ]
return-type = floating
framework = cpython
```

.. tip::
   If your Python function is generic and will work in either Jython or CPython, then you do not have to worry about defining the Python ``framework``.

.. _model_batch:

## Running the same model repeatedly

As you will know by now, you can use the `-p` CLI option (i.e. `--parameter`) to make a one-off change to a model's parameter. However, it can be cumbersome to use this approach to vary the same parameter lots of times, for example if you have *dozens* of hazard-layers that you want to run through your model.

Fortunately, there is a `riskscape model batch` command that lets us run the same model repeatedly, while varying a single parameter. Instead of using the `-p PARAMETER_NAME=NEW_VALUE` syntax, with this command we use `--vary-parameter=PARAMETER_NAME --vary-input=NEW_VALUES`.

For example, to vary the hazard-layer input data for the model you would use `--vary-parameter=input-hazards.layer`. For `--vary-input`, you could then specify a directory containing the hazard files. This could include a *wildcard* pattern, such as `data/*.tif`, which will match all GeoTIFF files in the `data` directory.

There are four different GeoTIFF files in this project's `data` sub-directory. Try entering the following command to run each GeoTIFF through the `aggregate-hazard` model.

```none
riskscape model batch aggregate-hazard --vary-parameter=input-hazards.layer --vary-input=data/*.tif --output=output/{}
```

This will run the `aggregate-hazard` model *four* times, once for each GeoTIFF file. It will produce four `event-impact.csv` results files, which should look something like this:

```none
file:/C:/RiskScape_Projects/intermediate-modelling/output/Hossen_A/event-impact.csv
file:/C:/RiskScape_Projects/intermediate-modelling/output/Lay_C/event-impact.csv
file:/C:/RiskScape_Projects/intermediate-modelling/output/MaxEnv_All_Scenarios_10m/event-impact.csv
file:/C:/RiskScape_Projects/intermediate-modelling/output/MaxEnv_All_Scenarios_50m/event-impact.csv
```

Take a look at each file and see what difference the hazard-layer has made to the number of exposed buildings and the resulting damage state.

.. note::
   We have specified an ``--output`` directory in this example. The ``{}`` corresponds to the input value the model is using, i.e. the name of the GeoTIFF file. This means that each ``event-impact.csv`` results file gets written to a separate directory, and the directory name will correspond to the hazard-layer used.

As well as varying input files, you can also vary other parameters for your model. In this case, you would put the values to vary in a text file, with each value on its own line. For example, `buffers.txt` contains the following values:

```none
1
2
5
10
```

Try using these metre distance values for the `sample.hazards-buffer` parameter by entering the following command:

```none
riskscape model batch aggregate-hazard --vary-parameter=sample.hazards-buffer --vary-input=buffers.txt --output=output/buffer-{}m
```

Take a look at the `event-impact.csv` files produced and see what happens to `Number_Exposed` as the buffer distance increases.

.. tip::
   If you want to run RiskScape models repeatedly, but the ``riskscape model batch`` command does not quite do what you would like, then you could also try :ref:`shell_scripting`.
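After a batch run like this, it can be handy to compare all the results side by side. A small script along these lines would do it (a sketch only, assuming you have the `pandas` package installed and the `output/` layout produced by the commands above):

```python
from pathlib import Path

import pandas as pd

# collect each batch run's results into one table, keyed by scenario name
frames = []
for csv in sorted(Path("output").glob("*/event-impact.csv")):
    df = pd.read_csv(csv)
    df["scenario"] = csv.parent.name  # e.g. 'Hossen_A', 'Lay_C', ...
    frames.append(df)

print(pd.concat(frames, ignore_index=True))
```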
## Pipelines

RiskScape is primarily a geospatial data processing engine. The wizard models we have been using provide a standard framework that organizes that data processing for us. The underlying data processing is called a RiskScape _pipeline_. A pipeline is simply a series of data-processing steps that are chained together. These steps are described in a special RiskScape pipeline language, which is based around simple _expressions_.

When RiskScape runs a wizard model, it first converts the wizard answers into pipeline code, and then executes the pipeline. For example, the `aggregate-hazard` model results in the following pipeline code:

```none
input(relation: 'data/Buildings_SE_Upolu.shp', name: 'exposure') as exposures_input
 -> select({exposure}) as exposures
 -> join(on: true) as exposures_join_hazards
 -> select({*, sample(geometry: exposure, coverage: hazard) as hazard_sampled}) as "sample_hazard_layer"
 -> select({*, map(hazard_sampled, h -> h.sampled) as hazard})
 -> select({*, map(hazard_sampled, geom -> measure(geom) / measure(exposure)) as exposed_ratio})
 -> select({*}) as sampled
 -> select({*, hazard: mean(hazard)})
 -> select({*, consequence: map(hazard, hv -> Samoa_Building_Fragility_DS5(exposure, hv))})
 -> select({*}) as raw_results
 -> select({*}) as event_impact_table

input(value: bookmark('data/MaxEnv_All_Scenarios_10m.tif'), name: 'hazard') as hazards_input
 -> select({hazard as hazard}) as hazards
 -> exposures_join_hazards.rhs

event_impact_table
 -> select({*}) as "report_event-impact"
 -> filter(hazard > 0)
 -> group(by: {'Total' as Results}, select: {Results, count(exposure) as Number_Exposed, percentiles(consequence, [25, 50, 75]) as DS5_Percentile})
 -> save(name: 'event-impact') as "save_event-impact"
```

Advanced users can execute their own pipeline-based models directly. You can try this now by running the `aggregate-hazard` model's pipeline code using the following command:

```none
riskscape pipeline evaluate pipeline.txt
```

This behaves the same as if you ran the `aggregate-hazard` model, and it produces the exact same `event-impact.csv` results:

```none
Results,Number_Exposed,DS5_Percentile.P25,DS5_Percentile.P50,DS5_Percentile.P75
Total,1568,0.03664936501736071,0.5600526961725251,0.955723150400208
```

.. note::
   Every time you run a model, RiskScape records the pipeline code it used as a ``pipeline.txt`` file in the output directory. This gives you an alternative way of re-running the exact same pipeline again in the future.

Pipelines give you much finer-grained control over your model than the wizard does. For example, if the wizard does not support _exactly_ what you want to do, you could take the pipeline code produced by the wizard and adapt it to suit your situation. There are advanced tutorials that cover writing RiskScape pipelines in more detail.

.. tip::
   The ``Ctrl + c`` menu in the wizard also lets you see how the pipeline code gets built up. As you go through the wizard, you can see how the pipeline changes after you answer each question.

## Recap

Let's review some of the key points we have covered so far:

- _Centroid_ sampling is efficient and simple. However, it does not always detect when a hazard intersects an element-at-risk. Also, for complex geometry, the geometric centroid may actually fall outside the shape.
- _Closest_ sampling is simple and a good default. However, it does not necessarily return the _worst_ hazard intensity measure, which may matter if accuracy is important to your model.
- _All-intersections_ sampling will reliably find _all_ hazard intensity measures that impact the element-at-risk. However, distilling multiple hazard intensity measures down into a single consequence can add complexity to your model.
- Closest and all-intersections sampling also support a buffer distance, to detect hazards that _almost_ intersect your element-at-risk.
- When you use all-intersections sampling, you have the choice of _aggregating the hazard_ or _aggregating the consequence_ for each element-at-risk. Which approach is better will depend somewhat on what you are modelling.
- When you use the 'aggregate by consequence' approach, you can _scale_ the loss produced based on the area exposed to the hazard.
- RiskScape supports two Python implementations: Jython (the default) and CPython. The difference only starts to matter when you want to import packages and do more complicated things in your functions.
- RiskScape uses what is called a _pipeline_ to underpin the model processing. Pipelines give you much more flexibility over your modelling, but they are much harder to create.

## Extra for experts

If you want to explore the concepts we have covered a little further, you could try the following exercises on your own.

- Try running the `sample-all-intersections` model using the average, or _mean_, hazard intensity rather than the max, i.e. use `-p "analysis.aggregate-hazards-function=mean"`. See what difference this makes to the model's results.
- Try running `riskscape wizard` and build your own model that uses all-intersections sampling and aggregates by hazard. Select the option to save the raw analysis. See how the `raw_analysis.shp` output compares to the `event_impact_table.shp` produced by the model. For now, you could skip the reporting step and just output the Event Impact Table as is.
- Try running `riskscape wizard` again, but this time use aggregate by consequence. Select the option to save the raw analysis. See how the `raw_analysis.shp` output compares to the `event_impact_table.shp` produced by the model.
- Look at the `functions/CPython_Building_Fragility_DS5.py` file in more detail. Notice that the Python file also contains an `if __name__ == '__main__':` block. Try executing the file manually with Python and see what results you get. This approach lets you easily test and debug your Python code _outside_ of RiskScape.
- In the `project.ini` file, try commenting out the `framework = jython` line for the `Samoa_Building_Fragility_DS5` function. Try running the `aggregate-hazard` model again. What sort of error do you get?
- Look in the output directory of one of your models for the `pipeline.txt` file. Try using the `riskscape pipeline evaluate` command to re-run the raw pipeline code in the `pipeline.txt` file. _Tip_: you will need to run this command from the same directory as your `project.ini` file.