Going beyond simple RiskScape models

Before we start

This tutorial is aimed at users who are familiar with building simple RiskScape models and want to explore more complex modelling concepts. We expect that you:


Parts of this tutorial require that you have the CPython plugin enabled in your settings.ini file. Refer to these instructions for more details on how to setup CPython.

The aim of this tutorial is to understand some of the intricacies of modelling polygon and line-string geometry, so that you can build accurate loss models in RiskScape.

Getting started


Click here to download the example project we will use in this guide. Unzip the file into a new directory, e.g. C:\RiskScape_Projects\.

Open a command prompt and cd to the directory where you unzipped the files, e.g.

cd C:\RiskScape_Projects\intermediate-modelling

You will use this command prompt to run the RiskScape commands in this tutorial.

The unzipped project contains a few sub-directories:

  • project-tutorial\data contains the input data files we will use in this tutorial. This data is similar to the Upolu tsunami data that we used in the previous tutorials.

  • project-tutorial\functions contains the Python functions we will use in our models.


This input data was provided by NIWA and is based on published research papers. The data files have been adapted slightly for this tutorial.

The intermediate-modelling directory also contains a project.ini file that contains some pre-existing models we will use to demonstrate different modelling techniques.


Model workflow

The following diagram provides a detailed view of the RiskScape model workflow (click to enlarge the image).

Flowchart showing the different processing phases of the RiskScape model workflow

Hopefully the phases of the model workflow look familiar to you by now. We can see from this diagram that there are several choices in each phase that affect how the model will behave.


This diagram introduces a new Risk Analysis phase. This phase mainly applies to probabilistic models, which currently are not supported in the wizard. For advanced users, there are documented walkthroughs that show you how to write a probabilistic model from scratch.

In this tutorial we will look more at how choices around Spatial Sampling and Consequence Analysis can affect the results that you get out of your model.

Spatial sampling

As you may recall, the spatial sampling phase geospatially combines together the various input data layers in the RiskScape model. For any given element-at-risk, the spatial sampling determines the intensity (if any) of the hazard that affects it, or the region that the building falls into.

In the previous How to build RiskScape models tutorial, our choices in the wizard were simple because we were only dealing with the centroid point of the buildings.

When have polygon or line-string exposure-layer geometry, things become slightly more complicated. The hazard-layer may intersect the same element-at-risk multiple times, so we may get multiple hazard intensity measurements that correspond to the same building. We have to make a choice about what hazard intensity measurement we want to use, and this choice will affect the results that come out of our model.


The following examples will focus on spatially sampling a hazard-layer grid, or raster data. But the same concepts also apply when your hazard-layer is vector data, for example if you are using a shapefile for your hazard-layer rather than a GeoTIFF.

Centroid sampling

The following diagram shows the grey outline of a building footprint and where it overlaps with a flood inundation grid. The blue squares represent the hazard grid, where the colour denotes a varying intensity of flood depth.

Here, the hazard intensity measure is taken from the exact centroid, or centre-point, of the building footprint.

In this case we have taken the hazard intensity measure at the centroid of the building.

Let’s try using centroid sampling in a simple model that counts the total number of buildings exposed to tsunami inundation, as well as the average and maximum inundation depth. Run the following command:

riskscape model run sample-centroid

It will produce an event-impact.csv results file that should look like this:


Using the centroid of the geometry is a simple, efficient, and predictable sampling technique. However, its main limitation is it does not detect when a hazard partially intersects a building, but does not intersect at the centroid.


Centroid sampling uses the geometric centroid of the element-at-risk. If you have a curved line or a multi-polygon, then the geometric centroid may actually fall outside the polygon or line-string.

Any intersection (closest)

A useful sampling approach is to detect any intersection between the hazard-layer grid or geometry and the element-at-risk.

For example, the following diagram shows a building footprint that intersects the hazard grid, but not at the centroid.

Here, the closest available hazard intensity measure to the building's centroid is used.

In this case we have taken the hazard intensity measure that is closest to the element-at-risk’s centroid. We call this approach ‘closest’ sampling.

Your project file contains a sample-closest model, which is exactly the same as the sample-centroid model, except for the sampling technique it uses. Run the following command to see how closest sampling compares to centroid sampling:

riskscape model run sample-closest

The event-impact.csv results file that it produces should look like this:


You can see that more buildings are now detected as being exposed to the tsunami, Also, notice that the Max_hazard value hasn’t changed at all, because we’re still sampling from the centroid location for 95% of the buildings.

Using the closest intersecting geometry is a simple approach and a good default to use. The main limitation with this approach is that it is somewhat arbitrarily picking the hazard intensity measure. For example, the hazard intensity at the building’s centroid will not necessarily be the maximum hazard intensity that impacts the building.

All intersections

The final sampling technique RiskScape provides is returning all the hazard intensity measurements that intersect the building. In our building footprint example, returning all the hazard intensity measures would look like this:

Here, all available hazard intensity measures that overlap the building's footprint are used.


The other sampling techniques have so far all returned a single hazard intensity measure for each element-at-risk. Whereas this approach now means we potentially have multiple hazard intensity measures for a single building.

Let’s say we are interested in the maximum hazard intensity measure, i.e. the highest inundation depth. The sample-all-intersections model uses ‘all-intersections’ sampling and then picks out the maximum hazard value. Try running the model using the following command:

riskscape model run sample-all-intersections

The event-impact.csv results file that it produces should look like this:


You may notice that this model detects the exact same number of exposed buildings as the sample-closest model did. However, now the Max_hazard value is considerably higher - this highlights that the hazard value at the building’s centroid is not necessarily the worst hazard value to impact the building.

Picking the maximum hazard intensity measure is a form of aggregation. In RiskScape terms, we say the model has ‘aggregated the hazard’.


This model actually applies aggregation to the hazard value twice. First, the model takes the max hazard intensity measurement that impacts each building. Then, it finds the Max_hazard and Average_hazard values across all buildings.

Instead of using the maximum inundation depth to impact each building, you could instead use the average (mean) depth, the median, or use a percentile such as the ninety-fifth percentile. This decision depends a lot on what you are modelling.

Examining the raw results

Using all-intersections sampling introduces more complexity to the model. RiskScape can save the raw results from the Consequence Analysis phase in order to ‘show its workings’.

Try running the sample-all-intersections model again using the following command:

riskscape model run sample-all-intersections -p "analysis.save-raw-results=true"

This time it saves the raw results to a raw-analysis.shp output file. Open this file in QGIS.

The raw-results.shp file shows you every intersection between buildings in the exposure-layer and the hazard-layer. It includes the sampled hazard intensity for every building intersection (i.e. the sampled attribute), as well as the actual hazard value used in the final results.

Only the portions of the exposed buildings that intersect the hazard are included in the shapefile.


Although the hazard-layer effectively has a 10m grid resolution, the pixel size in the GeoTIFF is actually 5m. So the building segments correspond to a 5m grid here, rather than a 10m grid.

Buffer distance

The closest and all-intersections sampling techniques also allow you to specify a buffer distance when spatially matching.

This effectively applies a margin of error to the spatial sampling operation. It behaves as if you buffered (i.e. enlarged) the exposure-layer geometry by a given distance in metres.

For example, we can detect buildings that do not directly intersect the tsunami hazard grid, but fall within one metre of the inundation. Try this out by running the following command:

riskscape model run sample-closest -p "sample.hazards-buffer=1"

The additional -p parameter here told RiskScape to use a buffer distance of 1 metre around the building’s geometry when sampling the hazard-layer. This should produce the following event-impact.csv results file:


You can see that the model now detected an extra 12 buildings that were within one metre of the tsunami inundation.

We can also use a buffer distance when using the ‘all-intersections’ sampling technique. Try running the following command.

riskscape model run sample-all-intersections -p "sample.hazards-buffer=1"

This should produce the following event-impact.csv results file:


Again, you can see the buffering has detected an additional 12 exposed buildings. Also note that the maximum and average hazard values have also increased, as we are now using a larger exposed building footprint to sample against the hazard-layer.


You can also use a buffer when a building falls outside the regional boundaries in your area-layer (i.e. sample.areas-buffer). In general you will want to use a small buffer distance with your hazard-layer (e.g. 1m) and a large buffer with your area-layer (e.g. 1km).

Consequence Analysis

When we use the all-intersections sampling technique, each element-at-risk can have multiple hazard intensity measures associated with it. However, our Event Impact Table needs to have a single consequence for each element-at-risk.

During the consequence analysis phase of the model workflow, we have a choice over how we condense multiple hazard intensities down into a single consequence. In order to look at how this choice can influence our model’s results, we are going to use the following Samoa_Building_Fragility_DS5 function in our model:

def function(building, hazard_depth):
    DS_5_Prob = 0.0
    construction = building["Cons_Frame"]

    if hazard_depth is not None and hazard_depth > 0:
        if construction in ['Masonry', 'Steel']:
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.39, 0.4)
        elif construction in ['Reinforced_Concrete', 'Reinforced Concrete']:
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.86, 0.94)
        else: # 'Timber' or unknown
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.1, 0.28)

    return DS_5_Prob

def log_normal_cdf(x, mean, stddev):
    # this uses the built-in RiskScape 'lognorm_cdf' function
    return functions.get('lognorm_cdf').call(x, mean, stddev)

This function is similar to the case study function we looked at in the Creating a RiskScape project tutorial. However, instead of calculating a range of damage states, this function only calculates the probability that the building will be in the worst damage state (DS5, or complete structural collapse).

Aggregating by hazard

As we saw in the earlier sample-all-intersections model, one approach is to aggregate the hazard intensity measures for each building. This means we end up with a single hazard intensity (e.g. the max) that we can then pass to our Python function.

Try running the aggregate-hazard model using the following command:

riskscape model run aggregate-hazard

When a building has multiple hazard intensity measures associated with it, this model uses the average, or mean, inundation depth for that building.

The model then aggregates the results (the probability of being in DS5) for all exposed buildings and reports the percentiles (25, 50, and 75). The event-impact.csv file that gets produced should look like this:


This shows that the bottom quartile of exposed buildings have a probability of 0.037 or less of complete structural collapse. Whereas the top quartile has a probability of 0.956 or greater of complete collapse.

Aggregating by consequence

When we aggregate the hazard intensity measures, we always end up with a single hazard intensity. So we are only calling our Python function once for each building.

An alternative approach is to pass every hazard intensity measure to our Python function and aggregate the consequences we get back. In other words, we may call our Python function multiple times for the same building.

For example, if we had the hazard intensity measurements of 0.8m, 1.0m, and 1.1m for a timber building, then the DS5 consequences we get back would be 0.12, 0.36, and 0.49. RiskScape would then aggregate the consequence values, for example, to take the max (0.49) or the mean (0.32).

Try running the aggregate-consequence model now using the following command:

riskscape model run aggregate-consequence

The model should produce an event-impact.csv file that looks like this:


Notice the results are different to our aggregate-hazard model. By averaging the consequence (damage state) for each building, we have dragged the bottom quartile up and the top quartile down.

Which approach is better?

Whether the ‘aggregate by hazard’ or ‘aggregate by consequence’ approach is better for your model depends somewhat on what you are modelling.

If you are taking the maximum value in either case (i.e. the max hazard intensity or the max damage), then there may be no difference between the two approaches. It depends on how your Python function behaves.

For example, try running the following two commands so that the models aggregate using the max value rather than the mean.

riskscape model run aggregate-hazard -p "analysis.aggregate-hazards-function=max"
riskscape model run aggregate-consequence -p "analysis.aggregate-consequences-function=max"

Both the event-impact.csv files produced should now look identical:


This is because our Samoa_Building_Fragility_DS5 function is deterministic, so the same hazard intensity will always result in the same consequence. So it does not matter here if we use the max hazard intensity, or use all hazard intensities and then take the max damage state for each building.


If you are unsure which approach to use, in general the ‘aggregate by hazard’ is conceptually simpler to understand. ‘Aggregate by consequence’ can be useful for modelling damage to a larger element-at-risk, such as farmland or roads.


The ‘aggregate by hazard’ approach does not work well when the sampled hazard-layer contains multiple different hazard readings, such as flood depth and flood velocity. In this case, the wizard model cannot tell if you are interested in the maximum depth or the maximum velocity.

The ‘aggregate by consequence’ approach does not work well when you have a complex consequence, i.e. your Python function returns a struct. Again, when there are multiple attributes present in a struct, the model does not know which attribute it should use for the maximum.


You can work around these limitations by creating your own custom pipeline models. We will look at what a pipeline model is shortly.

Scaling losses

One benefit of the ‘aggregate by consequences’ approach is that it lets you scale the losses produced by your model based on the amount that the element-at-risk was exposed to the hazard.

Generally, your Python function will calculate a loss for the whole element-at-risk. However, for a large element-at-risk, such as farmland or a road, not all of it may have been exposed to the hazard.

For example, say we have farmland that was partially exposed to tsunami inundation. Our Python function can calculate a loss based on the hazard intensity and the total cost of the farmland, but it does not know the total area of the farmland that was damaged. Our Python function just has to assume that the whole farmland was damaged to the same extent.

Let’s say that for a single farm we had the following hazard intensities and our Python code calculated the following losses.

hazard intensity









We could use ‘aggregate by consequences’ and take the mean loss, and end up with an $150,000 loss for that farm. However, only 35% of the total farmland was exposed to the hazard.

Whereas RiskScape could scale the losses for us, based on the area exposed, e.g.

hazard intensity



scaled loss









Based on that, we could then take the total (sum) loss of $45,000 for the farm.


An alternative way to handle this situation can be to cut the element-at-risk into smaller pieces before your model runs. RiskScape has geoprocessing features in the wizard to do this, which we will learn about in a later tutorial.


RiskScape supports two different Python implementations: Jython and CPython.


CPython is what most people consider regular Python. If you have used Python before on your computer, then you were most likely using CPython.

Jython comes built-in as part of the RiskScape software, so it gets used by default. Alternatively, RiskScape can execute your model function using CPython (i.e. regular Python), but this requires that Python is already installed on your system.

The main difference between Jython and CPython is how they import other Python packages. We have been mostly dealing with simple Python functions in our examples, so this hasn’t mattered too much.

However, the Samoa_Building_Fragility_DS5 function used in our aggregate-hazard model calculates the log-normal CDF using a Jython-specific approach.

Instead, let’s try using the following CPython_Building_Fragility_DS5 function, which uses the scipy Python package to calculate the log-normal CDF.

from scipy.stats import lognorm
import math

def function(building, hazard_depth):
    DS_5_Prob = 0.0
    construction = building["Cons_Frame"]

    if hazard_depth is not None and hazard_depth > 0:
        if construction in ['Masonry', 'Steel']:
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.39, 0.4)
        elif construction in ['Reinforced_Concrete', 'Reinforced Concrete']:
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.86, 0.94)
        else: # 'Timber' or unknown
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.1, 0.28)

    return DS_5_Prob

def log_normal_cdf(x, mean, stddev):
    return lognorm(s=stddev, scale=math.exp(mean)).cdf(x)


In order to run this next example, you need to have setup RiskScape to use CPython (as per these instructions), and you need the scipy Python package installed. Alternatively, you could skip this section if you are not interested in using CPython.

Now, try using the CPython_Building_Fragility_DS5 function in a model using the following command:

riskscape model run aggregate-hazard -p "analysis.function=CPython_Building_Fragility_DS5"

The event-impact.csv file produced should contain the exact same results as when you previously ran the model using Jython:


This is because the log-normal CDF calculation is exactly the same. The only difference is this time we used the scipy Python package to calculate it, whereas previously we used the built-in lognorm_cdf RiskScape function.


Using CPython can make it easier to test your Python code, as you can execute your Python function directly, outside of the RiskScape model, by using the if __name__ == '__main__': Python idiom.

Being explicit about the Python implementation

If you plan on sharing your RiskScape Python functions with others, it pays to be explicit if you know that the function will only work with CPython, or only work with Jython.

If you look at the project.ini file you can see we explicitly use framework = jython and framework = cpython when we define the functions.

[function Samoa_Building_Fragility_DS5]
description = determines the probability of a building being in Damage State 5 (complete structural collapse)
location = functions/Samoa_Building_Fragility_DS5.py
argument-types = [ building: struct(Cons_Frame: text), hazard_depth: nullable(floating) ]
return-type = floating
framework = jython

[function CPython_Building_Fragility_DS5]
description = Same as Samoa_Building_Fragility_DS5, but in CPython
location = functions/CPython_Building_Fragility_DS5.py
argument-types = [ building: struct(Cons_Frame: text), hazard_depth: nullable(floating) ]
return-type = floating
framework = cpython


If your Python function is generic and will work in either Jython or CPython, then you do not have to worry about defining the Python framework.

Running the same model repeatedly

As you will know by now, you can use the -p CLI option (i.e. --parameter) to make a one-off change to a model’s parameter. However, it can be cumbersome to use this approach to vary the same parameter lots of times. For example, if you have dozens of hazard-layers that you want to run through your model.

Fortunately, there is a riskscape model batch command that lets us run the same model repeatedly, while varying a single parameter. Instead of using the -p PARAMETER_NAME=NEW_VALUE syntax, with this command we use --vary-parameter=PARAMETER_NAME --vary-input=NEW_VALUES.

For example, to vary the hazard-layer input data for the model you would use --vary-parameter=input-hazards.layer. For --vary-input, you could then specify a directory containing the hazard files. This could include a wildcard pattern, such as data/*.tif, which will match all GeoTIFF files in the data directory.

There are four different GeoTIFF files in this project’s data sub-directory. Try entering the following command to run each GeoTIFF through the aggregate-hazard model.

riskscape model batch aggregate-hazard --vary-parameter=input-hazards.layer --vary-input=data/*.tif --output=output/{}

This will run the aggregate-hazard model four times, once for each GeoTIFF file. It will produce four event-impact.csv results files, which should look something like this:


Take a look at each file and see what difference the hazard-layer has made to the number of exposed buildings, and the resulting damage state.


We have specified an --output directory in this example. The {} corresponds to the input value the model is using, i.e. the name of the GeoTIFF file. This means that each event-impact.csv results file gets written to a separate directory, and the directory name will correspond to the hazard-layer used.

As well as varying input files, you can also vary other parameters for your model. In this case, you would put the values to vary in a text file, with each value on its own line. For example, buffers.txt contains the following values:


Try using these metre distance values for the sample.hazards-buffer parameter by entering the following command:

riskscape model batch aggregate-hazard --vary-parameter=sample.hazards-buffer --vary-input=buffers.txt --output=output/buffer-{}m

Take a look at the event-impact.csv files produced and see what happens to Number_exposed as the buffer distance increases.


If you want to run RiskScape models repeatedly, but the riskscape model batch command does not quite do what you would like, then you could also try Shell scripting.


RiskScape is primarily a geospatial data processing engine. The wizard models we have been using provide a standard framework that organizes that data processing for us.

The underlying data processing is called a RiskScape pipeline. A pipeline is simply a series of data-processing steps that are chained together. These steps are described in a special RiskScape pipeline language, which is based around simple expressions.

When RiskScape runs a wizard model, it first converts the wizard answers into pipeline code, and then executes the pipeline. For example, the aggregate-hazard model results in the following pipeline code:

input(relation: 'data/Buildings_SE_Upolu.shp', name: 'exposure') as exposures_input
 -> select({exposure}) as exposures
 -> join(on: true) as exposures_join_hazards
 -> select({*, sample(geometry: exposure, coverage: hazard) as hazard_sampled}) as "sample_hazard_layer"
 -> select({*, map(hazard_sampled, h -> h.sampled) as hazard})
 -> select({*, map(hazard_sampled, geom -> measure(geom) / measure(exposure)) as exposed_ratio})
 -> select({*}) as sampled
 -> select({*, hazard: mean(hazard)})
 -> select({*, consequence: map(hazard, hv -> Samoa_Building_Fragility_DS5(exposure, hv))})
 -> select({*}) as raw_results
 -> select({*}) as event_impact_table
input(value: bookmark('data/MaxEnv_All_Scenarios_10m.tif'), name: 'hazard') as hazards_input
 -> select({hazard as hazard}) as hazards
 -> exposures_join_hazards.rhs
 -> select({*}) as "report_event-impact"
 -> filter(hazard > 0)
 -> group(by: {'Total' as Results}, select: {Results, count(exposure) as Number_Exposed, percentiles(consequence, [25, 50, 75]) as DS5_Percentile})
 -> save(name: 'event-impact') as "save_event-impact"

Advanced users can execute their own pipeline-based models directly. You can try this now by running the aggregate-hazard model’s pipeline code using the following command:

riskscape pipeline evaluate pipeline.txt

This behaves the same as if you ran the aggregate-hazard model, and it produces the exact same event-impact.csv results:



Every time you run a model, RiskScape records the pipeline code it used as a pipeline.txt file in the output directory. This gives you an alternative way of re-running the exact same pipeline again in the future.

Pipelines give you much finer-grain control over your model than the wizard does. For example, if the wizard does not support exactly what you want to do, you could take the pipeline code produced by the wizard and adapt it to suit your situation.

There are advanced tutorials that cover writing RiskScape pipelines in more detail.


The Ctrl + c menu in the wizard also lets you see how the pipeline code gets built up. As you go through the wizard, you can see how the pipeline changes after you answer each question.


Let’s review some of the key points we have covered so far:

  • Centroid sampling is efficient and simple. However, it does not always detect when a hazard intersects an element-at-risk. Also, for complex geometry, the geometric centroid may actually fall outside the shape.

  • Closest sampling is simple and a good default. However, it does not necessarily return the worst hazard intensity measure if accuracy is important to your model.

  • All-intersections sampling will reliably find all hazard intensity measures that impact the element-at-risk. However, distilling multiple hazard intensity measures down into a single consequence can add complexity to your model.

  • Closest and all-intersections sampling also support a buffer distance, to detect hazards that almost intersect your element-at-risk.

  • When you use all-intersections you have the choice of aggregating the hazard or aggregating the consequence for each element-at-risk. Which approach is better will depend somewhat on what you are modelling.

  • When you use the ‘aggregate by consequence’ approach, you can scale the loss produced based on the area exposed to the hazard.

  • RiskScape supports two Python implementations: Jython (the default) and CPython. The difference only starts to matter when you want to start importing packages and do more complicated things in your functions.

  • RiskScape uses what is called a pipeline to underpin the model processing. Pipelines give you much more flexibility over your modelling, but they are much harder to create.

Extra for experts

If you want to explore the concepts we have covered a little further, you could try the following exercises out on your own.

  • Try running the sample-all-intersections model using the average, or mean, hazard intensity rather than the max, i.e. use -p "analysis.aggregate-hazards-function=mean". See what difference this makes to the model’s results.

  • Try running riskscape wizard and build your own model that uses all-intersections sampling and aggregates by hazard. Select the option to save the raw analysis. See how the raw_analysis.shp output compares to the event_impact_table.shp produced by the model. For now, you could skip the reporting step and just output the Event Impact Table as is.

  • Try running riskscape wizard again, but this time use aggregate by consequence. Select the option to save the raw analysis. See how the raw_analysis.shp output compares to the event_impact_table.shp produced by the model.

  • Look at the functions/CPython_Building_Fragility_DS5.py file in more detail. Notice that the Python file also contains a if __name__ == '__main__': block. Try executing the file manually with Python and see what results you get. This approach lets you easily test and debug your Python code outside of RiskScape.

  • In the project.ini file, try commenting out the framework = jython line for the Samoa_Building_Fragility_DS5 function. Try running the aggregate-hazard model again. What sort of error do you get?

  • Look in the output directory of one of your models for the pipeline.txt file. Try using the riskscape pipeline evaluate command to re-run the raw pipeline code in the pipeline.txt file. Tip: you will need to run this command from the same directory as your project.ini file.