Creating a RiskScape project

Before we start

This tutorial is aimed at new users who want to start creating their own projects. A project needs to be set up before you can build and run your own risk models in RiskScape. We expect that you:

  • Have completed the How to build RiskScape models guide and are familiar with building and running a RiskScape model.

  • Have a basic understanding of geospatial data and risk analysis.

  • Have some basic Python knowledge, or a willingness to learn.

The aim of this tutorial is to get you familiar with creating projects, bookmarks, and functions in RiskScape, so that you can build models on your own.

Getting started

Setup

Download the example project we will use in this guide. Unzip the file into the top-level directory where you keep your RiskScape projects. For example, if your top-level projects directory is C:\RiskScape_Projects\, then your unzipped directory will be C:\RiskScape_Projects\project-tutorial.

Open a command prompt and cd to the directory where you unzipped the files, e.g.

cd C:\RiskScape_Projects\project-tutorial

You will use this command prompt to run the RiskScape commands in this tutorial.

The unzipped project contains a few sub-directories:

  • project-tutorial\data contains the input data files we will use in this tutorial. This data is similar to the Upolu tsunami data that we used in the previous tutorials.

  • project-tutorial\functions contains Python files we will import as RiskScape functions.

  • project-tutorial\models contains some pre-built models we will use to test our project as we go along.

Note

This input data was provided by NIWA, as well as the PCRAFI (Pacific Risk Information System) website. The data files have been adapted slightly for this tutorial.

There is also an initial project-tutorial\project.ini file that we will modify. Open this project.ini file in Notepad (or your preferred text editor).

Background

Project INI files

You may have noticed from previous tutorials that RiskScape gets all its configuration information from a project.ini file. This tells RiskScape things like what models can be run, and what input data should be used in the models.

The project.ini file is in the INI format and can be modified in any plain-text editor, such as Notepad or gedit.

INI files contain key-value pairs, which are organized into sections. Square brackets are used to indicate the start of a section. A simple INI section might look something like this:

[section my-id]
key-one = some value
key-two = 2.0

In the previous tutorials, we have used INI files to save our model’s parameters. For example:

[model basic-exposure]
description = Simple example of a RiskScape model
framework = wizard
input-exposures.layer = data/Buildings_SE_Upolu.shp
input-hazards.layer = data/MaxEnv_All_Scenarios_50m.tif
sample.hazards-by = CLOSEST
analysis.function = is_exposed

Here the section starts with model, indicating that we are defining a RiskScape model, followed by the ID of the model (basic-exposure). The lines that follow store the settings for the model’s parameters as key-value pairs.
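To make the section structure concrete, here is a minimal sketch that reads the same model section with Python's standard configparser module. This is only an illustration of the INI layout; RiskScape reads project.ini with its own parser, which may treat some details differently.

```python
import configparser

INI_TEXT = """
[model basic-exposure]
description = Simple example of a RiskScape model
framework = wizard
input-exposures.layer = data/Buildings_SE_Upolu.shp
input-hazards.layer = data/MaxEnv_All_Scenarios_50m.tif
sample.hazards-by = CLOSEST
analysis.function = is_exposed
"""

# interpolation=None keeps any '%' characters in values literal
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(INI_TEXT)

section = parser.sections()[0]
# The header has the form "<kind> <id>"; split on the first space only,
# because the ID itself may contain spaces
kind, identifier = section.split(" ", 1)
settings = dict(parser[section])

print(kind, identifier)
for key, value in settings.items():
    print(f"  {key} -> {value}")
```

The section header splits into the kind of thing being defined (model) and its ID (basic-exposure), and every other line becomes a key-value pair.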

In addition to models, the project.ini also stores details about what goes into the models. These are:

  • The input data files to use, which RiskScape calls bookmarks.

  • Python functions that will determine the impact the hazard has on each element-at-risk.

We will look into how to configure each of these in more detail.

Tip

The idea behind the project.ini file is that it provides a way to organize your RiskScape models, much like a work-space, so that you can keep related models (i.e. ones that use similar data or functions) together. Completely unrelated models can go in a separate project.ini file in another directory.

Bookmarks

A RiskScape bookmark identifies a file that can be used as an input layer in a model. Imagine your file system is a book - your bookmarks tell RiskScape what to use and how to use it.

A simple bookmark

Let’s look at a simple example. Add the following to your project.ini file.

[bookmark Samoa_electoral_boundaries]
location = data/Samoa_constituencies.shp

Each RiskScape bookmark has an ID, which is the text that follows [bookmark ...]. In this case, the bookmark ID is Samoa_electoral_boundaries. All RiskScape bookmarks must also have a location, which specifies the input data to read.

Note

Your bookmark’s ID can contain spaces, e.g. [bookmark cool file]. However, this makes some RiskScape commands slightly harder to use. You will need to enclose the bookmark ID in double-quotes when you use it on the command line, e.g. riskscape bookmark info "cool file"

Save the project.ini file and enter the following command in your terminal to check that RiskScape now knows about the new bookmark.

riskscape bookmark list

You should see output similar to the following:

+--------------------------+-----------+----------------------------------------------------------------------------+
|id                        |description|location                                                                    |
+--------------------------+-----------+----------------------------------------------------------------------------+
|Samoa_electoral_boundaries|           |file:///C:/RiskScape_Projects/project-tutorial/data/Samoa_constituencies.shp|
+--------------------------+-----------+----------------------------------------------------------------------------+

Tip

You can add an optional description key for most things in the project.ini file. The description is purely to help you keep track of what each model/bookmark/function does. You can also add comments to the INI file by using # at the start of the line.

Enter the following command into your terminal to view more detailed information about the bookmark:

riskscape bookmark info Samoa_electoral_boundaries

You should see output similar to the following:

"Samoa_electoral_boundaries"
  Description :
  Location    : file:///C:/RiskScape_Projects/project-tutorial/data/Samoa_constituencies.shp
  Attributes  :
    the_geom[MultiPolygon[crs=EPSG:4326]]
    fid[Integer]
    NAME_1[Text]
    Region[Text]
  Axis-order  : long,lat / X,Y / Easting,Northing
  CRS code    : EPSG:4326
  CRS (full)  : GEOGCS["WGS 84",
  DATUM["World Geodetic System 1984",
    SPHEROID["WGS 84", 6378137.0, 298.257223563, AUTHORITY["EPSG","7030"]],
    AUTHORITY["EPSG","6326"]],
  PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]],
  UNIT["degree", 0.017453292519943295],
  AXIS["Geodetic longitude", EAST],
  AXIS["Geodetic latitude", NORTH],
  AUTHORITY["EPSG","4326"]]
  Summarizing...
  Row count   : 43
  Bounds      : EPSG:4326 [-172.8041 : -171.3977 East, -14.0772 : -13.4398 North] (original)

This output is quite technical-looking, but it tells us a few useful things:

  • The attributes that are present in the input data, i.e. the_geom, fid, NAME_1, and Region. It also shows us what type of data each attribute holds, e.g. fid is an Integer whereas NAME_1 is a Text string.

  • The Coordinate Reference System (CRS) (i.e. EPSG:4326 or WGS 84) and axis-order (i.e. long,lat) of the geometry.

  • The number of rows of data the file holds (i.e. Row count).

  • The geographic bounds of the data.

Note

The CRS is an important part of the input data, which we will learn more about. Conveniently, the CRS information for shapefiles is already all defined in a .prj file (i.e. Samoa_constituencies.prj), so we don’t have to worry about specifying a CRS for the bookmark.

Manipulating the input data

The main benefit of bookmarks is that they tell RiskScape how to load the data into the model.

When you are working with Shapefiles, GeoTIFFs, and ESRI Grid (i.e. .asc) files, most of what RiskScape needs to know is already encoded into the file format. However, even with these file formats, bookmarks still allow you to manipulate the input data in useful ways.

Let’s look at an example of this in action. Your project comes with an exposure-by-region model, which is already defined in the models/models_exposure-by-region.ini file:

[model exposure-by-region]
framework = wizard
description = Produces a total count of buildings in each region exposed to tsunami inundation
input-exposures.layer = data/Buildings_SE_Upolu.shp
input-exposures.geoprocess = false
input-hazards.layer = data/MaxEnv_All_Scenarios_50m.tif
input-areas.layer = Samoa_electoral_boundaries
input-areas.geoprocess = false
sample.hazards-by = CLOSEST
analysis.function = is_exposed
report-event-impact.filter = consequence = 1
report-event-impact.group-by[0] = area
report-event-impact.aggregate[0] = count(*) as Exposed_buildings
report-event-impact.select[0] = area.Region as Region
report-event-impact.select[1] = Exposed_buildings

This model counts the number of exposed buildings by region (using our Samoa_electoral_boundaries bookmark as the area-layer), similar to models we have used in previous tutorials.

Run this model now, by entering the following command:

riskscape model run exposure-by-region

It should produce an output/exposure-by-region/TIMESTAMP/event-impact.csv results file, where TIMESTAMP is the current date/time, e.g. 2022-01-13T17_38_2. We can use the more "FILENAME" command to quickly look at a text file’s contents from the terminal, such as the event-impact.csv file produced here, e.g.

more "output/exposure-by-region/TIMESTAMP/event-impact.csv"

Tip

Forward slashes in file-paths generally work OK in the Windows Command Prompt, as long as you surround them in double-quotes, e.g. "output/some-file.csv". This means you can copy-paste the results filename from the URI that RiskScape displays. Simply select the text and use Ctrl + c and Ctrl + v to copy-paste in the Windows Command Prompt.

The event-impact.csv file should contain the following:

Region,Exposed_buildings
,10
Aleipata Itupa i Lalo,526
Aleipata Itupa i Luga,340
Falealili,749
Lepa,288
Lotofaga,146

Now let’s say we wanted a slightly different regional breakdown of the results. The area-layer is just a parameter to the model, so RiskScape will let us replace the parameter with a different file.

Try running the following command to use the data/ws_districts.shp file as our area-layer.

riskscape model run exposure-by-region -p "input-areas.layer=data/ws_districts.shp"

This time, instead of running our model, RiskScape gives us an error:

There was a problem with the parameters for wizard model
  - Failed to load the saved model. Some parameters specified may be invalid. If you have
    altered parameters manually, try going through the interactive wizard again
    - Problems found with 'report-event-impact.select' parameter
      - Failed to validate 'select({area.Region as Region, Exposed_buildings})' step ...
        - Failed to validate expression '{area.Region as Region, Exposed_buildings}' ...
          - Could not find 'area.Region' among [area.the_geom, area.fid, area.District, Exposed_buildings]

Troubleshooting RiskScape errors

RiskScape errors are often nested like this. The top problem describes the high-level operation that failed, and the subsequent problems then drill-down into more and more specific context about what went wrong.

Let’s look at these errors in more detail and try to work out what went wrong:

  • The first error tells us there was a problem loading the saved model, possibly related to the model parameters that we used.

  • The next error says the problem was specifically with the report-event-impact.select parameter. We didn’t actually change that parameter at all. In our model, that parameter looks like this:

    report-event-impact.select[0] = area.Region as Region
    
  • The next two errors specify the pipeline step and expression that failed. We will learn more about these concepts in subsequent tutorials.

  • The final error tells us that the area.Region attribute does not exist. Only the area.the_geom, area.fid, and area.District attributes (along with Exposed_buildings) are available to the select step.

So, what went wrong? The attributes that are available in a RiskScape model depend on what input data the model uses. In this case, it appears that our original area-layer has a Region attribute, but our new area-layer does not.
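The check that failed can be pictured as a simple lookup: every attribute an expression refers to must be among the attributes that are available. The sketch below is a conceptual illustration only, not RiskScape's validation code. The missing_attributes helper is hypothetical, and the attribute list for the original area-layer is assumed from the earlier bookmark info output.

```python
def missing_attributes(required, available):
    """Return the required attribute names that are not in the available list."""
    return [name for name in required if name not in available]

# Attributes that the model's select parameters refer to
required = ["area.Region", "Exposed_buildings"]

# Assumed from the Samoa_electoral_boundaries bookmark info output
with_constituencies = [
    "area.the_geom", "area.fid", "area.NAME_1", "area.Region", "Exposed_buildings",
]
# Taken from the error message for data/ws_districts.shp
with_districts = ["area.the_geom", "area.fid", "area.District", "Exposed_buildings"]

print(missing_attributes(required, with_constituencies))
print(missing_attributes(required, with_districts))
```

With the original area-layer nothing is missing, whereas with the districts file the area.Region attribute cannot be found.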

Let’s confirm this by taking a closer look at our new area-layer. Enter the following command:

riskscape bookmark info "data/ws_districts.shp"

You can see from the output that the file does not contain a Region attribute, although it does have a District attribute instead, i.e.

Location : file:///C:/RiskScape_Projects/project-tutorial/data/ws_districts.shp
  Attributes  :
    the_geom[MultiPolygon[crs=EPSG:4326]]
    fid[Integer]
    District[Text]
...

Tip

In many cases, bookmarks and file paths can be used interchangeably in RiskScape. For example, here we passed a file path directly to the riskscape bookmark info command. This means you can use file paths as model parameters without necessarily creating bookmarks.

Consistent input data

In order to reuse the same model with different input files, some attributes in the input data (in this case, the Region attribute) will need to be consistent across the files.

The naive approach would be to manually rename the attribute in the input data, and re-save the shapefile. However, this can be cumbersome and error-prone if you need to do it often.

RiskScape bookmarks can solve the problem for us.

Let’s create a new bookmark for this second area-layer shapefile. Add the following to your project.ini file and save it.

[bookmark Samoa_districts]
location = data/ws_districts.shp
set-attribute.Region = District

The last line is setting a new attribute called Region, which will hold whatever value is in the District attribute. Enter the following command to see what the bookmark data looks like now:

riskscape bookmark info Samoa_districts

You should see that there is now a new Region attribute in the output. The original District attribute is still also present.

"Samoa_districts"
  Description :
  Location    : file:///C:/RiskScape_Projects/project-tutorial/data/ws_districts.shp
  Attributes  :
    the_geom[MultiPolygon[crs=EPSG:4326]]
    fid[Integer]
    District[Text]
    Region[Text]
...

Now enter the following command to use our new bookmark in the model.

riskscape model run exposure-by-region -p "input-areas.layer=Samoa_districts"

This time the model runs successfully because all the attributes it needs are present in the input data.

Note

In this case we simply copied an existing attribute in the input data, but you can manipulate the data in more complicated ways. For example, you could convert imperial units into the metric system using: set-attribute.metres = feet / 3.281
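Conceptually, set-attribute adds a new attribute to each row of input data, computed from the attributes that are already there. The following Python sketch mimics that behaviour using plain dictionaries. The apply_set_attributes helper and the sample values are hypothetical, and RiskScape evaluates its own expression language rather than Python lambdas.

```python
def apply_set_attributes(row, expressions):
    """Return a copy of the row with extra attributes added.

    Each expression is a function that computes a value from the row.
    """
    result = dict(row)
    for name, expression in expressions.items():
        result[name] = expression(result)
    return result

# Like: set-attribute.Region = District
district_row = {"fid": 1, "District": "Falealili"}
renamed = apply_set_attributes(district_row, {"Region": lambda r: r["District"]})

# Like: set-attribute.metres = feet / 3.281
height_row = {"feet": 32.81}
converted = apply_set_attributes(height_row, {"metres": lambda r: r["feet"] / 3.281})

print(renamed)
print(converted)
```

Note that the original attributes are kept, which matches what we saw in the bookmark info output: District is still present alongside the new Region attribute.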

Filtering

Let’s just take a quick look at the event-impact.csv results file that the last riskscape model run command produced. Use more "output/MODEL/TIMESTAMP/event-impact.csv" to look at the results, e.g.

more "output/exposure-by-region/2022-01-13T17_38_25/event-impact.csv"
Region,Exposed_buildings
Aleipata Itupa i Lalo,507
Aleipata Itupa i Luga,339
Falealili,749
Lepa,283
Lotofaga,146
Marine Area,35

If you look carefully, you will notice there is a ‘Marine Area’ region now present in the results. Our model now thinks some buildings are located in the sea, which is not ideal.

Area-layer shapefiles often contain polygons that denote bodies of water. However, we generally want to ignore these areas in our model.

Bookmarks also let us filter the input data so that only certain rows of data are included in the model. We can specify a true/false condition, and only input data that satisfies that condition will be used in the model.

In your project.ini file, add the following line to your Samoa_districts bookmark, and save the file.

filter = Region != 'Marine Area'

Your bookmark should now look like this:

[bookmark Samoa_districts]
location = data/ws_districts.shp
set-attribute.Region = District
filter = Region != 'Marine Area'

Note

We are using a != condition here, because we want to exclude a specific row of data, i.e. include everything except the ‘Marine Area’ row of data.
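If it helps to think of this in Python terms, the bookmark behaves roughly like the sketch below. This is a conceptual illustration with made-up sample rows, and it assumes that set-attribute is applied before filter, which is implied by the filter being able to refer to the Region attribute.

```python
rows = [
    {"District": "Falealili"},
    {"District": "Lepa"},
    {"District": "Marine Area"},
]

# Rough equivalent of: set-attribute.Region = District
with_region = [{**row, "Region": row["District"]} for row in rows]

# Rough equivalent of: filter = Region != 'Marine Area'
kept = [row for row in with_region if row["Region"] != "Marine Area"]

regions = [row["Region"] for row in kept]
print(regions)
```

Only the rows where the condition is true are kept, so the ‘Marine Area’ row never reaches the model.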

Now try using the updated area-layer bookmark in your model by running the following command:

riskscape model run exposure-by-region -p "input-areas.layer=Samoa_districts"

Take a look at the event-impact.csv file that the model produces. It should look like this:

more "output/exposure-by-region/2022-01-13T18_05_00/event-impact.csv"
Region,Exposed_buildings
,19
Aleipata Itupa i Lalo,518
Aleipata Itupa i Luga,341
Falealili,749
Lepa,286
Lotofaga,146

The ‘Marine Area’ is no longer present in the results, although we do have 19 buildings that were not matched to any region now.

If you look carefully, you will notice that 35 buildings were previously matched to the ‘Marine Area’, but now only 19 buildings have no region. This is because some buildings (16) were straddling a regional boundary.

We use ‘closest’ spatial matching for the area-layer. When a building intersects two regions, we assign it to the region that’s closest to the building’s centroid. When we removed the ‘Marine Area’, it meant that 16 buildings now only intersected one region instead of two.
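We can double-check this arithmetic with a few lines of Python, using the counts copied from the two event-impact.csv files. The total number of exposed buildings is the same in both runs; only the region each building is assigned to has changed.

```python
# Counts from the run that included the 'Marine Area' polygon
with_marine = {
    "Aleipata Itupa i Lalo": 507,
    "Aleipata Itupa i Luga": 339,
    "Falealili": 749,
    "Lepa": 283,
    "Lotofaga": 146,
    "Marine Area": 35,
}
# Counts from the run with the filter applied ("" means no region matched)
without_marine = {
    "": 19,
    "Aleipata Itupa i Lalo": 518,
    "Aleipata Itupa i Luga": 341,
    "Falealili": 749,
    "Lepa": 286,
    "Lotofaga": 146,
}

# Regions whose count went up once the 'Marine Area' was removed
gained = {
    region: without_marine[region] - count
    for region, count in with_marine.items()
    if region in without_marine and without_marine[region] != count
}

print(sum(with_marine.values()), sum(without_marine.values()))
print(gained, sum(gained.values()))
```

The 35 ‘Marine Area’ buildings split into 19 that no longer match any region and 16 that moved into a neighbouring land region.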

We could potentially use the sample.areas-buffer model parameter here to assign all buildings to a region, like we did in the previous tutorial.

Tip

The bookmark filter parameter essentially works the same as the ‘filter’ geoprocessing option in the wizard. Using the wizard can make it easier to build filter expressions.

Problematic input data

Dealing with real world data can sometimes be a little messy. Let’s look at some examples of how RiskScape deals with problematic data.

In the data/ sub-directory, there is also a problematic.shp file. Try running the following command to use it as the model’s area-layer.

riskscape model run exposure-by-region -p "input-areas.layer=data/problematic.shp"

You should see an error message like this:

15:29:14.642 [main] WARN  n.o.r.e.d.r.FeatureSourceBookmarkResolver - No crs could be parsed
  for feature source from file:///C:/RiskScape_Projects/project-tutorial/data/problematic.shp,
  falling back to generic 2d
There was a problem with the parameters for wizard model
  - Could not apply the answer to the 'input-areas.layer' parameter to your model
    - The given Geom type does not contain the required spatial meta-data (i.e. CRS). This
      could be because the input data comes from a CSV file and 'crs-name' needs to be set
      in the bookmark

The error tells us that RiskScape could not read the CRS information for this shapefile. If you look closely at the data/ sub-directory, you will see that the .prj file that contains all the shapefile’s CRS information is actually missing, i.e. there is no problematic.prj file.

Tip

In Windows Command Prompt, you can use the dir command to get a list of any matching files in a directory, e.g. dir data\problematic.prj

Let’s try doing what the error suggests and create a bookmark with crs-name set. We know the CRS for this file should be EPSG:4326, or WGS 84, so add the following to your project.ini file and save it.

[bookmark problematic]
location = data/problematic.shp
crs-name = EPSG:4326

Now, try running the following command to use the new bookmark in the model:

riskscape model run exposure-by-region -p "input-areas.layer=problematic"

This time the model runs to completion. However, we still see some warnings about invalid input data displayed:

WARNING: An invalid row of input data has been skipped
  - An invalid geometry which cannot be fixed automatically has been detected. Caused by:
    Invalid Coordinate at or near point (NaN, -172.03240134903). Refer to the Geometry
    reference in the RiskScape documentation for tips on how to avoid this. The row
    containing this geometry was: {fid=999, Region=Bad geo…}

WARNING: Problems found with 'problematic' bookmark in location
  file:///C:/RiskScape_Projects/project-tutorial/data/problematic.shp
  - Invalid geometry has been detected and fixed automatically. Refer to the Geometry
    reference in the RiskScape documentation for tips on how to avoid this. The record
    containing this geometry was: {fid=1, Region=Marine …}

These warnings tell us that RiskScape encountered invalid geometry in the input data.

The first message tells us that a row of input data was skipped because it contained invalid geometry. This means that this particular row of input data was omitted from our model.

The second message also deals with invalid geometry, but this time RiskScape fixed the geometry for us and continued to use it in the model.

Note

Under the Reference Guides in RiskScape’s documentation, there is a page on Geometry that contains more details about Invalid geometry.

If you want to, you can control what RiskScape does in these situations using bookmark parameters:

  • The skip-invalid bookmark parameter determines what RiskScape should do when an invalid row of input data is detected. By default, the invalid row is simply skipped and RiskScape continues, but this can be changed so that the riskscape model run command stops with an error by using skip-invalid = false.

  • validate-geometry controls whether or not RiskScape validates geometry and attempts to fix it.
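The difference between skipping an invalid row and stopping with an error can be sketched in Python as follows. This is a conceptual illustration only: the load_rows function, the row layout, and the coordinates of the valid row are hypothetical, and RiskScape's geometry validation covers far more than NaN coordinates.

```python
import math

def load_rows(rows, skip_invalid=True):
    """Return the rows with usable coordinates, skipping or rejecting the rest."""
    valid = []
    for row in rows:
        x, y = row["point"]
        if math.isnan(x) or math.isnan(y):
            if skip_invalid:
                # Comparable to the default behaviour: warn and carry on
                print(f"WARNING: skipped invalid row fid={row['fid']}")
                continue
            # Comparable to skip-invalid = false: stop with an error
            raise ValueError(f"invalid geometry in row fid={row['fid']}")
        valid.append(row)
    return valid

rows = [
    {"fid": 1, "point": (-171.5, -14.0)},
    {"fid": 999, "point": (float("nan"), -172.03240134903)},
]

loaded = load_rows(rows)

try:
    load_rows(rows, skip_invalid=False)
    stopped = False
except ValueError:
    stopped = True
```

Skipping keeps the model running at the cost of silently losing a row, whereas stopping makes sure bad input data gets noticed and fixed.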

Tip

The default bookmark settings should be sufficient for most modelling, so you shouldn’t need to worry too much about changing these bookmark parameters.

Using CSV data

Let’s try another bookmark example. This time we will replace the model’s exposure-layer.

We have a data/Buildings_SE_Upolu_centroids.csv Comma Separated Values (CSV) file that contains building centroid data for south-eastern Upolu. If you use the more command to look at this file, it contains data that looks like the following:

more "data/Buildings_SE_Upolu_centroids.csv"
WKT,ID,Use_Cat,Cons_Frame
POINT (422324.1392684035 8450527.521981074),1360,Outbuilding,Masonry
POINT (422192.23654263915 8450396.489492511),1361,Residential,Masonry
POINT (422204.39138965635 8450380.92939743),1362,Outbuilding,Masonry
POINT (422208.9813466044 8450102.043773355),1607,Residential,Masonry
POINT (422219.40361522196 8450115.30060319),1608,Residential,Masonry
...

Note

The first column of this CSV file contains a WKT attribute that stores geometry information in Well-Known Text (WKT) format.

Try using this CSV file in the model using the following command:

riskscape model run exposure-by-region -p "input-exposures.layer=data/Buildings_SE_Upolu_centroids.csv"

You should see the following error this time:

There was a problem with the parameters for wizard model
  - Could not apply the answer to the 'input-exposures.layer' parameter to your model
    - Geometry attribute required but none found in {WKT=>Text,
      ID=>Text, Use_Cat=>Text, Cons_Frame=>Text}

Each input layer in the RiskScape model needs to contain some form of geometry, but RiskScape couldn’t find any geometry in our exposure-layer input data.

Let’s take a look at the attributes that this CSV file contains by running the following command:

riskscape bookmark info "data/Buildings_SE_Upolu_centroids.csv"

It should produce the following output:

Location : file:///C:/RiskScape_Projects/project-tutorial/data/Buildings_SE_Upolu_centroids.csv
  Attributes  :
    WKT[Text]
    ID[Text]
    Use_Cat[Text]
    Cons_Frame[Text]
  Summarizing...
  Row count   : 6260

Each attribute in this output has a name as well as a data type, which is in the square brackets. So RiskScape can see the WKT attribute in the input data, but it has a Text string type rather than a Geometry type, which is what RiskScape needs.

Note

All the data in a RiskScape model has type information associated with it. With shapefiles, the attribute data types are saved as part of the file format. However, attributes in a CSV file are always Text type by default.

Types

We can use the set-attribute bookmark parameter to change the underlying type of the input data.

Converting CSV attributes into numeric data is pretty simple in RiskScape. It looks similar to using type casts in Python, for example:

# below converts 'year' attribute to an integer (i.e. a whole number)
set-attribute.year = int(year)
# below converts 'cost' into a floating-point number (i.e. with a decimal place)
set-attribute.cost = float(cost)

Here, the int(year) line is an example of a RiskScape expression. It is actually calling the built-in RiskScape int() function, which converts a text-string into an integer.
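For comparison, this is what the same situation looks like in plain Python: values read from a CSV file always start out as text, and have to be converted before they can be used as numbers. The sample values here are made up for illustration.

```python
import csv
import io

CSV_TEXT = "year,cost\n2009,1250.50\n"

row = next(csv.DictReader(io.StringIO(CSV_TEXT)))

# Every value read from a CSV file is a text string to begin with
print(type(row["year"]), type(row["cost"]))

year = int(row["year"])    # whole number
cost = float(row["cost"])  # floating-point number
print(year + 1, cost * 2)
```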

To turn a WKT string into a geometry type, we can use a built-in RiskScape function called geom_from_wkt. Try adding the following bookmark to your project.ini file and then save it.

[bookmark building_centroids_csv]
location = data/Buildings_SE_Upolu_centroids.csv
set-attribute.geom = geom_from_wkt(WKT)

Note

Instead of WKT, sometimes the input data will contain point geometry, where each coordinate is a separate attribute, e.g. POINT_X and POINT_Y. Instead of geom_from_wkt(WKT), you can use the create_point(POINT_X, POINT_Y) RiskScape function to turn the individual coordinates into geometry.
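To get a feel for what these two conversions involve, here is a rough Python sketch that turns a WKT point string, or a pair of coordinate columns, into an (x, y) tuple. The helper names are hypothetical and this is not how geom_from_wkt or create_point are implemented. It only handles simple 2D points, whereas WKT can also describe lines, polygons, and more.

```python
import re

# Matches text such as "POINT (422324.13 8450527.52)"
POINT_PATTERN = re.compile(r"^\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*$")

def parse_wkt_point(wkt):
    """Convert a 2D WKT point string into an (x, y) tuple of floats."""
    match = POINT_PATTERN.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))

def point_from_columns(x_text, y_text):
    """Build the same kind of tuple from separate coordinate columns."""
    return float(x_text), float(y_text)

from_wkt = parse_wkt_point("POINT (422324.1392684035 8450527.521981074)")
from_columns = point_from_columns("422324.1392684035", "8450527.521981074")
print(from_wkt, from_columns)
```

Either way, the text in the CSV file ends up as numeric coordinates. Note that the coordinates alone still do not say which CRS they are in.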

Run the following command to use the new bookmark in your model:

riskscape model run exposure-by-region -p "input-exposures.layer=building_centroids_csv"

We still get the following error, but we have seen this problem before.

There was a problem with the parameters for wizard model
  - Could not apply the answer to the 'input-exposures.layer' parameter to your model
    - The given Geom type does not contain the required spatial meta-data (i.e. CRS). This
      could be because the input data comes from a CSV file and 'crs-name' needs to be set
      in the bookmark

In this case, we know the geometry data is in the EPSG:32702 CRS. Add a crs-name = EPSG:32702 line to your bookmark so that it looks like this:

[bookmark building_centroids_csv]
location = data/Buildings_SE_Upolu_centroids.csv
set-attribute.geom = geom_from_wkt(WKT)
crs-name = EPSG:32702

Tip

When you have CSV input data, you will always need to specify the set-attribute.geom and crs-name parameters for your bookmark.

Save your project.ini file and try using the updated bookmark in the ‘model run’ command:

riskscape model run exposure-by-region -p "input-exposures.layer=building_centroids_csv"

This time the model should successfully output a results file.

Note

With CSV data you may also have to specify the axis-order that the CRS is in, i.e. whether the coordinates are in lat,long or long,lat order. In this case the EPSG:32702 specification defines an easting, northing (i.e. long,lat) axis order so we don’t need to specify the axis-order manually. The Geometry Reference Guide has more details on Axis/Ordinate Order.

Testing your bookmark

RiskScape provides a way to easily see what your input data will look like when it is used in your model. This is particularly useful when dealing with CSV input data, where it is easy to get the CRS axis ordering wrong.

Using the riskscape bookmark evaluate BOOKMARK_NAME command will produce a shapefile that contains all the changes that your bookmark applies to the input data. This shapefile can then be easily viewed in your preferred GIS application.

You can try this yourself using the building_centroids_csv bookmark in the project.ini file.

riskscape bookmark evaluate building_centroids_csv

Bookmark formats

How RiskScape loads input data depends on the file format that the data is in.

In our bookmark examples so far, RiskScape has determined the file format based on the file extension. However, we can use the format parameter to specify explicitly what file format the data is in.

Try adding the following bookmark to your project.ini file and save it.

[bookmark Te_Araroa]
description = An online map of the Te Araroa trail, NZ
location = https://opendata.arcgis.com/api/v3/datasets/330fe731ff444471a45d88d8b681e53d_0/downloads/data?format=geojson&spatialRefId=4326
format = geojson

This hyperlink points to a map of the Te Araroa walking trail, in GeoJSON format. RiskScape can download remote data and use it in a model. However, we need to explicitly set the bookmark’s format in this case.
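One way to see why the format has to be given here is to look at what a file extension check would find. The sketch below is only an illustration using Python's standard library, with a hypothetical extension_of helper; it does not show how RiskScape itself detects formats.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

def extension_of(location):
    """Return the file extension of a relative path or URL ('' if none)."""
    path = urlparse(location).path
    return PurePosixPath(path).suffix.lower()

local_file = "data/Samoa_constituencies.shp"
remote_url = (
    "https://opendata.arcgis.com/api/v3/datasets/"
    "330fe731ff444471a45d88d8b681e53d_0/downloads/data"
    "?format=geojson&spatialRefId=4326"
)

print(repr(extension_of(local_file)))
print(repr(extension_of(remote_url)))
```

The local shapefile path ends in .shp, but the path part of the URL ends in plain "data", so there is no extension to go on.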

Check that RiskScape can load the bookmark’s data by running the following command:

riskscape bookmark info Te_Araroa

It should display output similar to the following:

"Te_Araroa"
  Description : An online map of the Te Araroa trail, NZ
  Location    : https://opendata.arcgis.com/api/v3/datasets/330fe731ff444471a45d88d8b681e53d_0/downloads/data?format=geojson&spatialRefId=4326
  Attributes  :
    geometry[Geom[crs=EPSG:4326]]
    OBJECTID[Integer]
    SEQUENCE[Integer]
    STATUS[Text]
    LENGTH[Floating]
    NAME[Text]
    ISLAND[Text]
    LEGALSTAT[Text]
    complete[Integer]
    Notes[Text]
    Fromkm[Floating]
    Tokm[Floating]
    category[Integer]
    Cycle[Integer]
    walkid[Integer]
    mapName[Text]
    link[Text]
    editor[Text]
    create_dt[Text]
    last_editor[Text]
    last_edit_dt[Text]
    SHAPE_Length[Floating]
  Axis-order  : long,lat / X,Y / Easting,Northing
  CRS code    : EPSG:4326
  CRS (full)  : GEOGCS["WGS84",
  DATUM["WGS84",
    SPHEROID["WGS84", 6378137.0, 298.257223563]],
  PRIMEM["Greenwich", 0.0],
  UNIT["degree", 0.017453292519943295],
  AXIS["Geodetic longitude", EAST],
  AXIS["Geodetic latitude", NORTH],
  AUTHORITY["Web Map Service CRS","84"]]
  Summarizing...
  Row count   : 482
  Bounds      : EPSG:4326 [167.8103 : 175.6674 East, -46.6253 : -34.4267 North] (original)

Supported formats

The file format can affect what bookmark parameters RiskScape will accept. For example, a shapefile bookmark will support some parameters that cannot be used with a GeoTIFF bookmark.

To see a list of supported input formats, use the command:

riskscape format list

To see what parameters a particular bookmark format supports, use the command:

riskscape format info FORMAT_NAME

Functions

Besides bookmarks, the other important piece of information that our project.ini file holds is functions.

Functions are typically written in Python and are used in the Consequence Analysis phase of the model workflow, to determine the impact or consequence that the hazard has on each element-at-risk.

You may recall the following points from the previous tutorial:

  • In general, RiskScape will call your function for each element-at-risk (i.e. building) in your exposure-layer. If your data contains 6,000 buildings, then your function will get called 6,000 times.

  • RiskScape will pass your function two values: the element-at-risk and the hazard intensity measure. We call these the function’s arguments.

  • The function’s return value gets added to the model’s results as the consequence attribute.
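The points above can be pictured as a simple loop. The following Python sketch is a conceptual illustration only: run_analysis, stand_in, and the sample data are hypothetical, and RiskScape's real model pipeline is considerably more involved.

```python
def run_analysis(buildings, hazard_values, function):
    """Call the function once per building and record the result."""
    results = []
    for building, hazard in zip(buildings, hazard_values):
        consequence = function(building, hazard)
        # The return value is stored alongside the building's own attributes
        results.append({**building, "consequence": consequence})
    return results

def stand_in(building, hazard):
    # Hypothetical stand-in: 1 if any hazard value was sampled, otherwise 0
    return 0 if hazard is None else 1

buildings = [{"ID": "1360"}, {"ID": "1361"}, {"ID": "1362"}]
hazard_values = [1.2, None, 0.05]  # None means no hazard data at that building

results = run_analysis(buildings, hazard_values, stand_in)
for row in results:
    print(row)
```

Three buildings means three function calls, and each result row gains a consequence attribute.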

Tip

If you are new to Python, or find the idea of RiskScape functions a little intimidating, then there is a simple RiskScape Hello, world exercise you could try first.

A simple function

Currently the exposure-by-region model uses the built-in is_exposed function. This returns 1 if the element-at-risk was exposed to any hazard data, and 0 if not.

Let’s try adding our own version of this function that applies a minimum threshold to the hazard intensity value. In the functions/ sub-directory there is a threshold.py file that contains the following Python code:

THRESHOLD = 0.1 # metres

def function(building, hazard):
    if hazard is None or hazard <= THRESHOLD:
        return 0
    else:
        return 1

Warning

This function is purely for demonstrative purposes and is not based on scientific methodology in any way.

Before we can use this function in our model, we have to tell RiskScape about it in our project.ini file. RiskScape needs to know:

  • where the Python code is located, i.e. its location.

  • what types of arguments the function expects, i.e. its argument-types.

  • what type of data the function returns, i.e. its return-type.

Add the following to your project.ini file and save it.

[function exceeds_threshold]
description = returns 1 if the hazard value exceeds a pre-determined threshold
location = functions/threshold.py
argument-types = [building: anything, hazard: nullable(floating)]
return-type = integer

The building argument type here is anything, which means we can pass any sort of exposure-layer data to our function.

The hazard argument here is nullable, which means a hazard intensity measure might not exist for every element-at-risk. For example, if a building falls outside the hazard bounds, then there will be no hazard intensity measure associated with it. In these cases our function will still be called, but the hazard argument will be nothing (None in Python).
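
Before running a model, it can help to call the function directly in an ordinary Python interpreter to see how it treats a missing hazard value. The sketch below repeats the threshold.py code and calls it by hand. The depths are made-up values, and the empty dictionary stands in for the building, which this function never reads.

```python
THRESHOLD = 0.1  # metres, same value as threshold.py


def function(building, hazard):
    if hazard is None or hazard <= THRESHOLD:
        return 0
    else:
        return 1


# The building argument is unused by this function, so an empty dict will do
checks = {
    'no hazard data': function({}, None),
    'below threshold': function({}, 0.05),
    'exactly at threshold': function({}, 0.1),
    'above threshold': function({}, 0.5),
}
for label, value in checks.items():
    print(label, value)
```

Only the last case returns 1. Checking hazard is None first matters, because comparing None with a number would raise a TypeError in Python 3.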

Tip

Using the anything type as a function argument can be a little inefficient for performance, but it is a simple way to get started defining your own RiskScape functions. If your hazard-layer is shapefile data, then you could use the anything type for it too, e.g. hazard: nullable(anything).

Run the following command to check that RiskScape now knows about the function:

riskscape function list

It should display the following:

+------------------+-------------------------------------+------------------------------------+-----------+---------------+
|id                |description                          |arguments                           |return-type|category       |
+------------------+-------------------------------------+------------------------------------+-----------+---------------+
|exceeds_threshold |returns 1 if the hazard value exceeds|[building: Anything, hazard:        |Integer    |UNASSIGNED     |
|                  |a pre-determined threshold           |Nullable[Floating]]                 |           |               |
|                  |                                     |                                    |           |               |
|is_exposed        |Simple function to check if an       |[exposure: Anything, hazard:        |Integer    |RISK_MODELLING |
|                  |element-at-risk is exposed to the    |Nullable[Anything], resource:       |           |               |
|                  |hazard. Returns 1 if the `hazard`    |Nullable[Anything]]                 |           |               |
|                  |argument is present (i.e. not null)  |                                    |           |               |
|                  |and 0 if not. Useful as a placeholder|                                    |           |               |
|                  |function in risk modelling as it     |                                    |           |               |
|                  |accepts any types for exposure,      |                                    |           |               |
|                  |hazard and optional resource.        |                                    |           |               |
+------------------+-------------------------------------+------------------------------------+-----------+---------------+

Now try using this new function in your model by running the following command:

riskscape model run exposure-by-region -p "analysis.function=exceeds_threshold"

It should produce an event-impact.csv file containing the following results.

Region,Exposed_buildings
,10
Aleipata Itupa i Lalo,495
Aleipata Itupa i Luga,309
Falealili,687
Lepa,257
Lotofaga,134

If you look closely, you will see the Exposed_buildings count is now lower, as buildings that were exposed to <= 10cm of tsunami inundation are now excluded from the results.

Tip

Using a threshold function like this might be useful for dealing with hazard data such as rainfall, wind-speed, or Peak Ground Acceleration (PGA). For example, a given element-at-risk might be exposed to hazard data, but the hazard intensity might be too small to cause any real damage.

Exposure-layer arguments

The consequence that your Python function produces can vary depending on what you are modelling. The consequence might be:

  • whether or not the building is exposed to the hazard. This is what we have been modelling so far.

  • the damage state of the building. This can measure the probability that a building will sustain a given level of damage, such as complete structural collapse.

  • the resulting loss. This is the cost to repair or replace the building.

The Python function examples we have covered so far have only used the hazard function argument. Our functions have all ignored the building data that is coming from the exposure-layer, but this data will be useful if we want to calculate the damage state or loss for the building.

The exposure-layer data gets passed to the function as a Python dictionary. If our function argument is called building, then we can access attributes from the exposure-layer using:

value = building['ATTRIBUTE_NAME']

Replace ATTRIBUTE_NAME with whatever exposure-layer attribute you are interested in, e.g. Use_Cat, Cons_Frame, etc. Remember that you can use the riskscape bookmark info command to see what attributes are present in your exposure-layer.

Note

You can also access the exposure-layer attributes by using building.get('ATTRIBUTE_NAME'). The difference is this approach will return None if the attribute doesn’t exist in the exposure-layer, whereas building['ATTRIBUTE_NAME'] will result in a Python KeyError exception and your model will stop.
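
The difference between the two access styles is ordinary Python dictionary behaviour, so it can be tried outside RiskScape. In this minimal sketch the building record and the Storeys attribute name are made up for illustration.

```python
# Made-up building record; the attribute values are illustrative only
building = {'Use_Cat': 'Residential', 'Cons_Frame': 'Timber'}

# Attribute exists: both styles return the same value
assert building['Cons_Frame'] == building.get('Cons_Frame')

# Attribute is missing: get() returns None, or a default if you supply one
missing = building.get('Storeys')
with_default = building.get('Storeys', 1)

# Attribute is missing: square brackets raise a KeyError
try:
    building['Storeys']
    raised = False
except KeyError:
    raised = True

print(missing, with_default, raised)
```

Using get() with a default keeps the model running, but it can also hide a misspelt attribute name, so square brackets are often the safer choice while developing.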

Let’s try a simple example of using an exposure-layer attribute. In the functions/ sub-directory there is a threshold_by_cons.py file. It is similar to the threshold.py function, except it uses a different threshold based on construction type.

def function(building, hazard):
    construction = building['Cons_Frame']
    if construction == 'Masonry':
        threshold = 0.2
    else:
        threshold = 0.1

    if hazard is None or hazard <= threshold:
        return 0
    else:
        return 1

Warning

This function is purely for demonstrative purposes and is not based on scientific methodology in any way.

Add the following to your project.ini file and save it.

[function threshold_by_construction]
description = simple example of checking the building construction type
location = functions/threshold_by_cons.py
argument-types = [building: anything, hazard: nullable(floating)]
return-type = integer

This definition is very similar to the previous INI file function definition. We have only changed the function’s name, the .py file location, and its description.

Tip

We recommend using underscores (_) rather than hyphens (-) in your function names.

Now try using this new function in your model by running the following command:

riskscape model run exposure-by-region -p "analysis.function=threshold_by_construction"

It should produce an event-impact.csv file containing the following results.

Region,Exposed_buildings
,10
Aleipata Itupa i Lalo,474
Aleipata Itupa i Luga,298
Falealili,665
Lepa,257
Lotofaga,122

You can see that the results have changed again to reflect the changed logic in our function.
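
One way to see why the counts drop is to compare the two functions on a single building. The sketch below restates the logic of both Python files side by side. The 0.15m depth and the two building records are made-up values.

```python
def exceeds_threshold(building, hazard):
    # Same logic as functions/threshold.py
    if hazard is None or hazard <= 0.1:
        return 0
    return 1


def threshold_by_construction(building, hazard):
    # Same logic as functions/threshold_by_cons.py
    threshold = 0.2 if building['Cons_Frame'] == 'Masonry' else 0.1
    if hazard is None or hazard <= threshold:
        return 0
    return 1


depth = 0.15  # made-up inundation depth in metres
masonry = {'Cons_Frame': 'Masonry'}
timber = {'Cons_Frame': 'Timber'}

print(exceeds_threshold(masonry, depth), threshold_by_construction(masonry, depth))
print(exceeds_threshold(timber, depth), threshold_by_construction(timber, depth))
```

A masonry building in 0.15m of water counts as exposed under the first function but not under the second, while the timber building is counted by both.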

Returning complex consequences

The consequence, or return value, of our function can also be made up of several different attributes. For example, we might want to calculate several different damage states, or return the losses for building and land damage separately.

In order to do this, our function simply needs to return a Python dictionary. However, we have to make sure the return-type in our INI file function definition matches the return value in our Python code.

In the functions/ sub-directory there is an exposure_level.py file that contains the following code:

def function(building, hazard_depth):
    result = {}

    if hazard_depth is None or hazard_depth <= 0:
        result['exposed'] = 0
        result['level'] = 'N/A'
        return result

    if hazard_depth > 3.0:
        level = 'Exposure >3.0m'
    elif hazard_depth > 2.0:
        level = 'Exposure >2.0m to <=3.0m'
    elif hazard_depth > 1.0:
        level = 'Exposure >1.0m to <=2.0m'
    else:
        level = 'Exposure >0.0m to <=1.0m'

    result['exposed'] = 1
    result['level'] = level

    return result

It returns two attributes:

  • exposed: whether or not the building was exposed to the hazard as 0 or 1, i.e. an integer.

  • level: the range of inundation the building falls into, as a text string.

In RiskScape, a set of related attributes is called a Struct. For example, the RiskScape model holds the building data from the exposure-layer in an exposure struct.

Add the following to your project.ini file and save it.

[function exposure_level]
description = example of a function that returns multiple things
location = functions/exposure_level.py
argument-types = [building: anything, hazard: nullable(floating)]
return-type = struct(exposed: integer, level: text)

Notice that the return-type line looks quite different this time. We now return a struct type, which contains two attributes: exposed (an integer) and level (a text string).
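
A quick local check can catch a mismatch between the Python dictionary and the declared struct before you run a model. The sketch below is only a sanity check in plain Python: matches_struct and EXPECTED are hypothetical names, the function is a shortened stand-in for exposure_level.py, and none of this reflects how RiskScape itself validates return values.

```python
def function(building, hazard_depth):
    # Shortened stand-in for exposure_level.py, for illustration only
    if hazard_depth is None or hazard_depth <= 0:
        return {'exposed': 0, 'level': 'N/A'}
    return {'exposed': 1, 'level': 'Exposure >0.0m'}


# Hypothetical mirror of: struct(exposed: integer, level: text)
EXPECTED = {'exposed': int, 'level': str}


def matches_struct(result, expected):
    # True if the dict has exactly the expected keys, with matching Python types
    if set(result) != set(expected):
        return False
    return all(isinstance(result[key], expected[key]) for key in expected)


print(matches_struct(function({}, None), EXPECTED))
print(matches_struct(function({}, 1.5), EXPECTED))
print(matches_struct({'exposed': 1}, EXPECTED))
```

The last call fails the check because the level attribute is missing from the dictionary.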

Tip

To see what built-in types are supported by RiskScape (i.e. integer, text, etc), you can use the riskscape type-registry list command.

Try using this function in a model by running the following command:

riskscape model run group-by-consequence -p "analysis.function=exposure_level"

We are using a different model this time (group-by-consequence), which aggregates the results by consequence rather than by region. It should produce an event-impact.csv file that contains the following results:

consequence.exposed,consequence.level,Total_buildings
0,N/A,4265
1,Exposure >0.0m to <=1.0m,421
1,Exposure >1.0m to <=2.0m,390
1,Exposure >2.0m to <=3.0m,469
1,Exposure >3.0m,715
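
The aggregation behind this table can be pictured as counting buildings per distinct consequence. The following is a rough Python sketch of that idea, not the pipeline that RiskScape actually runs, and the consequences list is made-up data.

```python
from collections import Counter

# Made-up consequences, one dict per building, as exposure_level.py returns them
consequences = [
    {'exposed': 0, 'level': 'N/A'},
    {'exposed': 1, 'level': 'Exposure >0.0m to <=1.0m'},
    {'exposed': 1, 'level': 'Exposure >3.0m'},
    {'exposed': 0, 'level': 'N/A'},
    {'exposed': 1, 'level': 'Exposure >3.0m'},
]

# Group by the (exposed, level) pair and count the buildings in each group
totals = Counter((c['exposed'], c['level']) for c in consequences)

print('consequence.exposed,consequence.level,Total_buildings')
for (exposed, level), count in sorted(totals.items()):
    print('{},{},{}'.format(exposed, level, count))
```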

Type definitions

When there are many different attributes we want to return, defining a struct type for the function’s return-type can get a little awkward. To make life easier, we can define our own struct types separately in the project.ini file.

For example, add the following to your project.ini file and save it.

[type exposure_result]
type.exposed = integer
type.level = text

This defines a struct type called exposure_result, which contains two attributes: exposed and level. We can now use this type by name (i.e. exposure_result) for any function’s return-type or argument-types.

In your project.ini file, modify the return-type line for your exposure_level function definition, so that it looks like this:

[function exposure_level]
description = example of a function that returns multiple things
location = functions/exposure_level.py
argument-types = [building: anything, hazard: nullable(floating)]
return-type = exposure_result

This function definition will work exactly the same as it did previously. Try it out by running the model command again:

riskscape model run group-by-consequence -p "analysis.function=exposure_level"

Errors in your function

Let’s look at what happens when something goes wrong with our function.

In the functions/ sub-directory there is a bad.py file. This tries to access an attribute that isn’t present in our exposure-layer data.

def function(building, hazard):
    construction = building['Bad_attribute']

    if hazard is None or hazard <= threshold:
        return 0
    else:
        return 1

Add the following to your project.ini file and save it.

[function bad_function]
description = the exposure-layer attributes do not match what function expects
location = functions/bad.py
argument-types = [building: anything, hazard: nullable(floating)]
return-type = integer

Now try using this new function in your model by running the following command:

riskscape model run exposure-by-region -p "analysis.function=bad_function"

It should produce the following error:

Problems found with wizard model
  - Execution of your data processing pipeline failed. The reasons for this follow:
    - Failed to evaluate `{*, consequence: map(hazard, hv -> bad_function(exposure, hv))}`
      - A problem occurred while executing the function 'bad_function'. Please check
        your Python code carefully for the likely cause.
        - KeyError: Bad_attribute - File
          "file:///C:/RiskScape_Projects/project-tutorial/functions/bad.py", line 2

This message tells us the details of the Python exception that occurred (KeyError for Bad_attribute) and the line number in the Python file that triggered the problem.

This is just an example of what function errors look like in RiskScape. You don’t have to fix up the bad.py Python code unless you want to.

Note

You will get this sort of error if you change your exposure-layer and it does not contain the attributes that your function expects. You can use RiskScape’s type system to detect this problem, if you specify a struct for the argument-types instead of using anything.
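
While you are still using anything, a similar check can be done by hand on a sample record. The helper below is hypothetical and not part of RiskScape, the records are made up, and it is not a substitute for declaring a struct type.

```python
REQUIRED_ATTRIBUTES = ['Cons_Frame']  # attributes the function reads


def missing_attributes(building, required=REQUIRED_ATTRIBUTES):
    # Hypothetical helper: list the required attributes absent from a record
    return [name for name in required if name not in building]


# Made-up records standing in for rows of two different exposure-layers
good_row = {'ID': 1, 'Cons_Frame': 'Timber'}
bad_row = {'ID': 2, 'Construction': 'Timber'}

print(missing_attributes(good_row))
print(missing_attributes(bad_row))
```

An empty list means the record has everything the function needs. A non-empty list names the attributes that would trigger a KeyError.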

Case study: damage state functions

The next example looks at how the research paper Evaluating building exposure and economic loss changes after the 2009 South Pacific Tsunami used a RiskScape function to calculate building damage.

This research used a fragility curve to determine the probability of damage to a building, based on a given tsunami hazard intensity measure. Five different damage states were used, from light non-structural damage (DS_1), through to complete structural collapse (DS_5).

The RiskScape function uses a log-normal Cumulative Distribution Function (CDF) to determine the conditional probability (between 0 and 1.0) of a building being in a given damage state as a result of the tsunami inundation.

The shape of the log-normal CDF curve will be different depending on the building’s construction material and the damage state being investigated. This means that different mean and standard deviation values will be used to build the log-normal CDF curve.

The Python code looks like this:

def function(building, hazard_depth):
    DS_1_Prob = 0.0
    DS_2_Prob = 0.0
    DS_3_Prob = 0.0
    DS_4_Prob = 0.0
    DS_5_Prob = 0.0
    construction = building["Cons_Frame"]

    if hazard_depth is not None and hazard_depth > 0:
        DS_1_Prob = log_normal_cdf(hazard_depth, -0.53, 0.46)

        if construction in ['Masonry', 'Steel']:
            DS_2_Prob = log_normal_cdf(hazard_depth, -0.33, 0.4)
            DS_3_Prob = log_normal_cdf(hazard_depth, 0.1, 0.35)
            DS_4_Prob = log_normal_cdf(hazard_depth, 0.26, 0.41)
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.39, 0.4)
        elif construction in ['Reinforced_Concrete', 'Reinforced Concrete']:
            DS_2_Prob = log_normal_cdf(hazard_depth, -0.33, 0.4)
            DS_3_Prob = log_normal_cdf(hazard_depth, 0.13, 0.56)
            DS_4_Prob = log_normal_cdf(hazard_depth, 0.53, 0.54)
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.86, 0.94)
        else: # 'Timber' or unknown
            DS_2_Prob = log_normal_cdf(hazard_depth, -0.33, 0.4)
            DS_3_Prob = log_normal_cdf(hazard_depth, 0.06, 0.38)
            DS_4_Prob = log_normal_cdf(hazard_depth, 0.1, 0.4)
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.1, 0.28)

    result = {}
    result['DS_1'] = DS_1_Prob
    result['DS_2'] = DS_2_Prob
    result['DS_3'] = DS_3_Prob
    result['DS_4'] = DS_4_Prob
    result['DS_5'] = DS_5_Prob
    return result

def log_normal_cdf(x, mean, stddev):
    # this uses the built-in RiskScape 'lognorm_cdf' function
    return functions.get('lognorm_cdf').call(x, mean, stddev)

Note

This function was provided by NIWA and has been refactored and adapted for this tutorial.

There are two things of note about this Python code:

  1. The Python file contains two functions. RiskScape will always try to use the def function(... block of Python code.

  2. A built-in RiskScape function (lognorm_cdf) is used to calculate the log-normal CDF. This is the functions.get('lognorm_cdf').call(... line in the code. You can find out more about this built-in function by entering the riskscape function info lognorm_cdf command.
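
To make the second point more concrete, a log-normal CDF can also be written in plain Python using only the standard math module. This is a sketch under an assumption: that the mean and stddev arguments describe the natural logarithm of the depth, which is the usual parameterisation. Check the output of riskscape function info lognorm_cdf before relying on it as a stand-in for the built-in function.

```python
import math


def log_normal_cdf(x, mean, stddev):
    # Log-normal CDF, where mean and stddev describe the natural log of x.
    # ASSUMPTION: this matches the parameter meaning of RiskScape's lognorm_cdf.
    if x <= 0:
        return 0.0
    z = (math.log(x) - mean) / (stddev * math.sqrt(2))
    return 0.5 * (1.0 + math.erf(z))


# DS_1 parameters taken from the function above
median_depth = math.exp(-0.53)
print(round(log_normal_cdf(median_depth, -0.53, 0.46), 3))
```

Under this parameterisation the probability is 0.5 at a depth of exp(mean), which is roughly 0.59m for DS_1, and it rises towards 1.0 as the depth increases.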

Note

Calling a built-in RiskScape function from Python is only possible if you use the Jython Python implementation. RiskScape Python functions use Jython by default, but you can switch to CPython instead. CPython is recommended if you want to import packages, such as numpy or scipy. The RiskScape documentation explains more about the differences between Jython and CPython.

In order to use this function, add the following to your project.ini file and save it.

[type building]
type.Cons_Frame = text

[type damage_states]
type.DS_1 = floating
type.DS_2 = floating
type.DS_3 = floating
type.DS_4 = floating
type.DS_5 = floating

[function Samoa_Building_Fragility]
description = Samoa tsunami fragility functions for buildings
location = functions/Samoa_Building_Fragility.py
argument-types = [building, hazard: nullable(floating)]
return-type = damage_states
framework = jython

As well as defining the function, this defines types that the function uses for its argument-types and return-type.

Note

The building struct we defined only has one attribute, but our exposure-layer input data has several more attributes. The argument-types only need to define the exposure-layer attributes that your function actually uses (Cons_Frame here). This will make your functions easier to reuse with different input data.

We also want to import the pre-existing building-fragility model into our project, which will use the new function. Go to the top of your project.ini file and add the line models = models/models_building-fragility.ini to the [project] section. The [project] section in your project.ini file should now look like this:

[project]
description = Initial project file. You will add more bookmarks and functions to it
models = models/models_exposure-by-region.ini
models = models/models_group-by-consequence.ini
models = models/models_building-fragility.ini

...

Try running the model with the following command:

riskscape model run building-fragility

This should produce an event-impact.csv results file. Open these results in a spreadsheet application.

The results are aggregated by region. As well as the total Exposed_buildings, we can also see a count of how many buildings have > 0.5 or > 0.9 probability of being in damage state 5 (complete structural collapse). Some percentiles are also recorded for damage state 5 and for inundation depth.
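
The tutorial does not show the pipeline behind these counts, but the idea of counting buildings above a probability cut-off can be sketched in a few lines of Python. The probabilities below are made-up values for a single region.

```python
# Made-up DS_5 probabilities for the buildings in one region
ds5_probabilities = [0.0, 0.12, 0.55, 0.93, 0.97, 0.5]

# Count the buildings whose probability is strictly above each cut-off
over_half = sum(1 for p in ds5_probabilities if p > 0.5)
over_ninety = sum(1 for p in ds5_probabilities if p > 0.9)

print(over_half, over_ninety)
```

A building with a probability of exactly 0.5 is not counted here, because the comparison is strictly greater than.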

Recap

Let’s review some of the key points we have covered so far:

  • The project.ini file holds the bookmarks and functions that the model will use.

  • Bookmarks configure the input data that RiskScape models can use.

  • The attributes in a RiskScape model correspond to the attributes that are present in the input data.

  • All the data in a RiskScape model has type information associated with it.

  • Bookmarks let you manipulate the input data before it gets used by the model.

  • The input data for RiskScape models always needs a geometry-type attribute present and a CRS defined.

  • File-paths and bookmarks can often be used interchangeably in RiskScape. In particular, shapefiles, GeoTIFFs, ESRI Grid, and GeoJSON files generally have all the information RiskScape needs, such as the CRS, saved as part of the file format.

  • The riskscape bookmark info command is a useful way to find out more about a file or bookmark, such as the attributes the data contains or its CRS.

  • You always need to define a bookmark in order to use CSV input data in a model. The bookmark will need to define set-attribute.geom and crs-name for the CSV data.

  • RiskScape can do some error-checking on the input data, such as whether the geometry is valid.

  • You can use the riskscape format info command to find out more about what parameters a bookmark supports.

  • RiskScape models use a Python function to determine the impact that the hazard has on each element-at-risk. The function’s return value becomes the consequence in the model’s results.

  • The function gets passed the exposure-layer input data, along with the hazard intensity measure. These values are called the function’s arguments.

  • A set of related attributes (i.e. attributes that come from the same input layer) is called a struct in RiskScape. In your Python function, a struct is simply a Python dictionary.

  • The hazard function argument is nullable. If no hazard intensity measure was determined, then your function will be passed a hazard value equal to None.

  • You can optionally define your own struct types in your project.ini. This can make it easier to define your functions. Alternatively, you can use anything for your function’s argument-types if you’re not sure what type the data is.

  • If there is a coding error in your Python function, then you will get the Python error reported when you try to use the function in a RiskScape model.

  • RiskScape uses the Jython Python implementation by default, but you can switch to CPython if you want to use packages like numpy or scipy.

Once you feel comfortable with project files, you could go through Recapping the basics.

Extra for experts

If you want to explore bookmarks and functions a little further, you could try the following exercises out on your own.

  • Practice adding a description to some of the bookmarks you created in the project.ini file. Try also using # to add a few INI file comments.

  • Some buildings are not assigned to any region when you run the exposure-by-region model. Try specifying the sample.areas-buffer parameter when you run the model. See if you can work out the buffer distance needed to assign all buildings to a region. Start off with 100m, 250m, 500m, 1000m, and so on.

  • Try creating a bookmark for the data/Building_XY_coords.csv file and use this as the exposure-layer in the exposure-by-region model. This file contains separate POINT_X, POINT_Y coordinate attributes for the geometry, so you will have to use create_point() instead of geom_from_wkt() in the bookmark.

  • Try creating a bookmark for the data/bad-data.csv file and use this as the exposure-layer in the exposure-by-region model. It will report warnings that rows are being skipped. See if you can identify the problem in the CSV file and fix it.

  • Try fixing up the bad_function code in functions/bad.py so that it works with the riskscape model run command.

  • In the functions/ sub-directory there is a buggy.py Python file that has a couple of problems with it. Add this function to your project and try using it in the exposure-by-region model. Look at the Python error that the riskscape model run command gives you and try to fix it in the buggy.py file. Re-run the command until the model runs successfully.

  • Try adding some debug output to the buggy.py Python function. Add the statements below to the Python code and then run the function in the exposure-by-region model again. Make sure you use the building centroid CSV as the exposure-layer, i.e. -p "input-exposures.layer=building_centroids_csv".

    if building['ID'] == '1000' or building['ID'] == '7000':
        print("ID: {} Cons_Frame: {} Use_Cat: {} hazard: {}".format(building['ID'],
             building['Cons_Frame'], building['Use_Cat'], hazard))
    
