.. _project-tutorial:

# Creating a RiskScape project

## Before we start

This tutorial is aimed at new users who want to start creating their own projects.
Projects need to be set up first, before you can build and run your own risk models in RiskScape.

We expect that you:

- Have completed the :ref:`wizard-how-to` guide and are familiar with building and running a RiskScape model.
- Have a basic understanding of geospatial data and risk analysis.
- Have some basic Python knowledge, or a willingness to learn.

The aim of this tutorial is to get you familiar with creating projects, bookmarks, and functions in RiskScape, so that you can build models on your own.

## Getting started

### Setup

Click [here](../project-tutorial.zip) to download the example project we will use in this guide.
Unzip the file into the :ref:`top_level_dir` where you keep your RiskScape projects.
For example, if your top-level projects directory is `C:\RiskScape_Projects\`, then your unzipped directory will be `C:\RiskScape_Projects\project-tutorial`.

Open a command prompt and `cd` to the directory where you unzipped the files, e.g.

```none
cd project-tutorial
```

You will use this command prompt to run the RiskScape commands in this tutorial.

The unzipped project contains a few sub-directories:

- `project-tutorial\data` contains the input data files we will use in this tutorial.
  This data is similar to the :ref:`Upolu tsunami data ` that we used in the previous tutorials.
- `project-tutorial\functions` contains Python files we will import as RiskScape functions.
- `project-tutorial\models` contains some pre-built models we will use to test our project as we go along.

.. note::
   This input data was provided by `NIWA `_, as well as the `PCRAFI `_ (Pacific Risk Information System) website.
   The data files have been adapted slightly for this tutorial.

There is also an initial `project-tutorial\project.ini` file that we will modify.
Open this `project.ini` file in Notepad (or your preferred text editor).
## Background

### Project INI files

You may have noticed from previous tutorials that RiskScape gets all its configuration information from a `project.ini` file.
This tells RiskScape things like what models can be run, and what input data should be used in the models.

The `project.ini` file is in the [INI format](https://en.wikipedia.org/wiki/INI_file) and can be modified in any plain-text editor, such as Notepad or `gedit`.
INI files contain key-value pairs, which are organized into sections.
Square brackets are used to indicate the start of a section.
A simple INI section might look something like this:

```ini
[section my-id]
key-one = some value
key-two = 2.0
```

In the previous tutorials, we have used INI files to save our model's parameters.
For example:

```ini
[model basic-exposure]
description = Simple example of a RiskScape model
framework = wizard
input-exposures.layer = data/Buildings_SE_Upolu.shp
input-hazards.layer = data/MaxEnv_All_Scenarios_50m.tif
sample.hazards-by = CLOSEST
analysis.function = is_exposed
```

Here the section starts with `model`, indicating that we are defining a RiskScape model, followed by the ID of the model (`basic-exposure`).
The lines that follow store the settings for the model's parameters as key-value pairs.

In addition to models, the `project.ini` also stores details about what goes into the models.
These are:

- The input data files to use, which RiskScape calls *bookmarks*.
- Python *functions* that will determine the impact the hazard has on each element-at-risk.

We will look into how to configure each of these in more detail.

.. tip::
   The idea behind the ``project.ini`` file is that it provides a way to organize your RiskScape models, much like a work-space, so that you can keep related models (i.e. ones that use similar data or functions) together.
   Completely unrelated models can go in a separate ``project.ini`` file in another directory.
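If you want to explore a `project.ini` file outside RiskScape, Python's standard `configparser` module can read the INI examples in this tutorial. The minimal sketch below parses the `basic-exposure` model and splits its section header into the kind of thing being defined and its ID. The `split_section` helper is hypothetical, and this is only a way to inspect the file yourself, not how RiskScape reads it. One detail to be aware of: `configparser` lowercases keys by default, so the sketch overrides `optionxform` to keep keys that contain capital letters intact.

```python
import configparser

# The basic-exposure model from the previous tutorials, as INI text.
PROJECT_INI = """
[model basic-exposure]
description = Simple example of a RiskScape model
framework = wizard
input-exposures.layer = data/Buildings_SE_Upolu.shp
input-hazards.layer = data/MaxEnv_All_Scenarios_50m.tif
sample.hazards-by = CLOSEST
analysis.function = is_exposed
"""


def split_section(name):
    """Hypothetical helper: split 'model basic-exposure' into ('model', 'basic-exposure')."""
    kind, _, ident = name.partition(" ")
    return kind, ident.strip()


# Only '=' separates keys from values, and '%' is not treated as interpolation.
parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
parser.optionxform = str  # keep key case (configparser lowercases keys by default)
parser.read_string(PROJECT_INI)

for section in parser.sections():
    kind, ident = split_section(section)
    print(f"{kind}: {ident}")
    for key, value in parser[section].items():
        print(f"  {key} = {value}")
```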
## Bookmarks

A RiskScape bookmark identifies a file that can be used as an input layer in a model.
Imagine your file system is a book: your bookmarks tell RiskScape what to use and how to use it.

### A simple bookmark

Let's look at a simple example.
Add the following to your `project.ini` file.

```ini
[bookmark Samoa_electoral_boundaries]
location = data/Samoa_constituencies.shp
```

Each RiskScape bookmark has an ID, which is the text that follows `[bookmark ...]`.
In this case, the bookmark ID is `Samoa_electoral_boundaries`.
All RiskScape bookmarks must also have a `location`, which specifies the input data to read.

.. note::
   Your bookmark's ID can contain spaces, e.g. ``[bookmark cool file]``.
   However, this makes some RiskScape commands slightly harder to use.
   You will need to enclose the bookmark ID in double-quotes when you use it on the command line, e.g. ``riskscape bookmark info "cool file"``.

Save the `project.ini` file and enter the following command in your terminal to check that RiskScape now knows about the new bookmark.

```none
riskscape bookmark list
```

You should see output similar to the following:

```none
+--------------------------+-----------+----------------------------------------------------------------------------+
|id                        |description|location                                                                    |
+--------------------------+-----------+----------------------------------------------------------------------------+
|Samoa_electoral_boundaries|           |file:///C:/RiskScape_Projects/project-tutorial/data/Samoa_constituencies.shp|
+--------------------------+-----------+----------------------------------------------------------------------------+
```

.. tip::
   You can add an optional ``description`` key for most things in the ``project.ini`` file.
   The description is purely to help you keep track of what each model/bookmark/function does.
   You can also add comments to the INI file by using ``#`` at the start of the line.
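The `location` column above shows the bookmark's relative `location` joined onto the project directory and written as a `file:///` URI. The sketch below reproduces that mapping with `configparser` and `pathlib`, assuming the project lives in `C:\RiskScape_Projects\project-tutorial` and that relative locations resolve against that directory. It illustrates the idea only; it is not RiskScape's actual resolution logic, and `bookmark_locations` is a hypothetical helper.

```python
import configparser
from pathlib import PureWindowsPath

# Assumed project directory, matching the example paths in this tutorial.
PROJECT_DIR = PureWindowsPath(r"C:\RiskScape_Projects\project-tutorial")

PROJECT_INI = """
[bookmark Samoa_electoral_boundaries]
location = data/Samoa_constituencies.shp
"""


def bookmark_locations(ini_text, project_dir):
    """Hypothetical helper: map each [bookmark ...] ID to a file:/// URI."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep key case
    parser.read_string(ini_text)
    locations = {}
    for section in parser.sections():
        kind, _, bookmark_id = section.partition(" ")
        if kind != "bookmark":
            continue  # skip [model ...] and [function ...] sections
        # Assumption: relative locations are resolved against the project directory.
        path = project_dir / parser[section]["location"]
        # These paths need no percent-encoding, so a plain prefix is enough here.
        locations[bookmark_id.strip()] = "file:///" + path.as_posix()
    return locations


for bookmark_id, uri in bookmark_locations(PROJECT_INI, PROJECT_DIR).items():
    print(f"{bookmark_id}: {uri}")
```

Running it prints the same URI as the `location` column in the `bookmark list` table.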
Enter the following command into your terminal to view more detailed information about the bookmark:

```none
riskscape bookmark info Samoa_electoral_boundaries
```

You should see output similar to the following:

```none
"Samoa_electoral_boundaries"
  Description :
  Location    : file:///C:/RiskScape_Projects/project-tutorial/data/Samoa_constituencies.shp
  Attributes  :
    the_geom[MultiPolygon[crs=EPSG:4326]]
    fid[Integer]
    NAME_1[Text]
    Region[Text]
  Axis-order  : long,lat / X,Y / Easting,Northing
  CRS code    : EPSG:4326
  CRS (full)  : GEOGCS["WGS 84",
  DATUM["World Geodetic System 1984",
    SPHEROID["WGS 84", 6378137.0, 298.257223563, AUTHORITY["EPSG","7030"]],
    AUTHORITY["EPSG","6326"]],
  PRIMEM["Greenwich", 0.0, AUTHORITY["EPSG","8901"]],
  UNIT["degree", 0.017453292519943295],
  AXIS["Geodetic longitude", EAST],
  AXIS["Geodetic latitude", NORTH],
  AUTHORITY["EPSG","4326"]]
Summarizing...
  Row count   : 43
  Bounds      : EPSG:4326 [-172.8041 : -171.3977 East, -14.0772 : -13.4398 North] (original)
```

This output is quite technical-looking, but it tells us a few useful things:

- The attributes that are present in the input data, i.e. `the_geom`, `fid`, `NAME_1`, and `Region`.
  It also shows us what *type* of data each attribute holds, e.g. `fid` is an `Integer` whereas `NAME_1` is a `Text` string.
- The [Coordinate Reference System](https://en.wikipedia.org/wiki/Spatial_reference_system) (CRS) (i.e. EPSG:4326 or WGS 84) and axis-order (i.e. `long,lat`) of the geometry.
- The number of rows of data the file holds (i.e. `Row count`).
- The geographic bounds of the data.

.. note::
   The CRS is an important part of the input data, which we will learn more about later.
   Conveniently, the CRS information for shapefiles is already all defined in a ``.prj`` file (i.e. ``Samoa_constituencies.prj``), so we don't have to worry about specifying a CRS for the bookmark.

### Manipulating the input data

The main benefit of bookmarks is that they tell RiskScape *how* to load the data into the model.
When you are working with Shapefiles, GeoTIFFs, and [ESRI Grid](https://en.wikipedia.org/wiki/Esri_grid) (i.e. `.asc`) files, most of what RiskScape needs to know is already encoded into the file format.
However, even with these file formats, bookmarks still allow you to manipulate the input data in useful ways.

Let's look at an example of this in action.
Your project comes with an `exposure-by-region` model, which is already defined in the `models/models_exposure-by-region.ini` file:

```ini
[model exposure-by-region]
framework = wizard
description = Produces a total count of buildings in each region exposed to tsunami inundation
input-exposures.layer = data/Buildings_SE_Upolu.shp
input-exposures.geoprocess = false
input-hazards.layer = data/MaxEnv_All_Scenarios_50m.tif
input-areas.layer = Samoa_electoral_boundaries
input-areas.geoprocess = false
sample.hazards-by = CLOSEST
analysis.function = is_exposed
report-event-impact.filter = consequence = 1
report-event-impact.group-by[0] = area
report-event-impact.aggregate[0] = count(*) as Exposed_buildings
report-event-impact.select[0] = area.Region as Region
report-event-impact.select[1] = Exposed_buildings
```

This model counts the number of exposed buildings by region (using our `Samoa_electoral_boundaries` bookmark as the *area-layer*), similar to models we have used in previous tutorials.
Run this model now, by entering the following command:

```none
riskscape model run exposure-by-region
```

It should produce an `output/exposure-by-region/TIMESTAMP/event-impact.csv` results file, where `TIMESTAMP` is the current date/time, e.g. `2022-01-13T17_38_25`.

We can use the `more "FILENAME"` command to quickly look at a text file's contents from the terminal, such as the `event-impact.csv` file produced here, e.g.

```none
more "output/exposure-by-region/TIMESTAMP/event-impact.csv"
```

.. tip::
   Forward slashes in file-paths generally work OK in the Windows Command Prompt, as long as you surround them in double-quotes, e.g. ``"output/some-file.csv"``.
   This means you can copy-paste the results filename from the URI that RiskScape displays.
   Simply select the text and use ``Ctrl`` + ``c`` and ``Ctrl`` + ``v`` to copy-paste in the Windows Command Prompt.

The `event-impact.csv` file should contain the following:

```none
Region,Exposed_buildings
,10
Aleipata Itupa i Lalo,526
Aleipata Itupa i Luga,340
Falealili,749
Lepa,288
Lotofaga,146
```

Now let's say we wanted a slightly different regional breakdown of the results.
The area-layer is just a parameter to the model, so RiskScape will let us replace the parameter with a different file.
Try running the following command to use the `data/ws_districts.shp` file as our area-layer.

```none
riskscape model run exposure-by-region -p "input-areas.layer=data/ws_districts.shp"
```

This time, instead of running our model, RiskScape gives us an error:

```none
There was a problem with the parameters for wizard model
  - Failed to load the saved model. Some parameters specified may be invalid. If you have altered parameters manually, try going through the interactive wizard again
    - Problems found with 'report-event-impact.select' parameter
      - Failed to validate 'select({area.Region as Region, Exposed_buildings})' step ...
        - Failed to validate expression '{area.Region as Region, Exposed_buildings}' ...
          - Could not find 'area.Region' among [area.the_geom, area.fid, area.District, Exposed_buildings]
```

#### Troubleshooting RiskScape errors

RiskScape errors are often nested like this.
The top problem describes the high-level operation that failed, and the subsequent problems then drill down into more and more specific context about what went wrong.

Let's look at these errors in more detail and try to work out what went wrong:

- The first error tells us there was a problem loading the saved model, possibly related to the model parameters that we used.
- The next error says the problem was specifically with the `report-event-impact.select` parameter.
  We didn't actually change that parameter at all.
  In our model, that parameter looks like this:

  ```ini
  report-event-impact.select[0] = area.Region as Region
  ```

- The next two errors specify the *pipeline step* and *expression* that failed.
  We will learn more about these concepts in subsequent tutorials.
- The final error tells us that the `area.Region` attribute does not exist.
  Only the `area.the_geom`, `area.fid`, and `area.District` attributes are present in the model.

So, what went wrong?
The attributes that are available in a RiskScape model depend on what input data the model uses.
In this case, it appears that our original area-layer has a `Region` attribute, but our new area-layer does not.

Let's confirm this by taking a closer look at our new area-layer.
Enter the following command:

```none
riskscape bookmark info "data/ws_districts.shp"
```

You can see from the output that the file does not contain a `Region` attribute, although it does have a `District` attribute instead, i.e.

```none
  Location    : file:///C:/RiskScape_Projects/project-tutorial/data/ws_districts.shp
  Attributes  :
    the_geom[MultiPolygon[crs=EPSG:4326]]
    fid[Integer]
    District[Text]
  ...
```

.. tip::
   In many cases, bookmarks and file paths can be used interchangeably in RiskScape.
   For example, here we passed a file path directly to the ``riskscape bookmark info`` command.
   This means you can use file paths as model parameters without necessarily creating bookmarks.

#### Consistent input data

In order to reuse the same model with different input files, some attributes in the input data (in this case, the `Region` attribute) will need to be consistent across the files.

The naive approach would be to manually rename the attribute in the input data, and re-save the shapefile.
However, this can be cumbersome and error-prone if you need to do it often.

RiskScape bookmarks can solve the problem for us.
Let's create a _new_ bookmark for this second area-layer shapefile.
Add the following to your `project.ini` file and save it.

```ini
[bookmark Samoa_districts]
location = data/ws_districts.shp
set-attribute.Region = District
```

The last line sets a new attribute called `Region`, which will hold whatever value is in the `District` attribute.

Enter the following command to see what the bookmark data looks like now:

```none
riskscape bookmark info Samoa_districts
```

You should see that there is now a new `Region` attribute in the output.
The original `District` attribute is also still present.

```none
"Samoa_districts"
  Description :
  Location    : file:///C:/RiskScape_Projects/project-tutorial/data/ws_districts.shp
  Attributes  :
    the_geom[MultiPolygon[crs=EPSG:4326]]
    fid[Integer]
    District[Text]
    Region[Text]
  ...
```

Now enter the following command to use our new bookmark in the model.

```none
riskscape model run exposure-by-region -p "input-areas.layer=Samoa_districts"
```

This time the model runs successfully because all the attributes it needs are present in the input data.

.. note::
   In this case we simply copied an existing attribute in the input data, but you can manipulate the data in more complicated ways.
   For example, you could convert imperial units into the metric system using: ``set-attribute.metres = feet / 3.281``

.. _bookmark_filter:

### Filtering

Let's take a quick look at the `event-impact.csv` results file that the last `riskscape model run` command produced.
Use `more "output/MODEL/TIMESTAMP/event-impact.csv"` to look at the results, e.g.

```none
more "output/exposure-by-region/2022-01-13T17_38_25/event-impact.csv"
Region,Exposed_buildings
Aleipata Itupa i Lalo,507
Aleipata Itupa i Luga,339
Falealili,749
Lepa,283
Lotofaga,146
Marine Area,35
```

If you look carefully, you will notice there is a 'Marine Area' region now present in the results.
Our model now thinks some buildings are located in the sea, which is not ideal.
Often area-layer shapefiles will contain polygons that denote bodies of water.
However, we generally want to ignore these areas in our model.

Bookmarks also let us *filter* the input data so that only certain rows of data are included in the model.
We can specify a true/false condition, and only input data that satisfies that condition will be used in the model.

In your `project.ini` file, add the following line to your `Samoa_districts` bookmark, and save the file.

```ini
filter = Region != 'Marine Area'
```

Your bookmark should now look like this:

```ini
[bookmark Samoa_districts]
location = data/ws_districts.shp
set-attribute.Region = District
filter = Region != 'Marine Area'
```

.. note::
   We are using a ``!=`` condition here, because we want to *exclude* a specific row of data, i.e. include everything *except* the 'Marine Area' row of data.

Now try using the updated area-layer bookmark in your model by running the following command:

```none
riskscape model run exposure-by-region -p "input-areas.layer=Samoa_districts"
```

Take a look at the `event-impact.csv` file that the model produces.
It should look like this:

```none
more "output/exposure-by-region/2022-01-13T18_05_00/event-impact.csv"
Region,Exposed_buildings
,19
Aleipata Itupa i Lalo,518
Aleipata Itupa i Luga,341
Falealili,749
Lepa,286
Lotofaga,146
```

The 'Marine Area' is no longer present in the results, although we now have 19 buildings that were not matched to any region.

If you look carefully, you will notice that 35 buildings were previously matched to the 'Marine Area', but now only 19 buildings have no region.
This is because some buildings (16) were straddling a regional boundary.
The model uses 'closest' spatial matching for the area-layer.
When a building intersects *two* regions, it is assigned to the region that is closest to the building's centroid.
When we removed the 'Marine Area', those 16 buildings were left intersecting only _one_ region, so they were matched to that region instead.
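In plain-Python terms, the two extra lines in the `Samoa_districts` bookmark behave roughly like the sketch below: `set-attribute.Region = District` adds a `Region` attribute copied from `District` (keeping `District`), and `filter = Region != 'Marine Area'` keeps only the rows where the condition is true. The rows are made-up placeholders rather than the real contents of `ws_districts.shp`, and RiskScape evaluates its own expression language, not Python; this only illustrates the order of operations the tutorial relies on (the filter can use `Region` because it is set first).

```python
# Made-up rows standing in for features read from ws_districts.shp.
rows = [
    {"fid": 1, "District": "Falealili"},
    {"fid": 2, "District": "Lepa"},
    {"fid": 3, "District": "Marine Area"},
]


def apply_bookmark(rows):
    """Mimic the Samoa_districts bookmark: set-attribute first, then filter."""
    result = []
    for row in rows:
        # set-attribute.Region = District: add a new attribute, keep the old one.
        row = {**row, "Region": row["District"]}
        # filter = Region != 'Marine Area': keep only rows where this is true.
        if row["Region"] != "Marine Area":
            result.append(row)
    return result


for row in apply_bookmark(rows):
    print(row)
```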
To assign *all* buildings to a region, including the 19 unmatched buildings, we could potentially use the `sample.areas-buffer` model parameter, like we did in the previous tutorial.

.. tip::
   The bookmark ``filter`` parameter essentially works the same as the 'filter' *geoprocessing* option in the wizard.
   Using the wizard can make it easier to build filter expressions.

### Problematic input data

Dealing with real world data can sometimes be a little messy.
Let's look at some examples of how RiskScape deals with problematic data.

In the `data/` sub-directory, there is also a `problematic.shp` file.
Try running the following command to use it as the model's area-layer.

```none
riskscape model run exposure-by-region -p "input-areas.layer=data/problematic.shp"
```

You should see an error message like this:

```none
15:29:14.642 [main] WARN n.o.r.e.d.r.FeatureSourceBookmarkResolver - No crs could be parsed for feature source from file:///C:/RiskScape_Projects/project-tutorial/data/problematic.shp, falling back to generic 2d
There was a problem with the parameters for wizard model
  - Could not apply the answer to the 'input-areas.layer' parameter to your model
    - The given Geom type does not contain the required spatial meta-data (i.e. CRS). This could be because the input data comes from a CSV file and 'crs-name' needs to be set in the bookmark
```

The error tells us that RiskScape could not read the CRS information for this shapefile.
If you look closely at the `data/` sub-directory, you will see that the `.prj` file that contains all the shapefile's CRS information is actually missing, i.e. there is no `problematic.prj` file.

.. tip::
   In Windows Command Prompt, you can use the ``dir`` command to get a list of any matching files in a directory, e.g. ``dir data\problematic.prj``

Let's try doing what the error suggests and create a bookmark with `crs-name` set.
We know the CRS for this file _should_ be [EPSG:4326](https://epsg.io/4326), or WGS 84, so add the following to your `project.ini` file and save it.
```ini
[bookmark problematic]
location = data/problematic.shp
crs-name = EPSG:4326
```

Now, try running the following command to use the new bookmark in the model:

```none
riskscape model run exposure-by-region -p "input-areas.layer=problematic"
```

This time the model runs to completion.
However, we still see some warnings about invalid input data:

```none
WARNING: An invalid row of input data has been skipped
  - An invalid geometry which cannot be fixed automatically has been detected. Caused by: Invalid Coordinate at or near point (NaN, -172.03240134903). Refer to the Geometry reference in the RiskScape documentation for tips on how to avoid this. The row containing this geometry was: {fid=999, Region=Bad geo…}
WARNING: Problems found with 'problematic' bookmark in location file:///C:/RiskScape_Projects/project-tutorial/data/problematic.shp
  - Invalid geometry has been detected and fixed automatically. Refer to the Geometry reference in the RiskScape documentation for tips on how to avoid this. The record containing this geometry was: {fid=1, Region=Marine …}
```

These warnings tell us that RiskScape encountered invalid geometry in the input data.
The first message tells us that a row of input data was _skipped_ because it contained invalid geometry.
This means that this particular row of input data was omitted from our model.
The second message also deals with invalid geometry, but this time RiskScape _fixed_ the geometry for us and continued to use it in the model.

.. note::
   Under the *Reference Guides* in RiskScape's documentation, there is a page on Geometry that contains more details about :ref:`invalid-geometry`.

If you want to, you can control what RiskScape does in these situations using bookmark parameters:

- The `skip-invalid` bookmark parameter determines what RiskScape should do when an invalid row of input data is detected.
  By default, the invalid row is simply skipped and RiskScape continues, but you can make the `riskscape model run` command stop with an error instead by using `skip-invalid = false`.
- `validate-geometry` controls whether or not RiskScape validates geometry and attempts to fix it.

.. tip::
   The default bookmark settings *should* be sufficient for most modelling, so you shouldn't need to worry too much about changing these bookmark parameters.

### Using CSV data

Let's try another bookmark example.
This time we will replace the model's *exposure-layer*.

We have a `data/Buildings_SE_Upolu_centroids.csv` Comma Separated Values (CSV) file that contains building centroid data for south-eastern Upolu.
If you use the `more` command to look at this file, you will see data like the following:

```none
more "data/Buildings_SE_Upolu_centroids.csv"
WKT,ID,Use_Cat,Cons_Frame
POINT (422324.1392684035 8450527.521981074),1360,Outbuilding,Masonry
POINT (422192.23654263915 8450396.489492511),1361,Residential,Masonry
POINT (422204.39138965635 8450380.92939743),1362,Outbuilding,Masonry
POINT (422208.9813466044 8450102.043773355),1607,Residential,Masonry
POINT (422219.40361522196 8450115.30060319),1608,Residential,Masonry
...
```

.. note::
   The first column of this CSV file contains a ``WKT`` attribute that stores geometry information in Well-Known Text (WKT) format.
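You can see why a CSV column such as `WKT` is not geometry yet by reading the same rows with Python's standard `csv` module: every value, including the WKT string, comes back as plain text. The sketch below also parses `POINT (x y)` strings into numbers, as a rough illustration of the work a WKT parser does. The `parse_point_wkt` helper is hypothetical, handles only 2D points, and is not RiskScape's `geom_from_wkt`.

```python
import csv
import io
import re

# The first rows of data/Buildings_SE_Upolu_centroids.csv, as shown above.
CSV_TEXT = """WKT,ID,Use_Cat,Cons_Frame
POINT (422324.1392684035 8450527.521981074),1360,Outbuilding,Masonry
POINT (422192.23654263915 8450396.489492511),1361,Residential,Masonry
"""

# Matches 2D point WKT such as 'POINT (422324.13 8450527.52)'.
POINT_RE = re.compile(r"^\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*$", re.IGNORECASE)


def parse_point_wkt(wkt):
    """Hypothetical helper: turn 'POINT (x y)' text into an (x, y) float tuple."""
    match = POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D POINT: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for row in rows:
    # Every value from the CSV reader is a str, much like RiskScape's Text type.
    print({name: type(value).__name__ for name, value in row.items()})
    print(parse_point_wkt(row["WKT"]))
```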
Try using this CSV file in the model using the following command:

```none
riskscape model run exposure-by-region -p "input-exposures.layer=data/Buildings_SE_Upolu_centroids.csv"
```

You should see the following error this time:

```none
There was a problem with the parameters for wizard model
  - Could not apply the answer to the 'input-exposures.layer' parameter to your model
    - Geometry attribute required but none found in {WKT=>Text, ID=>Text, Use_Cat=>Text, Cons_Frame=>Text}
```

Each input layer in the RiskScape model needs to contain some form of geometry, but RiskScape couldn't find any geometry in our exposure-layer input data.

Let's take a look at the attributes that this CSV file contains by running the following command:

```none
riskscape bookmark info "data/Buildings_SE_Upolu_centroids.csv"
```

It should produce the following output:

```none
  Location    : file:///C:/RiskScape_Projects/project-tutorial/data/Buildings_SE_Upolu_centroids.csv
  Attributes  :
    WKT[Text]
    ID[Text]
    Use_Cat[Text]
    Cons_Frame[Text]
Summarizing...
  Row count   : 6260
```

Each attribute in this output has a name as well as a _data type_, which is in the square brackets.
So RiskScape can see the `WKT` attribute in the input data, but it has a `Text` string type rather than a `Geometry` type, which is what RiskScape needs.

.. note::
   All the data in a RiskScape model has type information associated with it.
   With shapefiles, the attribute data types are saved as part of the file format.
   However, attributes in a CSV file are *always* ``Text`` type by default.

.. _set_attribute_type:

### Types

We can use the `set-attribute` bookmark parameter to change the underlying type of the input data.
Converting CSV attributes into numeric data is pretty simple in RiskScape.
It looks similar to using type casts in Python, for example:

```ini
# below converts 'year' attribute to an integer (i.e. a whole number)
set-attribute.year = int(year)
# below converts 'cost' into a floating-point number (i.e. with a decimal place)
set-attribute.cost = float(cost)
```

Here, the `int(year)` line is an example of a _RiskScape expression_.
It is actually calling the built-in RiskScape `int()` function, which converts a text-string into an integer.

To turn a WKT string into a geometry type, we can use a built-in RiskScape _function_ called `geom_from_wkt`.
Try adding the following bookmark to your `project.ini` file and then save it.

```ini
[bookmark building_centroids_csv]
location = data/Buildings_SE_Upolu_centroids.csv
set-attribute.geom = geom_from_wkt(WKT)
```

.. note::
   Instead of WKT, sometimes the input data will store point geometry as *separate* coordinate attributes, e.g. ``POINT_X`` and ``POINT_Y``.
   Instead of ``geom_from_wkt(WKT)``, you can use the ``create_point(POINT_X, POINT_Y)`` RiskScape function to turn the individual coordinates into geometry.

Run the following command to use the new bookmark in your model:

```none
riskscape model run exposure-by-region -p "input-exposures.layer=building_centroids_csv"
```

We still get the following error, but we have seen this problem before.

```none
There was a problem with the parameters for wizard model
  - Could not apply the answer to the 'input-exposures.layer' parameter to your model
    - The given Geom type does not contain the required spatial meta-data (i.e. CRS). This could be because the input data comes from a CSV file and 'crs-name' needs to be set in the bookmark
```

In this case, we know the geometry data is in the [EPSG:32702](https://epsg.io/32702) CRS.
Add a `crs-name = EPSG:32702` line to your bookmark so that it looks like this:

```ini
[bookmark building_centroids_csv]
location = data/Buildings_SE_Upolu_centroids.csv
set-attribute.geom = geom_from_wkt(WKT)
crs-name = EPSG:32702
```

.. tip::
   When you have CSV input data, you will *always* need to specify the ``set-attribute.geom`` *and* ``crs-name`` parameters for your bookmark.
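The `set-attribute` lines behave much like type casts applied to every row, and `crs-name` supplies the spatial context that bare coordinates lack. The sketch below mirrors the `int(year)`, `float(cost)` and `create_point(POINT_X, POINT_Y)` examples in plain Python on a single made-up row. The `year`, `cost`, `POINT_X` and `POINT_Y` columns come from the examples above rather than the tutorial data, and `Point` is a hypothetical stand-in because Python has no built-in geometry type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical stand-in for a geometry: coordinates plus the CRS they use."""
    x: float
    y: float
    crs: str


# A made-up CSV row: every value starts out as text.
raw = {"year": "1998", "cost": "250000.50", "POINT_X": "422324.14", "POINT_Y": "8450527.52"}


def apply_set_attributes(row, crs_name):
    """Mimic set-attribute lines plus crs-name for one row (original row untouched)."""
    converted = dict(row)
    converted["year"] = int(row["year"])      # set-attribute.year = int(year)
    converted["cost"] = float(row["cost"])    # set-attribute.cost = float(cost)
    # set-attribute.geom = create_point(POINT_X, POINT_Y), tagged with crs-name.
    # Without a CRS the coordinates are just numbers with no known location.
    converted["geom"] = Point(float(row["POINT_X"]), float(row["POINT_Y"]), crs_name)
    return converted


print(apply_set_attributes(raw, "EPSG:32702"))
```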
Save your `project.ini` file and try using the updated bookmark with the `riskscape model run` command:

```none
riskscape model run exposure-by-region -p "input-exposures.layer=building_centroids_csv"
```

This time the model should successfully output a results file.

.. note::
   With CSV data you may also have to specify the axis-order that the CRS is in, i.e. whether the coordinates are in ``lat,long`` or ``long,lat`` order.
   In this case the EPSG:32702 specification defines an *easting, northing* (i.e. ``long,lat``) axis order, so we don't need to specify the axis-order manually.
   The Geometry Reference Guide has more details on :ref:`crs-lat-long`.

#### Testing your bookmark

RiskScape provides a way to easily see what your input data will look like when it is used in your model.
This is particularly useful when dealing with CSV input data, where it is easy to get the CRS axis ordering wrong.

Using the `riskscape bookmark evaluate BOOKMARK_NAME` command will produce a shapefile that contains all the changes that your bookmark applies to the input data.
This shapefile can then be easily viewed in your preferred GIS application.

You can try this yourself using the `building_centroids_csv` bookmark in the `project.ini` file.

```none
riskscape bookmark evaluate building_centroids_csv
```

### Bookmark formats

How RiskScape loads input data depends on the file _format_ that the data is in.
In our bookmark examples so far, RiskScape has determined the file format based on the file extension.
However, we can use the `format` parameter to specify explicitly what file format the data is in.

Try adding the following bookmark to your `project.ini` file and save it.
```ini
[bookmark Te_Araroa]
description = An online map of the Te Araroa trail, NZ
location = https://opendata.arcgis.com/api/v3/datasets/330fe731ff444471a45d88d8b681e53d_0/downloads/data?format=geojson&spatialRefId=4326
format = geojson
```

This hyperlink points to a map of the [Te Araroa](https://www.teararoa.org.nz) walking trail, in GeoJSON format.
RiskScape can download remote data and use it in a model.
However, we need to explicitly set the bookmark's `format` in this case, since the URL has no file extension for RiskScape to go by.

Check that RiskScape can load the bookmark's data by running the following command:

```none
riskscape bookmark info Te_Araroa
```

It should display output similar to the following:

```none
"Te_Araroa"
  Description : An online map of the Te Araroa trail, NZ
  Location    : https://opendata.arcgis.com/api/v3/datasets/330fe731ff444471a45d88d8b681e53d_0/downloads/data?format=geojson&spatialRefId=4326
  Attributes  :
    geometry[Geom[crs=EPSG:4326]]
    OBJECTID[Integer]
    SEQUENCE[Integer]
    STATUS[Text]
    LENGTH[Floating]
    NAME[Text]
    ISLAND[Text]
    LEGALSTAT[Text]
    complete[Integer]
    Notes[Text]
    Fromkm[Floating]
    Tokm[Floating]
    category[Integer]
    Cycle[Integer]
    walkid[Integer]
    mapName[Text]
    link[Text]
    editor[Text]
    create_dt[Text]
    last_editor[Text]
    last_edit_dt[Text]
    SHAPE_Length[Floating]
  Axis-order  : long,lat / X,Y / Easting,Northing
  CRS code    : EPSG:4326
  CRS (full)  : GEOGCS["WGS84",
  DATUM["WGS84",
    SPHEROID["WGS84", 6378137.0, 298.257223563]],
  PRIMEM["Greenwich", 0.0],
  UNIT["degree", 0.017453292519943295],
  AXIS["Geodetic longitude", EAST],
  AXIS["Geodetic latitude", NORTH],
  AUTHORITY["Web Map Service CRS","84"]]
Summarizing...
  Row count   : 482
  Bounds      : EPSG:4326 [167.8103 : 175.6674 East, -46.6253 : -34.4267 North] (original)
```

#### Supported formats

The file format can affect what bookmark parameters RiskScape will accept.
For example, a shapefile bookmark will support some parameters that cannot be used with a GeoTIFF bookmark.
To see a list of supported input formats, use the command:

```none
riskscape format list
```

To see what parameters a particular bookmark format supports, use the command:

```none
riskscape format info FORMAT_NAME
```

## Functions

Besides bookmarks, the other important piece of information that our `project.ini` file holds is _functions_.
Functions are typically written in Python and are used in the _Consequence Analysis_ phase of the model workflow, to determine the _impact_ or _consequence_ that the hazard has on each element-at-risk.

You may recall the following points from the previous tutorial:

- In general, RiskScape will call your function for _each_ element-at-risk (i.e. building) in your exposure-layer.
  If your data contains 6,000 buildings, then your function will get called 6,000 times.
- RiskScape will pass your function two values: the element-at-risk and the hazard intensity measure.
  We call these the function's _arguments_.
- The function's return value gets added to the model's results as the `consequence` attribute.

.. tip::
   If you are new to Python, or find the idea of RiskScape functions a little intimidating, then there is a simple RiskScape :ref:`Hello, world ` exercise you could try first.

### A simple function

Currently the `exposure-by-region` model uses the built-in `is_exposed` function.
This returns `1` if the element-at-risk was exposed to _any_ hazard data, and `0` if not.

Let's try adding our own version of this function that applies a minimum _threshold_ to the hazard intensity value.
In the `functions/` sub-directory there is a `threshold.py` file that contains the following Python code:

```python
THRESHOLD = 0.1 # metres

def function(building, hazard):
    if hazard is None or hazard <= THRESHOLD:
        return 0
    else:
        return 1
```

.. warning::
   This function is purely for demonstrative purposes and is **not** based on scientific methodology in any way.
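Because `threshold.py` is ordinary Python, you can check its behaviour before wiring it into RiskScape. The sketch below copies the function and calls it the way the tutorial describes RiskScape calling it: once per building, with a hazard value, or `None` when the building has no hazard intensity measure. The building records and depths are made-up test values chosen to hit each branch, including a value exactly at the threshold.

```python
THRESHOLD = 0.1  # metres


def function(building, hazard):
    # Same logic as functions/threshold.py.
    if hazard is None or hazard <= THRESHOLD:
        return 0
    else:
        return 1


# (building, hazard) pairs: made-up values covering each branch.
cases = [
    ({"ID": 1}, None),   # outside the hazard-layer, so no hazard value
    ({"ID": 2}, 0.05),   # below the threshold
    ({"ID": 3}, 0.1),    # exactly at the threshold (not counted, because of <=)
    ({"ID": 4}, 2.3),    # well above the threshold
]

results = [function(building, hazard) for building, hazard in cases]
print(results)
```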

Before we can use this function in our model, we have to tell RiskScape about it in our `project.ini` file. RiskScape needs to know:

- where the Python code is located, i.e. its `location`.
- what _types_ of arguments the function expects, i.e. its `argument-types`.
- what _type_ of data the function returns, i.e. its `return-type`.

Add the following to your `project.ini` file and save it.

```ini
[function exceeds_threshold]
description = returns 1 if the hazard value exceeds a pre-determined threshold
location = functions/threshold.py
argument-types = [building: anything, hazard: nullable(floating)]
return-type = integer
```

The `building` argument type here is `anything`, which means we can pass any sort of exposure-layer data to our function.

The `hazard` argument here is `nullable`, which means a hazard intensity measure might not exist for _every_ element-at-risk. For example, if a building falls outside the hazard bounds, then there will be no hazard intensity measure associated with it. In these cases our function will still be called, but the `hazard` argument will be nothing (`None` in Python).

.. tip::
   Using the ``anything`` type as a function argument can be slightly less efficient, but it is a simple way to get started defining your own RiskScape functions. If your hazard-layer is shapefile data, then you could use the ``anything`` type for it too, e.g. ``hazard: nullable(anything)``.
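
Because the `hazard` argument is nullable, it is worth checking that your function copes with `None` before running a whole model. A minimal sketch, assuming you copy the code from `functions/threshold.py` into a scratch Python file: call `function` directly with plain Python values. RiskScape is not involved here, so the empty dictionary passed as `building` is only a stand-in for the exposure-layer data (the threshold function never reads it).

```python
# Local check of the logic copied from functions/threshold.py.
# This runs outside RiskScape and only exercises the Python logic.
THRESHOLD = 0.1  # metres


def function(building, hazard):
    if hazard is None or hazard <= THRESHOLD:
        return 0
    else:
        return 1


# Stand-in for the exposure-layer data; threshold.py never reads it.
building = {}

print(function(building, None))  # no hazard intensity measure: prints 0
print(function(building, 0.05))  # below the threshold: prints 0
print(function(building, 0.1))   # equal to the threshold: prints 0
print(function(building, 0.25))  # above the threshold: prints 1
```

Checking the boundary value (exactly `0.1`) is useful here, because the `<=` comparison means a building at the threshold is *not* counted as exposed.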

Run the following command to check that RiskScape now knows about the function:

```none
riskscape function list
```

It should display the following:

```none
+------------------+-------------------------------------+------------------------------------+-----------+---------------+
|id                |description                          |arguments                           |return-type|category       |
+------------------+-------------------------------------+------------------------------------+-----------+---------------+
|exceeds_threshold |returns 1 if the hazard value exceeds|[building: Anything, hazard:        |Integer    |UNASSIGNED     |
|                  |a pre-determined threshold           |Nullable[Floating]]                 |           |               |
|                  |                                     |                                    |           |               |
|is_exposed        |Simple function to check if an       |[exposure: Anything, hazard:        |Integer    |RISK_MODELLING |
|                  |element-at-risk is exposed to the    |Nullable[Anything], resource:       |           |               |
|                  |hazard. Returns 1 if the `hazard`    |Nullable[Anything]]                 |           |               |
|                  |argument is present (i.e. not null)  |                                    |           |               |
|                  |and 0 if not. Useful as a placeholder|                                    |           |               |
|                  |function in risk modelling as it     |                                    |           |               |
|                  |accepts any types for exposure,      |                                    |           |               |
|                  |hazard and optional resource.        |                                    |           |               |
+------------------+-------------------------------------+------------------------------------+-----------+---------------+
```

Now try using this new function in your model by running the following command:

```none
riskscape model run exposure-by-region -p "analysis.function=exceeds_threshold"
```

It should produce an `event-impact.csv` file containing the following results:

```none
Region,Exposed_buildings
,10
Aleipata Itupa i Lalo,498
Aleipata Itupa i Luga,318
Falealili,704
Lepa,264
Lotofaga,138
```

If you look closely, you will see the `Exposed_buildings` count is now lower, as buildings that were exposed to 10cm or less of tsunami inundation are now excluded from the results.

.. tip::
   Using a threshold function like this might be useful for dealing with hazard data such as rainfall, wind-speed, or Peak Ground Acceleration (PGA).
   For example, a given element-at-risk might be exposed to hazard data, but the hazard intensity might be too small to cause any real damage.

### Exposure-layer arguments

The consequence that your Python function produces can vary depending on what you are modelling. The consequence might be:

- whether or not the building is exposed to the hazard. This is what we have been modelling so far.
- the _damage state_ of the building. This can measure the probability that a building will sustain a given level of damage, such as complete structural collapse.
- the resulting loss. This is the cost to repair or replace the building.

The Python function examples we have covered so far have only used the `hazard` function argument. Our functions have all ignored the building data that is coming from the exposure-layer, but this data will be useful if we want to calculate the damage state or loss for the building.

The exposure-layer data gets passed to the function as a Python dictionary. If our function argument is called `building`, then we can access attributes from the exposure-layer using:

```python
value = building['ATTRIBUTE_NAME']
```

Replace `ATTRIBUTE_NAME` with whatever exposure-layer attribute you are interested in, e.g. `Use_Cat`, `Cons_Frame`, etc. Remember that you can use the ``riskscape bookmark info`` command to see what attributes are present in your exposure-layer.

.. note::
   You can also access the exposure-layer attributes by using ``building.get('ATTRIBUTE_NAME')``. The difference is that this approach will return ``None`` if the attribute doesn't exist in the exposure-layer, whereas ``building['ATTRIBUTE_NAME']`` will raise a Python ``KeyError`` exception and your model will stop.

Let's try a simple example of using an exposure-layer attribute. In the `functions/` sub-directory there is a `threshold_by_cons.py` file. It is similar to the `threshold.py` function, except it uses a _different_ threshold based on construction type.
```python
def function(building, hazard):
    construction = building['Cons_Frame']
    if construction == 'Masonry':
        threshold = 0.2
    else:
        threshold = 0.1

    if hazard is None or hazard <= threshold:
        return 0
    else:
        return 1
```

.. warning::
   This function is purely for demonstrative purposes and is **not** based on scientific methodology in any way.

Add the following to your `project.ini` file and save it.

```ini
[function threshold_by_construction]
description = simple example of checking the building construction type
location = functions/threshold_by_cons.py
argument-types = [building: anything, hazard: nullable(floating)]
return-type = integer
```

This definition is very similar to the previous INI file function definition. We have only changed the function's name, the `.py` file location, and its description.

.. tip::
   We recommend using underscores (``_``) rather than hyphens (``-``) in your function names.

Now try using this new function in your model by running the following command:

```none
riskscape model run exposure-by-region -p "analysis.function=threshold_by_construction"
```

It should produce an `event-impact.csv` file containing the following results:

```none
Region,Exposed_buildings
,10
Aleipata Itupa i Lalo,476
Aleipata Itupa i Luga,307
Falealili,679
Lepa,262
Lotofaga,126
```

You can see that the results have changed again to reflect the changed logic in our function.

### Returning complex consequences

The `consequence`, or return value, of our function can also be made up of several different attributes. For example, we might want to calculate several different damage states, or return the losses for building and land damage separately.

In order to do this, our function simply needs to return a Python dictionary. However, we have to make sure the `return-type` in our INI file function definition matches the return value in our Python code.
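
A mismatch between the returned dictionary and the declared `return-type` is easy to introduce. One way to catch it early is to call your function locally and compare the dictionary it returns with the struct you intend to declare. The sketch below uses a hypothetical helper, `check_struct`, written for local testing only; it is not part of RiskScape. It also assumes that the RiskScape `integer`, `floating` and `text` types correspond to Python `int`, `float` and `str` values.

```python
# Hypothetical local-testing helper; not part of RiskScape.
# Assumed mapping from RiskScape type names to Python value types.
PYTHON_TYPES = {
    'integer': int,
    'floating': float,
    'text': str,
}


def check_struct(result, struct_fields):
    """Compare a returned dictionary with a struct definition.

    struct_fields maps each attribute name to a RiskScape type name,
    e.g. {'exposed': 'integer', 'level': 'text'}. Returns a list of
    problems; an empty list means the dictionary matches.
    """
    if not isinstance(result, dict):
        return ['expected a dictionary, got {}'.format(type(result).__name__)]

    problems = []
    for name, type_name in struct_fields.items():
        if name not in result:
            problems.append('missing attribute: {}'.format(name))
        elif not isinstance(result[name], PYTHON_TYPES[type_name]):
            problems.append('{} should be {}, got {}'.format(
                name, type_name, type(result[name]).__name__))

    # Attributes the function returned but the struct does not declare.
    for name in result:
        if name not in struct_fields:
            problems.append('unexpected attribute: {}'.format(name))
    return problems


# A struct with an integer 'exposed' and a text 'level' attribute.
fields = {'exposed': 'integer', 'level': 'text'}
print(check_struct({'exposed': 1, 'level': 'Exposure >3.0m'}, fields))  # []
print(check_struct({'exposed': 1}, fields))  # ['missing attribute: level']
```

Returning a list of problems, rather than raising on the first one, lets you see every mismatch in a single run.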

In the `functions/` sub-directory there is an `exposure_level.py` file that contains the following code:

```python
def function(building, hazard_depth):
    result = {}
    if hazard_depth is None or hazard_depth <= 0:
        result['exposed'] = 0
        result['level'] = 'N/A'
        return result

    if hazard_depth > 3.0:
        level = 'Exposure >3.0m'
    elif hazard_depth > 2.0:
        level = 'Exposure >2.0m to <=3.0m'
    elif hazard_depth > 1.0:
        level = 'Exposure >1.0m to <=2.0m'
    else:
        level = 'Exposure >0.0m to <=1.0m'

    result['exposed'] = 1
    result['level'] = level
    return result
```

It returns two attributes:

- `exposed`: whether or not the building was exposed to the hazard, as `0` or `1`, i.e. an _integer_.
- `level`: the range of inundation the building falls into, as a _text_ string.

In RiskScape, a set of related attributes is called a _struct_. For example, the RiskScape model holds the building data from the exposure-layer in an `exposure` struct.

Add the following to your `project.ini` file and save it.

```ini
[function exposure_level]
description = example of a function that returns multiple things
location = functions/exposure_level.py
argument-types = [building: anything, hazard: nullable(floating)]
return-type = struct(exposed: integer, level: text)
```

Notice that the `return-type` line looks quite different this time. We now return a `struct` type, which contains two attributes: `exposed` (an `integer`) and `level` (a `text` string).

.. tip::
   To see what built-in types are supported by RiskScape (e.g. ``integer``, ``text``, etc.), you can use the ``riskscape type-registry list`` command.

Try using this function in a model by running the following command:

```none
riskscape model run group-by-consequence -p "analysis.function=exposure_level"
```

We are using a different model this time (`group-by-consequence`), which aggregates the results by _consequence_ rather than by _region_.

It should produce an `event-impact.csv` file that contains the following results:

```none
consequence.exposed,consequence.level,Total_buildings
0,N/A,4201
1,Exposure >0.0m to <=1.0m,473
1,Exposure >1.0m to <=2.0m,394
1,Exposure >2.0m to <=3.0m,472
1,Exposure >3.0m,720
```

### Type definitions

When there are many different attributes we want to return, defining a `struct` type for the function's `return-type` can get a little awkward. To make life easier, we can define our own struct types separately in the `project.ini` file.

For example, add the following to your `project.ini` file and save it.

```ini
[type exposure_result]
type.exposed = integer
type.level = text
```

This defines a `struct` type called `exposure_result`, which contains two attributes: `exposed` and `level`. We can now use this type by name (i.e. `exposure_result`) for any function's `return-type` or `argument-types`.

In your `project.ini` file, modify the `return-type` line for your `exposure_level` function definition, so that it looks like this:

```ini
[function exposure_level]
description = example of a function that returns multiple things
location = functions/exposure_level.py
argument-types = [building: anything, hazard: nullable(floating)]
return-type = exposure_result
```

This function definition will work exactly the same as it did previously. Try it out by running the model command again:

```none
riskscape model run group-by-consequence -p "analysis.function=exposure_level"
```

### Errors in your function

Let's look at what happens when something goes wrong with our function. In the `functions/` sub-directory there is a `bad.py` file. This tries to access an attribute that isn't present in our exposure-layer data.

```python
def function(building, hazard):
    construction = building['Bad_attribute']
    if hazard is None or hazard <= threshold:
        return 0
    else:
        return 1
```

Add the following to your `project.ini` file and save it.
```ini
[function bad_function]
description = the exposure-layer attributes do not match what the function expects
location = functions/bad.py
argument-types = [building: anything, hazard: nullable(floating)]
return-type = integer
```

Now try using this new function in your model by running the following command:

```none
riskscape model run exposure-by-region -p "analysis.function=bad_function"
```

It should produce the following error:

```none
Problems found with wizard model
  - Execution of your data processing pipeline failed. The reasons for this follow:
    - Failed to evaluate `{*, consequence: map(hazard, hv -> bad_function(exposure, hv))}`
      - A problem occurred while executing the function 'bad_function'. Please check your Python code carefully for the likely cause.
        - KeyError: Bad_attribute
          - File "file:///C:/RiskScape_Projects/project-tutorial/functions/bad.py", line 2
```

This message tells us the details of the Python exception that occurred (a `KeyError` for `Bad_attribute`) and the line number in the Python file that triggered the problem.

This is just an example of what function errors look like in RiskScape. You don't have to fix the `bad.py` Python code unless you want to.

.. note::
   You will get this sort of error if you change your exposure-layer and it does not contain the attributes that your function expects. You can use RiskScape's type system to detect this problem by specifying a ``struct`` for the ``argument-types`` instead of using ``anything``.

### Case study: damage state functions

The next example looks at how the research paper [Evaluating building exposure and economic loss changes after the 2009 South Pacific Tsunami](https://www.sciencedirect.com/science/article/abs/pii/S2212420921000972) used a RiskScape function to calculate building damage. This research used a fragility curve to determine the probability of damage to a building, based on a given tsunami hazard intensity measure.
Five different damage states were used, from light non-structural damage (`DS_1`), through to complete structural collapse (`DS_5`).

The RiskScape function uses a [log-normal Cumulative Distribution Function (CDF)](https://en.wikipedia.org/wiki/Log-normal_distribution) to determine the conditional probability (between 0 and 1.0) of a building being in a given damage state as a result of the tsunami inundation. The shape of the log-normal CDF curve will be different depending on the building's construction material and the damage state being investigated. This means that different mean and standard deviation values will be used to build the log-normal CDF curve.

The Python code looks like this:

```python
def function(building, hazard_depth):
    DS_1_Prob = 0.0
    DS_2_Prob = 0.0
    DS_3_Prob = 0.0
    DS_4_Prob = 0.0
    DS_5_Prob = 0.0
    construction = building["Cons_Frame"]

    if hazard_depth is not None and hazard_depth > 0:
        DS_1_Prob = log_normal_cdf(hazard_depth, -0.53, 0.46)

        if construction in ['Masonry', 'Steel']:
            DS_2_Prob = log_normal_cdf(hazard_depth, -0.33, 0.4)
            DS_3_Prob = log_normal_cdf(hazard_depth, 0.1, 0.35)
            DS_4_Prob = log_normal_cdf(hazard_depth, 0.26, 0.41)
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.39, 0.4)
        elif construction in ['Reinforced_Concrete', 'Reinforced Concrete']:
            DS_2_Prob = log_normal_cdf(hazard_depth, -0.33, 0.4)
            DS_3_Prob = log_normal_cdf(hazard_depth, 0.13, 0.56)
            DS_4_Prob = log_normal_cdf(hazard_depth, 0.53, 0.54)
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.86, 0.94)
        else: # 'Timber' or unknown
            DS_2_Prob = log_normal_cdf(hazard_depth, -0.33, 0.4)
            DS_3_Prob = log_normal_cdf(hazard_depth, 0.06, 0.38)
            DS_4_Prob = log_normal_cdf(hazard_depth, 0.1, 0.4)
            DS_5_Prob = log_normal_cdf(hazard_depth, 0.1, 0.28)

    result = {}
    result['DS_1'] = DS_1_Prob
    result['DS_2'] = DS_2_Prob
    result['DS_3'] = DS_3_Prob
    result['DS_4'] = DS_4_Prob
    result['DS_5'] = DS_5_Prob
    return result


def log_normal_cdf(x, mean, stddev):
    # this uses the built-in RiskScape 'lognorm_cdf' function
    return functions.get('lognorm_cdf').call(x, mean, stddev)
```

.. note::
   This function was provided by `NIWA `_ and has been refactored and adapted for this tutorial.

There are two things of note about this Python code:

1. The Python file contains *two* functions. RiskScape always uses the `def function(...)` block; the second function, `log_normal_cdf`, is only a helper that `function` calls.
2. A built-in RiskScape function (`lognorm_cdf`) is used to calculate the log-normal CDF. This is the `functions.get('lognorm_cdf').call(...)` line in the code. You can find out more about this built-in function by entering the `riskscape function info lognorm_cdf` command.

.. note::
   Calling a built-in RiskScape function from Python is only possible if you use the *Jython* Python implementation. RiskScape Python functions use Jython by default, but you can switch to CPython instead. CPython is recommended if you want to import packages, such as ``numpy`` or ``scipy``. The RiskScape documentation explains more about the difference between :ref:`jython_vs_cpython`.

In order to use this function, add the following to your `project.ini` file and save it.

```ini
[type building]
type.Cons_Frame = text

[type damage_states]
type.DS_1 = floating
type.DS_2 = floating
type.DS_3 = floating
type.DS_4 = floating
type.DS_5 = floating

[function Samoa_Building_Fragility]
description = Samoa tsunami fragility functions for buildings
location = functions/Samoa_Building_Fragility.py
argument-types = [building, hazard: nullable(floating)]
return-type = damage_states
framework = jython
```

As well as defining the function, this defines the types that the function uses for its `argument-types` and `return-type`.

.. note::
   The ``building`` struct we defined only has *one* attribute, but our exposure-layer input data has several more attributes. The ``argument-types`` only need to define the exposure-layer attributes that your function actually uses (``Cons_Frame`` here). This will make your functions easier to reuse with different input data.
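
As the Jython note above explains, `functions.get(...)` is only available under Jython, so a CPython version of this function would need to compute the log-normal CDF itself. A minimal standard-library sketch follows. It assumes that `mean` and `stddev` are the mean and standard deviation of ln(x), which is the usual parameterisation of a log-normal distribution; check `riskscape function info lognorm_cdf` to confirm this matches the built-in function before relying on it. For x > 0 it uses the identity CDF(x) = 0.5 * (1 + erf((ln(x) - mean) / (stddev * sqrt(2)))).

```python
import math


def log_normal_cdf(x, mean, stddev):
    """Probability that a log-normal variable is <= x.

    Assumes `mean` and `stddev` describe ln(x), i.e. the underlying
    normal distribution. Intended as a pure-Python stand-in for the
    Jython-only functions.get('lognorm_cdf') call.
    """
    if x <= 0:
        # A log-normal variable is always positive.
        return 0.0
    z = (math.log(x) - mean) / (stddev * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# At x = exp(mean), i.e. the median depth, the probability is 0.5.
print(round(log_normal_cdf(math.exp(-0.53), -0.53, 0.46), 6))  # 0.5
```

Returning `0.0` for non-positive inputs mirrors the `hazard_depth > 0` guard in the fragility function, so `math.log` is never called with zero or a negative number.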

We also want to import the pre-existing `building-fragility` model into our project, which will use the new function. Go to the _top_ of your `project.ini` file and add the line `models = models/models_building-fragility.ini` to the `[project]` section. The `[project]` section in your `project.ini` file should now look like this:

```ini
[project]
description = Initial project file. You will add more bookmarks and functions to it
models = models/models_exposure-by-region.ini
models = models/models_group-by-consequence.ini
models = models/models_building-fragility.ini
...
```

Try running the model with the following command:

```none
riskscape model run building-fragility
```

This should produce an `event-impact.csv` results file. Open these results in a spreadsheet application.

The results are aggregated by region. As well as the total `Exposed_buildings`, we can also see a count of how many buildings have a probability greater than 0.5, or greater than 0.9, of being in damage state 5 (complete structural collapse). Some percentiles are also recorded for damage state 5 and for inundation depth.

## Recap

Let's review some of the key points we have covered so far:

- The `project.ini` file holds the bookmarks and functions that the model will use.
- Bookmarks configure the input data that RiskScape models can use.
- The attributes in a RiskScape model correspond to the attributes that are present in the input data.
- All the data in a RiskScape model has _type_ information associated with it.
- Bookmarks let you manipulate the input data *before* it gets used by the model.
- The input data for RiskScape models *always* needs a geometry-type attribute present and a CRS defined.
- File-paths and bookmarks can often be used interchangeably in RiskScape. In particular, shapefiles, GeoTIFFs, ESRI Grid, and GeoJSON files generally have all the information RiskScape needs, such as the CRS, saved as part of the file format.
- The `riskscape bookmark info` command is a useful way to find out more about a file or bookmark, such as the attributes the data contains or its CRS.
- You *always* need to define a bookmark in order to use CSV input data in a model. The bookmark will need to define `set-attribute.geom` and `crs-name` for the CSV data.
- RiskScape can do some error-checking on the input data, such as whether the geometry is valid.
- You can use the `riskscape format info` command to find out more about what parameters a bookmark supports.
- RiskScape models use a Python function to determine the impact that the hazard has on *each* element-at-risk. The function's return value becomes the `consequence` in the model's results.
- The function gets passed the exposure-layer input data, along with the hazard intensity measure. These values are called the function's *arguments*.
- A set of related attributes (i.e. attributes that come from the same input layer) is called a _struct_ in RiskScape. In your Python function, a struct is simply a Python dictionary.
- The hazard function argument is `nullable`. If no hazard intensity measure was determined, then your function will be passed a `hazard` value equal to `None`.
- You can optionally define your own struct types in your `project.ini`. This can make it easier to define your functions. Alternatively, you can use `anything` for your function's `argument-types` if you're not sure what type the data is.
- If there is a coding error in your Python function, then you will get the Python error reported when you try to use the function in a RiskScape model.
- RiskScape uses the Jython Python implementation by default, but you can switch to CPython if you want to use packages like `numpy` or `scipy`.

Once you feel comfortable with project files, you could go through :ref:`recap_intro`.

## Extra for experts

If you want to explore bookmarks and functions a little further, you could try the following exercises on your own.

- Practice adding a `description` to some of the bookmarks you created in the `project.ini` file. Try also using `#` to add a few INI file comments.
- Some buildings are not assigned to any region when you run the `exposure-by-region` model. Try specifying the `sample.areas-buffer` parameter when you run the model. See if you can work out the buffer distance needed to assign _all_ buildings to a region. Start off with 100m, then try 250m, 500m, 1000m, and so on.
- Try creating a bookmark for the `data/Building_XY_coords.csv` file and use this as the exposure-layer in the `exposure-by-region` model. This file contains separate `POINT_X`, `POINT_Y` coordinate attributes for the geometry, so you will have to use `create_point()` instead of `geom_from_wkt()` in the bookmark.
- Try creating a bookmark for the `data/bad-data.csv` file and use this as the exposure-layer in the `exposure-by-region` model. It will report warnings that rows are being skipped. See if you can identify the problem in the CSV file and fix it.
- Try fixing the `bad_function` (`functions/bad.py`) code so that it works with the `riskscape model run` command.
- In the `functions/` sub-directory there is a `buggy.py` Python file that has a couple of problems with it. Add this function to your project and try using it in the `exposure-by-region` model. Look at the Python error that the `riskscape model run` command gives you and try to fix it in the `buggy.py` file. Re-run the command until the model runs successfully.
- Try adding some debugging output to the ``buggy.py`` Python function. Add the statements below to the Python code and then run the function in the `exposure-by-region` model again. Make sure you use the building centroid CSV as the exposure-layer, i.e. `-p "input-exposures.layer=building_centroids_csv"`.

```python
if building['ID'] == '1000' or building['ID'] == '7000':
    print("ID: {} Cons_Frame: {} Use_Cat: {} hazard: {}".format(building['ID'], building['Cons_Frame'], building['Use_Cat'], hazard))
```
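
If one of the attribute names in the debug statement above doesn't exist in your exposure-layer, the `print` line itself will raise a `KeyError`. A slightly more defensive variant uses `building.get()`, as described in the note on exposure-layer arguments, so that a missing attribute prints as `None`. The helper name `debug_line` and the sample attribute values are hypothetical, chosen for illustration; the string formatting used works under both Jython and CPython.

```python
def debug_line(building, hazard, ids=('1000', '7000')):
    """Return a debug message for the selected building IDs, otherwise None.

    building.get() returns None for a missing attribute instead of raising
    a KeyError, so a typo in an attribute name won't stop the model.
    """
    if building.get('ID') not in ids:
        return None
    return "ID: {} Cons_Frame: {} Use_Cat: {} hazard: {}".format(
        building.get('ID'), building.get('Cons_Frame'),
        building.get('Use_Cat'), hazard)


# Inside your function, you could then use:
#     message = debug_line(building, hazard)
#     if message is not None:
#         print(message)

# Made-up example values, with no 'Use_Cat' attribute present:
print(debug_line({'ID': '1000', 'Cons_Frame': 'Timber'}, 0.4))
# prints: ID: 1000 Cons_Frame: Timber Use_Cat: None hazard: 0.4
print(debug_line({'ID': '42', 'Cons_Frame': 'Masonry'}, 1.2))
# prints: None
```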