Multi-file hazard data

Some probabilistic hazard datasets are organized by event, typically with each event stored in its own file. For example, probabilistic flood data may be represented by a series of raster files.

When probabilistic data is organized this way, the RiskScape pipeline processes the hazard data sequentially, one event at a time.

For example, it is more efficient for RiskScape to process a set of flood GeoTIFFs representing 100 different flooding scenarios one file at a time. Opening 100 large GeoTIFFs at once would be slow and would read much of the same data from disk over and over again.
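The event-by-event approach can be sketched in plain Python (this is not RiskScape code; `sample` and `loss` are hypothetical stand-ins for the raster sampling and loss function):

```python
def event_loss_table(event_files, buildings, sample, loss):
    """Return {event_id: total_loss}, processing one event file at a time.

    event_files: iterable of (event_id, path) pairs
    sample(path, building): hypothetical flood-depth lookup for one building
    loss(building, depth): hypothetical loss function
    """
    table = {}
    for event_id, path in event_files:
        # Only this event's hazard data needs to be open at once.
        total = 0.0
        for building in buildings:
            depth = sample(path, building)
            total += loss(building, depth)
        table[event_id] = total
    return table
```

The point of the sketch is the loop structure: each hazard file is visited exactly once, rather than every file being opened for every building.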

Multiple GeoTIFFs example

If you have a set of GeoTIFFs, each one representing a different event, the following worked example shows how to process them into an event loss table.

To start, we need to create a CSV that enumerates the files and supplies any extra metadata that belongs to each file (e.g. an event ID, or probability metadata such as an ARI or exceedance probability). The most important column is the hazard_file path to the GeoTIFF file, which is relative to the CSV file.
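For example, a flood_maps.csv might look like the following. The id and hazard_file columns are the ones used later in this example; the ARI column and the file paths are purely illustrative.

```csv
id,ARI,hazard_file
1,100,maps/flood-event-1.tif
2,250,maps/flood-event-2.tif
3,500,maps/flood-event-3.tif
```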


We can add this to our RiskScape project.ini as a relation dataset, which will allow us to process each event map one by one.

[bookmark flood_maps]
location = flood_maps.csv
# make sure the ID is a number
set-attribute.id = int(id)
# this turns the hazard_file location into a coverage that we can spatially query
set-attribute.coverage = bookmark(id: hazard_file, options: {}, type: 'coverage(floating)')

With this bookmark, we can now build an event loss table for all the hazard maps in the CSV as though they were a single probabilistic dataset.

You will also need a loss function configured in your project.ini file. This example assumes the function is called loss_function, e.g.

[function loss_function]
location = my-project/
argument-types = [building: anything, gmv: floating]
return-type = floating
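As a sketch, the body of such a function could be written in Python (RiskScape supports Python functions defined as `def function(...)`). The depth-damage curve below is purely illustrative and is not from the source; it just matches the argument and return types declared above:

```python
def function(building, gmv):
    """Illustrative loss function: loss ramps linearly up to 2 m of water."""
    if gmv is None:
        # building falls outside the hazard raster
        return 0.0
    return min(gmv / 2.0, 1.0)
```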

Then, assuming we also have a buildings bookmark configured for our exposure-layer data, we can use the following pipeline to generate the event loss table.

input('flood_maps', name: 'event') as event_input
  -> join.lhs
input('buildings', name: 'building') as exposure_input
  -> join.rhs

# this combines each building with each map
join(on: true)
# each row is now a building paired with a flood map, so we can sample the
# flood depth (`hazard_intensity`) at each building
 -> select({
      *,
      hazard_intensity: sample_centroid(
        geometry: building,
        coverage: event.coverage
      )
    })
# we can now compute a loss value for each building
 -> select({
      event: event,
      hazard_intensity: hazard_intensity,
      loss: loss_function(building, hazard_intensity)
    })
# total the losses by event - this is our event loss table
 -> group(
      select: {
        event,
        sum(loss) as total_loss
      },
      by: event)