.. _probabilistic_multi_file_hazard:

# Multi-file hazard data

Some probabilistic hazard datasets are organized by event, typically with each event stored in its own file. For example, probabilistic flood data may be represented by a series of raster files.

When probabilistic data is organized this way, the RiskScape pipeline processes the hazard data sequentially, event by event. For example, it is more efficient for RiskScape to process a set of flood GeoTIFFs representing 100 different flooding scenarios one file at a time. Trying to open 100 large GeoTIFFs all at once would be slow and would require the same data to be read from disk over and over again.

## Multiple GeoTIFFs example

If you have a set of GeoTIFFs, each one representing a different event, then the following worked example shows how to process them into an event loss table.

To start, we need to create a CSV that enumerates the files and supplies any extra metadata that belongs to each file (e.g. an event ID, probability metadata such as an ARI or exceedance probability). The most important bit, though, is the `hazard_file` path to the GeoTIFF file, which is relative to the CSV file.

```none
eventid,hazard_file,probability_metadata
1,maps/flood_01.tif,0.001
2,maps/flood_02.tif,0.045
3,maps/flood_03.tif,0.012
...
```

We can add this to our RiskScape `project.ini` as a relation dataset, which will allow us to process each event map one by one.

```ini
[bookmark flood_maps]
location = flood_maps.csv
# make sure the event ID is a number, not text
set-attribute.eventid = int(eventid)
# this turns the hazard_file location into a coverage which we can spatially query
set-attribute.coverage = bookmark(id: hazard_file, options: {}, type: 'coverage(floating)')
```

With this bookmark, we can now build an event loss table for all the hazard maps in the CSV, as though they were a single probabilistic dataset.

You will also need some sort of loss function present in your `project.ini` file. This example assumes the function is called `loss_function`, e.g.

```ini
[function loss_function]
location = my-project/loss_function.py
argument-types = [building: anything, hazard_intensity: floating]
return-type = floating
```

Then, assuming we also have a `buildings` bookmark configured for our exposure-layer data, we can use the following pipeline to generate the event loss table.

```none
input('flood_maps', name: 'event') as event_input -> join.lhs
input('buildings', name: 'building') as exposure_input -> join.rhs

# this combines each building with each flood map
join(on: true) ->

# each row is now a building and a flood map, so we can sample the flood depth
# (`hazard_intensity`) at each building's location
select({
    *,
    hazard_intensity: sample_centroid(
        geometry: building,
        coverage: event.coverage
    )
}) ->

# we can now compute a loss value for each building
select({
    event: event,
    hazard_intensity: hazard_intensity,
    exposure: building,
    loss: loss_function(building, hazard_intensity)
}) ->

# total the losses by event - this is our event loss table
group(
    select: {
        event.eventid as eventid,
        sum(loss) as total_loss
    },
    by: event.eventid
) ->
save('event-loss')
```
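
For completeness, the sketch below shows roughly what `my-project/loss_function.py` could contain. It is only a minimal, illustrative example: it assumes the file defines a `function()` entry point, that the building is passed in as a dict-like value, and that your building data has a hypothetical `replacement_cost` attribute. Substitute your own vulnerability model and attribute names.

```python
# Minimal, illustrative sketch of my-project/loss_function.py.
# Assumptions: the building is dict-like with a hypothetical 'replacement_cost'
# attribute, and the damage curve below is made up purely for demonstration.

def function(building, hazard_intensity):
    # buildings outside the flood extent may have no sampled hazard value
    if hazard_intensity is None or hazard_intensity <= 0:
        return 0.0
    # toy depth-damage curve: damage ratio ramps up to 100% at 2m of flooding
    damage_ratio = min(hazard_intensity / 2.0, 1.0)
    return damage_ratio * float(building['replacement_cost'])
```

Whatever the function does internally, it should accept the same two arguments declared in `argument-types` and return a `floating` loss value, as that is what the pipeline sums into `total_loss`.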