.. _probabilistic_multi_file_hazard:

# Multi-file hazard data

Some probabilistic hazard datasets are organized by event, typically with each event stored in its own file. For example, probabilistic flood data may be represented by a series of raster files.

When probabilistic data is organized this way, the RiskScape pipeline processes the hazard data sequentially, event by event. For example, it is more efficient for RiskScape to process a set of flood GeoTIFFs representing 100 different flooding scenarios one file at a time. Trying to open 100 large GeoTIFFs all at once would be slow and would require the same data to be read from disk over and over again.

## Multiple GeoTIFFs example

If you have a set of GeoTIFFs, each one representing a different event, then the following worked example shows how to process them into an event loss table.

To start, we need to create a CSV that enumerates the files and supplies any extra metadata that belongs to each file (e.g. an event ID, probability metadata such as an ARI or exceedance probability). The most important bit, though, is the `hazard_file` path to the GeoTIFF file, which is relative to the CSV file.

```none
eventid,hazard_file,probability_metadata
1,maps/flood_01.tif,0.001
2,maps/flood_02.tif,0.045
3,maps/flood_03.tif,0.012
...
```

We can add this to our RiskScape `project.ini` as a relation dataset, which will allow us to process each event map one by one.

```ini
[bookmark flood_maps]
location = flood_maps.csv
# make sure the event ID is a number, not text
set-attribute.eventid = int(eventid)
# this turns the hazard_file location into a coverage which we can spatially query
set-attribute.coverage = bookmark(id: hazard_file, options: {}, type: 'coverage(floating)')
```

With this bookmark, we can now build an event loss table for all the hazard maps in the CSV, as though they were a single probabilistic dataset.

You will also need some sort of loss function present in your `project.ini` file. This example assumes the function is called `loss_function`, e.g.

```ini
[function loss_function]
location = my-project/loss_function.py
argument-types = [building: anything, hazard_intensity: floating]
return-type = floating
```

Then, assuming we also have a `buildings` bookmark configured for our exposure-layer data, we can use the following pipeline to generate the event loss table.

```none
input('flood_maps', name: 'event') as event_input -> join.lhs
input('buildings', name: 'building') as exposure_input -> join.rhs

# this combines each building with each flood map
join(on: true) ->

# each row is now a building and a flood map, so we can sample the flood depth
# (`hazard_intensity`) at each building's location
select({
    *,
    hazard_intensity: sample_centroid(
        geometry: building,
        coverage: event.coverage
    )
}) ->

# we can now compute a loss value for each building
select({
    event: event,
    hazard_intensity: hazard_intensity,
    exposure: building,
    loss: loss_function(building, hazard_intensity)
}) ->

# total the losses by event - this is our event loss table
group(
    select: {
        event.eventid as eventid,
        sum(loss) as total_loss
    },
    by: event.eventid
) ->
save('event-loss')
```
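
For completeness, the sketch below shows roughly what `my-project/loss_function.py` could contain. It is only a minimal, illustrative example: it assumes the file defines a `function()` entry point, that the building is passed in as a dict-like value, and that your building data has a hypothetical `replacement_cost` attribute. Substitute your own vulnerability model and attribute names.

```python
# Minimal, illustrative sketch of my-project/loss_function.py.
# Assumptions: the building is dict-like with a hypothetical 'replacement_cost'
# attribute, and the damage curve below is made up purely for demonstration.

def function(building, hazard_intensity):
    # buildings outside the flood extent may have no sampled hazard value
    if hazard_intensity is None or hazard_intensity <= 0:
        return 0.0
    # toy depth-damage curve: damage ratio ramps up to 100% at 2m of flooding
    damage_ratio = min(hazard_intensity / 2.0, 1.0)
    return damage_ratio * float(building['replacement_cost'])
```

Whatever the function does internally, it should accept the same two arguments declared in `argument-types` and return a `floating` loss value, as that is what the pipeline sums into `total_loss`.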