.. _coverage_ref:

# Coverage data

A [coverage](https://en.wikipedia.org/wiki/Coverage_data) is essentially a spatial lookup table.
Coverages are usually grid-based GeoTIFF or ASC _raster_ files.

Coverage data is handled by RiskScape slightly differently to [relational](https://en.wikipedia.org/wiki/Relation_%28database%29)
data. A relation holds rows or records of data, such as CSV or shapefile _vector_ data.

Coverages are typically used in *spatial sampling*, which is geospatially matching data
in your exposure-layer to the coverage layer. For example, RiskScape can take the geometry
of a building footprint from your exposure-layer and use it as a lookup into the
hazard-layer coverage. The result it returns is the hazard intensity measure (if any)
for that particular building.
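
The "spatial lookup table" idea can be illustrated with a short Python sketch. This is only a conceptual illustration, not RiskScape's implementation: the function name `sample_grid`, the grid layout (row 0 is the northern-most row), and the use of a single point instead of a full building footprint are all assumptions made for the example.

```python
import math
from typing import Optional, Sequence


def sample_grid(values: Sequence[Sequence[Optional[float]]],
                origin_x: float, origin_y: float, cell_size: float,
                x: float, y: float) -> Optional[float]:
    """Return the value of the grid cell containing (x, y), or None.

    Assumes (origin_x, origin_y) is the top-left corner of the grid,
    row 0 is the top row, and None marks a cell with no data.
    """
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((origin_y - y) / cell_size)
    # Points outside the grid have no hazard value.
    if row < 0 or row >= len(values) or col < 0 or col >= len(values[0]):
        return None
    return values[row][col]


# A hypothetical 2x2 hazard grid with 10m cells and one no-data cell.
hazard = [[0.1, 0.2],
          [None, 0.4]]

# A hypothetical building located at (15, 15) falls in the top-right cell.
intensity = sample_grid(hazard, 0.0, 20.0, 10.0, 15.0, 15.0)
```

Here the lookup returns the hazard intensity for the cell the building sits in, and returns nothing when the building lies outside the grid or in a no-data cell, which corresponds to the "if any" case described above.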

.. tip::
    Normally coverages are used for hazard-layers, but you can also use coverage files
    (i.e. ``.tif`` or ``.asc`` files) as the exposure-layer in a wizard or pipeline model.
    This can be useful if your exposure-layer is a population density map, or similar data.
    Each cell in the coverage will be treated as a polygon square input to your model.

## Bookmarks

Setting up a bookmark for a coverage is pretty simple.
For example, you could add something like the following to your `project.ini` file:

```ini
[bookmark MY_COOL_NAME]
description = Optionally specify additional details about the data here...
location = MY/COOL/DATA.tif
```

You can generally use paths to ``.tif`` and ``.asc`` files directly, without necessarily
needing to configure a bookmark.

.. tip::
    If you have lots of similar coverage files that you want to run through the same model,
    and get a separate set of results output for each coverage,
    then this is simple to do in RiskScape. Refer to :ref:`model_batch` for more details.

.. _coverage_map_value:

### Transform the sampled value

You can apply your own custom transformation to the data returned by RiskScape's spatial sampling.
This can be handy if your coverage data doesn't match what is expected by the model, for example,
if one file's data is in units of gravity (`g`) and another file is in log units (`log(g)`).

In your coverage bookmark, you can specify a simple *mapping* expression that will modify any
values sampled from it. This has the benefit of better model reuse,
i.e. you don't have to create a separate model just because the input data is in a slightly different format.

The following bookmark takes a GeoTIFF file in `g` units and
converts the data into `log(g)` when it gets used in a model.
The `value` in the expression is the value that was sampled from the GeoTIFF.

```ini
[bookmark hazard-data-in-log-units]
location = DATA_IN_G_UNITS.tif
map-value = log(value)
```

Alternatively, you can use a [lambda](https://en.wikipedia.org/wiki/Anonymous_function) expression
in the bookmark, which makes the data value's identifier clearer and customizable.
The following example is equivalent to the previous bookmark, except it uses a lambda expression.

```ini
[bookmark the-same-thing-with-lambda]
location = DATA_IN_G_UNITS.tif
map-value = g -> log(g)
```

In the above example, the lambda argument is called `g`, but you can call this whatever you want.

### Sampling relational data

You can also turn *relational* data (e.g. shapefile input data)
into a coverage that can be used for spatial sampling. This can be useful for matching elements-at-risk
to the regional area they are located in.

.. tip::
    If you are using the wizard to build a model, then you do not have to worry too
    much about whether your input data is relational or in coverage form. RiskScape will take care of it all for you.

It can sometimes be useful to be able to use raster data (e.g. GeoTIFFs) or vector data (e.g. shapefiles) interchangeably
as input data for your model.
The simplest way to do this is to specify that the relational data should be *rasterized*
when you create your bookmark. For example:

```ini
[bookmark relation-as-coverage]
location = MY/COOL/DATA.shp
rasterize = true
rasterize-grid-size = 50
rasterize-expression = MY_ATTRIBUTE
```

When relational data is rasterized, you need to specify:

1. The *grid-size* that the coverage should have, in metres.
  The above example uses a 50m by 50m grid.
2. An *expression* for the numeric value to return when the coverage is sampled.
  Basically, this is the attribute in the shapefile that you are most interested in.
  It could also be a combination of attributes, e.g. `Depth * Velocity`.

Advanced RiskScape users can also turn relational data into a coverage directly in RiskScape pipeline code,
without using a bookmark. See :ref:`sampling_relations` for an example.

.. note::
    If there is any overlapping geometry in the input data, then spatial sampling will just
    arbitrarily pick one of the matching geometries. We recommend ensuring there is *no*
    overlapping geometry in your input data.

.. _nn_coverage:

## Nearest-neighbour coverage

A GeoTIFF stores varying hazard intensities across a geospatial grid.
Other file formats, in particular :ref:`NetCDF <netcdf>` and HDF5, can represent similar hazard data through a
mesh of geospatial _points_.

For example, you may have PGA shaking intensities or temperature readings across a series of points, or 'sites'.
To determine the hazard intensity for a given element-at-risk, you simply need to find the site that is closest to it.

In this case, you will want to use a _nearest neighbour_ coverage in your model.
Normally RiskScape's spatial sampling will look for _intersecting_ geometry,
whereas a nearest neighbour coverage lets us match the _closest_ point,
even if it doesn't intersect directly with our exposure-layer geometry.

.. note::
    Currently nearest neighbour coverages are a feature that is only available in pipelines,
    and so they are only suitable for advanced users.

You will usually want to specify a cut-off distance in metres for your nearest neighbour coverage.
Otherwise, the coverage will _always_ find the closest match, even if it is thousands of kilometres away.

Determining a suitable cut-off distance is a trade-off between accuracy and performance.
If the cut-off is too small, then sampling the coverage might not find any matching data.
If the cut-off is too large, then your model may take longer to run, as there will be more potential matches to narrow down.
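
The effect of the cut-off distance can be illustrated with a short Python sketch. This is a conceptual illustration only and not how RiskScape implements it: the function name `sample_nearest`, the `(x, y, intensity)` site tuples, and the plain linear scan are assumptions made for the example. A linear scan also does not reproduce the performance trade-off described above, which applies to indexed lookups.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical site record: (x, y, intensity), with coordinates in metres.
Site = Tuple[float, float, float]


def sample_nearest(sites: Sequence[Site], x: float, y: float,
                   max_distance: Optional[float] = None) -> Optional[float]:
    """Return the intensity at the site closest to (x, y).

    Returns None if there are no sites, or if the closest site is
    further away than max_distance (when a cut-off is given).
    """
    best_value = None
    best_distance = math.inf
    for site_x, site_y, site_intensity in sites:
        distance = math.hypot(site_x - x, site_y - y)
        if distance < best_distance:
            best_distance = distance
            best_value = site_intensity
    if max_distance is not None and best_distance > max_distance:
        return None
    return best_value


# Two hypothetical sites, 1km apart.
sites = [(0.0, 0.0, 0.3), (1000.0, 0.0, 0.5)]

near = sample_nearest(sites, 100.0, 0.0, max_distance=500.0)         # within the cut-off
too_far = sample_nearest(sites, 5000.0, 0.0, max_distance=500.0)     # closest site is 4km away
no_cutoff = sample_nearest(sites, 5000.0, 0.0)                       # always finds a match
```

Without a cut-off, the last call still returns a reading from a site 4km away, which is rarely what you want for a hazard intensity.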

To build a nearest neighbour coverage, you can specify additional `options` in the `to_coverage()` function.
For example:

```none
  to_coverage(bookmark('YOUR_POINT_BASED_HAZARD'),
              options: { index: 'nearest_neighbour', nearest_neighbour_max_distance: $cutoff_distance }
             ) as nn_coverage
```

.. note::
    Currently interpolation or attenuation of the hazard intensity measure is not supported,
    i.e. RiskScape will not find the three closest points and then take the average reading.
