Geometry

This page describes how RiskScape handles geometries and the coordinate reference systems (CRS) that accompany them.

Coordinate Reference System

The coordinate reference system (CRS) defines what the geometry coordinates represent and where on the globe they are.

For example, a very common CRS is WGS 84 (World Geodetic System 1984), where each coordinate is a degree of latitude or longitude, and so a single coordinate unit can span more than 100km. In contrast, New Zealand Transverse Mercator 2000 (NZTM) is a Transverse Mercator projection where each coordinate is in one-metre units.
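As a rough illustration of why degree units and metre units differ so much, the sketch below estimates the ground distance covered by one degree of longitude at a few latitudes. It assumes a spherical earth with a mean radius of 6371 km, which is a simplification and not how RiskScape measures anything. The function name is made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical earth


def km_per_degree_longitude(latitude_deg):
    """Approximate ground distance (km) of one degree of longitude at a latitude."""
    km_per_degree_at_equator = 2 * math.pi * EARTH_RADIUS_KM / 360
    return km_per_degree_at_equator * math.cos(math.radians(latitude_deg))


for lat in (0, -41, -60):
    print(f"latitude {lat}: {km_per_degree_longitude(lat):.1f} km per degree")
```

The distance shrinks towards the poles, which is one reason lengths and areas cannot be read directly from degree-based coordinates.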

Reprojection

Changing the CRS that geometry is in is called reprojection. Sometimes, in order to process geometry operations, RiskScape will need to automatically reproject your input file’s geometry into a different CRS.

But there are some known issues with reprojection that can lead to bad results. For example:

Geometry may become invalid after reprojection.
Reprojecting geometry that spans the dateline can wrap the wrong way around the globe.

Reprojection will also increase the time it takes for a model pipeline to run.

Note

Not all RiskScape functions will reproject geometry automatically. Logical or predicate geometry functions, such as contains(), will not reproject but will produce an error if the geometries are in different CRSs. Refer to riskscape function list -c geometry_logical for a full list of these functions.

Cases where RiskScape reprojects

When running a model, RiskScape will automatically reproject your input geometry in the following cases:

Spatial sampling operations. When RiskScape geospatially matches an element-at-risk to another input layer (e.g. the hazard-layer), reprojection will be needed when the input layers are in different CRSs.
Segmenting or measuring operations. When cutting the geometry into smaller pieces, or measuring its length or area, RiskScape will always work in metre units and so the input geometry will need to be in a metric CRS (e.g. a Transverse Mercator-based CRS). Input geometry in another CRS, such as WGS 84, will need to be reprojected in order to cut or measure it. After the segment or measure operation, the geometry data will always end up back in its original CRS.

These reprojection operations can cause geometry to become invalid.

Tip

If possible, try to ensure all of the input files use the same CRS. This will speed up running models and reduce the likelihood of errors, as RiskScape will not need to reproject all your input data. If segmenting or measuring is required in your model, try to ensure that input files use a Transverse Mercator-based CRS, such as a Universal Transverse Mercator CRS (this may not be possible if the input data spans a large geographic area).

Axis/Ordinate Order

One common source of confusion when working with GIS data is disagreement over the axis order for a given CRS. Geometry coordinates can be defined in one of two formats:

latitude, longitude (or Y, X order or northing, easting).
longitude, latitude (or X, Y order or easting, northing).

The EPSG definitions generally (but not always) use the first lat, long approach, and can be found on https://epsg.io. For example, EPSG:2193 (NZTM) is defined with a northing, easting axis order.

However, many GIS software applications use the alternative long, lat approach. For example, when based on the OGR/GDAL specification, the same EPSG:2193 CRS is defined with the opposite easting, northing axis order.

The actual axis order that your source data is in will depend on what GIS software generated it. It can also depend on when the file was generated, as sometimes different versions of the same software can behave differently.

Note

Shapefile data is always in the long, lat order.

RiskScape uses the EPSG lat, long approach by default. More specifically, RiskScape bases its geometry processing on the GeoTools library, which uses the EPSG lat, long order. The GeoTools documentation describes the axis-ordering problem in more detail.

Projection files

Normally when geographic data is saved, there is a projection (.prj) file associated with it. This .prj file describes the CRS for the data in WKT (Well-Known Text format). The .prj file will usually (although not always) define the axis order that the data is in.

This means that when your bookmarked data source has a .prj file associated with it (i.e. almost all shapefiles), you usually won’t have to worry about specifying the CRS and axis-order manually.

One exception is that some .prj files are in a format that RiskScape does not support. These files may have been generated by an older version of ArcGIS, or may be based on a .prj.adf file. RiskScape will clearly warn you if the .prj file is unsupported.

When your .prj file is unsupported, you can either:

Try re-saving the data file, either in a newer version of the same software (e.g. ArcGIS) or in an alternative GIS application (e.g. QGIS).
Remove the .prj file and manually specify the CRS name and axis-order as part of the RiskScape bookmark.

Note

Some spatial data files, such as GeoTIFFs, do not have a .prj file but still have the CRS information ‘baked in’ to the file where RiskScape can access it easily.

Manually specifying the CRS

You can manually specify a data source’s CRS when defining a RiskScape bookmark, by providing a crs-name setting. This can be useful when dealing with geographic data in a CSV file, or if the .prj file is unsupported.

When setting the CRS manually, you need to know which coordinate order the source data is in: either lat,long or long,lat. If the first value in the coordinate pair is the longitude (i.e. the X axis), then you should also set crs-longitude-first = true for the bookmark.
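If you are not sure which order a degree-based data source (such as WGS 84 coordinates in a CSV file) is in, one quick check is to look at the value ranges, since a latitude can never fall outside -90 to 90. The sketch below is a heuristic only, not something RiskScape does for you. The function name and sample values are hypothetical, and it cannot decide when every value fits both ranges.

```python
def guess_axis_order(pairs):
    """Guess the axis order of degree coordinate pairs from their value ranges.

    Returns 'long,lat', 'lat,long', or None when the values fit either order.
    """
    first_exceeds_lat = any(abs(a) > 90 for a, _ in pairs)
    second_exceeds_lat = any(abs(b) > 90 for _, b in pairs)
    if first_exceeds_lat and not second_exceeds_lat:
        return "long,lat"
    if second_exceeds_lat and not first_exceeds_lat:
        return "lat,long"
    return None


sample = [(174.78, -41.29), (172.64, -43.53)]  # hypothetical rows from a CSV
print(guess_axis_order(sample))
```

A result of long,lat suggests the bookmark would need crs-longitude-first = true.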

Checking what geometry RiskScape will use

You can check the CRS details that RiskScape will use for a data source by using the riskscape bookmark info BOOKMARK_ID command.

This command displays the CRS, in WKT, that RiskScape will use, as well as the axis-order (long,lat or lat,long) in which the coordinate data will be read. Adding the --measure option means RiskScape will read through all the source data to build an overall envelope that encompasses the geographic data.

Invalid geometry

When an input data layer contains complex geometric shapes, sometimes one of these shapes may have invalid geometry.

When geometry is invalid and is not corrected, it can cause other geometry operations to fail. This means that RiskScape may produce a stack-trace containing a TopologyException (explained in more detail here) when running your model.

What makes geometry invalid

There are many potential causes of invalid geometry. Technically any type of geometry will be invalid if any of its coordinates are not valid. So POINT (10 NaN) would be invalid because the coordinate contains NaN (not a number).

But most of the time invalid geometry is more likely to affect polygon geometry types. Some of the rules for a valid polygon are:

Polygon rings must close.
Rings that define holes should be inside rings that define exterior boundaries.
Rings may not self-intersect (they may neither touch nor cross themselves).
Rings may not touch other rings, except at a point.
Elements of multi-polygons may not touch each other.
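A few of these rules can be checked with very little code. The sketch below is an illustration only and is not the validation RiskScape performs. It checks a single ring for NaN coordinates, for closure, and for edges that properly cross each other. It does not detect rings that merely touch themselves, nor any of the rules involving more than one ring. All function names are made up for this example.

```python
import math


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1 or 0."""
    value = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (value > 0) - (value < 0)


def _has_crossing(ring):
    """True if two non-adjacent edges of the ring properly cross each other."""
    edges = list(zip(ring, ring[1:]))
    for i, (a, b) in enumerate(edges):
        for c, d in edges[i + 2:]:
            if (_orient(a, b, c) * _orient(a, b, d) < 0
                    and _orient(c, d, a) * _orient(c, d, b) < 0):
                return True
    return False


def ring_problems(ring):
    """Return a list of problems found in a ring given as (x, y) tuples."""
    problems = []
    if any(math.isnan(v) for point in ring for v in point):
        problems.append("contains NaN")
    if len(ring) < 4 or ring[0] != ring[-1]:
        problems.append("not closed")
    if _has_crossing(ring):
        problems.append("self-intersects")
    return problems


square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
bow_tie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]
print(ring_problems(square))
print(ring_problems(bow_tie))
```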

What causes invalid geometries

Invalid geometries could exist in input files. Possibly those files have been created by software that has not followed the rules when creating the geometry.

But more often invalid geometry is caused by reprojecting geometries to a different CRS. This can happen because points within the original geometry may shift in relation to other points. For example, two polygon lines that were very close together may unexpectedly cross after reprojection.

Note

Refer to Cases where RiskScape reprojects for more details on when RiskScape will automatically reproject geometry, as well as tips on how to avoid unnecessary reprojection.

What does RiskScape do with invalid geometries

RiskScape has options for the detection and fixing of invalid geometries when data is read from a bookmark and when geometries are reprojected.

These options are controlled by the project file ‘validate-geometry’ setting. The project setting is the default for all bookmarks, but any individual bookmark can specify its own setting.

When the validate-geometry setting is either WARN or ERROR then RiskScape will detect invalid geometries and attempt to fix them. (See How RiskScape fixes invalid geometries).

If a fix is possible, RiskScape will automatically correct the invalid geometry and output a warning that a fix has been made.

If a fix is not possible then RiskScape will produce either a warning (WARN) or an error (ERROR), depending on the validate-geometry setting.
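The behaviour described above can be summarised as a small decision function. This is a sketch of the documented outcomes only. The function name and the message strings are invented for illustration and are not RiskScape's actual output.

```python
def handle_invalid_geometry(mode, fix_succeeded):
    """Describe the outcome for an invalid geometry under a validate-geometry mode.

    mode: 'WARN' or 'ERROR', the two values named in the text above.
    fix_succeeded: whether an automatic fix produced a usable geometry.
    """
    if mode not in ("WARN", "ERROR"):
        raise ValueError(f"this sketch only covers WARN and ERROR, got {mode!r}")
    if fix_succeeded:
        # In both modes a successful fix is applied and reported as a warning.
        return "warning: invalid geometry was fixed"
    if mode == "WARN":
        return "warning: invalid geometry could not be fixed"
    return "error: invalid geometry could not be fixed"


for mode in ("WARN", "ERROR"):
    for fixed in (True, False):
        print(mode, fixed, "->", handle_invalid_geometry(mode, fixed))
```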

Tip

Validating geometry requires extra processing that will mean pipeline models take longer to run. If you know that the input files only contain valid geometries, and these do not become invalid due to reprojection, then turning geometry validation off could speed up your models.

Geometry validation can also be turned off on a per-bookmark basis.

Fixing geometry without RiskScape

Most GIS applications have tools for fixing invalid geometries. For example:

QGIS
ArcGIS

Fixing invalid geometries in the source files is always the preferred option. Using specialized GIS software should also allow for better verification of how the fixes have been applied.

In the case that geometries have become invalid following re-projection, you could use your GIS software to do the required re-projection, then fix any invalid geometries.

Tip

By default, the riskscape bookmark evaluate command will generally produce a new shapefile with any invalid geometry fixed. This can be a quick alternative way to fix the geometry in a layer, although specialized GIS software may still do a better job.

How RiskScape fixes invalid geometries

RiskScape fixes invalid geometries using the JTS Geometry Fixer which is described here.

When fixing geometry, RiskScape applies some rules to determine if the fix is suitable. The fix will not be applied if it is:

an empty geometry
a different type of geometry (a polygon is not allowed to become a line or point)
a geometry collection containing different geometry types

Geometry that spans the dateline

When geometries are in a lat/lon projection they may need to span the international dateline. The international dateline is where the longitude changes from 180 to -180 degrees.

Lines and polygons that span the dateline may be handled differently by various GIS software.

Take the following polygon as an example:

POLYGON ((-45 178, -40 178, -40 -178, -45 -178, -45 178))

The coordinates in this polygon are located near New Zealand (178 longitude) and extend eastwards into the Pacific Ocean (-178 longitude).

Some GIS software may interpret this geometry as a polygon that is four degrees wide and spans the dateline.

Other software (including RiskScape) will see this as a polygon that is 356 degrees wide and wraps the long way around the globe.

To express this polygon accurately in RiskScape it currently needs to be a multi-polygon with a part on either side of the dateline. For example:

MULTIPOLYGON (((-45 178, -40 178, -40 180, -45 180, -45 178)), ((-45 -180, -40 -180, -40 -178, -45 -178, -45 -180)))
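For simple rectangular areas, the split at the dateline can be done mechanically. The sketch below is one possible approach, assuming lat lon axis order and a box described by its southern and northern latitudes and its western and eastern longitudes. The function name is hypothetical, and real geometries with arbitrary shapes would need a proper GIS tool.

```python
def split_box_at_dateline(south, north, west_lon, east_lon):
    """Return WKT (lat lon order) for a box running eastwards from west_lon to east_lon.

    If the box crosses the dateline (west_lon > east_lon), it is split into two
    parts that meet at longitude 180 / -180.
    """
    def ring(lon_a, lon_b):
        corners = [(south, lon_a), (north, lon_a), (north, lon_b),
                   (south, lon_b), (south, lon_a)]
        return "((" + ", ".join(f"{lat:g} {lon:g}" for lat, lon in corners) + "))"

    if west_lon <= east_lon:
        return "POLYGON " + ring(west_lon, east_lon)
    return "MULTIPOLYGON (" + ring(west_lon, 180) + ", " + ring(-180, east_lon) + ")"


print(split_box_at_dateline(-45, -40, 178, -178))
```

For the four-degree-wide example, this produces a multi-polygon with one part from 178 to 180 and another from -180 to -178.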

Reprojecting geometry that spans the dateline to lat/lon

Some CRSs have a spatial extent that spans the dateline. NZTM (EPSG:2193) is one example.

In these projections it is possible to have a geometry that physically spans the dateline. An example would be:

LINESTRING (5560253 2026893, 5533255 2368783)

This line starts at approximately lat/lon -40 178, then crosses the dateline to end at approximately -40 -178.

But when RiskScape re-projects this line from EPSG:2193 to WGS 84 (lat/lon) it becomes a line that wraps the long way around the globe between those two points.

The same thing will occur for polygons that wrap the dateline, and this may lead to invalid geometry as the lines (that wrap the globe the wrong way) may cross other lines.
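To see why the two interpretations differ so much, it helps to compare the two possible widths between a pair of longitudes. The sketch below is an illustration with hypothetical function names. It treats any pair of consecutive vertices more than 180 degrees apart in longitude as a likely dateline crossing, which is a heuristic and not a RiskScape feature.

```python
def longitude_spans(lon_start, lon_end):
    """Return (short_way, long_way) widths in degrees between two longitudes."""
    direct = abs(lon_end - lon_start) % 360
    short_way = min(direct, 360 - direct)
    return short_way, 360 - short_way


def likely_crosses_dateline(lon_start, lon_end):
    """Heuristic: a jump of more than 180 degrees suggests a dateline crossing."""
    return abs(lon_end - lon_start) > 180


print(longitude_spans(178, -178))
print(likely_crosses_dateline(178, -178))
```

Running a check like this over your reprojected data in a scripting environment can help spot lines that have wrapped the long way around the globe.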

Warning

This is a known issue in RiskScape and we plan to resolve it in a future release.

For now, if geometries that wrap the dateline are used it is best to ensure that they do not need to be re-projected by RiskScape. This is best done by ensuring that all input layers are in the same CRS.

Missing datum shift information

When RiskScape reprojects geometry into a different CRS, it can sometimes output a warning that the reprojection may be inaccurate because of missing datum shift information.

A CRS definition should include a datum, which can be thought of as a model of the earth’s shape. Some datums may be more accurate than others for a specific region.

When reprojecting geometry, if the source and target CRS use different datums, a datum shift may be required. RiskScape builds a datum-aware transformation using Bursa-Wolf parameters, which are usually included as part of the shapefile’s .prj file, as a TOWGS84 entry.
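For readers curious about what the seven Bursa-Wolf parameters do, the sketch below applies them to a geocentric (X, Y, Z) coordinate. It assumes the small-angle "position vector" rotation convention, with translations in metres, rotations in arc-seconds and scale in parts per million. It is a worked formula for illustration, not RiskScape's or GeoTools' implementation, and the parameter values in the example are made up.

```python
import math

ARCSEC_TO_RAD = math.pi / (180 * 3600)


def bursa_wolf(xyz, dx, dy, dz, rx, ry, rz, ppm):
    """Apply a seven-parameter shift to geocentric (X, Y, Z) coordinates in metres.

    Assumes the small-angle position vector convention for the rotations.
    """
    x, y, z = xyz
    rx, ry, rz = (r * ARCSEC_TO_RAD for r in (rx, ry, rz))
    scale = 1 + ppm * 1e-6
    return (
        dx + scale * (x - rz * y + ry * z),
        dy + scale * (rz * x + y - rx * z),
        dz + scale * (-ry * x + rx * y + z),
    )


# Made-up parameters: a pure translation of a few metres on each axis.
print(bursa_wolf((10.0, 20.0, 30.0), 1.0, 2.0, 3.0, 0, 0, 0, 0))
```

When the parameters are missing, no such shift can be applied, which is why the reprojected coordinates may be slightly offset.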

However, sometimes the .prj file for a shapefile is missing the datum information, in which case RiskScape will display a warning and ignore potential datum shifts when reprojecting.

Usually, the RiskScape datum shift warning can be avoided by adding a crs-name entry to the shapefile’s bookmark. This will cause RiskScape to look up and use the full CRS definition, which should include the datum shift information.

Tip

If you are unsure which CRS a given shapefile uses, the CRS will be displayed in the output from the riskscape bookmark info <bookmark-name> command.
