Miscellaneous functions
bookmark
Arguments: [id: Text, options: Anything, type: Text]
Returns: Anything
Looks up data from a project by its bookmark id, with an optional second options argument that allows you to modify the options set in the original bookmark. The first argument to bookmark does not have to be an existing bookmark id; it can also be a file path or URI.
The options argument can be used to override/set the options in the bookmark. Example: bookmark('buildings', {location: 'data/all_buildings.csv', format: 'csv', add_line_numbers: true}) - looks up a bookmark with the id 'buildings', changes the location, sets the format to csv, and enables line numbers.
The type argument is used to set the type that is expected to be returned. This is required when either the id or options are not constants. Example: bookmark(xyz, options: {}, type: 'coverage(floating)') where xyz is a property expected to contain the path to a grid coverage. Note that if the returned bookmark does not contain the expected data type, then your pipeline will fail mid-execution.
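For example, to load data directly from a file rather than from a saved bookmark (a hypothetical sketch; the path data/flood_depth.tif is illustrative only): bookmark('data/flood_depth.tif', options: {}, type: 'coverage(floating)') reads the grid coverage straight from the given path.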
relation_to_coverage
Arguments: [values: Relation[{}], options: Anything]
Returns: Coverage[Anything]
DEPRECATED: please use to_coverage instead. Makes a relation into a coverage (only possible for relations that include a geometry attribute).
Converts a relation into a coverage, allowing it to be sampled using one of the sampling functions (e.g. sample, sample_centroid or sample_one). The coverage is constructed by adding all of the relation's tuples into a spatial index (the relation must have a single geometry member), which is queried for matching tuples when sampled.
There are two types of index that can be constructed, 'intersection' and 'nearest_neighbour', which can be chosen by specifying the index option, e.g. relation_to_coverage(relation, options: {index: 'intersection'}). The default index is 'intersection'.
The intersection index is most appropriate for indexing polygonal features. The sampling operation will query the index for any intersecting features, and return those that match according to the semantics of the sampling operation that was used.
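For example (a hypothetical sketch; flood_zones is an illustrative relation of flood polygons, and the exact sampling signatures may differ - see the sampling functions' own documentation): sampling relation_to_coverage(flood_zones, options: {index: 'intersection'}) with sample_one would return a single flood-zone tuple whose polygon intersects the sampled geometry.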
The nearest_neighbour index is best when indexing point features. A sampling operation will query the index for the nearest feature's point (with a max distance cut-off). Only sample_centroid is currently supported for this type of index. The distance cut-off must be supplied as the nearest_neighbour_max_distance option. E.g. to use a nearest neighbour index with a cut-off of one kilometre, use relation_to_coverage(relation, options: {index: 'nearest_neighbour', nearest_neighbour_max_distance: 1000})
lookup
Arguments: [lookuptable: LookupTable(key=Anything, value=Anything), key: Anything]
Returns: Anything
Looks up data from a lookup table by key, returning null if no value matches.
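Example (a hypothetical sketch; damage_ratios is an illustrative table built with to_lookup_table, described below): lookup(damage_ratios, building.construction_type) returns the value(s) stored against the building's construction type, or null if there is no matching key.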
list_to_columns
Arguments: [list: Nullable[List[Nullable[Anything]]], columns: {names=>List[Text], prefix=>Text, number=>Integer, start_numbering=>Integer}]
Returns: Anything
Turns a list of items into column-wise data (i.e. a struct type). This can be helpful when saving a list to file. For example, a list containing 3 items could be turned into 3 separate columns in the output CSV file.
An example of explicitly specifying the column names would be: list_to_columns([1, 2], columns: { names: ['one', 'two']}) which would return {one=1, two=2}.
Alternatively, the column names can be based on the item's position in the list, e.g. the first list item gets named column '1'. In this case, you need to specify the total number of columns (i.e. items in the list). E.g. list_to_columns([3, 4], columns: { number: 2}) will return {1=3, 2=4}.
For convenience, a prefix can also be given to the column names. E.g. list_to_columns([5, 6], columns: { number: 2, prefix: 'value'}) will return {value1=5, value2=6}.
You can also change how the numbering starts. For example, if you prefer zero-based numbering, then list_to_columns([7, 8], columns: { number: 2, start_numbering: 0 }) will return {0=7, 1=8}.
Note that if the list contains more items than the specified 'number' or 'names', then the extraneous list items will be discarded.
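For instance, following the same pattern, list_to_columns([1, 2, 3], columns: { number: 2}) would be expected to return {1=1, 2=2}, silently dropping the third item.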
to_lookup_table
Arguments: [list: List[Anything], keyExpression: λ(listElement), valueExpression: λ(listElement), options: {unique=>Bool}]
Returns: LookupTable(key=Anything, value=Anything)
Aggregate function that collects tuples and produces a lookup table. A lookup table allows values to be looked up using the lookup function, which can be a simple way of accessing tabular data in an expression.
The key and value arguments are used to derive keys and values from each tuple.
An optional third struct argument can be given, which allows a unique option to be specified. When unique is false (the default), the lookup table assumes that multiple values can be mapped to each key, and so it keeps a list of values for each key. If unique is true, then only one value can exist for each key, and evaluation will fail if more than one value is seen for a key.
Example: input('events') -> group(to_lookup_table(key: event_id, value: *, options: {unique: true})) - this creates a lookup table of events keyed by the event's id.
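The resulting table could then be queried later on, e.g. (a hypothetical sketch, assuming the table is available as events_table): lookup(events_table, 101) would return the single event tuple whose event_id is 101; because unique: true was set, building the table fails if two events share an id.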
bucket
Arguments: [select: Anything, pick: λ(aggregation_value), buckets: {}]
Returns: Nullable[Anything]
Aggregate function that assigns a row of data to one or more categories, or 'buckets', and aggregates each bucket individually. This is conceptually similar to a SUMIF spreadsheet operation, but is more powerful and flexible, as any aggregate operation can be used and multiple categories can be aggregated at once. See bucket_range() for a slightly simpler function that aggregates based on what range a value falls into.
The bucket() function requires three arguments: the buckets or categories you are interested in, a pick lambda expression which determines which bucket(s) apply for any given row, and a select aggregate expression that defines how to aggregate the data.
For example, say you are aggregating an event loss table by region, and you want to see the regional damage summarized by building type. Such an expression might look like this, where each bucket represents a building type: bucket(pick: bucket -> building.type = bucket, select: {sum(loss) as total_loss, count(*)}, buckets: {concrete: 'concrete', steel: 'steel', brick: 'brick'})
The result returned is a struct consisting of the select aggregate expression applied to each bucket separately, e.g. {concrete.total_loss, concrete.count, steel.total_loss, steel.count, brick.total_loss, brick.count}. Note that the name of each bucket is used in the result struct.
The pick argument is used to compare rows against all of the buckets. If a bucket applies to the row, then the expression must return true. If many buckets apply, then each matching bucket will accumulate the row. If the row does not match any bucket, then that data is simply ignored.
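For example (a hypothetical sketch): with buckets: {under_10k: 10000, under_100k: 100000} and pick: bucket -> loss < bucket, a row with a loss of 5000 would be counted in both buckets, while a row with a loss of 500000 would match neither and be ignored.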
The select argument gives the aggregate expression to apply to rows in each of the buckets, for example sum(loss) or percentile(loss, 50). These expressions can be complex, e.g. {'$' + str(sum(loss)) as "Total Losses (NZD)"}.
The buckets argument is used to define the set of buckets used to group rows in the aggregation. A bucket can be as simple as a numeric value or a bit of text, or it can be a more complex set of criteria. Note that every bucket must be of the exact same type, including nullability.
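For instance (a hypothetical sketch, assuming each row has a floor_area attribute): buckets: {small: {min: 0, max: 200}, large: {min: 200, max: 100000}} combined with pick: bucket -> floor_area >= bucket.min and floor_area < bucket.max uses a struct of criteria for each bucket; note that both bucket structs have identical member types, as required.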
bucket_range
Arguments: [pick: Anything, select: Anything, range: List[Anything], options: Nullable[Anything]]
Returns: Nullable[Anything]
Aggregate function that assigns each row of data to a 'bucket' based on the numeric range a given value falls into, and then aggregates each bucket separately. This works the same as the bucket() function, but is a bit simpler to use when dealing with numeric ranges.
The bucket_range() function requires three arguments: the numeric range you are interested in, the attribute in each row to pick (i.e. what to match against the ranges), and a select aggregate expression that defines how to aggregate the data.
For example, an event loss table could be bucketed by loss range and have a count and sum of losses between predefined levels. Such an expression might look like this: bucket_range(pick: loss, select: {sum(loss) as total_loss, count(*)}, range: [1, 5000, 100000])
The result returned is a struct consisting of the select aggregate expression applied to each bucket separately, e.g. {range_<_1.total_loss, range_<_1.count, range_1_5000.total_loss, range_1_5000.count, range_5000_100000.total_loss, range_5000_100000.count, range_100000_+.total_loss, range_100000_+.count}.
The select argument gives the aggregate expression to apply to rows in each of the buckets, for example sum(loss) or percentile(loss, 50). These expressions can be complex, e.g. {'$' + str(sum(loss)) as "Total Losses (NZD)"}.
The pick argument specifies the value in each row to compare against the range buckets. A match is found if pick is >= the bucket's lower bound and < its upper bound, e.g. the comparison for bucket range_1_10 would be pick >= 1 and pick < 10.
The range argument is a numeric list, where each list element defines a boundary for a bucket. For example, the list [1, 2] would yield the following buckets: range_<_1, range_1_2 and range_2_+. The range_<_1 and range_2_+ bounds get added automatically to ensure that any values that might fall outside the specified range do not get ignored. To avoid this default behaviour and deliberately ignore such values, you can pass in a fourth argument: options: {add_bounds: false}
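For example, adapting the earlier expression (a sketch of the expected behaviour): bucket_range(pick: loss, select: count(*), range: [1, 5000, 100000], options: {add_bounds: false}) would produce only the range_1_5000 and range_5000_100000 buckets, ignoring any losses below 1 or at or above 100000.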