# Miscellaneous functions

## `bookmark`

Arguments: `[id: Text, options: Anything, type: Text]`

Returns: `Anything`

Looks up data from a project by its bookmark id, with an optional second `options` argument that allows you to modify the options set in the original bookmark.

The first argument to `bookmark` does not have to be an existing bookmark id; it can also be a file path or URI.

The `options` argument can be used to override or set the options in the bookmark. Example: `bookmark('buildings', {location: 'data/all_buildings.csv', format: 'csv', add_line_numbers: true})` - looks up a bookmark with the id 'buildings', changes the location, sets the format to `csv`, and enables line numbers.

The `type` argument sets the type that is expected to be returned. This is required when either the `id` or `options` are not constants. Example: `bookmark(xyz, options: {}, type: 'coverage(floating)')` where `xyz` is a property expected to contain the path to a grid coverage. Note that if the returned bookmark does not contain the expected data type then your pipeline will fail mid-execution.

## `relation_to_coverage`

Arguments: `[values: Relation[{}], options: Anything]`

Returns: `Coverage[Anything]`

DEPRECATED: please use `to_coverage` instead.

Converts a relation into a coverage, allowing it to be sampled using one of the sampling functions (e.g. `sample`, `sample_centroid` or `sample_one`). This is only possible for relations that include a geometry attribute.

The coverage is constructed by adding all of the relation's tuples into a spatial index (the relation must have a single geometry member), which is queried for matching tuples when sampled.

There are two types of index that can be constructed, `'intersection'` and `'nearest_neighbour'`, which can be chosen by specifying the `index` option, e.g. `relation_to_coverage(relation, options: {index: 'intersection'})`. The default `index` is `'intersection'`.
The `intersection` index is most appropriate for indexing polygonal features. The sampling operation will query the index for any intersecting features, and return those that match according to the semantics of the sampling operation that was used. The `intersection` index is the default index type.

The `nearest_neighbour` index is best when indexing point features. A sampling operation will query the index for the nearest feature's point (with a maximum distance cutoff). Only `sample_centroid` is currently supported for this type of index. The distance cutoff must be supplied as the `nearest_neighbour_max_distance` option. E.g. to use a nearest neighbour index with a cutoff of one kilometre, use `relation_to_coverage(relation, options: {index: 'nearest_neighbour', nearest_neighbour_max_distance: 1000})`.

## `lookup`

Arguments: `[lookuptable: LookupTable(key=Anything, value=Anything), key: Anything]`

Returns: `Anything`

Looks up data from a lookup table by key, returning null if no value matched.

## `list_to_columns`

Arguments: `[list: Nullable[List[Nullable[Anything]]], columns: {names=>List[Text], prefix=>Text, number=>Integer, start_numbering=>Integer}]`

Returns: `Anything`

Turns a list of items into column-wise data (i.e. a struct type). This can be helpful when saving a list to file. For example, a list containing 3 items could be turned into 3 separate columns in the output CSV file.

Each item in the given list becomes a "column" (i.e. an attribute or struct member) in the return value. The column names can either be explicitly specified or auto-generated based on the item's order in the list.

An example of explicitly specifying the column names would be `list_to_columns([1, 2], columns: { names: ['one', 'two']})`, which would return `{one=1, two=2}`.

Alternatively, the column names can be based on the item's position in the list, e.g. the first list item gets named column '1'. In this case, you need to specify the total number of columns (i.e. items in the list).
E.g. `list_to_columns([3, 4], columns: { number: 2})` will return `{1=3, 2=4}`.

Note that if the list contains more items than the specified `number` or `names`, then the extraneous list items will be discarded.

For convenience, a prefix can also be given to the column names. E.g. `list_to_columns([5, 6], columns: { number: 2, prefix: 'value'})` will return `{value1=5, value2=6}`.

Finally, you can also change how the numbering starts. For example, if you prefer zero-based numbering, then `list_to_columns([7, 8], columns: { number: 2, start_numbering: 0 })` will return `{0=7, 1=8}`.

## `to_lookup_table`

Arguments: `[list: List[Anything], keyExpression: λ(listElement), valueExpression: λ(listElement), options: {unique=>Bool}]`

Returns: `LookupTable(key=Anything, value=Anything)`

Aggregate function that collects tuples and produces a lookup table. A lookup table allows values to be looked up using the `lookup` function in an expression, which can be a simple way of accessing tabular data in an expression.

The `key` and `value` arguments are used to derive keys and values from each tuple. An optional `options` struct argument can be given, which allows a `unique` option to be specified. When `unique` is `false` (the default), the lookup table assumes that multiple values can be mapped to each key, and so it keeps a list of values for each key. If `unique` is `true`, then only one value can exist for each key, and evaluation will fail if more than one value is seen for any key.

Example: `input('events') -> group(to_lookup_table(key: event_id, value: *, options: {unique: true}))` - this creates a lookup table of events keyed by the event's id.

## `bucket`

Arguments: `[select: Anything, pick: λ(aggregation_value), buckets: {}]`

Returns: `Nullable[Anything]`

Aggregate function that assigns a row of data to one or more categories, or 'buckets', and aggregates each bucket individually.
This is conceptually similar to a `SUMIF` spreadsheet operation, but is more powerful and flexible as any aggregate operation can be used, and multiple categories can be aggregated at once. See `bucket_range()` for a slightly simpler function that aggregates based on what range a value falls into.

The `bucket()` function requires three arguments: the `buckets` or categories you are interested in, a `pick` lambda expression which determines which bucket(s) apply for any given row, and a `select` aggregate expression that defines how to aggregate the data.

For example, say you are aggregating an event loss table by region, and you want to see the regional damage summarized by building type. Such an expression might look like this, where each bucket represents a building type: `bucket(pick: bucket -> building.type = bucket, select: {sum(loss) as total_loss, count(*)}, buckets: {concrete: 'concrete', steel: 'steel', brick: 'brick'})`

The result returned is a struct consisting of the `select` aggregate expression applied to each bucket separately. E.g. `{concrete.total_loss, concrete.count, steel.total_loss, steel.count, brick.total_loss, brick.count}`. Note that the name of each bucket is used in the result struct.

The `pick` argument is used to compare rows against all of the buckets. If a bucket applies to the row, then the expression must return `true`. If many buckets apply, then each matching bucket will accumulate rows. If the row does not match any bucket, then that data is simply ignored.

The `select` argument gives the aggregate expression to apply to rows in each of the buckets, for example `sum(loss)` or `percentile(loss, 50)`. These expressions can be complex, e.g. `{'$' + str(sum(loss)) as "Total Losses (NZD)"}`.

The `buckets` argument is used to define the set of buckets used to group rows in the aggregation. A bucket can be simple, like a numeric value or a bit of text, or can be a more complex set of criteria.
Note that every bucket must be of the exact same type, including nullability.

## `bucket_range`

Arguments: `[pick: Anything, select: Anything, range: List[Anything], options: Nullable[Anything]]`

Returns: `Nullable[Anything]`

Aggregate function that assigns each row of data to a 'bucket' based on the numeric range a given value falls into, and then aggregates each bucket separately. This works the same as the `bucket()` function, but is a bit simpler to use when dealing with numeric ranges.

The `bucket_range()` function requires three arguments: the numeric `range` you are interested in, the attribute in each row to `pick` (i.e. what to match against the ranges), and a `select` aggregate expression that defines how to aggregate the data.

For example, an event loss table could be bucketed by loss range and have a count and sum of losses between predefined levels. Such an expression might look like this: `bucket_range(pick: loss, select: {sum(loss) as total_loss, count(*)}, range: [1, 5000, 100000])`

The result returned is a struct consisting of the `select` aggregate expression applied to each bucket separately. E.g. `{range_<_1.total_loss, range_<_1.count, range_1_5000.total_loss, range_1_5000.count, range_5000_100000.total_loss, range_5000_100000.count, range_100000_+.total_loss, range_100000_+.count}`.

The `select` argument gives the aggregate expression to apply to rows in each of the buckets, for example `sum(loss)` or `percentile(loss, 50)`. These expressions can be complex, e.g. `{'$' + str(sum(loss)) as "Total Losses (NZD)"}`.

The `pick` argument specifies the value in each row to compare against the `range` buckets. A match is found if `pick` is `>=` the bucket's lower bound and `<` its upper bound, e.g. the comparison for bucket `range_1_10` would be `pick >= 1 and pick < 10`.

The `range` argument is a numeric list, where each list element defines a boundary for a bucket.
For example, the list `[1, 2]` would yield the following buckets: `range_<_1`, `range_1_2` and `range_2_+`. The `range_<_1` and `range_2_+` bounds get added automatically to ensure that any values that fall outside the specified range do not get ignored. To avoid this default behaviour and deliberately ignore values, you can pass in a fourth argument, `options: {add_bounds: false}`.
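
The bucketing rule described above can be modelled outside the expression language. The following is a rough Python sketch, not the function's actual implementation: the helper names (`bucket_bounds`, `pick_bucket`, `summarise`) are hypothetical, the comparisons used for the automatically added outer buckets are inferred from the `>=` lower bound and `<` upper bound rule, and reporting empty buckets as zero totals is an assumption made only for this illustration.

```python
def bucket_bounds(bounds, add_bounds=True):
    """Return (name, lower, upper) triples; None means 'unbounded'."""
    buckets = []
    if add_bounds:
        # Catch-all for values below the first boundary, e.g. range_<_1
        buckets.append((f"range_<_{bounds[0]}", None, bounds[0]))
    for lower, upper in zip(bounds, bounds[1:]):
        buckets.append((f"range_{lower}_{upper}", lower, upper))
    if add_bounds:
        # Catch-all for values at or above the last boundary, e.g. range_2_+
        buckets.append((f"range_{bounds[-1]}_+", bounds[-1], None))
    return buckets


def pick_bucket(value, bounds, add_bounds=True):
    """Name of the bucket where lower <= value < upper, or None if ignored."""
    for name, lower, upper in bucket_bounds(bounds, add_bounds):
        above = lower is None or value >= lower
        below = upper is None or value < upper
        if above and below:
            return name
    return None


def summarise(losses, bounds, add_bounds=True):
    """Model of select: {sum(loss) as total_loss, count(*)} per bucket."""
    result = {
        name: {"total_loss": 0, "count": 0}
        for name, _, _ in bucket_bounds(bounds, add_bounds)
    }
    for loss in losses:
        name = pick_bucket(loss, bounds, add_bounds)
        if name is not None:  # values outside every bucket are dropped
            result[name]["total_loss"] += loss
            result[name]["count"] += 1
    return result


losses = [0.5, 1, 4999, 5000, 250000]
with_bounds = summarise(losses, [1, 5000, 100000])
without_bounds = summarise(losses, [1, 5000, 100000], add_bounds=False)
```

In this model, `with_bounds` has four buckets and counts every loss, while `without_bounds` has only `range_1_5000` and `range_5000_100000`, so the losses `0.5` and `250000` are ignored. Note that a value exactly on a boundary (such as `5000`) lands in the bucket whose lower bound it equals.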