Python Step
Note
The python step is in the Beta plugin.
As well as supporting Python functions within RiskScape expressions, RiskScape can also process the entire dataset within a single CPython function. This lets you use libraries like Pandas and NumPy across the whole dataset, rather than one row at a time.
Consider the case where you want to use numpy to compute some loss statistics from an event loss table in RiskScape.
# some kind of pandas/numpy concoction
def compute_aal(dataframe):
    # figure out aal somehow
    return aal
To integrate this code into your pipeline, you could add the following step:
event_loss
->
python(
  script: 'compute-aal.py',
  result-type: 'struct(aal: floating, peril: text)'
)
Then add the following to your Python script:
def function(rows):
    # 1. construct a dataframe from all the rows
    df = pd.DataFrame(rows)
    # 2. call your aal function (from the first example)
    aal_eq = compute_aal(df['eq_loss'])
    # 3. return a result to riskscape
    yield {'aal': aal_eq, 'peril': 'earthquake'}
This example would send all the tuples from the event_loss step in your pipeline to the compute-aal.py script. The script first gathers all these rows into a DataFrame, then passes that DataFrame to your existing AAL function. Finally, the function ‘yields’ the result as a dictionary, so that RiskScape can convert it back into a tuple.
This feature is not limited to returning a single result. The example can be adapted to return multiple rows back to RiskScape:
    # 4. call your aal function (from the first example)
    aal_flood = compute_aal(df['flood_loss'])
    # 5. return a second result to riskscape
    yield {'aal': aal_flood, 'peril': 'fluvial_flooding'}
Only once the final yield is called will the script finish.
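Putting the pieces together, the complete script might look like the following sketch. The column names (eq_loss, flood_loss) and the AAL calculation itself are placeholders; substitute whatever your event loss table and statistics actually require.

```python
import pandas as pd

def compute_aal(losses):
    # stand-in AAL calculation for illustration only: the mean annual loss.
    # Replace this with your own pandas/numpy statistics.
    return losses.mean()

def function(rows):
    # construct a DataFrame from all the incoming rows
    df = pd.DataFrame(rows)
    # compute an AAL per peril and yield one result row for each
    # ('eq_loss' and 'flood_loss' are hypothetical column names)
    yield {'aal': compute_aal(df['eq_loss']), 'peril': 'earthquake'}
    yield {'aal': compute_aal(df['flood_loss']), 'peril': 'fluvial_flooding'}

# e.g. two input rows produce two result rows, one per peril
rows = [{'eq_loss': 1.0, 'flood_loss': 3.0},
        {'eq_loss': 3.0, 'flood_loss': 5.0}]
results = list(function(rows))
```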
Generator functions
RiskScape makes use of a feature of the Python language called generator functions to support whole-dataset processing. Tuples come into your function via a generator, and rows are sent back to RiskScape in the same way. For the most part, you don’t need to know much about how these work, as long as you remember to return rows back to RiskScape using the yield keyword instead of return.
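As a minimal illustration of the pattern (independent of RiskScape), a generator function pulls items from an iterable lazily and yields results back one at a time:

```python
def function(rows):
    # 'rows' is an iterator: rows are pulled in lazily, one at a time
    for row in rows:
        # yield hands one result back to the caller, then resumes here
        # on the next iteration (unlike return, which would stop entirely)
        yield {'doubled': row['value'] * 2}

# calling the function returns a generator object;
# nothing actually runs until it is iterated
results = list(function([{'value': 1}, {'value': 2}]))
# results == [{'doubled': 2}, {'doubled': 4}]
```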
More examples
Batch-processing
This example shows how computation can be batched up, which can be beneficial when using advanced features like GPU offloading.
import itertools

BATCH_SIZE = 100

def function(rows):
    # use python stdlib itertools to batch the rows coming in so we
    # can operate on them en masse
    for batch in itertools.batched(rows, BATCH_SIZE):
        df = pd.DataFrame(batch)
        # call the function that benefits from running across many rows at once
        df = df.reticulate_splines()
        # return each result back to RiskScape, one row (as a dict) at a time
        for new_row in df.to_dict('records'):
            yield new_row
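Note that itertools.batched was only added to the standard library in Python 3.12. On older interpreters, a small fallback built on itertools.islice (a sketch, not part of RiskScape) behaves the same way:

```python
import itertools

def batched(iterable, n):
    # yield successive tuples of up to n items, like itertools.batched
    it = iter(iterable)
    while batch := tuple(itertools.islice(it, n)):
        yield batch

# e.g. batched(range(5), 2) yields (0, 1), (2, 3), (4,)
```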
Row-at-a-time
This example shows how you can call a function more like a traditional CPython function in RiskScape. Assume you already have a script with compute_damage and compute_loss functions:
def function(rows):
    for row in rows:
        dr = compute_damage(row)
        loss = compute_loss(row, dr)
        # Return a row back to RiskScape for each row we are given
        yield {'dr': dr, 'loss': loss}
Note that unlike a standard RiskScape function used in a select step, only the attributes that your function yields are returned; any other attributes in the input row are not automatically passed through.
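If you do want to keep the input attributes alongside the new ones, one approach (a sketch; it assumes each row behaves like a dict, and the compute_damage/compute_loss bodies here are stand-ins) is to merge the incoming row into the yielded dictionary:

```python
def compute_damage(row):
    # stand-in damage calculation for illustration only
    return row['exposure'] * 0.1

def compute_loss(row, dr):
    # stand-in loss calculation for illustration only
    return row['value'] * dr

def function(rows):
    for row in rows:
        dr = compute_damage(row)
        loss = compute_loss(row, dr)
        # merge the original attributes with the newly computed ones,
        # so downstream pipeline steps still see them
        yield {**row, 'dr': dr, 'loss': loss}

results = list(function([{'exposure': 10, 'value': 100}]))
# each result keeps 'exposure' and 'value' as well as 'dr' and 'loss'
```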