Interface Collector<T>

All Superinterfaces:
AutoCloseable, Realized

public interface Collector<T> extends Realized

Realized interface for a functor that can arbitrarily transform one set of tuples in to another.

  • Method Details

    • getSourceType

      Struct getSourceType()

      The type of tuple this collector accepts.

    • getTargetType

      Struct getTargetType()

      The type of tuple this collector produces

    • newAccumulator

      T newAccumulator()

      Create a new accumulator that holds collection/aggregation state for this collector to use. These don't need to be thread safe - any code making use of this API should assume they'll be accessed by only a single thread. NB do we want to support a shared accumulator API?

    • accumulate

      void accumulate(T accumulator, Tuple tuple)

      Give a tuple to the accumulator for accumulation. This should add any required state to the accumulator for this tuple, for example add a number to a running total, append an element to a list, etc.

    • combine

      T combine(T lhs, T rhs)

      Combine two accumulators, returning an accumulator that contains the accumulated state of both. This needs to be done in such a way that it produces the same result as if only one accumulator is used. Implementations are free to modify and return one of the given accumulators if it makes sense.

    • process

      TupleIterator process(T accumulator)

      Convert an accumulator in to a set of results. This can be done lazily, e.g. rows can be pulled from the accumulator on demand.

    • getAccumulatorClass

      Class<T> getAccumulatorClass()
    • getCharacteristics

      default Set<Collector.Characteristic> getCharacteristics()
      Returns:
      a set of any Collector.Characteristics this Collector has
    • isParallelizable

      default boolean isParallelizable()
      Returns:
      true if this Collector has the Collector.Characteristic.PARALLELIZABLE.
    • size

      default Optional<Long> size(T accumulator)
      Returns:
      the number of accumulated entries, i.e. the total number of tuples that process(Object) is expected to emit.