clj-headlights.pardo

create-and-apply

(create-and-apply pcoll name clj-call options)

Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall options :- {s/Keyword s/Any}] Returns: pcollections/PCollectionType

Create a ParDo operation from a DoFn class, and apply it to a PCollection.

It accepts an options map parameter that describes how the ParDo is created. You can specify the following keys: :dofn-cls - which DoFn class is created for the ParDo. For now, it is required that the class inherits from clj_headlights.AbstractCljDoFn, because the constructor invocation is hardcoded. Create a CljDoFn by default. :outputs - a map associating the (tagged) outputs of the ParDo with their respective coders. This map should always contain at least a :main key, that is used for the main output coder. :side-inputs - a collection of side-inputs the ParDo needs. As Beam requires, the given side-inputs need to extend PCollectionView. Note: this only attaches the views to the ParDo; it is up to you to carry around the same view object in your code to access the view data.

default-coder

(default-coder)

emit-main-output

(emit-main-output context value)

Emit the main output value

emit-side-output

(emit-side-output context tag value)

Emit a value to a side output.

get-side-output

(get-side-output pcoll tag)

Retrieve the pcollection associated with a given tag in the output of a df-map-with-side-outputs.

get-side-outputs

(get-side-outputs pcoll)

Retrieve the map of Tags to PCollections.

get-tag

(get-tag tag)

input-restructurer

(input-restructurer coder)

Inputs: [coder :- Coder] Returns: s/Any

Given the coder of the input, create a function which pulls the input value from the context and turns it into clojure data structures.

invoke-with-optional-state

(invoke-with-optional-state & args)

kv-value-restructurer

(kv-value-restructurer coder)

Inputs: [coder :- Coder] Returns: s/Any

If the output of the previous stage was a KV, then it may have been the result of a GroupBy or CoGroupBy, which means we need to transform the output of those operations into more idiomatic clojure data structures. For a GroupBy we wrap the list of results in a seq, so that it acts like a normal list. For a CoGroupBy we create a positional vector, where the results for each pcoll that was grouped are in the same position they were sent to co-group-by-key.

make-tags-list

(make-tags-list tags)

Inputs: [tags :- [s/Keyword]] Returns: TupleTagList

process-element

(process-element c window serialized-clj-call creation-stack input-extractor & states)

set-side-output-coder

(set-side-output-coder pcoll tag coder)

Inputs: [pcoll :- PCollectionTuple tag :- s/Keyword coder :- Coder] Returns: PCollection

Sets the coder for the pcollection associated with a given tag in the output of a df-map-with-side-outputs.

setup

(setup input-coder serializable-clj-call)