clj-headlights.pardo
create-and-apply
(create-and-apply pcoll name clj-call options)
Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall options :- {s/Keyword s/Any}] Returns: pcollections/PCollectionType
Create a ParDo operation from a DoFn class, and apply it to a PCollection.
It accepts an options
map parameter that describes how the ParDo is created. You can specify the following keys: :dofn-cls - which DoFn class is created for the ParDo. For now, it is required that the class inherits from clj_headlights.AbstractCljDoFn
, because the constructor invocation is hardcoded. Create a CljDoFn by default. :outputs - a map associating the (tagged) outputs of the ParDo with their respective coders. This map should always contain at least a :main key, that is used for the main output coder. :side-inputs - a collection of side-inputs the ParDo needs. As Beam requires, the given side-inputs need to extend PCollectionView. Note: this only attaches the views to the ParDo; it is up to you to carry around the same view object in your code to access the view data.
get-side-output
(get-side-output pcoll tag)
Retrieve the pcollection associated with a given tag in the output of a df-map-with-side-outputs.
input-restructurer
(input-restructurer coder)
Inputs: [coder :- Coder] Returns: s/Any
Given the coder of the input, create a function which pulls the input value from the context and turns it into clojure data structures.
kv-value-restructurer
(kv-value-restructurer coder)
Inputs: [coder :- Coder] Returns: s/Any
If the output of the previous stage was a KV, then it may have been the result of a GroupBy or CoGroupBy, which means we need to transform the output of those operations into more idiomatic clojure data structures. For a GroupBy we wrap the list of results in a seq, so that it acts like a normal list. For a CoGroupBy we create a positional vector, where the results for each pcoll that was grouped are in the same position they were sent to co-group-by-key.
process-element
(process-element c window serialized-clj-call creation-stack input-extractor & states)
set-side-output-coder
(set-side-output-coder pcoll tag coder)
Inputs: [pcoll :- PCollectionTuple tag :- s/Keyword coder :- Coder] Returns: PCollection
Sets the coder for the pcollection associated with a given tag in the output of a df-map-with-side-outputs.