clj-headlights.pipeline
apply-to-value-and-output-kv
(apply-to-value-and-output-kv context value _window state clj-call)
apply-to-value-and-output-with-timestamp
(apply-to-value-and-output-with-timestamp context value _window state clj-call)
apply-to-value-and-seq-outputs
(apply-to-value-and-seq-outputs context value _window state clj-call)
apply-to-value-and-seq-outputs-kv
(apply-to-value-and-seq-outputs-kv context value _window state clj-call)
co-group-by-key
(co-group-by-key pcolls name)
Inputs: [pcolls :- (s/cond-pre PCollection [PCollection]) name :- s/Str] Returns: PCollection
Takes a list of PCollections which are already in KV form, and joins them on their keys. Output pcollection will look like: [key [first-pcoll-vals second-pcoll-vals third-pcoll-vals &c]]
composite
(composite name inputs f)
Inputs: [name :- s/Str inputs :- [pcollections/PCollectionType] f :- IFn]
Nests transforms that happen in f into a composite transform
create
(create options)
Inputs: [options :- PipelineOptions] Returns: Pipeline
Create a Pipeline object. This is the first building block of any Beam pipeline.
df-filter
(df-filter pcoll name clj-call)
Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall] Returns: PCollection
Returns a PCollection
containing the elements of pcoll
where (f element)
is truthy
df-filter-kv
(df-filter-kv pcoll name clj-call)
Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall] Returns: PCollection
returns a PCollection
containing the elements of pcoll
where (f element)
is true. coerces to KV
df-map
(df-map pcoll name clj-call)
Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall] Returns: PCollection
Returns a PCollection
of the return values of function clj-call
being applied to the input pcoll
- used for strictly 1-to-1 transformations
df-map-cat-with-side-outputs
(df-map-cat-with-side-outputs pcoll name clj-call tags)
Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall tags :- [s/Keyword]] Returns: PCollectionTuple
Acts like a map-cat but returns a PCollectionTuple
partitioned by the tag of the outputs of f
. f
must return a sequence of tuples in the form [:tag value]
. The main output should use the tag :main
. Nil values are filtered out
df-map-kv
(df-map-kv pcoll name clj-call)
Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall]
Same as df-map but expects the output of f
to be a sequence of length 2 which is converted to a Dataflow KV object. Mostly used before a GroupBy
df-map-with-side-input
(df-map-with-side-input pcoll name side-input-view clj-call)
Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str side-input-view clj-call :- clj-fn-call/CljCall] Returns: PCollection
df-map-with-timestamp
(df-map-with-timestamp pcoll name clj-call)
Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall]
Assigns a timestamp to pcoll rows
df-mapcat
(df-mapcat pcoll name clj-call)
Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall] Returns: PCollection
Similar to df-map
but the return-values of f
are flattened in the output PCollection
- used when the transformation is not strictly 1-to-1
df-mapcat-kv
(df-mapcat-kv pcoll name clj-call)
Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall]
Same as df-map-kv
except f
returns a sequence of lenght-2-sequences to be flattened and converted into KV objects
flatten-pcollections
(flatten-pcollections pcolls name)
Inputs: [pcolls :- [PCollection] name :- s/Str] Returns: PCollection
Merge a list of pcolls.
group-by-key
(group-by-key pcoll name)
Inputs: [pcoll :- PCollection name :- s/Str] Returns: PCollection
make-pipeline-options
(make-pipeline-options)
Inputs: [] Returns: PipelineOptions
Create a PipelineOptions object.
replace-keywords
(replace-keywords clj-call ctx window)
Replaces special keywords with Beam objects, preserves order
side-output-function-wrapper
(side-output-function-wrapper context val window state clj-call)