clj-headlights.pipeline

apply-to-value-and-output

(apply-to-value-and-output context value window state clj-call)

apply-to-value-and-output-kv

(apply-to-value-and-output-kv context value _window state clj-call)

apply-to-value-and-output-with-timestamp

(apply-to-value-and-output-with-timestamp context value _window state clj-call)

apply-to-value-and-seq-outputs

(apply-to-value-and-seq-outputs context value _window state clj-call)

apply-to-value-and-seq-outputs-kv

(apply-to-value-and-seq-outputs-kv context value _window state clj-call)

co-group-by-key

(co-group-by-key pcolls name)

Inputs: [pcolls :- (s/cond-pre PCollection [PCollection]) name :- s/Str] Returns: PCollection

Takes a list of PCollections that are already in KV form and joins them on their keys. Each element of the output PCollection looks like: [key [first-pcoll-vals second-pcoll-vals third-pcoll-vals ...]]
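To make the output shape concrete, here is a minimal plain-Clojure sketch that joins ordinary sequences of [key value] pairs. It does not use Beam or clj-headlights. The helper name co-group-local is hypothetical, and representing a key that is missing from one input as an empty vector is an assumption of this sketch, not documented behavior.

```clojure
;; Local illustration only: plain collections stand in for PCollections.
(defn co-group-local
  "Joins several seqs of [key value] pairs on key (hypothetical helper)."
  [kv-colls]
  (let [;; One {key [values]} map per input collection.
        grouped  (mapv (fn [coll]
                         (reduce (fn [m [k v]] (update m k (fnil conj []) v))
                                 {}
                                 coll))
                       kv-colls)
        all-keys (distinct (mapcat keys grouped))]
    ;; For each key, collect the value vectors in input order.
    ;; Assumption: a key absent from an input yields [] in that slot.
    (into {}
          (for [k all-keys]
            [k (mapv #(get % k []) grouped)]))))

(def joined
  (co-group-local [[[:a 1] [:b 2] [:a 3]]
                   [[:a "x"]]]))
;; joined => {:a [[1 3] ["x"]], :b [[2] []]}
```

In a real pipeline the order of values within each group is generally not guaranteed, so downstream code should not rely on it.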

composite

(composite name inputs f)

Inputs: [name :- s/Str inputs :- [pcollections/PCollectionType] f :- IFn]

Nests transforms that happen in f into a composite transform

context->ms

(context->ms c)

create

(create options)

Inputs: [options :- PipelineOptions] Returns: Pipeline

Create a Pipeline object. This is the first building block of any Beam pipeline.

df-apply-dofn

(df-apply-dofn pcoll name clj-call)

Inputs: [pcoll name clj-call]

df-filter

(df-filter pcoll name clj-call)

Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall] Returns: PCollection

Returns a PCollection containing the elements of pcoll for which the function given in clj-call returns a truthy value.
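"Truthy" follows Clojure's usual rule: only nil and false are rejected. The sketch below shows this with clojure.core/filter on a plain vector; it illustrates the predicate semantics only and does not call df-filter. The data and the name with-email are made up for the example.

```clojure
;; Local illustration of truthiness, using clojure.core/filter.
(def users
  [{:name "ann" :email "ann@example.com"}
   {:name "bob" :email nil}
   {:name "cy"  :email ""}])

;; A keyword works as a predicate: it returns the value or nil.
;; The empty string is truthy in Clojure, so "cy" is kept.
(def with-email (filter :email users))
;; (map :name with-email) => ("ann" "cy")
```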

df-filter-kv

(df-filter-kv pcoll name clj-call)

Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall] Returns: PCollection

Returns a PCollection containing the elements of pcoll for which the function given in clj-call returns true. The result is coerced to KV.

df-map

(df-map pcoll name clj-call)

Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall] Returns: PCollection

Returns a PCollection of the values returned by applying the function given in clj-call to each element of the input pcoll. Use it for strictly 1-to-1 transformations.

df-map-cat-with-side-outputs

(df-map-cat-with-side-outputs pcoll name clj-call tags)

Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall tags :- [s/Keyword]] Returns: PCollectionTuple

Acts like a mapcat but returns a PCollectionTuple partitioned by the tag of each output of f, the function given in clj-call. f must return a sequence of tuples in the form [:tag value]. The main output should use the tag :main. Nil values are filtered out.
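The contract for f can be sketched in plain Clojure. Below, emit-tagged is a hypothetical function that returns [:tag value] tuples, and partition-by-tag-local is a hypothetical stand-in that mimics the partitioning on an ordinary collection. Neither is part of clj-headlights, and reading "nil values are filtered out" as dropping tuples whose value is nil is an assumption of this sketch.

```clojure
;; Hypothetical function with the return shape the description asks for:
;; a sequence of [:tag value] tuples, with :main as the main output.
(defn emit-tagged [n]
  (cond
    (zero? n) [[:main nil]]            ; nil value, dropped below
    (neg? n)  [[:errors n]]
    (even? n) [[:main n] [:evens n]]
    :else     [[:main n]]))

;; Local stand-in for the partitioning step (not the library's code).
(defn partition-by-tag-local [f tags coll]
  (let [pairs (->> coll
                   (mapcat f)
                   (remove (fn [[_ v]] (nil? v))))]
    (into {}
          (for [t tags]
            [t (mapv second (filter #(= t (first %)) pairs))]))))

(def outputs
  (partition-by-tag-local emit-tagged [:main :evens :errors] [0 1 2 -3]))
;; outputs => {:main [1 2], :evens [2], :errors [-3]}
```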

df-map-kv

(df-map-kv pcoll name clj-call)

Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall]

Same as df-map, but expects the output of the function given in clj-call to be a sequence of length 2, which is converted to a Dataflow KV object. Mostly used before a GroupBy.

df-map-with-side-input

(df-map-with-side-input pcoll name side-input-view clj-call)

Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str side-input-view clj-call :- clj-fn-call/CljCall] Returns: PCollection

df-map-with-timestamp

(df-map-with-timestamp pcoll name clj-call)

Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall]

Assigns a timestamp to each element of pcoll.

df-mapcat

(df-mapcat pcoll name clj-call)

Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall] Returns: PCollection

Similar to df-map, but the return values of the function given in clj-call are flattened into the output PCollection. Use it when the transformation is not strictly 1-to-1.
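The difference between the 1-to-1 and the flattening variants mirrors clojure.core/map versus clojure.core/mapcat. The sketch below shows that analogy on a plain vector; tokenize and the sample lines are made up for illustration, and no pipeline function is called.

```clojure
(require '[clojure.string :as str])

;; Hypothetical element function: one line in, zero or more words out.
(defn tokenize [line]
  (str/split line #"\s+"))

(def lines ["a b" "c"])

;; 1-to-1: one output element per input element (each is a vector).
(def mapped (map tokenize lines))
;; mapped => (["a" "b"] ["c"])

;; Flattened: the returned sequences are concatenated.
(def flattened (mapcat tokenize lines))
;; flattened => ("a" "b" "c")
```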

df-mapcat-kv

(df-mapcat-kv pcoll name clj-call)

Inputs: [pcoll :- pcollections/PCollectionType name :- s/Str clj-call :- clj-fn-call/CljCall]

Same as df-map-kv, except that the function given in clj-call returns a sequence of length-2 sequences, which are flattened and converted into KV objects.
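The return shapes expected by the two KV variants can be illustrated with ordinary functions. In this sketch, word->pair and line->pairs are hypothetical names; how such functions are wrapped into a CljCall and handed to the pipeline is not shown because this page does not describe it.

```clojure
(require '[clojure.string :as str])

;; Shape for df-map-kv: one length-2 sequence, read as [key value].
(defn word->pair [word]
  [word 1])

;; Shape for df-mapcat-kv: a sequence of length-2 sequences.
(defn line->pairs [line]
  (map word->pair (str/split line #"\s+")))

(def pairs (line->pairs "a b a"))
;; pairs => (["a" 1] ["b" 1] ["a" 1])
```

Pairs of this shape are the usual input to a grouping step such as group-by-key.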

filterer

(filterer context value _window state clj-call)

filterer-kv

(filterer-kv context value _window state clj-call)

flatten-pcollections

(flatten-pcollections pcolls name)

Inputs: [pcolls :- [PCollection] name :- s/Str] Returns: PCollection

Merges a list of PCollections into a single PCollection.

get-side-output

get-side-outputs

group-by-key

(group-by-key pcoll name)

Inputs: [pcoll :- PCollection name :- s/Str] Returns: PCollection

make-pipeline-options

(make-pipeline-options)

Inputs: [] Returns: PipelineOptions

Create a PipelineOptions object.

output-is-kv

(output-is-kv pcoll)

process-pcoll

(process-pcoll pcoll name wrapping-call clj-call)

process-pcoll-kv

(process-pcoll-kv pcoll name wrapping-call clj-call)

replace-keywords

(replace-keywords clj-call ctx window)

Replaces special keywords in clj-call with the corresponding Beam objects, preserving argument order.

side-output-function-wrapper

(side-output-function-wrapper context val window state clj-call)