clj-headlights.input-output
Tools for pipeline data input and output
multi-source
(multi-source pipeline name resource-strings)
Takes a collection of resource strings and returns a composite transform that combines all of those resources. If the collection is empty, returns an empty PCollection.
read-json-source
(read-json-source pcoll composite-name resource-string)
Inputs: [pcoll :- pcollections/PCollectionType composite-name :- s/Str resource-string :- s/Str]
Like resource-string->source, but parses each element from a JSON string into an object.
resource-string->source
(resource-string->source resource-string)
Construct a Dataflow source transform to read from a resource-string. Supported are:
* Local files (file://)
* GCS (gs://)
* PubSub topics / subscriptions
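The supported resource kinds can be told apart by the shape of the resource string. The sketch below is not the library's implementation; `resource-kind` is a hypothetical helper, and the `projects/<project>/topics/<topic>` and `projects/<project>/subscriptions/<subscription>` formats for PubSub are an assumption, since this page only states the `file://` and `gs://` prefixes.

```clojure
(require '[clojure.string :as str])

(defn resource-kind
  "Hypothetical helper: classifies a resource string into one of the
  source kinds listed above. The PubSub patterns are assumed, not taken
  from the library."
  [resource-string]
  (cond
    ;; Local files use the file:// prefix.
    (str/starts-with? resource-string "file://") :local-file
    ;; Google Cloud Storage objects use the gs:// prefix.
    (str/starts-with? resource-string "gs://") :gcs
    ;; Assumed PubSub topic format.
    (re-matches #"projects/[^/]+/topics/[^/]+" resource-string) :pubsub-topic
    ;; Assumed PubSub subscription format.
    (re-matches #"projects/[^/]+/subscriptions/[^/]+" resource-string) :pubsub-subscription
    :else :unsupported))
```

A check like this could run before pipeline construction to reject unsupported resource strings early, for example `(resource-kind "gs://example-bucket/input.txt")`.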
url->sink
(url->sink url)
Construct a Dataflow sink transform to write text to a url. Supported are:
* Local files (file://)
* GCS (gs://)
* PubSub topics
write-groups-to-partitioned-files
(write-groups-to-partitioned-files pipeline name destination suffix)
Inputs: [pipeline :- pcollections/PCollectionType name :- s/Str destination :- s/Str suffix :- s/Str]
Returns: pcollections/PCollectionType
write-json-to-sink
(write-json-to-sink pcoll name url)
Like write-to-sink, but serializes each element to JSON before writing.
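As a rough usage sketch, the JSON reader and writer can be chained using the argument order from the signatures on this page. It assumes that clj-headlights is on the classpath and that read-json-source returns a PCollection that write-json-to-sink accepts, which this page does not state explicitly. `copy-json`, the step names, and the `root` argument are hypothetical.

```clojure
(require '[clj-headlights.input-output :as io])

(defn copy-json
  "Hypothetical helper: reads JSON elements from the resource string `in`
  and writes them as JSON to the sink url `out`.
  `root` is whatever value read-json-source expects as its first argument
  (documented on this page as a pcollections/PCollectionType).
  Assumes read-json-source returns a PCollection."
  [root in out]
  (-> root
      ;; Parse each JSON string from the source into an object.
      (io/read-json-source "read-json" in)
      ;; Serialize each element back to JSON and write it to the sink.
      (io/write-json-to-sink "write-json" out)))
```

Any per-element transformation would sit between the two steps; the read and write calls stay the same whether `in` and `out` point at local files, GCS, or PubSub.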
write-to-sink
(write-to-sink pcoll name sink-url)
Construct a Dataflow transform to write text to a sink and apply it to a pcoll. Supported are:
* Local files (file://)
* GCS (gs://)
* PubSub topics