Input: azure-blob

Read data from a Microsoft Azure Storage Blob (Block Storage)

Field Summary

| Field Name | Type | Description | Default |
| --- | --- | --- | --- |
| when | message_filter | Fire this input when a specific internal message occurs | - |
| interval | duration | How often to run the command | - |
| cron | cron | How often to run the command. Note that unlike standard Cron, Pipes use a Cron syntax that includes a column for seconds. See full discussion | - |
| immediate | bool | Run as soon as invoked, instead of waiting for the specified cron interval | false |
| random-offset | duration | Sets a random offset to the schedule, then sticks to it | 0s |
| window | Window | For resources that need a time window to be specified | - |
| block | bool | Block further input schedules from triggering if the pipe output is retrying | false |
| container-name | string | The storage service container for created blobs | - |
| blob-names | array of strings | The name for the blob | - |
| blob-name-field | field | The field that a blob name from an operation should be stored in | - |
| creation-time-field | field | The field that the blob creation time should be stored in | - |
| last-modified-field | field | The field that the blob last modified time should be stored in | - |
| content-length-field | field | The field that the blob content length information should be stored in | - |
| content-type-field | field | The field that the blob content type information should be stored in | - |
| content-md5-field | field | The field that the blob content md5 should be stored in | - |
| data-field | field | A field that the blob data should be nested in | - |
| storage-account | string | The Storage Account Name to be used (credential) | - |
| storage-master-key | string | The Storage Master Key to be used (credential) | - |
| timestamp-mode | AzureBlobTimestampMode | Derive a timestamp for this blob for filtering purposes based on the selected strategy. | - |
| maximum-age | MaxAgeSpecifier | Remove any blobs older than this maximum age from the candidate list | - |
| mode | AzureBlobInputMode | The operating mode for this input | - |
| fingerprinting | bool | Enable object fingerprinting, which will cause an object to only be downloaded once | false |
| fingerprinting-db-path | path | Specify a custom path for the fingerprinting database | - |
| maximum-fingerprint-age | MaxAgeSpecifier | Remove any object fingerprints older than this from the tracker | 30 days |
| preprocessors | PreProcessor | Preprocessors process downloaded data before making it available to the pipeline. They are run in the order in which they are specified | - |

Fields

when

Type: message_filter

Fire this input when a specific internal message occurs

This field overloads time-based scheduling with a scheduler that fires on matching messages.

Example

Pipe Language Snippet:

input:
  http-poll:
    when:
      message-received:
        filter-type:
        - pipe-idle
    url: "http://localhost:8888"
    raw: true
    ignore-line-breaks: true

interval

Type: duration

How often to run the command

The default is interval: 0s, which means the input runs once. Note that scheduled inputs set document markers. See full discussion

Example

Pipe Language Snippet:

exec:
  command: echo 'once a day'
  interval: 1d

cron

Type: cron

How often to run the command. Note that unlike standard Cron, Pipes use a Cron syntax that includes a column for seconds. See full discussion

Example: Once a day

Pipe Language Snippet:

exec:
  command: echo 'once a day'
  cron: '0 0 0 * * *'

Example: Once a day, using a convenient shortcut

Pipe Language Snippet:

exec:
  command: echo 'once a day'
  cron: '@daily'
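
To make the extra seconds column concrete, the following Python sketch names the six columns of an expression such as '0 0 10 * * *'. The column order (seconds first) is inferred from the examples in this section, and `split_pipe_cron` and `FIELD_NAMES` are hypothetical names used only for illustration; they are not part of Pipes.

```python
# Illustrative only: names the six columns of a Pipes-style cron expression.
# The column order is inferred from the examples in this section.
FIELD_NAMES = ("second", "minute", "hour", "day_of_month", "month", "day_of_week")


def split_pipe_cron(expression: str) -> dict:
    """Map each column of a six-field cron expression to a field name."""
    columns = expression.split()
    if len(columns) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} columns, got {len(columns)}: {expression!r}"
        )
    return dict(zip(FIELD_NAMES, columns))


daily_at_ten = split_pipe_cron("0 0 10 * * *")
print(daily_at_ten["hour"])
```

A standard five-column expression is rejected by this sketch, which mirrors the point that the seconds column is required. Shortcuts such as '@daily' are not handled here.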

immediate

Type: bool

Default: false

Run as soon as invoked, instead of waiting for the specified cron interval

Example: Run immediately on invocation, and thereafter at 10:00 every morning

Pipe Language Snippet:

exec:
  command: echo 'hello'
  immediate: true
  cron: '0 0 10 * * *'

random-offset

Type: duration

Default: 0s

Sets a random offset to the schedule, then sticks to it

This helps avoid the thundering herd problem, for example when many scheduled inputs would otherwise all hit the same service at exactly 00:00:00.

Example: Fire up to a minute after the start of every hour

Pipe Language Snippet:

exec:
  command: echo 'hello'
  random-offset: 1m
  cron: '0 0 * * * *'
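
The "then sticks to it" behaviour can be sketched in Python as follows. This is a minimal illustration under the assumption that the offset is drawn uniformly between zero and the configured value; `OffsetSchedule` is a hypothetical helper, not the Pipes scheduler.

```python
import random
from datetime import datetime, timedelta
from typing import Optional


class OffsetSchedule:
    """Hypothetical helper: picks one random offset and reuses it for every run."""

    def __init__(self, max_offset: timedelta, rng: Optional[random.Random] = None):
        rng = rng or random.Random()
        # The offset is drawn once, at construction, and never re-drawn.
        self.offset = timedelta(seconds=rng.uniform(0, max_offset.total_seconds()))

    def apply(self, scheduled: datetime) -> datetime:
        """Shift a nominal schedule time by the fixed offset."""
        return scheduled + self.offset


schedule = OffsetSchedule(timedelta(minutes=1))
first = schedule.apply(datetime(2024, 1, 1, 0, 0, 0))
second = schedule.apply(datetime(2024, 1, 1, 1, 0, 0))
# Both runs are shifted by the same amount, so they stay exactly one hour apart.
print(second - first)
```

Because the offset is fixed after it is chosen, the interval between runs is unchanged; only the phase relative to the top of the hour moves.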

window

Type: Window

For resources that need a time window to be specified

| Field Name | Type | Description | Default |
| --- | --- | --- | --- |
| size | duration | Window size | - |
| offset | duration | Window offset | 0s |
| start-time | time | Allows the windowing to start at a specified time | - |
| highwatermark-file | path | File in which the timestamp is stored so that processing can resume after the Pipe has been restarted | - |

  size

Type: duration

Window size

Example

Pipe Language Snippet:

exec:
  command: echo 'one two'
  window:
    size: 1m

  offset

Type: duration

Default: 0s

Window offset

Example

Pipe Language Snippet:

exec:
  command: echo 'one two'
  window:
    size: 1m
    offset: 10s

  start-time

Type: time

Allows the windowing to start at a specified time

It should be in the following format: 2019-07-10 18:45:00.000 +0200

Example

Pipe Language Snippet:

exec:
  command: echo 'one two'
  window:
    size: 1m
    start-time: "2019-07-10 18:45:00.000 +0200"

  highwatermark-file

Type: path

Specify the file in which the timestamp is stored so that processing can resume after the Pipe has been restarted

Example

Pipe Language Snippet:

exec:
  command: echo 'one two'
  window:
    size: 1m
    highwatermark-file: /tmp/mark.txt

block

Type: bool

Default: false

Block further input schedules from triggering if the pipe output is retrying

container-name

Type: string

The storage service container for created blobs

blob-names

Type: array of strings

The name for the blob

blob-name-field

Type: field

The field that a blob name from an operation should be stored in

creation-time-field

Type: field

The field that the blob creation time should be stored in

last-modified-field

Type: field

The field that the blob last modified time should be stored in

content-length-field

Type: field

The field that the blob content length information should be stored in

content-type-field

Type: field

The field that the blob content type information should be stored in

content-md5-field

Type: field

The field that the blob content md5 should be stored in

data-field

Type: field

A field that the blob data should be nested in

storage-account

Type: string

The Storage Account Name to be used (credential)

storage-master-key

Type: string

The Storage Master Key to be used (credential)

timestamp-mode

Type: AzureBlobTimestampMode

Derive a timestamp for this blob for filtering purposes based on the selected strategy.

| Field Name | Type | Description | Default |
| --- | --- | --- | --- |
| none | | The default mode, do not filter blobs based on timestamps | - |
| creation-time | | Filter blobs on the creation-time timestamp reported by the service | - |
| last-modified | | Filter blobs on the last-modified timestamp reported by the service | - |
| blob-name-pattern | string | Filter blobs on the timestamp derived from the blob name, for example: blob-name-pattern: =(?P<Y>[\\d]{4,4})-(?P<m>[\\d]{2,2})-(?P<d>[\\d]{2,2})/ | - |

  none

The default mode, do not filter blobs based on timestamps

  creation-time

Filter blobs on the creation-time timestamp reported by the service

  last-modified

Filter blobs on the last-modified timestamp reported by the service

  blob-name-pattern

Type: string

Filter blobs on the timestamp derived from the blob name, for example: blob-name-pattern: =(?P<Y>[\\d]{4,4})-(?P<m>[\\d]{2,2})-(?P<d>[\\d]{2,2})/
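
As an illustration of how a timestamp could be derived from the named groups Y, m and d, here is a minimal Python sketch. It uses the example pattern without the leading "=" and with single backslashes, since the meaning of those in the Pipes configuration is not described here. The function name, the sample blob name and the choice of UTC are assumptions made for the example.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Pattern taken from the example above, without the leading "=" and with
# single backslashes, written as a Python raw string.
BLOB_NAME_PATTERN = re.compile(r"(?P<Y>[\d]{4,4})-(?P<m>[\d]{2,2})-(?P<d>[\d]{2,2})/")


def timestamp_from_blob_name(blob_name: str) -> Optional[datetime]:
    """Return the date encoded in the blob name, or None if the name does not match."""
    match = BLOB_NAME_PATTERN.search(blob_name)
    if match is None:
        return None
    return datetime(
        int(match.group("Y")),
        int(match.group("m")),
        int(match.group("d")),
        tzinfo=timezone.utc,  # assumption: the name encodes a UTC date
    )


derived = timestamp_from_blob_name("logs/2019-07-10/events.json")
print(derived)
```

Blob names that do not contain a matching date yield None in this sketch; how the input itself treats such blobs is not stated in this page.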

maximum-age

Type: MaxAgeSpecifier

Remove any blobs older than this maximum age from the candidate list

| Field Name | Type | Description | Default |
| --- | --- | --- | --- |
| seconds | integer | Specify the maximum age in number of seconds | - |
| duration | string | Specify the maximum age as a human readable duration (example: 1 hour) | - |

  seconds

Type: integer

Specify the maximum age in number of seconds

  duration

Type: string

Specify the maximum age as a human readable duration (example: 1 hour)
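
The effect of maximum-age on the candidate list can be sketched in Python as below. This is an illustration only: `filter_by_maximum_age` is a hypothetical function, the blob names are made up, and the timestamp used for each blob is assumed to be whatever timestamp-mode derives.

```python
from datetime import datetime, timedelta, timezone


def filter_by_maximum_age(candidates, maximum_age, now):
    """Keep only (name, timestamp) pairs that are no older than maximum_age."""
    cutoff = now - maximum_age
    return [(name, ts) for name, ts in candidates if ts >= cutoff]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
candidates = [
    ("fresh.json", now - timedelta(minutes=10)),
    ("stale.json", now - timedelta(hours=3)),
]
# A one-hour limit, which could be expressed as 3600 seconds or as "1 hour".
kept = filter_by_maximum_age(candidates, timedelta(hours=1), now)
print([name for name, _ in kept])
```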

mode

Type: AzureBlobInputMode

The operating mode for this input

| Field Name | Type | Description | Default |
| --- | --- | --- | --- |
| list-blobs | | List Blobs | - |
| download-blob | | Download Given Blobs | - |
| list-and-download-blobs | | List Blobs and Download | - |

  list-blobs

List Blobs

  download-blob

Download Given Blobs

  list-and-download-blobs

List Blobs and Download

fingerprinting

Type: bool

Default: false

Enable object fingerprinting, which will cause an object to only be downloaded once

fingerprinting-db-path

Type: path

Specify a custom path for the fingerprinting database

maximum-fingerprint-age

Type: MaxAgeSpecifier

Default: 30 days

Remove any object fingerprints older than this from the tracker

| Field Name | Type | Description | Default |
| --- | --- | --- | --- |
| seconds | integer | Specify the maximum age in number of seconds | - |
| duration | string | Specify the maximum age as a human readable duration (example: 1 hour) | - |

  seconds

Type: integer

Specify the maximum age in number of seconds

  duration

Type: string

Specify the maximum age as a human readable duration (example: 1 hour)
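
How fingerprinting and maximum-fingerprint-age interact can be sketched in Python as follows. This is not the actual tracker: `FingerprintTracker` is a hypothetical in-memory stand-in for the fingerprinting database, and the assumption that a fingerprint is built from the object name and its last-modified time is made purely for illustration.

```python
import hashlib
from datetime import datetime, timedelta, timezone


class FingerprintTracker:
    """Hypothetical in-memory stand-in for the fingerprinting database."""

    def __init__(self, maximum_age: timedelta):
        self.maximum_age = maximum_age
        self._seen = {}  # fingerprint -> time it was recorded

    @staticmethod
    def fingerprint(name: str, last_modified: datetime) -> str:
        # Assumption: an object is identified by its name and last-modified time.
        key = f"{name}|{last_modified.isoformat()}"
        return hashlib.sha256(key.encode()).hexdigest()

    def should_download(self, name: str, last_modified: datetime, now: datetime) -> bool:
        """Return True the first time an object is seen, False while it is tracked."""
        # Drop fingerprints older than the maximum age before checking.
        self._seen = {
            fp: seen_at
            for fp, seen_at in self._seen.items()
            if now - seen_at <= self.maximum_age
        }
        fp = self.fingerprint(name, last_modified)
        if fp in self._seen:
            return False
        self._seen[fp] = now
        return True


tracker = FingerprintTracker(timedelta(days=30))
modified = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
first_attempt = tracker.should_download("a.json", modified, now)
second_attempt = tracker.should_download("a.json", modified, now + timedelta(hours=1))
after_expiry = tracker.should_download("a.json", modified, now + timedelta(days=31))
print(first_attempt, second_attempt, after_expiry)
```

In this sketch, an object whose fingerprint has expired is treated as new again, which is why a maximum fingerprint age shorter than the lifetime of the blobs could lead to repeated downloads.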

preprocessors

Type: PreProcessor

Preprocessors process downloaded data before making it available to the pipeline. They are run in the order in which they are specified.

| Field Name | Type | Description | Default |
| --- | --- | --- | --- |
| extension | | Preprocess the object or blob based on the extension of the object or blob name (.gz, .parquet) | - |
| gzip | | Gunzip (decompress) the received data | - |
| parquet | | Extract the received data as JSON rows from a parquet file | - |

  extension

Preprocess the object or blob based on the extension of the object or blob name (.gz, .parquet)

  gzip

Gunzip (decompress) the received data

  parquet

Extract the received data as JSON rows from a parquet file
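
The idea behind the extension preprocessor, choosing a preprocessing step from the blob name, can be sketched in Python like this. Only the .gz case is shown, because reading parquet would need a third-party library. `preprocess_by_extension` and `EXTENSION_PREPROCESSORS` are hypothetical names, and passing unknown extensions through unchanged is an assumption.

```python
import gzip


def gunzip(data: bytes) -> bytes:
    """Decompress gzip-compressed bytes."""
    return gzip.decompress(data)


# Hypothetical mapping from file extension to preprocessing function.
# A ".parquet" entry is omitted because it would need a third-party library.
EXTENSION_PREPROCESSORS = {".gz": gunzip}


def preprocess_by_extension(blob_name: str, data: bytes) -> bytes:
    """Apply the preprocessor registered for the blob's extension, if any."""
    for extension, preprocessor in EXTENSION_PREPROCESSORS.items():
        if blob_name.endswith(extension):
            return preprocessor(data)
    return data  # assumption: unknown extensions pass through unchanged


compressed = gzip.compress(b'{"level": "info"}\n')
print(preprocess_by_extension("events.json.gz", compressed))
```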