
Working with Data

JSON

All actions, except for raw (more on this subject later), operate on valid JSON data. Each input line is a JSON document, delimited by a line feed, known internally as an "event". A JSON document is composed of keys followed by values. Values can be text (str), numbers (num), null, or boolean values (true or false). All numbers are stored as double-precision floating-point numbers, with no distinction between "integer" and "float".
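For example, a single event can mix all of these value types (the field names here are purely illustrative):

{"host":"web-01","load":0.48,"users":3,"idle":null,"healthy":true}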

All inputs provide JSON data. Each line of output from a source is converted into a JSON document, as in this example:

{"_raw":"the line"}

As is the case with TCP or UDP inputs, there may be additional fields.
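For instance, a network input might attach the sender's address alongside _raw; the _address field below is hypothetical and only illustrates the shape:

{"_raw":"the line","_address":"10.1.2.3"}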

Here is an example of the default output of exec from the uptime command:

{"_raw":" 13:46:33 up 2 days, 4:25, 1 user, load average: 0.48, 0.39, 0.31"}

Extract Fields Using Patterns

The extract action works as follows:

# Input: {"_raw":" 13:46:33 up 2 days, 4:25, 1 user, load average: 0.48, 0.39, 0.31"}

- extract:
    input-field: _raw
    remove: true
    pattern: 'load average: (\S+), (\S+), (\S+)'
    output-fields: [m1, m5, m15]

# Output: {"m1":"0.48","m5":"0.39","m15":"0.31"}

Without the inclusion of remove: true, the output event would still contain _raw.

extract is tolerant by default, which is useful when you want to pass the same data through a variety of patterns.

:::warn
Data that cannot be matched is passed through unaltered, except when drop: true is added. If you want extract to highlight unmatched data, add warning: true.
:::
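For instance, since unmatched data passes through, several tolerant extract actions can be chained over the same field. Here is a sketch using only the options shown above (the second pattern is illustrative):

# Input: {"_raw":" 13:46:33 up 2 days, 4:25, 1 user, load average: 0.48, 0.39, 0.31"}

- extract:
    input-field: _raw
    pattern: 'load average: (\S+)'
    output-fields: [m1]

- extract:
    input-field: _raw
    pattern: '(\d+) user'
    output-fields: [users]

# Output: the event keeps _raw and gains "m1":"0.48" and "users":"1"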

Using this method to convert data requires some familiarity with regular expressions. Here is the dialect understood by Pipes. If possible, use expand for delimited data.

Number and Unit Conversion

extract does not automatically convert strings into numbers. That is the function of convert:

# Input: {"m1":"0.48","m5","0.39","m15","0.31"}

- convert
- m1: num
- m5: num
- m15: num

# Output: {"m1":0.48,"m5",0.39,"m15",0.31}

The usual JSON types are converted using num, str, and bool. convert also converts between units of storage and time.

For example, if the field mem was 512K and the field time was 252ms, then we can convert them into different units:

- convert:
  # Memory as MB.
  - mem: M
  # Time as fractional seconds.
  - time: S

# Output: {"mem":0.5,"time":0.252}

The example below shows an extract followed by convert. The output of hotrod server traffic is a useful way to monitor incoming and outgoing server traffic.

$> hotrod server traffic

metrics 644.00 B
logs 1.05 kiB
unsent logs 0.00 B
tarballs sent 213.34 kiB

The pattern in extract can be multiline. Using (?x) makes patterns whitespace-insensitive:

  • whitespace (such as \s or \n) must be explicitly specified
  • the pattern itself can extend over several lines
  • comments beginning with # can be included, which makes longer regular expressions easier to read

(?x) is known as free-spacing (see man perlre). Assume the above output is saved in traffic.txt:

name: traffic

input:
  exec:
    command: cat traffic.txt
    ignore-linebreaks: true
    interval: 1s
    count: 1

actions:
- extract:
    remove: true
    pattern: |
      (?x)
      metrics\s+(.+)\n
      logs\s+(.+)\n
      unsent\slogs\s+.+\n
      tarballs\ssent\s+(.+)
    output-fields: [metrics,logs,tarballs]

- convert:
  - metrics: K
  - logs: K
  - tarballs: K

output:
  write: console

# Output: {"metrics":0.62890625,"logs":1.05,"tarballs":213.34}

Working with Raw Text

Sometimes a data input must be raw text. Take the fictitious Netter Corporation and their netter command, with the following output:

netter v0.1
Copyright Netter Corp
output
port,throughput
1334,45552
1335,5666

Let's treat this example as raw CSV by skipping the header lines and then transforming the _raw field to JSON using actions['raw'].to-json: _raw for further processing.

Paste the above text as content into netter.txt and run the following Pipe:

  • use input.exec.raw: true in order to stop exec quoting the line
  • use actions['raw'].discard-until: '^port,' to skip lines until we see a line beginning with port,
  • use actions['raw'].to-json: _raw to quote the line as JSON

name: netter

input:
  exec:
    command: 'cat netter.txt'
    raw: true

actions:
- raw:
    discard-until: '^port,'

- raw:
    to-json: _raw

output:
  write: console

# Output:
# {"_raw":"port,throughput"}
# {"_raw":"1334,45552"}
# {"_raw":"1335,5666"}

raw has the unique ability to work with any text, not just JSON. It can also perform operations on text, such as substitution. Using raw is clearer and easier to maintain than commands like tr:

input:
  text: "Hello Hound"

- raw:
    replace:
      pattern: H
      substitution: h

# Output: "hello hound"

raw.extract will extract matches from text:

input:
  text: "Hello Dolly"

- raw:
    extract:
      pattern: Hello (\S+)

# Output: Dolly

If you do not want to operate on the entire line, input-field allows both replace and extract to operate on the text in a particular field.

A replacement containing regex group specifiers can be provided. In this case, the first matched group is $1, which differs from \1 used with most utilities like sed and awk:

# Input: {"greeting":"Hello Dolly"}

- raw:
    extract:
      input-field: greeting
      pattern: Hello (\S+)
      replace: Goodbye $1

# Output: {"greeting":"Goodbye Dolly"}

If no pattern is specified, all of the input text is available as $0.

This method minimizes the need for complicated pipelines involving sed and awk, and the resulting output is guaranteed to be consistent on all supported platforms.

Converting from CSV

Once input data is in this form, we can use expand to convert CSV data. Reuse the above netter.txt file:

name: netter_csv

input:
  exec:
    command: cat netter.txt
    raw: true

actions:
- raw:
    discard-until: '^port,'

- raw:
    to-json: _raw

- expand:
    remove: true
    input-field: _raw
    csv:
      header: true

output:
  print: STDOUT

# Output:
# {"port":1334,"throughput":45552}
# {"port":1335,"throughput":5666}

:::note
expand assumes by default that fields are comma-separated. An alternate delimiter can be specified using delim.
:::

Keep in mind that using an existing header may be convenient, but the actual types of the fields are worked out by auto-conversion, which may not always be desired.

autoconvert: false ensures that all fields remain as text (str):

    csv:
      header: true
      autoconvert: false
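With that setting, the netter rows from earlier keep their values as strings:

# {"port":"1334","throughput":"45552"}
# {"port":"1335","throughput":"5666"}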

If the source generates headers each time it runs, such as when scheduled with input.exec, expand csv will need a field that flags these first lines. Use begin-marker-field to specify the field name, which must correspond to the same field in batch with exec.

Alternatively, you can provide fields or a field-file. fields specifies the column names and types (str, num, null, or bool are allowed). field-file is a file containing name:type lines.
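As an illustrative sketch, assuming fields takes a name: type list in the same shape as convert uses (this shape is an assumption, not confirmed syntax):

- expand:
    input-field: _raw
    remove: true
    csv:
      fields:
      - port: num
      - throughput: num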

Headers may also be supplied in a field, using header-field, which contains the column names separated by the delimiter. If header-field-types: true, then the format is name:type.

This header-field only needs to be specified at the start, but can be specified again when the schema changes, such as when the names and/or types of columns change. collapse with header-field-on-change: true will write events in this format.
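A sketch of how this might look, assuming the column names arrive in a field called cols (the field name and event shape are illustrative):

# Input: {"cols":"port:num,throughput:num","_raw":"1334,45552"}

- expand:
    input-field: _raw
    remove: true
    csv:
      header-field: cols
      header-field-types: true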

In the absence of any column information, gen_headers can be used to name the columns as _0, _1, and so on.
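For example (assuming gen_headers is a boolean flag):

    csv:
      gen_headers: true

# The netter rows would become {"_0":1334,"_1":45552} and so on.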

Some formats use a special marker like - to indicate null fields.

Fields separated by a space require delim: ' ' to be added to the csv section.

:::note
This is a special case and will skip any whitespace between fields. \t can also be used for tab-separated fields.
:::
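For instance, a space-separated variant of the netter rows could be expanded like this (a sketch using only the options shown above):

# Input: {"_raw":"port throughput"}, then {"_raw":"1334 45552"}

- expand:
    input-field: _raw
    remove: true
    csv:
      delim: ' '
      header: true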

In short, expand takes a field containing delimiter-separated data and converts it into JSON, removing the original field if needed. The expand action is preferable to extract in this case because there is no need to write regular expressions.

Converting from Key-Value Pairs

Key-value pairs (KV pairs) are a popular data format. Here, the KV pair delimiter is ' ':

# Input: {"_raw":"a=1 b=2"}

- expand:
    input-field: _raw
    remove: true
    delim: ' '
    key-value:
      autoconvert: true

# Output: {"a":1,"b":2}

Here, the KV pair delimiter is ',', declared with delim: ','. The key and value delimiter is ':', declared with key-value-delim: ':':

# Input: {"_raw":"name:\"Arthur\",age:42"}

- expand:
    input-field: _raw
    remove: true
    delim: ','
    key-value:
      autoconvert: true
      key-value-delim: ':'

# Output: {"name":"Arthur","age":42}

The separator can be a newline, delim: '\n'. If your incoming data resembles this example:

name=dolly
age=42

It can easily be converted into: {"name":"dolly","age":42}.
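A sketch of that conversion, reusing the key-value options from above:

# Input: {"_raw":"name=dolly\nage=42"}

- expand:
    input-field: _raw
    remove: true
    delim: '\n'
    key-value:
      autoconvert: true

# Output: {"name":"dolly","age":42}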

Working with Input JSON

If a field contains quoted JSON, using expand with json: true parses the field and merges the extracted fields into the existing event.
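A minimal sketch, assuming json: true combines with input-field and remove in the same way as the examples above:

# Input: {"id":1,"payload":"{\"name\":\"dolly\"}"}

- expand:
    input-field: payload
    remove: true
    json: true

# Output: {"id":1,"name":"dolly"}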

Another option is to use expand with events. This differs from the previous example, as it splits the value of input-field on the delimiter, converting one event into multiple events.

# Input: {"family":"baggins","data":"frodo bilbo"}.

- expand:
input-field: data
remove: true
delim: ' '
events:
output-split-field: name

# Output:
# {"family":"baggins","name":"frodo"}
# {"family":"baggins","name":"bilbo"}

Output as Raw

In most cases, we output the final events as JSON. Occasionally, however, the situation requires more unstructured lines. For instance, "classic" Pipe output is captured by systemd, passed to the server through rsyslog, unpacked using logstash, and finally routed into Elasticsearch.

To send events back using this route, you will need to prepend the event with "@cee: " using the raw action, as seen in the final action below:

- raw:
    extract:
      replace: "@cee: $0"

$0 is the full match over the entire line.

While it is common for outputs to receive events as line-separated JSON documents (so-called "streaming JSON"), this is not essential. Single lines of text can be passed, and creating and passing multi-line data is also possible.

Templates


Providing template-result-field when using add allows for an arbitrary format template, such as YAML. Note the ${field} expansion:

# Input: {"one":1,"two":2}

- add:
    template-result-field: result
    template: |
      results:
       one: ${one}
       two: ${two}

# Output:
# {"one":1,"two":2,"result":"results:\n one: 1\n two: 2\n"}

To http-post this arbitrary data to a server, set body-field to the result field:

output:
  http-post:
    body-field: result
    url: 'http://localhost:3030'

Similarly, exec has input-field:

input:
  text: '{"name":"dolly"}'

actions:
- time:
    output-field: tstamp

- add:
    template-result-field: greeting
    template: |
      time: ${tstamp}
      hello ${name}
      goodbye ${name}

output:
  exec:
    command: 'cat'
    input-field: greeting

# Output:
# time: 2019-02-19T09:27:03.943Z
# hello dolly
# goodbye dolly

The command itself can contain Field Expansion such as ${name}. If there is also a field called file, the greeting field will be appended to the specified file:

output:
  exec:
    command: 'cat >>${file}'
    input-field: greeting
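
So an event shaped like this (the values are illustrative) would have its greeting appended to /tmp/greetings.txt:

# Input: {"name":"dolly","file":"/tmp/greetings.txt"}
# The add action earlier supplies the greeting field itself.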