Working with Data
JSON
All actions, except for `raw` (more on this subject later), operate on valid JSON data. Each input line is a JSON document, delimited by a line feed, internally known as an "event". The JSON document is composed of keys followed by values. Values can be text (`str`), number (`num`), `null`, or boolean (`true` or `false`). All numbers are stored as double-precision floating-point numbers, without an "integer" or "float" distinction.
All inputs provide JSON data. A line of output is converted to a JSON document, as shown in this example:

```
{"_raw":"the line"}
```
As is the case with TCP or UDP inputs, there may be additional fields. Here is an example of the default output of `exec` running the `uptime` command:

```
{"_raw":" 13:46:33 up 2 days, 4:25, 1 user, load average: 0.48, 0.39, 0.31"}
```
Extract Fields Using Patterns
The `extract` action works as follows:

```
# Input: {"_raw":" 13:46:33 up 2 days, 4:25, 1 user, load average: 0.48, 0.39, 0.31"}
- extract:
    input-field: _raw
    remove: true
    pattern: 'load average: (\S+), (\S+), (\S+)'
    output-fields: [m1, m5, m15]
# Output: {"m1":"0.48","m5":"0.39","m15":"0.31"}
```
Without the inclusion of `remove: true`, the output event would still contain `_raw`.
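For example, the same extraction without `remove: true` keeps the original field alongside the new ones (a sketch; the `_raw` value is abbreviated here):

```
# Input: {"_raw":" 13:46:33 up 2 days, ... load average: 0.48, 0.39, 0.31"}
- extract:
    input-field: _raw
    pattern: 'load average: (\S+), (\S+), (\S+)'
    output-fields: [m1, m5, m15]
# Output: {"_raw":" 13:46:33 up 2 days, ...","m1":"0.48","m5":"0.39","m15":"0.31"}
```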
`extract` is tolerant by default, for instances where you may want to pass the same data through a variety of patterns.
:::warn
Data that cannot be matched is passed through unaltered, except when `drop: true` is added. If you want `extract` to highlight unmatched data, add `warning: true`.
:::
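For instance, here is a minimal sketch of a stricter extraction; the placement of `drop` and `warning` alongside `pattern` is an assumption:

```
- extract:
    input-field: _raw
    pattern: 'load average: (\S+), (\S+), (\S+)'
    output-fields: [m1, m5, m15]
    # Discard events that do not match (assumed placement).
    drop: true
    # Highlight unmatched data (assumed placement).
    warning: true
```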
Using this method to convert data requires some familiarity with regular expressions. Here is the dialect understood by Pipes. If possible, use `expand` for delimited data.
Number and Unit Conversion
`extract` does not automatically convert strings into numbers. That is the function of `convert`:

```
# Input: {"m1":"0.48","m5":"0.39","m15":"0.31"}
- convert:
  - m1: num
  - m5: num
  - m15: num
# Output: {"m1":0.48,"m5":0.39,"m15":0.31}
```
The usual JSON types are converted using `num`, `str`, and `bool`; the action also converts from units of storage and time.
For example, if the field `mem` was `512K` and the field `time` was `252ms`, then we can convert them into different units:

```
- convert:
  # Memory as MB.
  - mem: M
  # Time as fractional seconds.
  - time: S
# Output: {"mem":0.5,"time":0.252}
```
The example below shows an `extract` followed by `convert`. The output of `hotrod server traffic` is a useful way to monitor incoming and outgoing server traffic:

```
$> hotrod server traffic
metrics        644.00 B
logs           1.05 kiB
unsent logs    0.00 B
tarballs sent  213.34 kiB
```
The pattern in `extract` can be multiline. It is possible to use whitespace-insensitive patterns with `(?x)`:

- whitespace (such as `\s` or `\n`) must be explicitly specified, and the pattern itself can extend over several lines
- comments beginning with `#` can be included, which makes longer regular expressions easier to read

`(?x)` is known as free-spacing (see `man perlre`). Assume the above output is saved in `traffic.txt`:
```
name: traffic
input:
  exec:
    command: cat traffic.txt
    ignore-linebreaks: true
    interval: 1s
    count: 1
actions:
- extract:
    remove: true
    pattern: |
      (?x)
      metrics\s+(.+)\n
      logs\s+(.+)\n
      unsent\slogs\s+.+\n
      tarballs\ssent\s+(.+)
    output-fields: [metrics,logs,tarballs]
- convert:
  - metrics: K
  - logs: K
  - tarballs: K
output:
  write: console
# Output: {"metrics":0.62890625,"logs":1.05,"tarballs":213.34}
```
Working with Raw Text
Sometimes a data input must be raw text. Take the fictitious Netter Corporation and their `netter` command, with the following output:

```
netter v0.1
Copyright Netter Corp
output
port,throughput
1334,45552
1335,5666
```
Let's treat this example as raw CSV by skipping the header lines, and then transform the `_raw` field to JSON using `actions['raw'].to-json: _raw` for future processing.
Paste the above text as content into `netter.txt` and run the following Pipe:

- use `input.exec.raw: true` in order to stop `exec` quoting the line
- use `actions['raw'].discard-until: '^port,'` to skip lines until we see a line beginning with `port,`
- use `actions['raw'].to-json: _raw` to quote the line as JSON
```
name: netter
input:
  exec:
    command: 'cat netter.txt'
    raw: true
actions:
- raw:
    discard-until: '^port,'
- raw:
    to-json: _raw
output:
  write: console
# Output:
# {"_raw":"port,throughput"}
# {"_raw":"1334,45552"}
# {"_raw":"1335,5666"}
```
`raw` has the unique ability to work with any text, not just JSON. It can also perform operations on text, such as substitution. Using `raw` is clearer and easier to maintain than commands like `tr`:

```
input:
  text: "Hello Hound"
actions:
- raw:
    replace:
      pattern: H
      substitution: h
# Output: "hello hound"
```
`raw.extract` will extract matches from text:

```
input:
  text: "Hello Dolly"
actions:
- raw:
    extract:
      pattern: Hello (\S+)
# Output: Dolly
```
If you do not want to operate on the entire line, `input-field` allows both `replace` and `extract` to operate on the text in a particular field.
A replacement containing regex group specifiers can be provided. In this case, the first matched group is `$1`, which differs from the `\1` used by most utilities like `sed` and `awk`:

```
# Input: {"greeting":"Hello Dolly"}
- raw:
    extract:
      input-field: greeting
      pattern: Hello (\S+)
      replace: Goodbye $1
# Output: {"greeting":"Goodbye Dolly"}
```
If no pattern is specified, all input text is available as `$0`.

This method minimizes the need for complicated pipelines involving `sed` and `awk`, and the resulting output is guaranteed to be consistent on all supported platforms.
Converting from CSV
Once input data is in this form, we can use `expand` to convert CSV data. Reuse the above `netter.txt` file:
```
name: netter_csv
input:
  exec:
    command: cat netter.txt
    raw: true
actions:
- raw:
    discard-until: '^port,'
- raw:
    to-json: _raw
- expand:
    remove: true
    input-field: _raw
    csv:
      header: true
output:
  print: STDOUT
# Output:
# {"port":1334,"throughput":45552}
# {"port":1335,"throughput":5666}
```
`expand` assumes by default that fields are comma-separated. An alternate delimiter can be specified using `delim`.
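For instance, a sketch for semicolon-separated data; the delimiter value here is illustrative:

```
- expand:
    input-field: _raw
    csv:
      header: true
      # Use ';' instead of the default ','.
      delim: ';'
```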
Keep in mind that using an existing header may be convenient, but the actual types of the fields are worked out by auto-conversion, which may not always be desired. `autoconvert: false` ensures that all fields remain as text (`str`):

```
csv:
  header: true
  autoconvert: false
```
If the source generates headers each time it is run, such as when scheduled with `input-exec`, `expand-csv` will need a field to flag these first lines. Use `begin-marker-field` to specify the field name, which must correspond to the same field used by `batch` with `exec`.
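A rough sketch of the shape this takes; the `batch` wiring and the field name `marker` are assumptions, not confirmed syntax:

```
input:
  exec:
    command: cat traffic.txt
    interval: 60s
    raw: true
    # Assumption: batch flags the first line of each run in a field.
    batch: marker
actions:
- expand:
    input-field: _raw
    csv:
      header: true
      # Assumption: names the field that flags header lines.
      begin-marker-field: marker
```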
Alternatively, you can provide `fields` or a `field-file`. `fields` will specify the column name and type (`str`, `num`, `null` or `bool` are allowed). `field-file` is a file containing `name:type` lines.
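A sketch of explicit column definitions, assuming `fields` takes a list of `name:type` entries in column order (the same format as `field-file` lines):

```
- expand:
    input-field: _raw
    csv:
      # Hypothetical columns for the netter.txt data.
      fields: [port:num, throughput:num]
```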
Headers may also be specified with `field: header-field`, which contains the column names separated by the delimiter. If `header-field-types: true` then the format is `name:type`.
This `header-field` only needs to be specified at the start, but can be specified again when the schema changes, such as when the names and/or types of columns change. `collapse` with `header-field-on-change: true` will write events in this format.
In the absence of any column information, `gen_headers` can be used to name the columns as `_0`, `_1`, and so on.
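A minimal sketch, assuming `gen_headers` is a boolean inside the `csv` section:

```
# Input: {"_raw":"1334,45552"}
- expand:
    input-field: _raw
    csv:
      # Columns are named _0, _1, ... (assumed flag).
      gen_headers: true
# Output (with auto-conversion): {"_0":1334,"_1":45552}
```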
Some formats use a special marker like `-` to indicate null fields.

Fields separated by a space require `delim: ' '` to be added to the `csv` section. This is a special case and will skip any whitespace between fields. `\t` can also be used for tab-separated fields.
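For example, a sketch for space-separated columns (the input lines are illustrative):

```
# Input lines: {"_raw":"port throughput"}, {"_raw":"1334   45552"}
- expand:
    input-field: _raw
    remove: true
    csv:
      header: true
      # A single quoted space skips any run of whitespace between fields.
      delim: ' '
# Output: {"port":1334,"throughput":45552}
```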
So `expand` takes a field containing delimiter-separated data and converts it into JSON, removing the original field if needed. The `expand` action is preferable to `extract` in this case, because there is no need to write regular expressions.
Converting from Key-Value Pairs
Key-value (KV) pairs are a popular data format. Here, the KV pair delimiter is `' '`:
```
# Input: {"_raw":"a=1 b=2"}
- expand:
    input-field: _raw
    remove: true
    delim: ' '
    key-value:
      autoconvert: true
# Output: {"a":1,"b":2}
```
Here, the KV pair delimiter is `,`, declared with `delim: ','`. The key and value delimiter is `:`, declared with `key-value-delim: ':'`:
```
# Input: {"_raw":"name:\"Arthur\",age:42"}
- expand:
    input-field: _raw
    remove: true
    delim: ','
    key-value:
      autoconvert: true
      key-value-delim: ':'
# Output: {"name":"Arthur","age":42}
```
The separator can be a newline, `delim: '\n'`. If your incoming data resembles this example:

```
name=dolly
age=42
```

it can easily be converted into `{"name":"dolly","age":42}`, as shown below.
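A sketch of that conversion, reusing the documented `key-value` options (the input event is illustrative):

```
# Input: {"_raw":"name=dolly\nage=42"}
- expand:
    input-field: _raw
    remove: true
    delim: '\n'
    key-value:
      autoconvert: true
# Output: {"name":"dolly","age":42}
```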
Working with Input JSON
If a field contains quoted JSON, using `expand` with `json: true` parses the JSON and extracts the fields, merging them with the existing event.
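A minimal sketch; the event and field names here are hypothetical:

```
# Input: {"id":7,"payload":"{\"name\":\"dolly\",\"age\":42}"}
- expand:
    input-field: payload
    remove: true
    json: true
# Output: {"id":7,"name":"dolly","age":42}
```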
Another option is to use `expand-events`. This differs from the previous example as it splits the value of `input-field` with the delimiter, converting one event into multiple events.
```
# Input: {"family":"baggins","data":"frodo bilbo"}
- expand:
    input-field: data
    remove: true
    delim: ' '
    events:
      output-split-field: name
# Output:
# {"family":"baggins","name":"frodo"}
# {"family":"baggins","name":"bilbo"}
```
Output as Raw
In most cases, we output the final events as JSON. However, the situation occasionally requires more unstructured lines. For instance, "classic" Pipe output is captured by `systemd`, passed to the server through `rsyslog`, unpacked using `logstash`, and finally routed into Elasticsearch.
To send events back using this route, you will need to prepend the event with `"@cee: "` using the `raw` action, as seen in the final action below:

```
- raw:
    extract:
      replace: "@cee: $0"
```

`$0` is the full match over the entire line, as no pattern was specified.
While it is common for outputs to receive events as line-separated JSON documents (so-called "streaming JSON"), it is not essential. Single lines of text can be passed, and creating and passing multi-line data is also possible.
Templates
Providing `template-result-field` when using `add` allows for a template in an arbitrary format, such as YAML. Note the `${field}` expansion:
```
# Input: {"one":1,"two":2}
- add:
    template-result-field: result
    template: |
      results:
       one: ${one}
       two: ${two}
# Output:
# {"one":1,"two":2,"result":"results:\n one: 1\n two: 2\n"}
```
To `http-post` this arbitrary data to a server, set `body-field` to the result field:

```
output:
  http-post:
    body-field: result
    url: 'http://localhost:3030'
```
Similarly, `exec` has `input-field`:

```
input:
  text: '{"name":"dolly"}'
actions:
- time:
    output-field: tstamp
- add:
    template-result-field: greeting
    template: |
      time: ${tstamp}
      hello ${name}
      goodbye ${name}
output:
  exec:
    command: 'cat'
    input-field: greeting
# Output:
# time: 2019-02-19T09:27:03.943Z
# hello dolly
# goodbye dolly
```
The command itself can contain Field Expansion such as `${name}`. The `greeting` field will be appended to the specified file if there is also a field called `file`:

```
output:
  exec:
    command: 'cat >>${file}'
    input-field: greeting
```