Skip to main content

Author: Hotrod
Summary: Download CSV data from the web, reformat data to JSON
Keywords: csv, http
Categories: demo

CSV to JSON

This is a demo pipe that demonstrates how to:

  1. Download CSV data using HTTP
  2. Parse that data into JSON
  3. Perform various transformations

For this demo, the input is country data and metadata in CSV format, which is downloaded directly from Github.

The output that this pipe generates looks like this:

...
{"cca2":"CH","capital":"Bern","region":"Europe","subregion":"Western Europe","num":43,"name":"Switzerland","tld":".ch","latitude":47,"longitude":8}
{"cca2":"CL","capital":"Santiago","region":"Americas","subregion":"South America","num":44,"name":"Chile","tld":".cl","latitude":-30,"longitude":-71}
{"cca2":"CN","capital":"Beijing","region":"Asia","subregion":"Eastern Asia","num":45,"name":"China","tld":".cn","latitude":35,"longitude":105}
{"cca2":"CI","capital":"Yamoussoukro","region":"Africa","subregion":"Western Africa","num":46,"name":"Ivory Coast","tld":".ci","latitude":8,"longitude":-5}
{"cca2":"CM","capital":"Yaoundé","region":"Africa","subregion":"Middle Africa","num":47,"name":"Cameroon","tld":".cm","latitude":6,"longitude":12}
{"cca2":"CD","capital":"Kinshasa","region":"Africa","subregion":"Middle Africa","num":48,"name":"DR Congo","tld":".cd","latitude":0,"longitude":25}
...

Source dataset:

https://mledoze.github.io/countries/

Pipe Definition

name: demo-csv-to-json

# =========================================== input: download CSV file directory from the web
input:
http-poll:
address: https://raw.githubusercontent.com/mledoze/countries/master/dist/countries.csv

actions:

# =========================================== CSV to JSON
- expand:
input-field: _raw
remove: true
csv:
header: true

# =========================================== add a record count
- script:
let:
- num: count()

# =========================================== remove fields
- remove:
fields:
# - name
# - tld
# - cca2
- ccn3
- cca3
- cioc
- independent
- status
- unMember
- currencies
- idd
# - capital
- altSpellings
# - region
# - subregion
- languages
- translations
# - latlng
- landlocked
- borders
- area
- flag
- demonyms
- callingCodes

# =========================================== rename fields
- rename:
fields:
- name: orig_name
- tld: orig_tld

# =========================================== extract first item from name list
- extract:
input-field: orig_name
remove: true
pattern: '^([^,]+)'
output-fields:
- name

# =========================================== extract first item from tld list
- extract:
input-field: orig_tld
remove: true
pattern: '^([^,]+)'
output-fields:
- tld

# =========================================== extract lat/long into seperate fields
- extract:
input-field: latlng
remove: true
pattern: '^(.*),(.*)$'
output-fields:
- latitude
- longitude

# =========================================== convert lat/long from string to number to number
- convert:
fields:
- latitude: num
- longitude: num

# =========================================== output: print to the pipe's STDOUT
output:
print: STDOUT