# Rego Cookbook

*[back to index](./README.md)*

## Sample Data Files

The following sample data files are provided in the [`./doc/data`](./data) folder and are used throughout this document:

| file | description | source | retrieved | license |
|------|-------------|--------|-----------|---------|
| [`population.csv`](./data/population.csv) | Table of population by country. | [Wikipedia](https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population) | 2022-08-22 | [WP:CC BY-SA](https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License) |
| [`gdp.csv`](./data/gdp.csv) | Table of GDP by country. Dollar amounts listed in millions of USD. | [Wikipedia](https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)) | 2022-08-22 | [WP:CC BY-SA](https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License) |
| [`flights.json`](./data/flights.json) | Selected direct flight routes. | [FlightConnections](https://www.flightconnections.com) | 2022-08-27 | |
| [`iata.json`](./data/iata.json) | List of selected airport data. | [mborsetti/airportsdata](https://github.com/mborsetti/airportsdata) | 2022-08-27 | [MIT](https://github.com/mborsetti/airportsdata/blob/main/LICENSE) |
| [`psaux.txt`](./data/psaux.txt) | Sample of output from the `ps aux` command. | original | 2023-04-06 | [BSD-3](../LICENSE) |
| [`sysctl.txt`](./data/sysctl.txt) | Sample of output from the `sysctl -a` command. | original | 2023-04-06 | [BSD-3](../LICENSE) |

## Basics

### Retrieve a field from an object

Select IATA airports reachable on direct flights from CAE.

```plain
$ rq 'input.CAE' < flights.json
[
	"ATL",
	"CLT",
	"DFW",
	"DCA",
	"IAD",
	"ORD",
	"PHL",
	"LGA",
	"MIA",
	"XNA"
]
```

### Filter a list by criteria

Select only countries with an IMF GDP estimate of at least $1T USD.

```plain
$ rq -i csv -H '{g.Country | g := input[_]; i := g["IMF Estimate"]; i != ""; i > 1000000}' < gdp.csv
[
	"Australia",
	"Brazil",
	"Canada",
	"China",
	"France",
	"Germany",
	"India",
	"Indonesia",
	"Iran",
	"Italy",
	"Japan",
	"Mexico",
	"Netherlands",
	"Russia",
	"Saudi Arabia",
	"South Korea",
	"Spain",
	"United Kingdom",
	"United States"
]
```

NOTE: we check the `IMF Estimate` field against the empty string, since the empty string is considered greater than any numeric value in Rego:

```plain
$ rq '"" > 0'
true
$ rq '"" > 100000'
true
```
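The behavior above can be modeled outside of Rego. The Python sketch below assumes that Rego orders values of different types by type first, with numbers ranking below strings, which is consistent with the output shown. `TYPE_RANK`, `type_name`, and `rego_gt` are hypothetical names used only for illustration; this is not how OPA implements comparison.

```python
# Simplified model of cross-type comparison in Rego (an assumption for
# illustration): values of different types are ordered by type first,
# and numbers rank below strings.
TYPE_RANK = {"null": 0, "boolean": 1, "number": 2, "string": 3}


def type_name(value):
    """Map a Python scalar onto the Rego type it would decode to."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def rego_gt(a, b):
    """Hypothetical helper approximating Rego's `a > b` for scalars."""
    rank_a, rank_b = TYPE_RANK[type_name(a)], TYPE_RANK[type_name(b)]
    if rank_a != rank_b:
        return rank_a > rank_b  # different types: the type order decides
    if a is None:
        return False  # null is never greater than null
    return a > b  # same type: ordinary comparison


print(rego_gt("", 0))       # True, like `rq '"" > 0'`
print(rego_gt("", 100000))  # True, like `rq '"" > 100000'`
print(rego_gt(5, 1000000))  # False
```

Under this model, a blank CSV cell would pass any "greater than" filter against a number, which is why the query excludes empty strings before comparing.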

### Index a list by a field

List country names by UN region for countries with an IMF GDP estimate of at least $1T USD.

```plain
$ rq -i csv -H '{g["UN Region"]: {g2.Country | g2 := input[_]; g2["UN Region"] == g["UN Region"]; i := g2["IMF Estimate"]; i != ""; i > 1000000} | g := input[_]} ' < gdp.csv
{
	"Africa": [],
	"Americas": [
		"Brazil",
		"Canada",
		"Mexico",
		"United States"
	],
	"Asia": [
		"China",
		"India",
		"Indonesia",
		"Iran",
		"Japan",
		"Saudi Arabia",
		"South Korea"
	],
	"Europe": [
		"France",
		"Germany",
		"Italy",
		"Netherlands",
		"Russia",
		"Spain",
		"United Kingdom"
	],
	"Oceania": [
		"Australia"
	]
}
```

### Aggregate values with grouping

List total IMF GDP estimate for each UN region.

```plain
$ rq -i csv -H '{g["UN Region"]: sum({i | g2 := input[_]; g2["UN Region"] == g["UN Region"]; i := g2["IMF Estimate"]; i != ""}) | g := input[_]}' < gdp.csv
{
	"Africa": 2.973028e+06,
	"Americas": 3.3147336e+07,
	"Europe": 2.4201466e+07,
	"Oceania": 2.046018e+06
}
```
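For readers more familiar with imperative code, the grouping logic can be sketched in Python. The rows below are a small hand-picked subset of values that appear elsewhere in this document, plus one made-up row (`Example Land`) to show the blank case; the variable names are illustrative only.

```python
# A few rows copied from outputs shown in this document. "Example Land" is
# a made-up row whose empty string stands in for a missing IMF estimate.
rows = [
    {"Country": "China", "UN Region": "Asia", "IMF Estimate": 19911593},
    {"Country": "India", "UN Region": "Asia", "IMF Estimate": 3534743},
    {"Country": "United States", "UN Region": "Americas", "IMF Estimate": 25346805},
    {"Country": "France", "UN Region": "Europe", "IMF Estimate": 2936702},
    {"Country": "Example Land", "UN Region": "Europe", "IMF Estimate": ""},
]

totals = {}
for row in rows:
    region = row["UN Region"]
    # Every region gets a key, even if all of its estimates are blank.
    totals.setdefault(region, 0)
    if row["IMF Estimate"] != "":
        totals[region] += row["IMF Estimate"]

print(totals)
# {'Asia': 23446336, 'Americas': 25346805, 'Europe': 2936702}
```

One difference is worth keeping in mind: the Rego query sums a set comprehension, so two identical estimates within one region would be counted once, whereas this sketch adds every row.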

### Join data from multiple objects

Add the UN region and IMF GDP estimate to each country in `population.csv`, and keep only the top 5 countries by population.

```plain
$ rq -H -d population.csv -d gdp.csv '[object.union(p, object.filter(g, ["UN Region", "IMF Estimate"])) | p := data.population[_]; g := data.gdp[_]; p.country == g.Country; p.rank < 6]'
[
	{
		"IMF Estimate": 19911593,
		"UN Region": "Asia",
		"continent": "Asia",
		"country": "China",
		"population": 1412600000,
		"rank": 1,
		"source": "Official estimate"
	},
	{
		"IMF Estimate": 3534743,
		"UN Region": "Asia",
		"continent": "Asia",
		"country": "India",
		"population": 1380004385,
		"rank": 2,
		"source": "Official estimate"
	},
	{
		"IMF Estimate": 25346805,
		"UN Region": "Americas",
		"continent": "North America",
		"country": "United States",
		"population": 333016093,
		"rank": 3,
		"source": "National population clock"
	},
	{
		"IMF Estimate": 1289295,
		"UN Region": "Asia",
		"continent": "Asia",
		"country": "Indonesia",
		"population": 272248500,
		"rank": 4,
		"source": "Official estimate"
	},
	{
		"IMF Estimate": 347743,
		"UN Region": "Asia",
		"continent": "Asia",
		"country": "Pakistan",
		"population": 235824862,
		"rank": 5,
		"source": "UN projection"
	}
]
```
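The join can be read as a nested loop over both tables that keeps the pairs whose country names match. The Python sketch below mirrors that reading, using a few rows taken from the output above; `object_filter` is a hypothetical stand-in for Rego's `object.filter`, and dictionary unpacking plays the role of `object.union`.

```python
# Small excerpts of the two tables, using values shown in this document.
population = [
    {"country": "China", "rank": 1, "population": 1412600000},
    {"country": "India", "rank": 2, "population": 1380004385},
    {"country": "United States", "rank": 3, "population": 333016093},
]
gdp = [
    {"Country": "China", "UN Region": "Asia", "IMF Estimate": 19911593},
    {"Country": "India", "UN Region": "Asia", "IMF Estimate": 3534743},
    {"Country": "United States", "UN Region": "Americas", "IMF Estimate": 25346805},
    {"Country": "France", "UN Region": "Europe", "IMF Estimate": 2936702},
]


def object_filter(obj, keys):
    """Keep only the listed keys (hypothetical analogue of object.filter)."""
    return {k: v for k, v in obj.items() if k in keys}


# Nested iteration over both tables, keeping matching pairs. Merging with
# {**p, **g} lets keys from g win on conflict, as object.union(p, g) does.
joined = [
    {**p, **object_filter(g, ["UN Region", "IMF Estimate"])}
    for p in population
    for g in gdp
    if p["country"] == g["Country"] and p["rank"] < 3
]

for record in joined:
    print(record["country"], record["UN Region"], record["IMF Estimate"])
```

France has no match in the population excerpt, so it is dropped, just as an unmatched row would be in the Rego query.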

## Scripting

### Get airport info from an IATA code

This [script](./script.md) will use the `iata.json` file to print out airport information from an IATA airport code provided as an argument. Multiple codes can be specified to print out the info for multiple airports.

```rego
#!/usr/bin/env rq

# This sets the input file to the iata.json file. Note that the script needs
# to be executed from the same directory as the `data` folder.
# rq: input ./data/iata.json

# Before running the main query that produces the desired output, run the
# `check_codes` rule, which has a side-effect of crashing the script if the
# user didn't supply any codes as arguments.
# rq: silent-query data.script.check_codes

# The output from this script is determined by the query we put here. Note that
# since we didn't include a `package` statement, the script file is implied to
# have `package script`.
# rq: query data.script.output

# Cause output to be displayed in YAML format.
# rq: output-format yaml

# This script was written before Rego v1, and should be updated for v1
# compatibility.
# rq: v0-compatible true

codes := rq.args()

# Make sure that the user supplied at least one code. If not, exit with a
# helpful error.
check_codes {
	count(codes) == 0
	rq.error("usage: iata_info IATA_CODE [IATA_CODE]*")
}

# This is a partial rule keyed on the airport codes we got from the user as
# arguments. For each code we produce a pretty printed string.
output[code] = info {

	# code is any element of the codes array.
	code := codes[_]

	# The chosen code must appear in the input object; this statement is
	# only truthy if it does, so the rest of the rule body won't execute
	# if it isn't.
	input[code]

	# Produce our pretty printed info. By assigning the result of sprintf
	# to info, which appeared on the right hand side of the `=` in the rule
	# head, we are associating that result with the key `code`, which we
	# assigned above.
	info := sprintf("%s in %s, %s", [input[code].name, input[code].city, input[code].country])

}

# This creates an alternate case in the partial rule for the situation where a
# user-provided IATA code didn't appear in our database. The `not input[code]`
# is truthy if and only if the chosen `code` does not appear as a key in the
# input object.
#
# Because we had `input[code]` in the other rule body for the `output` partial
# rule, and we have `not input[code]` in this one, we can guarantee that the two
# rule bodies will be disjoint and never produce an ambiguous assignment to
# `output[code]`.
output[code] = info {

	code := codes[_]
	not input[code]

	info := sprintf("no data found for IATA code '%s'", [code])
}
```

Here are some usage examples of this script:

```plain
$ ./iata_info
rq: error: usage: iata_info IATA_CODE [IATA_CODE]*
$ ./iata_info DFW ATL SFO
ATL: Hartsfield Jackson Atlanta International Airport in Atlanta, US
DFW: Dallas Fort Worth International Airport in Dallas-Fort Worth, US
SFO: San Francisco International Airport in San Francisco, US

$ ./iata_info nonexistent DFW ATL SFO
ATL: Hartsfield Jackson Atlanta International Airport in Atlanta, US
DFW: Dallas Fort Worth International Airport in Dallas-Fort Worth, US
SFO: San Francisco International Airport in San Francisco, US
nonexistent: no data found for IATA code 'nonexistent'
```
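The two disjoint rule bodies in the script behave much like an `if`/`else` inside a loop. The following Python sketch illustrates that control flow under a few assumptions: `AIRPORTS` is a hand-written stand-in for `iata.json` limited to two airports from the output above, `iata_info` is a hypothetical function name, and a `ValueError` stands in for the `rq.error()` call.

```python
# Stand-in for iata.json, limited to airports printed in the usage
# examples above.
AIRPORTS = {
    "ATL": {
        "name": "Hartsfield Jackson Atlanta International Airport",
        "city": "Atlanta",
        "country": "US",
    },
    "DFW": {
        "name": "Dallas Fort Worth International Airport",
        "city": "Dallas-Fort Worth",
        "country": "US",
    },
}


def iata_info(codes):
    """Hypothetical Python analogue of the `output` partial rule."""
    if len(codes) == 0:
        # Plays the role of check_codes calling rq.error().
        raise ValueError("usage: iata_info IATA_CODE [IATA_CODE]*")
    output = {}
    for code in codes:
        if code in AIRPORTS:
            # First rule body: `input[code]` holds.
            airport = AIRPORTS[code]
            output[code] = "%s in %s, %s" % (
                airport["name"], airport["city"], airport["country"])
        else:
            # Second rule body: `not input[code]` holds.
            output[code] = "no data found for IATA code '%s'" % code
    return output


for code, info in sorted(iata_info(["nonexistent", "DFW", "ATL"]).items()):
    print(f"{code}: {info}")
```

Because each code takes exactly one branch, every key receives exactly one value, which is the property the paired `input[code]` and `not input[code]` conditions provide in the Rego version.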

## `raw` input format

The `raw` input handler, added in rq 0.0.6, by default loads the entire input into a single string. It also accepts several parameters for field and record splitting, which makes it a powerful tool for regularly structured data stored in nonstandard formats. This section shows a few examples of using these parameters to parse the output of existing shell tools that don't emit structured data such as JSON or YAML.

### Parsing `ps aux`

The output of `ps aux` looks like the below (reproduced from [`psaux.txt`](./data/psaux.txt)):

```plain
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         749  0.0  0.0 344976 19668 ?        SNsl 10:57   0:02 /usr/sbin/NetworkManager --no-daemon
root         750  0.0  0.0 250224  8544 ?        SNsl 10:57   0:00 /usr/libexec/accounts-daemon
root         751  0.0  0.0   2812  2140 ?        SNs  10:57   0:02 /usr/sbin/acpid
avahi        753  0.0  0.0   8204  4796 ?        SNs  10:57   0:10 avahi-daemon: running [eos.local]
root         754  0.0  0.0  11572  5920 ?        SNs  10:57   0:02 /usr/lib/bluetooth/bluetoothd
root         756  0.0  0.0  86268  5964 ?        SNs  10:57   0:04 /usr/bin/system76-power daemon
root         757  0.0  0.0 213812  7872 ?        SNsl 10:57   0:03 /usr/bin/system76-scheduler daemon
root         760  0.0  0.0  19296  3004 ?        SNs  10:57   0:00 /usr/sbin/cron -f -P
root         764  0.0  0.0 247356  7948 ?        SNsl 10:57   0:05 /usr/libexec/iio-sensor-proxy
```

First, we need to split this into lines. We can do this with the record separator (`raw.rs`) parameter, which takes a regular expression on which to split the input into records:

```plain
$ rq -I 'raw:raw.rs=\n:' < ./data/psaux.txt
[
        "USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND",
        "root         749  0.0  0.0 344976 19668 ?        SNsl 10:57   0:02 /usr/sbin/NetworkManager --no-daemon",
        "root         750  0.0  0.0 250224  8544 ?        SNsl 10:57   0:00 /usr/libexec/accounts-daemon",
        "root         751  0.0  0.0   2812  2140 ?        SNs  10:57   0:02 /usr/sbin/acpid",
        "avahi        753  0.0  0.0   8204  4796 ?        SNs  10:57   0:10 avahi-daemon: running [eos.local]",
        "root         754  0.0  0.0  11572  5920 ?        SNs  10:57   0:02 /usr/lib/bluetooth/bluetoothd",
        "root         756  0.0  0.0  86268  5964 ?        SNs  10:57   0:04 /usr/bin/system76-power daemon",
        "root         757  0.0  0.0 213812  7872 ?        SNsl 10:57   0:03 /usr/bin/system76-scheduler daemon",
        "root         760  0.0  0.0  19296  3004 ?        SNs  10:57   0:00 /usr/sbin/cron -f -P",
        "root         764  0.0  0.0 247356  7948 ?        SNsl 10:57   0:05 /usr/libexec/iio-sensor-proxy"
]
```

Next, we'll want to split these records into fields (columns). We can do this with the field separator (`raw.fs`) parameter, which similarly takes a regular expression on which to split each record. We'll also switch to the `-o md-table` output format to make the result a little easier to read.

```plain
$ rq -o md-table -I 'raw:raw.rs=\n;raw.fs=[ \t]+:' < ./data/psaux.txt
| USER  | PID | %CPU | %MEM | VSZ    | RSS   | TTY | STAT | START | TIME | COMMAND                       |
|-------|-----|------|------|--------|-------|-----|------|-------|------|-------------------------------|
| root  | 749 | 0.0  | 0.0  | 344976 | 19668 | ?   | SNsl | 10:57 | 0:02 | /usr/sbin/NetworkManager      | --no-daemon |
| root  | 750 | 0.0  | 0.0  | 250224 | 8544  | ?   | SNsl | 10:57 | 0:00 | /usr/libexec/accounts-daemon  |
| root  | 751 | 0.0  | 0.0  | 2812   | 2140  | ?   | SNs  | 10:57 | 0:02 | /usr/sbin/acpid               |
| avahi | 753 | 0.0  | 0.0  | 8204   | 4796  | ?   | SNs  | 10:57 | 0:10 | avahi-daemon:                 | running     | [eos.local] |
| root  | 754 | 0.0  | 0.0  | 11572  | 5920  | ?   | SNs  | 10:57 | 0:02 | /usr/lib/bluetooth/bluetoothd |
| root  | 756 | 0.0  | 0.0  | 86268  | 5964  | ?   | SNs  | 10:57 | 0:04 | /usr/bin/system76-power       | daemon      |
| root  | 757 | 0.0  | 0.0  | 213812 | 7872  | ?   | SNsl | 10:57 | 0:03 | /usr/bin/system76-scheduler   | daemon      |
| root  | 760 | 0.0  | 0.0  | 19296  | 3004  | ?   | SNs  | 10:57 | 0:00 | /usr/sbin/cron                | -f          | -P          |
| root  | 764 | 0.0  | 0.0  | 247356 | 7948  | ?   | SNsl | 10:57 | 0:05 | /usr/libexec/iio-sensor-proxy |
```

This is almost correct, but the `COMMAND` column is split incorrectly, since some of the commands contain spaces. In this situation, we can use `raw.coalesce=N`, which instructs the `raw` input handler to stop splitting fields after the Nth. Since `COMMAND` is the 11th column, we can use `N=11` like so:

```plain
$ rq -o md-table -I 'raw:raw.rs=\n;raw.fs=[ \t]+;raw.coalesce=11:' < ./data/psaux.txt
| USER  | PID | %CPU | %MEM | VSZ    | RSS   | TTY | STAT | START | TIME | COMMAND                              |
|-------|-----|------|------|--------|-------|-----|------|-------|------|--------------------------------------|
| root  | 749 | 0.0  | 0.0  | 344976 | 19668 | ?   | SNsl | 10:57 | 0:02 | /usr/sbin/NetworkManager --no-daemon |
| root  | 750 | 0.0  | 0.0  | 250224 | 8544  | ?   | SNsl | 10:57 | 0:00 | /usr/libexec/accounts-daemon         |
| root  | 751 | 0.0  | 0.0  | 2812   | 2140  | ?   | SNs  | 10:57 | 0:02 | /usr/sbin/acpid                      |
| avahi | 753 | 0.0  | 0.0  | 8204   | 4796  | ?   | SNs  | 10:57 | 0:10 | avahi-daemon: running [eos.local]    |
| root  | 754 | 0.0  | 0.0  | 11572  | 5920  | ?   | SNs  | 10:57 | 0:02 | /usr/lib/bluetooth/bluetoothd        |
| root  | 756 | 0.0  | 0.0  | 86268  | 5964  | ?   | SNs  | 10:57 | 0:04 | /usr/bin/system76-power daemon       |
| root  | 757 | 0.0  | 0.0  | 213812 | 7872  | ?   | SNsl | 10:57 | 0:03 | /usr/bin/system76-scheduler daemon   |
| root  | 760 | 0.0  | 0.0  | 19296  | 3004  | ?   | SNs  | 10:57 | 0:00 | /usr/sbin/cron -f -P                 |
| root  | 764 | 0.0  | 0.0  | 247356 | 7948  | ?   | SNsl | 10:57 | 0:05 | /usr/libexec/iio-sensor-proxy        |
```
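To make the interaction between the three parameters concrete, here is a rough Python approximation of the splitting described above. It is a sketch only: `split_raw` is a hypothetical helper, dropping empty records is an assumption, and rq's actual implementation may differ in its details.

```python
import re


def split_raw(text, rs=r"\n", fs=r"[ \t]+", coalesce=None):
    """Approximate raw.rs / raw.fs / raw.coalesce (hypothetical helper)."""
    # Split into records, dropping empty ones such as the piece after a
    # trailing newline (an assumption about the desired behavior).
    records = [r for r in re.split(rs, text) if r != ""]
    # Splitting at most coalesce-1 times leaves at most `coalesce` fields;
    # maxsplit=0 means "no limit" for re.split.
    maxsplit = coalesce - 1 if coalesce else 0
    return [re.split(fs, record, maxsplit=maxsplit) for record in records]


sample = (
    "USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND\n"
    "root         760  0.0  0.0  19296  3004 ?        SNs  10:57   0:00 /usr/sbin/cron -f -P\n"
)

print(len(split_raw(sample)[1]))           # 13 fields: the command is split up
rows = split_raw(sample, coalesce=11)
print(len(rows[1]))                        # 11 fields
print(rows[1][10])                         # /usr/sbin/cron -f -P
```

The key idea is that limiting the number of splits keeps everything after the tenth separator together as the final field.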

### Parsing `sysctl`

The output of `sysctl` on Linux looks similar to the below, which is reproduced from [`sysctl.txt`](./data/sysctl.txt):

```plain
fs.aio-max-nr = 65536
fs.aio-nr = 0
fs.binfmt_misc.status = enabled
fs.dentry-state = 158811        126899  45      0       31201   0
fs.dir-notify-enable = 1
fs.epoll.max_user_watches = 7234582
fs.fanotify.max_queued_events = 16384
fs.fanotify.max_user_groups = 128
fs.fanotify.max_user_marks = 263281
fs.file-max = 9223372036854775807
fs.file-nr = 14219      0       9223372036854775807
fs.inode-nr = 123599    530
fs.inode-state = 123599 530     0       0       0       0       0
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 65536
```

We could use the `raw` input format to parse this into simple key-value pairs delimited with `=` like so:

```plain
$ rq -I '{"format": "raw", "options": {"raw.rs": "\n", "raw.fs": "=", "raw.cutset": " "}}' < data/sysctl.txt
[
        [
                "fs.aio-max-nr",
                "65536"
        ],
        [
                "fs.aio-nr",
                "0"
        ],
        [
                "fs.binfmt_misc.status",
                "enabled"
        ],
        [
                "fs.dentry-state",
                "158811        126899  45      0       31201   0"
        ],
        [
                "fs.dir-notify-enable",
                "1"
        ],
        [
                "fs.epoll.max_user_watches",
                "7234582"
        ],
        [
                "fs.fanotify.max_queued_events",
                "16384"
        ],
        [
                "fs.fanotify.max_user_groups",
                "128"
        ],
        [
                "fs.fanotify.max_user_marks",
                "263281"
        ],
        [
                "fs.file-max",
                "9223372036854775807"
        ],
        [
                "fs.file-nr",
                "14219      0       9223372036854775807"
        ],
        [
                "fs.inode-nr",
                "123599    530"
        ],
        [
                "fs.inode-state",
                "123599 530     0       0       0       0       0"
        ],
        [
                "fs.inotify.max_queued_events",
                "16384"
        ],
        [
                "fs.inotify.max_user_instances",
                "128"
        ],
        [
                "fs.inotify.max_user_watches",
                "65536"
        ]
]
```

Notice that we used a JSON-formatted [dataspec](./dataspec.md) with `-I`, since `=` is not allowed in a key or value of a concise-format dataspec. The `raw.cutset` option specifies characters to trim from each field after all splitting is complete. This is useful here because we split only on the `=` character, so fields would otherwise keep their leading or trailing spaces.
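The effect of trimming after splitting can be illustrated with a short Python sketch. `split_trim` is a hypothetical helper that approximates the described behavior of `raw.rs`, `raw.fs`, and `raw.cutset`; it is not rq's implementation.

```python
import re


def split_trim(text, rs=r"\n", fs="=", cutset=" "):
    """Split into records and fields, then trim cutset characters from
    both ends of every field (hypothetical helper)."""
    records = [r for r in re.split(rs, text) if r != ""]
    return [
        [field.strip(cutset) for field in re.split(fs, record)]
        for record in records
    ]


sample = "fs.aio-nr = 0\nfs.inode-nr = 123599    530\n"

# Without trimming, the spaces around "=" stay attached to the fields.
print(split_trim(sample, cutset=""))
# [['fs.aio-nr ', ' 0'], ['fs.inode-nr ', ' 123599    530']]

# With a cutset of " ", only the ends of each field are trimmed; spaces
# inside a field are left alone.
print(split_trim(sample))
# [['fs.aio-nr', '0'], ['fs.inode-nr', '123599    530']]
```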

Depending on our use case, we might also want to further split the individual fields on the right-hand side of the `=`. We can do that by changing the `raw.fs` regex:

```plain
$ rq -I '{"format": "raw", "options": {"raw.rs": "\n", "raw.fs": "[= ]+", "raw.cutset": " "}}' < data/sysctl.txt
[
        [
                "fs.aio-max-nr",
                "65536"
        ],
        [
                "fs.aio-nr",
                "0"
        ],
        [
                "fs.binfmt_misc.status",
                "enabled"
        ],
        [
                "fs.dentry-state",
                "158811",
                "126899",
                "45",
                "0",
                "31201",
                "0"
        ],
        [
                "fs.dir-notify-enable",
                "1"
        ],
        [
                "fs.epoll.max_user_watches",
                "7234582"
        ],
        [
                "fs.fanotify.max_queued_events",
                "16384"
        ],
        [
                "fs.fanotify.max_user_groups",
                "128"
        ],
        [
                "fs.fanotify.max_user_marks",
                "263281"
        ],
        [
                "fs.file-max",
                "9223372036854775807"
        ],
        [
                "fs.file-nr",
                "14219",
                "0",
                "9223372036854775807"
        ],
        [
                "fs.inode-nr",
                "123599",
                "530"
        ],
        [
                "fs.inode-state",
                "123599",
                "530",
                "0",
                "0",
                "0",
                "0",
                "0"
        ],
        [
                "fs.inotify.max_queued_events",
                "16384"
        ],
        [
                "fs.inotify.max_user_instances",
                "128"
        ],
        [
                "fs.inotify.max_user_watches",
                "65536"
        ]
]
```

Finally, we can also ask `raw` to perform type inference for us, similar to the default for the `csv` input handler:

```plain
$ rq -I '{"format": "raw", "options": {"raw.rs": "\n", "raw.fs": "[= ]+", "raw.cutset": " ", "raw.infer": true}}' < data/sysctl.txt
[
        [
                "fs.aio-max-nr",
                65536
        ],
        [
                "fs.aio-nr",
                0
        ],
        [
                "fs.binfmt_misc.status",
                "enabled"
        ],
        [
                "fs.dentry-state",
                158811,
                126899,
                45,
                0,
                31201,
                0
        ],
        [
                "fs.dir-notify-enable",
                1
        ],
        [
                "fs.epoll.max_user_watches",
                7234582
        ],
        [
                "fs.fanotify.max_queued_events",
                16384
        ],
        [
                "fs.fanotify.max_user_groups",
                128
        ],
        [
                "fs.fanotify.max_user_marks",
                263281
        ],
        [
                "fs.file-max",
                9223372036854775807
        ],
        [
                "fs.file-nr",
                14219,
                0,
                9223372036854775807
        ],
        [
                "fs.inode-nr",
                123599,
                530
        ],
        [
                "fs.inode-state",
                123599,
                530,
                0,
                0,
                0,
                0,
                0
        ],
        [
                "fs.inotify.max_queued_events",
                16384
        ],
        [
                "fs.inotify.max_user_instances",
                128
        ],
        [
                "fs.inotify.max_user_watches",
                65536
        ]
]
```
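The exact inference rules rq applies are not spelled out here, but the general idea can be sketched in Python: try progressively looser numeric conversions and fall back to the original string. `infer` is a hypothetical helper, and the conversion order (integer, then float, then string) is an assumption chosen to reproduce the output above.

```python
def infer(field):
    """Guess a type for one field: integer, then float, else the original
    string (hypothetical helper; the rules rq uses may differ)."""
    for convert in (int, float):
        try:
            return convert(field)
        except ValueError:
            continue
    return field


record = ["fs.file-nr", "14219", "0", "9223372036854775807"]
print([infer(field) for field in record])
# ['fs.file-nr', 14219, 0, 9223372036854775807]

print(infer("enabled"))  # enabled (left as a string)
```

Python's `int` and `float` accept some inputs that other parsers reject, such as surrounding whitespace, so this sketch should not be relied on to predict rq's handling of edge cases.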

## `rq` <-> `jq` Rosetta Stone

`jq` examples in this section are heavily based on [this blog post](https://schoeffm.github.io/posts/jq-by-example/).

### Extract a single property from an object

Extract the IATA airports reachable on direct flights from CAE.

`jq`:

```plain
$ jq '.CAE' < flights.json
[
  "ATL",
  "CLT",
  "DFW",
  "DCA",
  "IAD",
  "ORD",
  "PHL",
  "LGA",
  "MIA",
  "XNA"
]
```

`rq`:

```plain
$ rq 'input.CAE' < flights.json
[
	"ATL",
	"CLT",
	"DFW",
	"DCA",
	"IAD",
	"ORD",
	"PHL",
	"LGA",
	"MIA",
	"XNA"
]
```

### Extract a field from every object in a list

Extract the `country` field from the population data. Only the first 10 results are shown for brevity.

`jq`:

```plain
$ rq -i csv -H < population.csv | jq '.[] | .country' | sort | head
"Afghanistan"
"Albania"
"Algeria"
"Andorra"
"Angola"
"Antigua and Barbuda"
"Argentina"
"Armenia"
"Australia"
"Austria"
```

`jq` (with raw output):

```plain
$ rq -i csv -H < population.csv | jq -r '.[] | .country' | sort | head
Afghanistan
Albania
Algeria
Andorra
Angola
Antigua and Barbuda
Argentina
Armenia
Australia
Austria
```

`rq`:

```plain
$ rq --raw -i csv -H '{p.country | p := input[_]}' < population.csv | sort | head
Afghanistan
Albania
Algeria
Andorra
Angola
Antigua and Barbuda
Argentina
Armenia
Australia
Austria
```

NOTE: `rq` is used here to convert the CSV file to JSON format for `jq`, since to my knowledge, `jq` does not support CSV inputs.

### Determine the length of an array

Find how many direct flights there are from CAE airport.

`jq`:

```plain
$ jq '.CAE | length' < flights.json
10
```

`rq`:

```plain
$ rq 'count(input.CAE)' < flights.json
10
```

### Extract the keys from an object

Find all IATA airport codes we have country/name data for. Only the first 10 are listed for brevity.

`jq`:

```plain
$ jq -r '. | keys | .[]' < iata.json  | sort | head
ABE
ABI
ABQ
ABV
ABY
ABZ
ACA
ACC
ACK
ACT
```

NOTE: the `| .[]` is only needed here to make the output compatible with `-r`.

`rq`:

```plain
$ rq --raw '{k | input[k]}' < iata.json | sort | head
ABE
ABI
ABQ
ABV
ABY
ABZ
ACA
ACC
ACK
ACT
```

### Filter entries by value

Find all countries with IMF GDP estimates of at least $1T USD.

`jq`:

```plain
$ rq -i csv -H < gdp.csv | jq -r '.[] | select(."IMF Estimate" != "") | select(."IMF Estimate" > 1000000) | .Country'
United States
China
Japan
Germany
India
United Kingdom
France
Canada
Italy
Brazil
Russia
South Korea
Australia
Iran
Spain
Mexico
Indonesia
Saudi Arabia
Netherlands
```

`rq`:

```plain
$ rq -i csv -H '{g.Country | g := input[_]; i := g["IMF Estimate"]; i != ""; i > 1000000}' < gdp.csv
[
	"Australia",
	"Brazil",
	"Canada",
	"China",
	"France",
	"Germany",
	"India",
	"Indonesia",
	"Iran",
	"Italy",
	"Japan",
	"Mexico",
	"Netherlands",
	"Russia",
	"Saudi Arabia",
	"South Korea",
	"Spain",
	"United Kingdom",
	"United States"
]
```

NOTE: `rq` is used here to convert the CSV file to JSON format for `jq`, since to my knowledge, `jq` does not support CSV inputs.

## `rq` <-> SQL Rosetta Stone

### Select a Record by Primary Key

Select the UN region and IMF GDP estimate for France.

`sqlite`:

```plain
$ sqlite3 :memory: -cmd '.mode csv' -cmd '.import gdp.csv gdp'   'SELECT "IMF Estimate","UN Region" FROM gdp WHERE Country = "France"'
2936702,Europe
```

`rq`:

```plain
$ rq -o csv -H --input gdp.csv '{object.filter(rec, ["IMF Estimate", "UN Region"]) | rec := input[_]; rec.Country == "France"}'
IMF Estimate,UN Region
2936702,Europe
```

### Aggregate with Criteria

Get the total IMF GDP estimate for the Europe UN region.

`sqlite`:

```plain
$ sqlite3 :memory: -cmd '.mode csv' -cmd '.import gdp.csv gdp'   'SELECT SUM("IMF Estimate") FROM gdp WHERE "UN Region" = "Europe"'
24201466.0
```

`rq`:

```plain
$ rq -H --input gdp.csv 'sum({g  | rec := input[_]; rec["UN Region"] == "Europe"; g := rec["IMF Estimate"]; is_number(g)})'
2.4201466e+07
```

NOTE: the `is_number()` check is required here, since at least one row has an empty value in the `IMF Estimate` column, which `rq` treats as an empty string.

### Inner Join

Get the IMF estimate and population for all countries. Only the first 10 results are shown for brevity.

`sqlite`:

```plain
$ sqlite3 :memory: -cmd '.mode csv' -cmd '.import gdp.csv gdp' -cmd '.import population.csv pop'  'SELECT gdp.Country,"IMF Estimate",population FROM gdp INNER JOIN pop ON gdp.Country = pop.country' | sort | head
Afghanistan,20136,32890171
Albania,17942,2829741
Algeria,193601,45400000
Andorra,3400,79535
Angola,124862,33086278
"Antigua and Barbuda",1621,99337
Argentina,564277,47327407
Armenia,14047,2963900
Australia,1748334,25939679
Austria,479815,9061848
```

`rq`:

```plain
$ rq -o csv -H --data population.csv --data gdp.csv '{{"country": p.country, "population": p.population, "gdp": g["IMF Estimate"]} | g := data.gdp[_]; p := data.population[_]; g.Country == p.country}' | head
country,gdp,population
Afghanistan,20136,32890171
Albania,17942,2829741
Algeria,193601,45400000
Andorra,3400,79535
Angola,124862,33086278
Antigua and Barbuda,1621,99337
Argentina,564277,47327407
Armenia,14047,2963900
Australia,1748334,25939679
```

NOTE: the row sort order for a `set()` of objects in Rego is undefined, but it happens to be sorted in this situation. If your use case depends on ordering, you should sort explicitly, and/or use arrays.
