Parsing Zeek JSON Logs with JQ

Authored by Joshua Wright | josh@willhackforsushi.com

slingshot:~$ cat package.json
{"name": "metroezpark","version": "1.0.0","description": "Metro EZ Park Gate Status","main": "server.js","dependencies": {"npm": "^6.0.1","socket.io": "^2.0.1"},"devDependencies": {},"scripts": {"test": "echo \"Error: no test specified\" && exit 1","start": "node server.js"},"author": "","license": "ISC"}

... into something nicer to look at:

slingshot:~$ cat package.json | jq
{
  "name": "metroezpark",
  "version": "1.0.0",
  "description": "Metro EZ Park Gate Status",
  "main": "server.js",
  "dependencies": {
    "npm": "^6.0.1",
    "socket.io": "^2.0.1"
  },
  "devDependencies": {},
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1",
    "start": "node server.js"
  },
  "author": "",
  "license": "ISC"
}

 

Zeek and JSON


Zeek (formerly Bro) is a network security monitoring system. Among other things, it allows us to take a packet capture and summarize the network events into several different log files. By default, Zeek exports the logging data in a tab-delimited format. With a little tweaking, Zeek can also export logs in JSON format:

slingshot:~$ bro -Cr merged.pcap -e 'redef LogAscii::use_json=T;'
slingshot:~$ head -1 conn.log
{"ts":1554410064.698965,"uid":"CMreaf3tGGK2whbqhh","id.orig_h":"192.168.144.130","id.orig_p":64277,"id.resp_h":"192.168.144.2","id.resp_p":53,"proto":"udp","service":"dns","duration":0.320463,"orig_bytes":94,"resp_bytes":316,"conn_state":"SF","missed_bytes":0,"history":"Dd","orig_pkts":2,"orig_ip_bytes":150,"resp_pkts":2,"resp_ip_bytes":372,"tunnel_parents":[]}

Once the Zeek logs are in JSON format, we're ready to start extracting data using JQ!

 

JQ and Zeek Object Access


Zeek's conn.log file is used to summarize TCP/UDP/ICMP connections. We can use JQ to examine the fields in the connection objects:

slingshot:~$ head -1 conn.log | jq
{
  "ts": 1554410064.698965,
  "uid": "CMreaf3tGGK2whbqhh",
  "id.orig_h": "192.168.144.130",
  "id.orig_p": 64277,
  "id.resp_h": "192.168.144.2",
  "id.resp_p": 53,
  "proto": "udp",
  "service": "dns",
  "duration": 0.320463,
  "orig_bytes": 94,
  "resp_bytes": 316,
  "conn_state": "SF",
  "missed_bytes": 0,
  "history": "Dd",
  "orig_pkts": 2,
  "orig_ip_bytes": 150,
  "resp_pkts": 2,
  "resp_ip_bytes": 372,
  "tunnel_parents": []
}

I used head -1 here just to look at the first conn.log record. The Zeek log summarizes the connection including source and destination addresses, ports, protocol (TCP, UDP, or ICMP), service (DNS, HTTP, etc.), packets transferred, bytes exchanged, and more.

With JQ you can select specific fields from each record in the Zeek log. For example, to obtain the duration value for each connection, add the '.duration' argument:

slingshot:~$ head -10 conn.log | jq '.duration'
0.320463
0.000602
0.000923
0.00061
0.000602
0.00106
0.271645
0.000756
0.001645
0.001305

(For brevity, I've limited the output here to 10 records. We'll change that shortly.)

Notice here that I've taken the field name duration and added a dot (.) to the field name to reference it with JQ (.duration). This is necessary to access the object member in the JSON record produced by Zeek.
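One detail worth knowing about dot access: when a record doesn't carry the requested field, jq prints null for that record rather than skipping it. The sketch below shows this with two trimmed-down records. The uid, proto, and service values are borrowed from records shown in this article, but cutting them down to three fields is my own simplification for illustration.

```shell
# Two trimmed-down records, one JSON object per line as Zeek writes them.
# The reduced field set is an assumption made to keep the example short.
records='{"uid":"CMreaf3tGGK2whbqhh","proto":"udp","service":"dns"}
{"uid":"C6qkkP27JNdwc63GKf","proto":"tcp"}'

# .service reads the "service" member of each record in turn.
# -r prints strings without the surrounding quotation marks.
services=$(printf '%s\n' "$records" | jq -r '.service')
printf '%s\n' "$services"
```

The first record prints dns; the second has no service field, so jq prints null for it.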

So far, you might wonder if this is terribly useful, since we could probably accomplish similar functionality with grep. However, consider adding additional fields to the query:

slingshot:~$ head -10 conn.log | jq -j '.duration, ", ", .proto, "\n"'
0.320463, udp
0.000602, udp
0.001859, udp
0.000654, udp
0.019871, udp
0.001863, udp
0.000951, udp
0.037681, tcp
0.000341, tcp
0.00068, udp

Here I've added the -j argument to jq, which joins the output values together without adding a newline after each one. I've also added a delimiter of ", " between the fields and an explicit newline at the end of the query.
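An alternative to -j that some people find easier to read is jq's string interpolation combined with the -r (raw output) argument. This is only a sketch of the same idea, not a replacement for the approach above. The sample records are cut down to three fields for illustration, and I've used the byte counters instead of duration.

```shell
# Trimmed-down sample records; the reduced field set is for illustration only.
records='{"proto":"udp","orig_bytes":94,"resp_bytes":316}
{"proto":"tcp","orig_bytes":1276,"resp_bytes":296403}'

# \(...) inside a jq string evaluates the expression and inserts the result.
# -r prints each resulting string raw, one per line, so no "\n" is needed.
lines=$(printf '%s\n' "$records" | jq -r '"\(.proto), \(.orig_bytes), \(.resp_bytes)"')
printf '%s\n' "$lines"
```

Each record becomes one comma-separated line, for example udp, 94, 316 for the first record.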

This isn't terribly useful yet, particularly without adding the originating and responding IP addresses. Because Zeek includes a . in these field names, the syntax for accessing these members is a little different:

slingshot:~$ head -10 conn.log | jq -j '.duration, ", ", .proto, ", ",
    .["id.orig_h"], ":", .["id.orig_p"], ", ",
    .["id.resp_h"], ":", .["id.resp_p"], "\n"'
0.320463, udp, 192.168.144.130:64277, 192.168.144.2:53
0.000602, udp, 192.168.144.130:55106, 192.168.144.2:53
0.001859, udp, 192.168.144.130:53881, 192.168.144.2:53
0.000654, udp, 192.168.144.130:53785, 192.168.144.2:53
0.019871, udp, 192.168.144.130:60696, 192.168.144.2:53
0.001863, udp, 192.168.144.130:59251, 192.168.144.2:53
0.000951, udp, 192.168.144.130:58172, 192.168.144.2:53
0.037681, tcp, 192.168.52.130:49965, 216.58.217.35:443
0.000341, tcp, 192.168.52.130:49960, 173.194.152.39:80
0.00068, udp, 192.168.52.130:57233, 192.168.52.2:53

Note that in this example I've broken up this long command into multiple lines. Because the line breaks fall inside the single-quoted JQ expression, the shell passes them straight through to jq, which treats them as ordinary whitespace. No backslash is needed at the end of each line; a backslash inside single quotes would be handed to jq literally and cause a syntax error.

To reference JSON object fields that include a ., we have to use the familiar leading-dot syntax, followed by square brackets and quotation marks. (For example, accessing id.orig_h shown above is denoted as .["id.orig_h"].)
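To see why the bracket syntax matters, the sketch below runs both forms against a single record cut down to two fields (the reduction is mine, for illustration). With plain dots, jq reads .id.orig_h as nested access: first the id member, then orig_h inside it.

```shell
# One record reduced to the two originator fields; values taken from the
# first conn.log record shown earlier.
record='{"id.orig_h":"192.168.144.130","id.orig_p":64277}'

# Bracket syntax: looks up the literal key "id.orig_h".
bracket=$(printf '%s\n' "$record" | jq -r '.["id.orig_h"]')

# Plain dots are read as nested access: .id first, then .orig_h inside it.
# The record has no "id" member, so the result is null.
nested=$(printf '%s\n' "$record" | jq -r '.id.orig_h')

printf 'bracket: %s\nnested:  %s\n' "$bracket" "$nested"
```

The bracket form prints the address, while the plain-dot form prints null without raising an error, which makes this an easy mistake to miss.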

Now that you know the basics of accessing Zeek JSON objects with JQ, let's take a look at using functions.

 

JQ Functions and Zeek


The JQ select function allows us to evaluate a Boolean expression against an identified field, returning the record only if the expression evaluates to true. For example, we can select all of the records where the number of response bytes (resp_bytes) is greater than a specified value:

sans@slingshot:~$ cat conn.log | jq 'select(.resp_bytes > 300000)'
{
  "ts": 1555622865.402479,
  "uid": "ClfquL1f58gwiJGY32",
  "id.orig_h": "192.168.52.132",
  "id.orig_p": 8,
  "id.resp_h": "13.107.21.200",
  "id.resp_p": 0,
  "proto": "icmp",
  "duration": 20135.048521,
  "orig_bytes": 607936,
  "resp_bytes": 600096,
  "conn_state": "OTH",
  "missed_bytes": 0,
  "orig_pkts": 18998,
  "orig_ip_bytes": 1139880,
  "resp_pkts": 18753,
  "resp_ip_bytes": 1125180,
  "tunnel_parents": []
}
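When the full record is more than you need, select can be piped into a field access within the same JQ expression. The following is a minimal sketch using three records reduced to uid, proto, and resp_bytes. The values come from records shown in this article, but the reduced field set is an assumption for brevity.

```shell
# Three trimmed-down records; only the fields used by the query are kept.
records='{"uid":"ClfquL1f58gwiJGY32","proto":"icmp","resp_bytes":600096}
{"uid":"CHq8ln1G4itLTu76d2","proto":"tcp","resp_bytes":296403}
{"uid":"CMreaf3tGGK2whbqhh","proto":"udp","resp_bytes":316}'

# select() passes a record through only when the condition is true;
# the pipe then reduces each surviving record to its uid.
big=$(printf '%s\n' "$records" | jq -r 'select(.resp_bytes > 300000) | .uid')
printf '%s\n' "$big"
```

Only the ICMP record exceeds 300000 response bytes, so a single uid is printed.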

The Boolean expression accepts the and and or operators to combine additional conditions. Here we apply a similar query with a lower byte threshold, limiting the results to TCP streams:

sans@slingshot:~$ cat conn.log | jq 'select(.resp_bytes > 100000 and .proto == "tcp")'
{
  "ts": 1555622836.884612,
  "uid": "CHq8ln1G4itLTu76d2",
  "id.orig_h": "192.168.52.130",
  "id.orig_p": 49970,
  "id.resp_h": "216.58.217.54",
  "id.resp_p": 443,
  "proto": "tcp",
  "service": "ssl",
  "duration": 9.978087,
  "orig_bytes": 1276,
  "resp_bytes": 296403,
  "conn_state": "SF",
  "missed_bytes": 0,
  "history": "ShADadFf",
  "orig_pkts": 98,
  "orig_ip_bytes": 5208,
  "resp_pkts": 225,
  "resp_ip_bytes": 305407,
  "tunnel_parents": []
}
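The or operator works the same way. As a sketch, the query below keeps records that are either ICMP or exceed the byte threshold, and uses the -c argument with an object constructor to print a compact one-line summary per match. The sample records are reduced to three fields for illustration.

```shell
# Three trimmed-down records; only the fields used by the query are kept.
records='{"uid":"ClfquL1f58gwiJGY32","proto":"icmp","resp_bytes":600096}
{"uid":"CHq8ln1G4itLTu76d2","proto":"tcp","resp_bytes":296403}
{"uid":"CMreaf3tGGK2whbqhh","proto":"udp","resp_bytes":316}'

# Either condition is enough for a record to pass.
# {uid, resp_bytes} builds a new object holding just those two fields,
# and -c prints each object on a single line.
matches=$(printf '%s\n' "$records" |
    jq -c 'select(.proto == "icmp" or .resp_bytes > 100000) | {uid, resp_bytes}')
printf '%s\n' "$matches"
```

The ICMP and TCP records match; the small UDP DNS lookup is filtered out.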

Another useful JQ feature is the sort_by function, allowing you to sort the query results in a predictable order. In this example I sort the results of the Zeek log by the stream duration. The -s (slurp) argument reads every record into a single array so that sort_by has a list to sort:

slingshot:~$ cat conn.log | jq -s 'sort_by(.duration)'
[
  {
    "ts": 1555622836.134114,
    "uid": "C6qkkP27JNdwc63GKf",
    "id.orig_h": "192.168.52.130",
    "id.orig_p": 49960,
    "id.resp_h": "173.194.152.39",
    "id.resp_p": 80,
    "proto": "tcp",
    "duration": 0.000341,
    "orig_bytes": 0,
    "resp_bytes": 0,
    "conn_state": "SF",
    "missed_bytes": 0,
    "history": "fAFa",
    "orig_pkts": 2,
    "orig_ip_bytes": 80,
    "resp_pkts": 2,
    "resp_ip_bytes": 80,
    "tunnel_parents": []
  },
  {
    "ts": 1554410461.862738,
    "uid": "Chblb02SMQnKBiPPdg",
    "id.orig_h": "192.168.144.130",
    "id.orig_p": 55106,
... snip
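One behavior to keep in mind when sorting Zeek logs: not every record carries every field, and jq orders null ahead of any number. The sketch below uses three made-up records (the uid values A, B, and C and their durations are hypothetical) to show where a record without a duration lands.

```shell
# Hypothetical records: B has no duration field at all.
records='{"uid":"A","duration":0.5}
{"uid":"B"}
{"uid":"C","duration":0.25}'

# -s wraps all input records in one array, which sort_by requires.
# map(.uid) reduces the sorted array to the uid values for readability.
order=$(printf '%s\n' "$records" | jq -s -c 'sort_by(.duration) | map(.uid)')
printf '%s\n' "$order"
```

The record with no duration sorts first, ahead of the shortest measured connection, so the first entry of a sorted conn.log is not necessarily the shortest connection.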

If you only want the first record, you can pipe the results (within the JQ expression) to .[0], which indexes into the sorted array (0 is the first record, 1 is the second, and so on):

slingshot:~$ cat conn.log | jq -s 'sort_by(.duration) | .[0]'
{
  "ts": 1555622836.134114,
  "uid": "C6qkkP27JNdwc63GKf",
  "id.orig_h": "192.168.52.130",
  "id.orig_p": 49960,
  "id.resp_h": "173.194.152.39",
  "id.resp_p": 80,
  "proto": "tcp",
  "duration": 0.000341,
  "orig_bytes": 0,
  "resp_bytes": 0,
  "conn_state": "SF",
  "missed_bytes": 0,
  "history": "fAFa",
  "orig_pkts": 2,
  "orig_ip_bytes": 80,
  "resp_pkts": 2,
  "resp_ip_bytes": 80,
  "tunnel_parents": []
}

In this sort order, JQ is showing us the smallest duration first. Piping the results to the reverse function will reverse the sort order:

slingshot:~$ cat conn.log | jq -s 'sort_by(.duration) | reverse | .[0]'
{
  "ts": 1554410064.698965,
  "uid": "CMreaf3tGGK2whbqhh",
  "id.orig_h": "192.168.144.130",
  "id.orig_p": 64277,
  "id.resp_h": "192.168.144.2",
  "id.resp_p": 53,
  "proto": "udp",
  "service": "dns",
  "duration": 0.320463,
  "orig_bytes": 94,
  "resp_bytes": 316,
  "conn_state": "SF",
  "missed_bytes": 0,
  "history": "Dd",
  "orig_pkts": 2,
  "orig_ip_bytes": 150,
  "resp_pkts": 2,
  "resp_ip_bytes": 372,
  "tunnel_parents": []
}
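There are a few other ways to reach the longest connections, sketched below with three hypothetical records (the uid values and durations are made up for illustration). A negative index counts from the end of the array, max_by returns the largest element directly, and a slice such as .[0:2] keeps more than one record.

```shell
# Hypothetical records with distinct durations.
records='{"uid":"A","duration":0.5}
{"uid":"B","duration":2.75}
{"uid":"C","duration":0.25}'

# .[-1] indexes from the end of the sorted array, so no reverse is needed.
last=$(printf '%s\n' "$records" | jq -s -r 'sort_by(.duration) | .[-1].uid')

# max_by returns the element with the largest value of the given field.
longest=$(printf '%s\n' "$records" | jq -s -r 'max_by(.duration) | .uid')

# After reversing, the slice .[0:2] keeps the two longest connections.
top2=$(printf '%s\n' "$records" |
    jq -s -c 'sort_by(.duration) | reverse | .[0:2] | map(.uid)')

printf 'last: %s\nlongest: %s\ntop2: %s\n' "$last" "$longest" "$top2"
```

All three approaches agree that B is the longest connection here; which one to use is mostly a matter of taste and of how many records you want back.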

 

Conclusion


JQ is a powerful tool, and a little experience in using it to access JSON data can be a valuable asset in your toolbox. The best way to learn JQ is to experiment: one terminal open with a JSON file and jq, and a browser window open to the JQ manual.

Got ideas or questions about JQ? Leave a comment or email me.
