Originally published January 31, 2019 @ 12:12 am

OK, so both of these things have been around forever and will be around long after we’re gone. It’s worth your time to learn how to use the to together.

Frequency Histogram

I have a DHCP server that logs to /var/log/messages. Relevant lines in the log file look something like this:

[root@ncc1711:~] # zgrep -h "$(date +'%b').*dhcpd" /var/log/messages* | head -5
Jan 27 03:42:36 ncc1711 dhcpd: DHCPDISCOVER from 00:06:80:01:2f:83 via eth1
Jan 27 03:42:37 ncc1711 dhcpd: DHCPOFFER on 192.168.122.161 to 00:06:80:01:2f:83 via eth1
Jan 27 03:42:37 ncc1711 dhcpd: DHCPREQUEST for 192.168.122.161 (192.168.122.167) from 00:06:80:01:2f:83 via eth1
Jan 27 03:42:37 ncc1711 dhcpd: DHCPACK on 192.168.122.161 to 00:06:80:01:2f:83 via eth1
Jan 27 03:42:37 ncc1711 dhcpd: DHCPRELEASE of 192.168.122.161 from 00:06:80:01:2f:83 via eth1 (found)

Let’s say I want to get a frequency histogram (a bar chart showing the number of occurrences of something during a given period of time) of DHCPREQUEST events every day for the current month. Here’s how we do it:

zgrep -h $(date +'%b').*DHCPREQUEST /var/log/messages* | awk '{print $1,$2,$3}' | sort -k1M -k2n -k3V | uniq -c -w6 | gnuplot -e "set terminal dumb 84 28; unset key; set style data labels; set xdata time; set xlabel 'Day'; set ylabel 'DHCP Requests'; set autoscale; set timefmt '%b %d'; plot '-' using 2:1:ytic(1) with lines"

And the result:

DHCP Requests
   150 +++-+-+-+--+-+-+-+-+-+-+-*-+--+-+-+-+-+-+-+-+--+-+-+-+-+-+-+-+-+--+-+-+-+++
       +              +         *    +             +              +              +
       |                        *                                                |
       |                       **                                                |
       |                       **                                                |
       |                       * *                                               |
       |                       * *                                               |
       |                       * *                                               |
       |                       * *                                               |
   124 ++                     *  *                    *     **                  ++
   119 ++                     *  *                    **    * *                 ++
   116 ++           *         *  *             *     * *    * *                 ++
       |            **       *   *             **    *  *  *  *                  |
   112 ++          * *       *    *    *      *  *  *   *  *   *                ++
   109 ++          *  *     *     *   * *     *   * *   *  *   *  *             ++
   105 ++          *  *     *     *   * *   **     *     * *   *  **            ++
   102 ++*     *** *   *   *      * **   ***             **    * *  *           ++
   100 ++**    *  *    *   *      **                      *     **              ++
    96 +*  *  *   *     ***                               *     *               ++
    94 +*   * *                                                 *               ++
       |*   **                                                                   |
    89 +*    *                                                                  ++
       *              +              +             +              +              +
    82 *++-+-+-+--+-+-+-+-+-+-+-+-+--+-+-+-+-+-+-+-+--+-+-+-+-+-+-+-+-+--+-+-+-+++
     01/01          01/08          01/15         01/22          01/29          02/05
                                           Day

Lets disassemble the syntax a bit. This first part extracts relevant log lines from /var/log/messages*, prints only the date portion of the line, and sorts it chronologically:

zgrep -h $(date +'%b').*DHCPREQUEST /var/log/messages* | awk '{print $1,$2,$3}' | sort -k1M -k2n -k3V

The next piece is a scarcely used but very handy feature of the `uniq` command that tells it to compare only the first six characters (-w6) of each line and prepend (-c) each line with the number of matched occurrences:

uniq -c -w6

Finally, we tell gnuplot that our time format is short month name and day number (%b %d) with a space in between (considered as a single column) and that we want to generate a timeline chart using field 2 for the date (x-axis) and field 1 for data (y-axis).

Just keep in mind that, since we specified date/time format as %b %d, the space is part of this specification, therefore, month and day are considered one column. This is why it is field 2 and not 2-3. Really, understand this. This is easily the most common mistake people make when doing gnuplot time charts.

Depending on the type of the timestamp used by the log file, you may run into some issues. It just so happens that today in January 30th and I am looking at /var/adm/messgaes. The standard syslog time format is %b %d %H:%M:%S. There is no year. The sort-by-month option of sort will have some issues with this. Not knowing what year it is, it will logically assume that December comes after January. So, you may need to get creative in how you sort the timestamps.

Just quick departure from the main subject. Regretfully, rsyslog follows the RFC 3164 (The BSD Syslog Protocol) and has no provision for a year component in the log timestamp. If this is something you really need, you probably should switch to syslog-ng.

Now, let’s do the same chart but aggregate data on hourly rather than daily basis:

zgrep -h $(date +'%b').*DHCPREQUEST /var/log/messages* | awk '{print $1,$2,$3}' | sort -k1M -k2n -k3V | uniq -c -w9 | gnuplot -e "set terminal dumb 84 28; unset key; set style data labels; set xdata time; set xlabel 'Day'; set ylabel 'DHCP Requests'; set autoscale; set timefmt '%b %d'; plot '-' using 2:1:ytic(1) with lines"

  DHCP Requests
   20 +++-+-+--+-+-+-+-+-+-+--+-*-+-+-+-+-+--+-*-+-+-+-+-+--+-+-+-+-+-+-+--+-+-+++
      +              +          *   +          *   +              +              +
   18 ++                        *              *            *                   ++
      |                         *              *            *                    |
      |                         *              *            *                    |
   16 ++                        *              **           *                   ++
   15 ++                        *              *** *        *                   ++
   14 ++                        *              *** *   *    *                   ++
   13 ++                        *              *** *   *    *     *             ++
      |                         *              *****   *    *     *              |
   12 ++                   *  * *              ***** * *   **     *             ++
   11 ++                   *  * *   *          ******* *   ** *   *             ++
   10 ++                   *  * *   * * *    * ***** * * ** * *   *             ++
    9 ++*      *   * *   * *  * * * * * * *  * ***** *** ** * *   ***           ++
      | *      *   * *   * *  * * * * * * *  * * *** * * *  * *   ***            |
    8 *+*      ********  * *  * * * * *****  * * * * * * *  * * * * *           ++
    7 *+* * *  *** ***** * ** ***** * ****** * * * * * ***  * * *** *           ++
    6 *** * ** *** *** ***** ********** *** **** * * * * *  ***** * *           ++
    5 *** *** **** *** * ***  * * * * * *** **** * * * * *  ***** * *           ++
    4 *******  * * * * * ***  * * * * * * *  *** * * * * *  * * * * *           ++
      * *** *  * * * * * * *  * * * * * * *  * * * * * * *  * * * * *            |
    3 *+* * *  * * * * * * *  * * * * * * *  * * * * * * *  * * * * *           ++
    2 *+* * *  * * * * * * *  * * * * * * *  * * * * * * *  * * * * *           ++
    1 *+*-*-*--+-*-*-*-*-*-+--*-*-*-*-*-+-*--*-*-*-*-*-*-*--*-+-*-*-*-+-+--+-+-+++
    01/01          01/08          01/15          01/22          01/29          02/05
                                          Day

Remembering this syntax every time you want to review a log file is not practical. A better option would be write a little helper script. As input, the script will take whatever relevant log file lines you throw at it from STDIN, as well as a couple of command-line arguments to help it correctly sort and summarize the data. You can download this script here.

#!/bin/bash

while getopts ":hf:k:s:p:" OPTION
do
  case "${OPTION}" in
  h) cat << EOF
  Usage: gtl [option] [arg]
  Generate frequency histogram from log data.

  Options :
     -h : shows the help page
     -s [arg] : Specify sorting options (default is '-k1M -k2n -k3V')
     -f [arg] : Specify the fields to analyze (default is all)
     -k [arg] : Specify the field separator (default is space)
     -p [arg] : Specify the character position to summarize (default is 6)

  Example :
     gtl -f " " -p 6 -k "@" -s "-k1M -k2n -k3V"

EOF
     exit 0
     ;;
  f) f="${OPTARG}"
     ;;
  k) k="${OPTARG}"
     ;;
  s) s="${OPTARG}"
     ;;
  p) p="${OPTARG}"
    ;;
  \?) printf "%s\n" "Invalid option. Type gtl -h for help"
      exit 191
      ;;
  *) printf "%s\n" "Invalid argument. Type gtl -h for help"
     exit 192
      ;;
  esac
done

if [ -z "${f}" ]; then f=" "; fi
if [ -z "${p}" ]; then p=6; fi
if [ -z "${k}" ]; then k="@"; fi
if [ -z "${s}" ]; then s="-k1M -k2n -k3V"; fi

awk -F"$f" -v k=$k '{split(k,a,","); for (key in a) { printf $a[key]" " } print"\n"}' | \
grep . | \
sort ${s} | \
uniq -c -w${p} | \
gnuplot -e "set terminal dumb 84 28; unset key; set style data labels; set xdata time; set xlabel 'Timeline'; set ylabel 'Events'; set autoscale; set timefmt '%b %d'; plot '-' using 2:1:ytic(1) with histeps"

The default options for gtl are -f " " -p 6 -k "1,2,3" -s "-k1M -k2n -k3V", which works fine for your standard syslog format. So now I can use it like so:

zgrep -h Jan.*DHCPREQUEST.*192.168.122.161 /var/log/messages* | gtl
  Events
   37 +++-+-+--+-+-+-+-+-+-+--+-+-+-+-+-+-+--+-+***+-+-+-+--+-+-+-+-+-+-+--+-+-+++
      +              +              +           * *+              +              +
      |                                         * *                              |
      |                                         * *                              |
      |                                         * *                              |
   34 ++                                        * * ***                         ++
      |                                         * * * *                          |
      |                                         * * * *                          |
      |                                         * * * *                          |
      |                                         * * * *                          |
      |                                         * * * *                          |
      |                                         * * * *                          |
      |                                         * * * *                          |
      |                                         * * * *                          |
      |                                         * * * *                          |
      |                                         * * * *                          |
   28 ++                       ***              * * * *              ***        ++
      |                        * *              * * * *              * *         |
   27 **                       * *              * * * *              * *        ++
   26 +*              ***      * *              * * * *              * *        ++
      |*              * *      * *              * * * *              * *         |
   25 +* ***  ******* * * **** * ******* ******** *** ***  *** *** *** *        ++
      +* * *  *     *+* * *  * *    +  * *         +    *  * * * *+*   *         +
   24 +***+****+-+-+***+***+-***+-+-+-+***+--+-+-+-+-+-+****+***+***+-+*+--+-+-+++
    01/01          01/08          01/15          01/22          01/29          02/05
                                       Timeline

But for other common log formats you would need to specify some parameters. For example, the xferlog used by vsftpd has a slightly different timestamp format: Mon Nov 6 17:52:03 2017, adding the day of week to the beginning of the timestamp. In this case the options for the gtl command would change a bit:

zgrep 2018.*jsmith /var/log/xferlog* | gtl -k 2,3,4
  Events
   19 +++-+-+--+-+********************+-+-+--+-+-+-+-+-+-+--+-+-+-+-+-+-+--+-+-+++
      +           *  +              +*             +              +              +
      |           *                  *                                           |
      |           *                  *                                           |
      |           *                  *                                           |
      |           *                  *                                           |
      |           *                  *                                           |
      |           *                  *                                           |
      |           *                  *                                           |
      |           *                  *                                           |
      |           *                  *                                           |
   10 ++        ***                  *                                          ++
      |         *                    *                                           |
      |         *                    *                                           |
      |         *                    *                                           |
    7 ++        *                    *                  **********              ++
      |         *                    *                  *        *               |
      |         *                    *                  *        *               |
      |         *                    *                  *        *               |
      |         *                    *                  *        *               |
      |         *                    *                  *        *               |
      |         *                    *                  *        *               |
    2 ++      ***    +              +*             +    *        *+             ++
    1 +++-+-+-*+-+-+-+-+-+-+--+-+-+-+********************+--+-+-+****************+
    01/01          01/08          01/15          01/22          01/29          02/05
                                       Timeline