Originally published July 29, 2017 @ 9:07 pm

Consider a common situation: you would like to select lines from a log file that match a specific string, but only during a specific time frame. For example, my /var/log/messages contains entries from one of the firewalls. I would like to see everything that was ACCESS BLOCK with a source external to my network on July 29 2017 between nine and ten in the morning.

Here’s a sample of the data from /var/log/messages:

Jul 29 10:57:47 2017 ds9 src="182.16.249.18:32204" dst="64.233.179.139:445" msg="Match default rule, DROP" note="ACCESS BLOCK" user="unknown" devID="5cf4ab67c448" cat="Firewall" class="Access Control" ob="0" ob_mac="000000000000" dir="ANY:ANY" protoID=6 proto="others"
Jul 29 10:58:10 2017 ds9 src="203.178.148.19:0" dst="64.233.179.139:0" msg="Match default rule, ICMP Type:8,  DROP" note="ACCESS BLOCK" user="unknown" devID="5cf4ab67c448" cat="Firewall" class="Access Control" ob="0" ob_mac="000000000000" dir="ANY:ANY" protoID=1 proto="others"
Jul 29 10:58:20 2017 ds9 src="94.177.226.168:5097" dst="64.233.179.139:5060" msg="Match default rule, DROP" note="ACCESS BLOCK" user="unknown" devID="5cf4ab67c448" cat="Firewall" class="Access Control" ob="0" ob_mac="000000000000" dir="ANY:ANY" protoID=17 proto="others"
Jul 29 10:58:23 2017 ds9 src="118.69.72.134:22316" dst="64.233.179.139:23" msg="Match default rule, DROP" note="ACCESS BLOCK" user="unknown" devID="5cf4ab67c448" cat="Firewall" class="Access Control" ob="0" ob_mac="000000000000" dir="ANY:ANY" protoID=6 proto="others"
Jul 29 10:59:00 2017 ds9 src="17.173.254.223:16387" dst="64.233.179.139:57625" msg="Match default rule, DROP" note="ACCESS BLOCK" user="unknown" devID="5cf4ab67c448" cat="Firewall" class="Access Control" ob="0" ob_mac="000000000000" dir="ANY:ANY" protoID=17 proto="others"

The simplest way is to use regex for the timestamp. For example:

grep -P "^Jul 29 (09|10):.*ACCESS BLOCK" /var/log/messages | grep -vP "src=\"(192\.168\.|0\.|10\.|127\.|172\.16\.)"

A more laborious approach is to convert the timestamp fields from the usual date +'%b %d %H:%M:%S' format to the epoch time (date +'%s'). Then you can use integer comparison in bash to select the relevant lines from the log. Here’s an example:

l=/var/log/messages
range_begin="$(date --date="Jul 29 2017 9am" +'%s')"
range_end="$(date --date="Jul 29 2017 10am" +'%s')"
cat ${l} | grep "ACCESS BLOCK" | grep -vP "src=\"(192\.168\.|0\.|10\.|127\.|172\.16\.)" | while read line; do
t="$(date --date="$(echo ${line} | awk '{print $1,$2,$3}')" +'%s')"
echo "${t} $(echo ${line} | awk '{$1=$2=$3=""; print $0}')"
done | awk -v range_begin="$${range_begin}" -v range_end="${range_end}" '{ if ($1 >= range_begin && $1 <= range_end) print $0 }'

The advantage here is you can work with log files where entries are not in chronological order, as may often be the case when multiple remote syslog sources are writing to the same file. Not only can you identify the correct time range, you can then also easily rearrange matched entries in a chronological order. Working with timestamps that are integers offers many other advantages.

Let’s imagine a slightly more sophisticated task: we want to run the search above and then build a frequency table that would show a count of matched records for multiple sixty-minute intervals from the current time going back three hours. So, something like this (assuming right now is Jul 29 15:45:23):

Time Range Blocked Connections
Jul 29 14:45:23 – Jul 29 15:45:23 3246
Jul 29 13:45:23 – Jul 29 14:45:23 2435
Jul 29 12:45:23 – Jul 29 13:45:23 4582

Here’s one way to get this done:

l="/var/log/messages"
time_now="$(date +'%b %d %H:%M:%S')"
interval=1 # hours
l_temp=$(mktemp /tmp/$(echo ${l} | awk -F'/' '{print $NF}')_epoch.XXXXXXXXXX) || { echo "Failed to create temp file"; exit 1; }
cat ${l} | grep "ACCESS BLOCK" | grep -vP "src=\"(192\.168\.|0\.|10\.|127\.|172\.16\.)" | while read line; do
t="$(date --date="$(echo ${line} | awk '{print $1,$2,$3}')" +'%s')"
echo "${t} $(echo ${line} | awk '{$1=$2=$3=""; print $0}')" >> "${l_temp}"
done
for i in $(seq 1 3); do
range_begin="$(echo "$(date --date="${time_now}" +'%s')-${i}*3600"|bc -l)"
range_end="$(echo "$(date --date="${time_now}" +'%s')-(${i}-${interval})*3600"|bc -l)"
c="$(awk -v range_begin="$${range_begin}" -v range_end="${range_end}" '{ if ($1 >= range_begin && $1 <= range_end) print $0 }' ${l_temp}| wc -l)"
echo "$(date --date=@"${range_begin}" +'%b %d %H:%M:%S') - $(date --date=@"${range_end}" +'%b %d %H:%M:%S')@${c}"
done | column -s'@' -t
if [ -f "${l_temp}" ]; then /bin/rm -f "${l_temp}"; fi

The time-consuming portion of this process is the conversion of the original log file into a version using epoch timestamps. On a multi-core system you can speed things up using xargs. Just keep in mind that you’ll need to sort -k1,1n the output of all xargs threads if you want your resulting file to remain chronological. The more cores you have, the faster this will work. Here’s an example:

l="/var/log/messages"
time_now="$(date +'%b %d %H:%M:%S')"
interval=1 # hours
l_temp=$(mktemp /tmp/$(echo ${l} | awk -F'/' '{print $NF}')_epoch.XXXXXXXXXX) || { echo "Failed to create temp file"; exit 1; }
time_convert() {
read line
echo "$(date --date="$(echo ${line} | awk '{print $1,$2,$3}')" +'%s') $(echo ${line} | awk '{$1=$2=$3=""; print $0}')"
}
export -f time_convert
grep "ACCESS BLOCK" "${l}" | grep -vP "src=\"(192\.168\.|0\.|10\.|127\.|172\.16\.)" | \
xargs -n 1 -P $(grep -c '^processor' /proc/cpuinfo) -I % bash -c 'echo % | time_convert' _ | sort -k1,1n > "${l_temp}"
for i in $(seq 1 3); do
range_begin="$(echo "$(date --date="${time_now}" +'%s')-${i}*3600"|bc -l)"
range_end="$(echo "$(date --date="${time_now}" +'%s')-(${i}-${interval})*3600"|bc -l)"
c="$(awk -v range_begin="$${range_begin}" -v range_end="${range_end}" '{ if ($1 >= range_begin && $1 <= range_end) print $0 }' ${l_temp} | wc -l)"
echo "$(date --date=@"${range_begin}" +'%b %d %H:%M:%S') - $(date --date=@"${range_end}" +'%b %d %H:%M:%S')@${c}"
done | column -s'@' -t
if [ -f "${l_temp}" ]; then /bin/rm -f "${l_temp}"; fi

Finally and mostly for fun: a quick ASCII gnuplot chart of the data. I added a few more data points for visual impact:

l="/var/log/messages"
time_now="$(date +'%b %d %H:%M:%S')"
interval=1 # hours
l_temp=$(mktemp /tmp/$(echo ${l} | awk -F'/' '{print $NF}')_epoch.XXXXXXXXXX) || { echo "Failed to create temp file"; exit 1; }
time_convert() {
read line
echo "$(date --date="$(echo ${line} | awk '{print $1,$2,$3}')" +'%s') $(echo ${line} | awk '{$1=$2=$3=""; print $0}')"
}
export -f time_convert
grep "ACCESS BLOCK" "${l}" | grep -vP "src=\"(192\.168\.|0\.|10\.|127\.|172\.16\.)" | \
xargs -n 1 -P $(grep -c '^processor' /proc/cpuinfo) -I % bash -c 'echo % | time_convert' _ | sort -k1,1n > "${l_temp}"
for i in $(seq 1 3); do
range_begin="$(echo "$(date --date="${time_now}" +'%s')-${i}*3600"|bc -l)"
range_end="$(echo "$(date --date="${time_now}" +'%s')-(${i}-${interval})*3600"|bc -l)"
c="$(awk -v range_begin="$${range_begin}" -v range_end="${range_end}" '{ if ($1 >= range_begin && $1 <= range_end) print $0 }' ${l_temp} | wc -l)"
echo "$(date --date=@"${range_begin}" +'%b %d %H:%M:%S') - $(date --date=@"${range_end}" +'%b %d %H:%M:%S')@${c}"
done | column -s'@' -t | gnuplot -e "set terminal dumb; unset key; set style data labels; set xdata time; set xlabel 'Time'; set ylabel 'Blocks'; set autoscale; set timefmt '%b %d %H:%M:%S'; plot '-' using 1:8:ytic(8) with histeps"
if [ -f "${l_temp}" ]; then /bin/rm -f "${l_temp}"; fi

And the result:

 Blocks
   3147 ++-+--+--+--+-+--+--+--+--+--+--+--+-+-*******+--+--+--+--+-+-*******+
        +     +     +    +     +     +     +   *+    *+     +     +   *+    *+
        |                                      *     *                *     *|
        |                                      *     *                *     *|
   2554 ++                         *******     *     *                *     *+
   2390 ++                         *     *     *     *          *******     *+
   2342 ++                         *     *     *     *    *******           *+
   2155 ++                         *     *     *     ******                 *+
   2010 ++                    ******     *     *                            *+
        |                     *          *     *                            *|
        |                     *          *     *                            *|
        |                     *          *     *                            *|
   1473 ++              *******          *******                            *+
        |               *                                                   *|
   1066 ++  *************+     +     +     +    +     +     +     +    +    *+
    983 *****-+--+--+-+--+--+--+--+--+--+--+-+--+--+--+--+--+--+--+-+--+--+-*+
      05:00 06:00 07:0008:00 09:00 10:00 11:0012:00 13:00 14:00 15:0016:00 17:00
                                        Time