Originally published July 29, 2017 @ 9:07 pm
Consider a common situation: you would like to select lines from a log file that match a specific string, but only during a specific time frame. For example, my /var/log/messages contains entries from one of the firewalls. I would like to see everything that was ACCESS BLOCK with a source external to my network on July 29, 2017 between nine and ten in the morning.

Here’s a sample of the data from /var/log/messages:
```
Jul 29 10:57:47 2017 ds9 src="182.16.249.18:32204" dst="64.233.179.139:445" msg="Match default rule, DROP" note="ACCESS BLOCK" user="unknown" devID="5cf4ab67c448" cat="Firewall" class="Access Control" ob="0" ob_mac="000000000000" dir="ANY:ANY" protoID=6 proto="others"
Jul 29 10:58:10 2017 ds9 src="203.178.148.19:0" dst="64.233.179.139:0" msg="Match default rule, ICMP Type:8, DROP" note="ACCESS BLOCK" user="unknown" devID="5cf4ab67c448" cat="Firewall" class="Access Control" ob="0" ob_mac="000000000000" dir="ANY:ANY" protoID=1 proto="others"
Jul 29 10:58:20 2017 ds9 src="94.177.226.168:5097" dst="64.233.179.139:5060" msg="Match default rule, DROP" note="ACCESS BLOCK" user="unknown" devID="5cf4ab67c448" cat="Firewall" class="Access Control" ob="0" ob_mac="000000000000" dir="ANY:ANY" protoID=17 proto="others"
Jul 29 10:58:23 2017 ds9 src="118.69.72.134:22316" dst="64.233.179.139:23" msg="Match default rule, DROP" note="ACCESS BLOCK" user="unknown" devID="5cf4ab67c448" cat="Firewall" class="Access Control" ob="0" ob_mac="000000000000" dir="ANY:ANY" protoID=6 proto="others"
Jul 29 10:59:00 2017 ds9 src="17.173.254.223:16387" dst="64.233.179.139:57625" msg="Match default rule, DROP" note="ACCESS BLOCK" user="unknown" devID="5cf4ab67c448" cat="Firewall" class="Access Control" ob="0" ob_mac="000000000000" dir="ANY:ANY" protoID=17 proto="others"
```
The simplest way is to use a regex for the timestamp. For example:

```shell
grep -P "^Jul 29 (09|10):.*ACCESS BLOCK" /var/log/messages | \
  grep -vP "src=\"(192\.168\.|0\.|10\.|127\.|172\.16\.)"
```
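The regex approach is convenient when the window happens to be hour-aligned (note that the (09|10): alternation above actually matches through 10:59). For anything finer, the pattern balloons. A quick sketch with made-up sample lines, assuming a hypothetical 09:30 to 10:15 window:

```shell
# Hour-aligned windows are a simple alternation:
echo "Jul 29 09:15:02 test" | grep -cP "^Jul 29 (09|10):"        # prints 1

# A 09:30-10:15 window already needs per-minute alternation:
p="^Jul 29 (09:(3[0-9]|[45][0-9])|10:(0[0-9]|1[0-5]))"
echo "Jul 29 09:45:11 test" | grep -cP "${p}"                    # inside:  prints 1
echo "Jul 29 10:20:00 test" | grep -cP "${p}" || true            # outside: prints 0
```

Second-level precision or a window spanning midnight makes the regex even worse, which is where the epoch approach below pays off.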
A more laborious approach is to convert the timestamp fields from the usual date +'%b %d %H:%M:%S' format to epoch time (date +'%s'). You can then use integer comparison in bash to select the relevant lines from the log. Here’s an example:
```shell
l=/var/log/messages
range_begin="$(date --date="Jul 29 2017 9am" +'%s')"
range_end="$(date --date="Jul 29 2017 10am" +'%s')"

grep "ACCESS BLOCK" "${l}" | grep -vP "src=\"(192\.168\.|0\.|10\.|127\.|172\.16\.)" | \
while read -r line; do
  t="$(date --date="$(echo "${line}" | awk '{print $1,$2,$3}')" +'%s')"
  echo "${t} $(echo "${line}" | awk '{$1=$2=$3=""; print $0}')"
done | awk -v range_begin="${range_begin}" -v range_end="${range_end}" \
  '{ if ($1 >= range_begin && $1 <= range_end) print $0 }'
```
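One caveat worth flagging: the sample entries carry the year in the fourth field, and passing only the first three fields to date makes it assume the current year. If your logs may span a year boundary, include field 4 in the conversion. A small sketch, with a made-up one-line sample:

```shell
# Hypothetical sample entry; field 4 is the year.
line='Jul 29 10:57:47 2017 ds9 src="182.16.249.18:32204"'

# Year-pinned conversion using fields 1-4 instead of fields 1-3 only:
t="$(date --date="$(echo "${line}" | awk '{print $1,$2,$3,$4}')" +'%s')"
echo "${t}"
```

GNU date parses "Jul 29 10:57:47 2017" the same way as "Jul 29 2017 10:57:47", so the fields can be passed in log order.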
The advantage here is you can work with log files where entries are not in chronological order, as may often be the case when multiple remote syslog sources are writing to the same file. Not only can you identify the correct time range, you can then also easily rearrange matched entries in a chronological order. Working with timestamps that are integers offers many other advantages.
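For instance, once every entry is prefixed with its epoch time, restoring chronological order is a single numeric sort on field 1. A minimal sketch with made-up epoch values:

```shell
# Out-of-order epoch-prefixed entries, as you might get from
# multiple remote syslog sources writing to one file:
printf '%s\n' \
  "1501322290 ds9 msg=second" \
  "1501322267 ds9 msg=first" \
  "1501322300 ds9 msg=third" | sort -k1,1n
# Numeric sort on the epoch field prints first, second, third in order.
```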
Let’s imagine a slightly more sophisticated task: we want to run the search above and then build a frequency table that would show a count of matched records for multiple sixty-minute intervals from the current time going back three hours. So, something like this (assuming right now is Jul 29 15:45:23):
```
Time Range                           Blocked Connections
Jul 29 14:45:23 - Jul 29 15:45:23    3246
Jul 29 13:45:23 - Jul 29 14:45:23    2435
Jul 29 12:45:23 - Jul 29 13:45:23    4582
```
Here’s one way to get this done:
```shell
l="/var/log/messages"
time_now="$(date +'%b %d %H:%M:%S')"
interval=1 # hours

l_temp=$(mktemp /tmp/$(echo ${l} | awk -F'/' '{print $NF}')_epoch.XXXXXXXXXX) || \
  { echo "Failed to create temp file"; exit 1; }

grep "ACCESS BLOCK" "${l}" | grep -vP "src=\"(192\.168\.|0\.|10\.|127\.|172\.16\.)" | \
while read -r line; do
  t="$(date --date="$(echo "${line}" | awk '{print $1,$2,$3}')" +'%s')"
  echo "${t} $(echo "${line}" | awk '{$1=$2=$3=""; print $0}')" >> "${l_temp}"
done

for i in $(seq 1 3); do
  range_begin="$(echo "$(date --date="${time_now}" +'%s')-${i}*3600" | bc -l)"
  range_end="$(echo "$(date --date="${time_now}" +'%s')-(${i}-${interval})*3600" | bc -l)"
  c="$(awk -v range_begin="${range_begin}" -v range_end="${range_end}" \
    '{ if ($1 >= range_begin && $1 <= range_end) print $0 }' "${l_temp}" | wc -l)"
  echo "$(date --date=@"${range_begin}" +'%b %d %H:%M:%S') - $(date --date=@"${range_end}" +'%b %d %H:%M:%S')@${c}"
done | column -s'@' -t

[ -f "${l_temp}" ] && /bin/rm -f "${l_temp}"
```
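As an aside, the three awk passes over the temp file could also be collapsed into a single pass that assigns each entry to an hour bucket. A minimal sketch under the same epoch-prefixed format; the bucket arithmetic and names here are my own, not from the script above:

```shell
# Count entries per 60-minute bucket, counting back from "now", in one awk pass.
# Input: epoch-prefixed lines, one per entry (synthetic offsets for the demo).
now="$(date +'%s')"
printf '%s\n' \
  "$((now - 1800)) entry" \
  "$((now - 5400)) entry" \
  "$((now - 5500)) entry" |
awk -v now="${now}" '
  { b = int((now - $1) / 3600); count[b]++ }      # bucket 0 = last hour, 1 = hour before, ...
  END { for (b = 0; b <= 2; b++) print b, count[b] + 0 }'
# Prints: bucket 0 -> 1, bucket 1 -> 2, bucket 2 -> 0
```

This trades the readable per-interval loop for a single scan of the file, which matters mainly when the temp file is large.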
The time-consuming portion of this process is the conversion of the original log file into a version using epoch timestamps. On a multi-core system you can speed things up using xargs. Just keep in mind that you’ll need to sort -k1,1n the combined output of all xargs threads if you want your resulting file to remain chronological. The more cores you have, the faster this will work. Here’s an example:
```shell
l="/var/log/messages"
time_now="$(date +'%b %d %H:%M:%S')"
interval=1 # hours

l_temp=$(mktemp /tmp/$(echo ${l} | awk -F'/' '{print $NF}')_epoch.XXXXXXXXXX) || \
  { echo "Failed to create temp file"; exit 1; }

time_convert() {
  read -r line
  echo "$(date --date="$(echo "${line}" | awk '{print $1,$2,$3}')" +'%s') $(echo "${line}" | awk '{$1=$2=$3=""; print $0}')"
}
export -f time_convert

grep "ACCESS BLOCK" "${l}" | grep -vP "src=\"(192\.168\.|0\.|10\.|127\.|172\.16\.)" | \
  xargs -n 1 -P $(grep -c '^processor' /proc/cpuinfo) -I % bash -c 'echo % | time_convert' _ | \
  sort -k1,1n > "${l_temp}"

for i in $(seq 1 3); do
  range_begin="$(echo "$(date --date="${time_now}" +'%s')-${i}*3600" | bc -l)"
  range_end="$(echo "$(date --date="${time_now}" +'%s')-(${i}-${interval})*3600" | bc -l)"
  c="$(awk -v range_begin="${range_begin}" -v range_end="${range_end}" \
    '{ if ($1 >= range_begin && $1 <= range_end) print $0 }' "${l_temp}" | wc -l)"
  echo "$(date --date=@"${range_begin}" +'%b %d %H:%M:%S') - $(date --date=@"${range_end}" +'%b %d %H:%M:%S')@${c}"
done | column -s'@' -t

[ -f "${l_temp}" ] && /bin/rm -f "${l_temp}"
```
Finally, and mostly for fun: a quick ASCII gnuplot chart of the data. I added a few more data points for visual impact:
```shell
l="/var/log/messages"
time_now="$(date +'%b %d %H:%M:%S')"
interval=1 # hours

l_temp=$(mktemp /tmp/$(echo ${l} | awk -F'/' '{print $NF}')_epoch.XXXXXXXXXX) || \
  { echo "Failed to create temp file"; exit 1; }

time_convert() {
  read -r line
  echo "$(date --date="$(echo "${line}" | awk '{print $1,$2,$3}')" +'%s') $(echo "${line}" | awk '{$1=$2=$3=""; print $0}')"
}
export -f time_convert

grep "ACCESS BLOCK" "${l}" | grep -vP "src=\"(192\.168\.|0\.|10\.|127\.|172\.16\.)" | \
  xargs -n 1 -P $(grep -c '^processor' /proc/cpuinfo) -I % bash -c 'echo % | time_convert' _ | \
  sort -k1,1n > "${l_temp}"

for i in $(seq 1 3); do
  range_begin="$(echo "$(date --date="${time_now}" +'%s')-${i}*3600" | bc -l)"
  range_end="$(echo "$(date --date="${time_now}" +'%s')-(${i}-${interval})*3600" | bc -l)"
  c="$(awk -v range_begin="${range_begin}" -v range_end="${range_end}" \
    '{ if ($1 >= range_begin && $1 <= range_end) print $0 }' "${l_temp}" | wc -l)"
  echo "$(date --date=@"${range_begin}" +'%b %d %H:%M:%S') - $(date --date=@"${range_end}" +'%b %d %H:%M:%S')@${c}"
done | column -s'@' -t | \
  gnuplot -e "set terminal dumb; unset key; set style data labels; set xdata time; \
set xlabel 'Time'; set ylabel 'Blocks'; set autoscale; set timefmt '%b %d %H:%M:%S'; \
plot '-' using 1:8:ytic(8) with histeps"

[ -f "${l_temp}" ] && /bin/rm -f "${l_temp}"
```
And the result:
[ASCII gnuplot chart: hourly blocked-connection counts ("Blocks") plotted against "Time" from roughly 05:00 to 17:00, with values ranging from 983 up to a peak of 3147.]