While I am on the subject of selecting time ranges from logs, a practical application presented itself. A lonely server in a far-away land with a habit of running out of memory could only be monitored via stunnel and reverse SSH. The server crashed again today and, after bringing it back to life, I looked at whatever meager data the SSH monitor collected to see if there was anything useful.

The SSH monitoring script is simple: every fifteen minutes a connection is made to the server via a reverse SSH tunnel and the uptime command is executed. If a connection cannot be established within a couple of minutes, an alert is dispatched. So the log file looks something like this:
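Assuming each entry is one unmodified line of uptime output in the common Linux layout (an assumption; the format differs between systems), the two fields that matter are the timestamp and the 15-minute load average. A small awk sketch that pulls them out, run here on an invented sample line rather than real log data:

```shell
# Sketch: extract "time load15" pairs from uptime-style log lines.
# Assumes the Linux procps layout, e.g.:
#   " 14:30:01 up 12 days,  3:04,  1 user,  load average: 0.42, 0.37, 0.31"
# The first field is the wall-clock time; the last field is the 15-minute load.
extract_load15() {
    awk '/load average/ { print $1, $NF }' "$@"
}

# Invented sample line, for illustration only (not taken from the real log):
printf ' 14:30:01 up 12 days,  3:04,  1 user,  load average: 0.42, 0.37, 0.31\n' |
    extract_load15
```

Lines without "load average" in them (for example, error messages from a failed connection) are skipped.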

I thought I saw a pattern in the system load numbers, but then it could’ve been just lack of sleep. Basically, I needed to plot the 15-minute average load and I had no fancy tools like Splunk. There were two problems: the log has no date (only hours and minutes), and the timestamp is in a timezone different from mine. So the solution was to select only the lines covering the 15-minute intervals that had elapsed since the last midnight in the log’s local timezone.
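Once the right lines are selected, the plotting side needs only a short gnuplot script. What follows is a sketch, not necessarily what I ran: it assumes a two-column data file (time of day, 15-minute load), a gnuplot build with the png terminal, and hypothetical file names.

```shell
# Sketch: emit a gnuplot script that plots "HH:MM:SS load" pairs as a time series.
# If the log only has HH:MM timestamps, change timefmt to "%H:%M".
gnuplot_script() {
    data=$1   # two columns: time of day, 15-minute load average
    png=$2    # output image file
    cat <<EOF
set terminal png size 800,400
set output "$png"
set xdata time
set timefmt "%H:%M:%S"
set format x "%H:%M"
set xlabel "Local time on the server"
set ylabel "15-minute load average"
plot "$data" using 1:2 with lines title "load15"
EOF
}

# Usage (requires gnuplot; load.dat, plot.gp and load.png are placeholder names):
#   gnuplot_script load.dat load.png > plot.gp && gnuplot plot.gp
```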

And the result:

So I wasn’t imagining things: there was a pattern.

And now a bit of explanation. The following command tells you how many seconds have elapsed since midnight in the specified timezone (SGT, in this case):
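One way to get this number, as a sketch: ask date for the current hours, minutes and seconds in the target timezone and let awk do the arithmetic. The tz database name Asia/Singapore is assumed here for SGT, and the function name is made up for illustration.

```shell
# Seconds elapsed since midnight in a given timezone.
# Asia/Singapore is assumed as the tz database name for SGT.
secs_since_midnight() {
    TZ=${1:-Asia/Singapore} date '+%H %M %S' |
        awk '{ print $1 * 3600 + $2 * 60 + $3 }'
}

secs_since_midnight Asia/Singapore
```

The arithmetic is done in awk rather than in shell $(( )) because values such as 08 and 09 carry a leading zero, which shell arithmetic would try to read as octal.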

Then it’s a simple matter of dividing that number by 900 (the number of seconds in 15 minutes) to get the count of intervals, and grabbing that many lines from the end of the log file. The gnuplot bit is pretty standard.
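As a worked example, at 10:30 local time 37800 seconds have elapsed, and 37800 / 900 = 42, so the last 42 lines of the log cover today. A minimal sketch of that step, with made-up function names, assuming exactly one log line per 15-minute check and no missed checks:

```shell
# Number of complete 15-minute (900-second) intervals in a count of seconds.
intervals_since_midnight() {
    echo $(( $1 / 900 ))
}

# Print the last N lines of a log, where N is the interval count.
# Just after midnight N is 0 and nothing is printed.
todays_lines() {
    log=$1
    secs=$2
    n=$(intervals_since_midnight "$secs")
    tail -n "$n" "$log"
}

# Usage (monitor.log is a placeholder name):
#   todays_lines monitor.log 37800
```

If any checks failed during the day, fewer lines were written than intervals elapsed, and the selection will reach back into the previous day.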
