
Atop Script with Scheduling and Logging

January 2, 2024


Originally published January 2, 2018 @ 10:18 am

When something is going down on a server, the first thing most sysadmins will run is the venerable top utility. This happens automatically: if you suspect the server is being sluggish, your fingers just type top without you even thinking about it. Unfortunately, top and many similar tools will only show you the current state of the system. So if a problem came and went before you even logged into the server, you’re out of luck.

It doesn’t help that most centralized system performance monitoring tools (OpenView, SolarWinds, Observium, Big Brother, etc.), while collecting tons of historical performance data, do not monitor systems on a per-process basis. This can be very important when troubleshooting application issues: on the historical performance charts you can see that disk I/O was high and system load went through the roof, but the data about the misbehaving process is long gone.

The atop utility has one killer feature: the ability to write everything it sees to a compressed log file. You can later replay this log file, skip to the time of interest, and see exactly what you would have seen had you been sitting at the console at that moment. Below is a script I wrote to make this logging easier to schedule, so you can run it when you want and for as long as you want.

A few things to keep in mind:

Never kill -9 an atop process. From within the utility, use q to exit. From the console, use kill -15 or pkill atop (pkill sends signal 15, SIGTERM, by default).
While atop creates a compressed log file, it can still get pretty big, so be mindful of available disk space. As a rule of thumb, every hour of atop logging at a one-second sampling interval consumes about 50MB of filesystem space.
The script below requires the atd service to be installed and active. On RHEL/CentOS 5/6: yum -y install at ; /sbin/chkconfig atd on ; /sbin/service atd restart. Some versions of CentOS/RHEL shipped a buggy atd, so even if you have it installed, it never hurts to update: yum -y update at ; /sbin/service atd restart
You should run the script as root, so make sure /etc/at.allow contains the root username and /etc/at.deny doesn’t.
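
The disk-space rule of thumb can be turned into a quick estimate before scheduling a long recording. The sketch below is not part of the original script: the function name estimate_atop_mb is hypothetical, and it assumes that log size scales linearly with the number of samples. That is only an approximation, since the real size depends on how busy the system is.

```shell
#!/bin/bash
# Hypothetical helper: rough atop log size in MB.
# Assumes ~50 MB per hour at a 1-second interval (the rule of thumb above)
# and linear scaling with the sample count (an assumption, not a measurement).
estimate_atop_mb() {
	local minutes="$1" interval="$2"
	# 50 * minutes / (60 * interval), rounded up to a whole MB
	echo $(( (50 * minutes + 60 * interval - 1) / (60 * interval) ))
}

estimate_atop_mb 480 15   # eight hours at 15-second intervals, prints 27
```

Compare the result with the free space that df -m reports for the filesystem holding the log directory.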

The syntax is fairly simple:

atoplog -t "7:30am tomorrow" -d 480 -i 15 -w /var/log/atop_log

This will start atop at 7:30 tomorrow morning and keep it running for eight hours, taking a sample every 15 seconds and writing the log to the /var/log/atop_log directory.
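
atop itself takes an interval and a number of samples rather than a duration, so the script has to convert one into the other. Here is a minimal sketch of that arithmetic for the example above. The function name atop_samples is made up for illustration, and the integer division mirrors what bash arithmetic does in the script's configure function.

```shell
#!/bin/bash
# Hypothetical helper: number of atop samples for a given duration.
atop_samples() {
	local minutes="$1" interval="$2"
	# minutes -> seconds, then divide by the sampling interval (integer division)
	echo $(( minutes * 60 / interval ))
}

atop_samples 480 15   # prints 1920
```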

And here’s the script. Syntax and examples are included. You can download it here: atop_log. Uncompress and save it to, say, /var/adm/bin and create a convenient link: ln -s /var/adm/bin/atop_log.sh /usr/bin/atoplog

#!/bin/bash
#
#                                      |
#                                  ___/"\___
#                          __________/ o \__________
#                            (I) (G) \___/ (O) (R)
#                                   Igor Os
#                           igor@comradegeneral.com
#                             www.krazyworks.com
#                                 2016-08-03
# ----------------------------------------------------------------------------
# Record atop output in the background for future analysis
# ----------------------------------------------------------------------------

usage() {
cat << EOF
Syntax:
---------------------
atoplog -d <duration_minutes> [-t "<time when to run>" Default: now] [-i <interval_seconds> Default: 5] [-w <target_directory> Default: /var/log/atop]

Example:
---------------------
atoplog -t "2:30pm today" -d 30 -i 2 -w /var/tmp/atop
EOF
exit 1
}

atop_check() {
	if [ ! -x /usr/bin/atop ]
	then
		echo "Can't find /usr/bin/atop. Exiting..."
		exit 1
	fi
	
	if [ ! -x /usr/bin/timeout ]
	then
		echo "Can't find /usr/bin/timeout. Exiting..."
		exit 1
	fi
	
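	# Look for an atop recording that is already running ("atop <interval> ... log").
	# [[:space:]] matches the blank after "atop"; \w would not.
	if [ "$(ps -ef | grep -Ec "[a]top[[:space:]]+[1-9].*log")" -ne 0 ]
	then
		echo "Just FYI, there's another atop already running:"
		ps -ef | grep -E "[a]top[[:space:]]+[1-9].*log"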
	fi
}

while getopts ":d:t:i:w:" OPTION; do
	case "${OPTION}" in
		d)
			duration_minutes="${OPTARG}"
			;;
		t)
			when_to_run="${OPTARG}"
			;;
		i)
			interval_seconds="${OPTARG}"
			;;
		w)
			logdir="${OPTARG}"
			;;
		\?) echo "Unknown option: -${OPTARG}" >&2; usage ;;
		:)  echo "Missing option argument for -${OPTARG}" >&2; usage ;;
		*)  echo "Unimplemented option: -${OPTARG}" >&2; usage ;;
	esac
done

configure() {
	if [ -z "${duration_minutes}" ] ; then usage ; fi
	if [ -z "${when_to_run}" ] ; then when_to_run="now" ; fi
	datetime="$(date -d "${when_to_run}" +'%Y-%m-%d_%H%M%S')"
	if [ -z "${interval_seconds}" ] ; then interval_seconds=5 ; fi
	if [ -z "${logdir}" ] ; then logdir="/var/log/atop" ; fi
	if [ ! -d "${logdir}" ] ; then mkdir -p "${logdir}" ; fi
	outfile="${logdir}/atop_${datetime}.log"
	if [ -f "${outfile}" ] ; then /bin/rm -f "${outfile}" ; fi
	(( duration_seconds = duration_minutes * 60 ))
	(( duration_samples = duration_seconds / interval_seconds ))
	
}

atop_do() {
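	# when_to_run is left unquoted on purpose: at expects the time spec as separate words.
	# The output path is single-quoted so a log directory containing spaces still works.
	at ${when_to_run} <<<"atop ${interval_seconds} ${duration_samples} -w '${outfile}'"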
	echo "Running atop at $(atq 2>/dev/null | tail -1 | awk '{print $2,$3}') for ${duration_minutes} minutes at ${interval_seconds}-second intervals with output saved to ${outfile}"
}

atop_help() {
cat << EOF

  You can read this file like so: atop -r ${outfile}
 --------------------------------------------------------------------------------------------------
|                                                                                                  |
| You can access this file at any time: no need to wait for recording to finish.                   |
|                                                                                                  |
| Here are some of the useful filtering options:                                                   |
|                                                                                                  |
|  t - Skip forward in time to next snapshot                                                       |
|  T - Skip back in time to previous snapshot                                                      |
|  P - Filter by process name regex                                                                |
|  U - Filter by username regex                                                                    |
|  b - [hh:mm] - jump to specified timestamp                                                       |
|  r - skip back to start of file with current filter applied                                      |
|                                                                                                  |
| For more help, press "?" in atop                                                                 |
|                                                                                                  |
 --------------------------------------------------------------------------------------------------
 
EOF
}

# RUNTIME
atop_check
configure
atop_do
atop_help
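
Once a recording exists, it can be replayed from a given time of day with atop's -r (read raw file) and -b (begin time) options. The wrapper below is a hypothetical convenience, not part of the script above. The function names and the example file name are assumptions, and the hh:mm check is deliberately strict (two digits each, 24-hour clock).

```shell
#!/bin/bash
# Return success only for a 24-hour hh:mm string such as 07:30 or 14:05.
is_hhmm() {
	case "$1" in
		[01][0-9]:[0-5][0-9] | 2[0-3]:[0-5][0-9]) return 0 ;;
		*) return 1 ;;
	esac
}

# Hypothetical wrapper: replay <logfile> starting at <hh:mm>.
atop_replay() {
	local logfile="$1" start="$2"
	if [ ! -r "${logfile}" ]; then
		echo "Can't read ${logfile}" >&2
		return 1
	fi
	if ! is_hhmm "${start}"; then
		echo "Start time must be hh:mm, got: ${start}" >&2
		return 1
	fi
	atop -r "${logfile}" -b "${start}"
}

# Example (the file name is illustrative):
# atop_replay /var/log/atop/atop_2024-01-02_073000.log 09:15
```

Checking the arguments first means a mistyped time or path produces a clear message before atop is started.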

Igor

Experienced Unix/Linux System Administrator with 20-year background in Systems Analysis, Problem Resolution and Engineering Application Support in a large distributed Unix and Windows server environment. Strong problem determination skills. Good knowledge of networking, remote diagnostic techniques, firewalls and network security. Extensive experience with engineering application and database servers, high-availability systems, high-performance computing clusters, and process automation.
