Originally published March 19, 2020 @ 10:37 pm

The result of my morbid fascination with the coronavirus situation is this short bash script. It parses the Johns Hopkins University coronavirus data and generates a quick report for the current date for the specified countries.

The plan is to add some statistical analysis to spot potential anomalies in the reported data. For now, just a simple summary for the current day.

The script is below. You can also download it from my GitHub repo here. Here’s an example of how to run it:

./covid19_stats_mk2.sh -c US -c Italy -c Spain -c China -c "United Kingdom"

COUNTRY         DATE        CONFIRMED  DEATHS  RECOVERED  ACTIVE  MORTALITY  RECOVERY
US              03-19-2020  13680      200     108        13372   1.4%       .7%
Italy           03-19-2020  41035      3405    4440       33190   8.2%       10.8%
Spain           03-19-2020  17963      830     1107       16026   4.6%       6.1%
China           03-19-2020  81156      3249    70535      7372    4.0%       86.9%
United Kingdom  03-19-2020  2716       138     67         2511    5.0%       2.4%
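
The ACTIVE, MORTALITY and RECOVERY columns are derived from the other three. The script does the math with bc at scale=1, which truncates rather than rounds and drops the leading zero (hence the ".7%" for the US). Here is a rough sketch of the same arithmetic using plain shell integer math instead of bc. The pct helper is hypothetical and not part of the script; it assumes non-negative integer inputs and adds a leading zero and a zero-total guard.

```shell
#!/bin/bash
# Hypothetical helper (not in the script): percentage with one decimal place,
# truncated like bc with scale=1, but printed with a leading zero.
pct() {
	local part="$1" total="$2"
	if [ "${total}" -eq 0 ]; then
		echo "n/a"	# nothing to divide by
		return
	fi
	local tenths=$(( part * 1000 / total ))	# integer division truncates
	echo "$(( tenths / 10 )).$(( tenths % 10 ))%"
}

# US row from the report above
confirmed=13680; deaths=200; recovered=108
echo "active:    $(( confirmed - deaths - recovered ))"
echo "mortality: $(pct "${deaths}" "${confirmed}")"
echo "recovery:  $(pct "${recovered}" "${confirmed}")"
```

For the US row this gives 13372 active cases, 1.4% mortality and 0.7% recovery, matching the report apart from the leading zero.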

And this is the script:

An update

It would seem the Johns Hopkins University Center for Systems Science and Engineering has trouble keeping the format of its COVID-19 data files consistent. For unknown reasons, the columns are arranged differently in data files from different dates. There are other arbitrary changes too, such as the country column being called ‘Country_Region’ in some files and ‘Country/Region’ in others. Well, I hope that made someone very happy.

In any case, I made a couple of changes to my script to compensate for someone’s lack of experience handling data. From a bash scripting standpoint, you may find the *_field variables interesting: each one is set dynamically to the number of the correct data column by matching the exact or approximate header name. So, as long as JHU CSSE doesn’t rename “Deaths” to “Casualties” or “Confirmed” to “Verified”, we should be fine…
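
To show the header-matching idea in isolation, here is a minimal sketch. The find_field helper and the two sample header lines are made up for illustration (modelled on two JHU layouts); the script itself uses a longer awk one-liner per column.

```shell
#!/bin/bash
# Hypothetical helper (not in the script): print the 1-based index of the
# first header column that matches the given regular expression.
find_field() {
	local pattern="$1" file="$2"
	awk -F, -v pat="${pattern}" \
		'NR==1 { for (i = 1; i <= NF; i++) if ($i ~ pat) { print i; exit } }' "${file}"
}

# Two sample header rows with different column order and spelling.
layout_a="$(mktemp)"; layout_b="$(mktemp)"
printf 'Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered\n' > "${layout_a}"
printf 'FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active\n' > "${layout_b}"

# The dot in Country.Region matches either "/" or "_".
find_field 'Country.Region' "${layout_a}"	# prints 2
find_field 'Country.Region' "${layout_b}"	# prints 4
find_field 'Confirmed' "${layout_b}"		# prints 8

rm -f "${layout_a}" "${layout_b}"
```

The same pattern finds the column in both layouts, so the rest of the code can refer to a field number instead of a fixed position.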

#!/bin/bash

while getopts ":c:" opt
do
	case ${opt} in
		c  ) countries+=("${OPTARG}") ;;
		\? ) echo "Unknown option: -$OPTARG" >&2; exit 1;;
		:  ) echo "Missing option argument for -$OPTARG" >&2; exit 1;;
		*  ) echo "Unimplemented option: -$OPTARG" >&2; exit 1;;
	esac
done
shift $((OPTIND -1))

url="https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports"
url_raw="https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports"

if [ -z "${countries}" ]
then
	echo "You need to specify at least one country name with -c. Exiting..." >&2
	exit 10	# exit statuses are limited to 0-255
fi

curl_get() {
	# -f makes curl print nothing on HTTP errors such as 404, so the file stays empty
	curl -sf "${url_raw}/${e}.csv" > "${tmpfile}" 2>/dev/null
}

rulem ()  {
	if [ $# -eq 0 ]; then
		echo "Usage: rulem MESSAGE [RULE_CHARACTER]"
		return 1
	fi
	printf -v _hr "%*s" $(tput cols) && echo -en ${_hr// /${2--}} && echo -e "\r\033[2C$1"
}

tmpfile="$(mktemp)"
tmpfootnotes="$(mktemp)"
e="$(date +'%m-%d-%Y')"
curl_get

if [ ! -s "${tmpfile}" ]
then
	e="$(date -d'-1 days' +'%m-%d-%Y')"
	curl_get
fi

if [ ! -s "${tmpfile}" ]
then
	echo "Unable to download CSV file. Exiting..." >&2
	exit 30	# exit statuses are limited to 0-255
fi

for ((i = 0; i < ${#countries[@]}; i++))
do
	c="${countries[$i]}"
	c="$(echo ${c} | sed 's/^ //g')"

	case ${c} in
		US) echo -e "* ${c} recovery rates are no longer tracked as of 2020-12-14" >> "${tmpfootnotes}" ;;
	esac

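	# Find each column number by matching the header row, because the column
	# order and header spelling differ between daily files.
	country_field=$(awk -F, 'NR==1{for(i=1;i<=NF;i++)if($i~/Country.Region/)f[n++]=i}{for(i=0;i<n;i++)printf"%s%s",i?" ":"",f[i];print""}' "${tmpfile}" | sort -u | head -1)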
	confirmed_field=$(awk -F, 'NR==1{for(i=1;i<=NF;i++)if($i~/Confirmed/)f[n++]=i}{for(i=0;i<n;i++)printf"%s%s",i?" ":"",f[i];print""}' "${tmpfile}" | sort -u | head -1)
	deaths_field=$(awk -F, 'NR==1{for(i=1;i<=NF;i++)if($i~/Deaths/)f[n++]=i}{for(i=0;i<n;i++)printf"%s%s",i?" ":"",f[i];print""}' "${tmpfile}" | sort -u | head -1)
	recovered_field=$(awk -F, 'NR==1{for(i=1;i<=NF;i++)if($i~/Recovered/)f[n++]=i}{for(i=0;i<n;i++)printf"%s%s",i?" ":"",f[i];print""}' "${tmpfile}" | sort -u | head -1)
	if [ ! -z "${country_field}" ] && [ ! -z "${confirmed_field}" ] && [ ! -z "${deaths_field}" ] && [ ! -z "${recovered_field}" ]
	then
		confirmed=$(awk -F, -v c="$c" -v field=$country_field '$field == c' "${tmpfile}" | awk -v field=$confirmed_field -F, '{s+=$field}END{print s}')
		deaths=$(awk -F, -v c="$c" -v field=$country_field '$field == c' "${tmpfile}" | awk -v field=$deaths_field -F, '{s+=$field}END{print s}')
		recovered=$(awk -F, -v c="$c" -v field=$country_field '$field == c' "${tmpfile}" | awk -v field=$recovered_field -F, '{s+=$field}END{print s}')
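		# Skip countries with no matching rows, otherwise bc would divide by nothing
		if [ -z "${confirmed}" ] || [ "${confirmed}" = "0" ]
		then
			echo "No data found for '${c}'" >&2
			continue
		fi
		death_pct="$(echo "scale=1;(${deaths}*100)/${confirmed}"|bc -l)"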
		recovery_pct="$(echo "scale=1;(${recovered}*100)/${confirmed}"|bc -l)"
		active_cases="$(echo "scale=0;${confirmed}-(${deaths}+${recovered})"|bc -l)"
		echo "${c},${e},${confirmed},${deaths},${recovered},${active_cases},${death_pct}%,${recovery_pct}%"
	fi
done | (echo "COUNTRY,DATE,CONFIRMED,DEATHS,RECOVERED,ACTIVE,MORTALITY,RECOVERY" && cat) | column -s',' -t
echo ""

if [ -s "${tmpfootnotes}" ]
then
	cat << EOF

$(rulem FOOTNOTES)
$(cat "${tmpfootnotes}")

EOF
fi

/bin/rm -f "${tmpfile}" "${tmpfootnotes}" 2>/dev/null