Basic Data Recovery in Linux

November 1, 2022

Have you noticed how the letters “e” and “r” are neighbors on your keyboard? And an unrelated question: do you know the difference between crontab -e and crontab -r commands? Look it up if you don’t. Honestly, I’ve never used the crontab -r command in my many years as a Unix sysadmin. Not intentionally, that is.

But it was bound to happen sooner or later. The /var/spool/cron/crontabs/root file contains the root user’s cron jobs (the exact location may vary depending on the Linux flavor). Losing this file may be a big deal on some servers, and this is the only copy of this file. Naturally, you will check your backups, assuming you have backups. But what if you don’t?

This is not my first run-in with the crontab file. In the past, I’ve mentioned a script I wrote to rebuild an approximation of the original crontab based on the cron log entries. Here’s a command that scans the entire device holding /var/spool/cron for anything that looks like a cron job entry. Note that the pattern only catches lines with five schedule fields followed by a command that mentions /dev/null, so adjust its tail if your jobs don’t redirect their output there:

grep -m100 -aP "^(([0-9]|,|\/|\*|\-|#){1,}(\s|\t){1,}){5}.*\/dev\/null" \
$(df /var/spool/cron | grep -oP "(?<=^)/([[:alnum:]]|/){1,}(?=(\s|\t))") | sort -u

Depending on the size of the device, this may take a while. And you will likely get duplicate and slightly different entries (hence the sort -u at the end). This is because you have modified the crontab file a few times in the past days, weeks, months, and all those old versions – although deleted – still exist in the “free” areas of the storage device.
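Before pointing the pattern at a whole device, it can help to see what it does and does not match. The sketch below runs the same regular expression against a small sample file. The sample entries and the `sample` and `pattern` variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sample crontab-like lines (invented for this illustration)
sample=$(mktemp)
cat > "${sample}" <<'EOF'
*/5 * * * * /usr/local/bin/job.sh >/dev/null 2>&1
#0 3 * * 1 /opt/backup.sh > /dev/null
15 2 * * * /usr/bin/report.sh
@daily /usr/bin/cleanup.sh >/dev/null
PATH=/usr/bin:/bin
EOF

# Same pattern as in the command above
pattern='^(([0-9]|,|\/|\*|\-|#){1,}(\s|\t){1,}){5}.*\/dev\/null'

# -a treats the input as text, -P enables Perl-compatible regex (GNU grep)
matches=$(grep -aP "${pattern}" "${sample}")
printf '%s\n' "${matches}"
rm -f "${sample}"
```

Only the first two sample lines are printed. Entries that use shortcuts like `@daily`, that spell out day or month names, or that don’t mention /dev/null slip through, so loosen the pattern if your crontab used any of those.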

Technically, you can use this method to recover some recently-deleted data. How recently – depends on how much free disk space you have and the amount of filesystem I/O. If you have lots of free filesystem space and the filesystem is reasonably quiet, you may be able to recover data that was deleted or overwritten years ago.
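The same idea extends beyond crontab entries: if you remember a distinctive string from the lost file, you can look up its byte offset and dump whatever surrounds it. Below is a minimal sketch that uses a small temporary file as a stand-in for the device. The `img` variable, the marker string, and the 64-byte window are assumptions for illustration only. On a real system `img` would be the device path, and the output should be written to a different filesystem.

```shell
#!/usr/bin/env bash
# Build a small stand-in "device": filler, one known line, more filler
img=$(mktemp)
{
  head -c 4096 /dev/zero | tr '\0' 'x'
  printf '\nMAILTO=ops@example.com\n'
  head -c 4096 /dev/zero | tr '\0' 'x'
} > "${img}"

# -a: treat binary data as text, -b: print the byte offset,
# -o: report the match itself, -F: fixed string, -m1: stop after the first hit
offset=$(grep -aboF -m1 'MAILTO=' "${img}" | head -n1 | cut -d: -f1)

# Dump 64 bytes starting at the match and keep the first line of it
recovered=$(dd if="${img}" bs=1 skip="${offset}" count=64 status=none | head -n1)

echo "offset=${offset}"
echo "recovered=${recovered}"
rm -f "${img}"
```

On a real device you would likely widen the window and start a little before the offset, since the interesting data often precedes the string you happen to remember.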

I’ve used this method in the past to bring back to life old versions of the /etc/shadow and .bash_history and other potentially important files that were accidentally (or intentionally) deleted or modified. So this is also a good reminder to use special utilities (like the srm that comes from the secure-delete package) when you need to really delete something.

Here’s a short list of tools for securely deleting data:

# Install the packages
yum -y install coreutils srm wipe

# Using shred
shred -zvu -n 5 /tmp/dir1/secret_file

# Using wipe
wipe -rfi /tmp/dir1/*

# Using srm
srm -vz /tmp/dir1/*

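# Using standard shell commands
while IFS= read -r f
do
  size=$(stat -c %s "${f}")
  for i in $(seq 1 10); do
    # conv=notrunc writes over the file in place instead of truncating it first
    head -c "${size}" </dev/urandom | dd of="${f}" conv=notrunc status=none
    sync
  done
  /bin/rm -f "${f}"
done < list_of_files.txt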

There are several open-source utilities that can help you recover lost data. Let’s take a quick look at two of them: foremost and scalpel.

In the following example, I have a PDF file /var/tmp/2022 Tacoma(1).pdf that I “accidentally” delete. I then use foremost to try to extract any PDF file it finds on my disk.

ls -lash /var/tmp/*pdf
#> 4.4M -rwxrwxrwx 1 igor igor 4.4M Oct 19  2021 '/var/tmp/2022 Tacoma(1).pdf'

# "Oops..."
/bin/rm -f '/var/tmp/2022 Tacoma(1).pdf'

# Let's see where /var/tmp is mounted
df /var/tmp | grep ^/
#> /dev/sdb       263174212 23218592 226517464  10% /

# Here, I am telling foremost to look for PDF files on /dev/sdb
# and recover them to a subfolder in /mnt/cdrom called 
# "foremost_<date>" (-T option). This will be a quick scan (-q)
# with indirect block detection (-d) running in a verbose mode (-v)

foremost -t pdf -i /dev/sdb -o /mnt/cdrom/foremost -T -d -q -v

# This process may take a while, so grab some coffee.
# In the end you may end up with thousands of recovered files:

find /mnt/cdrom/foremost_* -type f -name "*pdf" | wc -l
#> 7318

# In this example, I know the approximate size of the original file
# and I know it may contain the word "Tacoma," so I can try to
# narrow down my search using the pdfgrep utility:

find /mnt/cdrom/foremost_* -type f -name "*pdf" -size +3M -size -6M | while IFS= read -r f
do
  echo "Checking ${f}"
  # pdfgrep -q prints nothing; the exit status is 0 when the string is found
  if pdfgrep -q Tacoma "${f}" 2>/dev/null
  then
    echo "String found in ^^^^^^"
  fi
done
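Since the same file often gets carved more than once, it may be worth dropping exact duplicates before running pdfgrep over thousands of candidates. Here is a rough sketch of one way to do that with checksums. The `recdir` folder and its three tiny files are stand-ins created just for the demonstration; on a real system `recdir` would be the foremost output folder.

```shell
#!/usr/bin/env bash
# Stand-in for the recovery folder with one exact duplicate in it
recdir=$(mktemp -d)
printf 'version A\n' > "${recdir}/00000001.pdf"
printf 'version A\n' > "${recdir}/00000002.pdf"   # exact duplicate of the first
printf 'version B\n' > "${recdir}/00000003.pdf"

# Hash every file, sort by hash, keep the first file per hash.
# uniq -w64 compares only the first 64 characters (the SHA-256 hex digest);
# the sed call strips the digest and separator, leaving just the file name.
unique=$(find "${recdir}" -type f -name "*pdf" -exec sha256sum {} + |
  sort | uniq -w64 | sed -E 's/^[0-9a-f]{64} [ *]//')

printf '%s\n' "${unique}"
rm -rf "${recdir}"
```

This only removes byte-for-byte duplicates. Partially overwritten or truncated copies of the same document will hash differently and stay in the list.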

The scalpel utility is actually based on foremost but has some enhancements: multithreading, asynchronous I/O, and regex support, among others. Here’s a quick overview based on the previous example:

# The first step is to uncomment the specific file types you're
# looking for in the scalpel config file. In my case, it is the pdf
sed -i -r 's/^#\s+(pdf\s+y\s+[0-9])/\1/' /etc/scalpel/scalpel.conf

# Now, let's create the output folder as scalpel lacks foremost's
# handy folder timestamping feature.
outdir="/mnt/cdrom/tmp/scalpel_$(date +'%s')" && mkdir -p "${outdir}"

# The rest is simple:
scalpel /dev/sdb -o "${outdir}"
# The nice feature in scalpel is the progress bar that shows the ETA

# You can now delete the files outside of the known size envelope
# (assuming you have some idea of how big the lost file was). And
# then you can use pdfgrep recursively on the entire folder:

find "${outdir}" -type f -name "*pdf" \( -size -4M -o -size +5M \) -delete
pdfgrep -r Tacoma "${outdir}" 2>/dev/null

# Another approach is to take advantage of your computer's multiple
# cores to speed up the search process:

find "${outdir}" -type f -name "*pdf" -size +3M -size -6M -print0 | \
xargs -0 -P"$(nproc)" -I% pdfgrep -H Tacoma % 2>/dev/null

# When you're done, don't forget to return the config file to its
# original state, lest you forget the changes you made the next
# time you use scalpel
sed -i -r 's/^(pdf\s+y\s+[0-9])/# \1/' /etc/scalpel/scalpel.conf
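If you’d rather not touch the system-wide config at all, one alternative is to enable the pdf rules in a private copy and pass that copy to scalpel with its -c option. The sketch below is just that, a sketch: `make_pdf_conf` is a hypothetical helper name, and it assumes the commented-out rules look like `#<whitespace>pdf<whitespace>y<whitespace><size>`, the same layout the sed commands above rely on.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a scalpel config and enable the pdf rules in the copy.
# Prints the path of the private copy; the original file is left untouched.
make_pdf_conf() {
  local src="$1" copy
  copy=$(mktemp) || return 1
  # Strip the leading "#" from commented-out pdf rules, keep the rest of the line
  sed -E 's/^#[[:space:]]+(pdf[[:space:]]+y[[:space:]]+[0-9])/\1/' "${src}" > "${copy}" || return 1
  printf '%s\n' "${copy}"
}

# Usage on a real system (paths as in the example above):
#   conf=$(make_pdf_conf /etc/scalpel/scalpel.conf)
#   scalpel -c "${conf}" /dev/sdb -o "${outdir}"
#   rm -f "${conf}"
```

With this approach there is nothing to revert afterwards, and a typo in the sed expression can only damage the throwaway copy.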

Other noteworthy data recovery tools are TestDisk, ddrescue, PhotoRec, SafeCopy, extundelete, ext4magic, and R-Undelete.

Igor

Experienced Unix/Linux System Administrator with 20-year background in Systems Analysis, Problem Resolution and Engineering Application Support in a large distributed Unix and Windows server environment. Strong problem determination skills. Good knowledge of networking, remote diagnostic techniques, firewalls and network security. Extensive experience with engineering application and database servers, high-availability systems, high-performance computing clusters, and process automation.
