Originally published January 7, 2018 @ 10:46 pm
I am not talking about hundreds or thousands of files. I am talking about hundreds of thousands. The usual “/bin/rm -rf *” may work but will take a while. Or it may fail with the “argument list too long” error. So here are a few examples showing you how to delete many files in a hurry.
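The “argument list too long” error comes from the kernel's limit on the combined size of arguments and environment handed to a new process, not from rm itself. The following is a rough sketch of how one might check in advance whether a directory's names would fit on one command line. The helper name fits_in_arg_max and the 50% headroom are assumptions made for illustration, and the estimate is deliberately conservative rather than exact.

```shell
#!/bin/sh
# Rough check: would the names in a directory fit on one command line?
# fits_in_arg_max is a hypothetical helper name, not a standard tool.
fits_in_arg_max() {
    target=$1
    limit=$(getconf ARG_MAX)          # bytes allowed for arguments + environment
    # Bytes needed for the names: ls prints one name per line when piped,
    # and each newline stands in for the NUL that terminates an argument.
    needed=$(ls -A "$target" | wc -c)
    env_size=$(env | wc -c)           # the environment shares the same budget
    # Keep half the limit as headroom (an arbitrary safety margin).
    [ $((needed + env_size)) -lt $((limit / 2)) ]
}

# Demo on a scratch directory with made-up file names.
demo=$(mktemp -d)
touch "$demo/file_1" "$demo/file_2" "$demo/file_3"
if fits_in_arg_max "$demo"; then
    echo "plain rm should work"
else
    echo "use find, xargs or rsync instead"
fi
rm -rf "$demo"
```

If the check fails, the methods below avoid the problem because none of them expands a glob into a single argument list.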
First, we need something to work with, like maybe a million empty files dumped into a single folder:
dir=/home/tmp ; mkdir -p ${dir} ; cd ${dir}
for i in `seq 1 1000000` ; do printf "file_${i}\n" ; done | xargs touch
Method 1: find + xargs (40 seconds)
time find ${dir} -type f | xargs -L 100 -P 100 /bin/rm -f
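The pipeline above passes names to xargs separated by whitespace, which is fine for the generated file_N names but would break on names containing spaces or quotes. Here is a minimal sketch of a null-delimited variant, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do). The scratch directory, the file names, and the batch and parallelism values are made up for the demo.

```shell
#!/bin/sh
# Null-delimited variant of the find + xargs pipeline.
# The scratch directory and file names below are made up for the demo.
work=$(mktemp -d)
touch "$work/plain" "$work/with space" "$work/with 'quote'"

# -print0 / -0 separate names with NUL bytes, so whitespace and quotes are safe.
# -n 1000 hands up to 1000 names to each rm; -P 4 runs up to four rm at once.
find "$work" -type f -print0 | xargs -0 -n 1000 -P 4 rm -f

remaining=$(find "$work" -type f | wc -l)
echo "files left: $remaining"
rmdir "$work"
```

Larger batches mean fewer rm processes to start, so the values for -n and -P are worth tuning against the storage being cleaned.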
Method 2: find + delete (51 seconds)
time find ${dir} -type f -delete
Method 3: rsync (22 seconds)
mkdir /tmp/empty ; time rsync -a --delete-before /tmp/empty/ ${dir}/
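The rsync method works because syncing an empty source directory onto the target with a delete option removes everything in the target that is absent from the source, which is everything. Below is a hedged sketch that wraps the same command in a function with a couple of safety checks. The name empty_with_rsync is hypothetical, and the temporary empty directory from mktemp replaces the fixed /tmp/empty path used above.

```shell
#!/bin/sh
# Sketch of the empty-directory rsync trick wrapped in a function.
# empty_with_rsync is a hypothetical helper name.
empty_with_rsync() {
    target=$1
    # Refuse anything that is not an existing directory.
    [ -d "$target" ] || { echo "not a directory: $target" >&2; return 1; }
    empty=$(mktemp -d) || return 1
    # Trailing slashes matter: they sync the contents of the empty directory
    # onto the contents of the target, so every entry in the target is deleted.
    rsync -a --delete-before "$empty/" "$target/"
    status=$?
    rmdir "$empty"
    return $status
}

# Demo on a scratch directory with made-up file names.
demo=$(mktemp -d)
touch "$demo/file_1" "$demo/file_2"
mkdir "$demo/sub" && touch "$demo/sub/file_3"
empty_with_rsync "$demo"
echo "entries left: $(find "$demo" -mindepth 1 | wc -l)"
rm -rf "$demo"
```

Note that, unlike the find commands above, this removes subdirectories as well as files, and it leaves the target directory itself in place.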
Method 4: Perl (23 seconds)
cd ${dir} ; time perl -e 'for(<*>){((stat)[9]<(unlink))}'
Method 5: rm (19 seconds)
time /bin/rm -rf ${dir}
Moral of the story: if the number of files doesn’t exceed rm’s argument size limit, don’t try to be clever and just use it. Otherwise, rsync seems a viable alternative.
A different approach should be taken when deleting files from NFS mounts. Each delete operation generates a lot of network overhead. The answer is to parallelize the deletion process. For example, imagine your NFS-mounted directory looks like so:
/mnt/nas01/bigshare/bigfolder_{01..10}
Now, also imagine that each bigfolder_## contains a large number of files. You can do something like this:
mkdir /tmp/empty
find /mnt/nas01/bigshare/ -maxdepth 1 -mindepth 1 -type d | while read line; do rsync -a --delete-before /tmp/empty/ ${line}/ & done
This would start ten rsync processes in parallel. This will work faster than a single process, but performance can be improved further by mounting each bigfolder_## individually (if the NAS allows you this option):
for i in `seq -w 01 10`; do mount nas01:/bigshare/bigfolder_${i} /mnt/nas01/bigshare/bigfolder_${i}; done
df -hP | grep bigfolder
nas01:/bigshare/bigfolder_01 763T 351T 405T 47% /mnt/nas01/bigshare/bigfolder_01
nas01:/bigshare/bigfolder_02 763T 351T 405T 47% /mnt/nas01/bigshare/bigfolder_02
nas01:/bigshare/bigfolder_03 763T 351T 405T 47% /mnt/nas01/bigshare/bigfolder_03
nas01:/bigshare/bigfolder_04 763T 351T 405T 47% /mnt/nas01/bigshare/bigfolder_04
nas01:/bigshare/bigfolder_05 763T 351T 405T 47% /mnt/nas01/bigshare/bigfolder_05
nas01:/bigshare/bigfolder_06 763T 351T 405T 47% /mnt/nas01/bigshare/bigfolder_06
nas01:/bigshare/bigfolder_07 763T 351T 405T 47% /mnt/nas01/bigshare/bigfolder_07
nas01:/bigshare/bigfolder_08 763T 351T 405T 47% /mnt/nas01/bigshare/bigfolder_08
nas01:/bigshare/bigfolder_09 763T 351T 405T 47% /mnt/nas01/bigshare/bigfolder_09
nas01:/bigshare/bigfolder_10 763T 351T 405T 47% /mnt/nas01/bigshare/bigfolder_10
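Whether the folders are mounted individually or not, a loop of backgrounded rsync jobs returns immediately and starts one process per folder no matter how many folders there are. As an alternative sketch (not the method shown above), xargs -P can cap the number of concurrent rsync processes and only returns once all of them have finished. The scratch tree below stands in for /mnt/nas01/bigshare, and the folder names and the value of jobs are made up for the demo.

```shell
#!/bin/sh
# Sketch: empty several directories in parallel, at most $jobs at a time.
# The scratch tree stands in for /mnt/nas01/bigshare; all names are made up.
share=$(mktemp -d)
empty=$(mktemp -d)
jobs=4

for d in 01 02 03; do
    mkdir "$share/bigfolder_$d"
    touch "$share/bigfolder_$d/a" "$share/bigfolder_$d/b"
done

# One rsync per subdirectory. xargs -P caps concurrency and, unlike a loop of
# backgrounded jobs, does not return until every rsync has exited.
find "$share" -mindepth 1 -maxdepth 1 -type d -print0 |
    xargs -0 -P "$jobs" -I{} rsync -a --delete-before "$empty/" "{}/"

left=$(find "$share" -type f | wc -l)
echo "files left: $left"
rm -rf "$share" "$empty"
```

A sensible value for jobs depends on what the NAS and the network can sustain, so it is best found by experiment.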