Find Duplicate Files
Duplicate files take up unnecessary space on the disk. Fortunately, there are tools that automate the search for duplicates.
Fdupes
Install fdupes:
julio@acer ~> sudo pacman -S fdupes
Run fdupes in recursive mode (-r) and redirect the output to a file:
julio@acer ~/Documents/Ebooks> fdupes -r . > dupes1.txt
On my computer, this command took only 7 minutes to analyze 23500 files. The output file, dupes1.txt, had 5714 lines!
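For reference, fdupes prints each set of identical files as a group, one path per line, with a blank line between sets. A hypothetical excerpt of dupes1.txt (the paths are made up) might look like this:

./python-book/examples/ch01/hello.py
./backup/python-book/examples/ch01/hello.py

./notes/todo.txt
./notes/todo (copy).txt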
To list only the redundant copies, run fdupes again with the -f flag, which omits the first file in each set of matches:
julio@acer ~/Documents/Ebooks> fdupes -rf . > dupes2.txt
Again it took about 7 minutes to analyze the 23500 files; this time the output file, dupes2.txt, had 3878 lines.
After removing the blank lines with sed -i '/^$/d' dupes2.txt, the file was down to 2054 lines.
Many of the files it flagged as duplicates were intentionally identical: code examples from programming books are often repeated, and some version control files (git, svn, etc.) were reported as duplicates but should not be deleted.
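One way to keep those out of the list is to filter the output before acting on it. A minimal sketch, assuming the repository files all sit under a .git or .svn directory component (dupes3.txt is a hypothetical filtered copy):

julio@acer ~/Documents/Ebooks> grep -vE '/\.(git|svn)/' dupes2.txt > dupes3.txt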
If you want to reduce disk space usage but avoid breaking anything, you can create a script that replaces all duplicate files with hard links.
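A minimal bash sketch of such a script, assuming dupes1.txt holds the unfiltered fdupes -r output from the run above (sets of identical files separated by blank lines); everything beyond that filename is an assumption, so test on a copy of your data first:

#!/usr/bin/env bash
# Replace every duplicate with a hard link to the first file in its set.
# Assumes dupes1.txt is plain `fdupes -r` output: one path per line,
# with a blank line separating the sets.
first=""
while IFS= read -r f; do
    if [ -z "$f" ]; then
        first=""                  # blank line: the next set begins
    elif [ -z "$first" ]; then
        first="$f"                # keep the first file of the set
    else
        ln -f -- "$first" "$f"    # replace this copy with a hard link
    fi
done < dupes1.txt

Note that hard links only work within a single filesystem, and all linked names share the same data, so editing one "copy" now edits them all.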
Delete all duplicate files (be careful with this script):
julio@acer ~/Documents/Ebooks> while IFS= read -r f; do rm -- "$f"; done < dupes2.txt
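If you would rather not pipe file names through a shell loop at all, fdupes also has an interactive delete mode (-d) that prompts you, set by set, for which files to preserve:

julio@acer ~/Documents/Ebooks> fdupes -rd .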
Gemini
A good paid alternative for Mac is Gemini, which lists all duplicates in a user-friendly interface and allows you to preview them before sending them to the trash.