Find Duplicate Files
Duplicate files waste disk space. Fortunately, there are tools that automate the search for them.
Fdupes
Install fdupes:
julio@acer ~> sudo pacman -S fdupes
Run fdupes in recursive mode (-r) and redirect the output to a file:
julio@acer ~/Documents/Ebooks> fdupes -r . > dupes1.txt
On my computer, this command took only 7 minutes to analyze 23500 files. The output file,
dupes1.txt, had 5714 lines!
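fdupes prints each set of identical files on consecutive lines and separates the sets with a blank line, so dupes1.txt looks something like this (the file names here are hypothetical):
./python/examples/ch03/hello.py
./backup/python/examples/ch03/hello.py

./algorithms/sorting.pdf
./old/algorithms/sorting.pdf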
The -f (--omitfirst) option omits the first file of each set, so the output lists only the redundant copies:
julio@acer ~/Documents/Ebooks> fdupes -rf . > dupes2.txt
Again it took about 7 minutes to analyze the 23500 files; this time the output, dupes2.txt, had 3878 lines.
Since fdupes still separates the sets with blank lines, I removed them with sed -i '/^$/d' dupes2.txt, leaving the file with 2054 lines.
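With the first file of each set omitted by -f and the blank lines gone, every remaining line of dupes2.txt is a redundant copy (again with hypothetical names):
./backup/python/examples/ch03/hello.py
./old/algorithms/sorting.pdf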
Many of the files it flagged as duplicates were intentionally identical. Example code from programming books is often repeated across chapters, and some version control files (git, svn, etc.) were reported as duplicates but must not be deleted.
If you want to reduce disk usage but avoid breaking anything, you can instead write a script that replaces every duplicate with a hard link to the original, as sketched below.
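Here is a minimal sketch of such a script. It assumes the blank-line-separated sets in dupes1.txt (the listing generated without -f, so each set still begins with a file to use as the link target), and that everything lives on a single filesystem, since hard links cannot cross filesystem boundaries:

#!/bin/bash
# Replace duplicates with hard links to the first file of each set.
# Run from the directory where dupes1.txt was generated (its paths are relative).
first=""
while IFS= read -r f; do
    if [ -z "$f" ]; then
        first=""                 # blank line: the next line starts a new set
    elif [ -z "$first" ]; then
        first="$f"               # first file of the set: keep it as the target
    else
        ln -f "$first" "$f"      # replace the duplicate with a hard link
    fi
done < dupes1.txt

Keep in mind that hard-linked files share their content: editing one copy changes all of them. The jdupes fork of fdupes can also create these links directly via its -L option.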
Delete all duplicate files (be careful, this is irreversible):
julio@acer ~/Documents/Ebooks> while IFS= read -r f; do rm "$f"; done < dupes2.txt
The IFS= and -r keep read from mangling file names that begin with spaces or contain backslashes.
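fdupes can also handle the deletion itself: -d prompts for which files in each set to preserve, and adding -N keeps the first file of each set and deletes the rest without prompting:
julio@acer ~/Documents/Ebooks> fdupes -rdN .
This skips the intermediate text files entirely, but it is just as destructive, so try it on a copy of the directory first.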
Gemini
On macOS, a good paid alternative is Gemini, which lists all duplicates in a user-friendly interface and lets you preview them before sending them to the trash.