Searching for duplicate files is a compute- and I/O-intensive operation that requires
both a fast disk and a powerful CPU. Depending on the number of files to be
processed and the number of existing duplicates, a search operation may take from a couple
of minutes for a few hundred files to several hours when you need to scan
many thousands of files located on multiple disks or enterprise storage systems.
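To see why the operation stresses both the CPU and the disk, consider the typical approach a duplicate finder takes (this is a minimal illustrative sketch, not DiskBoss's actual algorithm): group files by size first, then checksum only the candidate groups, so every byte of every candidate file must be read from disk and hashed.

```python
# Minimal sketch of a typical duplicate-file search (illustrative only,
# not DiskBoss's implementation). Pass 1 groups files by size; pass 2
# hashes only files that share a size. Reading every candidate byte is
# the I/O cost; computing the checksums is the CPU cost.
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    # Pass 1: group file paths by size; a file with a unique size
    # cannot have a duplicate, so it is skipped entirely.
    by_size = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share a size with at least one other.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            by_hash[digest.hexdigest()].append(path)

    # Groups of two or more identical hashes are duplicate sets.
    return [group for group in by_hash.values() if len(group) > 1]
```

The size-grouping pass explains why small-file workloads are the hard case: with many small files, per-file overhead (directory traversal, open/close, hash setup) dominates over raw read throughput.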
The main purpose of this performance review is to provide our customers with an estimate of the performance
and expected scalability of DiskBoss's built-in duplicate files finder on different hardware
configurations and data sets. In addition, we have compared our software to the latest versions of two other
popular duplicate file finders: NoClone and Duplicate Files Detective 2 (DFD).
All performance tests were performed using DiskBoss v1.4.20 on a PC equipped with a dual-core 2.4 GHz
Intel E6600 CPU and 2 GB of system memory running Windows XP Professional (32-bit). In order to analyze
performance on different types of files, we prepared the following three data sets:
File Set #1 - 15GB, 5,000 medium-sized files with 10% duplicates
File Set #2 - 3GB, 55,000 small files with 10% duplicates
File Set #3 - 32GB, 120,000 files of various sizes with 30% duplicates
In order to analyze duplicate file search performance on different hardware architectures, we replicated
all three data sets to the following storage devices:
Storage Device #1 - 150GB, Western Digital Raptor
Storage Device #2 - 2x150GB, Western Digital Raptor in RAID0 configuration
Storage Device #3 - 2TB NAS Storage connected through Gigabit Ethernet
Storage Device #4 - 500GB, Western Digital USB Disk
Each software tool was executed once for each data set on each hardware configuration, with a system reboot before
each benchmark - resulting in 12 benchmarks per tool and 36 benchmarks in total. The individual benchmark results from all four
hardware configurations were averaged, and three graphs were prepared representing the average
duplicate file search performance for small files, medium-sized files and mixed files.
As the first graph shows, all three tools deliver very similar performance while processing a small
number of medium-sized (2MB-5MB) files. From a performance point of view, this is the best-case scenario: if you need
to process a few hundred files, any of these tools will do the job.
The second data set contains 55,000 small (1KB-200KB) files, mostly Word documents and images. Here the situation changes
dramatically, and the performance of NoClone and DFD drops significantly. During the performance tests we
identified a common pattern shared by NoClone, DFD and other tools not covered in this review: every tool
we tested starts the duplicate file search operation very fast, but becomes progressively slower
as more files are processed.
From the beginning, DiskBoss's built-in duplicate files finder was designed as a scalable solution capable of processing
millions of files at a sustained speed. In addition, it is capable of parallelizing
all processing operations and effectively utilizing modern hardware architectures, including multi-core CPUs,
disk RAID arrays and Gigabit Ethernet networks.
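DiskBoss's internal implementation is not public, so the following is only a rough sketch of the general idea of parallelizing the work: distributing per-file checksum computation across a pool of workers so that multiple CPU cores stay busy and disk read requests can be queued concurrently. The function names and pool size here are illustrative assumptions.

```python
# Illustrative sketch only -- not DiskBoss's actual code. Per-file
# hashing is farmed out to a worker pool, so checksum computation runs
# on several cores while reads from the disk (or RAID / network share)
# can overlap instead of proceeding strictly one file at a time.
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def hash_file(path):
    # Hash one file in 1 MB chunks; returns (path, hex digest).
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return path, digest.hexdigest()

def parallel_duplicates(paths, workers=4):
    # Map file hashing across the pool, then group identical digests.
    by_hash = defaultdict(list)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path, checksum in pool.map(hash_file, paths):
            by_hash[checksum].append(path)
    return [group for group in by_hash.values() if len(group) > 1]
```

With a single slow device the disk remains the bottleneck, but on RAID arrays or Gigabit NAS storage, keeping several requests in flight is what lets a scanner approach the hardware's full throughput.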
The last data set confirms the identified pattern - DiskBoss's built-in duplicate files finder processes large
data sets significantly faster and more efficiently.
* This performance review has been prepared for informational purposes only, and we strongly advise
you to perform your own performance evaluations using your specific hardware components and data sets.