The Scanning Process

After selecting the folders to scan and a test profile to apply to the images, start the scan with the Start button in the command panel.

Once the scan begins, every control in the command panel is disabled except the Stop button, which stops the scan.  If you stop a scan, you may process the files already loaded, but you cannot resume the scan.  Clicking Start again begins a fresh scan.

While the scan is active, the status bar displays its progress: the number of files loaded, the number of images processed, the number of duplicate groups identified, and the time the scan has been running.

The image files are scanned first, so the count of files loaded rises first and fastest.  As each file is loaded, it is passed to the processor to be tested.  Because images are processed while files are still being scanned, the image group panel begins populating with results during the scan.
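The overlap between loading and processing can be pictured as a simple producer/consumer pipeline. The sketch below is only an illustration of that idea and is not AntiDitto's actual code; the names run_scan, load, and process are hypothetical.

```python
import queue
import threading

def run_scan(paths, load, process):
    """Load files on one thread while a second thread processes them."""
    pending = queue.Queue()
    results = []
    done = object()  # sentinel that marks the end of the file list

    def loader():
        # Producer: hand each loaded file to the processor as soon as it is ready.
        for path in paths:
            pending.put(load(path))
        pending.put(done)

    def processor():
        # Consumer: results accumulate while the loader is still running.
        while True:
            item = pending.get()
            if item is done:
                break
            results.append(process(item))

    threads = [threading.Thread(target=loader), threading.Thread(target=processor)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With one loader and one processor, results come out in the order the files were loaded, which is why results can appear before loading has finished.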

How long does it take?

The performance of the scan will vary widely from one computer to another and from one image set to another.  Some things to consider:

  • AntiDitto performs massive hard drive reads at the beginning of its scan, in order to get all the files into memory for testing.
  • After the files have been loaded, the hard drive activity will drop off and the CPU usage will increase as the images are compared.
  • Memory usage peaks right after the file load is complete.  As images are compared, duplicate images are released from memory.
  • The final time and final memory usage of AntiDitto depend heavily on the number of unique images.  In the worst case, 100 unique images require reading 100 images into memory and comparing each image against every other, leaving 100 images still in memory.  In the best case, 100 duplicates of a single image are loaded into memory, 1 image group is created, and the other 99 images are only ever checked against that one group.
  • As these examples suggest, the scan will appear to slow to a crawl towards the end, because by then there are many unique groups that each image has to be compared against.  For this reason, a “time remaining” feature was not added, since it could not be accurate.
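The worst and best cases above can be made concrete by counting comparisons. The sketch below is a simplified model, not AntiDitto's implementation: it assumes each new image is compared against one representative per existing group, and that a unique image forms a group of its own. The function count_comparisons and the same callback are hypothetical names.

```python
def count_comparisons(images, same):
    """Group images and count how many pairwise tests were needed."""
    groups = []       # each group is a list; its first element is the representative
    comparisons = 0
    for image in images:
        for group in groups:
            comparisons += 1
            if same(group[0], image):
                group.append(image)  # duplicate found, stop searching
                break
        else:
            # No existing group matched, so this image starts a new group.
            groups.append([image])
    return comparisons, len(groups)

# Worst case: 100 unique images.
worst = count_comparisons(range(100), lambda a, b: a == b)  # (4950, 100)

# Best case: 100 copies of the same image.
best = count_comparisons([0] * 100, lambda a, b: a == b)    # (99, 1)
```

Under this model the number of comparisons grows roughly with the square of the number of unique images, which is consistent with the scan slowing down as more groups accumulate.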

Anecdotal Measurements

These results were captured during testing.  Your results will likely vary.  The test computer was an AMD Ryzen 5 1600 with 16 GB of RAM running 64-bit Windows 10.

  • 11k files, 1700 duplicate groups – 13 mins
  • 31k files, 2400 duplicate groups – 1 hr, 15 mins
  • 20k files, 3400 duplicate groups – 25 mins
  • 108k files, 1820 duplicate groups – 9 hrs, 42 mins
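To compare these runs on a common scale, the figures can be converted to approximate files per minute. This is plain arithmetic on the rounded numbers listed above, so the rates are rough; the helper name files_per_minute is only for illustration.

```python
# (files, minutes) taken from the measurements listed above
runs = [
    (11_000, 13),
    (31_000, 60 + 15),
    (20_000, 25),
    (108_000, 9 * 60 + 42),
]

def files_per_minute(files, minutes):
    """Average throughput over the whole scan."""
    return files / minutes

rates = [round(files_per_minute(f, m)) for f, m in runs]
print(rates)  # [846, 413, 800, 186]
```

The throughput is far from constant across runs, which fits the earlier point that scan time depends on the image set and not only on the number of files.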