cosmic-bytes/doppel

Doppel

A high-performance, concurrent command-line tool written in Go that scans directories for visually similar duplicate images and organizes them automatically.

Unlike traditional duplicate finders that rely on exact byte matching or file hashes, Doppel uses perceptual hashing (via the images4 library) to find images that look the same, even if they have different resolutions, formats, or slight modifications.

Features

  • Visual Similarity Detection: Uses advanced perceptual hashing to identify visually identical or highly similar images.
  • Blazing Fast: Built for speed. A bounded worker pool decodes images in parallel, and the pairwise comparison step (O(N^2) comparisons for N images) is spread across all CPU cores.
  • Safe Organization: Never deletes your files. Duplicate images are safely moved to a doppelgangers/ subdirectory within the target folder.
  • Smart Retention: Groups duplicate images, keeps the first one in each group (in alphabetical order), and moves the rest.
  • Collision Handling: Automatically resolves filename collisions when moving duplicates.
  • Broad Format Support: Works seamlessly with .jpg, .jpeg, .png, .gif, .bmp, .tif, and .tiff files.
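
The collision handling mentioned above could look roughly like the sketch below. The README does not say which naming scheme Doppel uses, so the "_1", "_2" suffix rule and the helper name uniqueDest are assumptions made for illustration only. The existence check is passed in as a function so the rule can be tried without touching the disk.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// uniqueDest returns a path inside destDir for a file called name that does
// not collide with an existing entry. exists reports whether a path is taken.
// The numeric suffix scheme is an assumption, not Doppel's documented behaviour.
func uniqueDest(destDir, name string, exists func(string) bool) string {
	candidate := filepath.Join(destDir, name)
	if !exists(candidate) {
		return candidate
	}
	ext := filepath.Ext(name)
	base := strings.TrimSuffix(name, ext)
	// Try base_1.ext, base_2.ext, ... until a free name is found.
	for i := 1; ; i++ {
		candidate = filepath.Join(destDir, fmt.Sprintf("%s_%d%s", base, i, ext))
		if !exists(candidate) {
			return candidate
		}
	}
}

func main() {
	// Pretend two files already sit in the doppelgangers/ folder.
	taken := map[string]bool{
		filepath.Join("doppelgangers", "photo.jpg"):   true,
		filepath.Join("doppelgangers", "photo_1.jpg"): true,
	}
	exists := func(p string) bool { return taken[p] }
	fmt.Println(uniqueDest("doppelgangers", "photo.jpg", exists))
}
```

In a real move, exists would typically wrap os.Stat, and the check and the rename would need to happen without another process creating the same name in between.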

Installation

Ensure you have Go installed (version 1.24+ recommended).

Clone the repository and build the binary:

git clone <your-repo-url>
cd doppel
go build -o doppel main.go

To make the command globally available, you can install it to your Go bin directory:

go install

Usage

Run doppel by providing the target directory you wish to scan.

./doppel -dir /path/to/your/images

Flags

Flag | Description | Default
-dir | (Required) The absolute or relative path to the directory containing images to scan. | ""
-threshold | Maximum Hamming distance for visual similarity (kept for legacy compatibility). | 5
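
For readers curious how these two flags might be wired up, here is a minimal sketch using Go's standard flag package. The flag names and defaults come from the table above; the config struct, the parseFlags helper, and the choice of int for -threshold are assumptions, not Doppel's actual source.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// config holds the parsed command-line options (hypothetical type).
type config struct {
	dir       string
	threshold int
}

// parseFlags parses args (without the program name) and enforces that -dir is set.
func parseFlags(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("doppel", flag.ContinueOnError)
	fs.StringVar(&cfg.dir, "dir", "", "directory containing images to scan (required)")
	fs.IntVar(&cfg.threshold, "threshold", 5, "maximum Hamming distance (legacy)")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	if cfg.dir == "" {
		return cfg, errors.New("missing required flag: -dir")
	}
	return cfg, nil
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("scanning %s (threshold %d)\n", cfg.dir, cfg.threshold)
}
```

Using a dedicated FlagSet rather than the package-level flags keeps the parsing logic easy to call from tests.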

Example output

Found 142 images. Processing...
Duplicate group found (keeping photo_01.jpg):
  Moved -> photo_01_copy.jpg
  Moved -> img_9921.png
Duplicate group found (keeping vacation_sunset.jpg):
  Moved -> vacation_sunset_edited.jpg

How It Works

  1. Discovery: Scans the target directory recursively (ignoring the doppelgangers/ folder to prevent loops).
  2. Icon Generation: Spawns a pool of worker goroutines, one per CPU core, to decode each image and generate a compact mathematical representation of it (an icon).
  3. Parallel Comparison: Compares all generated icons against each other using an optimized, cyclic-distributed parallel algorithm to build an undirected graph of similarities.
  4. Graph Traversal: Uses Breadth-First Search (BFS) to find connected components within the graph, grouping all matching images together.
  5. Organization: Iterates over each duplicate group, keeps the first image alphabetically, and safely relocates the others to the doppelgangers/ folder.
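
Steps 3 to 5 can be sketched as follows. This is an illustration under stated assumptions, not Doppel's actual code: buildGraph and components are hypothetical names, the similar callback stands in for the real icon comparison, and "cyclic-distributed" is read here as handing row i of the comparison matrix to worker i modulo the worker count.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"sync"
)

// buildGraph compares every pair (i, j) with i < j among n items and returns
// an adjacency list. Row i is handled by worker i % workers; since early rows
// contain more pairs than late ones, this spreads the work fairly evenly.
func buildGraph(n, workers int, similar func(i, j int) bool) [][]int {
	if workers < 1 {
		workers = 1
	}
	adj := make([][]int, n)
	var mu sync.Mutex
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := w; i < n; i += workers {
				for j := i + 1; j < n; j++ {
					if similar(i, j) {
						mu.Lock() // adjacency lists are shared between workers
						adj[i] = append(adj[i], j)
						adj[j] = append(adj[j], i)
						mu.Unlock()
					}
				}
			}
		}(w)
	}
	wg.Wait()
	return adj
}

// components runs BFS from each unvisited node and returns every connected
// component with at least two members, each sorted in ascending index order.
func components(adj [][]int) [][]int {
	seen := make([]bool, len(adj))
	var groups [][]int
	for start := range adj {
		if seen[start] {
			continue
		}
		seen[start] = true
		queue := []int{start}
		var group []int
		for len(queue) > 0 {
			v := queue[0]
			queue = queue[1:]
			group = append(group, v)
			for _, u := range adj[v] {
				if !seen[u] {
					seen[u] = true
					queue = append(queue, u)
				}
			}
		}
		if len(group) > 1 {
			sort.Ints(group)
			groups = append(groups, group)
		}
	}
	return groups
}

func main() {
	// Paths are sorted alphabetically, so the lowest index in a group is the keeper.
	paths := []string{"a.jpg", "a_copy.jpg", "b.png", "c.gif", "c_small.gif"}
	// Hypothetical similarity test: a stand-in for comparing two icons.
	similar := func(i, j int) bool { return paths[i][0] == paths[j][0] }
	for _, g := range components(buildGraph(len(paths), runtime.NumCPU(), similar)) {
		fmt.Println("keep", paths[g[0]], "move", len(g)-1, "other(s)")
	}
}
```

One consequence of grouping by connected components is that membership is transitive: if A matches B and B matches C, all three land in one group even when A and C were not judged similar directly.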

Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page.

License

This project is open-source and available under the MIT License.
