Talk:PHASH
Calculating Perceptual hashes (PHASH) and comparing image similarity
Some thoughts: when I get a list of similar pictures, I create two bat files. The first renames the files so that each set of duplicates is sequential in one directory, with new names that include the original names and maybe other information. I use the file size so that the larger file comes first and the smaller one next; that way, while walking the directory in an image viewer, I can just delete the second file after verifying that it is similar. Afterwards I run the restore bat file to move the files back to their correct directories.

I am still playing around with the algorithm. Currently I am resizing the image to 34x34 and dropping the outside edge in an attempt to remove borders and lettering. I have found that once the bit difference exceeds 10 the files are usually different. If it is under 8 then there is a good chance it is a duplicate.
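To make the numbers above concrete, here is a minimal Python sketch of the border crop and the bit-difference check. It is not the code from the page: the function names, the representation of hashes as plain integers, and the "uncertain" label for differences from 8 to 10 are all assumptions made for illustration.

```python
def crop_border(pixels):
    """Drop the outermost row and column on every side (e.g. 34x34 -> 32x32).

    `pixels` is assumed to be a list of rows, each a list of pixel values.
    """
    return [row[1:-1] for row in pixels[1:-1]]


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hashes stored as integers."""
    return bin(hash_a ^ hash_b).count("1")


def classify(distance, duplicate_below=8, different_above=10):
    """Apply the rule-of-thumb thresholds from the comment above.

    The middle band (8 to 10 inclusive) is labeled "uncertain" here;
    that label is an assumption, the comment only gives the two outer cases.
    """
    if distance > different_above:
        return "different"
    if distance < duplicate_below:
        return "likely duplicate"
    return "uncertain"
```

For example, `classify(hamming_distance(h1, h2))` gives a first guess for a pair of hashes `h1` and `h2`, but the result is only a hint and the pair still needs checking by eye.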
Note that JPG is a lossy format, so be aware that EACH time you save the file you lose some quality. Also note that just because an image is larger does not mean that it is better. The other image may have been enlarged or adjusted, and you may find that the quality of the larger image is worse than that of the smaller one. The best way to select the best picture is to compare them yourself.
Unit tests
Do you have any unit tests or statements of expected results? Especially for things like the DCT Matrix. I found this code while looking for the pHash algorithm because I want to implement it in Nim. Your code is quite clear so I can fairly easily port it, I think. However it would be very handy to be able to verify that it actually performs the same as yours. Kwhitefoot (talk) 20:27, 4 December 2022 (CET)
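One check that does not depend on any particular implementation, offered only as a sketch: if the DCT matrix is the standard orthonormal DCT-II matrix (an assumption; the code on the page may scale it differently), then multiplying it by its transpose should give the identity matrix. The function names below are hypothetical and the example is in Python, but the same property can be tested in a Nim port.

```python
import math


def dct_matrix(n):
    """Build an n x n orthonormal DCT-II matrix (assumed scaling)."""
    rows = []
    for k in range(n):
        # Row 0 uses sqrt(1/n); all other rows use sqrt(2/n).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([
            scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
            for j in range(n)
        ])
    return rows


def max_orthonormality_error(c):
    """Return the largest deviation of C * C^T from the identity matrix."""
    n = len(c)
    worst = 0.0
    for a in range(n):
        for b in range(n):
            dot = sum(c[a][j] * c[b][j] for j in range(n))
            expected = 1.0 if a == b else 0.0
            worst = max(worst, abs(dot - expected))
    return worst
```

If a ported matrix uses this scaling, `max_orthonormality_error` should come out very close to zero, limited only by floating-point rounding. A matrix with a different scaling will fail this test without being wrong, so the check is a sanity test rather than a statement of expected results.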