De-duplicate downloaded images (mostly for fiction.live)
fiction.live stories often end up with duplicate images in the downloaded epub: every uploaded image always receives a new URL, so deduplicating by URL doesn't work. If an author reuses images repeatedly, this can significantly bloat the ebook.
Although they don't share a URL, copies of the same image remain byte-for-byte identical even after FanFicFare/Calibre image processing (resizing, grayscaling). Duplicates can therefore be detected with a fast cryptographic hash: keep only one copy in the ebook and rewrite every other reference to point to it.
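A minimal Python sketch of that approach. It assumes the book's images are available as a dict mapping filename to raw bytes, and that chapters reference them through double-quoted `src="..."` attributes. The function names, the `images/...` paths, and the 16-byte digest are hypothetical choices for illustration, not FanFicFare's actual code. `hashlib.blake2b` is Python's standard-library BLAKE2, and its `digest_size` parameter sets the hash length. On a hash match the bytes are also compared directly, so a hash collision can never merge two different images.

```python
import hashlib
import re


def dedupe_images(images, digest_size=16):
    """Keep the first copy of each distinct image.

    images: dict of filename -> raw bytes (insertion order = book order).
    Returns (kept, remap): kept holds the images to keep in the ebook,
    remap maps each duplicate filename to the filename of its kept copy.
    """
    seen = {}   # digest -> filename of the first image with that digest
    kept = {}
    remap = {}
    for name, data in images.items():
        digest = hashlib.blake2b(data, digest_size=digest_size).digest()
        # Confirm with a byte comparison so a collision can't merge images.
        if digest in seen and kept[seen[digest]] == data:
            remap[name] = seen[digest]
        else:
            seen.setdefault(digest, name)
            kept[name] = data
    return kept, remap


def rewrite_sources(html, remap):
    """Point double-quoted src attributes of duplicates at the kept copy."""
    def repl(m):
        return m.group(1) + remap.get(m.group(2), m.group(2)) + m.group(3)
    return re.sub(r'(src=")([^"]+)(")', repl, html)


# Hypothetical example: a.jpg and c.jpg are the same image under two names.
images = {
    "images/a.jpg": b"\xff\xd8cat",
    "images/b.jpg": b"\xff\xd8dog",
    "images/c.jpg": b"\xff\xd8cat",
}
kept, remap = dedupe_images(images)
html = '<img src="images/a.jpg"/><img src="images/c.jpg"/>'
print(sorted(kept))                    # ['images/a.jpg', 'images/b.jpg']
print(remap)                           # {'images/c.jpg': 'images/a.jpg'}
print(rewrite_sources(html, remap))    # <img src="images/a.jpg"/><img src="images/a.jpg"/>
```

The regex only keeps the sketch short; a real implementation would more likely rewrite references with an HTML parser so that other quoting styles are handled too.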
This isn't by any means critical to supporting F.L, but it's nice to have smaller ebooks.
(I have an only-partly-functional F.L downloader written in Rust and I do this to keep sizes down; I use the BLAKE2 hash since it is fast and its length is configurable. Thanks for FanFicFare, it's the only way I read fanfics now.)
Reply from AlyoshaVasilieva:
How much space this saves depends on the story. To test, I downloaded a few stories from F.L with FFF and manually removed the duplicate images (note that I use a larger-than-default `image_max_size`):
| Story | Initial size | De-duplicated size |
|---|---|---|
| Star Wars: The Chosen | 7.01 MB | 6.02 MB |
| Your Hero Academia | 7.4 MB | No duplicates |
| A Shadow Resides | 27.9 MB | 23.4 MB |
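For a sense of scale, the two stories that had duplicates can be converted to a percentage reduction. This only does arithmetic on the sizes in the table; the `percent_saved` helper is a hypothetical name.

```python
# Sizes in MB, taken from the table (stories that had duplicates only).
sizes = {
    "Star Wars: The Chosen": (7.01, 6.02),
    "A Shadow Resides": (27.9, 23.4),
}


def percent_saved(before, after):
    """Percentage reduction in file size."""
    return (before - after) / before * 100


for story, (before, after) in sizes.items():
    print(f"{story}: {before - after:.2f} MB saved ({percent_saved(before, after):.1f}%)")
# Star Wars: The Chosen: 0.99 MB saved (14.1%)
# A Shadow Resides: 4.50 MB saved (16.1%)
```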
It's entirely possible this is more effort than it's worth, especially since every other site I'm aware of either doesn't use images or expects users to provide their own hosting (which leads to identical URLs).