De-duplicate downloaded images (mostly for fiction.live)

fiction.live stories often have duplicate images in the downloaded epub, because every uploaded image receives a new URL, so simple deduping by URL doesn't work. If an author repeatedly reuses images, this can significantly bloat the ebook.

Although they don't have the same URL, identical images will be byte-for-byte identical even after FanFicFare/Calibre image processing (resizing, grayscaling). Duplicates can therefore be detected with a fast cryptographic hash: only one copy is kept in the ebook, and all other references are rewritten to point to it.
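
A rough sketch of the idea in Python, in case it helps. This is not FanFicFare code: `dedupe_images` and the path-to-bytes dict are hypothetical names made up for illustration, and the 16-byte digest size is just an assumption. Only `hashlib.blake2b` is a real standard-library call.

```python
import hashlib


def dedupe_images(images):
    """Hypothetical helper, not part of FanFicFare.

    images: dict mapping an in-epub path to the already-processed image bytes.
    Returns (kept, rewrites):
      kept     - path -> bytes, one entry per distinct image
      rewrites - duplicate path -> path of the copy that was kept
    """
    first_seen = {}  # digest -> path of the first image with that content
    kept = {}
    rewrites = {}
    for path, data in images.items():
        # digest_size=16 is an assumed choice; BLAKE2b allows 1 to 64 bytes.
        digest = hashlib.blake2b(data, digest_size=16).digest()
        canonical = first_seen.setdefault(digest, path)
        if canonical == path:
            kept[path] = data
        else:
            rewrites[path] = canonical
    return kept, rewrites


# Made-up data: the first and third entries have identical bytes.
images = {
    "images/ffdl-0.jpg": b"same bytes",
    "images/ffdl-1.jpg": b"other bytes",
    "images/ffdl-2.jpg": b"same bytes",
}
kept, rewrites = dedupe_images(images)
```

Each `src` in the chapter HTML would then be looked up in `rewrites` and replaced if present, and only the entries in `kept` written into the epub.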

This isn't by any means critical to supporting F.L, but it's nice to have smaller ebooks.

(I have an only-partly-functional F.L downloader written in Rust and I do this to keep sizes down; I use the BLAKE2 hash since it is fast and its length is configurable. Thanks for FanFicFare, it's the only way I read fanfics now.)

JimmXinu/FanFicFare

Answer questions AlyoshaVasilieva

The size saved depends on the story. To test, I grabbed some stories from F.L using FFF and manually removed the duplicate images (note that I have a larger-than-default image_max_size):

| Story | Initial size | De-duplicated size |
| --- | --- | --- |
| Star Wars: The Chosen | 7.01MB | 6.02MB |
| Your Hero Academia | 7.4MB | No duplicates |
| A Shadow Resides | 27.9MB | 23.4MB |
| Fate/Grand Quest | 28.6MB | 25.1MB |
| Dictator Quest | 10.8MB | 10.6MB |
| EUN-1CE | 4.33MB | 4.11MB |

It's entirely possible that this is more effort than it's worth, especially since every other site I'm aware of either doesn't use images or expects users to provide their own hosting (leading to identical URLs).
