If you are wondering where the data of this site comes from, please visit https://api.github.com/users/fcorbelli/events. GitMemory does not store any data, but only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.
Franco Corbelli fcorbelli Italy www.francocorbelli.it Seasoned Delphi/C++ developer; physical, virtual and cloud storage manager. Highly paranoid about backups. Mainly UNIX servers, sometimes Linux ones

fcorbelli/zpaqfranz 12

Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix

fcorbelli/ugo 1

Hardware-accelerated SHA-1 hasher

fcorbelli/sblake 0

BLAKE3 (C) for FreeBSD

fcorbelli/sha1collisiondetection 0

Marc Stevens's sha1collisiondetection, for use by git.git as its submodule

fcorbelli/unzpaq 0

Reference decompressor for ZPAQ archive

fcorbelli/zpaq 0

ZPAQ's complete code history mirror

fcorbelli/zsfx 0

zpaqfranz/zpaq SFX module for Windows 32 and 64 bit

issue comment fcorbelli/zpaqfranz

cannot read all files from deduplicated volume under Windows

I am unable to mount the VHDX: it says "RAW"

Can you please do this, reporting the output file?

zpaqfranz sum H:\vmware\test10\* -xxh3 -verbose -debug -summary -all >1.txt

Thank you

mirogeorg

comment created time in 25 minutes

issue comment fcorbelli/zpaqfranz

Slow dedup speed

Using decent Xeon machines with about 1GB/s of real bandwidth I get ~500GB/hour for .vmdk updating, so ~10TB per day, or ~110MB/s sustained.

To get more speed I run more than one update in parallel (NVMe drive on ZFS, so no latency problems with concurrent access) for different virtual machines, each capped with -t2 (no more than 2 threads), so I can run 2 to 8 updates (depending on the hardware).

As stated, I will embed some (optional) profiling to see where the software consumes its time.

mirogeorg

comment created time in a day

issue comment fcorbelli/zpaqfranz

Slow dedup speed

Please check whether the "slow adding" is due to... reading (watch Task Manager). If zpaq/zpaqfranz, during the add of something big (.vmdk etc.), reads constantly at (for example) 400MB/s, then it is mainly a media-bandwidth limitation (no cache here). If it reads at 20MB/s (just an example), something weird is going on.

Adding requires re-reading everything from the filesystem, hashing, then "doing the rest". With vmdks, for example, it is rather normal to get about an hour of apparently nothing (read... read... read...) and maybe 5 minutes of writing to the archive.

I'll add a timer for the dedup stage, with something like "starting dedup stage"... "ended dedup in 2000 s, let's do something..."

mirogeorg

comment created time in a day

issue comment fcorbelli/zpaqfranz

cannot read all files from deduplicated volume under Windows

It will help a lot (an example, I mean). Microsoft seems to love making easy things very hard.

mirogeorg

comment created time in a day

issue comment fcorbelli/zpaqfranz

Slow dedup speed

About t (test): there are two stages. In the first (as in 7.15) the check is done on the stored SHA-1. In the second (zpaqfranz-specific) a much faster CRC-32 runs to detect SHA-1 collisions. If you test against a directory you will get at most the speed of SHA-1 (in fact you are re-reading variable-sized chunks from disk, calculating SHA-1, and comparing with the stored hashes). This is in fact fast, very fast, for slow media.

For much faster hashes (e.g. xxhash64 or XXH3) the v (verify) command runs much faster, but it is a check against the filesystem and not an archive integrity check (you need something, the original files, online)

zpaqfranz a z:\1.zpaq c:\dropbox\dropbox

will create z:\1.zpaq with xxhash64 (zpaqfranz's default)

running

zpaqfranz v z:\1.zpaq

a single-threaded xxhash64 verify against the filesystem will run

Note: if you are paranoid you can do

zpaqfranz a z:\1.zpaq c:\dropbox\dropbox -sha3

or -sha2, or blake3, or whatever

OK, then

zpaqfranz t z:\1.zpaq

will do an integrity check of the archived files (as said, two stages: the first as in 7.15, the second for collisions) but

zpaqfranz t z:\1.zpaq c:\dropbox\dropbox

will invoke the SHA-1 chunked verify against the filesystem (something similar to 7.15)

If you are really paranoid

zpaqfranz p z:\1.zpaq

and more

zpaqfranz p z:\1.zpaq -verify
mirogeorg

comment created time in 2 days

issue comment fcorbelli/zpaqfranz

Slow dedup speed

I have just the same configuration, and as you can see SHA-1 is running at about 900MB/s. zpaq deduplicates in a single thread (I am working on making it multithreaded, but it is not so easy): it reads the whole file, 4K at a time, then calculates SHA-1 and does a lot of other things. The problem is that you need to read the entire file from the media. If the file is huge (e.g. a vmdk) you will get a maximum speed of about 900MB/s (the SHA-1 speed) in the deduplication stage. If you use a spinning drive (or SSD) you will have no bottleneck in this case (about 150MB/s from disk, 500MB/s from SSD)

If the files are small (say thousands of .DOC) then multithreading can be a benefit. You can check yourself; try

zpaqfranz sum d:\something -sha1 -summary

and

zpaqfranz sum d:\something -sha1 -summary -all

So yes, a multithreaded deduplicator will do better with small files, a fast NVMe and many CPUs, but, in fact, not much better (in total time).

To speed things up a lot of work is needed, not just a faster deduplicator

mirogeorg

comment created time in 2 days

issue comment fcorbelli/zpaqfranz

cannot read all files from deduplicated volume under Windows

zpaqfranz.zip

Please try this one. It now works this way:

	if (t=="." || t=="..")
		edate=0;  // don't add, of course

	if ((ffd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) && (ffd.dwFileAttributes & FILE_ATTRIBUTE_SPARSE_FILE))
	{
		/// Houston, we have a strange deduplicated .vhdx file?
		/// add as by default
		if (flagverbose)
			printf("Verbose: found something strange (VHDX?) %s\n",t.c_str());
	}
	else
	{
		///	A junction?
		if (ffd.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT)
			edate=0;  // don't add
	}

Remember: use -verbose and -debug to see what is running behind the scenes

mirogeorg

comment created time in 2 days

issue comment fcorbelli/zpaqfranz

Missing option to run BAT after making VSS snapshot.

Yes, you are right. I will implement it.

mirogeorg

comment created time in 2 days

issue comment fcorbelli/zpaqfranz

Slow dedup speed

The dedup algorithm is very fast and very well implemented by Mr. Mahoney. Using a faster hash (for example BLAKE3 with hardware acceleration, XXH3, or even SHA-1 with hardware acceleration on AMD Ryzen) gives only limited benefits (~10%), not worth the broken compatibility.

The deduplication stage in fact takes little of the overall time. For big files the major problem is bandwidth: reading back 10TB will take a long time (even 400GB will). zpaqfranz can use multithreaded hashing with the sum command (the hasher) and the -all switch, which is way faster on solid-state drives (on my PC up to 17GB/s).

But this cannot be done while reading a single file. Or, better: I have to think about it.

Check your (maximum) SHA-1 speed with the b (benchmark) command and -sha1

zpaqfranz b -sha1

On my PC it is more than 900MB/s, faster than SSD bandwidth (though not NVMe)

mirogeorg

comment created time in 2 days

issue comment fcorbelli/zpaqfranz

cannot read all files from deduplicated volume under Windows

OK, the problem is the FILE_ATTRIBUTE_REPARSE_POINT attribute, which is skipped to ignore junctions and symbolic links. Due to some strange "feature" those files are marked as FILE_ATTRIBUTE_REPARSE_POINT and, therefore, skipped. I can easily add a switch like -forceall or whatever, BUT it would go crazy on junctions. I am now checking Windows' documentation to see whether symbolic links can apply to files, or only to folders.

Stay tuned, work in progress... :)

mirogeorg

comment created time in 2 days

issue comment fcorbelli/zpaqfranz

cannot read all files from ReFS

  1. made a .vhdx
  2. init as GPT
  3. format as ReFS
  4. mount as E:
  5. copied some data into it
  6. install dedup role
  7. enable dedup on E: as vdi-something
  8. start dedup via PowerShell
  9. PS C:\Users\Administrator> Get-DedupStatus

FreeSpace  SavedSpace  OptimizedFiles  InPolicyFiles  Volume
---------  ----------  --------------  -------------  ------
8.24 GB    1.12 GB     1480            1480           E:

 10. The .vhdx is backuppable

Note: NO .avhdx here

Can you please explain more?

mirogeorg

comment created time in 4 days

issue comment fcorbelli/zpaqfranz

cannot read all files from ReFS

Thank you for this report. I do not have any ReFS machine; typically I run BSD servers and ESXi (no Microsoft Hyper-V). I will try over the weekend to find some Windows server.

mirogeorg

comment created time in 5 days

release fcorbelli/zpaqfranz

54.6

released time in 5 days

created tag fcorbelli/zpaqfranz

tag 54.6

Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix

created time in 5 days

push event fcorbelli/zpaqfranz

Franco Corbelli

commit sha 3f0f242cb8f1b612fdb207003a65cd73ecdad615

Update CHANGELOG.md

view details

push time in 5 days

push event fcorbelli/zpaqfranz

Franco Corbelli

commit sha c00d462c1d42e4a7ee714f45feb5815be8e03271

Add files via upload

view details

push time in 5 days

push event fcorbelli/zpaqfranz

Franco Corbelli

commit sha 7621447e75a118732e6c5631a9b4f351c7dedeef

Add files via upload

view details

push time in 6 days

push event fcorbelli/zpaqfranz

Franco Corbelli

commit sha b03cfbae3ffb7ebe5dc23b289b8f1861e2a970c8

Add files via upload

view details

push time in 6 days

create branch fcorbelli/zpaqfranz

branch : macos

created branch time in 6 days

release fcorbelli/ugo

2.1

ugo.exe 0.20MB

released time in 10 days

push event fcorbelli/ugo

Franco Corbelli

commit sha 01d2a519c6528c76887b826a1f65c88c71962eb9

Add files via upload

view details

push time in 10 days

created tag fcorbelli/ugo

tag 2.1

Hardware-accelerated SHA-1 hasher

created time in 10 days


push event fcorbelli/ugo

Franco Corbelli

commit sha a591d9e3a840705a20eb05cee3c885cc89843ed6

Add files via upload

view details

push time in 10 days

create branch fcorbelli/ugo

branch : main

created branch time in 10 days

created repository fcorbelli/ugo

Hardware-accelerated SHA-1 hasher

created time in 10 days

push event fcorbelli/zpaqfranz

Franco Corbelli

commit sha f5061d59e72eaa8e374e8acd1e4ccae6f45d7a85

Update README.md

view details

push time in 14 days

release fcorbelli/zsfx

52.15

released time in 15 days

created tag fcorbelli/zsfx

tag 52.15

zpaqfranz/zpaq SFX module for Windows 32 and 64 bit

created time in 15 days

push event fcorbelli/zsfx

Franco Corbelli

commit sha 0677136682b29b8d4da55e8a6fd11b42f4f7ec62

Add files via upload

view details

push time in 15 days