What is old is new again: Solid compression with tar
Compressing that Bovespa archive with 7-zip reduced it to a very nice 145 GB, a 2.5x compression ratio.
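Turns out that 7-zip uses a naive volume splitting method: as far as I can tell, it just cuts the finished archive into fixed-size pieces, so no single volume is usable on its own.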
That's a problem for my use case. I want to store those archives in S3 Glacier, but I don't want to restore all of them just to get one file. That costs time and money. It would be nice to restore just one or two parts of the archive: less time and, especially, less money. I thought of writing a script that indexes the 7-zip archive and creates a sparse file structure to stand in for the missing parts, but it would be too much work and it doesn't seem safe enough for data preservation.
I must use something old enough to care about doing multi-volumes properly. You know what is old and supports multi-volumes the right way? ZIP. What doesn't it do? Solid compression. I could store the files using ZIP multi-volumes and then compress those volumes with xz. Unfortunately, it seems Info-ZIP (the usual *nix ZIP implementation) doesn't support multi-volumes the way WinZip does. Oh well.
What is around the same age as ZIP and would consider multi-volumes something useful to implement? Tar. Tape Archiver. Tapes. Glacier's spiritual grandpa.
Very well. GNU Tar does have what seems to be a reasonable approach for doing multi-volumes, and I've found pixz, a parallel xz implementation that also indexes tarballs.
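As a starting point, here is a rough Python sketch of what the tar side might look like. It only builds the command line and does not run anything. The `--multi-volume`, `--tape-length` and repeated `--file` options are real GNU Tar options, but the function name, the file naming scheme, the volume size and the extra spare volume are my own assumptions, not something I have settled on.

```python
import math


def tar_multivolume_command(source_dir, total_bytes, volume_bytes, prefix="archive"):
    """Build (but do not run) a GNU tar command for a multi-volume archive.

    Hypothetical helper: the naming scheme and the spare volume are assumptions.
    """
    # Estimate how many volumes the data needs, plus one spare because
    # tar headers and padding make the archive larger than the raw data.
    volumes = math.ceil(total_bytes / volume_bytes) + 1

    cmd = [
        "tar",
        "--create",
        "--multi-volume",
        # Without a suffix, GNU tar reads this value in units of 1024 bytes.
        f"--tape-length={volume_bytes // 1024}",
    ]
    # One --file option per volume; tar moves to the next one when a volume fills up.
    for i in range(1, volumes + 1):
        cmd.append(f"--file={prefix}.{i:03d}.tar")
    cmd.append(source_dir)
    return cmd


if __name__ == "__main__":
    # Example with made-up sizes: about 10 GiB of data in 4 GiB volumes.
    print(" ".join(tar_multivolume_command("data", 10 * 1024**3, 4 * 1024**3)))
```

If tar runs out of named volume files it prompts for the next one, so the spare is only there to avoid that. GNU Tar also refuses to compress a multi-volume archive itself, so compression would have to be a separate step on each volume.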
Now I have to figure out the right incantations and I might have what I want.