Block Level Changes At The File System Level?


I’m trying to streamline a backup system using ZFS. In our situation, we’re writing pg_dump files repeatedly, each file being highly similar to the previous one. Is there a file system (e.g., ext4? xfs?) that, when re-writing a similar file, will write only the changed blocks rather than rewriting the entire file to a new set of blocks?

Assume that we’re writing a 500 MB file with only 100 KB of changes. Other than a utility like diff, is there a file system that would write only the 100 KB and not the full 500 MB of data? In concept, this would work similarly to using the ‘diff’ utility…

-Ben

21 thoughts on - Block Level Changes At The File System Level?

  • There is something called rdiff-backup
    (http://www.nongnu.org/rdiff-backup/ and packaged in EPEL) that does reverse diffs at the application level. If it performs well enough, it might be easier to manage than a de-duping filesystem. Or BackupPC, which would store a complete copy if there are any changes at all between dumps, but would compress them and automatically manage the number you need to keep.
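
    For instance, a minimal rdiff-backup run against the dump directory might look like this (the paths are just placeholders):

    # back up the dump directory; only the deltas since the last run are stored
    rdiff-backup /var/lib/pgsql/dumps /backup/pgdumps

    # list the increments that are available
    rdiff-backup --list-increments /backup/pgdumps

    # restore the state as of three days ago into a scratch directory
    rdiff-backup -r 3D /backup/pgdumps /tmp/dumps-3-days-ago

    # prune increments older than eight weeks
    rdiff-backup --remove-older-than 8W /backup/pgdumps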

  • dedup works at the file level. Here we’re talking about files that are highly similar but not identical. I don’t want to rewrite an entire file that’s 99% identical to the new version; I just want to write a small set of changes. I’d use ZFS to keep track of which blocks change over time.

    I’ve been asking around, and it seems this capability doesn’t exist
    *anywhere*.

    -Ben

  • Lists wrote:

    I was under the impression from a few years ago that at least the then-commercial versions operated at the block level, *not* at the file level. rsync works at the file level, and dedup is supposed to be fancier.

    mark

  • You do realize that adding, removing, or even changing the length of a single line in that pg_dump file will change every block after it, since the data will be offset?

    May I suggest that instead of pg_dump, you use pg_basebackup and WAL
    archiving… this is the best way to do delta backups of a SQL database server.
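
    As a rough sketch (the directory names and exact settings here are only illustrative, and the parameters vary between PostgreSQL versions), WAL archiving plus a base backup looks something like this:

    # postgresql.conf: turn on WAL archiving
    wal_level = archive              # 'replica' on newer releases
    archive_mode = on
    archive_command = 'cp %p /backup/wal/%f'

    # pg_basebackup also needs max_wal_senders > 0 and a replication entry in pg_hba.conf
    pg_basebackup -D /backup/base -Ft -z -X fetch

    Every change after the base backup then arrives as WAL segments in /backup/wal, which is effectively the delta stream.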

  • Yes, basically it would keep a table of hashes of the content of existing blocks and do something magic to map writes of new matching blocks to the existing copy at the file system level. Whether that turns out to be faster/better than something like rdiff-backup would be up to the implementations. Oh, and I forgot to mention that there is an alpha version of BackupPC4 at http://sourceforge.net/projects/BackupPC/files/BackupPC-beta/4.0.0alpha3/
    that is supposed to do deltas between runs.
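
    Just to illustrate that idea at the application level, here is a crude shell sketch of the hash-table approach (block size and paths are arbitrary); it splits each dump into fixed-size chunks and stores only the chunks whose hash was not seen in the previous run:

    #!/bin/bash
    # naive chunk-level delta: store only chunks whose hash is new
    DUMP=$1                          # today's pg_dump file
    STORE=/backup/chunks             # where unique chunks are kept
    MANIFEST=/backup/manifest.txt    # hash list from the previous run
    mkdir -p "$STORE"
    rm -f /tmp/chunk.*

    split -a 4 -b 1M -d "$DUMP" /tmp/chunk.     # cut the dump into 1 MiB chunks
    : > /tmp/manifest.new
    for c in /tmp/chunk.*; do
        h=$(sha256sum "$c" | cut -d' ' -f1)
        echo "$h" >> /tmp/manifest.new
        # copy the chunk only if its hash was not already known
        grep -qxF "$h" "$MANIFEST" 2>/dev/null || cp "$c" "$STORE/$h"
    done
    mv /tmp/manifest.new "$MANIFEST"
    rm -f /tmp/chunk.*

    Of course, as pointed out above, any insertion shifts the fixed chunk boundaries, which is exactly why tools like rsync and rdiff-backup use rolling checksums instead.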

    But, since this is about PostgreSQL, the right way is probably just to set up replication and let it send the changes itself instead of doing frequent dumps.

  • Whatever we do, we need the ability to create a point-in-time history. We commonly use our archival dumps for audit, testing, and debugging purposes. I don’t think PG + WAL provides this type of capability. So at the moment we’re down to:

    A) Run PG on a ZFS partition and snapshot ZFS (see the sketch below).
    B) Keep making dumps (as now) and use lots of disk space.
    C) Cook something new and magical using diff, rdiff-backup, or related tools.
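
    For option (A), a minimal version of the ZFS side (pool and dataset names are made up here) would be along these lines:

    # put the PostgreSQL data directory on its own dataset
    zfs create tank/pgdata

    # take a point-in-time snapshot after each dump cycle (or from cron)
    zfs snapshot tank/pgdata@$(date +%Y%m%d-%H%M)

    # list snapshots, or clone one for an audit/testing instance
    zfs list -t snapshot
    zfs clone tank/pgdata@20140301-0200 tank/pgdata-audit

    # ship only the blocks changed between two snapshots to another box
    # (the earlier snapshot must already exist on the target)
    zfs send -i tank/pgdata@20140301-0200 tank/pgdata@20140302-0200 | ssh backuphost zfs recv tank/pgdata-copy

    Snapshotting a live data directory relies on ZFS snapshots being crash-consistent; wrapping the snapshot in pg_start_backup()/pg_stop_backup() is the more cautious route.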

    -Ben

  • I think it does. You should be able to have a base dump plus some number of incremental logs that you can apply to get to a point in time. Might take longer than loading a single dump, though.
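
    As a rough illustration (the recovery syntax differs between PostgreSQL versions, and the paths are placeholders), replaying to a point in time with a base backup plus archived WAL looks something like:

    # unpack the base backup into a fresh data directory
    mkdir /restore && tar -xzf /backup/base/base.tar.gz -C /restore

    # /restore/recovery.conf (pre-12 style): replay archived WAL up to a chosen moment
    restore_command = 'cp /backup/wal/%f %p'
    recovery_target_time = '2014-03-01 02:00:00'

    # start a throwaway instance on the restored directory and let it roll forward
    pg_ctl -D /restore start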

    Depending on your data, you might be able to export it as tables in sorted order for snapshots that would diff nicely, but it is painful to develop things that break with changes in the data schema.
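
    For instance, something along these lines (table and column names are obviously invented) gives a dump that diffs line-by-line:

    # export one table in a stable order so consecutive dumps diff cleanly
    psql -d mydb -c "\copy (SELECT * FROM orders ORDER BY id) TO '/backup/orders.csv' CSV HEADER"

    # keep only the delta against yesterday's export
    diff /backup/orders.csv.yesterday /backup/orders.csv > /backup/orders.delta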

    Disk space is cheap – and pg_dumps usually compress pretty well. But if you have time to experiment, I’d like to know how rdiff-backup or BackupPC4 performs.

  • Check out 7z from the p7zip package. I use the command:

    7za a -t7z $YearNum-$MonthNum.7z -i@include.lst -mx$CompressionMetod
    -mmt$ThreadNumber -mtc=on

    for compressing a lot of similar files from sysinfo and kernel.log, files backed up every hour that do not change much. And I found out that it reuses already existing blocks/hashes/whatever and, I guess, just references them with a pointer instead of storing them again.

    So, 742 files that take up 179 MB uncompressed occupy only 452 KB
    compressed, which is only 0.2% of the original size, 442 TIMES smaller:

    Listing archive: 2014-03.7z

  • I have to back up Stephen on this one:

    1. The most efficient way to get minimal diffs is generally to get
    the program that understands the semantics of the data to make
    the diffs. In the DB world this is typically some type of
    baseline + log shipping. It comes in various flavours and names,
    but the concept is the same across the various enterprise-grade
    databases.

    Just as algorithmic changes to an application to increase performance
    are always going to be much better than trying to tune OS-level
    parameters, doing “dedup” at the application level (where the capability
    exists) is always going to be more efficient than trying to do it
    at the OS level.

    2. Recreating a point-in-time image for audits, testing, etc, then
    becomes the process of exercising your recovery/DR procedures (which
    is a very good side effect). Want to do an audit? Recover the
    db by starting with the baseline and rolling the log forward to
    the desired point.

    3. Although rolling the log forward can take time, you can find a
    suitable tradeoff between recovery time and disk space by periodically
    taking a new baseline (weekly? monthly? it depends on your write load;
    a small sketch follows this list). Then anything older than that
    baseline is only of interest for audit data/retention purposes, and no
    longer factors into the recovery/DR/test scenarios.

    4. Using baseline + log shipping generally results in smaller storage
    requirements for offline / offsite backups. (Don’t forget that
    you’re not exercising your DR procedure unless you sometimes recover
    from your offsite backups, so maybe it would be good to have a policy
    that all audits are performed based on recovery from offsite media,
    only.)

    5. With the above mechanisms in place, there’s basically zero need for
    block- or file-based deduplication, so you can save yourself from
    that level of complexity / resource usage. You may decide that
    filesystem-level snapshots of the filesystem holding the log files
    still play a part in your backup strategy, but that’s separate from
    the dedup issue.
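
    For point 3, the rotation could be as simple as something like this (the paths and the WAL segment name are only placeholders; the segment a new baseline depends on is recorded in its backup_label file):

    # weekly, from cron: take a fresh baseline
    pg_basebackup -D /backup/base-$(date +%Y%m%d) -Ft -z -X fetch

    # then drop archived WAL that predates the new baseline
    pg_archivecleanup /backup/wal 000000010000000000000042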

    Echoing one of John’s comments, I would be very surprised if doing dedup on database-type data would realize any significant benefits for common configurations/loads.

    Devin

  • Ljubomir Ljubojevic writes:

    Perhaps there is a file system that supports compression and would do a good job with the snapshots transparently. Maybe even ZFS or btrfs do?

  • Hi, I agree with Lee. Btrfs actually does sport built-in compression as a mount argument/flag, and the delta snapshots work beautifully well, but I wouldn’t dare to use it with any kernel prior to 3.13; in fact, best practice would be to use it with kernel 3.14.x or later, preferably 3.15.x and later. B.R.
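
    For reference, a minimal version of that setup (mount point and subvolume names are invented for the example) would be:

    # mount the backup volume with transparent compression
    mount -o compress=lzo /dev/sdb1 /backup

    # keep the dumps in their own subvolume so they can be snapshotted
    btrfs subvolume create /backup/pgdumps

    # read-only snapshot after each dump run; extents not rewritten since the snapshot stay shared
    btrfs subvolume snapshot -r /backup/pgdumps /backup/pgdumps-$(date +%Y%m%d)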

  • It seems that 7-Zip uses LZMA for the 7z archives (though it supports other compression types). Confirmed below.

    Exactly … LZMA … Grab the “xz” package for CentOS 6

    There’s also a --lzma option for tar.

    I am inclined to use xz utils as opposed to 7zip since 7zip comes from a
    3rd party repo.
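
    For example, either of these produces an LZMA-compressed tarball of the dump directory (the paths are placeholders):

    # xz-compressed tar archive
    tar -cJf dumps-$(date +%Y%m%d).tar.xz /var/lib/pgsql/dumps

    # or the older .lzma container that the --lzma option produces
    tar --lzma -cf dumps-$(date +%Y%m%d).tar.lzma /var/lib/pgsql/dumps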

  • xz needs to be checked for support of block optimization/reuse; I do not think that is part of the LZMA format (I might be wrong, of course). Also, a check needs to be made as to whether xz supports multithreading like p7zip does.

    Btw., the p7zip package is in EPEL, so it should be safe enough, but I understand your concerns.

  • Ok, I spent several minutes on this.

    xz --long-help and man xz are much more informative.

    https://en.wikipedia.org/wiki/Xz:

    “Design xz compresses single files as input, and does not bundle multiple files into a single archive. It is therefore common to compress a file that is itself an archive, such as those created by the tar or cpio Unix programs.[2]”

    “xz is a lossless data compression program and file format which incorporates the LZMA compression algorithm.

    xz can be thought of as a stripped down version of the 7-Zip program. xz has its own file format rather than the .7z format used by 7-Zip (which lacks support for Unix-like file system metadata[2]).”

    https://en.wikipedia.org/wiki/7-Zip:

    “By default, 7-Zip creates 7z format archives with a .7z file extension. Each archive can contain multiple directories and files. As a container format, security or size reduction are achieved using a stacked combination of filters. These can consist of pre-processors, compression algorithms, and encryption filters.”

    So it is either a tar.xz or a 7z container, and it looks like LZMA2 is the only option, with several xz switches for block reduction.

    And, no, --help does not actually help with assessing whether xz is a viable replacement for some particular purpose; testing is needed.
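
    A quick way to test would be something like the following, run against a directory of consecutive dumps (file names are placeholders):

    # compress each dump individually with xz and total the result
    for f in /backup/dumps/*.sql; do xz -k -9 "$f"; done
    du -ch /backup/dumps/*.sql.xz | tail -1

    # versus everything in one solid 7z archive, which can share data between similar files
    7za a -t7z /backup/dumps-all.7z /backup/dumps/*.sql
    du -h /backup/dumps-all.7z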

    P.S. You should REALLY fix your mail client to send replies to CentOS@CentOS.org so that the conversation is seen by others (it looks like you reply 99% of the time to the person instead of back to the list).

  • No, I think it does not. There is a threads option, but the manpage states:

    … Multithreaded compression and decompression are not implemented yet, so this option has no effect for
    now.

  • Oh, that’s a bummer. Thanks, now I do not have to waste time experimenting.