Block Level Changes At The File System Level?


I’m trying to streamline a backup system using ZFS. In our situation, we’re writing pg_dump files repeatedly, each file being highly similar to the previous one. Is there a file system (e.g., ext4? xfs?) that, when re-writing a similar file, will write only the changed blocks rather than rewriting the entire file to a new set of blocks?

Assume that we’re writing a 500 MB file with only 100 KB of changes. Other than a utility like diff, is there a file system that would write only the 100 KB and not the full 500 MB of data? In concept, this would work similarly to using the ‘diff’ utility…

-Ben

21 thoughts on - Block Level Changes At The File System Level?

  • There is something called rdiff-backup
    (http://www.nongnu.org/rdiff-backup/ and packaged in EPEL) that does reverse diffs at the application level. If it performs well enough, it might be easier to manage than a de-duping filesystem. Or BackupPC, which would store a complete copy if there are any changes at all between dumps, but would compress them and automatically manage the number you need to keep.
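
    For instance, a minimal rdiff-backup run against the dump directory might look like this (the paths are just placeholders):

    # back up the dump directory; only the deltas since the last run are stored
    rdiff-backup /var/lib/pgsql/dumps /backup/pgdumps

    # list the increments that are available
    rdiff-backup --list-increments /backup/pgdumps

    # restore the state as of three days ago into a scratch directory
    rdiff-backup -r 3D /backup/pgdumps /tmp/dumps-3-days-ago

    # prune increments older than eight weeks
    rdiff-backup --remove-older-than 8W /backup/pgdumps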

  • dedup works at the file level. Here we’re talking about files that are highly similar but not identical. I don’t want to rewrite an entire file that’s 99% identical to the new version; I just want to write a small set of changes. I’d use ZFS to keep track of which blocks change over time.

    I’ve been asking around, and it seems this capability doesn’t exist
    *anywhere*.

    -Ben

  • Lists wrote:

    I was under the impression from a few years ago that at least the then-commercial versions operated at the block level, *not* at the file level. rsync works at the file level, and dedup is supposed to be fancier.

    mark

  • You do realize that adding, removing, or even changing the length of a single line in that pg_dump file will change every block after it, since the data will be offset?

    May I suggest that instead of pg_dump, you use pg_basebackup and WAL
    archiving… this is the best way to do delta backups of a SQL database server.
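
    As a rough sketch (the directory names and exact settings here are only illustrative, and the parameters vary between PostgreSQL versions), WAL archiving plus a base backup looks something like this:

    # postgresql.conf: turn on WAL archiving
    wal_level = archive              # 'replica' on newer releases
    archive_mode = on
    archive_command = 'cp %p /backup/wal/%f'

    # pg_basebackup also needs max_wal_senders > 0 and a replication entry in pg_hba.conf
    pg_basebackup -D /backup/base -Ft -z -X fetch

    Every change after the base backup then arrives as WAL segments in /backup/wal, which is effectively the delta stream.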

  • Yes, basically it would keep a table of hashes of the content of existing blocks and do something magic to map writes of new matching blocks to the existing copy at the file system level. Whether that turns out to be faster/better than something like rdiff-backup would be up to the implementations. Oh, and I forgot to mention that there is an alpha version of BackupPC4 at http://sourceforge.net/projects/BackupPC/files/BackupPC-beta/4.0.0alpha3/
    that is supposed to do deltas between runs.
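
    Just to illustrate that idea at the application level, here is a crude shell sketch of the hash-table approach (block size and paths are arbitrary); it splits each dump into fixed-size chunks and stores only the chunks whose hash was not seen in the previous run:

    #!/bin/bash
    # naive chunk-level delta: store only chunks whose hash is new
    DUMP=$1                          # today's pg_dump file
    STORE=/backup/chunks             # where unique chunks are kept
    MANIFEST=/backup/manifest.txt    # hash list from the previous run
    mkdir -p "$STORE"
    rm -f /tmp/chunk.*

    split -a 4 -b 1M -d "$DUMP" /tmp/chunk.     # cut the dump into 1 MiB chunks
    : > /tmp/manifest.new
    for c in /tmp/chunk.*; do
        h=$(sha256sum "$c" | cut -d' ' -f1)
        echo "$h" >> /tmp/manifest.new
        # copy the chunk only if its hash was not already known
        grep -qxF "$h" "$MANIFEST" 2>/dev/null || cp "$c" "$STORE/$h"
    done
    mv /tmp/manifest.new "$MANIFEST"
    rm -f /tmp/chunk.*

    Of course, as pointed out above, any insertion shifts the fixed chunk boundaries, which is exactly why tools like rsync and rdiff-backup use rolling checksums instead.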

    But, since this is about PostgreSQL, the right way is probably just to set up replication and let it send the changes itself instead of doing frequent dumps.

  • Whatever we do, we need the ability to create a point-in-time history. We commonly use our archival dumps for audit, testing, and debugging purposes. I don’t think PG + WAL provides this type of capability. So at the moment we’re down to:

    A) Run PG on a ZFS partition and snapshot ZFS (see the sketch below).
    B) Keep making dumps (as now) and use lots of disk space.
    C) Cook something new and magical using diff, rdiff-backup, or related tools.
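
    For option (A), a minimal version of the ZFS side (pool and dataset names are made up here) would be along these lines:

    # put the PostgreSQL data directory on its own dataset
    zfs create tank/pgdata

    # take a point-in-time snapshot after each dump cycle (or from cron)
    zfs snapshot tank/pgdata@$(date +%Y%m%d-%H%M)

    # list snapshots, or clone one for an audit/testing instance
    zfs list -t snapshot
    zfs clone tank/pgdata@20140301-0200 tank/pgdata-audit

    # ship only the blocks changed between two snapshots to another box
    # (the earlier snapshot must already exist on the target)
    zfs send -i tank/pgdata@20140301-0200 tank/pgdata@20140302-0200 | ssh backuphost zfs recv tank/pgdata-copy

    Snapshotting a live data directory relies on ZFS snapshots being crash-consistent; wrapping the snapshot in pg_start_backup()/pg_stop_backup() is the more cautious route.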

    -Ben

  • I think it does. You should be able to have a base dump plus some number of incremental logs that you can apply to get to a point in time. Might take longer than loading a single dump, though.
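
    As a rough illustration (the recovery syntax differs between PostgreSQL versions, and the paths are placeholders), replaying to a point in time with a base backup plus archived WAL looks something like:

    # unpack the base backup into a fresh data directory
    mkdir /restore && tar -xzf /backup/base/base.tar.gz -C /restore

    # /restore/recovery.conf (pre-12 style): replay archived WAL up to a chosen moment
    restore_command = 'cp /backup/wal/%f %p'
    recovery_target_time = '2014-03-01 02:00:00'

    # start a throwaway instance on the restored directory and let it roll forward
    pg_ctl -D /restore start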

    Depending on your data, you might be able to export it as tables in sorted order for snapshots that would diff nicely, but it is painful to develop things that break with changes in the data schema.
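
    For instance, something along these lines (table and column names are obviously invented) gives a dump that diffs line-by-line:

    # export one table in a stable order so consecutive dumps diff cleanly
    psql -d mydb -c "\copy (SELECT * FROM orders ORDER BY id) TO '/backup/orders.csv' CSV HEADER"

    # keep only the delta against yesterday's export
    diff /backup/orders.csv.yesterday /backup/orders.csv > /backup/orders.delta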

    Disk space is cheap – and pg_dumps usually compress pretty well. But if you have time to experiment, I’d like to know how rdiff-backup or BackupPC4 performs.

  • Check out 7z from the p7zip package. I use the command:

    7za a -t7z $YearNum-$MonthNum.7z -i@include.lst -mx$CompressionMetod
    -mmt$ThreadNumber -mtc=on

    for compressing a lot of similar files from sysinfo and kernel.log, files backed up every hour that do not change much. And I found out that it reuses already existing blocks/hashes/whatever and, I guess, just references them with a pointer instead of storing them again.

    So, 742 files that take up 179 MB uncompressed occupy only 452 KB
    compressed, which is only 0.2% of the original size, 442 TIMES smaller:

    Listing archive: 2014-03.7z

  • I have to back up Stephen on this one:

    1. The most efficient way to get minimal diffs is generally to get
    the program that understands the semantics of the data to make
    the diffs. In the DB world this is typically some type of
    baseline + log shipping. It comes in various flavours and names,
    but the concept is the same across the various enterprise-grade
    databases.

    Just as algorithmic changes to an application to increase performance
    are always going to be much better than trying to tune OS-level
    parameters, doing “dedup” at the application level (where the capability
    exists) is always going to be more efficient than trying to do it
    at the OS level.

    2. Recreating a point-in-time image for audits, testing, etc, then
    becomes the process of exercising your recovery/DR procedures (which
    is a very good side effect). Want to do an audit? Recover the
    db by starting with the baseline and rolling the log forward to
    the desired point.

    3. Although rolling the log forward can take time, you can find a
    suitable tradeoff between recovery time and disk space by periodically
    taking a new baseline (weekly? monthly? it depends on your write load;
    a small sketch follows this list). Then anything older than that
    baseline is only of interest for audit data/retention purposes, and no
    longer factors into the recovery/DR/test scenarios.

    4. Using baseline + log shipping generally results in smaller storage
    requirements for offline / offsite backups. (Don’t forget that
    you’re not exercising your DR procedure unless you sometimes recover
    from your offsite backups, so maybe it would be good to have a policy
    that all audits are performed based on recovery from offsite media,
    only.)

    5. With the above mechanisms in place, there’s basically zero need for
    block- or file-based deduplication, so you can save yourself from
    that level of complexity / resource usage. You may decide that
    filesystem-level snapshots of the filesystem holding the log files
    still play a part in your backup strategy, but that’s separate from
    the dedup issue.
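
    For point 3, the rotation could be as simple as something like this (the paths and the WAL segment name are only placeholders; the segment a new baseline depends on is recorded in its backup_label file):

    # weekly, from cron: take a fresh baseline
    pg_basebackup -D /backup/base-$(date +%Y%m%d) -Ft -z -X fetch

    # then drop archived WAL that predates the new baseline
    pg_archivecleanup /backup/wal 000000010000000000000042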

    Echoing one of John’s comments, I would be very surprised if doing dedup on database-type data would realize any significant benefits for common configurations/loads.

    Devin

  • Ljubomir Ljubojevic writes:

    Perhaps there is a file system that supports compression and would do a good job with the snapshots transparently. Maybe even ZFS or btrfs do?

  • Hi, I agree with Lee. Btrfs actually does sport built-in compression as a mount argument/flag, and the delta snapshots work beautifully well, but I wouldn’t dare to use it with any kernel prior to 3.13; in fact, best practice would be to use it with kernel 3.14.x or later, preferably 3.15.x and later. B.R.
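
    For reference, a minimal version of that setup (mount point and subvolume names are invented for the example) would be:

    # mount the backup volume with transparent compression
    mount -o compress=lzo /dev/sdb1 /backup

    # keep the dumps in their own subvolume so they can be snapshotted
    btrfs subvolume create /backup/pgdumps

    # read-only snapshot after each dump run; extents not rewritten since the snapshot stay shared
    btrfs subvolume snapshot -r /backup/pgdumps /backup/pgdumps-$(date +%Y%m%d)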

  • It seems that 7-Zip uses LZMA for the 7z archives (though it supports other compression types). Confirmed below.

    Exactly … LZMA … Grab the “xz” package for CentOS 6

    There’s also a --lzma option for tar.

    I am inclined to use xz utils as opposed to 7zip since 7zip comes from a
    3rd party repo.
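
    For example, either of these produces an LZMA-compressed tarball of the dump directory (the paths are placeholders):

    # xz-compressed tar archive
    tar -cJf dumps-$(date +%Y%m%d).tar.xz /var/lib/pgsql/dumps

    # or the older .lzma container that the --lzma option produces
    tar --lzma -cf dumps-$(date +%Y%m%d).tar.lzma /var/lib/pgsql/dumps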

  • xz needs to be checked for support of block optimization/reuse; I do not think that is part of the LZMA format (I might be wrong, of course). Also, a check needs to be made as to whether xz supports multithreading like p7zip does.

    Btw., the p7zip package is in EPEL, so it should be safe enough, but I understand your concerns.

  • Ok, I spent several minutes on this.

    xz --long-help and man xz are much more informative.

    https://en.wikipedia.org/wiki/Xz:

    “Design xz compresses single files as input, and does not bundle multiple files into a single archive. It is therefore common to compress a file that is itself an archive, such as those created by the tar or cpio Unix programs.[2]”

    “xz is a lossless data compression program and file format which incorporates the LZMA compression algorithm.

    xz can be thought of as a stripped down version of the 7-Zip program. xz has its own file format rather than the .7z format used by 7-Zip (which lacks support for Unix-like file system metadata[2]).”

    https://en.wikipedia.org/wiki/7-Zip:

    “By default, 7-Zip creates 7z format archives with a .7z file extension. Each archive can contain multiple directories and files. As a container format, security or size reduction are achieved using a stacked combination of filters. These can consist of pre-processors, compression algorithms, and encryption filters.”

    So it is either a tar.xz or a 7z container, and it looks like LZMA2 is the only option, with several xz switches for block reduction.

    And, no, --help does not actually help with assessing whether xz is a viable replacement for some particular purpose; testing is needed.
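
    A quick way to test would be something like the following, run against a directory of consecutive dumps (file names are placeholders):

    # compress each dump individually with xz and total the result
    for f in /backup/dumps/*.sql; do xz -k -9 "$f"; done
    du -ch /backup/dumps/*.sql.xz | tail -1

    # versus everything in one solid 7z archive, which can share data between similar files
    7za a -t7z /backup/dumps-all.7z /backup/dumps/*.sql
    du -h /backup/dumps-all.7z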

    P.S. You should REALLY fix your mail client to send replies to CentOS@CentOS.org so that the conversation is seen by others (it looks like you reply 99% of the time to the person instead of back to the list).

  • No, I think it does not. There is a threads option, but the manpage states:

    … Multithreaded compression and decompression are not implemented yet, so this option has no effect for
    now.

  • Oh, that’s a bummer. Thanks, now I do not have to waste time experimenting.