Remote Backup

Hi list, I need to back up a partition of ~200 GB over a connection of 8/2
Mbps.

Tools like Bacula and Amanda can’t help me because of the low bandwidth of the link to the server.

I’m thinking rsync would be a good choice.

What do you think?

Thanks in advance.
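
A minimal sketch of the kind of transfer being considered (the paths, host, and bandwidth cap below are assumptions, not from the thread):

    # -a archive mode, -z compress on the wire, --partial keeps
    # interrupted transfers; --bwlimit is in KB/s (200 KB/s leaves
    # headroom under a 2 Mbps uplink)
    rsync -az --partial --bwlimit=200 /data/ backup.example.com:/backups/data/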

19 thoughts on - Remote Backup

  • If you want pseudo-snapshots (not real point-in-time snapshots) you can use rsnapshot or BackupPC, both of which use rsync under the hood. You get the advantages of rsync along with an archive of previous backups (a sketch of the underlying trick follows below).

    –keith
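
    As a rough sketch, the hard-link rotation that rsnapshot automates looks something like this (a two-slot rotation with hypothetical paths; real rsnapshot keeps more history):

        # age the trees, dropping the oldest
        rm -rf /backups/daily.1
        [ -d /backups/daily.0 ] && mv /backups/daily.0 /backups/daily.1
        # unchanged files become hard links into daily.1, so each
        # "snapshot" costs only the space of what changed
        rsync -a --delete --link-dest=/backups/daily.1 /data/ /backups/daily.0/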

  • On 07/06/2016 21:35, Keith Keller wrote:
    Thank you for your reply, and sorry for the late response.

    My need is simply to get a copy of a large dataset and make sure it is not corrupted by the transfer. Afterwards, the data will be stored on a local backup server that runs Bacula.

    For the file transfer I will use rsync, to save time and bandwidth, but I
    don’t know how to check whether the files were corrupted.

    How can I perform this check?
    I could compute an MD5 for each file, but for a very large number of files that could be a problem.

    Thanks in advance.
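
    Two hedged options (standard rsync/coreutils invocations; the paths are placeholders):

        # option 1: a second rsync pass with -c forces full-file
        # checksum comparison; --dry-run reports differences only
        rsync -avc --dry-run /data/ backup.example.com:/backups/data/

        # option 2: build an MD5 manifest on the source...
        cd /data && find . -type f -print0 | xargs -0 md5sum > /tmp/manifest.md5
        # ...copy manifest.md5 to the destination, then verify there:
        cd /backups/data && md5sum -c --quiet /tmp/manifest.md5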

  • Will this be very slow if Alessandro has a large number of files? OTOH
    if he really needs to ensure integrity there likely isn’t a better option.

    –keith

  • Where rsync falls down is when you need a point-in-time snapshot. rsync processes one file at a time, so if the files are being updated while it runs, different files will be copied at different moments.
    This is usually fine for static archives of files and such, but unsuitable for a database server where random updates are being made to various files and they all have to be consistent with each other.

  • Where databases are concerned, I would never rely on a snapshot of their storage files. Either stop the relevant daemon(s) and then take a filesystem snapshot, or, better yet, do a database dump and restore the databases from the dump when you need to restore. Also: databases usually have a “hold transactions” flag or similar; set this flag before making the dump and remove it after the dump has finished. That ensures a consistent state of everything in your dump. I usually use a combination: I do a database dump and back those dump files up on the regular backup schedule, excluding the raw database files from the backup (a sketch follows after this message).

    I hope this helps.

    Valeri

    ++++++++++++++++++++++++++++++++++++++++
    Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247
    ++++++++++++++++++++++++++++++++++++++++
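
    A minimal sketch of that dump-then-back-up combination, assuming MySQL; the paths and schedule are placeholders:

        #!/bin/sh
        # nightly dump: --single-transaction takes a consistent InnoDB
        # view without blocking writers (the "hold transactions" idea)
        set -e
        DUMPDIR=/var/backups/db
        mysqldump --single-transaction --all-databases \
            | gzip > "$DUMPDIR/all-$(date +%F).sql.gz"
        # the regular backup job then includes $DUMPDIR and
        # excludes /var/lib/mysql itself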

  • For PostgreSQL, you can use rsync-style copies of the filesystem if you bracket the rsync in pg_start_backup() and pg_stop_backup() calls (see the sketch after this message).
    Dumps (pg_dump) are fine for smaller databases but become really unwieldy for very large ones, and a straight filesystem copy such as rsync is required to initialize a streaming-replication slave
    (although this can be done with the pg_basebackup command-line utility, there are times when doing it manually is appropriate).

    Filesystem-level snapshots such as those provided by ZFS are also very workable for such databases, since they are point-in-time views of the filesystem. Even if transactions are in flight when the snapshot is taken, then if it is later restored and the database server restarted, the result is exactly as if the reset button had been hit at that instant: Postgres does its transaction recovery/cleanup and resumes normal operation, with any committed transactions intact and any transactions that were in progress rolled back.
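
    A hedged sketch of that bracketing (PostgreSQL 9.x-era calls; the data directory and host are assumptions, and WAL archiving must already be configured for the copy to be recoverable):

        # put the cluster into backup mode, copy the files, end backup mode
        psql -U postgres -c "SELECT pg_start_backup('rsync_base', true);"
        rsync -a /var/lib/pgsql/9.5/data/ backup.example.com:/srv/pg_base/
        psql -U postgres -c "SELECT pg_stop_backup();"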

  • This.

    If your DBMS cannot recover from a filesystem-level snapshot, you should stop using that DBMS, because it will fail you under other real-world error scenarios, too.

    This is not specific to ZFS. The same can be said for UFS2, btrfs, NTFS, ReFS…

    (The latter two via Volume Shadow Copy Service, a feature of Windows often used specifically for its ability to snapshot a running DBMS store.)

  • Yes, but no slower than any other method of checking the files on each side for corruption.

  • Dumping and restoring files can be *really* slow, so “better” is highly subjective.

    Instead, you could quiesce your databases to get a filesystem snapshot.
    I wrote a framework for doing this that is filesystem and application agnostic, which I mentioned in a previous message.

  • Gordon Messmer wrote:

    That’s what “middle of the night maintenance window” is for.

    mark

  • In today’s 24/7 global business world there is no middle of the night; it’s midday *somewhere*.

  • Agreed. In my case I used it to mean the least questionable consistency of database records (least questionable, again, from the point of view of a sysadmin who admittedly has limited knowledge of the internals of the database server in question). I didn’t write it at that level of detail, sorry: I’m usually tempted to write as short and simply as I can, at the expense of accuracy, just to save effort for whoever is kind enough to read what I wrote ;-)

    Valeri

    ++++++++++++++++++++++++++++++++++++++++
    Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247
    ++++++++++++++++++++++++++++++++++++++++

  • John R Pierce wrote:
    You pick the least busy day/time. For example, when I was supporting the City of Chicago 911 system 16 years ago, they’d let us schedule software upgrades for Monday night between 02:00 and 06:00; Tuesday and then Thursday nights were the second and third choices, since those had the fewest calls. And allow me to venture to suggest that worldwide business is hardly more important than life-or-death calls to a 911 center (the call-takers went back to paper cards during that window)….

    Besides, if it’s that big a business, management *really* needs to spring for a complete hot-spare system to deal with hardware outages, and there should be mirrored databases; *those* could be taken down, copied, brought back online, and then caught up with all the transactions done while they were down.

    mark

  • But how many directors and ‘management’ in IT know anything about practical IT, or understand the concept of effective, reliable, essential resilience, despite claiming to be ‘computer professionals’?

  • Is there any chance you could switch to running ZFS or maybe Btrfs? They are ridiculously more efficient at sending reliable, incremental updates of a filesystem over a low-bandwidth link, and the load of rsync “at scale” can be enormous.

    In our case, with about half a billion files, an rsync over a local Gb LAN
    took well over 24 hours; the discovery stage of rsync alone accounted for nearly all the overhead, due to IOPS limitations.

    Switching our primary backup method to ZFS send/receive of incremental snapshots cut the time to back up/replicate to under 30 minutes, with no significant change in server load (a sketch of the commands follows below).

    And don’t let the name “incremental snapshots” fool you: the end result is identical to a full backup/copy, with every file verified binary-identical as of the moment the snapshot was taken.

    Really, if you can do this and you care about your data, you want to do this, even if you don’t know it yet. The learning curve is significant but the results are well worth it.
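
    A hedged sketch of that send/receive workflow (the pool, dataset, snapshot names, and host are hypothetical):

        # the first transfer has no base snapshot, so it is a full send
        zfs snapshot tank/data@2016-06-07
        zfs send tank/data@2016-06-07 | ssh backup.example.com zfs receive backup/data
        # thereafter, take a new snapshot and send only the delta
        zfs snapshot tank/data@2016-06-08
        zfs send -i tank/data@2016-06-07 tank/data@2016-06-08 \
            | ssh backup.example.com zfs receive -F backup/data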