Comparing Directories Recursively

What is the best tool for comparing file hashes across two different drives/directories, for example after copying a large number of files from one drive to another? I used cp -au to copy the directories, not rsync, since the copy is between local disks.

I found a mention of hashdeep on the ‘net; it works by first running it against the first directory to generate a file of checksums, and then running it a second time against the second directory using that checksum file. Hashdeep, however, is not in the CentOS repository and, according to the ‘net, is possibly no longer maintained.

I also found md5deep, which seems similar.

Are there other tools for this kind of automatic comparison? What I am really looking for is a list of files that exist in only one place, or whose checksums do not match.
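
From its man page, the hashdeep workflow appears to be roughly the following; this is untested here, the exact flags should be double-checked, and /mnt/src and /mnt/dst are just placeholder mount points:

    # pass 1: in the source directory, record a hash for every file, using relative paths (-l)
    cd /mnt/src
    hashdeep -r -l . > /tmp/hashes.txt

    # pass 2: in the destination, audit against that list
    # -a enables audit mode, -k names the file of known hashes; -vv lists the individual mismatches
    cd /mnt/dst
    hashdeep -r -l -a -vv -k /tmp/hashes.txt .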

14 thoughts on “Comparing Directories Recursively”

  • diff --brief -r dir1/ dir2/

    might do what you need; see the example output at the end of this thread.

    If you also want files that exist in only one of the directories to be reported as differing, add -N:

    diff --brief -Nr dir1/ dir2/

  • The standard unix diff will show if the files are the same or not:

    $ diff 1.bin 2.bin
    Binary files 1.bin and 2.bin differ

    If there is no output from the command, it means that the files have no differences.

    Since you don’t need to know exactly how the files are different (the mere fact that they are different is what you do want to know), that should do it.

  • source:

    find . -type f -exec md5sum \{\} \; > checksum.list

    destination:

    md5sum -c checksum.list

    (a fuller sketch of this approach, which also reports files present only in the destination, follows at the end of this thread)

  • Wouldn’t diff be faster, because it doesn’t have to read to the end of every file and isn’t really calculating anything? Or am I looking at this the wrong way?

  • Hi,

    [snip]

    rsync obviously offers the ‘exist in only one place’ feature, but it also offers checksum comparisons (in version 3 and higher, I understand); there is a dry-run example at the end of this thread. From the man page…

    -c, --checksum
    This changes the way rsync checks if the files have been changed
    and are in need of a transfer. Without this option, rsync uses
    a “quick check” that (by default) checks if each file’s size and
    time of last modification match between the sender and receiver.
    This option changes this to compare a 128-bit checksum for each
    file that has a matching size. Generating the checksums means
    that both sides will expend a lot of disk I/O reading all the
    data in the files in the transfer (and this is prior to any
    reading that will be done to transfer changed files), so this
    can slow things down significantly.

    The sending side generates its checksums while it is doing the
    file-system scan that builds the list of the available files.
    The receiver generates its checksums when it is scanning for
    changed files, and will checksum any file that has the same size
    as the corresponding sender’s file: files with either a changed
    size or a changed checksum are selected for transfer.

    Note that rsync always verifies that each transferred file was
    correctly reconstructed on the receiving side by checking a
    whole-file checksum that is generated as the file is
    transferred, but that automatic after-the-transfer verification has
    nothing to do with this option’s before-the-transfer “Does this
    file need to be updated?” check.

    For protocol 30 and beyond (first supported in 3.0.0), the
    checksum used is MD5. For older protocols, the checksum used is
    MD4.

    Rich.

  • Speed was not stated as a requirement, although the MD5 algorithm is fast by design and the question did ask “to compare file hashes”. Nevertheless, I also use diff to compare. It depends on your needs…

  • In article <20171027175431.e265479c4f9b4658fe2179bf@sasktel.net>, Frank Cox wrote:

    If the files are the same (which is what the OP is hoping), then diff does indeed have to read to the end of both files to be certain of this. Only if they differ can it stop reading the files as soon as a difference between them is found.

    Cheers Tony
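
To expand on the diff --brief suggestion above, its output already separates the two cases the question asks about. A rough illustration with made-up file names:

    $ diff --brief -r dir1/ dir2/
    Files dir1/photos/img001.jpg and dir2/photos/img001.jpg differ
    Only in dir1/docs: notes.txt
    Only in dir2: extra.log

The “differ” lines cover content mismatches and the “Only in” lines cover files that exist in only one place; adding -N instead reports a file missing on one side as differing, by treating the absent file as empty.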
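
A fuller sketch of the find + md5sum approach above: md5sum -c only checks the files named in the list, so it says nothing about files that exist only in the destination. The mount points are placeholders and GNU coreutils is assumed:

    # source side: hash every file, with paths relative to the directory itself
    cd /mnt/src
    find . -type f -exec md5sum {} \; > /tmp/checksum.list

    # destination side: verify; --quiet prints only failures and missing files
    cd /mnt/dst
    md5sum -c --quiet /tmp/checksum.list

    # catch files that exist on only one side by comparing the name lists
    find . -type f | sort > /tmp/dst.files
    cut -c35- /tmp/checksum.list | sort > /tmp/src.files   # drop the 32-character hash and separator
    comm -3 /tmp/src.files /tmp/dst.files                  # column 1: only in source, column 2: only in destination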
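
And a dry-run version of the rsync approach Rich quotes, which compares without copying or deleting anything. The paths are placeholders, and the trailing slashes matter to rsync:

    # -r recurse, -c compare by checksum, -n dry run, -i itemize each difference
    # --delete makes files that exist only in the destination show up as "*deleting" lines
    rsync -rcni --delete /mnt/src/ /mnt/dst/

Files whose contents differ are itemized with a ‘c’ in the change flags, and files present only in the source are listed as new transfers, so one command covers both of the original requirements.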