Comparing Directories Recursively
What is the best tool to compare file hashes in two different drives/directories such as after copying a large number of files from one drive to another? I used cp -au to copy directories, not rsync, since it is between local disks.
I found a mention of hashdeep on the ‘net, which means first running it against the first directory to generate a file with checksums, then running it a second time against the second directory using that checksum file. Hashdeep, however, is not in the CentOS repository and, according to the ‘net, is possibly no longer maintained.
I also found md5deep which seems similar.
Are there other tools for this kind of automatic comparison? What I am really looking for is a list of files that exist in only one place or whose checksums do not match.
14 thoughts on - Comparing Directories Recursively
diff --brief -r dir1/ dir2/
might do what you need.
If you also want to see differences for files that may not exist in either directory:
diff --brief -Nr dir1/ dir2/
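As a minimal sketch of the difference between the two invocations (the directory trees here are throwaway examples built just for illustration):

```shell
# Illustrative only: build two small trees, then compare them.
demo=$(mktemp -d)
mkdir -p "$demo/dir1" "$demo/dir2"
echo "same content" > "$demo/dir1/a.txt"
echo "same content" > "$demo/dir2/a.txt"
echo "only in dir1" > "$demo/dir1/b.txt"

# diff exits 1 when the trees differ, so tolerate that in scripts.
# Without -N, a one-sided file is listed as "Only in ...";
# with -N, it is treated as empty on the missing side and reported as differing.
diff --brief -r  "$demo/dir1" "$demo/dir2" || true
diff --brief -Nr "$demo/dir1" "$demo/dir2" || true
```

Identical files (a.txt above) produce no output at all, which is what makes --brief usable as a quick "did the copy succeed" check.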
But is diff not best suited for text files?
The standard unix diff will show if the files are the same or not:
$ diff 1.bin 2.bin
Binary files 1.bin and 2.bin differ
If there is no output from the command, it means that the files have no differences.
Since you don’t need to know exactly how the files are different (the mere fact that they are different is what you do want to know), that should do it.
source:
find . -type f -exec md5sum \{\} \; > checksum.list
destination:
md5sum -c checksum.list
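The two-step workflow above can be sketched end to end (the paths here are throwaway examples). One detail worth hedging: md5sum -c looks files up by the paths recorded in the list, so the find should be run from inside the source directory to produce relative paths that are also valid from the destination:

```shell
# Illustrative sketch of the checksum-list workflow.
work=$(mktemp -d)
mkdir -p "$work/source" "$work/destination"
echo "payload" > "$work/source/file1"
cp "$work/source/file1" "$work/destination/file1"

# Source side: record a checksum for every regular file, with relative paths.
(cd "$work/source" && find . -type f -exec md5sum {} \; > "$work/checksum.list")

# Destination side: verify; md5sum -c prints one OK/FAILED line per file
# and exits non-zero on any mismatch or missing file.
(cd "$work/destination" && md5sum -c "$work/checksum.list")
```

Note that this catches corrupted or missing copies, but not extra files that exist only on the destination; the diff or rsync approaches cover that case.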
Wouldn’t diff be faster because it doesn’t have to read to the end of every file and it isn’t really calculating anything? Or am I looking at this in the wrong way.
Hi,
[snip]
rsync obviously offers the ‘exist in only one place’ feature but also offers checksum comparisons (in version 3 and higher, I understand)…
-c, --checksum
This changes the way rsync checks if the files have been changed
and are in need of a transfer. Without this option, rsync uses
a “quick check” that (by default) checks if each file’s size and
time of last modification match between the sender and receiver.
This option changes this to compare a 128-bit checksum for each
file that has a matching size. Generating the checksums means
that both sides will expend a lot of disk I/O reading all the
data in the files in the transfer (and this is prior to any
reading that will be done to transfer changed files), so this
can slow things down significantly.
The sending side generates its checksums while it is doing the
file-system scan that builds the list of the available files.
The receiver generates its checksums when it is scanning for
changed files, and will checksum any file that has the same size
as the corresponding sender’s file: files with either a changed
size or a changed checksum are selected for transfer.
Note that rsync always verifies that each transferred file was
correctly reconstructed on the receiving side by checking a
whole-file checksum that is generated as the file is transferred,
but that automatic after-the-transfer verification has
nothing to do with this option’s before-the-transfer “Does this
file need to be updated?” check.
For protocol 30 and beyond (first supported in 3.0.0), the
checksum used is MD5. For older protocols, the checksum used is
MD4.
Rich.
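Putting that man-page description to work as a pure comparison, a rough sketch (paths are placeholders; this assumes a reasonably recent rsync 3.x): combining -c with --dry-run and -i (itemize changes) lists every file that would be transferred, i.e. files that are missing or whose checksums differ, without copying anything:

```shell
# Sketch: rsync as a checksum-based comparator; nothing is modified.
base=$(mktemp -d)
mkdir -p "$base/src" "$base/dst"
echo "alpha" > "$base/src/a"
echo "alpha" > "$base/dst/a"
echo "beta"  > "$base/src/b"   # present only on the source side

# -r recurse, -i itemize differences, -c compare by checksum,
# -n (--dry-run) report only, transfer nothing.
rsync -ricn "$base/src/" "$base/dst/"
```

Files with matching checksums produce no itemized line, so only the one-sided file b shows up in the output; swap the two arguments to catch files that exist only on the destination.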
Speed was not stated as a requirement, although the MD5 algorithm is fast by design and the question did ask to “compare file hashes”. Nevertheless, I also use diff to compare. It depends on your needs …
I did end up using diff which seemed to work well.
Thank you, this time I used diff.
Thank you, saving this for the future.
Ok!
Great, used as suggested!
In article <20171027175431.e265479c4f9b4658fe2179bf@sasktel.net>, Frank Cox wrote:
If the files are the same (which is what the OP is hoping), then diff does indeed have to read to the end of both files to be certain of this. Only if they differ can it stop reading the files as soon as a difference between them is found.
Cheers Tony
I typically use ‘rsync -av -c --dry-run ${dir1}/ ${dir2}/’ (or some variation) for this.