yes, but it depends on the usage. if you are comparing files to find a match (let's say trying to find a file in a database of millions of files), you hash the file and then search for files whose hash starts with the same first N characters. it's "fuzzy" but performs better, and within reasonable certainty
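A minimal sketch of that prefix lookup, in Python. The names `PREFIX_LEN`, `build_index` and `find_matches` are made up for illustration, and the final full-digest comparison is an extra step added here to weed out prefix-only matches; it is not something the comment above requires.

```python
import hashlib
from collections import defaultdict

PREFIX_LEN = 8  # hypothetical choice for "first N characters"


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 hex characters."""
    return hashlib.md5(data).hexdigest()


def build_index(blobs: dict[str, bytes]) -> dict[str, list[tuple[str, str]]]:
    """Group (name, full digest) pairs under the first PREFIX_LEN hex chars."""
    index = defaultdict(list)
    for name, data in blobs.items():
        digest = md5_hex(data)
        index[digest[:PREFIX_LEN]].append((name, digest))
    return index


def find_matches(index, data: bytes) -> list[str]:
    """Look up candidates by prefix, then confirm with the full digest."""
    digest = md5_hex(data)
    candidates = index.get(digest[:PREFIX_LEN], [])
    return [name for name, full in candidates if full == digest]


# In-memory stand-in for the "database of millions of files".
blobs = {"a.txt": b"hello\n", "b.txt": b"world\n", "c.txt": b"hello\n"}
index = build_index(blobs)
matches = find_matches(index, b"hello\n")
print(sorted(matches))
```

The prefix only narrows the search. Whether you then trust the prefix alone or confirm with the full hash is the risk trade-off in question.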
if you're checking something like a password, or something more sensitive like a key, then yes, you need to match the entire string, because a partial match is much easier to brute force.
so to answer the original question it's really about accepting your level of risk vs. performance/convenience benefits.
for total accuracy you have to check the whole string, and even that only gives you a (very high) mathematical probability that you have the correct file. as someone mentioned already, it is now possible for two different files to generate the same md5 sum (this is called a "collision", and it is why md5 is considered broken). collisions like that can be constructed deliberately; running into one by accident is still extremely improbable.
Source: Git (sha1 but still relevant) - https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection#Short-SHA-1
Generally, eight to ten characters are more than enough to be unique within a project. As an example, the Linux kernel, which is a pretty large project with over 450k commits and 3.6 million objects, has no two objects whose SHA-1s overlap more than the first 11 characters.
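To put rough numbers on prefix length, here is a back-of-the-envelope sketch using the standard birthday approximation. It assumes hash output is uniformly random. The function name `collision_probability` is made up for illustration, and the 3.6 million figure is taken from the quote above.

```python
import math


def collision_probability(num_objects: int, hex_chars: int) -> float:
    """Approximate chance that at least two of num_objects hashes share
    the same first hex_chars hex characters (birthday approximation,
    assuming uniformly random hashes)."""
    space = 16 ** hex_chars  # number of distinct prefixes of that length
    pairs = num_objects * (num_objects - 1) / 2
    # 1 - exp(-pairs / space), written with expm1 for numerical accuracy
    return -math.expm1(-pairs / space)


for k in (7, 8, 10, 12):
    print(k, collision_probability(3_600_000, k))
```

Note that this measures something stricter than what the quote is about: it asks whether any two objects in the whole store share a prefix, not whether one particular abbreviation is ambiguous. Under that stricter reading, some pair sharing 8 characters is all but certain with millions of objects, while at 12 characters the estimate drops to a couple of percent, which fits the observation that no two kernel objects overlap beyond the first 11 characters.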
captbrogers (edited)
Yes, cryptography is just math. Very advanced math, but it is math. If you forget to carry a digit when adding two giant numbers, how much can it affect the result?
That's just an equation with two variables and one operation. Now imagine a very complex formula with dozens of variables and multiple operations. If one number is off at all, how much can it affect the outcome? That is the reason it has to check the entire string (or file, whatever the case may be). If it didn't check the entire string, something could be changed somewhere, and that would defeat the purpose of using it to check integrity. When checking that two things match, it doesn't help much to only partially check them.
omegletrollz 1 point (+1|-0)
Yes, it is necessary to check the entire string. Chances are very high that the entire string will change for just one flipped bit, but it's not guaranteed: maybe only the last character will change. Also, if I recall correctly, it's not even guaranteed that two files with the same md5sum are actually the same. I know, mind-blowing, but it's one of those kinds of things that are as likely to happen as a solar flare destroying all life on Earth. You can be pretty sure it won't occur, but it could. It's just something that is bound to happen if you reduce files that could hold gigabytes of data to 32 hex characters (128 bits, i.e. 16 bytes). For most uses you'll be fine, but don't use md5 as a map index or anything of the kind.
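This is easy to try for yourself. A small sketch, assuming Python's standard `hashlib`; the sample input and the helper names `flip_bit` and `differing_hex_positions` are made up for illustration.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def differing_hex_positions(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


original = b"The quick brown fox jumps over the lazy dog"
modified = flip_bit(original, 0)  # flip the lowest bit of the first byte

h1 = hashlib.md5(original).hexdigest()
h2 = hashlib.md5(modified).hexdigest()
diff = differing_hex_positions(h1, h2)

print(h1)
print(h2)
print(diff, "of", len(h1), "hex characters differ")
```

In practice most of the 32 characters will differ, but which ones do is not something you can rely on, which is why comparing only part of the digest is weaker than comparing all of it.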
4324992? 1 point (+1|-0)
I heard Brian Greene on the YMIW podcast give a similar answer... he had been asked at a conference if there is any truth to the statement that 'hot water freezes faster than cold water.'
He said he wishes people with access to hot water, cold water, and a freezer would actually try to find out on their own rather than driving to a conference and waiting for the Q&A section to ask someone else.
abear 1 point (+1|-0)
I love that i got downvoted in a linux thread for giving a proper answer. how fuckin hard is it to change a bit in a file and run an md5sum against it? less time than it would take to post a shitty question on here. this is the essence of linux and yet we're still coddling imbeciles.
DrBunsen
I do not understand the question?
But md5sums are generated by all the data of a given file or string, so leaving one bit out changes the output.