Why are MD5 hash values not reversible?

Question 1

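One concept I've always wondered about is the use of cryptographic hash functions and values. I understand that these functions can generate a hash value that is unique and virtually impossible to reverse, but here's what I've always wondered:

If I, on my server, in PHP produce:

md5("stackoverflow.com") = "d0cc85b26f2ceb8714b978e07def4f6e"

When you run the same string through an MD5 function, you get the same result on your PHP installation. A process is being used to produce some value from some starting value.

Doesn't this mean that there is some way to deconstruct what is happening and reverse the hash value?

What is it about these functions that makes the resulting strings impossible to retrace?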

Question 2

The input material can be of unbounded length, whereas the output is always 128 bits long. This means that an infinite number of input strings will generate the same output.

If you pick a random number and divide it by 2 but only write down the remainder, you'll get either a 0 or 1 -- even or odd, respectively. Is it possible to take that 0 or 1 and get the original number?
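The remainder analogy can be tried directly. This is a minimal Python sketch, not anything MD5 does internally; the function name `parity` and the sample numbers are made up for illustration. It groups a few inputs by their one-bit "hash" to show that the output alone cannot tell you which input produced it.

```python
def parity(n: int) -> int:
    """A one-bit 'hash': keep only the remainder of division by 2."""
    return n % 2

# Hypothetical sample inputs, grouped by the output they produce.
inputs = [4, 10, 1_000_000, 7, 13, 99_999]
by_output = {}
for n in inputs:
    by_output.setdefault(parity(n), []).append(n)

print(by_output)  # {0: [4, 10, 1000000], 1: [7, 13, 99999]}
```

Given only a 0, every even number is an equally valid candidate for the original.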

Question 3

If hash functions such as MD5 were reversible, it would have been a watershed event in the history of data compression algorithms! It's easy to see that if MD5 were reversible, then arbitrary chunks of data of arbitrary size could be represented by a mere 128 bits without any loss of information. Thus you would have been able to reconstruct the original message from a 128-bit number regardless of the size of the original message.
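The counting argument behind this can be checked on a small scale. The sketch below is an illustration under stated assumptions: `tiny_hash` is a hypothetical helper that keeps only the first byte of an MD5 digest, so it has 256 possible outputs. Feeding it 257 distinct inputs must then produce at least one repeated output, which is why no lossless "decompression" from the hash can exist.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """Hypothetical 8-bit hash: the first byte of the MD5 digest."""
    return hashlib.md5(data).digest()[0]

def find_collision(limit: int = 257):
    """Hash the strings "0", "1", ... and return the first two that collide."""
    seen = {}
    for i in range(limit):
        msg = str(i).encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg
    return None  # unreachable for limit > 256 (pigeonhole principle)

collision = find_collision()
print(collision)

# The same counting applies to full MD5: there are more 17-byte strings
# than there are 128-bit outputs.
print(256 ** 17 > 2 ** 128)  # True
```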

Question 4

Contrary to what the most upvoted answers here emphasize, the non-injectivity (i.e. that there are several strings hashing to the same value) of a cryptographic hash function caused by the difference between large (potentially infinite) input size and fixed output size is not the important point – actually, we prefer hash functions where those collisions happen as seldom as possible.

Consider this function (in PHP notation, as in the question):

function simple_hash($input) {
     return bin2hex(substr(str_pad($input, 16), 0, 16));
}

This appends some spaces, if the string is too short, and then takes the first 16 bytes of the string, then encodes it as hexadecimal. It has the same output size as an MD5 hash (32 hexadecimal characters, or 16 bytes if we omit the bin2hex part).

print simple_hash("stackoverflow.com");

This will output:

737461636b6f766572666c6f772e636f

This function also has the same non-injectivity property as highlighted by Cody's answer for MD5: We can pass in strings of any size (as long as they fit into our computer), and it will output only 32 hex-digits. Of course it can't be injective.

But in this case, it is trivial to find a string which maps to the same hash (just apply hex2bin on your hash, and you have it). If your original string was at most 16 bytes long, you even get this original string back (plus any padding spaces); for our 17-byte example you get its first 16 bytes. Nothing of this kind should be possible for MD5, even if you know the length of the input was quite short (other than by trying all possible inputs until we find one that matches, i.e. a brute-force attack).
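For readers who want to try this outside PHP, here is a rough Python port of simple_hash together with its trivial inverse. It is a sketch: `invert_simple_hash` is a name invented here and stands in for the hex2bin step described above.

```python
def simple_hash(data: bytes) -> str:
    """Pad with spaces to 16 bytes, keep the first 16 bytes, hex-encode."""
    return data.ljust(16, b" ")[:16].hex()

def invert_simple_hash(digest: str) -> bytes:
    """Hypothetical helper: recover a preimage by undoing the hex encoding."""
    return bytes.fromhex(digest)

digest = simple_hash(b"stackoverflow.com")
print(digest)                      # 737461636b6f766572666c6f772e636f
print(invert_simple_hash(digest))  # b'stackoverflow.co'
```

The recovered string hashes to the same value as the original, so it is a valid preimage even though the 17th byte is lost.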

The important assumptions for a cryptographic hash function are:

it is hard to find any string producing a given hash (preimage resistance)
it is hard to find any different string producing the same hash as a given string (second preimage resistance)
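it is hard to find any pair of strings having the same hash (collision resistance)

Obviously, my simple_hash function fulfills none of these conditions. (Actually, if we restrict the input space to "16-byte strings", my function becomes injective, and is thus even provably second-preimage resistant and collision resistant.)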

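Collision attacks against MD5 now exist (e.g. it is possible to produce a pair of strings, even with a given common prefix, that have the same hash, with a considerable but not impossible amount of work), so you shouldn't use MD5 for anything critical. There is no practical preimage attack yet, but attacks only get better.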

To answer the actual question:

What is it about these functions that makes the resulting strings impossible to retrace?

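What MD5 (and other hash functions built on the Merkle-Damgard construction) effectively does is apply an encryption algorithm with the message as the key and some fixed value as the "plaintext", using the resulting ciphertext as the hash. (Before that, the input is padded and split into blocks; each of these blocks is used as the key to encrypt the output of the previous block, and the result is combined with that input, by XOR or, in MD5's case, by modular addition, to prevent reverse calculation.)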

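Modern encryption algorithms (including the ones used in hash functions) are built to make it hard to recover the key, even given both plaintext and ciphertext (or even when the adversary chooses one of them). They generally do this by performing lots of bit-shuffling operations in such a way that each output bit is determined by each key bit (several times over) and by each input bit. That way you can only easily retrace what happens inside if you know the full key and either the input or the output.

For MD5-like hash functions and a preimage attack (with a single-block hashed string, to make things easier), you only have the input and output of your encryption function, but not the key (which is what you are looking for).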
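The structure described above can be sketched in a few lines of Python. This is a toy under loud assumptions: `toy_encrypt`, `toy_hash`, the 32-bit state, the 8-byte blocks and all constants are invented for illustration, the result is trivially breakable, and none of it is MD5's real round function. It only shows the shape: each message block acts as the key, the previous state is the "plaintext", and the old state is added back in afterwards.

```python
import struct

MASK32 = 0xFFFFFFFF

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def toy_encrypt(key_block: bytes, plaintext: int) -> int:
    """Toy keyed permutation (easy to undo if you know the key)."""
    k0, k1 = struct.unpack("<II", key_block)
    state = plaintext
    for round_no in range(8):
        round_key = k0 if round_no % 2 == 0 else k1
        state = (state + round_key) & MASK32
        state = rotl32(state, 7) ^ ((round_no * 0x9E3779B9) & MASK32)
    return state

def toy_hash(message: bytes) -> str:
    """Merkle-Damgard style iteration over 8-byte blocks of the padded message."""
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % 8)
    padded += struct.pack("<Q", len(message) * 8)  # append the bit length
    state = 0x67452301  # arbitrary fixed starting value
    for offset in range(0, len(padded), 8):
        block = padded[offset:offset + 8]
        # The block is the key; adding the old state back in is the
        # feed-forward step that blocks simply running the cipher backwards.
        state = (toy_encrypt(block, state) + state) & MASK32
    return f"{state:08x}"

print(toy_hash(b"stackoverflow.com"))
```

Someone attacking this construction sees the fixed starting value and the final state, and has to find the "key", that is, the message.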

Question 5

Cody Brocious's answer is the right one. Strictly speaking, you cannot "invert" a hash function because many strings are mapped to the same hash. Notice, however, that either finding one string that gets mapped to a given hash, or finding two strings that get mapped to the same hash (i.e. a collision), would be major breakthroughs for a cryptanalyst. The great difficulty of both these problems is the reason why good hash functions are useful in cryptography.

Question 6

MD5 does not create a unique hash value; the goal of MD5 is to quickly produce a value that changes significantly based on a minor change to the source.

E.g.,

"hello" -> "1ab53"
"Hello" -> "993LB"
"ZR#!RELSIEKF" -> "1ab53"

(Obviously those are not actual MD5 hashes.)

Most hashes (if not all) are also non-unique; rather, they're unique enough, so a collision is highly improbable, but still possible.
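To see the "changes significantly on a minor change" behaviour with real MD5 values rather than the made-up ones above, a short Python sketch can count how many of the 128 output bits differ between two inputs. The helper name `md5_bits_changed` is an assumption of this example; `hashlib.md5` is the standard-library function.

```python
import hashlib

def md5_bits_changed(a: bytes, b: bytes) -> int:
    """Count the bit positions at which the MD5 digests of a and b differ."""
    x = int.from_bytes(hashlib.md5(a).digest(), "big")
    y = int.from_bytes(hashlib.md5(b).digest(), "big")
    return bin(x ^ y).count("1")

print(hashlib.md5(b"hello").hexdigest())
print(hashlib.md5(b"Hello").hexdigest())
print(md5_bits_changed(b"hello", b"Hello"), "of 128 bits differ")
```

For a well-mixed hash, one would expect roughly half of the bits to flip even though the inputs differ in a single letter's case.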

Question 7

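A good way to think about a hash algorithm is to think of resizing an image in Photoshop. Say you shrink a 5000x5000 image down to 32x32. What you have now is still a representation of the original image, but it is much smaller and has effectively "thrown away" certain parts of the image data to fit the smaller size. So if you resize that 32x32 image back up to 5000x5000, all you get is a blurry mess. However, because a 32x32 image is not that large, it is theoretically conceivable that a different image could be downsized to produce exactly the same pixels.

That's just an analogy, but it helps in understanding what a hash is doing.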

Question 8

A hash collision is much more likely than you would think. Take a look at the birthday paradox to get a greater understanding of why that is.
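The birthday effect is easy to put numbers on. The following Python sketch assumes hash outputs behave like uniformly random values; the function names are invented for this example. The first function computes the exact duplicate probability for small cases, and the second uses the standard approximation 1 - exp(-n(n-1)/(2N)) so that hash-sized spaces can be handled.

```python
import math

def birthday_collision_probability(n: int, space: int) -> float:
    """Exact chance that n uniform random picks from `space` values contain a duplicate."""
    if n > space:
        return 1.0
    p_unique = 1.0
    for i in range(n):
        p_unique *= (space - i) / space
    return 1.0 - p_unique

def approx_collision_probability(n: int, space: int) -> float:
    """Approximation 1 - exp(-n(n-1)/(2*space)), usable for very large spaces."""
    return -math.expm1(-n * (n - 1) / (2 * space))

# 23 people already give a better-than-even chance of a shared birthday.
print(birthday_collision_probability(23, 365))

# About 2**64 random 128-bit hashes give a substantial collision chance,
# far fewer than the 2**128 possible outputs.
print(approx_collision_probability(2 ** 64, 2 ** 128))
```

This is why a 128-bit hash offers at most about 64 bits of collision resistance, even before any structural weakness in the algorithm is taken into account.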

Question 9

As the number of possible input files is larger than the number of 128-bit outputs, it's impossible to uniquely assign an MD5 hash to each possible file.

Cryptographic hash functions are used for checking data integrity or digital signatures (the hash being signed for efficiency). Changing the original document should therefore mean the original hash doesn't match the altered document.

These criteria are sometimes used:

Preimage resistance: for a given hash function and given hash, it should be difficult to find an input that has the given hash for that function.
Second preimage resistance: for a given hash function and input, it should be difficult to find a second, different, input with the same hash.
Collision resistance: for a given hash function, it should be difficult to find two different inputs with the same hash.

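These criteria are chosen to make it difficult to find a document that matches a given hash; otherwise it would be possible to forge documents by replacing the original with one that matches its hash. (Even if the replacement is gibberish, merely replacing the original may cause disruption.)

Number 3 implies number 2.

As for MD5 in particular, it has been shown to be flawed: see How to Break MD5 and Other Hash Functions.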

Question 10

But this is where rainbow tables come into play. Basically it is just a large number of values hashed separately, with the results saved to disk. Then the reversing bit is "just" to do a lookup in a very large table.

Obviously this is only feasible for a subset of all possible input values but if you know the bounds of the input value it might be possible to compute it.
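The "known bounds" case can be sketched in Python as a plain precomputed lookup table. This is a simplification: real rainbow tables store hash chains to save space, which is not shown here. The function name `build_lookup_table` and the 4-digit PIN scenario are assumptions made for illustration.

```python
import hashlib
from itertools import product

def build_lookup_table(alphabet: str, length: int) -> dict:
    """Precompute MD5 for every string of the given length over the alphabet."""
    table = {}
    for chars in product(alphabet, repeat=length):
        candidate = "".join(chars)
        table[hashlib.md5(candidate.encode()).hexdigest()] = candidate
    return table

# Hypothetical scenario: the input is known to be a 4-digit PIN.
pin_table = build_lookup_table("0123456789", 4)

target = hashlib.md5(b"4821").hexdigest()
print(pin_table.get(target))  # 4821
```

The table has only 10,000 entries here; the approach stops being feasible once the input space grows beyond what can be enumerated and stored.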

Question 11

Chinese scientists have found a method, called "chosen-prefix collisions", for producing a collision between two different strings.

Example: http://www.win.tue.nl/hashclash/fastcoll_v1.0.0.5.exe.zip
Source code: http://www.win.tue.nl/hashclash/fastcoll_v1.0.0.5_source.zip

Question 12

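The best way to understand what the most upvoted answers mean is to actually try to reverse the MD5 algorithm. I remember trying to reverse the MD5crypt algorithm some years ago, not to recover the original message, because that is clearly impossible, but just to generate a message that would produce the same hash as the original. This would, at least theoretically, have given me a way to log in to a Linux device that stored user:password in the /etc/passwd file, using the generated message (password) instead of the original one. As both messages would have the same resulting hash, the system would accept my password (generated from the original hash) as valid. That didn't work at all. After several weeks, if I remember correctly, the use of a salt in the initial message defeated me. I had to produce not just a valid initial message, but a salted valid initial message, which I was never able to do. But the knowledge I gained from this experiment was worthwhile.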

Question 13

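As most have already said, MD5 was designed to hash variable-length data streams into a fixed-length chunk of data, so a single hash is shared by many input data streams.

However, if you ever do need to find the original data from the checksum, for example if you have the hash of a password and need to find the original password, it is often faster to just google the hash (or use whatever search engine you prefer) than to brute-force it. I have successfully found a few passwords using this method.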

Question 14

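By definition, a hash (cryptographic hash) function should not be invertible and should not have collisions (or as few as possible).

Regarding your question: it is a one-way hash. The input (irrespective of length) generates a fixed-size output (the input is padded depending on the algorithm; for MD5, to a 512-bit boundary). The information is compressed (lossily), so it cannot be regenerated by any reverse transformation.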
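The padding to a 512-bit boundary mentioned above can be sketched in Python. This follows the padding rule as specified for MD5 (a single 0x80 byte, zero bytes up to 56 bytes modulo 64, then the original bit length as a 64-bit little-endian integer); the function name `md5_pad` is made up here, and the sketch covers only the padding step, not the hashing itself.

```python
import struct

def md5_pad(message: bytes) -> bytes:
    """Pad a message to a multiple of 64 bytes (512 bits) in MD5's style."""
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"                       # a single 1 bit, then zeros
    padded += b"\x00" * ((56 - len(padded)) % 64)    # fill up to 56 mod 64 bytes
    padded += struct.pack("<Q", bit_length)          # 64-bit little-endian length
    return padded

for size in (0, 3, 55, 56, 64):
    print(size, "->", len(md5_pad(b"a" * size)))
```

Note that a 56-byte message needs a whole extra block, because the 0x80 byte and the length field no longer fit into the first one.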

Additional information on MD5: it is vulnerable to collisions. I read this article recently: http://www.win.tue.nl/hashclash/Nostradamus/

Open-source code for cryptographic hash implementations (MD5 and SHA) can be found in the Mozilla code (the freebl library).

Question 15

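Nowadays, MD5 hashes (or any other hashes, for that matter) are precomputed for huge numbers of strings and stored for easy access. In theory MD5 is not reversible, but with such databases you may be able to find out which text produced a particular hash value.

For example, try the following hash code at http://gdataonline.com/seekhash.php to find out what text I used to compute the hash: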

aea23489ce3aa9b6406ebb28e0cda430

Question 16

f(x) = 1 is irreversible. Hash functions are not irreversible in that absolute sense.

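This is actually required for them to fulfil their function of determining whether someone possesses an uncorrupted copy of the hashed data. It brings with it a susceptibility to brute-force attacks, which are quite powerful these days, particularly against MD5.

There is confusion, here and elsewhere, among people who have mathematical knowledge but little codebreaking knowledge. Several ciphers simply XOR the data with a keystream, so you could say that a ciphertext corresponds to every plaintext of that length, because any keystream could have been used.

However, this ignores that a reasonable plaintext produced from the seed password is much more likely than another produced by the seed Wsg5Nm^bkI4EgxUOhpAjTmTjO0F!VkWvysS6EEMsIJiTZcvsh@WI$IH$TYqiWvK!%&Ue&nk55ak%BX%9!NnG%32ftud%YkBO$U6o, to the extent that anyone claiming the second was a possibility would be laughed at.

In the same way, if you are trying to decide between the two potential passwords password and Wsg5Nm^bkI4EgxUO, it is not as hard as some mathematicians would have you believe.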

Question 17

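I like all the various arguments. The real value of hashed values is simply to provide human-unreadable placeholders for strings such as passwords. There is no specific enhanced security benefit. If an attacker gains access to a table of hashed passwords, they can:

Hash a password of their choice and place the result in the password table, if they have write/edit rights to the table.
Generate hash values of common passwords and test whether matching hash values exist in the password table.

In this case, weak passwords cannot be protected by the mere fact that they are hashed.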
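The second attack can be sketched in a few lines of Python. Everything in this example is hypothetical: the user names, the stolen table and the word list are invented, and the table is assumed to store plain unsalted MD5 hashes.

```python
import hashlib

def md5_hex(password: str) -> str:
    """Unsalted MD5 of a password, as a hex string."""
    return hashlib.md5(password.encode()).hexdigest()

# Hypothetical stolen table of user -> unsalted MD5 hash.
stolen_table = {
    "alice": md5_hex("password"),
    "bob": md5_hex("letmein"),
    "carol": md5_hex("Wsg5Nm^bkI4EgxUO"),
}

# Hypothetical list of common passwords.
common_passwords = ["123456", "password", "qwerty", "letmein"]

def crack(table: dict, wordlist: list) -> dict:
    """Return the users whose hash matches the hash of a word in the list."""
    hash_to_word = {md5_hex(word): word for word in wordlist}
    return {user: hash_to_word[h] for user, h in table.items() if h in hash_to_word}

print(crack(stolen_table, common_passwords))
# {'alice': 'password', 'bob': 'letmein'}
```

The account with the long random password is not recovered, which matches the point above: hashing alone does nothing for weak passwords.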