# The Merkle–Damgård Transform#

Let (Gen, $$h$$ ) be a compression function for inputs of length $$n+n^{\prime} \geq 2 n$$ with output length $$n$$. Fix $$\ell \leq n^{\prime}$$ and $$I V \in\{0,1\}^{n}$$.

Construct hash function (Gen, $$H$$ ) as follows:

• Gen: remains unchanged.

• $$H$$ : on input a key $$s$$ and a string $$x \in\{0,1\}^{*}$$ of length $$L<2^{\ell}$$, do:

1. Append a 1 to $$x$$, followed by enough zeros so that the length of the resulting string is $$\ell$$ less than a multiple of $$n^{\prime}$$. Then append $$L$$, encoded as an $$\ell$$-bit string. Parse the resulting string as the sequence of $$n^{\prime}$$-bit blocks $$x_{1}, \ldots, x_{B}$$.

2. Set $$z_{0}:=I V$$.

3. For $$i=1, \ldots, B$$, compute $$z_{i}:=h^{s}\left(z_{i-1} \| x_{i}\right)$$.

4. Output $$z_{B}$$.

Theorem 1

If $$(\mathrm{Gen}, h)$$ is collision resistant, then so is $$(\mathrm{Gen}, H)$$

Proof. Let $$\mathcal{A}$$ be a probabilistic polynomial-time adversary and $$\varepsilon$$ a function such that $$\mathcal{A}$$ succeeds in the $$\mathrm{Hash}$$-$$\mathrm{coll}$$ experiment with (Gen, $$H$$ ) with probability $$\varepsilon$$. We construct an adversary $$\mathcal{A}^{\prime}$$ that succeeds in the $$\mathrm{Hash}$$-$$\mathrm{coll}$$ experiment with (Gen,$$h)$$ with probability $$\varepsilon$$.

Upon receiving $$s$$, $$\mathcal{A}^{\prime}$$ invokes $$\mathcal{A}$$ upon $$s$$ and receives back a pair $$x, x^{\prime} .$$

We show that for any $$s$$, a collision in $$H^{s}$$ yields a collision in $$h^{s}$$. Let $$x$$ and $$x^{\prime}$$ be two different strings of length $$L$$ and $$L^{\prime}$$, respectively, such that $$H^{s}(x)=H^{s}\left(x^{\prime}\right)$$. Let $$x_{1}, \ldots, x_{B}$$ be the $$B$$ blocks of the padded $$x$$, and let $$x_{1}^{\prime}, \ldots, x_{B^{\prime}}^{\prime}$$ be the $$B^{\prime}$$ blocks of the padded $$x^{\prime}$$. Let $$z_{0}, z_{1}, \ldots, z_{B}$$ (resp., $$z_{0}^{\prime}, z_{1}^{\prime}, \ldots, z_{B^{\prime}}^{\prime}$$ ) be the intermediate results during computation of $$H^{s}(x)$$ (resp., $$\left.H^{s}\left(x^{\prime}\right)\right)$$. There are two cases to consider:

Case 1: $$L \neq L^{\prime}$$. In this case, the last step of the computation of $$H^{s}(x)$$ is $$z_{B}:=h^{s}\left(z_{B-1} \| x_{B}\right)$$, and the last step of the computation of $$H^{s}\left(x^{\prime}\right)$$ is $$z_{B^{\prime}}^{\prime}:=h^{s}\left(z_{B^{\prime}-1}^{\prime} \| x_{B^{\prime}}^{\prime}\right)$$. Since $$H^{s}(x)=H^{s}\left(x^{\prime}\right)$$ we have $$h^{s}\left(z_{B-1} \| x_{B}\right)=$$ $$h^{s}\left(z_{B^{\prime}-1}^{\prime} \| x_{B^{\prime}}^{\prime}\right)$$. However, $$L \neq L^{\prime}$$ and so $$x_{B} \neq x_{B^{\prime}}^{\prime}$$. (Recall that the last $$\ell$$ bits of $$x_{B}$$ encode $$L$$, and the last $$\ell$$ bits of $$x_{B^{\prime}}^{\prime}$$ encode $$L^{\prime}$$.) Thus, $$z_{B-1} \| x_{B}$$ and $$z_{B^{\prime}-1}^{\prime} \| x_{B^{\prime}}^{\prime}$$ are a collision with respect to $$h^{s}$$.

Case 2: $$L=L^{\prime}$$. This means that $$B=B^{\prime}$$. Let $$I_{i} \stackrel{\text { def }}{=} z_{i-1} \| x_{i}$$ denote the $$i$$ th input to $$h^{s}$$ during computation of $$H^{s}(x)$$, and define $$I_{B+1} \stackrel{\text { def }}{=} z_{B}$$. Define $$I_{1}^{\prime}, \ldots, I_{B+1}^{\prime}$$ analogously with respect to $$x^{\prime}$$. Let $$N$$ be the largest index for which $$I_{N} \neq I_{N}^{\prime}$$. Since $$|x|=\left|x^{\prime}\right|$$ but $$x \neq x^{\prime}$$, there is an $$i$$ with $$x_{i} \neq x_{i}^{\prime}$$ and so such an $$N$$ certainly exists. Because

$I_{B+1}=z_{B}=H^{s}(x)=H^{s}\left(x^{\prime}\right)=z_{B}^{\prime}=I_{B+1}^{\prime},$

we have $$N \leq B$$. By maximality of $$N$$, we have $$I_{N+1}=I_{N+1}^{\prime}$$ and in particular $$z_{N}=z_{N}^{\prime}$$. But this means that $$I_{N}, I_{N}^{\prime}$$ collide under $$h^{s}$$.

• If $$H^{s}(x) \neq H^{s}\left(x^{\prime}\right)$$ then $$\mathcal{A}^{\prime}$$ just halts.

Otherwise,

• IF $$L \neq L^{\prime}$$ then $$\mathcal{A}^{\prime}$$ outputs $$z_{B} \| L$$ and $$z_{B^{\prime}}^{\prime} \| L^{\prime}$$ and halts.

• IF $$L=L^{\prime}$$, then $$\mathcal{A}^{\prime}$$ works backwards from the last block to the first to find the maximal $$N$$.

• Then, $$\mathcal{A}^{\prime}$$ outputs $$I_N$$ and $$I^{\prime}_N$$ and halts.

Note that $$\mathcal{A}^{\prime}$$ can compute all the $$z$$-values by computing $$H$$ and storing all the intermediate values. So whenever $$\mathcal{A}$$ finds a collision, $$\mathcal{A}^{\prime}$$ also finds a collision. Thus, $$\mathcal{A}^{\prime}$$ succeeds in $$\mathrm{Hash}$$-$$\mathrm{coll}$$ for (Gen, $$\left.h\right)$$ with probability $$\varepsilon$$, implying that $$\varepsilon(n)$$ is negligible. Thus, (Gen, $$H)$$ is collision-resistant as required.