cryptcoffee

Last Update: Dec 16, 2015
          -----[ Exploiting some weaknesses of PBKDF2 ]-----
----[ 0x00 ] Intro
----[ 0x01 ] PBKDF
----[ 0x02 ] SHA1
----[ 0x03 ] HMAC
----[ 0x04 ] RSA Labs' PBKDF2
----[ 0x05 ] Weaknesses
---------[+] Precomputing a message block
---------[+] Zero based
---------[+] Same value based
---------[+] Precomputing a word expansion
----[ 0x06 ] Conclusions
----[ 0x07 ] References



----[ 0x00 ] Intro

This article is based on a project we started in 2014 @ CLUB (Cryptography and 
Coding Laboratory) @ UNIMI. It has been presented in CANS 2015 [12] and 
published in [13].


+++ The paper can be found here +++


----[ 0x01 ] PBKDF

For several known reasons, in password-based cryptography a humanly manageable
password can't be directly used as a key for a given cryptographic system;
a Password-Based Key Derivation Function (PBKDF), is then used to generate a
proper cryptographic key of the desired length, starting from a user password.
PBKDFs are also used, instead of a simple hash function, for password hashing,
although they weren't originally designed for this purpose.

Generally speaking, a Key Derivation Function, KDF, is a function that take a
source of initial keying material, and derive from it one or more pseudorandom
keys. Key material can be any bit sequence from a source of more or less entropy
[1]. When the initial key material is from a source of low entropy,
specifically a user typed password, these KDFs are known as PBKDFs,
password-based KDFs.

PBKDFs should be relatively expensive to compute, in order to increase the 
computational effort of an exhaustive search attack. In cryptography this is
known as key stretching, a phrase that is now widely used the same as
key strengthening [2].
A typical approach consists in using a salt to prevent building universal
dictionaries, and an iteration count that specifies how many times the
underlying function should be iterated in order to slow down the computation of
the derived key.



----[ 0x2 ] SHA-1

We will not completely explain SHA-1 here, for those who are not familiar with
this hash function please refer to [9]. However to make a recap, in a
pre-processing phase SHA1 appends one bit set to 1 to the original message then
pad it with some bit set to 0, until the resulting message length is congruent
to 448 (mod 512), then appends L, the length of the original message,
represented in a 64-bit big-endian integer.
Therefore the message length is multiple of 512 bits.

The message is broken up into message blocks of 512 bits and then SHA1 processes
each message block as follow:

+ Divides it in sixteen 32bit words
+ Expands this 16 words to a total of 80 through bitwise operation like
  circular shift and exclusive-or
+ Compresses down this 80 words into 5 words applying some other functions
+ These 5 words are known as intermediate hash values

Now the next message block is processed, and each of its intermediate hash value
is added to each one calculated in the previous message block, and so on.
When the last message block is consumed, a concatenation of these 5 words is
returned, consisting in a 160 bit hash value.



----[ 0x3 ] HMAC

HMAC (Hash-based Message Authentication Code) is a specific algo for
calculating a message authentication code (MAC) involving a cryptographic hash
function and a secret key. As any MAC, it provides a way to verify both the
integrity and the authentication of a message.
HMAC can be used with any iterative cryptographic hash function, in combination
with a secret shared key. The cryptographic strength of HMAC depends on the
properties of the underlying hash function [5].
In 2006 M. Bellare proves that HMAC is a PRF under the sole assumption that the
underlying compression function is a PRF [11].
A Pseudo Random Function (PRF), is a function which emulates a random oracle:
no efficient algorithm can distinguish between a PRF and a random oracle output.
Typically, MACs are used between two parties that share a secret key in order to
validate information transmitted between these parties, but in PBKDF2 HMAC is
used as a PRF.

HMAC can be defined as follow:

	HMAC = H(K xor opad, H(K xor ipad, text))

where:
	
	H	is the hash function
	B	is the byte-length of H's internal block. 
	K	is the shared key
	ipad	is the byte 0x36 repeated B times
	opad	is the byte 0x5C repeated B times.
	text	is the message to be authenticated
	
Read the definition of HMAC as follow:

	h = H(K xor ipad || text)
	HMAC = H(K xor opad || h)

If K is shorter than B it's padded with 0s to fit the block size, before xoring
it with either ipad or opad.



----[ 0x04 ] RSA Labs' PBKDF2

RSA Laboratories, in PKCS5, defines widely referenced recommendations for the 
implementation of password-based cryptography [3].
They recommend one algorithm to derive keys: PBKDF2. It is commonly adopted in
many security related applications, for instance authentication on mobile
devices, key derivation for full disk encryption and password hashing.
[cite del paper di LUKS?]

PBKDF2 takes a user passphrase p, a random salt s, an iteration counter c,
and dkLen, the derived key length, and return the derived key DK.

	DK = PBKDF2(p,s,c,dkLen)

Unlike the earlier standard, PBKDF2 can derives keys of arbitrary length,
it generates as many block Ti as needed to cover dkLen, the desired key length.

Each Ti is generated iterating the underlying PRF as many times as specified by
an iteration counter c.
The length of the Ti is bounded by hLen, the length of the PRF output.


The DK is the concatenation of i blocks where i = ceiling(dkLen/hLen).

	DK = T1 || T2 || ... || Ti

where:

	T1 = F(p,s,c,1)
	T2 = F(p,s,c,2)
	.
	.
	.
	Ti = F(p,s,c,i)

Each Ti is computed as:

	Ti = U1 xor U2 xor ... xor Uc

where:

	U1 = PRF(p,S||i)
	U2 = PRF(p,U1)
	.
	.
	.
	Uc = PRF(p,Uc-1)


The numbers of iterations will be a burning question, choosing a reasonable 
value depends on environment and varies from an application to another, it 
should be selected as large as possible, as long the time required to generate 
the key is acceptable for the user.
For very critical keys on very powerful system an iteration count of 10.000.000 
may be appropriate, while a minimum of 1000 iterations is recommended for 
general purpose [4].

Unlike the password, the salt and the iteration count do not need to be kept
secret.

We focus on PBKDF2 with HMAC [5], we show that is possible to avoid most of the
CPU intensive operations which are the core of the PBKDF2 derivation process,
by substituting them with a set of equivalent less onerous instructions, this
way the algorithm strength is significantly reduced.
This entail a consequent decrease of the time one needs to spend performing any
such exhaustive key search attack on password derived keys.
Note that the time needed to try all password derived keys is significantly less
then the time needed to try all possible keys.

The same method can be used with HMAC with many hash functions such as SHA-2
and MD5 but for simplicity we focus on HMAC-SHA1, that is the default for
PBKDF2.

This way it is possible to derive a key in HALF TIME with respect to the
vanilla method.

Moreover we present some other minor optimizations that reduce even more the
derivation time. As a consequence an exhaustive key search attack against a key
derived with PBKDF2 is significantly less expensive, and this is an additional
reason to look for new password based key derivation schemes [6].



----[ 0x5 ] Weaknesses

As we previously said PBKDF2 aims to slow down brute force attacks through CPU
intensive operations in the key derivation process.

In this section we present different improvements in the computation of the
PBKDF2 algorithm. Some of these concern the precomputation of some values,
that can be reused instead of recalculating them during the derivation process.
These precomputations are independent from the cryptographic hash function used
in HMAC, because they are applied directly to the HMAC definition;
other optimizations aim to reduce the number of instructions avoiding useless
operation exactly in the HMAC's hash function, these are clearly dependent from
the specific hash function adopted. 
We will show how it is possible to reduce the strength of PBKDF2 by more than
50% against exhaustive key search attacks.

The improvements we present are not related to any specific implementation of
PBKDF2.

For the following improvements as an example we will use as Salt length the
one define in the LUKS format [7][8].



----[ Precomputing a message block

Deeply analyzing PBKDF2, HMAC, and their interaction, we noticed that it is
possible to halve the time one need to spend on deriving a key from a password,
by simply calculating once and only once the intermediate hash values of the
first message block for the two SHA1 calls in the first HMAC invocation.

These precomputed intermediate hash values will be the same for all the SHA-1
calls in all HMAC invocations of the derivation process.

The main problem is that PBKDF2 calls HMAC always with the same password P,
which pads this password always with the same two values to create a string
as long as the message block length that its underlying hash function works
with. This allows us to precompute the first message block of the two SHA1
calls, and then reuse them in all the subsequent HMAC invocations.

Since we can avoid to process one of the two message block in every SHA-1
call, this optimization reduce by 50% the operations involved in the
derivation process.

As previously said because this improvement exploits a weakness in the way
PBKDF2 uses HMAC, it can be reused with several hash functions such as SHA-2
family [9], MD5, and many others.

Note that only in the case of the most first HMAC call the advantage gained
applying this idea depends on the salt length (defined by the application).
In fact if the salt length exceeds the hash function message block length,
it will be unavoidable to compute more than two message blocks.
In all the other calls the SHA1 of the previous message block is used as salt,
and we will never have to compute more than two message blocks.
Hence this particular most first case will not affect the global gain.





----[ Zero based optimization

Given that every SHA1 call, except the first one in U1, are performed on a
string formatted according to the scheme in the figure below, we can observe
that every time there are 287 zeros in the last 512 bits of the string,
these 512 bits are the second message block processed in every SHA1 call.
There are more than 287 zeros to be precise, because the string end with the
binary representation of L, the length in bit of the message to be hashed,
that in our case, according to HMAC scheme [5], is always 672, this
value is represented on 64 bit, leaving room for more 22 consecutive zeros.

SHA1, in the main word expansion phase, expands its 512-bits message block
iterating t from 16 to 79 using the following function:

	W[t] = ROTL((W[t-3] xor W[t-8] xor W[t-14] xor W[t-16]), 1)

The result is that for a few steps, the word expansion function is working
with some W[t] filled with zeros. We can simply not execute a few XOR operations
because XORing a value with zero result in the same value. This optimize a
little the procedure as suggested by Atom [10].
In the standard word expansion are carried out 192 XOR, knowing this
optimization can be implemented an ad-hoc word expansion that use 165 XOR,
saving 27.
It should be remembered that this word expansion is executed in each SHA1 call,
there are two SHA1 call for each HMAC, and many HMAC call as the number of the
iteration chosen by the application.
This means that there will be probably many SHA1 calls. :)





----[ Same value based optimization

It is possible to avoid some other useless operations in the word expansion of
the first message block too. As we can see from the figure below, the first
message block of each SHA1 call has the same padding values in every invocations,
so, like in the previous zero-based optimization, we can avoid some useless
operations. We should know that XORing a value to itself result in zero.
Supposing that an attacker can organize its password list in same length classes,
for each class the SHA1 word expansion will work with the same number of W[t]'s
filled with the HMAC padding value, 0x36 or 0x5C, hence, depending on the
password length we can avoid more or less XORs, like in the previous 0 based.
The organization in same length classes is essential for the sake of write a
reasonable hypothetical ad-hoc word expansion, that, based on password length,
avoids a few XOR sum.

Note that if used in combination with our precomputed-message-block
optimization, the gain obtained due to this trick is far too low because we
process the first message block only once for each derivation, independently
from the iteration count.

In the standard word expansion are carried out 192 XOR, knowing this
optimization can be implemented an ad-hoc word expansion that use 162 XOR,
saving 30.





----[ Precomputing a word expansion

This is a little improvement strongly related to HMAC-SHA1.
In PBKDF2 the derived key is obtained extracting the first dkLen bytes from the
concatenation of the computed T[i]:

	DK = T1 || T2 || ... || Ti

where:	

	i = ceiling(dkLen/hLen)

Hence to derive a key we have first to compute a certain number of T[i] and then
extract the key from their concatenation. In the computation of each T[i],
more specifically in the first HMAC call of each T[i], are calculated two SHA1
hashes, as specified by the HMAC algorithm; deeply analyzing the first SHA1
invocation we noticed that, as highlighted in the following figure, the second
message block does not depend on the password at all.
Hence, given a salt, the word expansion of this message block produce the same
values independently from the password. This means that an attacker can avoid
to compute this word expansion in each try of its password list, reusing the
same precomputed values.






----[ 0x06 ] Conclusions

Implementation note in RFC2104 [5] describes our "Precomputing message block"
as a performance improvement for message authentication, but in PBKDF2
HMAC should help to slow down the derivation process instead.

We found that in 2015 this weakness affects two of the main crypto libraries: 
Libgcrypt v1.6.3 and ARM mbed TLS v1.3.11.

It's important that every crypto library implements this improvement, to 
better understand how this could affect real world applications we will post 
another interesting article about the LUKS format and Cryptsetup asap.


To be continued ...



----[ 0x07 ] References

[1]  Hugo Krawczyk, Cryptographic Extraction and Key Derivation: The HKDF Scheme 
[2]  Strengthening passwords, Abadi, Martin and Lomas, T.Mark A and Needham,
     Roger
[3]  RSA Laboratories, PKCS #5 V2.1: Password Based Cryptography Standard 
[4]  NIST, SP800-132: Recommendation for password-based key derivation 
[5]  RFC 2104: HMAC: Keyed-hashing for message authentication 
[6]  Password Hashing Competition 
[7]  C. Fruhwirth, New methods in Hard Disk Encryption 
[8]  C. Fruhwirth, LUKS On-Disk Format Specification Version 1.2.1
[9]  NIST, FIPS PUB 180-4: Secure Hash Standard
[10] Jens Steube, Optimizing computation of Hash-Algorithms as an attacker
[11] Bellare, New roofs for NMAC and HMAC: Security without Collision-Resistance
[12] The 14th International Conference on Cryptology and Network Security
[13] Cryptology and Network Security - 14th International Conference, CANS 2015,
     Marrakesh, Morocco, December 10-12, 2015, Proceedings
-----[ Exploiting some weaknesses of PBKDF2 ]-----