[Dovecot] CRAM-MD5 Password Generation Algorithm
Hi,
I'm just in the middle of setting up dovecot to serve IMAPS -- Actually I've finished apart from one thing: CRAM-MD5 passwords.
I'm using SQL as a backend for the password storage, and I don't want to store the passwords in plaintext. I've also configured dovecot to be rather restrictive when it comes to authentication methods (only CRAM-MD5 is allowed).
To generate the passwords to go into the database I can use the dovecotpw utility, but I'm wanting to stick some sort of minimal admin interface on the server to be able to manage the users etc without having to use the CLI.
I've looked at the theoretical explanation of the hashing algorithm, and I've read through the source code that dovecotpw uses to generate the passwords with the intent of creating a higher level language library (Perl, Ruby, PHP ... whatever)) to generate passwords, but I don't seem to be able to replicate the functionality, and there don't seem to be any existing libraries that generate consistent results (that I've found).
I don't have that much experience with C, and so I'm sure that I must have misunderstood how dovecotpw does its stuff. Perhaps someone could explain how the algorithm works? Or point me in the right direction?
Thanks, Douglas Willcocks
- Douglas Willcocks dovecot@dovecot.org:
Hi,
I'm just in the middle of setting up dovecot to serve IMAPS -- Actually I've finished apart from one thing: CRAM-MD5 passwords.
CRAM-MD5 is a shared secret mechanism to prove authenticity without transmitting the password in plaintext. Both parties - server and client - proove that they share a secret.
CRAM-MD5 passwords are passwords saved in plaintext format on the client and (!) on the server. Here's why:
- The server sends a challenge (a random string)
- The client uses the challenge to encrpyt the user password and creates a encrypted string
- The client uses the username and the encrypted string, base64 encodes both to one string (response) and sends that to the server
- The server base64 decodes the response to get the username
- The server uses to username to lookup the corresponding password.
- The server uses the password to decrypt the encrypted client string.
- The server compares the decryption result with the challenge it sent. If they match, server and client share the same secret - the password.
As you can see, the password must be available in plaintext to decrypt the encrypted client string. Databases like /etc/shadow that store passwords encrypted cannot do this. All they can do is answer questions like this: "I do have password 'foo'. If you encrypt that, will it match the value you have stored as password in the database?" Shared secret mechanisms, such as CRAM-MD5, DIGEST-MD5 and NTLM, cannot do with that. They need the password string in unencrypted plaintext.
I'm using SQL as a backend for the password storage, and I don't want to store the passwords in plaintext. I've also configured dovecot to be rather restrictive when it comes to authentication methods (only CRAM-MD5 is allowed).
Then you have to store passwords in plaintext.
To generate the passwords to go into the database I can use the dovecotpw utility, but I'm wanting to stick some sort of minimal admin interface on the server to be able to manage the users etc without having to use the CLI.
Use pwgen.
I've looked at the theoretical explanation of the hashing algorithm, and I've read through the source code that dovecotpw uses to generate the passwords with the intent of creating a higher level language library (Perl, Ruby, PHP ... whatever)) to generate passwords, but I don't seem to be able to replicate the functionality, and there don't seem to be any existing libraries that generate consistent results (that I've found).
I don't have that much experience with C, and so I'm sure that I must have misunderstood how dovecotpw does its stuff. Perhaps someone could explain how the algorithm works? Or point me in the right direction?
HTH,
p@rick
-- state of mind Agentur für Kommunikation, Design und Softwareentwicklung
Patrick Koetter Tel: 089 45227227 Echinger Strasse 3 Fax: 089 45227226 85386 Eching Web: http://www.state-of-mind.de
Amtsgericht München Partnerschaftsregister PR 563
On Sat, 12 Apr 2008 07:52:01 +0200, Patrick Ben Koetter
<p@state-of-mind.de> wrote: > * Douglas Willcocks <dovecot@dovecot.org>: >> Hi, >> >> I'm just in the middle of setting up dovecot to serve IMAPS -- Actually >> I've finished apart from one thing: CRAM-MD5 passwords. > > CRAM-MD5 is a shared secret mechanism to prove authenticity without > transmitting the password in plaintext. Both parties - server and client - > proove that they share a secret. > > CRAM-MD5 passwords are passwords saved in plaintext format on the client > and > (!) on the server. Here's why: >
That's what I first thought.
- The server sends a challenge (a random string)
- The client uses the challenge to encrpyt the user password and creates a encrypted string
- The client uses the username and the encrypted string, base64 encodes both to one string (response) and sends that to the server
- The server base64 decodes the response to get the username
- The server uses to username to lookup the corresponding password.
- The server uses the password to decrypt the encrypted client string.
- The server compares the decryption result with the challenge it sent. If they match, server and client share the same secret - the password.
Thank you for your explanation, I understand how the CRAM-MD5 communication algorithm works, it's more the CRAM-MD5 scheme hashing that is giving me trouble. From experience I know that CRAM-MD5 needs plaintext passwords, but according to the dovecot wiki pages (which I've referenced below) you _can_ have them stored in some sort of hashed form, and I'm trying to understand both how to generate that hashed for and (out of curiosity) how dovecot can manage to use hashed passwords for CRAM-MD5.
As you can see, the password must be available in plaintext to decrypt the encrypted client string. Databases like /etc/shadow that store passwords encrypted cannot do this. All they can do is answer questions like this: "I do have password 'foo'. If you encrypt that, will it match the value you have stored as password in the database?" Shared secret mechanisms, such as CRAM-MD5, DIGEST-MD5 and NTLM, cannot do with that. They need the password string in unencrypted plaintext.
I'm using SQL as a backend for the password storage, and I don't want to store the passwords in plaintext. I've also configured dovecot to be rather restrictive when it comes to authentication methods (only CRAM-MD5 is allowed).
Then you have to store passwords in plaintext.
On the following page : http://wiki.dovecot.org/HowTo/CRAM-MD5
They use dovecotpw to generate a seemingly hashed version of the password and then store it in the password database
username:{CRAM-MD5}26b633ec8bf9dd526293c5897400bddeef9299fad
Perhaps this not an irreversible hash, but more something like (althought it's not) base64? The thing is, it _looks_ like a hash.
For example, using dovecotpw I can generate the hashed (??) version of 'password', which is
{CRAM-MD5}9186d855e11eba527a7a52ca82b313e180d62234f0acc9051b527243d41e2740
I can then place that in the database and the authentication is successful.
What's more, according to http://wiki.dovecot.org/Authentication/PasswordSchemes
You only need to store the password in plaintext if you are using _both_ CRAM-MD5 and DIGEST-MD5.
To quote:
"The problem with non-plaintext auth mechanisms is that the password must be stored either in plaintext, or using a mechanism-specific scheme that's incompatible with all other non-plaintext mechanisms. For example if you're going to use CRAM-MD5 authentication, the password needs to be stored in either PLAIN or CRAM-MD5 scheme. If you want to allow both CRAM-MD5 and DIGEST-MD5, the password must be stored in plaintext."
To generate the passwords to go into the database I can use the dovecotpw utility, but I'm wanting to stick some sort of minimal admin interface on the server to be able to manage the users etc without having to use the CLI.
Use pwgen.
The problem (as stated above) is not how to generate passwords, there are thousands of libraries that can do that relatively well.
I've looked at the theoretical explanation of the hashing algorithm, and I've read through the source code that dovecotpw uses to generate the passwords with the intent of creating a higher level language library (Perl, Ruby, PHP ... whatever)) to generate passwords, but I don't seem to be able to replicate the functionality, and there don't seem to be any existing libraries that generate consistent results (that I've found).
I don't have that much experience with C, and so I'm sure that I must have misunderstood how dovecotpw does its stuff. Perhaps someone could explain how the algorithm works? Or point me in the right direction?
HTH,
p@rick
-- state of mind Agentur für Kommunikation, Design und Softwareentwicklung
Patrick Koetter Tel: 089 45227227 Echinger Strasse 3 Fax: 089 45227226 85386 Eching Web: http://www.state-of-mind.de
Amtsgericht München Partnerschaftsregister PR 563
Thanks for your reply,
Douglas Willcocks
--
At 2:07 PM +0100 4/12/08, Douglas Willcocks imposed structure on a stream of electrons, yielding:
On Sat, 12 Apr 2008 07:52:01 +0200, Patrick Ben Koetter
<p@state-of-mind.de> wrote: [...] > CRAM-MD5 passwords are passwords saved in plaintext format on the client > and > (!) on the server. Here's why: >
That's what I first thought.
It's not technically a plaintext format or an obvious encoding, but if I'm reading the Dovecot code correctly, it is some form of the intermediate "contexts" derived from the password as part of the HMAC-MD5 algorithm.
It is not clear to me whether one could actually reverse that derivation. It seems to me that it should be simple, but no one seems to say that explicitly about the practice of storing the pre-calculated contexts rather than the actual password, so maybe I'm missing something. It may be that actual implementations always use the MD5 of the actual password as the key the context calculation, rather than using the password itself. That would make the contexts 16 bytes each plus padding, which would explain the Dovecot 'CRAM-MD5' storage format.
What IS clear is that with the HMAC-MD5 contexts one can authenticate as the user using CRAM-MD5, and that CRAM-MD5 requires the server to store either a recoverable plaintext password or the HMAC-MD5 contexts derived from it.
[...]
Perhaps this not an irreversible hash, but more something like (althought it's not) base64? The thing is, it _looks_ like a hash.
I'm pretty sure that it is a pair of 16-byte values derived from the password, represented in hexadecimal and concatenated.
For example, using dovecotpw I can generate the hashed (??) version of 'password', which is
{CRAM-MD5}9186d855e11eba527a7a52ca82b313e180d62234f0acc9051b527243d41e2740
I can then place that in the database and the authentication is successful.
What's more, according to http://wiki.dovecot.org/Authentication/PasswordSchemes
You only need to store the password in plaintext if you are using _both_ CRAM-MD5 and DIGEST-MD5.
To quote:
"The problem with non-plaintext auth mechanisms is that the password must be stored either in plaintext, or using a mechanism-specific scheme that's incompatible with all other non-plaintext mechanisms. For example if you're going to use CRAM-MD5 authentication, the password needs to be stored in either PLAIN or CRAM-MD5 scheme. If you want to allow both CRAM-MD5 and DIGEST-MD5, the password must be stored in plaintext."
It may be illuminating to look at http://tools.ietf.org/html/draft-ietf-sasl-crammd5-09 and pay particular attention to Section 5.
To generate the passwords to go into the database I can use the dovecotpw utility, but I'm wanting to stick some sort of minimal admin interface on the server to be able to manage the users etc without having to use the CLI.
Use pwgen.
The problem (as stated above) is not how to generate passwords, there are thousands of libraries that can do that relatively well.
What exactly is wrong with using dovecotpw?
Are you unaware of the existence of system() and backtick operators in your preferred languages, or of the ability to write a CGI in shell?
Most generic 'best practice' advice for writing web-based tools tries to discourage any mechanism that feeds commands to a shell, but it is important to understand why that is and when the generic advice should be set aside. No Bourne/POSIX or csh-based shell is particularly fast, and system() or `` calls launching a shell to run a command line from another language has a fork cost, so using either approach for performance-sensitive apps is a bad idea on that basis alone. In regards to security, there has been a long ugly history of sloppy coders being exposed by their failure to diligently validate user input before feeding it to a shell. That problem has been addressed to some degree by delusional misfeatures like PHP's 'safe' mode, which encourages the coder and the admin to collaborate in insecurity under the misimpression of being safe. If you are disciplined about how you use what the user tells you, there's nothing inherently unsafe in passing it through a shell or even writing a simple web app entirely in shell.
-- Bill Cole bill@scconsult.com
On Sat, 12 Apr 2008 14:23:58 -0400, Bill Cole dovecot-20061108@billmail.scconsult.com wrote:
At 2:07 PM +0100 4/12/08, Douglas Willcocks imposed structure on a stream of electrons, yielding:
On Sat, 12 Apr 2008 07:52:01 +0200, Patrick Ben Koetter
<p@state-of-mind.de> wrote: [...] > CRAM-MD5 passwords are passwords saved in plaintext format on the client > and > (!) on the server. Here's why: >
That's what I first thought.
It's not technically a plaintext format or an obvious encoding, but if I'm reading the Dovecot code correctly, it is some form of the intermediate "contexts" derived from the password as part of the HMAC-MD5 algorithm.
It is not clear to me whether one could actually reverse that derivation. It seems to me that it should be simple, but no one seems to say that explicitly about the practice of storing the pre-calculated contexts rather than the actual password, so maybe I'm missing something. It may be that actual implementations always use the MD5 of the actual password as the key the context calculation, rather than using the password itself. That would make the contexts 16 bytes each plus padding, which would explain the Dovecot 'CRAM-MD5' storage format.
What IS clear is that with the HMAC-MD5 contexts one can authenticate as the user using CRAM-MD5, and that CRAM-MD5 requires the server to store either a recoverable plaintext password or the HMAC-MD5 contexts derived from it.
[...]
I'm not convinced that the context is reversible.
In the source of version 1.0.13:
The function 'hmac_md5_init' is called (from password-scheme-cram-md5.c:13) to generate the scheme. Looking at the definition of 'hmac_md5_init' and the explanation of the algorithm given on http://www.cryptostuff.com/crypto/index.php?title=hmac, it's quite easy to see the resemblance (with the exception of the lack of 'message' in 'hmac_md5_init'.
That implies the (multiple) usage of the MD5 hashing algorithm on the password, making it more or less irreversible.
I may of course have misunderstood the sequence of functions that dovecotpw actually calls of course...
Perhaps this not an irreversible hash, but more something like (althought it's not) base64? The thing is, it _looks_ like a hash.
I'm pretty sure that it is a pair of 16-byte values derived from the password, represented in hexadecimal and concatenated.
That's the representation I'm trying to replicate.
For example, using dovecotpw I can generate the hashed (??) version of 'password', which is
{CRAM-MD5}9186d855e11eba527a7a52ca82b313e180d62234f0acc9051b527243d41e2740
I can then place that in the database and the authentication is successful.
What's more, according to http://wiki.dovecot.org/Authentication/PasswordSchemes
You only need to store the password in plaintext if you are using _both_ CRAM-MD5 and DIGEST-MD5.
To quote:
"The problem with non-plaintext auth mechanisms is that the password must be stored either in plaintext, or using a mechanism-specific scheme that's incompatible with all other non-plaintext mechanisms. For example if you're going to use CRAM-MD5 authentication, the password needs to be stored in either PLAIN or CRAM-MD5 scheme. If you want to allow both CRAM-MD5 and DIGEST-MD5, the password must be stored in plaintext."
It may be illuminating to look at http://tools.ietf.org/html/draft-ietf-sasl-crammd5-09 and pay particular attention to Section 5.
I've read through the document, and I now understand where exactly the precomputed context sits in the whole picture, but I'm still unsure how to reproduce it without dovecotpw.
To generate the passwords to go into the database I can use the dovecotpw utility, but I'm wanting to stick some sort of minimal admin interface on the server to be able to manage the users etc without having to use the CLI.
Use pwgen.
The problem (as stated above) is not how to generate passwords, there are thousands of libraries that can do that relatively well.
What exactly is wrong with using dovecotpw?
Are you unaware of the existence of system() and backtick operators in your preferred languages, or of the ability to write a CGI in shell?
I _am_ aware of such functionality in the various languages I previously mentioned, but I don't want to simply wrap the executable, I would like to reproduce the algorithm. I have nothing against using dovecotpw, but would I rather not depend on it for portability reasons.
I may want to run the library on various different machines, architectures or os' and I don't want to have to compile dovecot for each situation if I only need a small part of one of the encryption libraries.
Most generic 'best practice' advice for writing web-based tools tries to discourage any mechanism that feeds commands to a shell, but it is important to understand why that is and when the generic advice should be set aside. No Bourne/POSIX or csh-based shell is particularly fast, and system() or `` calls launching a shell to run a command line from another language has a fork cost, so using either approach for performance-sensitive apps is a bad idea on that basis alone. In regards to security, there has been a long ugly history of sloppy coders being exposed by their failure to diligently validate user input before feeding it to a shell. That problem has been addressed to some degree by delusional misfeatures like PHP's 'safe' mode, which encourages the coder and the admin to collaborate in insecurity under the misimpression of being safe. If you are disciplined about how you use what the user tells you, there's nothing inherently unsafe in passing it through a shell or even writing a simple web app entirely in shell.
I am well aware of the security implications of using the best practices for web-based system <-> Shell interaction, but that isn't the problem here. It's not a question of security, it's a question of portability.
-- Bill Cole bill@scconsult.com
--
Douglas Willcocks
At 9:32 PM +0100 4/12/08, Douglas Willcocks wrote:
On Sat, 12 Apr 2008 14:23:58 -0400, Bill Cole dovecot-20061108@billmail.scconsult.com wrote:
At 2:07 PM +0100 4/12/08, Douglas Willcocks imposed structure on a stream of electrons, yielding:
On Sat, 12 Apr 2008 07:52:01 +0200, Patrick Ben Koetter
<p@state-of-mind.de> wrote: [...] > CRAM-MD5 passwords are passwords saved in plaintext format on the client > and > (!) on the server. Here's why: >
That's what I first thought.
It's not technically a plaintext format or an obvious encoding, but if I'm reading the Dovecot code correctly, it is some form of the intermediate "contexts" derived from the password as part of the HMAC-MD5 algorithm.
It is not clear to me whether one could actually reverse that derivation. It seems to me that it should be simple, but no one seems to say that explicitly about the practice of storing the pre-calculated contexts rather than the actual password, so maybe I'm missing something. It may be that actual implementations always use the MD5 of the actual password as the key the context calculation, rather than using the password itself. That would make the contexts 16 bytes each plus padding, which would explain the Dovecot 'CRAM-MD5' storage format.
What IS clear is that with the HMAC-MD5 contexts one can authenticate as the user using CRAM-MD5, and that CRAM-MD5 requires the server to store either a recoverable plaintext password or the HMAC-MD5 contexts derived from it.
[...]
I'm not convinced that the context is reversible.
I'm now convinced that the contexts are NOT reversible programmatically to the key, because I now have a correct understanding of what they actually are...
In the source of version 1.0.13:
The function 'hmac_md5_init' is called (from password-scheme-cram-md5.c:13) to generate the scheme. Looking at the definition of 'hmac_md5_init' and the explanation of the algorithm given on http://www.cryptostuff.com/crypto/index.php?title=hmac, it's quite easy to see the resemblance (with the exception of the lack of 'message' in 'hmac_md5_init'.
That implies the (multiple) usage of the MD5 hashing algorithm on the password, making it more or less irreversible.
I may of course have misunderstood the sequence of functions that dovecotpw actually calls of course...
No, it looks like your greater motivation and focus led you to a more careful reading of the source and a clearer description of what the HMAC-MD5 contexts actually are than I managed. For some reason I was thinking that the contexts were what that page calls Ki and Ko, but rather are MD5(Ko) and MD5(Ki) and are inserted in place of the normal initialization vector when completing the HMAC calculation. This means that they cannot be reversed to the actual password, but it is probably not terribly daunting to crack out the password, given the pair of hashes with a known derivation, restrictions on 'legal' passwords, and the huge pile of work that has been done on MD5 cracking. The pair of contexts in principle provides twice as much information about the original password as would a single hash, so it should be easier to crack than a straight MD5 hash.
In any case, the contexts could be used by someone who does not have the password to do a CRAM-MD5 authentication, so the direct risk of storing them is not a great deal different from storing the plaintext password. The difference is that they only provide access to a service that uses CRAM-MD5.
Perhaps this not an irreversible hash, but more something like (althought it's not) base64? The thing is, it _looks_ like a hash.
I'm pretty sure that it is a pair of 16-byte values derived from the password, represented in hexadecimal and concatenated.
That's the representation I'm trying to replicate.
And indeed, when I look at the code and at what happens when I try to do the same thing in Perl with the Digest::MD5 module, I can't get the same output.
Puzzling. I'm probably missing something.
[...]
I've read through the document, and I now understand where exactly the precomputed context sits in the whole picture, but I'm still unsure how to reproduce it without dovecotpw.
Well, if you are building a tool to manage passwords for Dovecot, that doesn't seem like a horrendous dependency.
But it is extremely annoying that reproducing what dovecotpw is doing is not easier.
To generate the passwords to go into the database I can use the dovecotpw utility, but I'm wanting to stick some sort of minimal admin interface on the server to be able to manage the users etc without having to use the CLI.
Use pwgen.
The problem (as stated above) is not how to generate passwords, there are thousands of libraries that can do that relatively well.
What exactly is wrong with using dovecotpw?
Are you unaware of the existence of system() and backtick operators in your preferred languages, or of the ability to write a CGI in shell?
I _am_ aware of such functionality in the various languages I previously mentioned, but I don't want to simply wrap the executable, I would like to reproduce the algorithm. I have nothing against using dovecotpw, but would I rather not depend on it for portability reasons.
I may want to run the library on various different machines, architectures or os' and I don't want to have to compile dovecot for each situation if I only need a small part of one of the encryption libraries.
You may find what you need in Digest::Perl::MD5, which is an all-Perl MD5 implementation of MD5. It is rarely installed because Digest::MD5 (which uses a compiled backend) is significantly faster.
-- Bill Cole bill@scconsult.com
At 9:32 PM +0100 4/12/08, Douglas Willcocks imposed structure on a stream of electrons, yielding: [...]
I've read through the document, and I now understand where exactly the precomputed context sits in the whole picture, but I'm still unsure how to reproduce it without dovecotpw.
In looking a little closer at RFC1321 that defines MD5, I think I see the trick. The stored contexts are not full MD5's of Ki and Ko, they are just the 16 bytes of state after hashing those single key blocks. A full MD5 hash gets finalized by padding out the data with nulls and a 64-bit count of the hashed data. If you were to do a normal MD5 of Ki and Ko, the result would include the result of the finalization step.
You can see this in the hmac_md5_get_cram_context and hmac_md5_set_cram_context functions in the Dovecot code. The former extracts just the 4 32-bit state values, and the latter sets them and sets the bit count in the context structure to 512 (i.e. 64 bytes)
Maybe that provides a helpful clue...
-- Bill Cole bill@scconsult.com
On Sun, 13 Apr 2008 01:39:44 -0400, Bill Cole dovecot-20061108@billmail.scconsult.com wrote:
At 9:32 PM +0100 4/12/08, Douglas Willcocks imposed structure on a stream of electrons, yielding: [...]
I've read through the document, and I now understand where exactly the precomputed context sits in the whole picture, but I'm still unsure how to reproduce it without dovecotpw.
In looking a little closer at RFC1321 that defines MD5, I think I see the trick. The stored contexts are not full MD5's of Ki and Ko, they are just the 16 bytes of state after hashing those single key blocks. A full MD5 hash gets finalized by padding out the data with nulls and a 64-bit count of the hashed data. If you were to do a normal MD5 of Ki and Ko, the result would include the result of the finalization step.
You can see this in the hmac_md5_get_cram_context and hmac_md5_set_cram_context functions in the Dovecot code. The former extracts just the 4 32-bit state values, and the latter sets them and sets the bit count in the context structure to 512 (i.e. 64 bytes)
Maybe that provides a helpful clue...
Thank you very much for the time you spent looking into this.
There are still some wholes in the algorithm, but so far:
Do the first part of HMAC as normal up to XOR'ing with 0x5c, including strangely enough what looks like a full (i.e. including nulling and padding) md5 hash if the passphrase is longer than 64 (this step should be feasible to replicate)
Do the first step of md5 hashing (again, using the Perl library you suggested, it should be feasible to replicate the required functionality)
-- From now on, it gets a bit sketchy
Using the intermediate states of Ko and Ki, extract the various segments into a char[] so you end up with an array of 32 8bit segments. I'm unsure exactly what the bitwise AND operator is used for, I thought that a bitwise AND with 0xFF would leave everything the same?
Display a hex representation of the array contents.
I'll have a go later today and report back any findings. Of course, I could have completely misunderstood one of the steps, and that'll screw up the others, but still.
-- Bill Cole bill@scconsult.com
-- Douglas Willcocks
At 11:57 AM +0100 4/13/08, Douglas Willcocks imposed structure on a stream of electrons, yielding:
On Sun, 13 Apr 2008 01:39:44 -0400, Bill Cole dovecot-20061108@billmail.scconsult.com wrote:
At 9:32 PM +0100 4/12/08, Douglas Willcocks imposed structure on a stream of electrons, yielding: [...]
I've read through the document, and I now understand where exactly the precomputed context sits in the whole picture, but I'm still unsure how to reproduce it without dovecotpw.
In looking a little closer at RFC1321 that defines MD5, I think I see the trick. The stored contexts are not full MD5's of Ki and Ko, they are just the 16 bytes of state after hashing those single key blocks. A full MD5 hash gets finalized by padding out the data with nulls and a 64-bit count of the hashed data. If you were to do a normal MD5 of Ki and Ko, the result would include the result of the finalization step.
You can see this in the hmac_md5_get_cram_context and hmac_md5_set_cram_context functions in the Dovecot code. The former extracts just the 4 32-bit state values, and the latter sets them and sets the bit count in the context structure to 512 (i.e. 64 bytes)
Maybe that provides a helpful clue...
Thank you very much for the time you spent looking into this.
Some people spend their weekends doing crosswords or Sudoku...
There are still some wholes in the algorithm, but so far:
- Do the first part of HMAC as normal up to XOR'ing with 0x5c, including strangely enough what looks like a full (i.e. including nulling and padding) md5 hash if the passphrase is longer than 64 (this step should be feasible to replicate)
The HMAC specification is clear on that: if the shared secret (i.e. the password/passphrase) is longer than the hash function blocksize, it is hashed first and the hash output used as the key instead of the secret itself.
- Do the first step of md5 hashing (again, using the Perl library you suggested, it should be feasible to replicate the required functionality)
-- From now on, it gets a bit sketchy
- Using the intermediate states of Ko and Ki, extract the various segments into a char[] so you end up with an array of 32 8bit segments. I'm unsure exactly what the bitwise AND operator is used for, I thought that a bitwise AND with 0xFF would leave everything the same?
Are you looking at the RFC1321 reference code's Encode function? The point there is that the input is an array of 32-bit unsigned integers (UINT4), the output is an array of (8-bit) characters. Doing an AND with 0xFF (an 8-bit value) on bit-shifted UINT4's is a way of clearing everything but the 8 bits of interest and making the typecast formally correct. In a compilation and runtime environment that will let you play freely with pointer types, that function would be unneeded because you could just address the input as a char * instead of a UINT * and be done with it, but the formal approach Rivest uses assures that the bits are cleanly copied from an array of 32-bit members to an array of 8-bit members even on systems where a pointer cast would not give correct results.
It is also important to keep the distinction between the HMAC and MD5 layers clear. HMAC can be used with any blockwise hash function, and the internal details of the hash function are not important to the HMAC layer.
I'd break it up this way, with a warning that I'm hand-waving the mathematical innards of MD5:
If the password is longer than 64 bytes, run a full normal MD5 on it and use the 16-byte result as your shared secret. Otherwise, the password IS the shared secret. In practice, it would be a good idea to simply restrict passwords in size and content because authentication clients (i.e. MUA's) are going to have issues with reproducing passwords that have non-ASCII, whitespace, and control characters.
Generate the 2 MD5 "keys" from the shared secret. In Perl this might look like this:
$Ki = "$secret" ^ (chr(0x36) x 64); $Ko = "$secret" ^ (chr(0x5c) x 64);
Keep in mind that the keys are special to HMAC, but as far as MD5 is concerned they are just the first 64-byte/512-bit blocks fed into the hash.
Initialize a fresh MD5 hash structure (if you are using a full MD5 implementation ) or just a 'state' array containing the 4 32-bit values of the standard MD5 initialization vector (if you are doing your own implemetation or have direct access to a MD5Transform equivalent.)
Add Ki to the hash. In reference code terms, this would be a MD5Update call with Ki as the input. Note that most of the logic in the reference code MD5Update is about supporting the addition of data to the hash in chunks that don't fill 64-byte transformation blocks perfectly. If you are doing your own implementation just for this application, MD5Update is logically unnecessary because you can just pass your MD5Transform-equivalent a state array with the standard MD5 initialization vector and Ki (which is a 64-character array)
The state array now holds the 'inner' context.
Repeat 3-4 with another fresh hash structure or state array and Ko.
The state array now holds the 'outer' context,
- Display a hex representation of the array contents.
Rendering those numbers as a hex string in the right order may not be obvious...
I'll have a go later today and report back any findings. Of course, I could have completely misunderstood one of the steps, and that'll screw up the others, but still.
If OO Perl is your thing, I think that Digest::Perl::MD5 has everything you need except a public interface to the state array. You can still get to it however, and to the subroutine that does the hex string rendering...
In fact, look here:
http://www.scconsult.com/bill/crampass.pl
It does have a dependency issue, in that it requires Digest::Perl::MD5.
-- Bill Cole bill@scconsult.com
On Sun, 13 Apr 2008 16:12:41 -0400, Bill Cole dovecot-20061108@billmail.scconsult.com wrote:
At 11:57 AM +0100 4/13/08, Douglas Willcocks imposed structure on a stream of electrons, yielding:
On Sun, 13 Apr 2008 01:39:44 -0400, Bill Cole dovecot-20061108@billmail.scconsult.com wrote:
At 9:32 PM +0100 4/12/08, Douglas Willcocks imposed structure on a stream of electrons, yielding: [...]
[...]
Thank you very much for the time you spent looking into this.
Some people spend their weekends doing crosswords or Sudoku...
They should get out more.
No. Wait...
There are still some wholes in the algorithm, but so far:
- Do the first part of HMAC as normal up to XOR'ing with 0x5c, including strangely enough what looks like a full (i.e. including nulling and padding) md5 hash if the passphrase is longer than 64 (this step should be feasible to replicate)
The HMAC specification is clear on that: if the shared secret (i.e. the password/passphrase) is longer than the hash function blocksize, it is hashed first and the hash output used as the key instead of the secret itself.
- Do the first step of md5 hashing (again, using the Perl library you suggested, it should be feasible to replicate the required functionality)
-- From now on, it gets a bit sketchy
- Using the intermediate states of Ko and Ki, extract the various segments into a char[] so you end up with an array of 32 8bit segments. I'm unsure exactly what the bitwise AND operator is used for, I thought that a bitwise AND with 0xFF would leave everything the same?
Are you looking at the RFC1321 reference code's Encode function? The point there is that the input is an array of 32-bit unsigned integers (UINT4), the output is an array of (8-bit) characters. Doing an AND with 0xFF (an 8-bit value) on bit-shifted UINT4's is a way of clearing everything but the 8 bits of interest and making the typecast formally correct. In a compilation and runtime environment that will let you play freely with pointer types, that function would be unneeded because you could just address the input as a char * instead of a UINT * and be done with it, but the formal approach Rivest uses assures that the bits are cleanly copied from an array of 32-bit members to an array of 8-bit members even on systems where a pointer cast would not give correct results.
Ok, I see why that would be necessary.
It is also important to keep the distinction between the HMAC and MD5 layers clear. HMAC can be used with any blockwise hash function, and the internal details of the hash function are not important to the HMAC layer.
I'd break it up this way, with a warning that I'm hand-waving the mathematical innards of MD5:
If the password is longer than 64 bytes, run a full normal MD5 on it and use the 16-byte result as your shared secret. Otherwise, the password IS the shared secret. In practice, it would be a good idea to simply restrict passwords in size and content because authentication clients (i.e. MUA's) are going to have issues with reproducing passwords that have non-ASCII, whitespace, and control characters.
Generate the 2 MD5 "keys" from the shared secret. In Perl this might look like this:
$Ki = "$secret" ^ (chr(0x36) x 64); $Ko = "$secret" ^ (chr(0x5c) x 64);
Keep in mind that the keys are special to HMAC, but as far as MD5 is concerned they are just the first 64-byte/512-bit blocks fed into the hash.
Initialize a fresh MD5 hash structure (if you are using a full MD5 implementation ) or just a 'state' array containing the 4 32-bit values of the standard MD5 initialization vector (if you are doing your own implemetation or have direct access to a MD5Transform equivalent.)
Add Ki to the hash. In reference code terms, this would be a MD5Update call with Ki as the input. Note that most of the logic in the reference code MD5Update is about supporting the addition of data to the hash in chunks that don't fill 64-byte transformation blocks perfectly. If you are doing your own implementation just for this application, MD5Update is logically unnecessary because you can just pass your MD5Transform-equivalent a state array with the standard MD5 initialization vector and Ki (which is a 64-character array)
The state array now holds the 'inner' context.
Repeat 3-4 with another fresh hash structure or state array and Ko.
The state array now holds the 'outer' context,
Thank you for your explanation. Using the above, the script you wrote and the Digest::Perl::MD5 source, I am now in the middle of writing a small Ruby class of my own to implement this functionality.
I'm pretty sure I understand what is happening throughout the process.
- Display a hex representation of the array contents.
Rendering those numbers as a hex string in the right order may not be obvious...
I'll have a go later today and report back any findings. Of course, I could have completely misunderstood one of the steps, and that'll screw up the others, but still.
If OO Perl is your thing, I think that Digest::Perl::MD5 has everything you need except a public interface to the state array. You can still get to it however, and to the subroutine that does the hex string rendering...
In fact, look here:
http://www.scconsult.com/bill/crampass.pl
It does have a dependency issue, in that it requires Digest::Perl::MD5.
Thank you very much for that. It's extremely useful. What's more, I would say that most of the satisfaction comes not from actually having a script that does it, but actually having a better understanding of the algorithm in general.
-- Bill Cole bill@scconsult.com
-- Douglas Willcocks
On 4/13/08, Bill Cole dovecot-20061108@billmail.scconsult.com wrote:
In fact, look here:
http://www.scconsult.com/bill/crampass.pl
It does have a dependency issue, in that it requires Digest::Perl::MD5.
There's a little mistake in this script. Line 54 should read:
if (length $secret > 64) {
You can easily check this by supplying a 65-char password and comparing the output with dovecotpw.
Chris
At 10:57 AM +0200 4/17/08, Chris Laif wrote:
On 4/13/08, Bill Cole dovecot-20061108@billmail.scconsult.com wrote:
In fact, look here:
http://www.scconsult.com/bill/crampass.pl
It does have a dependency issue, in that it requires Digest::Perl::MD5.
There's a little mistake in this script. Line 54 should read:
if (length $secret > 64) {
Fixed.
You can easily check this by supplying a 65-char password and comparing the output with dovecotpw.
Not really any need to test an obvious typo. Thanks.
-- Bill Cole bill@scconsult.com
participants (4)
-
Bill Cole
-
Chris Laif
-
Douglas Willcocks
-
Patrick Ben Koetter