Thursday, December 23, 2010

Excel RC4 Encryption Algorithm

I played a wargame that included a password-protected xls file. I could not find a free tool to break it, but the commercial tools I tried in trial mode offered an instant recovery feature. I wondered how they did it, so I decided to read up on the encryption algorithm. I knew that the default Office 2003 encryption algorithm is RC4. After some searching, I found the Microsoft documentation:
- http://msdn.microsoft.com/en-us/library/dd907466%28v=office.12%29.aspx
- http://msdn.microsoft.com/en-us/library/dd905723%28v=office.12%29.aspx

The first link describes how the RC4 key is generated and what is stored in the xls file. The second link describes which contents are encrypted.

After understanding how the Excel password works, I wrote a Python script, test_xls_pass.py, to test a password. Here is the important part.
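import hashlib

# rc4_crypt(key, data) is not shown in this excerpt.

def gen_excel_real_key(pwd, salt):
    # pwd: password bytes (assumed already encoded, e.g. UTF-16LE); salt: from FILEPASS
    h0 = hashlib.md5(pwd).digest()
    # Hash 16 copies of (first 5 bytes of h0 + salt), then keep only 40 bits
    h1 = hashlib.md5((h0[:5] + salt) * 16).digest()
    return h1[:5]

def test_pass(pwd, salt, verifier, verifierHash):
    real_key = gen_excel_real_key(pwd, salt)
    # RC4 key: MD5 of real_key followed by four zero bytes
    key = hashlib.md5(real_key + b'\x00\x00\x00\x00').digest()
    dec = rc4_crypt(key, verifier + verifierHash)
    # The password is right if MD5(decrypted verifier) equals the decrypted hash
    if hashlib.md5(dec[:16]).digest() == dec[16:]:
        print("valid pass")
    else:
        print("invalid pass")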
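
The excerpt above calls rc4_crypt without showing it. Here is a minimal Python 3 sketch of a compatible function. It is plain textbook RC4 and only an assumption about what the original script does, not the author's actual code.

```python
def rc4_crypt(key, data):
    """Encrypt or decrypt data with RC4 (the operation is symmetric)."""
    # Key-scheduling algorithm (KSA): permute 0..255 using the key bytes
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA), XORed with the input
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)
```

Because RC4 is symmetric, calling the function twice with the same key returns the original data.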

"salt", "verifier", and "verifierHash" can be extracted from the FILEPASS record in the Excel file. Can you see it? The "real_key" is only 5 bytes (40 bits). If you can find this key, you do not need the password. The key space of real_key is 2^40, so brute forcing it is possible. But is it easier than brute forcing the password?

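To show roughly where those three values live, here is a Python 3 sketch that scans the records of the "Workbook" stream for FILEPASS. The record layout used here (type 0x002F, a 2-byte encryption type, a 4-byte version, then three 16-byte fields) is an assumption based on the Microsoft documents linked above and should be checked against them. find_filepass is a hypothetical helper, and getting the stream bytes out of the xls container (for example with the third-party olefile package) is left out.

```python
import struct

FILEPASS = 0x002F  # assumed BIFF record type for FILEPASS

def find_filepass(stream):
    """Return (salt, verifier, verifier_hash) from a Workbook stream, or None."""
    pos = 0
    while pos + 4 <= len(stream):
        # Every record starts with a 2-byte type and a 2-byte body length
        rec_type, rec_len = struct.unpack_from("<HH", stream, pos)
        body = stream[pos + 4 : pos + 4 + rec_len]
        pos += 4 + rec_len
        if rec_type != FILEPASS:
            continue
        if len(body) < 54:
            return None
        enc_type, major, minor = struct.unpack_from("<HHH", body, 0)
        if enc_type != 1 or (major, minor) != (1, 1):
            return None  # some other protection scheme with a different layout
        return body[6:22], body[22:38], body[38:54]
    return None
```

The three returned values are what test_pass expects as salt, verifier, and verifierHash.
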
Compare it to a case-insensitive alphanumeric password. The key space of 8 characters is 36^8 = (32 + 4)^8 = (2^5 + 4)^8 > 2^40.
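
The comparison can be checked with a few lines of Python 3. The variable names are only for illustration.

```python
password_space = 36 ** 8   # 8 characters from a-z and 0-9, case-insensitive
key_space = 2 ** 40        # every possible 5-byte real_key

print(password_space)              # 2821109907456
print(key_space)                   # 1099511627776
print(password_space / key_space)  # about 2.57
```

So an 8-character password of this kind already has a larger search space than real_key, and every extra character multiplies it by 36.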

Another problem with brute forcing real_key is that RC4 is slow compared to MD5. I tried it with my simple C code and got about 800,000 keys/sec with 1 thread on an Intel Core2 Q8300 2.5GHz. At that rate it takes about 16 days with 1 thread to try the whole key space. With GPUs, real_key can be cracked in a few minutes.
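
Here is a sketch of what brute forcing real_key looks like, in Python 3 for readability only; it is far too slow for a real search, so a real cracker should be written in C. check_real_key and brute_force are hypothetical names, rc4_crypt is a textbook RC4 assumed to match the one used by the script, and the order in which candidates are tried is arbitrary. Note that the salt is not needed when searching for real_key directly.

```python
import hashlib

def rc4_crypt(key, data):
    """Textbook RC4; the same call encrypts and decrypts."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

def check_real_key(real_key, verifier, verifier_hash):
    """Return True if the 5-byte real_key decrypts the verifier pair correctly."""
    key = hashlib.md5(real_key + b"\x00\x00\x00\x00").digest()
    dec = rc4_crypt(key, verifier + verifier_hash)
    return hashlib.md5(dec[:16]).digest() == dec[16:]

def brute_force(verifier, verifier_hash, start=0, stop=2 ** 40):
    """Try candidate keys in [start, stop); return the matching key or None."""
    for n in range(start, stop):
        candidate = n.to_bytes(5, "little")
        if check_real_key(candidate, verifier, verifier_hash):
            return candidate
    return None

# Rough time estimate at the rate quoted above
seconds = 2 ** 40 / 800000
print(seconds / 86400)  # about 15.9 days
```

The start and stop parameters make it easy to split the key space across several threads or machines.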

What can we do once we have the real_key? There is a tool named guaexcel. The demo version allows you to use any real_key to decrypt any Excel file.

MS Word is the same as MS Excel. Just change the stream name from "Workbook" to "worddocument" stream. Then use the tool named guaword to decrypt the Word file.

PS: If I have time, I will optimize the code and release it for free :). But do not expect it to be as fast as a commercial one.

19 comments:

  1. Great article and work. Have you ever gotten around to a complete program to open/decrypt protected Excel 2003 (RC4 40bit) files?

  2. No complete program. I just wrote a small program to test cracking speed.

  3. Thanks for writing back. After compiling/installing your script and running it, I'm prompted not only for the name of the excel file, but also a password. Perhaps I'm using it incorrectly, is there a way to use your script to actually crack the .xls file? Thanks again.

  4. My script just tests the password. I wrote it only to demonstrate the algorithm. For cracking, you should write it in C. It is much faster.

  5. You rock! Thank you for this information. I've written a brute-forcer in C based on your code.

    I'm just using MD5 from OpenSSL so with work I should be able to replace OpenSSL with packed SSE-based MD5 and make it 4x faster.

  6. Nice post! One of the trial recovery tools lets you see the key as it is sent back to the client. I tried to use this key with the guaexcel known_key as you mentioned but it only works for files with a fixed pwd in demo. How do you decrypt a file once you have the key?

  7. hi, im Apuromafo of Crackslatinos, i was checked other app from rixler (DEMO),and from the program of that, in multi cracked was converted in full(unique in this class) , and this company have a hash and big proof as you say, can send the value example , work good, i was tested now in a friend app with aperture and was working..if you need check if can crack

    example: rixler.com&protocol_version=1.2&program_name=MDPC&program_version=3.0.0.4&command=1&target=2&decryption_mode=1&reg_code=123456&engine_data=a2c036063187cc84b1cb215eb1354a1d3ced16f564fc10a93290dadd46092aac14141414148fc7b3e0d6a7e7ee6ffdaeb120bc8cf87b897d78515559575959464948414459454d41484d4957415546494a414d4a49554c4658494b4959573a8487798a78837382898173758088517a444778444a4b774c774548457945763a&check_sum=At

    and response of server:
    01E7FF40 00C03C58 ASCII "result=2&key=4d444a4a787a4a447549"
    that hash decode the xls, and re assign a new pass and demo only show some limit, cracked is in that pass.. and you remove with that and done


    *tested in 20 xls in open..and work, can be decrypted and recovered without the real pass..

  8. What's that key returned by rixler.com? It's not 40-bit.

  9. So far so good, but which is 10 byte key in this:
    01E7FF40 00C03C58 ASCII "result=2&key=4d444a4a787a4a447549"

    Replies
    1. Every two digits of the key represent one digit of the real key.

    2. Have you made progress on that syntax? How do 2 digits represent one?

    3. 44-4d => 0-9, 75-7a => A-F

  10. I've written a C version at my blog at http://gavinsmith87.blogspot.co.uk/2012/06/microsoft-excel-2003-encryption-scanner.html.

  11. >> MS Word is the same as MS Excel. Just change the stream name from "Workbook" to "worddocument" stream.

    I did this but I get "Cannot find RC4 pass info". Can you double-check whether this works? Thanks!

  12. Anonymous - If you ever see this, I've been researching the MS Word format and this would not appear to be correct. If the Word document is encrypted the necessary fields will appear in the "1Table" stream (or maybe the "0Table" stream). I intend to support automatically extracting this information in my key-finding program.

  13. Hi Gavin. Code works and I successfully decrypted an excel file. Missing password for more than 6 years and was able to retrieve the key in 6 days. Awesome tool!

  14. hi gavin I got this error "The type or namespace name 'ManagedRC4' could not be found"

    can u help me for this

  15. "AnonymousJuly 22, 2013 at 8:28 AM

    Hi Gavin. Code works and I successfully decrypted an excel file. Missing password for more than 6 years and was able to retrieve the key in 6 days. Awesome tool!
    "

    hi can u please provide me your full working code becoz its not working on my side I also want to recover password of my file.

  16. Amazing post and work performed. There is the tool named guaexcel. The demo version allows you to use any real key to decrypt any Excel file. MS Word is the same as MS Excel. Just change the stream name from "Workbook" to "worddocument" stream. Then use tool named guaword to decrypt the Word file.
