The first link describes how the RC4 key is generated and what is stored in the xls file. The second link describes which contents are encrypted.
After understanding how the Excel password works, I wrote a Python script, test_xls_pass.py, to test a password. Here is the important part.
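import hashlib

# rc4_crypt(key, data): plain RC4 encrypt/decrypt helper, not shown in this excerpt

def gen_excel_real_key(pwd, salt):
    # pwd: the password encoded as UTF-16LE bytes; salt: 16 bytes from FILEPASS
    h0 = hashlib.md5(pwd).digest()
    h1 = hashlib.md5((h0[:5] + salt) * 16).digest()
    return h1[:5]  # the 40-bit real_key

def test_pass(pwd, salt, verifier, verifierHash):
    real_key = gen_excel_real_key(pwd, salt)
    # RC4 key for block 0: MD5(real_key + 32-bit block number 0)
    key = hashlib.md5(real_key + b'\x00\x00\x00\x00').digest()
    # verifier and verifierHash are decrypted as one continuous RC4 stream
    dec = rc4_crypt(key, verifier + verifierHash)
    if hashlib.md5(dec[:16]).digest() == dec[16:]:
        print("valid pass")
    else:
        print("invalid pass")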
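To try test_pass without a real encrypted file, it helps to look at the reverse direction: how a writer would produce the stored verifier values. The following is a self-contained Python 3 sketch, assuming the password is encoded as UTF-16LE and that the plain verifier is a random 16-byte value whose MD5 is encrypted right after it with the block-0 key. The names make_verifier, check_password and block0_key are hypothetical, not part of the original script.

```python
import hashlib
import os


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key schedule, then XOR the data with the keystream."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for b in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(b ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def gen_excel_real_key(pwd: bytes, salt: bytes) -> bytes:
    """Same derivation as test_xls_pass.py: 40-bit key from password and salt."""
    h0 = hashlib.md5(pwd).digest()
    h1 = hashlib.md5((h0[:5] + salt) * 16).digest()
    return h1[:5]


def block0_key(real_key: bytes) -> bytes:
    """RC4 key for block 0: MD5(real_key + 32-bit block number 0)."""
    return hashlib.md5(real_key + b"\x00\x00\x00\x00").digest()


def make_verifier(password: str, salt: bytes, verifier: bytes) -> tuple:
    """Hypothetical writer side: encrypt verifier + MD5(verifier) as one RC4 stream."""
    real_key = gen_excel_real_key(password.encode("utf-16-le"), salt)
    enc = rc4_crypt(block0_key(real_key), verifier + hashlib.md5(verifier).digest())
    return enc[:16], enc[16:]


def check_password(password: str, salt: bytes, verifier: bytes, verifier_hash: bytes) -> bool:
    """Same check as test_pass, but returning a bool instead of printing."""
    real_key = gen_excel_real_key(password.encode("utf-16-le"), salt)
    dec = rc4_crypt(block0_key(real_key), verifier + verifier_hash)
    return hashlib.md5(dec[:16]).digest() == dec[16:]


if __name__ == "__main__":
    salt, plain_verifier = os.urandom(16), os.urandom(16)
    enc_verifier, enc_hash = make_verifier("secret", salt, plain_verifier)
    print(check_password("secret", salt, enc_verifier, enc_hash))  # True
    print(check_password("Secret", salt, enc_verifier, enc_hash))  # False
```

Because RC4 is symmetric, the writer and the checker call the same rc4_crypt; only the direction of the MD5 comparison differs.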
The values "salt", "verifier", and "verifierHash" can be extracted from the FILEPASS record in the Excel file. Can you see the weakness? The "real_key" is only 5 bytes (40 bits). If you can find this key, there is no need for the password at all. The key space of real_key is $2^{40}$, which is small enough to brute force. But is it easier than brute forcing the password?
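Getting salt, verifier and verifierHash out of the file means locating FILEPASS (BIFF8 record type 0x002F), which sits right after the first BOF record of the Workbook stream. A minimal sketch, assuming you already have the raw Workbook stream bytes (for example via olefile's OleFileIO.openstream("Workbook")) and that the file uses the 40-bit RC4 header (encryption type 0x0001, version 1.1) rather than the CryptoAPI variant. parse_filepass, find_filepass and Rc4FilePass are hypothetical names.

```python
import struct
from typing import NamedTuple

FILEPASS = 0x002F  # BIFF8 record type of FILEPASS


class Rc4FilePass(NamedTuple):
    salt: bytes           # Salt (16 bytes)
    verifier: bytes       # EncryptedVerifier (16 bytes)
    verifier_hash: bytes  # EncryptedVerifierHash (16 bytes)


def parse_filepass(body: bytes) -> Rc4FilePass:
    """Parse a FILEPASS record body (the bytes after the 4-byte record header)."""
    (enc_type,) = struct.unpack_from("<H", body, 0)
    if enc_type != 0x0001:
        raise ValueError("not RC4 encryption (0x0000 means XOR obfuscation)")
    if len(body) < 54:
        raise ValueError("FILEPASS body too short for an RC4 header")
    v_major, v_minor = struct.unpack_from("<HH", body, 2)
    if (v_major, v_minor) != (1, 1):
        raise ValueError(f"version {v_major}.{v_minor} is not the 40-bit RC4 header (CryptoAPI?)")
    return Rc4FilePass(salt=body[6:22], verifier=body[22:38], verifier_hash=body[38:54])


def find_filepass(workbook: bytes) -> Rc4FilePass:
    """Walk BIFF8 records (2-byte type, 2-byte length, little-endian) up to FILEPASS."""
    pos = 0
    while pos + 4 <= len(workbook):
        rec_type, rec_len = struct.unpack_from("<HH", workbook, pos)
        if rec_type == FILEPASS:
            return parse_filepass(workbook[pos + 4 : pos + 4 + rec_len])
        pos += 4 + rec_len
    raise ValueError("no FILEPASS record: the workbook does not look encrypted")
```

The three fields map directly onto the salt, verifier and verifierHash arguments of test_pass.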
Compare this with a case-insensitive alphanumeric password, which has 26 + 10 = 36 possible characters. The key space of 8 such characters is $36^8 = (32+4)^8 = (2^5+4)^8 > (2^5)^8 = 2^{40}$.
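The comparison can be checked with exact integers. This small sketch (key_space and shortest_length_exceeding are illustrative names) also shows that 8 is the shortest length at which such passwords outnumber the real_key candidates; 7-character passwords give only $36^7 \approx 7.8 \times 10^{10}$.

```python
ALNUM_CI = 26 + 10  # case-insensitive letters plus digits
REAL_KEY_BITS = 40


def key_space(alphabet_size: int, length: int) -> int:
    """Number of passwords of exactly `length` symbols."""
    return alphabet_size ** length


def shortest_length_exceeding(alphabet_size: int, bits: int) -> int:
    """Smallest password length whose key space is larger than 2**bits."""
    n = 1
    while key_space(alphabet_size, n) <= 2 ** bits:
        n += 1
    return n


print(f"{2 ** REAL_KEY_BITS:,}")                           # 1,099,511,627,776
print(f"{key_space(ALNUM_CI, 8):,}")                       # 2,821,109,907,456
print(shortest_length_exceeding(ALNUM_CI, REAL_KEY_BITS))  # 8
```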
Another problem with brute forcing real_key is that RC4 is slow compared to MD5. I tried it with my simple C code and got about 800,000 keys/sec with 1 thread on an Intel Core2 Q8300 2.5GHz. At that rate, trying the whole key space takes about 16 days with 1 thread ($2^{40} / 800{,}000 \approx 1.37 \times 10^{6}$ s $\approx 15.9$ days). With GPUs, real_key can be cracked in a few minutes.
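A brute-force loop over real_key can be sketched in Python. Note from test_pass that checking a candidate real_key needs only verifier and verifierHash, not the salt, and that every candidate costs one MD5 for the block-0 key, a full 256-step RC4 key setup, and one MD5 of the decrypted verifier. The candidate order (little-endian integers) is an arbitrary choice and the helper names are hypothetical; pure Python is far slower than the C figure quoted above, so this only illustrates the logic.

```python
import hashlib
from typing import Optional


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key schedule, then XOR the data with the keystream."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for b in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(b ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def real_key_matches(real_key: bytes, verifier: bytes, verifier_hash: bytes) -> bool:
    """The check from test_pass, starting from a candidate real_key (no salt needed)."""
    key = hashlib.md5(real_key + b"\x00\x00\x00\x00").digest()  # block-0 RC4 key
    dec = rc4_crypt(key, verifier + verifier_hash)                # fresh RC4 setup per candidate
    return hashlib.md5(dec[:16]).digest() == dec[16:]


def search_real_key(verifier: bytes, verifier_hash: bytes,
                    start: int = 0, stop: int = 2 ** 40) -> Optional[bytes]:
    """Try 40-bit candidates in [start, stop); return the first match or None."""
    for n in range(start, stop):
        candidate = n.to_bytes(5, "little")  # arbitrary enumeration order
        if real_key_matches(candidate, verifier, verifier_hash):
            return candidate
    return None


def full_search_days(keys_per_sec: float) -> float:
    """Days needed to try all 2**40 candidates at a given rate."""
    return 2 ** 40 / keys_per_sec / 86400


print(f"{full_search_days(800_000):.1f} days")  # 15.9 days
```

The [start, stop) parameters are there so the key space can be split into independent ranges, one per thread or machine.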
What can we do once we have the real_key? There is a tool named guaexcel. Its demo version lets you supply any real_key and decrypt any Excel file with it.
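Decrypting with a known real_key works block by block: each 1024-byte block of the stream gets its own RC4 key, MD5(real_key + block number as a 32-bit little-endian integer), which is the same derivation test_pass uses for block 0. A minimal sketch, assuming decryption by absolute stream offset; it deliberately leaves out the Workbook-specific rules about which records and record headers stay in plaintext, and it is not guaexcel's code. The function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 1024  # bytes covered by one RC4 key in this scheme


def rc4_keystream(key: bytes, n: int) -> bytes:
    """First n bytes of the RC4 keystream for `key`."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def block_key(real_key: bytes, block: int) -> bytes:
    """RC4 key for a block: MD5(real_key + block number as 32-bit little-endian)."""
    return hashlib.md5(real_key + block.to_bytes(4, "little")).digest()


def rc4_crypt_at(real_key: bytes, data: bytes, offset: int = 0) -> bytes:
    """XOR `data`, which starts at absolute stream position `offset`, with the
    per-block keystream; RC4 is symmetric, so this encrypts and decrypts."""
    out = bytearray()
    pos, i = offset, 0
    while i < len(data):
        block, skip = divmod(pos, BLOCK_SIZE)
        take = min(BLOCK_SIZE - skip, len(data) - i)
        ks = rc4_keystream(block_key(real_key, block), skip + take)[skip:]
        out.extend(b ^ k for b, k in zip(data[i:i + take], ks))
        pos += take
        i += take
    return bytes(out)
```

Regenerating the keystream up to `skip` is wasteful but keeps the sketch simple; a real tool would cache the RC4 state per block.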
MS Word works the same way as MS Excel. Just change the stream name from "Workbook" to "worddocument". Then use the tool named guaword to decrypt the Word file.
PS: If I have time, I will optimize the code and release it for free :). But do not expect it to be as fast as the commercial ones.