2010 GSoC RFI SANDBOX

PROJECT DESCRIPTION

SUGGESTED SKILLS:
- knowledge of PHP and/or Perl
  you need to know what code you'll be tracing and understanding
- Python or some similar wrapper code
  you'll want to wrap this in another language so you don't risk infection
- Linux or UN*X experience
- SQL interaction and programming
- an understanding of what remote file include (RFI) scripts are and what's possible with them

i didn't give specifications, thinking that someone may come up with a better approach than mine. i welcome any improvements people may have.

the idea is based very much on my experience writing a sandbox before, namely that debug output can be distilled to produce a useful report. previously i had written a Windows EXE sandbox using the following approach, based on WINE and its verbose debug output:

- a python wrapper script is called as root
- python calls fork(); the child process changes its UID/GID to a user named "sandbox" with no rights to the rest of the system. the parent remains running as root.
- the child makes a new, pristine copy of the WINE ~/.wine directory tree (which is the WINE filesystem)
- the child launches the target EXE with WINE and the CLI options that turn on the debugging output (which logs API calls with the arguments given and also stores the return value)
- the parent process (from fork()) sleeps for the duration of the run
- when the timer goes off and the parent wakes up, it kills the child and anything owned by the sandbox user
- the distiller then runs over the debug trace and reduces it to the interesting bits, like what URLs were passed to InternetOpenURL(), what registry keys were written, etc.
  it also looks for new and changed files, and scans the WINE "filesystem" (just a directory tree) for viruses
- the distillation is presented in a fashion similar to the norman sandbox output, e.g.:

    Downloads file http://www.foo.com/file.exe
    Runs C:\Windows\virus.exe

i had this sort of thing in mind for the PHP sandbox: use deep trace output to discover API calls and arguments, distill that, make some sense of it, and log it for future analysis. Windows Detours is a common way to do much the same with binary executables. have a look at how Windows Detours works:

    http://research.microsoft.com/en-us/projects/detours/

same basic premise, and a key foundation of how executable sandboxes work: log everything, distill it, make some sense of it.

exactly how one does this in a PHP sandbox is open to interpretation, but i had in mind something like "funcall", a PHP extension, for logging API calls and their arguments. i now have some basic PoC code written using PHP's funcall extension that works like this (with a PHP pBot example):

    $ php sandbox.php pbot.php
    call:fsockopen(irc, 6667, , , 30)
    ret:fsockopen(irc, 6667, , , 30)= Resource id #5
    call:fwrite(Resource id #5, PASS fx)
    ret:fwrite(Resource id #5, PASS fx)= 9
    call:fwrite(Resource id #5, USER whxinxxo 127.0.0.1 localhost :whxinxxo)
    ret:fwrite(Resource id #5, USER whxinxxo 127.0.0.1 localhost :whxinxxo)= 45
    call:fwrite(Resource id #5, NICK [C]fvox61182976)
    ret:fwrite(Resource id #5, NICK [C]fvox61182976)= 22
    call:fgets(Resource id #5, 512)
    ret:fgets(Resource id #5, 512)= :irc.xxx.xxx NOTICE Auth :*** Looking up your hostname...

the code just enumerates what functions to wrap and logs their calls and return values.
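as an illustration of the fork-and-timer wrapper described for the WINE sandbox, here is a minimal python sketch. it is not the original wrapper: the function name run_sandboxed and its parameters are made up for this example, it polls for early exit instead of sleeping for the whole run, and it approximates "kill anything owned by the sandbox user" by killing the child's process group. the pristine ~/.wine copy and any chroot() setup are left out.

```python
import os
import signal
import time


def run_sandboxed(argv, timeout, uid=None, gid=None):
    """Fork, optionally drop privileges in the child, exec argv, and
    kill the child's process group once `timeout` seconds have passed.

    Returns the raw wait status from os.waitpid().
    (Hypothetical helper, not part of any existing tool.)
    """
    pid = os.fork()
    if pid == 0:
        # Child: becomes its own session and process group leader so the
        # parent can later kill it together with everything it spawned.
        try:
            os.setsid()
            if gid is not None:
                os.setgroups([])  # drop supplementary groups (needs root)
                os.setgid(gid)    # group first, while still privileged
            if uid is not None:
                os.setuid(uid)
            os.execvp(argv[0], argv)
        finally:
            # Only reached if the setup or the exec itself failed.
            os._exit(127)

    # Parent: keeps its privileges and acts as the timer.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return status  # child finished before the timer went off
        time.sleep(0.05)

    try:
        os.killpg(pid, signal.SIGKILL)  # child's pgid equals its pid after setsid()
    except ProcessLookupError:
        try:
            os.kill(pid, signal.SIGKILL)  # fallback if the group does not exist yet
        except ProcessLookupError:
            pass
    _, status = os.waitpid(pid, 0)
    return status
```

when run as root, the uid and gid of the "sandbox" user could be looked up with pwd.getpwnam("sandbox") and passed in; without them the child keeps the caller's identity, which is only useful for trying the timer logic.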
the pBot trace above SHOULD be distilled into something like this:

    Connects to: Server irc, TCP port 6667
    IRC connection:
      Server password: fx
      USER: whxinxxo 127.0.0.1 localhost :whxinxxo
      NICK: [C]fvox61182976

basically enough to feed a botnet tracking tool or help someone understand what an RFI script will do to their system.

i generate the sandbox.php wrapper code with a simple python script, since most of it is very repetitive. sandbox.php then calls "include" on the PHP script named as its argument and voila.

obviously it needs some cleanup but the idea seems to have legs:

- string args should be quoted
- empty vals should be shown clearly as NULL
- stuff filled out by the web server (e.g. REQUEST) needs to be mocked up

and then it needs a distiller to produce reports, which is its own set of challenges.

first, characterizing network traffic into app protos can be a bit tough, but it's been done before. we should characterize, and present in a useful fashion, distillations for HTTP, SMTP, FTP, and IRC. these are the most common app protos we see in the wild with RFI scripts.

secondly, the tool should protect the host system, so think about something like chroot() or a very unprivileged user. chroot() is safest but has some management headaches.

thirdly, don't take the long path here trying to anticipate all sorts of challenges that don't yet exist. also, don't write new PHP plugins unless you can show that they are needed; reuse existing code when possible.

XML output should be pretty easy to generate and store, and should be comparable to anubis, wepawet, etc. in terms of data captured, presented, and stored.

some pointers:

tools and libraries to use
    http://code.google.com/p/funcall
    http://perldoc.perl.org/perlsub.html#Overriding-Built-in-Functions

example reports to mimic
    http://wepawet.cs.ucsb.edu/samples.php
    http://anubis.iseclab.org/?action=sample_reports

i hope this helps explain what i am thinking about with this project.
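to make the distillation step a bit more concrete, here is a minimal python sketch that turns "call:" lines in the PoC's format into the report lines suggested for the pBot example. the function name distill and the report layout are assumptions made for this sketch, it only knows fsockopen and the IRC verbs PASS, USER, and NICK, and it splits arguments on commas, which only works because the PoC does not quote string args yet.

```python
import re

# Matches "call:NAME(ARGS)" as printed by the PoC; "ret:" lines are ignored here.
CALL_RE = re.compile(r"^call:(\w+)\((.*)\)$")


def distill(trace_lines):
    """Reduce raw call trace lines to a short list of report lines.

    (Hypothetical helper: covers only fsockopen and IRC PASS/USER/NICK.)
    """
    report = []
    irc_header_done = False
    for line in trace_lines:
        match = CALL_RE.match(line.strip())
        if not match:
            continue
        name, args = match.group(1), match.group(2)
        if name == "fsockopen":
            # Naive split: assumes host and port contain no commas.
            parts = [a.strip() for a in args.split(",")]
            if len(parts) >= 2:
                report.append(f"Connects to: Server {parts[0]}, TCP port {parts[1]}")
        elif name == "fwrite":
            # First argument is the resource handle, the rest is the payload.
            _, _, payload = args.partition(", ")
            verb, _, rest = payload.strip().partition(" ")
            if verb not in ("PASS", "USER", "NICK"):
                continue
            if not irc_header_done:
                report.append("IRC connection:")
                irc_header_done = True
            label = "Server password" if verb == "PASS" else verb
            report.append(f"  {label}: {rest}")
    return report
```

feeding it the trace lines from the pBot run and printing the result line by line should give the connection line followed by the IRC details; HTTP, SMTP, and FTP would each need their own rules along the same lines.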