2010 GSoC RFI SANDBOX PROJECT DESCRIPTION

SUGGESTED SKILLS:

	  - knowledge of PHP and/or Perl
	    you need to know what code you'll be tracing and understanding
	  - Python or some similar wrapper code
	    you'll want to wrap this in another language so you don't
	    risk infection
	  - Linux or UN*X experience
	  - SQL interaction and programming
	  - an understanding of what remote file include (RFI) scripts
  	    are and what's possible with them

i didn't give specifications, thinking that someone may come up with a 
better approach than mine. i welcome any improvements people may have.

the idea is based very much on my experiences writing a sandbox before,
namely that debug output can be distilled to produce a useful report.
previously i had written a Windows EXE sandbox using the following 
approach based on WINE and its verbose debug output:

 	- python wrapper script is called as root
 	- python calls fork(), the child process changes its UID/GID
 	  to a user named "sandbox" with no rights to the rest of the
 	  system. the parent remains running as root.
 	- the child makes a new, pristine copy of the WINE ~/.wine
 	  directory tree (which is the WINE filesystem)
 	- the child launches the target EXE with WINE and CLI options to
 	  call the debugging output (which logs API calls with the
 	  arguments given and also stores the return value).
 	- the parent process (from fork()) sleeps for the duration of the
 	  run
 	- when the timer goes off and the parent wakes up, it kills the
 	  child and anything owned by the sandbox user.
 	- the distiller then runs over the debug trace and reduces it
 	  to the interesting bits, like which URLs were passed to
 	  InternetOpenURL(), which registry keys were written, etc.
 	  it also scans the WINE "filesystem" (just a directory tree)
 	  for new files, changed files, and viruses
 	- the distillation is presented in a fashion similar to the
 	  Norman Sandbox output, e.g.:

 		Downloads file http://www.foo.com/file.exe
 		Runs C:\Windows\virus.exe
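
the launch/sleep/kill steps above can be sketched in python roughly like
this. it is a sketch under assumptions, not the original wrapper: it uses
subprocess instead of a raw fork(), the names drop_privileges and
run_with_timeout are made up for illustration, and it only kills the
child's process group rather than everything owned by the sandbox user.
the privilege drop needs root and an existing "sandbox" user.

```python
import os
import pwd
import signal
import subprocess


def drop_privileges(username):
    """Return a callable that switches the child to `username` (needs root)."""
    entry = pwd.getpwnam(username)

    def _drop():
        os.setgroups([])          # shed root's supplementary groups
        os.setgid(entry.pw_gid)   # change group first, while still allowed
        os.setuid(entry.pw_uid)   # after this there is no way back to root

    return _drop


def run_with_timeout(argv, seconds, username=None, log_path=os.devnull):
    """Run argv in its own process group and kill it after `seconds`.

    Returns True if the timer fired and the run had to be killed.
    If username is None, no privilege drop is attempted.
    """
    preexec = drop_privileges(username) if username else None
    with open(log_path, "wb") as log:
        child = subprocess.Popen(
            argv,
            stdout=log,                # debug trace goes to the log file
            stderr=subprocess.STDOUT,
            preexec_fn=preexec,
            start_new_session=True,    # own process group, so killpg reaches descendants
        )
        try:
            child.wait(timeout=seconds)
            return False
        except subprocess.TimeoutExpired:
            os.killpg(child.pid, signal.SIGKILL)
            child.wait()               # reap the killed child
            return True
```

the parent stays privileged and only waits, so the target never gets a
chance to interfere with the timer or the log file handle.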

i had this sort of thing in mind for the PHP sandbox, using deep trace 
output to discover API calls and arguments, distill that, make some sense 
of it, and log it for future analysis.

Microsoft's Detours library is a common way to do much the same with
Windows binary executables. have a look at how Detours works:

 	http://research.microsoft.com/en-us/projects/detours/

same basic premise, and a key foundation of how executable sandboxes work: 
log everything, distill it, make some sense of it.

exactly how one does this in a PHP sandbox is open to interpretation, but 
i had in mind something like "funcall", a PHP extension, for logging API 
calls and their arguments. i now have some basic PoC code written using
PHP's funcall extension that works like this (with a PHP pBot example):

$ php sandbox.php pbot.php 
call:fsockopen(irc, 6667, , , 30)
ret:fsockopen(irc, 6667, , , 30)= Resource id #5
call:fwrite(Resource id #5, PASS fx)
ret:fwrite(Resource id #5, PASS fx)= 9
call:fwrite(Resource id #5, USER whxinxxo 127.0.0.1 localhost :whxinxxo)
ret:fwrite(Resource id #5, USER whxinxxo 127.0.0.1 localhost :whxinxxo)= 45
call:fwrite(Resource id #5, NICK [C]fvox61182976)
ret:fwrite(Resource id #5, NICK [C]fvox61182976)= 22
call:fgets(Resource id #5, 512)
ret:fgets(Resource id #5, 512)= :irc.xxx.xxx NOTICE Auth :*** Looking up your hostname...


the code just enumerates what functions to wrap and logs their calls and
return values. the above SHOULD be distilled into something like this:

	Connects to: Server irc, TCP port 6667
	IRC connection:
	  Server password: fx
	  USER: whxinxxo 127.0.0.1 localhost :whxinxxo
	  NICK: [C]fvox61182976

basically enough to feed a botnet tracking tool or help someone understand
what an RFI script will do to their system. i generate sandbox.php with
a simple python script, since most of the wrapper code is very
repetitive. the wrapper then calls "include" on the PHP script named as
its argument and voila. 
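
a generator along those lines might look like the sketch below. this is
not the actual PoC script. the funcall names fc_add_pre() and
fc_add_post() and the callback signatures are written from memory of the
funcall README and should be treated as assumptions to check against the
extension itself; the function list and the pre_/post_ naming are
hypothetical.

```python
# functions to wrap; a real list would be much longer (hypothetical selection)
FUNCTIONS = ["fsockopen", "fwrite", "fgets"]

# one pair of PHP callbacks per wrapped function.
# ASSUMPTION: funcall exposes fc_add_pre()/fc_add_post() and calls the
# callbacks as pre($args) and post($args, $result, $process_time).
TEMPLATE = """\
function pre_{name}($args) {{
    echo "call:{name}(" . implode(", ", $args) . ")\\n";
}}
function post_{name}($args, $result, $process_time) {{
    echo "ret:{name}(" . implode(", ", $args) . ")= " . $result . "\\n";
}}
fc_add_pre('{name}', 'pre_{name}');
fc_add_post('{name}', 'post_{name}');
"""


def generate_wrapper(functions):
    """Return the text of a sandbox.php that logs calls to `functions`."""
    parts = ["<?php"]
    for name in functions:
        parts.append(TEMPLATE.format(name=name))
    # finally run the script named on the command line under the hooks
    parts.append("include $argv[1];")
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    print(generate_wrapper(FUNCTIONS), end="")
```

redirecting the output to sandbox.php would give a wrapper that is run
as in the example above: php sandbox.php pbot.php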

obviously it needs some cleanup but the idea seems to have legs:

	- string args should be quoted 
	- empty values should be shown explicitly as NULL 
	- stuff filled out by the web server (e.g. REQUEST) needs to be
	  mocked up
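
to make the first two items concrete, here is the target log format
expressed as a small python sketch. the real formatting would live in
the PHP wrapper; python is used here only to illustrate the rule, and
format_arg/format_call are hypothetical names.

```python
def format_arg(value):
    """Render one logged argument unambiguously."""
    if value is None:
        return "NULL"                      # empty values are shown explicitly
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'         # strings are quoted
    return str(value)                      # numbers etc. are left bare


def format_call(name, args):
    """Render a whole call line in the trace format."""
    return "call:%s(%s)" % (name, ", ".join(format_arg(a) for a in args))


print(format_call("fsockopen", ["irc", 6667, None, None, 30]))
# prints: call:fsockopen("irc", 6667, NULL, NULL, 30)
```

with quoting in place, an argument that itself contains ", " can no
longer be confused with an argument separator.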

and then it needs a distiller to produce reports, which is its own set
of challenges. first, characterizing network traffic into app protos
can be a bit tough, but it's been done before. we should characterize
and present in a useful fashion distillations for HTTP, SMTP, FTP, and
IRC. these are the most common app protos we see in the wild with RFI
scripts. 
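
as a starting point for the IRC case, a distiller over the PoC trace
format shown earlier could look like this. it is a minimal sketch that
assumes the unquoted "call:name(args)" lines from the example and only
knows about fsockopen and fwrite; the function name distill is
hypothetical, and real traffic would need per-protocol parsers and
proper argument parsing.

```python
import re

# matches e.g. "call:fwrite(Resource id #5, PASS fx)"
CALL_RE = re.compile(r"^call:(\w+)\((.*)\)$")


def distill(trace_lines):
    """Turn raw call lines into a short human-readable report."""
    report = []
    irc = []
    for line in trace_lines:
        match = CALL_RE.match(line.strip())
        if not match:
            continue                       # skip "ret:" lines and noise
        func, raw_args = match.groups()
        if func == "fsockopen":
            args = [a.strip() for a in raw_args.split(",")]
            report.append("Connects to: Server %s, TCP port %s" % (args[0], args[1]))
        elif func == "fwrite":
            # first argument is the resource, the rest is the payload
            _, _, payload = raw_args.partition(", ")
            command, _, rest = payload.partition(" ")
            if command == "PASS":
                irc.append("  Server password: " + rest)
            elif command in ("USER", "NICK"):
                irc.append("  %s: %s" % (command, rest))
    if irc:
        report.append("IRC connection:")
        report.extend(irc)
    return report


if __name__ == "__main__":
    sample = [
        "call:fsockopen(irc, 6667, , , 30)",
        "ret:fsockopen(irc, 6667, , , 30)= Resource id #5",
        "call:fwrite(Resource id #5, PASS fx)",
        "call:fwrite(Resource id #5, NICK [C]fvox61182976)",
    ]
    print("\n".join(distill(sample)))
```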

secondly the tool should protect the host system, so think about something
like chroot() or a very unprivileged user. chroot() is safest but has some
management headaches.
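
the two options combine well: a chroot() only contains a process that
has also given up root. a minimal python sketch, assuming the jail
directory is already populated with whatever the PHP interpreter needs
(that is the management headache) and that it is called as root in the
child just before the target is started. enter_jail is a hypothetical
name.

```python
import os
import pwd


def enter_jail(jail_dir, username):
    """Confine the current process to jail_dir and drop to username."""
    # look the user up first: /etc/passwd may not exist inside the jail
    entry = pwd.getpwnam(username)
    os.chroot(jail_dir)       # requires root
    os.chdir("/")             # don't keep a working directory outside the jail
    os.setgroups([])          # shed supplementary groups
    os.setgid(entry.pw_gid)   # group first, then user
    os.setuid(entry.pw_uid)   # irreversible: root is gone after this
```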

thirdly don't take the long path here trying to anticipate all sorts of 
challenges that don't yet exist. also, don't write new PHP extensions
unless you can show that they are necessary; reuse existing code when
possible.

XML output should be pretty easy to generate and store, and should be
comparable to anubis, wepawet, etc in terms of data captured, presented,
and stored.
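
for illustration, the distilled data could be serialized with python's
standard xml.etree.ElementTree as below. the element and attribute names
(report, network, connect, irc) are invented for this sketch and are not
taken from the anubis or wepawet formats.

```python
import xml.etree.ElementTree as ET


def report_to_xml(sample_name, connections, irc):
    """Serialize a distilled report to an XML string.

    connections: list of (host, port) tuples
    irc: dict of IRC details, e.g. {"password": ..., "nick": ...}
    """
    root = ET.Element("report", {"sample": sample_name})
    net = ET.SubElement(root, "network")
    for host, port in connections:
        ET.SubElement(net, "connect",
                      {"host": host, "port": str(port), "proto": "tcp"})
    if irc:
        irc_el = ET.SubElement(net, "irc")
        for key, value in irc.items():
            ET.SubElement(irc_el, key).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(report_to_xml("pbot.php", [("irc", 6667)],
                        {"password": "fx", "nick": "[C]fvox61182976"}))
```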

some pointers:

  tools and libraries to use
	http://code.google.com/p/funcall
	http://perldoc.perl.org/perlsub.html#Overriding-Built-in-Functions

  example reports to mimic
	http://wepawet.cs.ucsb.edu/samples.php
	http://anubis.iseclab.org/?action=sample_reports

i hope this helps explain what i am thinking about with this project.