bugspot in python



i saw this discussion on hacker news, which links to a google blog post about how they try to predict bug hotspots based on past bug-fix activity. it's a pretty simple approach, and not all that ineffective, either.

someone on HN had a working ruby tool for git within a few hours, but i don't use either of those tools (git or ruby), so i wrote a version for subversion in python, which is below. my output format is based on theirs since it made a lot of sense. two things i may add: support for arbitrary match strings (e.g. whatever terms your team uses to mark a fix) and a maximum "age", or point in the timeline to start scoring from.
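neither option exists in the script yet, so here's a rough sketch of how they might look, under my own assumptions: extra terms are treated as literal strings (escaped with `re.escape`, so no regex syntax), and "age" means a cutoff in days counted back from now. `build_matchers` and `within_age` are hypothetical names, not part of bugspot-svn.py.

```python
import re
import time


def build_matchers(terms):
    """hypothetical option: compile user-supplied fix terms, case-insensitively."""
    # re.escape makes each term a literal string rather than a regex
    return [re.compile(re.escape(term), re.I) for term in terms]


def within_age(timestamp, max_age_days, now=None):
    """hypothetical option: is a commit (epoch seconds) recent enough to score?"""
    if now is None:
        now = time.time()
    return now - timestamp <= max_age_days * 86400


matchers = build_matchers(['resolves', 'ticket #'])
print(any(m.search('Resolves the crash on exit') for m in matchers))  # True
```

in the main loop the extra matchers would simply be appended to `message_matchers`, and a commit failing `within_age` would be skipped before its timestamp goes into `times`, so the scoring range shrinks to match.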

#!/usr/bin/env python3

"""
bugspot-svn.py
copyright (c) 2011 jose nazario, all rights reserved
license: 2 clause BSD
"""

# see http://google-engtools.blogspot.com/2011/12/bug-prediction-at-google.html

import math
import re
import subprocess
import sys
import time


def svnlog_parser(lines):
    """yield one `svn log -v` entry at a time, starting with its separator line."""
    data = []
    for line in lines:
        if line.startswith('-' * 30):
            if data:
                yield ''.join(data)
            data = []
        data.append(line)


if __name__ == '__main__':
    try:
        print('Scanning %s' % sys.argv[1])
    except IndexError:
        print('Usage: %s /path/to/repo' % sys.argv[0], file=sys.stderr)
        sys.exit(1)
    # run svn directly instead of os.popen('cd %s && ...'), so odd paths are safe
    log = subprocess.check_output(['svn', 'log', '-v'], cwd=sys.argv[1], text=True)
    s = svnlog_parser(log.splitlines(True))
    message_matchers = [re.compile(x, re.I)
                        for x in ('fixes', 'fixed', 'closes', r'bug\w?#\d+')]

    hotspots = {}
    messages = []
    times = []
    for m in s:
        paths = []
        lines = m.split('\n')
        i = 0
        for line in lines:
            if line.startswith('-' * 20):  # separator
                i += 1
                continue
            if i == 1:  # revision | who | timestamp | N lines
                i += 1
                timestamp = ' '.join(line.split(' | ')[2].split()[:2])
                # time.mktime() is portable; strftime('%s') is a platform extension
                timestamp = int(time.mktime(
                    time.strptime(timestamp, '%Y-%m-%d %H:%M:%S')))
                times.append(timestamp)
                continue
            if line == 'Changed paths:':
                i += 1
                continue
            if not line.strip():  # a blank line separates the paths from the message
                break
            if len(line) > 5 and line[3] in 'DMAR':  # e.g. "   M /trunk/foo.py"
                i += 1
                # keep just the path, dropping any " (from /old/path:rev)" suffix
                paths.append(line[5:].split(' (from ')[0])
        # and everything else is the changelog
        msg = ' '.join(x.strip() for x in lines[i:] if x.strip())
        for matcher in message_matchers:
            if matcher.search(msg):
                messages.append(msg)
                for path in paths:
                    hotspots.setdefault(path.strip(), []).append(timestamp)
                break

    if not times:
        sys.exit('no commits found')
    start = min(times)
    end = max(times)

    def score(ts):
        # sum of 1 / (1 + e^(-12t + 12)) with t scaled to [0, 1] over the history
        span = float(end - start) or 1.0  # a one-commit log would divide by zero
        s = 0
        for t in ts:
            t = (t - start) / span
            s += 1 / (1 + math.exp(-12 * t + 12))
        return s

    hotspots = sorted(((score(y), x) for x, y in hotspots.items()), reverse=True)
    hotspots = [(x, n) for n, x in hotspots if n > 0.001]
    print('Found %d bugfix commits, with %d hotspots' % (len(messages), len(hotspots)))
    print()
    print('Fixes:')
    for msg in messages:
        print(' - %s' % msg)
    print()
    print('Hotspots:')
    for path, n in hotspots:
        print(' %.3f - %s' % (n, path))
output on the phoneyc trunk looks like this:
Scanning /Users/jose/code/phoneyc/trunk
Found 6 bugfix commits, with 2 hotspots

Fixes:
 - fix quoting issues
 - fix arg length
 - [phoneyc] support for RTSP MPEG4 SP Control ActiveX Control "MP4Prefix" Property Buffer Overflow vuln module, exploit demo
 - [phoneyc] found an exploit for QvodCtrl at SecFocus, add. fix: - add CLSID for QvodCtrl - look for URL and url - XXX case independent handling of methods etc? - proper length check - object instantiation can be done with name, not just id
 - [phoneyc] - handle the redirect stuff as an href - fix up URLs that lack a needed trailing '/'
 - import order fixup fix sgmllib exception namespace

Hotspots:
 0.463 - /phoneyc/trunk/honeyclient.py
 0.099 - /phoneyc/trunk/modules/jscript/NCTAudioFile2.js
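to get a feel for scores like 0.463 and 0.099, it helps to look at what a single fix contributes. `score()` maps each fix's timestamp onto t in [0, 1], oldest commit to newest, and adds up 1 / (1 + e^(-12t + 12)), the weighting from the google post. this standalone copy of the per-fix term (`fix_weight` is my name for it, not the script's) makes the curve easy to poke at:

```python
import math


def fix_weight(t):
    """contribution of one bug fix at normalized time t (0 = oldest commit, 1 = newest)."""
    return 1 / (1 + math.exp(-12 * t + 12))


for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print('t=%.2f  weight=%.6f' % (t, fix_weight(t)))
```

a fix in the newest commit is worth exactly 0.5, one halfway through the history about 0.0025, and one at the very start about 0.000006, so old fixes barely count. solving 1 / (1 + e^(12 - 12t)) = 0.001 gives t ≈ 0.42, so a file with a single fix only clears the 0.001 cutoff if that fix landed in roughly the newest 58% of the history.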


i'll be testing it on larger codebases soon; so far i've only developed it against the svn repo of phoneyc.

if you use it, please let me know how you find it. i'll happily accept patches, too.

Saturday, Dec 17, 2011 @ 08:42am


passphrase generator



inspired by xkcd 936 on password strength, i wrote a simple passphrase generator in python. you can see it below.
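#!/usr/bin/env python3

import random
import sys

try:
    passlen = int(sys.argv[1])
except IndexError:
    passlen = 4  # default to 4 words

# SystemRandom draws from os.urandom(); the default random.Random()
# (Mersenne Twister) is predictable and shouldn't be used for passwords
r = random.SystemRandom()
with open('/usr/share/dict/words') as f:
    words = [x.strip() for x in f if x.strip()]
print(' '.join(r.choice(words) for _ in range(passlen)))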


usage is simple: just specify how many words you want in the passphrase (the default is 4). if you want klingon, spanish, etc., you can swap in a different word list.

some examples:
Dix:bin jose$ ./passphrase 
Euplocomi Protomycetales hypodermal reissuer
Dix:bin jose$ ./passphrase 
hidromancy precompounding reascertainment gnomical
Dix:bin jose$ ./passphrase 6
orthographize sicknessproof velvetbreast holocephalian hypochlorhydric harshly
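the point of xkcd 936 is that strength comes from the number of possible phrases, not from odd characters. if each word is picked uniformly at random (repeats allowed) from a list of N distinct words, an n-word phrase carries n × log2(N) bits of entropy. a quick sketch with hypothetical helper names; duplicate entries in a word list add nothing, so it counts distinct words:

```python
import math


def passphrase_bits(num_words, wordlist_size):
    """entropy in bits of num_words picked uniformly, with replacement, from a list."""
    return num_words * math.log2(wordlist_size)


def wordlist_size(path='/usr/share/dict/words'):
    """count distinct, non-empty words in a word list file."""
    with open(path) as f:
        return len({line.strip() for line in f if line.strip()})


print(passphrase_bits(4, 2048))  # 44.0, xkcd 936's four common words at 11 bits each
```

a big dictionary like /usr/share/dict/words gives far more bits per word than a 2048-word list, at the cost of words like "holocephalian" that are harder to remember.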


hope this is useful to someone.

Saturday, Aug 27, 2011 @ 09:01am


The App Store as a Security Model



Like it or not, the "app store" model is a likely future for computing, and one that really improves security by reducing the attack surface of a user's computer. When paired with cloud computing there's a retro-future aspect to this: we're all headed right back to mainframes and dumb terminals, but this time with shinier devices. The app store model here refers not just to Apple's "App Store" but to any vendor that runs a gated marketplace for installable applications.

This model has hit it big again lately with the iPad, along with various complaints that Apple is way too restrictive in its relationships with developers. There's some truth to people's claustrophobia about this, and to the complaints about Apple's recent unfairness. But there is another angle to this: reducing the attack surface.

When you think about what a lot of people need to do, it's play some media, handle email, chat, use Facebook, and surf the web. It's not managing antivirus or a personal firewall, or worrying about the latest scareware. The bad guys on the Internet thrive in this complex, confusing, general-purpose computing environment. While you may need to create the next Google, lots of people don't; they just need to work, to communicate, and to enjoy themselves.

The app store model reduces the attack surface in a few ways. First, there is a strict path for getting code to run on the system, including code signing and gating through the application store that does the actual download and installation. Second, apps are presumably vetted (although I doubt they're vetted for security weaknesses or malicious features), meaning rogue apps can be blocked from entry. Third, telcos and OS vendors can always revoke an app's ability to run.

This ignores, as noted, any failure to screen for hidden rogue actions in an app, security flaws (accidental or intentional), or even bribery to get an app into the app store.

Couple this with the cloud, where you have something similar: fewer options, determined by someone else, offered to you as a choice. The cloud is where resources can be allocated to protect your documents. With this mix you now have a model that feels increasingly boxed in to some folks, but is really quite liberating.

What this model does, as far as security is concerned, is move a large chunk of the risk to central, manageable locations: the app store gateway and the cloud operations folks. Presumably these should protect their own turf, and by extension you, but that's a bit of a stretch, and we know it won't always happen that way. I expect some significant issues with app stores and the cloud in the coming years, but I also anticipate that we'll wind up with this model widely adopted, with significant security benefits to come. Imagine a world where Zeus and Koobface can't arbitrarily infect your computer. Threat vectors will change, but they'll never go away.

As for me, I use the cloud, but I still use a general purpose OS on my netbook and laptop. I like to innovate, and a closed platform like iPhone OS isn't supportive of that. That said, I'll probably ditch my iPhone when my contract renewal comes up; it's simply not exciting enough in terms of innovation anymore. Better apps are being written elsewhere.

Friday, Apr 16, 2010 @ 02:35pm


things i did over my vacation



my 3-week vacation ends today. things i did:
  • added a twitter aggregator to infosecdaily
  • wrote a replacement for iwantsandy
  • wrote an apache logfile analyzer
  • installed a cat door to the basement
  • hung out with the baby a lot
  • visited my in-laws
  • didn't travel by air (the first time in about a year)
  • did a little cooking
  • cleaned the house
  • analyzed two new pieces of malware
  • sold some stuff on amazon
now i need a real vacation!

Sunday, Jan 04, 2009 @ 10:57am


oops command lives



years ago i watched my dad working in a terminal, and i could have sworn that when he made a typo he typed "oops" and it worked: the command was fixed and rerun, and he didn't need to retype the whole thing. i've wanted an oops command ever since.

however, it didn't exist, at least not as i knew it, so i wrote a portable version (it seems to exist in zsh, a shell i just don't use). the python part is really simple: a levenshtein distance calculator and a replacement engine. you do need to create a shell alias for it:

ksh, sh, bash:
$ alias oops='history>/tmp/oops_history && ~/bin/oops.py'
csh and derivatives:
% alias oops 'history > /tmp/oops_history && ~/bin/oops.py'
here's a brief example of it in action:
$ emacss ~/bin/oops.py
ksh: emacss: not found
$ oops emacs
[ emacs opens and voila, working ... ]
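the post doesn't include oops.py itself, so the following is only a minimal sketch of the two pieces described above, built on my own assumptions: that oops replaces whichever word of the previous command is closest to its argument, and that the previous command has already been pulled out of /tmp/oops_history (that parsing is left out). `levenshtein` and `fix_command` are hypothetical names:

```python
def levenshtein(a, b):
    """classic edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distance from '' to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def fix_command(command, correction):
    """replace the word of a non-empty command that is closest to correction."""
    words = command.split()  # naive: ignores shell quoting
    closest = min(range(len(words)), key=lambda k: levenshtein(words[k], correction))
    words[closest] = correction
    return ' '.join(words)


print(fix_command('emacss ~/bin/oops.py', 'emacs'))  # emacs ~/bin/oops.py
```

`min` picks the first word on a tie, and splitting on whitespace ignores shell quoting; a real version would need to handle both.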
i make a lot of typos, and this is easier than cutting, pasting, and fixing the line by hand. some bugs and limitations:
  • i need to make it use the damerau-levenshtein distance, which counts two swapped adjacent letters as a single edit and is better suited to typing errors
  • not extensively field tested at all
  • it doesn't leave a corrected mark in your history file
  • it doesn't work for shell built-ins (e.g. cd)
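for the first bullet, here's a sketch of the optimal string alignment variant of damerau-levenshtein: plain levenshtein plus one extra case, where two adjacent characters swapped count as a single edit. that's the most common typing slip, and i'm assuming this variant is good enough here (the unrestricted damerau-levenshtein distance differs only when a swapped pair is edited again). `osa_distance` is a hypothetical name; it could stand in for the levenshtein calculator in oops.py.

```python
def osa_distance(a, b):
    """optimal string alignment distance: levenshtein plus adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # two adjacent characters swapped count as one edit
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


print(osa_distance('emcas', 'emacs'))  # 1 (plain levenshtein gives 2)
```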
let me know if you're interested in playing around with it.

Monday, Nov 17, 2008 @ 10:27am


translate button



i work with a lot of foreign language websites, most in languages i do not read. to help me with that i often use google translate, but i got sick of copying and pasting URLs or text, so i built this translate button for firefox.

it works like this: when you're on a page and you want to translate it, click the "translate" bookmark (i've put mine in the bookmarks toolbar folder in firefox). poof, it's automagically translated into english. the magic is that the bookmark is javascript that builds the google translate URL for your current page and then sends you there. voila.

make a "translate" bookmark and make this the "location" in its properties. now you, too, can have a translate button.
javascript:(function () {
           var h = encodeURIComponent(location.href);
           var newurl = 'http://translate.google.com/translate?u=' + h + '&hl=en&ie=UTF-8&sl=*&tl=en';
           window.location = newurl;
           })();
paste that code in. simple as that. if you need to change the target language, change "tl=en" to another language code (e.g. "es" or "fr"). the source language is detected automatically by google translate (see "sl=*").
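for scripting outside the browser, here's the same URL construction as a small python function. `translate_url` is a hypothetical helper; the base URL and the hl, ie, sl and tl parameters are copied from the bookmarklet, not checked against current google translate behavior. the important part is escaping the page URL completely, so its own `?`, `&` and `/` don't get mixed up with google's parameters:

```python
from urllib.parse import quote


def translate_url(page_url, target='en'):
    """build a google translate link for page_url, mirroring the bookmarklet."""
    # safe='' percent-encodes '/', ':', '?', '&' and '=' too, much like
    # encodeURIComponent, so the whole page URL stays one u= value
    return ('http://translate.google.com/translate?u=' + quote(page_url, safe='')
            + '&hl=en&ie=UTF-8&sl=*&tl=' + quote(target, safe=''))


print(translate_url('http://example.com/a?b=1&c=2', target='es'))
```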

Sunday, Oct 26, 2008 @ 06:16pm


csv2xml.py



whipped this up for someone earlier today:
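#!/usr/bin/env python3

import csv
import re
import sys
import xml.etree.ElementTree as ET


def tag_name(field):
    # header -> element name: spaces become '_', '&' becomes 'and',
    # and anything else XML won't accept in a name becomes '_'
    name = re.sub(r'[^\w.-]', '_', field.strip().replace(' ', '_').replace('&', 'and'))
    if not re.match(r'[A-Za-z_]', name):  # names can't start with a digit, '-' or '.'
        name = '_' + name
    return name


with open(sys.argv[1], newline='') as f:
    reader = csv.reader(f, quoting=csv.QUOTE_NONE)
    root = ET.Element('document', {'name': sys.argv[1], 'generator': 'csv2xml.py'})
    fields = [tag_name(x) for x in next(reader)]
    for lineno, line in enumerate(reader, start=1):
        elem = ET.SubElement(root, 'line', {'lineno': str(lineno)})
        # zip stops at the shorter sequence, so short rows don't raise IndexError;
        # ElementTree escapes text itself, so no cgi.escape() (it would double-escape)
        for tag, value in zip(fields, line):
            ET.SubElement(elem, tag).text = value

ET.dump(root)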


similar to http://csv2xml.sourceforge.net/.
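the post doesn't show any output, so here's a self-contained variant that works on a string instead of a file, which makes the resulting XML easy to see. `csv_text_to_xml` is a hypothetical name, and unlike the script it uses the csv module's default quoting, so a quoted field containing a comma stays in one piece:

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_text_to_xml(text, name='example.csv'):
    """convert CSV text (header row first) into an XML string."""
    rows = csv.reader(io.StringIO(text))  # default quoting, not QUOTE_NONE
    root = ET.Element('document', {'name': name, 'generator': 'csv2xml.py'})
    # same header cleanup as the script: spaces -> '_', '&' -> 'and'
    fields = [f.strip().replace(' ', '_').replace('&', 'and') for f in next(rows)]
    for lineno, row in enumerate(rows, start=1):
        line = ET.SubElement(root, 'line', {'lineno': str(lineno)})
        for tag, value in zip(fields, row):
            ET.SubElement(line, tag).text = value  # ElementTree escapes '&' and '<'
    return ET.tostring(root, encoding='unicode')


print(csv_text_to_xml('host name,notes\nexample.org,"a & b, c"\n'))
```

each data row becomes a `<line lineno="1">` element with one child per column (`<host_name>`, `<notes>`), and the `&` in the data comes out escaped exactly once, as `&amp;`.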

Friday, Oct 24, 2008 @ 02:05pm


eating, playing, week of sep 29





little dom's been feverin' lately, but he's slowly getting better.

for the past year and a half i've been working my way through cookbooks and stuff, learning new recipes and techniques, rarely repeating myself. it's a challenge, and sometimes beth wants an old favorite that fell way back, but overall we've found a few gems. here's a couple we made this week: we eat well, and it's a lot of fun to cook.

been getting into wooden toys for dominic. i can't bring myself to spend $56 a set on wooden blocks, so i've been investigating making my own. stay tuned ...

does dominic need his own homepage on kidmondo? it might make it easier for his relatives to keep track of him.

also been thinking about finally painting inside the house. i wonder if we could pull off peacock blue cabinets.

Friday, Oct 03, 2008 @ 08:24am