jmatch: fuzzy text matching tool

about
requirements
to do
installation
download
related software

author: jose_at_monkey.org

license: 3-clause BSD.

about jmatch

$ jmatch -h
Usage: jmatch [-HLins] -d dist needle file1 [...]
search for matching text using fuzzy rules.
        -H      use the Hamming distance algorithm
        -L      use the Levenshtein distance algorithm
        -i      case insensitive search
        -n      print line numbers of matching files
        -s      print the final distance score
        -d      maximum distance allowed for match

the following example shows how it compares to grep:

$ jmatch -Lnis -d 5 "begin 664" /home/jose/monkey-spam*
/home/jose/monkey-spam-2.mbox:593107:5:beginning
/home/jose/monkey-spam-2.mbox:600298:5:beg
/home/jose/monkey-spam-2.mbox:994278:5:being,
/home/jose/monkey-spam-2.mbox:1169258:5:ve in
/home/jose/monkey-spam-2.mbox:1486415:5:erin
/home/jose/monkey-spam-2.mbox:1488703:5:been

some caveats when compared to grep(1):

  1. whitespace matters: the match is done against the whole line, not against a substring of it.
  2. you'll often wind up with more matches than you expected.
  3. be aware of how the algorithms work. the Levenshtein distance is computed with equal costs for conversions (substitutions) and insertions of characters. the Hamming distance can only be computed if the "needle" is the same length as the line being tested.
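
to make the caveats concrete, here is a minimal Python sketch of the two distance measures and of whole-line matching. it is not jmatch's code and does not use libdistance; the function names (levenshtein, hamming, fuzzy_lines) are made up for illustration, and the textbook definitions are assumed (unit cost for each insertion, deletion, and substitution). scores printed by jmatch itself may differ from what this sketch computes.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: unit cost per insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Count differing positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def fuzzy_lines(needle, lines, max_dist, ignore_case=False):
    """Yield (line number, score, text) for lines within max_dist of needle.

    Hypothetical helper: the needle is compared against the whole line,
    as caveat 1 describes, never against a substring.
    """
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        lhs, rhs = (needle.lower(), text.lower()) if ignore_case else (needle, text)
        score = levenshtein(lhs, rhs)
        if score <= max_dist:
            yield lineno, score, text


if __name__ == "__main__":
    sample = ["begin 664 file.txt\n", "begin 664\n", "Begin 666\n"]
    for lineno, score, text in fuzzy_lines("begin 664", sample, 2, ignore_case=True):
        print(f"{lineno}:{score}:{text}")
```

with this sample input, grep would match the first line because it contains the needle, while the whole-line comparison rejects it: the nine extra characters put it at distance 9, well above the limit of 2. calling hamming on that same pair raises an error because the lengths differ.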

requirements

jmatch requires libdistance to build.

to do

  • more algorithms: Needleman-Wunsch, Jaccard, etc.

installation

  1. download the tarball.
  2. unpack it.
  3. edit the Makefile (or GNUmakefile) to point to the correct libdistance location, so the build can find libdistance.a and distance.h.
  4. run make (or gmake).

download

jmatch-0.2.0.tar.gz (1 jan 2005)
preliminary release.

related software

jmatch and this site are copyright © 2004 jose nazario, all rights reserved.