jmatch: fuzzy text matching tool


license: 3-clause BSD.

about jmatch

$ jmatch -h
Usage: jmatch [-HLins] -d dist needle file1 [...]
search for matching text using fuzzy rules.
        -H      use the Hamming distance algorithm
        -L      use the Levenshtein distance algorithm
        -i      case insensitive search
        -n      print line numbers of matching files
        -s      print the final distance score
        -d      maximum distance allowed for match
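the following example shows how it compares to grep. it runs a case-insensitive Levenshtein search within a distance of 5, printing line numbers and scores: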
$ jmatch -Lnis -d 5 "begin 664" /home/jose/monkey-spam*
/home/jose/monkey-spam-2.mbox:1169258:5:ve in
some caveats when compared to grep(1):
  1. whitespace matters: the match is done against the whole line.
  2. you'll often wind up with more matches than you expected.
  3. be aware of how the algorithms work. the Levenshtein distance is computed with equal costs for substitutions, insertions, and deletions of characters. the Hamming distance can only be computed if the "needle" is the same length as the line being tested.
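
the caveats above can be illustrated with a minimal Python sketch. this is not jmatch's or libdistance's code; the function names (levenshtein, hamming, fuzzy_lines) are hypothetical, and the sketch assumes unit costs for every edit and whole-line comparison as described above:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for substitution, insertion, and deletion."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def hamming(a: str, b: str):
    """Number of differing positions, or None when the lengths differ."""
    if len(a) != len(b):
        return None  # undefined for unequal lengths (assumed handling)
    return sum(ca != cb for ca, cb in zip(a, b))


def fuzzy_lines(needle, lines, max_dist, distance=levenshtein, ignore_case=False):
    """Yield (line_number, score, line) for whole lines within max_dist of needle."""
    if ignore_case:
        needle = needle.lower()
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        candidate = text.lower() if ignore_case else text
        score = distance(needle, candidate)
        if score is not None and score <= max_dist:
            yield lineno, score, text
```

because the whole line is compared, a line such as "begin 664 foo.tar" is 8 edits away from the needle "begin 664" and would not match with a maximum distance of 5, even though grep would find the substring.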

jmatch requires libdistance to build.

to do

  • more algorithms: needleman-wunsch, jaccard, etc ...


  1. download
  2. unpack
  3. modify Makefile (or GNUmakefile) to point to the correct libdistance location (to find libdistance.a and distance.h)
  4. make (or gmake)

jmatch-0.2.0.tar.gz (1 jan 2005)
preliminary release.

related software

jmatch and this site are copyright © 2004 jose nazario, all rights reserved.