- need many more checks in xdr for compounds
- last bus outta here: 21:48
- get backups running
- lunch with jose thursday or friday
- get pictures developed
- france receipts tomorrow
- plane to accra departs friday 18:00
"I have a meeting with Dan tomorrow at noon."

and

"Tomorrow, I have a meeting with Dan at noon."

Note that, while "tomorrow" changed position, most words remained in the same order relative to the other words. These observations motivate what we call rule-based classification, the classification scheme used in pimlog.
"Tomorrow, I have a meeting with Dan at noon. We will be discussing new business plans for the new millennium, and then contemplating exactly how it is we're going to take over the world."

Only a few of these words, such as "Tomorrow" and "noon", are needed to determine that this should be a calendar entry for "tomorrow noon". Given such a rule and an entry to be classified, we measure the distance between the rule and the entry. Given the observations made in the previous section, this distance needs to favor structural likeness while allowing the rule and the entry to use only "approximately" the same words; that is, it should let us judge how alike two sets of words are.

We have augmented the Levenshtein distance (also known as minimum edit distance) algorithm to measure this structural likeness. We chose it because minimum edit distance closely matches our notion of structural likeness; for example, simply switching two components does not cost as much as wholesale substitution of those components with new ones.

So, given a set of rules (the more the better), pimlog classifies a new entry by measuring this distance from each rule to the entry and choosing the rule with the lowest distance (highest "rank"). That rule's type becomes the type of the entry.

In addition to structural distance, pimlog also normalizes target entries and rules before they are compared. Such normalizations replace specific names with more general categories, e.g. "Tuesday" -> "
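The distance-based classification described above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not pimlog's code: the exact augmentation pimlog applies to Levenshtein distance and its rank formula are not given here, so this version swaps in the "optimal string alignment" variant of Damerau-Levenshtein over word tokens (an adjacent word swap costs 1 instead of 2), a hypothetical `rank` of 1 minus the distance divided by the longer length, and a hypothetical normalization table that maps weekday names to `<DAY>`, since the original mapping is cut off. The ranks it produces are not expected to match the ones printed by match.py.

```python
import re

# Hypothetical normalization table: pimlog maps specific names to general
# categories, but its actual table is not given, so "<DAY>" is assumed.
CATEGORIES = {day: "<DAY>" for day in (
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday")}


def normalize(text):
    """Lowercase, split into word tokens, and map known names to categories."""
    words = re.findall(r"[\w']+", text.lower())
    return [CATEGORIES.get(w, w) for w in words]


def word_distance(a, b):
    """Word-level edit distance in which swapping two adjacent words costs 1,
    the same as one insertion, deletion or substitution (the "optimal string
    alignment" variant of Damerau-Levenshtein)."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a word
                          d[i][j - 1] + 1,         # insert a word
                          d[i - 1][j - 1] + cost)  # keep or substitute
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap neighbors
    return d[n][m]


def rank(rule_words, entry_words):
    """Hypothetical rank: 1 minus the distance scaled by the longer length."""
    longest = max(len(rule_words), len(entry_words), 1)
    return 1.0 - word_distance(rule_words, entry_words) / longest


def classify(entry, rules):
    """Return (type, rank) of the closest rule; rules are (text, type) pairs."""
    entry_words = normalize(entry)
    best_type, best_rank = None, float("-inf")
    for text, entry_type in rules:
        r = rank(normalize(text), entry_words)
        if r > best_rank:
            best_type, best_rank = entry_type, r
    return best_type, best_rank
```

For example, given the rules "Meeting with Roger tomorrow at noon" (appt) and "I need to finish my NLP homework" (todo), `classify` assigns "I have a meeting with Dan tomorrow at noon." to appt: the first rule is four edits away (three inserted words plus Roger -> Dan), while the second is eight away.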
Meeting with Dave on Tuesday / appt
Appointment with Dan Friday / appt
Appointment at the Marriot on Sunday / appt
Meet with Dr. Karl tomorrow morning / appt
Meeting with Roger tomorrow at noon / appt
I need to finish my NLP homework / todo
Continue project Sekret / todo
Mary's phone number is 666-666-666 / addr
Get backups running / todo
Check out Ion / todo
$ python learn.py RULES RULES.db
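learn.py itself is not shown, so the following is only a hypothetical sketch of what such a learning step could do with the "text / type" rule format above: split each line at its last " / " into rule text and type label, then store the result. Numbering types from 1 in order of first appearance is an assumption, chosen because it agrees with the match.py output, where appointment entries come out as Type 1 and to-do entries as Type 2; the pickle storage format and the function names are likewise assumed.

```python
import pickle


def parse_rules(lines):
    """Parse "text / type" lines into (text, type_id) pairs.

    Type ids are numbered from 1 in order of first appearance; this
    numbering is an assumption, not documented pimlog behavior.
    """
    type_ids = {}
    rules = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        text, _, label = line.rpartition(" / ")
        if not text:
            raise ValueError(f"malformed rule line: {line!r}")
        type_id = type_ids.setdefault(label.strip(), len(type_ids) + 1)
        rules.append((text.strip(), type_id))
    return rules, type_ids


def learn(rules_path, db_path):
    """Hypothetical learn step: parse the RULES file and pickle it to db_path."""
    with open(rules_path, encoding="utf-8") as f:
        rules, type_ids = parse_rules(f)
    with open(db_path, "wb") as f:
        pickle.dump({"rules": rules, "types": type_ids}, f)
    return rules, type_ids
```

For the RULES file above, this numbering yields appt = 1, todo = 2 and addr = 3.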
$ python match.py RULES.db ENTRIES
I have a meeting with Dan tomorrow at noon.
Type 1 with Rank 0.833333
Appointment with Mary at 6p Thursday.
Type 1 with Rank 1.000000
6:0PM
I need to figure out how to do my NLP assignment before Friday morning.
Type 2 with Rank 0.777778
Some rather random text, that I have no idea how got here!
Type 2 with Rank 0.142857
But, I do have an appointment with Mark tomorrow. Hopefully at noon.
Type 1 with Rank 0.833333
I need to get an appointment with Mary thursday jan 3 at 12p.
Type 2 with Rank 0.571429
Remember my appointment with Angela on friday march 6 at 9am
Type 1 with Rank 0.600000
9:0AM Friday March 6