Lexical analysis in source code scanning

Uninet Infosec 2002

Jose Nazario

20 April, 2002

ABSTRACT: Automated code audits have become a hot topic in recent years, and several tools exist at various levels for performing them. Many are based on lexical analysis, which matches patterns in the source text and performs actions on the text selected. While reviewing several Open Source source code analysis tools, the author decided to develop his own. This tool, named czech, is still in development but is already functional for many key features. This presentation introduces lexical analysis in source code scanning, discusses the limitations of the approach, and covers several existing tools. It also presents the design goals and implementation of czech, along with the lessons learned in this endeavor. In summary, lexical analysis is easy for both the developer and the authors of the tools, but it cannot follow the paths that data can take within an application, which severely limits its ability to identify potential holes.
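
To make the idea of "match a pattern, perform an action" concrete, here is a minimal sketch of a lexical scanner for C source. It is an illustration only and is not taken from czech: the rule table `RISKY_CALLS`, the function `scan`, and the sample input are all hypothetical names chosen for this example. The action performed on each match is simply to record the line number and an advisory note.

```python
import re

# Hypothetical rule table: function name -> advisory note.
# These rules are assumptions for illustration, not czech's actual rules.
RISKY_CALLS = {
    "strcpy": "no bounds check on destination",
    "gets": "reads unbounded input",
    "sprintf": "no bounds check on output buffer",
}

# Match a listed identifier followed by an opening parenthesis.
PATTERN = re.compile(r"\b(" + "|".join(RISKY_CALLS) + r")\s*\(")


def scan(source):
    """Return (line_number, function_name, note) for each pattern match."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            name = match.group(1)
            findings.append((lineno, name, RISKY_CALLS[name]))
    return findings


# Two calls to the same function: one with a short constant source,
# one with a source string that comes from the command line.
SAMPLE = '''\
char buf[16];
strcpy(buf, "hi");
strcpy(buf, argv[1]);
'''

for finding in scan(SAMPLE):
    print(finding)
```

Both `strcpy` calls in the sample are reported in the same way, even though only the second copies data an attacker can control. The scanner sees tokens, not where the data came from, which is the limitation the abstract describes.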

slides [Hi-res - HTML]

text (English, Spanish, French) [HTML]

The presentation was generated using Magicpoint 1.09a, from http://www.mew.org/mgp/. A Makefile was used to generate the PostScript, PDF and HTML output. The Makefile is available, and it is rather generic for Magicpoint presentations. TTF fonts were downloaded from Microsoft's TTF download site at http://www.microsoft.com/typography/fontpack/default.htm.