signed archives: an evaluation of internet trust

Note: this is a rough writeout of what will be presented at Umeet 2002 on Sunday, 15 December 2002. A partial babelfish translation into Spanish and French appears below. The slides are available on my website at: http://www.monkey.org/~jose/presentations/signed-archives.d/

initial translation by babelfish, refined by Oroz. Muchas gracias!

INTRODUCTION

[slide 1]
This talk will be about a survey I did earlier this fall of over 2000 software archives with detached signatures. Looking for undiscovered trojans, errors, and a general sense of how well the system is working, I obtained and verified these archives and studied the data set I generated.

A little bit about me. I'm a Ph.D. biochemist who left science to go work in software engineering. Years ago I founded Crimelabs as a way to share ideas and resources. My research interests include trust relationships (of which this research is a part), source code and software analysis techniques and tools, and network measurements and performance. I work at Arbor Networks in Ann Arbor, MI, USA.

[slide 2]
Very briefly, this talk will cover research into looking for trojans on the Internet. We define a trojan as a modified binary, using the term "Trojan horse" to indicate an altered package which appears normal. I will briefly discuss the motivation behind this work, which compelled me to go looking for how prevalent modified archives are on the Internet. After a discussion of how I did this, I'll show you the raw results, followed by a discussion of that data set as well as the metadata I gathered in this research. Lastly, I'll describe some of the future work in this area, some of which I am hoping to work on.

[slide 3]
The whole approach to this study was to download and verify the archives. This is a rather simplistic approach, but it got the job done and provided some interesting results. Briefly, I identified signed archives available on the Internet, downloaded them along with the corresponding PGP keys, verified each archive, and then analyzed the data. Errors were, of course, investigated.

[slide 4]
This study was motivated by the series of high-profile archive modifications which occurred in 2002. Specifically, after the OpenSSH trojan this summer, I built a small tool to verify packages and started thinking about the wide-scale analysis which would be needed to discover how widespread this phenomenon has become. This year saw packages such as Dug Song's tools dsniff, fragrouter, and fragroute trojanned; the OpenSSH source code was modified and injected into the mirror system; Sendmail suffered an interesting attack; and the latest victims were tcpdump and libpcap. This, of course, raises the question of how often this is happening.

Also, PGP has been around for about 10 years. It's a big success in terms of use, and it would be interesting to see how well the web of trust is doing. The threat model in this analysis is simple: if someone were to alter a software package, the failure to verify correctly via a PGP signature should be apparent immediately. Furthermore, if a false key is injected into the web of trust, that should also be apparent from its failure to have the right signature relationships.

[slide 5]
Briefly, doing this study is relatively easy. I identified software archives with detached signatures by using Google. I simply looked for the common terms used to name the signatures, such as '.tar.gz.sig' and '.tar.gz.asc'. This built a list of about 166 unique servers hosting over 2800 archives. After the OpenSSH fiasco I had built a small tool called `extract' which automatically performs the PGP check needed to verify an archive. I modified the tool for the purposes of this study to simply report the success or failure of the verification process. I then downloaded the identified archives (using wget), checked them all, and then postprocessed the data.
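
As a sketch of the fetch step: given the Google-derived list of archive URLs, each archive needs its detached signature pulled down too. The file names and the fixed `.sig' suffix here are illustrative assumptions (some sites use `.asc'), and the URLs are made up:

```shell
#!/bin/sh
# Pair every archive URL with its detached-signature URL so that
# `wget -i fetchlist.txt' can grab both in one pass.
# urls.txt (one archive URL per line) is an assumed input file.
cat > urls.txt <<'EOF'
http://example.org/pub/foo-1.0.tar.gz
http://example.org/pub/bar-2.1.tar.gz
EOF

while read -r url; do
    echo "$url"        # the archive itself
    echo "$url.sig"    # its detached signature
done < urls.txt > fetchlist.txt
```

The survey would then run `wget -i fetchlist.txt'; a production version would try both the `.sig' and `.asc' suffixes.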

[slide 6]
Extract is a small shell script wrapper around tar and gpg. When you want to extract a file, it looks for the archive's detached signature. Having found the signature, it then checks for the key on the local keyring. If the key exists, it continues. If the key doesn't exist, it fetches the key, adds it to the keyring, and then restarts. Once the key is present, the signature is verified with the key. If it checks out OK, the archive is extracted.
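
That flow can be sketched as a small shell function. To be clear, this is a reconstruction from the description above, not the real `extract' source; the `.sig'/`.asc' fallback and the scraping of the key ID out of gpg's error output are my assumptions:

```shell
#!/bin/sh
# A sketch of the `extract' logic -- a reconstruction from the
# description, not the real tool. Assumes the detached signature
# sits next to the archive as <archive>.sig or <archive>.asc.
extract_archive() {
    archive=$1
    sig="$archive.sig"
    [ -f "$sig" ] || sig="$archive.asc"

    if ! gpg --verify "$sig" "$archive" 2>/dev/null; then
        # Verification failed: assume the key is missing, fetch it
        # from a keyserver, and try once more. The "key ID" parsing
        # of gpg's stderr is an assumption about its output format.
        keyid=$(gpg --verify "$sig" "$archive" 2>&1 |
            awk '/key ID/ { print $NF; exit }')
        gpg --keyserver search.keyserver.net --recv-keys "$keyid"
        gpg --verify "$sig" "$archive" || return 1    # fail closed
    fi
    tar -zxf "$archive"    # extract only after a good signature
}
```

A failure at any step leaves the archive unextracted, matching extract's fail-to-FAILURE behavior.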

Extract was designed to be small, efficient, and easy to use. GPG's interface is workable, but compare:

extract openssh-2.4p1.tar.gz
...
and how you would do it manually:
gpg --verify openssh-2.4p1.tar.gz.sig openssh-2.4p1.tar.gz
gpg --keyserver search.keyserver.net --recv-keys (KEYID)
gpg --verify openssh-2.4p1.tar.gz.sig openssh-2.4p1.tar.gz
tar -zxvf openssh-2.4p1.tar.gz
That assumes you don't have the key locally and must fetch it as well. I really prefer tools that are streamlined in their use, and attempt to write tools that fit that model.

[slide 7]
The actual act of downloading the archives took about 3 days on my cable modem, because the downloader I built operated only serially. For comparison's sake, I was able to max the link out by running 8 parallel fetches for the Linux kernel and complete that job overnight, demonstrating the classic tradeoff between wall-clock time and resource usage. The total storage required for the archives was about 9GB of disk.
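
The parallel version needs nothing more exotic than xargs, whose `-P' flag sets the number of concurrent fetches. A sketch, reading the URL list from stdin:

```shell
#!/bin/sh
# Fetch URLs read on stdin with N wget processes in parallel.
# Trades peak bandwidth and load for wall-clock time.
parallel_fetch() {
    jobs=${1:-8}
    xargs -n 1 -P "$jobs" wget -q
}
# typical use:  parallel_fetch 8 < fetchlist.txt
```

Note that `-P' is a GNU/BSD xargs extension rather than strict POSIX, so this is a sketch for the systems I had at hand.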

[slide 8]
Briefly, here's a graph showing the impact of the traffic on my cable modem over roughly a one-day period. You can see that the download spikes really outshine my normal traffic, visible on the left.

[slide 9]
The analysis, which occurred in bulk, started with an empty GPG keyring. I wanted to see what ring characteristics emerged after this analysis. I modified `extract-0.1' to only report the success or failure of the verification process, and by using extract (rather than gpg directly) I was able to fetch keys as needed. The process was driven by a small shell script wrapper which finds all of the archives in the directory and runs the modified extract tool on them. Data analysis took about 3 or 4 hours on my K6-2/300 machine, which I use for most of my data collection needs (it's currently mapping the Internet). All of the actions were logged, and the logs were then postprocessed.
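
The postprocessing step then reduces to tallying outcomes from the log. The `<archive> PASS|FAIL' format here is a hypothetical stand-in for whatever the modified extract actually emitted, and the three-line log is fabricated sample data:

```shell
#!/bin/sh
# Tally verification outcomes from a hypothetical "<archive> PASS|FAIL"
# log. The sample below stands in for the real 2804-line log.
cat > verify.log <<'EOF'
openssh-3.5p1.tar.gz PASS
dsniff-2.3.tar.gz PASS
cmu-snmp-1.14.tar.gz FAIL
EOF

passes=$(awk '$2 == "PASS"' verify.log | wc -l)
fails=$(awk '$2 == "FAIL"'  verify.log | wc -l)
echo "passed: $passes  failed: $fails"
```

The FAIL entries are the short list that then gets inspected by hand.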

[slide 10]
The overall results are shown in this slide. Briefly, 2804 archive downloads were checked in this process, representing 1426 unique archives. They came from 166 unique servers, meaning that many of those servers act as mirrors. Only 93 keys were retrieved in the whole process, indicating that many authors have many releases.

2799 of the archives were a success: they verified OK in this process. 5, however, failed to do so.

[slide 11]
Since these 5 were the really interesting set for the first stage of analysis, I had to look at them.

The first failure was due to a truncated download. A mirror site cut me off prematurely and an OpenSSH distribution file was cut short; hence, it didn't verify correctly.

The next two were false negatives. I don't know why they failed, but they did. Manual reinspection showed that they were OK. Note that `extract' fails to a FAILURE mode, not a PASS mode.

Failures 4 and 5 were legitimate failures. The author was contacted and the results were verified. It turns out that Alex Brennan had uploaded a new archive but didn't update the signature. As you would expect, he appreciated the note.

[slide 12]
Some archives were a complete failure, however. The CMU-SNMP packages were signed using an old key. This old key format is incompatible with current standards-based GnuPG tools. I haven't contacted the authors, but this is a clear demonstration of a breakdown of the system: no valid keys were ever found.

[slide 13]
Now we begin the metadata analysis which forms the bulk of this paper. Basically this survey uncovered four weaknesses in the signed archive system:
- inline key distribution
- a risk of a compromise of the key itself
- few signatures on some keys
- a lack of trust in some keys

[slide 14]
By inline key distribution I refer to the practice of placing the PGP key used to sign the archives alongside the archives and signatures themselves. The problem lies in the temptation for the user to download the key along with the archive. For an attacker, the setup is interesting: when you modify the archive, you sign it using a forged key which you also upload to the site. When people download the archive and key pair, the signature will verify but the archive will not be authentic. Notable abusers of this include the OpenSSH portable team, SSH Communications, the Cyrus team, and the GnuPlot team.

[slide 15]
Next, this study revealed that some keys are at risk of compromise by a determined adversary. Of the 93 keys analyzed in this study, most were 3 or fewer years old; however, some were as old as 10 years. While an older key has been around longer and established more trust (you know what to expect), it has also been exposed longer for an adversary to attack and factor. Trusting an old key also assumes that the older, original PGP software generated truly safe keys; given how many weaknesses have been found in cryptographic software in the past 10 years, a flaw there is a real possibility. The sizes of the keys relate directly to this as well: a shorter key can be factored more easily. Most of the keys used to sign archives were 1024-bit RSA and DSA keys, but some were 512-bit keys. That is now a tractable size for an adversary with an interest in factoring an RSA key pair (see Simon Singh's Code Book and the resulting challenge).
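
One way to gather key ages and sizes is gpg's machine-readable listing: with `--with-colons', each `pub' record carries the key length in bits in field 3 and the creation date in field 6. The records below are fabricated samples standing in for real survey output:

```shell
#!/bin/sh
# Extract key length and creation year from gpg's colon-delimited
# listing. Field 3 of a "pub" record is the size in bits, field 6
# the creation date (shown here as YYYY-MM-DD; sample data is made up).
cat > keys.txt <<'EOF'
pub:u:1024:17:0123456789ABCDEF:1999-06-01:::u:Some Author <a@example.org>:
pub:u:512:1:FEDCBA9876543210:1993-02-11:::u:Old Author <o@example.org>:
EOF

awk -F: '$1 == "pub" { print $3 " bits, created " substr($6, 1, 4) }' keys.txt
```

In real use you would pipe `gpg --list-keys --with-colons' in place of keys.txt; a `sort | uniq -c' on the output gives the histograms shown on the next slides.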

[slide 16]
These key ages are shown here graphically. You can see that most are from the year 1999, but many are from prior to that.

[slide 17]
Shown here is the distribution of the key sizes in bits. Again, most are 1024-bit keys. Both RSA and DSA keys were grouped together for this graph.

[slide 18]
Lastly, when you correlate the ages of the keys with their sizes, you can see a general trend towards larger keys as the software grows to support them. However, 1024-bit keys have always been present and will probably remain so for a long time to come. Perhaps it's time we changed the default setting in gpg.

[slide 19]
The next two points in the analysis of the data retrieved in this study focus on the signatures on the key. The signatures form the basis of the web of trust in the PGP world. With fewer signatures on a key, it becomes harder to verify the veracity of the key (i.e., does the owner really own that key? is it who you think it is?). The first set of analysis I performed focused on the number of signatures. The average number of signatures per key was 21; some keys had no signatures, while two had 261 signatures each. Those two belong to Debian developers who participate heavily in key signing events.
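
Signature counts can be tallied the same way from `gpg --list-sigs --with-colons', where each `sig' record belongs to the most recent `pub' record above it. Again the sample records are fabricated:

```shell
#!/bin/sh
# Count "sig" records per "pub" key in gpg's colon-delimited
# --list-sigs output (sample data below is made up).
cat > sigs.txt <<'EOF'
pub:u:1024:17:AAAAAAAAAAAAAAAA:1999-06-01:::u:Author One:
sig:::17:BBBBBBBBBBBBBBBB:2000-01-02::::Signer One:
sig:::17:CCCCCCCCCCCCCCCC:2000-03-04::::Signer Two:
pub:u:1024:17:DDDDDDDDDDDDDDDD:2001-07-08:::u:Author Two:
sig:::17:EEEEEEEEEEEEEEEE:2001-07-09::::Signer Three:
EOF

awk -F: '
    $1 == "pub" { key = $5; count[key] = 0 }
    $1 == "sig" { count[key]++ }
    END { for (k in count) print k, count[k] }
' sigs.txt > sigcounts.txt
```

Note that gpg lists self-signatures too, so a production count would subtract those before drawing histograms.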

[slide 20]
The results of that analysis are shown here in this figure. The heavy bias to the left indicates that most keys have only a handful of signatures. Very few have no signatures at all, but most have only about 5 to 7 signatures.

[slide 21]
The next step in the analysis of the signatures on the key was to try and establish the owner of the key. This was inspired by a good set of conversations I have had with Niels Provos. Basically, you examine the signatures on any given key and try to trace it back to something you know. In this case, I tried to tie the keys back to the large, strong set identified by the initial analysis by the folks at Dtype.org. This strong set is a group of keys, over 100,000 strong, which forms a self-contained unit: every key in the set is connected by signatures to every other key, with no reliance on anything external to the set.

Of the 93 keys analyzed here, about 2/3 could be mapped to the strong set. 36 keys failed to map back to the strong set (using the key path server at http://the.earth.li/~noodles/pathfind.html and the data from Jason Harris at http://keyserver.kjsl.com/~jharris/ka/). By tying a key back to the strong set we can safely assume that the owner is correctly indicated on the key.

While this metric is considerably stronger than the mere analysis of the number of signatures on any key, it relies heavily on the motives of any signer. Some sign only with full knowledge of the key holder and the link between that person and the key, while others sign keys after a brief introduction. This is a classic contrast of a strong trust metric and a weak one. Using the weak links in the chain, one could subvert the system with enough signatures on any forged key.

[slide 22]
The links of the keys identified in this study to the keys at the center of the strong set (indicated in blue) are shown here. For a better graphic have a look at http://monkey.org/~jose/graphing/csw03/csw03.png .

[slide 23]
This foray into the web of trust and its use isn't the first of its kind, but I do think it's the first to do a widespread survey of signed archives. The `extract' tool is related to Dug Song's `gzsig' tool and Marius' PGPwrap library. Marius wrote that library after finding the license terms of `GPGme' unacceptable (a typical monkey likes BSD code).

The detached signatures are related to the BSD ports tree and the cryptographic checks made by that system. Briefly, every distribution file is hashed using three cryptographic hashes (MD5, SHA-1, and RMD160) to verify the integrity of the download. Note that it was this system that caught most of the 2002 trojans, not the public key system.
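
The ports-style check boils down to recomputing each digest and comparing it to the recorded value. Here's a minimal sketch of that comparison for one hash, using GNU `sha1sum' (the BSD ports system itself uses its own `md5'/`sha1'/`rmd160' tools and checksum files; `check_distfile' is a name I made up):

```shell
#!/bin/sh
# Compare a distfile's SHA-1 against a recorded value, ports-style.
# Returns nonzero (and complains) on a mismatch -- fail closed.
check_distfile() {
    file=$1
    expected=$2
    actual=$(sha1sum "$file" | awk '{ print $1 }')
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $file" >&2
        return 1
    fi
}
```

A full ports check repeats this for all three hashes, so an attacker has to defeat MD5, SHA-1, and RMD160 simultaneously.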

[slide 24]
So, while this study has shown that there appear to be few widespread trojans lying in wait on the Internet, there are several weaknesses in the system which can be exploited by an attacker. Ideally, I'd like to continue to perform this check on a rolling basis. Right now I'm looking for a research partner; I need more bandwidth and more storage space.

Ideally everyone would be a part of that strong set. Right now there are a number of disconnected islands (note that I'm not in the strong set). This would aid, hopefully, in establishing the veracity of the keys. It would also be nice to see tools incorporated into the PGP model, such as `extract' or `mutt-sigtrace' (http://www.chaosreigns.com/code/mutt-sigtrace/), which can aid in the checking of keys. Next, more signed archives need to be out there; we need to know that what we download is what the author intended to upload. And lastly, the world needs a better system. There are simply too many holes in the current one; I think it's time to do better.

[slide 25]
I really need to acknowledge several people here. Beth let me destroy our cable modem's performance for a couple of days; Marius, Dug, Niels, Alex, and Seth all provided excellent feedback and ideas; the dtype.org people (the participants on the discussion list, and Jason Harris) have been great in their work on web of trust metrics; and the Umeet organizers, thank you for having me speak.

And of course you, I appreciate your interest. Thanks!

SPANISH

archivos firmados: una evaluación de la confianza del Internet 

Nota: éste es un esbozo de lo que será presentado en Umeet 2002 el domingo 15 de diciembre de 2002. Una traducción de babelfish al español y francés aparece abajo. Las diapositivas están disponibles en mi website en:
http://www.monkey.org/~jose/presentations/signed-archives.d/

INTRODUCCIÓN 

[ diapositiva 1 ] 
Esta charla es sobre un examen que hice este otoño de más de 2000 archivos de software con firmas separadas. Buscando
trojanos sin descubrir, errores, y una sensación general de cómo esta' bien el sistema trabajaba, obtuve y verifiqué estos archivos y estudié
los datos que generé. 

Un poco sobre mí. Soy un bioquímico de Ph.D. que dejo la ciencia para ir a trabajo en ingeniería del software. Hace años fundé
Crimelabs como manera de compartir ideas y recursos. Mis intereses de la investigación incluyen las relaciones de la confianza (de
cuáles es una pieza esta investigación), las técnicas y las herramientas del análisis del código de fuente y del software, y las medidas y
funcionamiento de la red. Trabajo en Arbor Networks in Ann Arbor, MI, USA.. 

[ diapositiva 2 ] 
Muy brevemente, esta charla cubrirá la investigación en buscar trojanos sobre el Internet. Definimos un Trojan como binario modificada,
usando el término "Trojan Horse" para indicar un paquete alterado que aparezca normal. Discutiré brevemente la motivación detrás de
este trabajo, que me obligó a que fuera a buscar el predominio de cuántos modificaron archivos allí están en el Internet. Después de una
discusión de cómo hice esto, les mostraré los resultados en bruto y luego una discusión sobre ese conjunto de datos, así como los meta
datos recolecté en esta investigación. Pasado, describiré algo del trabajo futuro en esta área, algo en de la cual estoy esperando trabajar. 

[ diapositiva 3 ] 
El acercamiento del conjunto a este estudio era descargar y verificar los archivos. Esto es un acercamiento algo simplista, pero se parece
haber conseguido el trabajo hecho y proporcionado algunos resultados interesantes. Brevemente, identifiqué los archivos firmados
disponibles en el Internet, descargados les y asidos la llave correspondiente del PGP, verificados el archivo, y después analizados los
datos. Los errores, por supuesto, fueron investigados. 

[ diapositiva 4 ] 
Este estudio fue motivado por la serie de altas modificaciones del archivo del perfil que ocurrieron en 2002. Específicamente, después del
OpenSSH Trojan este verano, construyo una herramienta pequeña para verificar los paquetes y el pensamiento comenzado del análisis
amplio de la escala que sería necesario en el Internet descubrir cómo es extenso se ha convertido este fenómeno. Especificamente, este
año vio los paquetes tales como dsniff de las herramientas de Dug Song's, fragrouter, y el fragroute trojanned, el código de fuente
de OpenSSH fue modificado e inyectado en el sistema del espejo, Sendmail sufrió un ataque interesante, y el más último era tcpdump y
libpcap. Esto, por supuesto, pide la cuestión de cómo está sucediendo a menudo éste. 

También, el PGP ha estado alrededor por cerca de 10 años. Es un éxito grande en términos del uso, y sería interesante ver cómo esta' bien
la tela de la confianza está haciendo. El modelo de la amenaza en este análisis es simple: si alguien alterara una paquete de software, la
falta de verificar correctamente vía una firma del PGP debe aparecer inmediatamente. Además, si una llave falsa se inyecta en la tela de
la confianza, que debe también ser appearant de la falta tener las relaciones derechas basadas en firmas. 

[ diapositiva 5 ] 
Brevemente, hacer este estudio es relativamente fácil. Identifiqué archivos del software con las firmas separadas usando Google. Busqué
simplemente los términos comunes usados para nombrar las firmas, tales como ' tar.gz.sig ' y ' tar.gz.asc '. Esto construyó una lista de
cerca de 166 servidores únicos con sobre 2800 archivos. Después de que el fiasco de OpenSSH yo construyera una herramienta pequeña
llamada el extracto del ` ' que realiza fácilmente automáticamente el cheque del PGP necesitado para verificar un archivo. Modifiqué la
herramienta para los propósitos de este estudio de divulgar simplemente sobre el éxito o la falta del proceso de la verificación. Después
descargué estos archivos identificados (que usan el wget), los comprobé todos, y después procesé los datos. 

[ diapositiva 6 ] 
El extracto es una envoltura pequeña del shell script para tar y gpg. Busca la firma separada para un archivo cuando usted
desea extraer ese archivo. Encontrando el archivo, entonces comprueba para saber si hay la llave en keyring local. Si existe la llave,
continúa. Si no existe la llave, trae la llave, agrega la al keyring, y entonces recomenzar. Una vez que la llave esté presente, la firma si
está verificado con la llave. Si comprueba fuera de la autorización, se extrae el archivo. 

El extracto fue diseñado para ser pequeño, eficiente, y fácil utilizar. Mientras que GPG tiene un relativamente fácil utilizar el interfaz,
compare: 

 extraiga openssh-2.4p1.tar.gz...  

y cómo usted lo haría manualmente: 

gpg --verify openssh-2.4p1.tar.gz.sig openssh-2.4p1.tar.gz
gpg --keyserver search.keyserver.net --recv-keys (KEYID)
gpg --verify openssh-2.4p1.tar.gz.sig openssh-2.4p1.tar.gz
tar -zxvf openssh-2.4p1.tar.gz

Que le asume no tenga la llave localmente y debe traerla también. Realmente prefiero las herramientas que se aerodinamizan en su uso, y
procuro escribir las herramientas que caben ese modelo. 

[ diapositiva 7 ] 
El acto real de descargar los archivos tomó cerca de 3 días en mi módem de cable. Esto era porque el modelo que construí funcionado
solamente en serie. Por el motivo de la comparación podía al máximo él hacia fuera funcionando 8 alcances paralelos para el núcleo de
Linux pero lo termino durante la noche, demostrando el tradeoff clásico del tiempo-espacio. El almacenaje total requerido para los
archivos estaba sobre 9GB del disco. 

[ diapositiva 8 ] 
Brevemente, aquí está un gráfico que demuestra el impacto del tráfico en mi módem de cable para un 1 día o tan período. Usted puede ver
que el tráfico clava realmente outshone mi tráfico normal, visible a la izquierda. 

[ diapositiva 9 ] 
El análisis, que ocurrió en bulto, comenzó con un anillo dominante vacío de GPG. Deseé ver qué características del anillo emergieron
después de este análisis. Modifiqué el ` extract-0.1 ' para divulgar solamente sobre el éxito o el fallo del proceso de la verificación, y al
usar el extracto (con gpg) era capaz de traer llaves según se necesitaban. El proceso fue conducido por una envoltura
pequeña del shell script que encuentra todos los archivos en el directorio y corre esta herramienta modificada del extracto en ellos. El
análisis de datos tomó cerca de 3 o 4 horas en mi máquina K6-2/300, que utilizo para la mayoría de mis necesidades de la colección de
datos (está actualmente "mapeando" el Internet). Todas las acciones fueron registradas, que entonces fueron postprocesadas. 

[ diapositiva 10 ] 
Los resultados totales se demuestran en esta diapositiva. Brevemente, 2804 archivos fueron llegados este proceso, representando un
total de 1426 archivos. 166 servidores únicos fueron descargados de, significando que muchos actúan como servidor del espejo.
Solamente 93 llaves fueron recuperadas en el proceso entero, indicando a muchos autores tienen muchas revisiones (releases). 

2799 archivos eran un éxito, ellos verificaron MUY BIEN en este proceso. 5, sin embargo, fallaron. 

[ diapositiva 11 ] 
Puesto que estos 5 eran el sistema realmente interesante para la primera etapa del análisis, tuve que mirarlos. 

La primera falta era debido a una transferencia directa truncada. Un mirror site me cortó prematuramente y un archivo del
ditribution de OpenSSH fue cortado brevemente, por lo tanto, no verificó correctamente. 

Los dos siguientes eran negativas falsas. No sé porqué fallaron, pero lo hicieron. La reinspección manual demostró que eran la nota
ACEPTABLE que el extracto del ` ' falla a un modo de FALLO, no un modo del PASO. 

Las faltas 4 y 5 eran faltas legítimas. Entraron en contacto con al autor y los resultados fueron verificados. Resulta que Alex Brennan
uploaded un archivo nuevo pero no fijó la firma. Como puede esperar él apreció la nota. 

[ diapositiva 12 ] 
Algunos archivos eran una falta completa, sin embargo. Los paquetes Cmu-SNMP fueron firmados usando una vieja llave. Este viejo
formato dominante es incompatable con las herramientas basadas los estándares actuales de GnuPG. No he entrado en contacto con a
autores, sino soy una demostración clara de un fallo del sistema. No se encontró ninguna llaves válidas. 

[ diapositiva 13 ] 
Ahora comenzamos el análisis del metadata que forma el bulto de este estudio. Este examen destapó básicamente cuatro debilidades en el
sistema firmado del archivo: - distribución dominante en línea - un riesgo de un compromiso de la llave sí mismo - pocas firmas en algunas
llaves - y una carencia de la confianza en algunas llaves 

[ diapositiva 14 ] 
Por la distribución dominante en línea refiero al acto de poner la llave del PGP usada para firmar los archivos junto a los archivos y a las
firmas ellos mismos. El problema está en la tentación para que el usuario descargue la llave así como el archivo. Para un atacante, la
disposición es interesante: básicamente, cuando usted modifica el archivo binario, usted lo firma con una clave "forged"
usted también ha subido al sitio. Cuando la gente descarga el par archivo y llave, la firma será válida pero el archivo no será. Los
abusadores notables de esto incluyen al equipo portable de OpenSSH, a las comunicaciones de SSH, al equipo de Cyrus, y a equipo de
GnuPlot. 

[ diapositiva 15 ] 
Después, este estudio reveló que algunas llaves están en el riesgo del compromiso de un adversario resuelto. Brevemente, en las 93
llaves analizadas en este estudio, la mayoría tenían 3 o menos años. Sin embargo, algunos eran tan viejos como 10 años. Mientras
que una más vieja llave ha sido alrededor más larga y estableció más confianza (usted sabe qué esperar), es también alrededor más larga
para un adversario atacar y el factor. Esto también asume que el más viejo, original software del PGP genera llaves verdaderamente
seguras. Dado cuántos se han encontrado las debilidades en software criptográfico en los últimos 10 años, esto es una posibilidad
probable. También, los tamaños de las llaves se relacionan directamente con esto también. Una llave más corta se puede descomponer en
factores más fácilmente. La mayoría de las llaves usadas para firmar archivos eran 1024-bit RSA y llaves del DSA, pero algunas eran las
llaves 512-bit. Esto ahora es un tamaño manejable para un adversario que tenga un interés en descomponer en factores cualquier par de la
llave del cifrado de RSA (véase el libro del código de Simon Singh y el desafío que resulta). 

[ diapositiva 16 ] 
Estas edades dominantes se demuestran aquí gráficamente. Usted puede ver que la mayoría es a partir del año 1999, pero muchos son
antes de ése. 

[ diapositiva 17 ] 
Se demuestra aquí la distribución de los tamaños dominantes en pedacitos. Una vez más la mayoría son las llaves 1024-bit. Las llaves de
RSA y del DSA fueron agrupadas juntas para este gráfico. 

[ diapositiva 18 ] 
Pasado, cuando usted correlaciona la edad de llaves y de sus tamaños, usted puede ver una tendencia genérica hacia llaves más grandes
mientras que el software crece en su ayuda. Sin embargo, las llaves 1024-bit han estado siempre presente y probablemente también estarán
presentes durante mucho tiempo venir. Quizás es tiempo de que cambiemos el valor por defecto que fija en gpg. 

[ diapositiva 19 ] 
Los dos puntos siguientes en el análisis de los datos recuperados en este estudio *** TRANSLATION ENDS HERE ***

FRENCH

archives signées: une évaluation de confiance d'Internet 

Note: c'est une ébauche de ce qui sera présenté à Umeet 2002 le dimanche 15 décembre 2002. Une traduction de babelfish en
espagnol et le Français apparaît ci-dessous. Les glissières sont disponibles sur mon website à:
http://www.monkey.org/~jose/presentations/signed-archives.d/

INTRODUCTION 

[slide 1] 
This talk will be on a survey I did earlier this fall of over 2000 software archives with detached signatures.
Looking for undiscovered trojans, errors, and a general feeling of how well the system was working, I obtained and verified
these archives and studied the data set I generated. 

A little bit about me. I'm a Ph.D. biochemist who left science to go work in software engineering. Years ago
I founded Crimelabs as a way to share ideas and resources. My research interests include trust relationships
(of which this research is a part), source code and software analysis techniques and tools, and
network measurements and performance. I work at Arbor Networks in Ann Arbor, MI, USA. 

[slide 2] 
Very briefly, this talk will cover the research into looking for trojans on the Internet. We define a trojan as a
modified binary, using the term "Trojan horse" to indicate an altered package which appears normal. I will briefly discuss the
motivation behind this work, which compelled me to go looking at how prevalent modified archives are on the Internet.
After a discussion of how I did this, I'll show you the raw results, and then discuss that data set as well as the
metadata I gathered in this research. Lastly, I'll describe some of the future work in this area, some of which
I am hoping to work on. 

[slide 3] 
The whole approach to this study was to download and verify the archives. This is a rather simplistic approach, but it seems to have
gotten the job done and provided some interesting results. Briefly, I identified signed archives available on the Internet, downloaded
them and grabbed the corresponding PGP key, verified the archive, and then analyzed the data. Errors were, of course,
investigated. 

[slide 4] 
This study was motivated by the series of high-profile archive modifications that occurred in 2002. Specifically, after
the OpenSSH trojan this summer, I built a small tool to verify packages and started thinking about the large-scale analysis that would be
needed on the Internet to discover how widespread this phenomenon has become. This year alone has seen packages such as
Dug Song's tools dsniff, fragrouter, and fragroute trojanned, the OpenSSH source code modified and injected into the
mirror system, Sendmail suffer an interesting attack, and most recently tcpdump and libpcap. This, naturally, begs the
question of how often this happens. 

Furthermore, PGP has been around for about 10 years. It is a great success in terms of usage, and it would be interesting to see
how well the web of trust is doing. The threat model in this analysis is simple: if someone were to alter a
software package, the failure to verify correctly via a PGP signature should show up immediately. Likewise, if a
bogus key is injected into the web of trust, that should also be apparent from its failure to have the right relationships
based on signatures. 

[slide 5] 
Briefly, doing this study is relatively easy. I identified software archives with detached signatures using
Google. I simply searched for the common terms used to name the signatures, such as 'tar.gz.sig' and 'tar.gz.asc'.
This built a list of about 166 unique servers with over 2800 archives. After the OpenSSH fiasco I built a small
tool called `extract' which conveniently automates the PGP checking required to verify archives. I modified the tool
for the purposes of this study to simply report the success or failure of the verification process. I then downloaded the
identified archives (using wget), verified them all, and then processed the resulting data. 

[slide 6] 
Extract is a small shell script wrapper for tar and gpg. It looks for the detached signature for an archive when
you want to extract a file. Having found the signature, it then checks for the key on the local keyring. If the key exists, it continues. If the key
does not exist, it fetches the key, adds it to the keyring, and then restarts. Once the key is present, the signature is verified
with the key. If it verifies OK, the archive is extracted. 

Extract was designed to be small, efficient, and easy to use. While GPG has a relatively easy to use interface,
compare: 

 extract openssh-2.4p1.tar.gz ...  

and how you would do it manually: 

 gpg --verify openssh-2.4p1.tar.gz.sig openssh-2.4p1.tar.gz
 gpg --keyserver search.keyserver.net --recv-key (KEYID)
 gpg --verify openssh-2.4p1.tar.gz.sig openssh-2.4p1.tar.gz
 tar -zxvf openssh-2.4p1.tar.gz

That assumes you don't have the key locally and have to fetch it as well. I really prefer tools that are streamlined in their
usage, and try to write tools that fit this model. 

[slide 7] 
The actual act of downloading the archives took about 3 days on my cable modem. This was because the pipeline I built operated
only serially. For comparison's sake, I was able to max it out by running 8 parallel efforts for the Linux kernel
and complete that overnight, demonstrating the classic time-space tradeoff. The total storage required for the archives
was about 9GB of disk. 
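The serial-versus-parallel tradeoff above can be sketched with a worker pool. This is a toy illustration, not the actual download setup: `fetch()` stands in for a real wget invocation, and the mirror URLs are made up.

```python
# Toy sketch of parallel fetching: with 8 workers, the same list of archives
# completes in roughly 1/8 the wall time of a serial loop.
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    # placeholder for the real download; just records completion
    return url

urls = ["http://mirror.example.org/pkg-%d.tar.gz" % i for i in range(16)]
with ThreadPoolExecutor(max_workers=8) as pool:
    fetched = list(pool.map(fetch, urls))
print(len(fetched))  # 16
```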

[slide 8] 
Briefly, here is a graph showing the traffic impact on my cable modem for a period of a day or so. You can see that the
traffic spikes really outshone my normal traffic, which is evident on the left-hand side. 

[slide 9] 
The analysis, which was done in bulk, began with an empty GPG keyring. I wanted to see what keyring characteristics
emerged after this analysis. I modified `extract-0.1' to report only the success or failure of the verification
process, and by using extract (over gpg itself) I was able to fetch keys as needed. The process was
driven by a small shell script wrapper which finds all of the archives in the directory and runs this modified extract tool on
them. The data analysis took about 3 or 4 hours on my K6-2/300 machine, which I use for most of my data collection
needs (it is currently mapping the Internet). All of the actions were logged, and the logs were then post-processed. 

[slide 10] 
The overall results are shown in this slide. Briefly, 2804 archives were checked in this process, representing a total of
1426 unique archives. 166 unique servers were downloaded from, meaning that many of them act as mirror servers. Only 93
keys were fetched in the entire process, indicating that many authors have many releases. 

2799 archives were a success; they verified OK in this process. 5, however, did not. 
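Counts like these come out of post-processing the verification logs. A minimal sketch of that kind of tally follows, using a hypothetical log format (one line per downloaded archive, `PASS server/archive` or `FAIL server/archive`); the real logs may look different.

```python
# Minimal post-processing sketch over a hypothetical verification log.
from collections import Counter

def tally(log_lines):
    """Count outcomes and unique servers from 'STATUS server/archive' lines."""
    outcomes = Counter()
    servers = set()
    for line in log_lines:
        status, path = line.split(None, 1)
        outcomes[status] += 1
        servers.add(path.split("/", 1)[0])  # leading component names the server
    return outcomes, servers

log = [
    "PASS mirror1.example.org/foo-1.0.tar.gz",
    "PASS mirror2.example.org/foo-1.0.tar.gz",
    "FAIL mirror1.example.org/bar-0.9.tar.gz",
]
outcomes, servers = tally(log)
print(outcomes["PASS"], outcomes["FAIL"], len(servers))  # 2 1 2
```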

[slide 11] 
Since these 5 were the really interesting set for the first stage of the analysis, I had to look at them. 

The first failure was due to a truncated download. A mirror site cut me off prematurely and an OpenSSH distribution file was
cut short; hence, it did not verify correctly. 

The next two were false negatives. I don't know why they failed, but they did. Manual reinspection showed that they
were OK; note that `extract' fails to a FAIL mode, not a PASS mode. 

Failures 4 and 5 were legitimate failures. The author was contacted and the results were verified. It turns out that Alex Brennan
had uploaded a new archive but had not updated the signature. As you would expect, he appreciated the note. 
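The fail-closed behavior noted above (failing to a FAIL mode, not a PASS mode) is worth making explicit. A small sketch of the principle, with a hypothetical `report` helper standing in for the tool's decision logic:

```python
# Sketch of fail-closed reporting: anything other than an explicit successful
# verification is reported as FAIL, never PASS.
def report(gpg_result):
    """gpg_result is the raw checker outcome; only an explicit 'ok' passes."""
    return "PASS" if gpg_result == "ok" else "FAIL"

print(report("ok"), report("bad signature"), report(None))  # PASS FAIL FAIL
```

Designed this way, an unexplained error can at worst produce a false negative, which a human can reinspect, rather than a false positive that silently blesses a trojanned archive.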

[slide 12] 
Some archives were a complete failure, however. The CMU-SNMP packages were signed using an old key. This old
key format is incompatible with the current standards-based GnuPG tools. I did not contact the authors, but this is a
clear demonstration of a breakdown of the system. No valid key was ever found. 

[slide 13] 
Now we begin the metadata analysis which forms the bulk of this paper. Basically, this survey uncovered
four weaknesses in the signed archive system:
 - inline key distribution
 - a risk of a compromise of the key itself
 - few signatures on some keys
 - and a lack of trust in some keys 

[slide 14] 
By inline key distribution I refer to the act of placing the PGP key used to sign the archive next to the archive and
the signatures themselves. The problem lies in the temptation for the user to download the key along with the archive.
For an attacker, the setup is attractive: basically, when you modify the binary archive, you sign it using
a forged key which you also upload to the site. When people download the archive and key pair, the
signature will be valid but the archive will not be. Notable offenders here include the OpenSSH portable team, SSH
Communications, the Cyrus team, and the GnuPlot team. 

[slide 15] 
Next, this study revealed that some keys are at risk of compromise by a determined adversary. Briefly, of the 93 keys
analyzed in this study, most were 3 or fewer years old. However, some were as old as 10 years. While an older key
has been around longer and has built up more trust (you know what to expect), it has also been around longer
for an adversary to attack and factor. This also assumes that the older, original PGP software generated
truly secure keys. Given how many weaknesses have been found in cryptographic software in the past 10 years, that is a
likely possibility. Furthermore, the sizes of the keys relate directly to this as well. A shorter key can be factored more
easily. Most of the keys used to sign archives were 1024-bit RSA and DSA keys, but some were
512-bit keys. That is now a manageable size for an adversary who has an interest in factoring an RSA encryption key
pair (see Simon Singh's Code Book and the resulting challenge). 

[slide 16] 
These key ages are shown here graphically. You can see that most are from the year 1999, but many are from before
that. 

[slide 17] 
Shown here is the distribution of key sizes in bits. Again, most are 1024-bit keys. RSA and
DSA keys were grouped together for this graph. 

[slide 18] 
Lastly, when you correlate the ages of the keys and their sizes, you can see a general trend towards larger keys
as the software grows to support them. However, 1024-bit keys have always been present and will probably also be
present for a long time to come. Perhaps it is time we changed the default setting in gpg. 

[slide 19] 
The next two points in the analysis of the data examined in this study focus on the
signatures on the key. The signatures form the basis of the web of trust in the PGP world. With fewer signatures on any key, it becomes
harder to verify the veracity of the key (i.e., does the owner really own that key? is it who you think it is?). The first analysis I
performed focused on the number of signatures. The average number of signatures per key was 21; some keys had no signatures, and
two had 261 signatures each. These last two belong to Debian developers who participate heavily in key signing events. 
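Statistics like these are a simple reduction over per-key signature counts. A sketch over made-up counts (the real data set had 93 keys):

```python
# Sketch of the per-key signature statistics: minimum, maximum, and average
# number of signatures, over illustrative (not real) counts.
sig_counts = [0, 2, 3, 5, 7, 21, 261, 261]
average = sum(sig_counts) / len(sig_counts)
print(min(sig_counts), max(sig_counts), round(average, 1))  # 0 261 70.0
```

Note how the two heavily-signed keys pull the average far above the typical key, which is why the distribution in the next slide matters more than the mean.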

[slide 20] 
The results of that analysis are shown here in this figure. The heavy bias to the left indicates that most keys have only a handful of
signatures. Very few have no signatures, but most only have about 5 or 7 signatures. 

[slide 21] 
The next step in the analysis of the signatures was to try to establish the owner of each key. This was inspired by a good set of
conversations I have had with Niels Provos. Basically, what you do here is examine the signatures on any given key and try to trace
it back to something you know. In this case, I tried to tie the keys back to the large, strong set identified by the initial analysis by the folks
at Dtype.org. This strong set is a set of keys, over 100,000 strong, which forms a self-contained unit: every key in that set somehow
references every other key and nothing external to that set. 

Of the 93 keys analyzed here, about 2/3 could be mapped to the strong set. 36 keys failed to map back to the strong set (using the key
path server at http://the.earth.li/~noodles/pathfind.html and the data from Jason Harris at http://keyserver.kjsl.com/~jharris/ka/). By tying a
key back to the strong set we can safely assume that the owner is correctly indicated on the key. 
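The path-finding idea behind those tools can be sketched as a graph search: keys are nodes, signatures are edges, and a key is "mapped" if it can reach any key in the strong set. This is a toy illustration with made-up keys, not the real keyserver graph or the pathfind server's algorithm.

```python
# Toy sketch of mapping a key to the strong set via signature edges.
from collections import deque

def reaches_strong_set(signed_by, start, strong_set):
    """Breadth-first search from `start` along signature edges."""
    seen, queue = {start}, deque([start])
    while queue:
        key = queue.popleft()
        if key in strong_set:
            return True
        for nxt in signed_by.get(key, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

graph = {"A": ["B"], "B": ["C"], "D": []}  # D is a disconnected island
print(reaches_strong_set(graph, "A", {"C"}))  # True
print(reaches_strong_set(graph, "D", {"C"}))  # False
```

The disconnected island "D" corresponds to the 36 keys that failed to map back to the strong set.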

While this metric is considerably stronger than the mere analysis of the number of signatures on any key, it relies heavily on the motives
of any signer. Some sign only with full knowledge of the key holder and the link between that person and the key, while others sign keys
after a brief introduction. This is a classic contrast of a strong trust metric and a weak one. Using the weak links in the chain, one could
subvert the system with enough signatures on any forged key. 

[slide 22] 
The links of the keys identified in this study to the keys at the center of the strong set (indicated in blue) are shown here. For a better
graphic have a look at http://monkey.org/~jose/graphing/csw03/csw03.png . 

[slide 23] 
This foray into the web of trust and its use isn't the first of its kind, but I do think it's the first to do a widespread survey of signed
archives. The `extract' tool is related to Dug Song's `gzsig' tool and Marius' PGPwrap library. Marius wrote that library after finding the
license terms of `GPGme' unacceptable (a typical monkey likes BSD code). 

The detached signatures are related to the BSD ports tree and the cryptographic checks made by the system. Briefly, any distribution file
is hashed using three cryptographic hashes (MD5, SHA-1, and RMD160) to verify the integrity of the download. Note that this is the
system that caught most of the 2002 trojans, not the public key system. 
