Note: this is a rough writeout of what will be presented at Umeet 2002 on Sunday, 15 December, 2002. Partial Babelfish translations into Spanish and French were also prepared. The slides are available on my website at: http://www.monkey.org/~jose/presentations/signed-archives.d/
Initial translation by Babelfish, refined by Oroz. ¡Muchas gracias!
INTRODUCTION
[slide 1]
This talk covers a survey I did earlier this fall of over 2000 software
archives with detached signatures. Looking for undiscovered trojans,
errors, and a general sense of how well the system was working, I obtained
and verified these archives and studied the data set I generated.
A little bit about me. I'm a Ph.D. biochemist who left science to work in software engineering. Years ago I founded Crimelabs as a way to share ideas and resources. My research interests include trust relationships (of which this research is a part), source code and software analysis techniques and tools, and network measurements and performance. I work at Arbor Networks in Ann Arbor, MI, USA.
[slide 2]
Very briefly, this talk will cover research into looking for trojans
on the Internet. We define a trojan as a modified binary, using the
term "Trojan horse" to indicate an altered package which appears normal.
I will briefly discuss the motivation behind this work, which compelled
me to investigate how prevalent modified archives are on the Internet.
After a discussion of how I did this, I'll show you the raw results,
followed by a discussion of that data set as well as the metadata I
gathered in this research. Lastly, I'll describe some of the future
work in this area, some of which I hope to work on.
[slide 3]
The whole approach to this study was to download and verify the archives.
This is a rather simplistic approach, but it got the job done and provided
some interesting results. Briefly, I identified signed archives available
on the Internet, downloaded them along with the corresponding PGP keys,
verified each archive, and then analyzed the data. Errors were, of course,
investigated.
[slide 4]
This study was motivated by the series of high-profile archive modifications
which occurred in 2002. Specifically, after the OpenSSH trojan this summer,
I built a small tool to verify packages and started thinking about the
wide-scale analysis which would be needed to discover how widespread this
phenomenon has become on the Internet. This year saw Dug Song's tools
dsniff, fragrouter, and fragroute trojanned; the OpenSSH source code
modified and injected into the mirror system; an interesting attack on
Sendmail; and, most recently, tcpdump and libpcap. This, of course, raises
the question of how often this is happening.
Also, PGP has been around for about 10 years. It's a big success in terms of use, and it would be interesting to see how well the web of trust is doing. The threat model in this analysis is simple: if someone were to alter a software package, the failure to verify correctly via a PGP signature should appear immediately. Furthermore, if a false key is injected into the web of trust, that should also be apparent from the absence of the right signature relationships.
[slide 5]
Briefly, doing this study is relatively easy. I identified software archives
with detached signatures by using Google, simply searching for the common
terms used to name the signatures, such as '.tar.gz.sig' and '.tar.gz.asc'.
This built a list of about 166 unique servers hosting over 2800 archives.
After the OpenSSH fiasco I had built a small tool called `extract' which
automatically performs the PGP check needed to verify an archive. I modified
the tool for the purposes of this study to simply report the success or
failure of the verification process. I then downloaded the identified
archives (using wget), checked them all, and then post-processed the data.
[slide 6]
Extract is a small shell script wrapper for tar and gpg. When you want to
extract an archive, it looks for the corresponding detached signature.
Having found the signature, it checks for the signing key on the local
keyring. If the key exists, it continues; if the key doesn't exist, it
fetches the key, adds it to the keyring, and then restarts. Once the key
is present, the signature is verified with the key. If it checks out OK,
the archive is extracted.
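In rough shell terms, that flow might look like the following sketch. This is a simplified reconstruction, not the actual `extract' source; the key server name and the parsing of gpg's error message are assumptions for illustration.

```shell
# Simplified reconstruction of extract's verify-then-unpack flow.
# Pull a key ID out of gpg's "key ID XXXXXXXX" error text (assumed format).
keyid_from_error() {
    sed -n 's/.*key ID \([0-9A-Fa-f]*\).*/\1/p' | head -1
}

extract_archive() {
    archive="$1"
    sig="$archive.sig"
    [ -f "$sig" ] || sig="$archive.asc"       # find the detached signature
    [ -f "$sig" ] || { echo "no signature for $archive" >&2; return 1; }

    # Verify; if the signing key is missing, fetch it once and retry.
    if ! gpg --verify "$sig" "$archive" 2>"$archive.err"; then
        keyid=$(keyid_from_error < "$archive.err")
        [ -n "$keyid" ] || return 1
        gpg --keyserver search.keyserver.net --recv-key "$keyid" || return 1
        gpg --verify "$sig" "$archive" || return 1
    fi
    tar -zxvf "$archive"                      # signature checked out: unpack
}
```

The real script's details differ; the point is the single fetch-and-retry loop that keeps the user out of the key-management business.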
Extract was designed to be small, efficient, and easy to use. While GPG has
a relatively easy-to-use interface, compare:

    extract openssh-2.4p1.tar.gz

...with how you would do it manually:

    gpg --verify openssh-2.4p1.tar.gz.sig openssh-2.4p1.tar.gz
    gpg --keyserver search.keyserver.net --recv-key (KEYID)
    gpg --verify openssh-2.4p1.tar.gz.sig openssh-2.4p1.tar.gz
    tar -zxvf openssh-2.4p1.tar.gz

That assumes you don't have the key locally and must fetch it as well. I
really prefer tools that are streamlined in their use, and attempt to write
tools that fit that model.
[slide 7]
The actual act of downloading the archives took about 3 days on my cable
modem. This was because the process I built operated only serially. For
comparison's sake, I was able to max out the connection by running 8
parallel fetches for the Linux kernel and complete it overnight,
demonstrating the classic time-space tradeoff. The total storage required
for the archives was about 9GB of disk.
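A parallel fetch along those lines can be set up with xargs. This is a sketch of the general approach, not how the survey downloads were actually run; `urls.txt' (one archive URL per line) is a hypothetical input file.

```shell
# Run CMD once per line of LIST, with up to 8 workers in parallel.
# For the survey's purposes CMD would be wget; any fetcher works.
fetch_all() {
    cmd="$1"; list="$2"
    xargs -P 8 -n 1 "$cmd" < "$list"
}

# usage: fetch_all wget urls.txt
```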
[slide 8]
Briefly, here's a graph showing the impact of the traffic on my cable
modem over a period of about a day. You can see that the traffic spikes
really outshone my normal traffic, visible on the left.
[slide 9]
The analysis, which occurred in bulk, started with an empty GPG keyring. I
wanted to see what ring characteristics emerged after this analysis. I
modified `extract-0.1' to report only on the success or failure of the
verification process, and by using extract (rather than gpg directly) I was
able to fetch keys as needed. The process was driven by a small shell
script wrapper which finds all of the archives in the directory and runs
this modified extract tool on them. Data analysis took about 3 or 4 hours
on my K6-2/300 machine, which I use for most of my data collection needs
(it's currently mapping the Internet). All of the actions were logged, and
the log was then postprocessed.
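A driver of that shape can be sketched as follows. This is a reconstruction under assumptions: `check_archive' stands in for the modified extract tool, which only reports verification success or failure.

```shell
# Bulk-verification driver: fresh keyring, one PASS/FAIL log line per archive.
run_survey() {
    dir="$1"; log="$2"
    GNUPGHOME=$(mktemp -d); export GNUPGHOME   # start from an empty keyring
    find "$dir" -name '*.tar.gz' | while read -r archive; do
        if check_archive "$archive"; then
            echo "PASS $archive"
        else
            echo "FAIL $archive"
        fi
    done > "$log"
}

# Postprocessing: tally successes and failures out of the log.
summarize() {
    echo "ok   $(grep -c '^PASS' "$1")"
    echo "fail $(grep -c '^FAIL' "$1")"
}
```

Pointing GNUPGHOME at a fresh directory is what lets the ring characteristics emerge purely from the keys fetched during the run.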
[slide 10]
The overall results are shown in this slide. Briefly, 2804 archives were
checked in this process, representing 1426 unique archives. They were
downloaded from 166 unique servers, meaning that many servers act as
mirrors. Only 93 keys were retrieved in the whole process, indicating that
many authors have many releases.
2799 archives were a success: they verified OK in this process. 5, however, failed to do so.
[slide 11]
Since these 5 were the really interesting set for the first stage of
analysis, I had to look at them.
The first failure was due to a truncated download. A mirror site cut me off prematurely and an OpenSSH distribution file was cut short; hence, it didn't verify correctly.
The next two were false negatives. I don't know why they failed, but they did; manual reinspection showed that they were OK. Note that `extract' fails to a FAILURE mode, not a PASS mode.
Failures 4 and 5 were legitimate failures. The author was contacted and the results were verified. It turns out that Alex Brennan had uploaded a new archive but didn't update the signature. As you would expect, he appreciated the note.
[slide 12]
Some archives were a complete failure, however. The CMU-SNMP packages were
signed using an old key. This old key format is incompatible with current
standards-based GnuPG tools. I haven't contacted the authors, but this is
a clear demonstration of a breakdown of the system. No valid keys were
ever found.
[slide 13]
Now we begin the metadata analysis, which forms the bulk of this talk.
Basically, this survey uncovered four weaknesses in the signed archive
system:
- inline key distribution
- a risk of a compromise of the key itself
- few signatures on some keys
- and a lack of trust in some keys
[slide 14]
By inline key distribution I refer to the practice of placing the PGP key
used to sign the archives alongside the archives and signatures themselves.
The problem lies in the temptation for the user to download the key along
with the archive. For an attacker, the setup is attractive: when you modify
the archive, you sign it using a forged key which you also upload to the
site. When people download the archive-and-key pair, the signature will be
valid but the archive will not be authentic. Notable offenders here include
the OpenSSH portable team, SSH Communications, the Cyrus team, and the
GnuPlot team.
[slide 15]
Next, this study revealed that some keys are at risk of compromise by a
determined adversary. Briefly, of the 93 keys analyzed in this study, most
were 3 or fewer years old; however, some were as old as 10 years. While
an older key has been around longer and has established more trust (you
know what to expect), it has also been exposed longer for an adversary to
attack and factor. This also assumes that the older, original PGP software
generated truly safe keys. Given how many weaknesses have been found in
cryptographic software in the past 10 years, weak keys are a real
possibility.
Also, the sizes of the keys relate directly to this as well. A shorter key
can be factored more easily. Most of the keys used to sign archives were
1024-bit RSA and DSA keys, but some were 512-bit keys. This is now a
tractable size for an adversary with an interest in factoring an RSA
key pair (see Simon Singh's Code Book and the resulting challenge).
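With a modern GnuPG, the raw material for these size and age statistics can be pulled from the machine-readable key listing. A sketch: in gpg's --with-colons record format, field 3 of a "pub" record is the key length in bits, field 4 the algorithm number, and field 6 the creation date.

```shell
# Print size (bits), algorithm number, and creation date for each
# public-key ("pub") record in gpg --with-colons output.
key_stats() {
    awk -F: '$1 == "pub" { print $3, $4, $6 }'
}

# usage: gpg --list-keys --with-colons | key_stats
```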
[slide 16]
These key ages are shown here graphically. You can see that most are from
the year 1999, but many are from prior to that.
[slide 17]
Shown here is the distribution of the key sizes in bits. Again, most are
1024-bit keys. Both RSA and DSA keys were grouped together for this graph.
[slide 18]
Lastly, when you correlate the ages of keys with their sizes, you can see
a general trend towards larger keys as the software grows to support them.
However, 1024-bit keys have always been present and will probably remain
present for a long time to come. Perhaps it's time we changed the default
setting in gpg.
[slide 19]
The next two points in the analysis of the data retrieved in this study
focus on the signatures on the key. The signatures form the basis of the
web of trust in the PGP world. With fewer signatures on any key, it
becomes harder to verify the veracity of the key (i.e., does the owner
really own that key? Is it who you think it is?).
The first set of analyses I performed focused on the number of signatures.
The average number of signatures per key was 21, while some keys had
no signatures and two had 261 signatures each. These last two belong to
Debian developers who participate heavily in key-signing events.
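One way to gather such counts is to walk gpg's machine-readable signature listing, attributing each "sig" record to the most recent "pub" record above it. A sketch (note that this naive count includes self-signatures):

```shell
# Count signature records per public key in gpg --list-sigs --with-colons
# output; field 5 of a "pub" record is the key ID.
sigs_per_key() {
    awk -F: '
        $1 == "pub" { key = $5 }
        $1 == "sig" { count[key]++ }
        END { for (k in count) print k, count[k] }
    '
}

# usage: gpg --list-sigs --with-colons | sigs_per_key
```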
[slide 20]
The results of that analysis are shown here in this figure. The heavy
bias to the left indicates that most keys have only a handful of
signatures; very few have none, but most have only about 5 to 7
signatures.
[slide 21]
The next step in the analysis of the signatures on the key was to try to
establish the owner of the key. This was inspired by a good set of
conversations I have had with Niels Provos. Basically, what you do here
is examine the signatures on any given key and try to trace it back
to something you know. In this case, I tried to tie the keys back to the
large, strong set identified by the initial analysis by the folks at
Dtype.org. This strong set is a set of keys, over 100,000 strong, which
forms a self-contained unit: every key in that set somehow references
every other key and nothing external to that set.
Of the 93 keys analyzed here, about 2/3 could be mapped to the strong set. 36 keys failed to map back to the strong set (using the key path server at http://the.earth.li/~noodles/pathfind.html and the data from Jason Harris at http://keyserver.kjsl.com/~jharris/ka/). By tying a key back to the strong set we can safely assume that the owner is correctly indicated on the key.
While this metric is considerably stronger than the mere analysis of the number of signatures on any key, it relies heavily on the motives of any signer. Some sign only with full knowledge of the key holder and the link between that person and the key, while others sign keys after a brief introduction. This is a classic contrast of a strong trust metric and a weak one. Using the weak links in the chain, one could subvert the system with enough signatures on any forged key.
[slide 22]
The links of the keys identified in this study to the keys at the center
of the strong set (indicated in blue) are shown here. For a better
graphic have a look at http://monkey.org/~jose/graphing/csw03/csw03.png .
[slide 23]
This foray into the web of trust and its use isn't the first of its kind,
but I do think it's the first to do a widespread survey of signed archives.
The `extract' tool is related to Dug Song's `gzsig' tool and Marius'
PGPwrap library. Marius wrote that library after finding the license terms
of `GPGme' unacceptable (a typical monkey likes BSD code).
The detached signatures are related to the BSD ports tree and the cryptographic checks made by the system. Briefly, any distribution file is hashed using three cryptographic hashes (MD5, SHA-1, and RMD160) to verify the integrity of the download. Note that this is the system that caught most of the 2002 trojans, not the public-key system.
[slide 24]
So, while this study has shown that there appear to be few widespread
trojans lying in wait for people on the Internet, there are several
weaknesses in the system which can be exploited by an attacker. Ideally,
I'd like to continue to perform this check on a rolling basis. Right now
I'm looking to find a research partner: I need more bandwidth and more
storage space.
Ideally everyone would be a part of that strong set. Right now there are a number of disconnected islands (note that I'm not in the strong set). This would, hopefully, aid in establishing the veracity of the keys. It would also be nice to see tools incorporated into the PGP model, such as `extract' or `mutt-sigtrace' (http://www.chaosreigns.com/code/mutt-sigtrace/), which can aid in the checking of keys. Next, more signed archives need to be out there: we need to know that what we download is what the author intended to upload. And lastly, the world needs a better system. There are simply too many holes in the current one; I think it's time to do better.
[slide 25]
I really need to acknowledge several people here. Beth let me destroy our
cable modem's performance for a couple of days; Marius, Dug, Niels, Alex,
and Seth all provided excellent feedback and ideas; the dtype.org people
(the participants on the discussion list, and Jason Harris) have been
great in doing their work on web of trust metrics; and to the Umeet
organizers, thank you for having me speak.
And of course you, I appreciate your interest. Thanks!
archivos firmados: una evaluación de la confianza del Internet Nota: éste es un esbozo de lo qué será presentado en Umeet 2002 el domingo, 15 Dember, 2002. Una traducción del babelfish al español y francés aparece abajo. Las diapositivas están disponibles en mi website en: http://www.monkey.org/~jose/presentations/signed-archives.d / INTRODUCCIÓN [ diapositiva 1 ] Esta charla e sobre un examen que hice este otoño sobre de 2000 archivos de software con firmas. Buscando trojanos sin descubrir, errores, y una sensación general de cómo esta' bien el sistema trabajaba, obtuve y verifiqué estos archivos y estudié los datos que generé. Un poco sobre mí. Soy un bioquímico de Ph.D. que dejo la ciencia para ir a trabajo en ingeniería del software. Hace años fundé Crimelabs como manera de compartir ideas y recursos. Mis intereses de la investigación incluyen las relaciones de la confianza (de cuáles es una pieza esta investigación), las técnicas y las herramientas del análisis del código de fuente y del software, y las medidas y funcionamiento de la red. Trabajo en Arbor Networks in Ann Arbor, MI, USA.. [ diapositiva 2 ] Muy brevemente, esta charla cubrirá la investigación en buscar trojanos sobre el Internet. Definimos un Trojan como binario modificada, usando el término "Trojan Horse" para indicar un paquete alterado que aparezca normal. Discutiré brevemente la motivación detrás de este trabajo, que me obligó a que fuera a buscar el predominio de cuántos modificaron archivos allí están en el Internet. Después de una discusión de cómo hice esto, le demostraré los resultados brutos y entonces una discusión sobre eso modem tan bien como los meta datos recolecté en esta investigación. Pasado, describiré algo del trabajo futuro en esta área, algo en de la cual estoy esperando trabajar. [ diapositiva 3 ] El acercamiento del conjunto a este estudio era descargar y verificar los archivos. 
Esto es un acercamiento algo simplista, pero se parece haber conseguido el trabajo hecho y proporcionado algunos resultados interesantes. Brevemente, identifiqué los archivos firmados disponibles en el Internet, descargados les y asidos la llave correspondiente del PGP, verificados el archivo, y después analizados los datos. Los errores, por supuesto, fueron investigados. [ diapositiva 4 ] Este estudio fue motivado por la serie de altas modificaciones del archivo del perfil que ocurrieron en 2002. Específicamente, después del OpenSSH Trojan este verano, construyo una herramienta pequeña para verificar los paquetes y el pensamiento comenzado del análisis amplio de la escala que sería necesario en el Internet descubrir cómo es extenso se ha convertido este fenómeno. Especificamente, este año vio los paquetes tales como dsniff de las herramientas de Dug Song's, fragrouter, y el fragroute trojanned, el código de fuente de OpenSSH fue modificado e inyectado en el sistema del espejo, Sendmail sufrió un ataque interesante, y el más último era tcpdump y libpcap. Esto, por supuesto, pide la cuestión de cómo está sucediendo a menudo éste. También, el PGP ha estado alrededor por cerca de 10 años. Es un éxito grande en términos del uso, y sería interesante ver cómo esta' bien la tela de la confianza está haciendo. El modelo de la amenaza en este análisis es simple: si alguien alterara una paquete de software, la falta de verificar correctamente vía una firma del PGP debe aparecer inmediatamente. Además, si una llave falsa se inyecta en la tela de la confianza, que debe también ser appearant de la falta tener las relaciones derechas basadas en firmas. [ diapositiva 5 ] Brevemente, hacer este estudio es relativamente fácil. Identifiqué archivos del software con las firmas separadas usando Google. Busqué simplemente los términos comunes usados para nombrar las firmas, tales como ' tar.gz.sig ' y ' tar.gz.asc '. 
Esto construyó una lista de cerca de 166 servidores únicos con sobre 2800 archivos. Después de que el fiasco de OpenSSH yo construyera una herramienta pequeña llamada el extracto del ` ' que realiza fácilmente automáticamente el cheque del PGP necesitado para verificar un archivo. Modifiqué la herramienta para los propósitos de este estudio de divulgar simplemente sobre el éxito o la falta del proceso de la verificación. Después descargué estos archivos identificados (que usan el wget), los comprobé todos, y después procesé los datos. [ diapositiva 6 ] El extracto es una envoltura pequeña del shell script para tar y gpg. Busca la firma separada para un archivo cuando usted desea extraer ese archivo. Encontrando el archivo, entonces comprueba para saber si hay la llave en keyring local. Si existe la llave, continúa. Si no existe la llave, trae la llave, agrega la al keyring, y entonces recomenzar. Una vez que la llave esté presente, la firma si está verificado con la llave. Si comprueba fuera de la autorización, se extrae el archivo. El extracto fue diseñado para ser pequeño, eficiente, y fácil utilizar. Mientras que GPG tiene un relativamente fácil utilizar el interfaz, compare: extraiga openssh-2.4p1.tar.gz... y cómo usted lo haría manualmente: gnupg --verify openssh-2.4p1.tar.gz.sig openssh-2.4p1.tar.gz gnupg --keyserver search.keyserver.net --recv-key (KEYID) gnupg --verify openssh-2.4p1.tar.gz.sig openssh-2.4p1.tar.gz tar -zxvf openssh-2.4p1.tar.gz Que le asume no tenga la llave localmente y debe traerla también. Realmente prefiero las herramientas que se aerodinamizan en su uso, y procuro escribir las herramientas que caben ese modelo. [ diapositiva 7 ] El acto real de descargar los archivos tomó cerca de 3 días en mi módem de cable. Esto era porque el modelo que construí funcionado solamente en serie. 
Por el motivo de la comparación podía al máximo él hacia fuera funcionando 8 alcances paralelos para el núcleo de Linux pero lo termino durante la noche, demostrando el tradeoff clásico del tiempo-espacio. El almacenaje total requerido para los archivos estaba sobre 9GB del disco. [ diapositiva 8 ] Brevemente, aquí está un gráfico que demuestra el impacto del tráfico en mi módem de cable para un 1 día o tan período. Usted puede ver que el tráfico clava realmente outshone mi tráfico normal, visible a la izquierda. [ diapositiva 9 ] El análisis, que ocurrió en bulto, comenzó con un anillo dominante vacío de GPG. Deseé ver qué características del anillo emergieron después de este análisis. Modifiqué el ` extract-0.1 ' para divulgar solamente sobre el éxito o el fallo del proceso de la verificación, y al usar el extracto (con gpg) era capaz de traer llaves según se necesitaban. El proceso fue conducido por una envoltura pequeña del shell script que encuentra todos los archivos en el directorio y corre esta herramienta modificada del extracto en ellos. El análisis de datos tomó cerca de 3 o 4 horas en mi máquina K6-2/300, que utilizo para la mayoría de mis necesidades de la colección de datos (está actualmente "mapeando" el Internet). Todas las acciones fueron registradas, que entonces fueron postprocesadas. [ diapositiva 10 ] Los resultados totales se demuestran en esta diapositiva. Brevemente, 2804 archivos fueron llegados este proceso, representando un total de 1426 archivos. 166 servidores únicos fueron descargados de, significando que muchos actúan como servidor del espejo. Solamente 93 llaves fueron recuperadas en el proceso entero, indicando a muchos autores tienen muchas revisiones (releases). 2799 archivos eran un éxito, ellos verificaron MUY BIEN en este proceso. 5, sin embargo, fallaron. [ diapositiva 11 ] Puesto que estos 5 eran el sistema realmente interesante para la primera etapa del análisis, tuve que mirarlos. 
La primera falta era debido a una transferencia directa truncada. Un mirror site me cortó prematuramente y un archivo del ditribution de OpenSSH fue cortado brevemente, por lo tanto, no verificó correctamente. Los dos siguientes eran negativas falsas. No sé porqué fallaron, pero lo hicieron. La reinspección manual demostró que eran la nota ACEPTABLE que el extracto del ` ' falla a un modo de FALLO, no un modo del PASO. Las faltas 4 y 5 eran faltas legítimas. Entraron en contacto con al autor y los resultados fueron verificados. Resulta que Alex Brennan uploaded un archivo nuevo pero no fijó la firma. Como puede esperar él apreció la nota. [ diapositiva 12 ] Algunos archivos eran una falta completa, sin embargo. Los paquetes Cmu-SNMP fueron firmados usando una vieja llave. Este viejo formato dominante es incompatable con las herramientas basadas los estándares actuales de GnuPG. No he entrado en contacto con a autores, sino soy una demostración clara de un fallo del sistema. No se encontró ninguna llaves válidas. [ diapositiva 13 ] Ahora comenzamos el análisis del metadata que forma el bulto de este estudio. Este examen destapó básicamente cuatro debilidades en el sistema firmado del archivo: - distribución dominante en línea - un riesgo de un compromiso de la llave sí mismo - pocas firmas en algunas llaves - y una carencia de la confianza en algunas llaves [ diapositiva 14 ] Por la distribución dominante en línea refiero al acto de poner la llave del PGP usada para firmar los archivos junto a los archivos y a las firmas ellos mismos. El problema está en la tentación para que el usuario descargue la llave así como el archivo. Para un atacante, la disposición es interesante: básicamente, cuando usted modifica el archivo binario, usted lo firma con una clave "forged" usted también ha subido al sitio. Cuando la gente descarga el par archivo y llave, la firma será válida pero el archivo no será. 
Los abusadores notables de esto incluyen al equipo portable de OpenSSH, a las comunicaciones de SSH, al equipo de Cyrus, y a equipo de GnuPlot. [ diapositiva 15 ] Después, este estudio reveló que algunas llaves están en el riesgo del compromiso de un adversario resuelto. Brevemente, en las 93 llaves analizadas en este estudio, la mayoría tenían 3 o menos años. Sin embargo, algunos eran tan viejos como 10 años. Mientras que una más vieja llave ha sido alrededor más larga y estableció más confianza (usted sabe qué esperar), es también alrededor más larga para un adversario atacar y el factor. Esto también asume que el más viejo, original software del PGP genera llaves verdaderamente seguras. Dado cuántos se han encontrado las debilidades en software criptográfico en los últimos 10 años, esto es una posibilidad probable. También, los tamaños de las llaves se relacionan directamente con esto también. Una llave más corta se puede descomponer en factores más fácilmente. La mayoría de las llaves usadas para firmar archivos eran 1024-bit RSA y llaves del DSA, pero algunas eran las llaves 512-bit. Esto ahora es un tamaño manejable para un adversario que tenga un interés en descomponer en factores cualquier par de la llave del cifrado de RSA (véase el libro del código de Simon Singh y el desafío que resulta). [ diapositiva 16 ] Estas edades dominantes se demuestran aquí gráficamente. Usted puede ver que la mayoría es a partir del año 1999, pero muchos son antes de ése. [ diapositiva 17 ] Se demuestra aquí la distribución de los tamaños dominantes en pedacitos. Una vez más la mayoría son las llaves 1024-bit. Las llaves de RSA y del DSA fueron agrupadas juntas para este gráfico. [ diapositiva 18 ] Pasado, cuando usted correlaciona la edad de llaves y de sus tamaños, usted puede ver una tendencia genérica hacia llaves más grandes mientras que el software crece en su ayuda. 
Sin embargo, las llaves 1024-bit han estado siempre presente y probablemente también estarán presentes durante mucho tiempo venir. Quizás es tiempo de que cambiemos el valor por defecto que fija en gpg. [ diapositiva 19 ] Los dos puntos siguientes en el análisis de los datos recuperados en este estudio *** TRANSLATION ENDS HERE *** focus on the signatures on the key. The signatures form the basis of the web of trust in the PGP world. With fewer signatures on any key, it becomes harder to verify the veracity of the key (ie does the owner really own that key? is it who you think it is?). The first set of analysis I performed focused in the number of signatures. The average number of signatures per key was 21, while some keys had no signatures and two had 261 signatures per key. These last two are Debian developers and heavily participate in key signing events. [slide 20] The results of that analysis are shown here in this figure. The heavy bias to the left indicates that most keys have only a handful of signatures. Very few have no signatures, but most only have about 5 or 7 signatures. [slide 21] The next step in the analysis of the signatures on the key was to try and establish the owner of the key. This was inspired by a good set of conversations I have had with Niels Provos. Basically, what you do here is you examine the signatures on any given key and try and trace it back to something you know. In this case, I try and tie the keys back to the large, strong set identified by the initial analysis by the folks at Dtype.org. This strong set is a set of keys over 100,000 strong which are a self contained unit. Every key in that set somehow references every other key and nothing external to that set. Of the 93 keys analyzed here, about 2/3 could be mapped to the strong set. 36 keys failed to map back to the strong set (using the key path server at http://the.earth.li/~noodles/pathfind.html and the data from Jason Harris at http://keyserver.kjsl.com/~jharris/ka/). 
By tying a key back to the strong set we can safely assume that the owner is correctly indicated on the key. While this metric is considerably stronger than the mere analysis of the number of signatures on any key, it relies heavily on the motives of any signer. Some sign only with full knowledge of the key holder and the link between that person and the key, while others sign keys after a brief introduction. This is a classic contrast of a strong trust metric and a weak one. Using the weak links in the chain, one could subvert the system with enough signatures on any forged key. [slide 22] The links of the keys identified in this study to the keys at the center of the strong set (indicated in blue) are shown here. For a better graphic have a look at http://monkey.org/~jose/graphing/csw03/csw03.png . [slide 23] This foray into the web of trust and its use isn't the first of its kind, but I do think its the first to do a widespread survey of signed archives. The `extract' tool is related to Dug Song's `gzsig' tool and Marius' PGPwrap library. Marius wrote that library after finding the license terms of `GPGme' unacceptable (a typical monkey likes BSD code). The detached signatures are related to the BSD ports tree and the cryptographic checks made by the system. Briefly, any distribution file is hashed using three cryptographic hashes (MD5, SHA-1, and RMD160) to verify the intrigity if the download. Note that this is the system that caught most of the 2002 trojans, not the public keys system. [slide 24] So, while this study has shown that it appears that there are few widespread trojans lying in wait for people in the Internet, there are several weaknesses in the system which can be exploited by an attacker. Ideally, I'd like to continue to perform this check on a rolling basis. Right now I'm looking to find a research partner, I need more bandwidth and I need more storage space. Ideally everyone would be a part of that strong set. 
Right now there are a number of disconnected islands (note that I'm not in the strong set). This would aid, hopefully, in establishing the veracity of the keys. It would also be nice to see tools incorporated into the PGP model, such as 'extract' or `mutt-sigtrace' (http://www.chaosreigns.com/code/mutt-sigtrace/) which can aid in the checking of keys. Next, more signed archives need to be out there, we need to know that this is what the author intended to upload. And lastly, the world needs a better system. There are simply too many holes in the current one, I think it's time to do better. [slide 25] I really need to acknowledge several people here. Beth let me destroy our cable modem's performance for a couple of days; Marius, Dug, Niels, Alex, and Seth all provided excellent feedback and ideas; the dtype.org people (the participants on the disucssion list, and Jason Harris) have been great in doing their work into the web of trust metrics; the Umeet organizers, thank you for having me speak. And of course you, I appreciate your interest. Thanks!
archives signées: une évaluation de confiance d'Internet Note: c'est un writeout approximatif ce qui sera présenté chez Umeet 2002 dimanche, de 15 Dember, 2002. Une traduction de babelfish en espagnol et le Français apparaît ci-dessous. Les glissières sont disponibles sur mon website à: http://www.monkey.org/~jose/presentations/signed-archives.d / INTRODUCTION [ diapositive 1 ] Cet entretien sera sur un aperçu que j'ai fait plus tôt cette chute de plus de les archives 2000 de logiciel avec detatched des signatures. Recherchant les trojans non découverts, erreurs, et un sentiment général de à quel point le système fonctionnait, j'ai obtenu et ai vérifié ces archives et ai étudié le Modem que j'ai produit. Un peu au sujet de moi. Je suis un biochimiste de Ph.D. qui la science gauche à aller travail dans la technologie de la programmation. Il y a des années j'ai fondé Crimelabs comme manière de partager des idées et des ressources. Mes intérêts de recherches incluent les rapports de confiance (de ce que cette recherche est une pièce), des techniques et des outils d'analyse de code source et de logiciel, et des mesures et exécution de réseau. Je travaille aux réseaux d'axe à Ann Arbor, MI, Etats-Unis. [ diapositive 2 ] Très brièvement, cet entretien couvrira la recherche dans rechercher des trojans sur l'Internet. Nous définissons un Trojan en tant que binaire modifié, en utilisant le terme "Trojan Horse" pour indiquer un paquet changé qui semble normal. Je discuterai brièvement la motivation derrière ce travail, qui m'a contraint aller rechercher la prédominance de combien ont modifié des archives là sont sur l'Internet. Après une discussion de la façon dont j'ai fait ceci, je vous montrerai les résultats crus et puis une discussion sur cela Modem comme les méta-données je me suis réuni dans cette recherche. Pour finir, je décrirai certains des travaux futurs dans ce secteur, derrière dont une partie j'espère travailler. 
[slide 3] The whole approach to this study was to download and verify the archives. This is a rather simplistic approach, but it seems to have gotten the job done and provided some interesting results. Briefly, I identified signed archives available on the Internet, downloaded them and grabbed the corresponding PGP key, verified the archives, and then analyzed the data. Errors were, of course, investigated. [slide 4] This study was motivated by the series of high profile archive modifications which occurred in 2002. Specifically, after the OpenSSH trojan this summer, I built a small tool to verify packages and began thinking about the large scale analysis that would be needed on the Internet to find out how widespread this phenomenon has become. Specifically, this year saw packages such as Dug Song's tools dsniff, fragrouter, and fragroute trojanned, the OpenSSH source code was modified and injected into the mirror system, Sendmail suffered an interesting attack, and the latest were tcpdump and libpcap. This, of course, begs the question of how often this happens. Furthermore, PGP has been around for about 10 years. It is a great success in terms of usage, and it would be interesting to see how well the web of trust is doing. The threat model in this analysis is simple: if someone were to alter a software package, the failure to verify correctly via a PGP signature should show up immediately. Likewise, if a false key is injected into the web of trust, that should also be apparent from the failure to have the right relationships based on signatures. [slide 5] Briefly, doing this study is relatively easy. I identified software archives with detached signatures by using Google.
I simply searched for the common terms used to name the signatures, such as 'tar.gz.sig' and 'tar.gz.asc'. This built a list of about 166 unique servers with more than 2800 archives. After the OpenSSH fiasco I had built a small tool called `extract' which easily and automatically performs the PGP checking required to verify archives. I modified the tool for the purposes of this study to simply report the success or failure of the verification process. I then downloaded the identified archives (using wget), verified them all, and then post-processed the data. [slide 6] Extract is a small shell script wrapper for tar and gpg. It looks for the detached signature for an archive when you want to extract a file. Having found it, it then checks for the key on the local keyring. If the key exists, it continues. If the key doesn't exist, it fetches the key, adds it to the keyring, and then restarts. Once the key is present, the signature is verified with the key. If it verifies OK, the archive is extracted. Extract was designed to be small, efficient, and easy to use. While GPG has a relatively easy to use interface, compare:

  extract openssh-2.4p1.tar.gz

with how you would do it manually:

  gpg --verify openssh-2.4p1.tar.gz.sig openssh-2.4p1.tar.gz
  gpg --keyserver search.keyserver.net --recv-key (KEYID)
  gpg --verify openssh-2.4p1.tar.gz.sig openssh-2.4p1.tar.gz
  tar -zxvf openssh-2.4p1.tar.gz

That assumes you don't have the key locally and have to fetch it as well. I really prefer tools that are streamlined in their use, and try to write tools which fit that model. [slide 7] The actual act of downloading the archives took about 3 days on my cable modem.
That was because the pipeline I built operated only serially. For comparison's sake I was able to max it out by running 8 parallel downloads for the Linux kernel and complete it overnight, demonstrating the classic time-space tradeoff. The total storage required for the archives was about 9GB of disk. [slide 8] Briefly, here is a graph showing the traffic impact on my cable modem for a 1 day or so period. You can see that the traffic spikes really outshone my normal traffic, visible on the left hand side. [slide 9] The analysis, which occurred in bulk, began with an empty GPG keyring. I wanted to see what keyring characteristics emerged after this analysis. I modified `extract-0.1' to only report the success or failure of the verification process, and by using extract (over gpg itself) I was able to fetch keys as needed. The process was driven by a small shell script wrapper which finds all of the archives in the directory and runs this modified extract tool on them. The data analysis took about 3 or 4 hours on my K6-2/300 machine, which I use for most of my data collection needs (it's currently mapping the Internet). All of the actions were logged and then post-processed. [slide 10] The overall results are shown in this slide. Briefly, 2804 archives were checked in this process, representing a total of 1426 distinct archives. Downloads came from 166 unique servers, meaning that many act as mirror servers. Only 93 keys were fetched in the whole process, indicating that many authors have many releases. 2799 archives were a success: they verified OK in this process. 5, however, did not.
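The batch process on slides 5-10 — find every archive with a detached signature, verify each one, and tally the outcomes — can be sketched as follows. This is only an illustrative re-sketch in Python (the real `extract' is a shell script wrapper around tar and gpg); the directory walk, the function names, and the bare `gpg --verify` invocation are my assumptions, not the actual tool:

```python
import os
import subprocess
from collections import Counter

SIG_SUFFIXES = (".sig", ".asc")  # common names for detached signatures

def find_signed_archives(root):
    """Yield (archive, signature) pairs for every detached signature under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        present = set(filenames)
        for name in filenames:
            for suffix in SIG_SUFFIXES:
                base = name[: -len(suffix)]
                if name.endswith(suffix) and base in present:
                    yield (os.path.join(dirpath, base),
                           os.path.join(dirpath, name))

def verify(archive, signature):
    """Return 'OK' or 'FAIL' based on gpg's exit status for the detached signature."""
    result = subprocess.run(["gpg", "--verify", signature, archive],
                            capture_output=True)
    return "OK" if result.returncode == 0 else "FAIL"

def survey(root):
    """Verify every signed archive under root and tally the results."""
    tally = Counter()
    for archive, signature in find_signed_archives(root):
        status = verify(archive, signature)
        tally[status] += 1
        print(status, archive)  # the log lines that get post-processed later
    return tally
```

A run over the mirror snapshot would then reduce to `survey("/data/archives")`, with the printed OK/FAIL lines post-processed exactly as described above.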
[slide 11] Since those 5 were the really interesting set for the first pass of the analysis, I had to look at them. The first failure was due to a truncated download. A mirror site cut me off prematurely and an OpenSSH distribution file was cut short, hence it did not verify correctly. The next two were false negatives. I don't know why they failed, but they did. Manual reinspection showed that they were OK. Note that `extract' fails to a FAIL mode, not a PASS mode. Failures 4 and 5 were legitimate failures. The author was contacted and the findings were verified. It turns out that Alex Brennan had uploaded a new archive but had not attached the signature. As you would expect, he appreciated the note. [slide 12] Some archives were a complete failure, however. The CMU-SNMP packages were signed using an old key. This old key format is incompatible with current standards-based GnuPG tools. I did not contact the authors, but this is a clear demonstration of a breakdown of the system. No valid key was ever found. [slide 13] Now we begin the metadata analysis which forms the bulk of this paper. Basically this survey found four weaknesses in the signed archive system:
- in-band key distribution
- a risk of a compromise of the key itself
- few signatures on some keys
- and a lack of trust in some keys
[slide 14] By in-band key distribution I refer to the act of placing the PGP key used to sign the archives next to the archives and signatures themselves. The problem lies in the temptation for the user to download the key as well as the archive.
For an attacker, the setup is attractive: basically, when you modify the binary archive, you sign it using a forged key which you also upload to the site. When people download the archive and key pair, the signature will be valid but the archive will not be. Notable offenders here include the OpenSSH portable team, SSH Communications, the Cyrus team, and the GnuPlot team. [slide 15] Next, this study revealed that some keys are at risk of compromise by a determined adversary. Briefly, of the 93 keys analyzed in this study, most were 3 or fewer years old. However, some were as old as 10 years. While an older key has been around longer and has built up more trust (you know what to expect), it has also been around longer for an adversary to attack and factor. This also assumes that the older, original PGP software generated truly secure keys. Given how many weaknesses have been found in cryptographic software in the past 10 years, this is a distinct possibility. Furthermore, the sizes of the keys relate directly to this as well. A shorter key can be factored more easily. Most of the keys used to sign archives were 1024-bit RSA and DSA keys, but some were 512-bit keys. That is now a tractable size for an adversary with an interest in factoring any RSA encryption key pair (see Simon Singh's Code Book and the resulting challenge). [slide 16] These key ages are shown here graphically. You can see that most are from the year 1999 on, but many are from before that. [slide 17] Shown here is the distribution of key sizes in bits. Again, most are 1024-bit keys.
RSA and DSA keys were grouped together for this graph. [slide 18] Lastly, when you correlate the age of the keys with their sizes, you can see a general trend towards larger keys as the software grows to support them. However, 1024-bit keys have always been present and will probably be present for a long time to come. Maybe it's time we changed the default setting in gpg. [slide 19] The next two points in the analysis of the data gathered in this study *** TRANSLATION ENDS HERE *** focus on the signatures on the key. The signatures form the basis of the web of trust in the PGP world. With fewer signatures on any key, it becomes harder to verify the veracity of the key (i.e. does the owner really own that key? Is it who you think it is?). The first set of analysis I performed focused on the number of signatures. The average number of signatures per key was 21, while some keys had no signatures and two had 261 signatures each. These last two belong to Debian developers who heavily participate in key signing events. [slide 20] The results of that analysis are shown here in this figure. The heavy bias to the left indicates that most keys have only a handful of signatures. Very few have no signatures, but most only have about 5 or 7 signatures. [slide 21] The next step in the analysis of the signatures on the key was to try and establish the owner of the key. This was inspired by a good set of conversations I have had with Niels Provos. Basically, what you do here is examine the signatures on any given key and try and trace it back to something you know. In this case, I try and tie the keys back to the large, strong set identified by the initial analysis by the folks at Dtype.org. This strong set is a set of keys over 100,000 strong which forms a self-contained unit. Every key in that set somehow references every other key and nothing external to that set.
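The tracing idea — follow signature edges from a key until you hit something already inside the strong set — is just graph reachability. Here is a toy sketch of that check; the key IDs and the `signers` map are entirely hypothetical, and the real analysis used dedicated key path servers rather than anything like this:

```python
from collections import deque

def reaches_strong_set(key, signers, strong_set):
    """Breadth-first search from `key` along signature edges.

    `signers` maps a key ID to the set of key IDs that have signed it.
    A key is considered traceable if some chain of signatures leads
    back to a key already inside the strong set.
    """
    seen = {key}
    queue = deque([key])
    while queue:
        current = queue.popleft()
        if current in strong_set:
            return True
        for signer in signers.get(current, ()):
            if signer not in seen:
                seen.add(signer)
                queue.append(signer)
    return False

# Toy example: "app-key" is signed by alice, who is signed by bob,
# a strong-set member; "island" only links to another isolated key.
signers = {"app-key": {"alice"}, "alice": {"bob"}, "island": {"loner"}}
strong = {"bob", "carol"}
```

With these toy inputs, `reaches_strong_set("app-key", signers, strong)` succeeds while `reaches_strong_set("island", signers, strong)` does not — the disconnected-island situation described below for 36 of the keys.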
Of the 93 keys analyzed here, about 2/3 could be mapped to the strong set. 36 keys failed to map back to the strong set (using the key path server at http://the.earth.li/~noodles/pathfind.html and the data from Jason Harris at http://keyserver.kjsl.com/~jharris/ka/). By tying a key back to the strong set we can safely assume that the owner is correctly indicated on the key. While this metric is considerably stronger than the mere analysis of the number of signatures on any key, it relies heavily on the motives of any signer. Some sign only with full knowledge of the key holder and the link between that person and the key, while others sign keys after a brief introduction. This is a classic contrast of a strong trust metric and a weak one. Using the weak links in the chain, one could subvert the system with enough signatures on any forged key. [slide 22] The links of the keys identified in this study to the keys at the center of the strong set (indicated in blue) are shown here. For a better graphic have a look at http://monkey.org/~jose/graphing/csw03/csw03.png . [slide 23] This foray into the web of trust and its use isn't the first of its kind, but I do think it's the first to do a widespread survey of signed archives. The `extract' tool is related to Dug Song's `gzsig' tool and Marius' PGPwrap library. Marius wrote that library after finding the license terms of `GPGme' unacceptable (a typical monkey likes BSD code). The detached signatures are related to the BSD ports tree and the cryptographic checks made by that system. Briefly, any distribution file is hashed using three cryptographic hashes (MD5, SHA-1, and RMD160) to verify the integrity of the download. Note that this is the system that caught most of the 2002 trojans, not the public key system.
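The ports-style triple-hash check can be sketched as follows. This is a minimal illustration, not the ports infrastructure itself: it assumes the expected digests have already been read out of a distinfo file, and it skips RIPEMD-160 gracefully because support for it in Python's hashlib depends on the underlying OpenSSL build:

```python
import hashlib

def digests(path):
    """Compute the checksums a BSD ports distinfo records for a distfile."""
    names = ["md5", "sha1"]
    try:
        hashlib.new("ripemd160")  # only present if OpenSSL provides it
        names.append("ripemd160")
    except ValueError:
        pass
    hashes = {name: hashlib.new(name) for name in names}
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            for h in hashes.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashes.items()}

def verify_distfile(path, expected):
    """Every digest we can compute must match the recorded value."""
    actual = digests(path)
    return all(actual[name] == value
               for name, value in expected.items() if name in actual)
```

The strength of this scheme against the 2002 trojans comes from the fact that the recorded digests live in the ports tree, out of band from the mirrored distfile an attacker can modify.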
[slide 24] So, while this study has shown that there appear to be few widespread trojans lying in wait for people on the Internet, there are several weaknesses in the system which can be exploited by an attacker. Ideally, I'd like to continue to perform this check on a rolling basis. Right now I'm looking to find a research partner; I need more bandwidth and I need more storage space. Ideally everyone would be a part of that strong set. Right now there are a number of disconnected islands (note that I'm not in the strong set). This would aid, hopefully, in establishing the veracity of the keys. It would also be nice to see tools incorporated into the PGP model, such as `extract' or `mutt-sigtrace' (http://www.chaosreigns.com/code/mutt-sigtrace/) which can aid in the checking of keys. Next, more signed archives need to be out there; we need to know that this is what the author intended to upload. And lastly, the world needs a better system. There are simply too many holes in the current one, and I think it's time to do better. [slide 25] I really need to acknowledge several people here. Beth let me destroy our cable modem's performance for a couple of days; Marius, Dug, Niels, Alex, and Seth all provided excellent feedback and ideas; the dtype.org people (the participants on the discussion list, and Jason Harris) have been great in doing their work on web of trust metrics; the Umeet organizers, thank you for having me speak. And of course you, I appreciate your interest. Thanks!