signed archives: an evaluation of internet trust

jose nazario {jose@crimelabs.net}, crimelabs research. september, 2002.

copyright 2002 jose nazario, all rights reserved.

abstract

in 2002, a series of high profile compromises of internet software servers resulted in the alteration of software archives. this prompted an evaluation of the state of trust of the signed software distribution system. over 2800 archives representing over 1400 unique software packages were downloaded and their corresponding signatures evaluated for validity. these software packages were pulled from over 260 different sites and the keys retrieved only during the verification stage. of the over 2800 archives checked, only 5 errors were found, three of which were found to be false negatives. additionally, the characteristics of the keys used to sign these archives along with the key distribution systems were studied. these findings highlight weaknesses in the signed archive distribution system and demonstrate clear vulnerabilities facing several projects.

introduction

in mid 2002, a series of compromises of high profile software distribution sites occurred. this list includes the break-in and modification of the popular irc client 'irssi' [1], the dsniff, fragroute, and fragrouter source code from dugsong [2], and the openssh source code [3]. in each case the modifications were detected via cryptographic checksums, but they could also have been detected with public key signatures for the packages.

the use of public key signatures for software archives is popular, typically using the pgp standard. the model used in pgp (and tools using this standard, including gnupg and openpgp) is the 'web of trust' model (see [4]). in this scenario, the author or team of authors generates a public and private key pair and signs the software using the private key. the public key is distributed and often signed by others to attest to its veracity and trustworthiness. when the software is downloaded, the public key is used to validate the source of the software and the match between the signature and the archive. the trust placed in this public key increases with the number and nature of the signatures it carries.

the modification of these popular software packages raises the question of how many other software packages are compromised but have remained unreported. additional questions concern the integrity of the underlying public key system, including the strength of the keys, their trustworthiness as established by the signatures on them, and the key distribution mechanisms typically found.

in order to answer these questions, a diverse set of signed software packages was identified and downloaded along with their signatures. the archives were then verified using the indicated keys and the results were recorded. while only a handful of negative results were found, weaknesses in the system overall were identified and bulk public key statistics were measured. based upon this evidence, the public key signature system as it is used contains minor weaknesses and is susceptible to manipulation by an attacker.

survey results

in mid august, 2002, a total of 2804 signed software archives were identified via a google search (see methods). this number represented 1426 unique archives, with the difference being due to software mirrors of some packages. these archives were located on 166 unique servers throughout the world. the software was downloaded over a cable modem, which took approximately 2.5 days, with the archives and their signatures using approximately 9.5 GB of disk space. during the bulk processing, 2799 of the downloaded archives verified successfully.

five failures were found in the course of this batch verification. of those, one was due to a truncated download (identified by comparing the archive against the same archive from different sites) and two were false negatives revealed by repeating the verification process. the remaining two failures were legitimate mismatches between the signature and the archive. the author was contacted and this result was confirmed.
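the cross-site comparison that exposed the truncated download can be sketched as follows. this is a minimal illustration and not the tooling used in the study: the use of python, the sha-256 digest, and the function names are all assumptions made here. copies of one archive fetched from several mirrors are hashed, and any copy that disagrees with the most common digest is flagged for manual inspection.

```python
import hashlib
from collections import Counter


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def odd_copies(paths):
    """Given copies of one archive from several mirrors, return the
    paths whose digest differs from the most common digest."""
    digests = {str(p): file_digest(p) for p in paths}
    if not digests:
        return []
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(p for p, d in digests.items() if d != majority)
```

a copy flagged this way is only a candidate for a damaged or altered download; with two copies that disagree, the comparison alone cannot say which one is correct.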

during the course of this study, some archives from the cmu-snmp collection could not be verified. these are found at ftp://ftp.andrew.cmu.edu/pub/snmp/. attempts to verify the signature using the downloaded archive produce the error:

gpg: Warning: using insecure memory!
gpg: please see http://www.gnupg.org/faq.html for more information
gpg: Signature made Mon Mar 31 18:18:58 1997 EST using RSA key ID 65965CD1
gpg: Can't check signature: public key not found
  
an attempt was made to fetch this key from the various keyservers, all of which yielded errors:

$ gpg --keyserver pgp.mit.edu --recv-key 65965CD1
gpg: Warning: using insecure memory!
gpg: please see http://www.gnupg.org/faq.html for more information
gpg: requesting key 65965CD1 from HKP keyserver pgp.mit.edu
gpg: key 65965CD1: no valid user IDs
gpg: this may be caused by a missing self-signature
gpg: Total number processed: 1
gpg:           w/o user IDs: 1
  
this leaves the key unimported into the local key ring, causing the first error shown above. an examination of several archives and websites failed to produce a usable key. all other archives could be processed by the bulk verification tools used in this study.

uncovered weaknesses

while at first it may appear that the trust in the system of signed archives on the internet is well placed, several common practices were revealed during this study. these observed behaviors can be combined to weaken the trust which can be placed in the signed archive. these practices include the placement of the public key, the vulnerability of the key to compromise, and a low number of signatures on an average public key. each of these concerns is discussed below using data gathered from the public keys used in this study.

inline key distribution

the location for the distribution of the public keys associated with a signed archive is an important consideration. as described above for the cmu-snmp archives, public keys must be readily identifiable and downloaded in order to verify the signature. this concern was described by alex brennen in the 'Strong Distribution HOWTO' [5] in section 2.2:

"There are three steps that I recommend that you take in order to circulate your public key. First, you should post your public key on the website where the software is distributed from. You should place the ASCII armored public key is a conspicuous place where people can easily find and download it."

care must be taken, however, to protect the integrity of the key. this is somewhat addressed in the 'Strong Distribution HOWTO' when the author states later in that same section,

"I do not recommend that you include your public key inside your software archive. While there is no technical security problems with this, it does encourage the end user to accept the public key driven by trust based in the location of the key rather than integrity imparted upon the key by signatures. Encouraging such habits in the end users will make them more susceptible to trojan horse attacks against the Strong Distribution Model in which fake archives and fake keys are distributed."

while brennen cautions against inclusion of the key within the archive, distributing the key inline with the archive, by placing it on the same server and often in the same directory, is a similar action. on a compromised server an attacker can not only modify the software archives but also forge the keys, leading to a match between the signature, key, and archive. instead, the distribution site for the keys should be an additional factor in their trust. a key server which protects the keys protects the signatures, by extension.

several popular software packages which are signed use inline key distribution, in addition to a key server, to make their public keys available. these products include:

in each case an attacker can trivially circumvent the protection offered by public key signatures by inserting their own key into this location. because of where it is found, users will download this key, trust it, and use it to verify the signature.

key compromise

one additional consideration in the evaluation of the trust of the signatures on software archives is the age of the keys. older keys are typically of a smaller bit size and thus weaker than their newer counterparts. this is due mainly to the limitations of the software at the time of the key's creation. as shown in figure 1, most keys are no more than 3 years old, but a significant number of the keys used to sign the archives examined in this study are 5 or more years old. the age and size of a key are related to the likelihood of its compromise through factorization by a determined adversary.

figure 1: distribution of key ages. the keys used in this study were examined to find their creation year. this date was then graphed as a function of the frequency of the date of key creation. most keys are no more than 3 years old, with some keys almost 10 years old.

similarly, when the sizes of the keys are examined, a strong trend towards the default settings is immediately apparent. later versions of pgp and the gpg tool use 1024 bits as the default key size. the observed key lengths in bits are plotted in figure 2. when looking at recommendations by both schneier and rivest, 1024 bits is on the short end at the current time (2002) for security against even a modestly funded foe [6].

figure 2: size distribution of keys in use. the sizes of the public keys used to sign the archives studied in this research are shown above. of the 93 keys examined here, 79 have a size of 1024 bits. this graph does not differentiate between DSA keys (57 in this study) and RSA keys (36 in this study).

because earlier versions of pgp had limitations on the sizes of keys they could generate, the year of key creation was examined in relation to the size of the key. as shown in figure 3, no strong correlation between the date of key generation and size in bits exists. in general, larger keys are found in the more recent years. note that in every year found, 1024 bit keys are also in use.

figure 3: correlating key age and size in bits. the ages of the keys studied here were plotted against their sizes in bits. while a general trend of larger keys as the age decreases is observed, it is only a general trend. it is also interesting to observe that 1024 bit keys are present at every sample period. it is important to note that these keys are all in active use at this time.

the size of a key is a factor in the security of the key, and thus of the signed archive. recent advances in factoring of public keys [7] have caused some to abandon the recommendations made by schneier, rivest, and others [6] in favor of larger keys [8].

obviously the nature of the adversary must be taken into account. software which governments or large corporations have an interest in altering is under more serious threat than software which only interests individuals. examples of the former include cryptography products like IPsec and SSH implementations; the latter includes tools like gnuplot and snmp. it is safe to assume that factoring keys above 512 bits is still of interest only to dedicated researchers, large corporations, and governments.

weak signature use

by far the biggest issue uncovered was the large number of keys used to sign archives which were only self signed or carried a low number of signatures by other parties. from the analyzed archives a total of 93 unique keys were retrieved, which together contained 1971 signatures. this yields an average of 21 signatures per key. the number of signatures per key is plotted in figure 4.

figure 4: signatures per key. using the keys downloaded to verify the software archives studied in this research, the number of signatures per key was measured and the resulting frequency plotted. each key has at least one signature (from itself), with most keys having 4 signatures. only a handful of keys have more than 12 signatures. note that this does not indicate the strength of the signatures, measured by their connectedness.

at the maximum, 260 signatures were found on each of two keys used in this study, those of Joost van Baal and Eduard Bloch, both involved in the debian linux project. at the minimum, five keys were found to have only one signature. this signature was the self signature of the key, which adds no security to the key. a total of 25 keys were found to be in use with 3 or fewer signatures on each key.

notable keys in active use with 3 or fewer signatures include the official scyld computing key (scyld.com) and the procmail distribution key, each with 3 signatures (2 when the self signature is discounted), the frees/wan 1.97 snapshot key, which has 2 signatures, and the xemacs distribution key, whose only signature is its own. in each case the limited number of signatures reduces the trust that can be placed in the archive, because a forged key could be introduced. in the absence of trustworthy signatures, the veracity of the key cannot be established.

it is important to note that while it is easy to obtain a large number of signatures for a public key, these signatures must be trustworthy and verifiable, as well. of the 1971 signatures found on the keys used in this study, 1662 had unknown user ids. subtracting the self signatures (93, one for each public key used in this research) leaves 216 signatures which had established user ids based solely on other archive signatures. while a key may have many signatures, they only add value when they are from known and trusted parties.
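the arithmetic behind these figures can be checked with a few lines of python. the variable names are chosen here for illustration; the three input numbers are the ones reported above.

```python
total_signatures = 1971        # signatures found across all keys
unique_keys = 93               # keys retrieved; one self signature each
unknown_uid_signatures = 1662  # signatures whose signer could not be identified

# Mean number of signatures per key (reported in the text as 21).
mean_per_key = total_signatures / unique_keys

# Signatures left after removing unknown signers and self signatures.
identified_third_party = (
    total_signatures - unknown_uid_signatures - unique_keys
)

print(f"mean signatures per key: {mean_per_key:.1f}")
print(f"identified third-party signatures: {identified_third_party}")
print(f"share of all signatures: {identified_third_party / total_signatures:.1%}")
```

the mean is pulled upward by the two keys with 260 signatures each, which is why it sits well above the most common value of 4 signatures seen in figure 4.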

related work

this research is not the first foray into studying the application of public key cryptography in the real world. a 1996 study of the trust model as it is implemented in pgp found that the weakest component of the model is the trust assigned by a person to any key [9]. an additional study examined the security of the private key in terms of the strength of the passphrase in use. while the study did not accumulate a large pool of passphrases, it did accumulate enough data to note that passphrase security is a threat to the security of the system [10]. lastly, an examination of an attack tree against the pgp model shows that theft of the private key by other means (such as compromise of a workstation) is a more substantial threat than factoring the public keys [11].

conclusions

this paper has examined the verification of over 2800 signed archives downloaded from various worldwide sites on the internet. while only 2 failures to properly verify were found, several other threats to the security of the signed distribution model were found and discussed. these weaknesses include a lack of trustworthy signatures on the keys, old or low strength keys, and poor distribution methods for the public key. while the system as a whole is not failing, key points where improvements can be made have been identified. possible additional work includes an ongoing project to verify archives found on the internet using methods similar to those employed here, and key signing facilitation, especially with trusted pgp users.

methods

to find signed archives to download and evaluate, the google search engine was used with the search terms 'tar.gz.sig' and 'tgz.sig'. these search terms were chosen as they are popular signature file extensions. this search yielded 261 sites and 400 unique sites and subdirectories to check, representing 2804 archives to download along with 2804 signatures. due to duplicates and mirrors, this list represented 1426 unique archives to download and evaluate. the 'wget' utility was used to download the directories, which included the archive and the signature file, to a local machine for evaluation.
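the reduction from downloaded archives to unique archives can be illustrated with a short sketch. it assumes that mirrored copies share the same file name, which is an assumption made here and not necessarily the rule used in the study; the urls in the usage lines are hypothetical.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlparse


def group_by_archive(urls):
    """Group archive URLs by file name so that mirrors of one
    archive fall into the same group."""
    groups = defaultdict(list)
    for url in urls:
        name = posixpath.basename(urlparse(url).path)
        groups[name].append(url)
    return dict(groups)


def unique_hosts(urls):
    """Return the set of distinct server names in a list of URLs."""
    return {urlparse(url).netloc for url in urls}


# Hypothetical input: two mirrors of one archive plus a second archive.
urls = [
    "ftp://ftp.example.org/pub/foo-1.0.tar.gz",
    "http://mirror.example.net/foo/foo-1.0.tar.gz",
    "http://www.example.com/dist/bar-2.3.tgz",
]
groups = group_by_archive(urls)
print(len(urls), "downloads,", len(groups), "unique archives,",
      len(unique_hosts(urls)), "servers")
```

grouping by name alone would merge unrelated files that happen to share a name, so a digest comparison within each group is a sensible follow-up.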

a small tool was written which used the 'gpg' tool, version 1.0.7, running on openbsd to verify the signatures. the directory tree was walked and the archives and signatures were compared, with the actions logged. if the key did not exist in the local public keyring, it was fetched from pgp.mit.edu and the verification process was repeated. this key server was chosen as it is a well established key server which contains most of the keys used in this study. failures were examined manually.
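the verification loop described above can be sketched as follows. this is not the tool written for the study, whose source is not reproduced here; it is a minimal python approximation that assumes a 'gpg' binary on the path and relies on gpg's machine-readable status lines (GOODSIG, BADSIG, NO_PUBKEY) via '--status-fd'. the function names and the injectable 'gpg' argument are illustrative choices.

```python
import os
import subprocess

KEYSERVER = "pgp.mit.edu"  # the key server named in the text


def run_gpg(args):
    """Run gpg and return its status-fd output as text."""
    proc = subprocess.run(
        ["gpg", "--batch", "--status-fd", "1"] + list(args),
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True,
    )
    return proc.stdout


def classify(status):
    """Map gpg status lines to a (result, key_id) pair."""
    for line in status.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] != "[GNUPG:]":
            continue
        if parts[1] == "GOODSIG":
            return "good", parts[2]
        if parts[1] == "BADSIG":
            return "bad", parts[2]
        if parts[1] == "NO_PUBKEY":
            return "missing-key", parts[2]
    return "error", None


def verify(sig_path, archive_path, gpg=run_gpg):
    """Verify a detached signature, fetching the key once if absent."""
    result, key_id = classify(gpg(["--verify", sig_path, archive_path]))
    if result == "missing-key":
        gpg(["--keyserver", KEYSERVER, "--recv-keys", key_id])
        result, key_id = classify(gpg(["--verify", sig_path, archive_path]))
    return result, key_id


def walk(root, gpg=run_gpg):
    """Yield (archive, (result, key_id)) for each archive/.sig pair."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(".sig"):
                continue
            archive = os.path.join(dirpath, name[:-len(".sig")])
            if os.path.exists(archive):
                sig = os.path.join(dirpath, name)
                yield archive, verify(sig, archive, gpg)
```

anything other than a "good" result would be logged and examined by hand, mirroring the manual follow-up described above. a key that cannot be imported, as with the cmu-snmp key, shows up as "missing-key" even after the fetch.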

keys were examined after the verification process to gather statistics on the stated key size, key creation date, and number of signatures on each key. no attempt was made to verify the veracity of the stated key signatures on any key, and no trust web analysis was performed.
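one way to gather these statistics is to parse the colon-delimited listing printed by 'gpg --with-colons --list-sigs'. the sketch below is an assumption about how this could be done, not a description of the scripts used in the study. it reads the key length, algorithm, and creation date from each 'pub' record and counts the 'sig' records that follow it, skipping the binding signatures listed under subkeys. creation dates are accepted either as seconds since the epoch or as a date string beginning with the year, since the format differs between gpg versions.

```python
from datetime import datetime, timezone

ALGORITHMS = {"1": "RSA", "17": "DSA"}  # OpenPGP public-key algorithm ids


def creation_year(field):
    """Return the year from an epoch-seconds or YYYY-prefixed date field."""
    if field.isdigit():
        return datetime.fromtimestamp(int(field), tz=timezone.utc).year
    return int(field[:4])


def summarize(listing):
    """Return one dict per primary key found in a colon listing."""
    keys = []
    in_subkey = False
    for line in listing.splitlines():
        f = line.split(":")
        if f[0] == "pub":
            in_subkey = False
            keys.append({
                "key_id": f[4],
                "bits": int(f[2]),
                "algorithm": ALGORITHMS.get(f[3], f[3]),
                "year": creation_year(f[5]),
                "signatures": 0,
                "self_signatures": 0,
            })
        elif f[0] == "sub":
            in_subkey = True  # following sig records bind the subkey
        elif f[0] == "sig" and keys and not in_subkey:
            keys[-1]["signatures"] += 1
            if f[4] == keys[-1]["key_id"]:
                keys[-1]["self_signatures"] += 1
    return keys
```

the resulting records can be tallied by size, year, or signature count to produce the kind of distributions shown in figures 1 through 4. as in the study, this counts signatures without judging whether any of them is trustworthy.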

graphs were generated using gnuplot.

the complete list of sites and archives analyzed in this study, as well as the tools used to process the data, are available in a separate document.

references

1. irssi compromise information, http://www.irssi.org/?page=news
2. Re: Trojan/backdoor in fragroute 1.2 source distribution, Dug Song, http://lwn.net/Articles/1479/?format=printable
3. OpenSSH Security Advisory (adv.trojan), Niels Provos, http://www.openssh.com/txt/trojan.adv
4. Explanation of the web of trust of PGP, Patrick Feisthammel, http://www.rubin.ch/pgp/weboftrust.en.html
5. Strong Distribution HOWTO, V. Alex Brennen, http://www.cryptnet.net/fdp/crypto/strong_distro.html
6. Applied Cryptography, Second Edition, Bruce Schneier, pages 158-165.
7. Circuits for integer factorization: a proposal, D. J. Bernstein, http://cr.yp.to/papers/nfscircuit.ps
8. 1024-bit RSA keys in danger of compromise, Lucky Green, http://tin.le.org/vault/security/encryption/rsa1024.html
9. An Analysis of PGP's Trust Model, Alfarez Abdul-Rahman, http://www.cs.ucl.ac.uk/staff/F.AbdulRahman/docs/pgptrust.html
10. Results of a Survey on PGP Pass Phrase Usage, Arnold G. Reinhold, http://world.std.com/~reinhold/passphrase.survey.asc
11. Attack Trees, Bruce Schneier, http://www.counterpane.com/attacktrees-ddj-ft.html

acknowledgements

the author would like to extend his gracious thanks to rick wash, marius eriksen, and niels provos, all of the university of michigan, along with jason peel, florian kohl, jeff godin, and v. alex brennen for their insightful comments and advice during this study and the preparation of this report.