What Crypto is Good For (And Why it Matters)
Jose Nazario
May, 2001

Copyright (C) 2001 Jose Nazario, all rights reserved.

----------------------------------------[ Abstract

With an increasing awareness of computer and network security, a lot of people are hearing a lot of things about cryptography. Some of these claims are poorly understood or are misrepresentations. This piece is an attempt to introduce various cryptography terms and ideas, and to illustrate where they can provide security. Three applications are discussed: PGP, SSH and IPSec. No mathematics background is assumed.

-----------------------------------[ Introduction

In this day and age, as network security topics gain a higher profile, a significant number of people are scrambling to provide some form of security. Encryption has always represented a solution to many people, though it's not always understood. Nor is it always applicable as a security solution, sometimes leading to wasted effort and a false sense of protection.

Below, some basics of cryptography are introduced, and their uses illustrated. This is meant only to serve as a brief overview of the benefits of the application of cryptography, and the places where encryption can be implemented.

----------------[ Conventions Used

To facilitate the illustration of message structures, and to symbolize various concepts, the following representations will be used:

    term      meaning
    --------------------------------------
    C         ciphertext
    P         plaintext
    ||        concatenation
    H()       hash function
    E ()      encryption with a secret key K
     K
    D ()      decryption with a secret key K
     K
    K ()      encryption with a public key
     U
    K ()      encryption with a private key
     R

------------------[ Symmetric Cryptography

Classical cryptography, using a shared secret, is what we typically think of when we think of encryption. This is called symmetric cryptography, as both sides use a shared secret to encrypt and decrypt communications with each other.
Symmetric ciphers are usually fast when implemented in both software and hardware.

The strength of a shared secret cipher is measured by the time it takes to find the key for the cipher used, either by searching the possible keys or by analyzing captured transmissions to calculate the key mathematically. As such, larger keys are usually equated with a stronger encryption mechanism, everything else being equal: to compromise the key, an attacker would have to search a larger number of keys. The goal is to make this time much longer than the period during which the data remains valuable.

Example ciphers include the ROT-13 cipher, Caesar cipher, and Enigma. Note that these three examples range from weak encryption to 'secret decoder ring' kinds of encryption, so don't use them. They just illustrate the point.

Popular ciphers in use today that utilize a shared secret include DES, AES, Blowfish, Twofish, IDEA, and RC5. They range from somewhat weak to quite strong when implemented properly. Their speed makes them especially attractive in practical implementations.

The structure of a message 'M' using key 'K' transmitted over an encrypted channel, using the above terminology, would be:

    sender          recipient
    -------         -------
    E (M)=C         D (C)=P
     K               K

Note that there are two classical problems in the use of shared secret cryptography: key distribution and direct attacks on the key itself.

The first problem has been solved by a variety of means, including physical distribution methods. In the past two decades, the use of public key encryption (described below) for key exchange has gained popularity and effectiveness. In this scenario, the session key itself is encrypted using the intended recipient's public key and sent to them. This provides protection for the key and a secure distribution method. As described below, PGP, SSH, SSL and IPSec are just some of the cryptosystems that use this method for symmetric cipher key exchange.
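To make the point about key size concrete, here is a minimal sketch in Python (a language chosen for this illustration, not one the article uses) that exhaustively searches the key space of the Caesar cipher mentioned above. The function names `caesar` and `brute_force`, the sample message, and the known-word check are all assumptions made for the example.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar(text, key):
    """Shift each lowercase letter by `key` positions; other characters pass through."""
    k = key % 26
    shifted = ALPHABET[k:] + ALPHABET[:k]
    return text.translate(str.maketrans(ALPHABET, shifted))


def brute_force(ciphertext, known_word):
    """Try all 26 keys; return the keys whose decryption contains `known_word`."""
    return [k for k in range(26) if known_word in caesar(ciphertext, -k)]


# Hypothetical sample data for the illustration.
message = "meet me at the usual place at noon"
key = 13                                   # ROT-13 is the Caesar cipher with key 13
ciphertext = caesar(message, key)          # sender: encrypt with the shared key
plaintext = caesar(ciphertext, -key)       # recipient: decrypt with the same key
recovered_keys = brute_force(ciphertext, "usual")   # attacker: search every key

print(ciphertext)
print(recovered_keys)
```

With only 26 possible keys the search finishes instantly, which is why such ciphers should not be used. A cipher with a 128 bit key has 2^128 possible keys, so the same loop would not finish in any practical amount of time.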
The second problem associated with secret key cryptography has typically been worked around by choosing a cipher that is resistant to various forms of direct analysis of the messages (ie differential or linear cryptanalysis, not covered here), and by good key usage schedules. Often a session key is generated randomly, used only once, and discarded. This means that an attacker would have to begin attacking a new key when a new session is initiated. This is one kind of measure used by SSH, SSL and related protocol suites.

--------[ Public Key Cryptography

Asymmetric, or public key, cryptography uses two components of a key to allow for security. The key's components, a public half and a private half, are mathematically related. The strength of the mechanism lies in the difficulty of calculating the private key from the public components. Bear in mind that one often overlooked facet of the security is the privacy of the private key, which is sometimes not protected at all. The public key is designed to be shared with the world.

Public key cryptography is slower than secret key cryptography, and is mainly used for the exchange of symmetric cryptography keys, authentication and verification.

Popular implementations of public key cryptography include RSA, Diffie-Hellman, and El Gamal. Elliptic curve cryptography is also a public key cryptography scheme, based on different mathematics that is also difficult to reverse (that is, to derive a private key from the public component).

Because the two key halves are related, public key cryptography provides an excellent mechanism for authentication. By using the sender's private key to encrypt, and the sender's public key to decrypt, the authenticity of the sender can be verified. This is the basis of a significant number of cryptographic mechanisms, like PGP, SSH, and PKIs.
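The relationship between the two halves of a key can be sketched with a toy RSA key pair in Python. The primes, the exponent, and the helper names `with_public` and `with_private` are illustrative assumptions; real keys are hundreds of digits long and real systems add padding, so this is a sketch of the idea rather than a usable implementation. It shows both directions: transforming with the public half so that only the private half can reverse it, and transforming with the private half so that anyone holding the public half can check the origin.

```python
# Toy RSA with tiny primes: for illustration only, never for real use.
p, q = 61, 53                 # secret primes (hypothetical sample values)
n = p * q                     # modulus, part of both key halves
phi = (p - 1) * (q - 1)
e = 17                        # public exponent: (e, n) is the public half
d = pow(e, -1, phi)           # private exponent: (d, n) is the private half
                              # (modular inverse via pow needs Python 3.8+)


def with_public(x):
    """Apply the public half of the key to the number x."""
    return pow(x, e, n)


def with_private(x):
    """Apply the private half of the key to the number x."""
    return pow(x, d, n)


m = 65                              # a message, encoded as a number below n

# Confidentiality: anyone can encrypt to the key's owner.
c = with_public(m)                  # ciphertext
p_out = with_private(c)             # only the private half recovers m

# Authentication: only the owner can produce this value.
signature = with_private(m)
verified = with_public(signature) == m

print(c, p_out, verified)
```

Recovering `d` here only requires factoring 3233, which is trivial; the security of a real key pair rests on that step being infeasible at realistic sizes.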
The structure of a message 'M' sent between a sender and a recipient, using the public and private keys of the recipient as described above, would be:

    sender          recipient
    -------         -------
    K (M)=C         K (C)=P
     U               R

Because of the computational complexity involved in both encryption and decryption using public key cryptography, it is often used for small messages only or for authentication purposes. By encrypting a 'nonce', a phrase used only once, using public key cryptography, the authenticity of the sender can be established. Some suites, like SSH and SSL, use public key cryptography to encrypt the session key, which is then used for a secret key cipher.

-----------------------------------[ Hash Functions

The third major component of a cryptography system is a hash function, often called a 'message digest'. This is, in effect, a fingerprint of the input.

Hash functions take variable length inputs and, through a mathematical transformation, produce a fixed length output. These functions are one way, which is to say they cannot be reversed. Only a re-computation of the hash output can be used for verification. This makes them useful for detecting changes in the source data of the function.

When two different messages, M and M', produce the same message digest output, H, in the same function, it is called a 'collision'. Since inputs are unbounded and outputs are fixed in length, collisions must exist; the goal is that finding one should take on the order of 2^(n/2) attempts, where n is the size of the output in bits (ie for MD5, which has a 128 bit output, about 2^64). This goal is not always met. Depending on the mathematics, some functions are more resistant than others. Note that for most uses, popular functions like MD5 and SHA-1 suffice.

Popular hash functions include the message digest functions MD4 and MD5, SHA-1 and even the CRC check. SHA-1 is considered to be the strongest one listed, though MD5 suffices for most uses. The CRC function can be trivially worked around, and MD4 has had some problems in its history.
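The fingerprint behaviour described above can be seen with Python's standard hashlib module. This is a minimal sketch: SHA-1 is used only because the text names it, and the sample messages and the helper names `fingerprint` and `matches` are assumptions made for the example.

```python
import hashlib


def fingerprint(data):
    """Return the SHA-1 digest of `data` as 40 hex characters (160 bits)."""
    return hashlib.sha1(data).hexdigest()


def matches(message, appended_digest):
    """Recompute the hash of a received message and compare it to the one sent along."""
    return fingerprint(message) == appended_digest


# Inputs of very different lengths produce outputs of the same length.
short_input = b"hi"
long_input = b"x" * 100000
print(len(fingerprint(short_input)), len(fingerprint(long_input)))

# A one character change in the input is detected on re-computation.
sent = b"Please buy 1000 shares"
tampered = b"Please buy 9000 shares"
digest = fingerprint(sent)
print(matches(sent, digest), matches(tampered, digest))
```

Note that the check only works if the digest itself reaches the recipient unmodified, since anyone can run the same function over altered data.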
However, there is one problem with hash functions when used as described above. If an attacker wishes to circumvent the hash checking, all they would have to do is recalculate the hash output on their injected data, and re-append that to the stream which they are altering. The target system would be unable to detect the change, as the hash would be computed properly for the altered data.

One variant of the hash function gets around this, and is called a keyed hash. These functions, often called 'HMAC' functions, mix a secret key into the hash computation, so that the output depends on both the message and the key. This key is another shared secret in the cryptosystem. An attacker who does not know the key cannot compute a valid HMAC for altered data, so when the receiving end recalculates the HMAC, a difference is noted and the alteration of data is detected. When used in this manner, MD5 becomes HMAC-MD5, and so on.

Hashes are typically appended to the data as a signature. A typical message 'M' structure with a hash of function 'H' may look like this:

    sender         recipient
    -------        ------------------------------------------
    M||H(M)        calculates H(M) on received M and compares
                   it to the appended H(M).

--------------------------[ One Time Pads

This is an extremely secure cryptosystem, but in practical usage it's almost intractable to implement. The strength of this system comes from the fact that no two 'keys' are ever the same. In brief, a message is combined with a pad of random data that is as long as the message itself. The received message is then combined with the same 'key' on the receiving end and is thus readable by the recipient. This key is never used again, so the same input generates different output every time it is sent.

One example of what happens when this rule is broken came from the VENONA project, an NSA project that analyzed a significant number of USSR transmissions. Because some one time pads were reused, repeats in the data became apparent and some messages were readable.

Very few systems ever use one time pads.
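A one time pad can be sketched in a few lines of Python. XOR is assumed here as the operation that combines message and pad, and the helper name `xor_bytes` and the sample messages are made up for the illustration. The second half of the sketch shows why reusing a pad is dangerous: combining two ciphertexts made with the same pad cancels the pad out entirely.

```python
import secrets


def xor_bytes(data, pad):
    """Combine data with a pad of exactly the same length, byte by byte."""
    if len(data) != len(pad):
        raise ValueError("the pad must be exactly as long as the message")
    return bytes(a ^ b for a, b in zip(data, pad))


message = b"attack at dawn"
pad = secrets.token_bytes(len(message))    # random pad, as long as the message

ciphertext = xor_bytes(message, pad)       # sender
recovered = xor_bytes(ciphertext, pad)     # recipient, holding the same pad

# Pad reuse: a second message of the same length sent under the same pad.
second = b"retreat at ten"
second_ciphertext = xor_bytes(second, pad)

# An eavesdropper who XORs the two ciphertexts obtains the XOR of the two
# plaintexts, with no trace of the pad left in it.
leak = xor_bytes(ciphertext, second_ciphertext)

print(recovered)
print(leak == xor_bytes(message, second))
```

The practical difficulty is visible in the first few lines: every message needs fresh random data of its own length, and both parties must already hold it securely.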
-------------[ Other Important Facets of a Cryptosystem

The above information is the core of any cryptosystem, but without a decent support infrastructure, all of this work could be wasted.

One of the most important features of any cryptosystem is a strong pseudorandom number or bit generator. This is vital for the strength of the session keys (see below for examples where session keys are randomly generated). If someone could predict the sequence of bits with reasonable certainty, they would be able to predict what keys would be used and compromise the system. Even if this is a simple matter of selecting which keys to try first due to a demonstrated bias in bit generation, it can make a significant impact on the efficiency of breaking the encryption key in a brute force, exhaustive search. The security of the messages encoded with this key, or of any communications sessions (ie an ssh session), would thus be lost.

Two popular random number generators are Yarrow and Blum-Blum-Shub. Each produces strongly random appearing numbers. It's important to stress that these are algorithms, and as such do not produce truly random numbers, only numbers that cannot be predicted with reasonable certainty without knowing the seed of the generator.

A variety of tests exist to certify a generator as a strong source of pseudorandom numbers or bits, though none of them are perfect. When used in conjunction, however, they make it possible to raise the confidence in the entropy of a bit stream.

Lastly, it's vital to keep the secret components of any cryptosystem secret. This includes secret keys in symmetric algorithms, the private keys in a public key cryptosystem, and the seed to a random number generator. Often overlooked is the message itself. The weakest point to attack is at the endpoints, when someone encrypts or decrypts the data to be transmitted.
For example, decrypting a message received by PGP and leaving the decrypted message world readable defeats the whole purpose of using encryption.

------------[ What Cryptography Provides

Having introduced the basics of cryptography, we can now examine what advantages cryptography can provide. In summary, cryptography can provide you with a combination of four major security goals:

    o Authentication     strong proof of the user's identity
    o Confidentiality    the encryption renders the data nearly impossible for an eavesdropper to read
    o Integrity          the alteration of data can be readily detected
    o Nonrepudiation     indisputable proof of the origin of a message

All four of these advantages can be readily achieved by the simple application of cryptography in the right places.

To illustrate the basics, we will use a simple email transmission as the basis for our secure communications model. Note that it can be extended in almost any direction, including to a stream based connection.

Let's look at a typical email based communication system. A message, M, can traverse 7 possible routes:

    1  It can pass unmolested by anyone from A to B

       This is the path we expect it to take, though that's not always a safe assumption to make.

    2  It can be observed by a third party, C

       In this scenario, an attacker, C, eavesdrops on a communication between A and B and records the data. This occurs without the knowledge of either party A or B, as it doesn't affect the message.

    3  C can forge a message to B, claiming to be A

       In this situation, C pretends to be A and sends a message to B. Because it claims to be from A, B may take a certain action.

    4  C can intercept and modify the message

       In this case, C receives the message by some means, modifies the message to alter the content, and returns it on its path to B.

    5  C can perform a replay attack of the A<->B session

       C can capture the traffic as it passes from A to B and not modify it.
       Then C can resend the stream to B, pretending to be A, hoping to fool B into performing some action again.

    6  A can send a message to B and later deny it

       In this situation, C has no role. A sends a message to B which they may later want to deny, such as a financial transaction.

    7  C can block delivery of the message to B from A

       In this case, C actively prevents the completion of the transmission from A to B.

The use of cryptography would help prevent the attacks in scenarios 2, 3, 4, 5 and 6. The remaining attack, scenario 7, is a denial of service. Normally a denial of service attack can be carried out without much skill, and may involve feeding bogus data to a system to choke it, or the simple act of cutting some wires (data or power). Cryptography wouldn't directly help in either case.

So, how can encryption protect this message? Quite simply, really.

The second scenario, where an eavesdropper observes a message transmission, can easily be defeated by the use of cryptography. If A and B encrypted their messages to each other, the observing party, C, would be unable to decode the messages and understand them. All C would see is a bunch of encrypted bytes. Note that if C is able to get the key used and knows the algorithm in use, the game is up, so a good system, like PGP, must be used.

In the third situation described above, where an attacker forges a message to B, claiming to be from A, the use of cryptography could help B notice that it is not a message from A. Recall that the properties of public key cryptography allow for both the directed encryption of a message and the signing of a message, depending on which key was used for encryption. As such, if A always signs their messages using their private key, and B verifies them using A's public key, then when a message arrives from C claiming to be from A, the disparity in the sender's identification will be noticed.
Similarly, if A and B are communicating and A wishes to deny that they are the origin of a message, B can use the cryptographic signature to prove that A is indeed the origin, and only A could be that origin. This is the point of nonrepudiation.

In the fourth scenario, when the message is intercepted and modified, two forms of cryptography can be used to protect against this. Firstly, if the data was encrypted, the attacker would have to compromise the security of the encryption to decode the message before altering it. Secondly, by including a hash of the message at the end of the transmission (a keyed hash, so that the attacker cannot simply recompute it), the integrity of the message can be verified. When the recipient compares the hash they compute with the hash sent with the message, a difference will be noted. Again, as with nonrepudiation, A can prove to B that they did not send the message that B received.

In the fifth situation, the replay attack, A and B communicate over a channel that C observes and captures. This could include a mail message for, say, a financial transaction (such as "Please buy 1000 shares of the WidgetCo preferred stock"), or an authentication stream. C can then pretend to be A at a later time and fool B by replaying the stream from A, hoping to bypass security measures with A's credentials. Cryptography can protect against this through several means, including one time use challenge-response mechanisms and hashed values which include timestamp information. The challenge-response mechanism is one of the most popular, but relies on a very large and unpredictable pool of challenges to be presented.

In the sixth example, where A later denies a message earlier sent by them to B, cryptography can be used to establish the identity of the origin of the message. If A uses a cryptographic signature on their mail, or their mail software issues one, B can use this to establish that A was the only possible origin of the message.
This is called 'nonrepudiation', where the receiving party can confirm the identity of the origin of a message. Of course, if A plans to later deny the message's origin, they may attempt to avoid the use of a cryptographic signature in the first place. Situations where this could be a problem should force the use of cryptographic signatures to ensure this kind of protection.

In summary, cryptography is not a perfect solution. A dedicated attacker can use their computational power, exploit weaknesses in the algorithms or implementations you have chosen, or obtain by some other means the data which you are trying to protect using encryption. The use of cryptography only makes this attack more difficult.

Furthermore, it is important to weigh the value of the data against the security applied to it. Obviously a small email to a friend to schedule a lunch date (where nothing sensitive would be discussed) doesn't need to be encrypted. However, company data, personnel information, administrative or even account passwords, or government secrets obviously would draw attention from an adversary and need to be protected from their eyes. These kinds of transmissions should utilize some form of protection, which encryption may be of value in providing.

----------[ Crypto in the Real World

Having introduced the basics of encryption and cryptosystems, and outlined their advantages, we will illustrate how three cryptosystems utilize these facets of cryptography to achieve security.

--------------------< PGP

Pretty Good Privacy, or PGP, is a standards based email security solution. It utilizes public key cryptography, together with symmetric cryptography and hashing functions, to provide secure email at the message level. This leads to privacy, authentication, and integrity of the email messages.

PGP users generate their asymmetric cryptography keys and make available their public key for anyone to use.
This key is then added to the sender's key ring and used when a message is composed to that recipient. This key ring is a collection of public keys of the people with whom you communicate. By storing them on a ring, you keep them in a safe place that is also convenient to access, rather than having to redownload them for each message you wish to send. Likewise, when viewing a message signed with a private key, the signer's public key from the ring is used for decryption.

PGP messages are combinations of public key encrypted material, private key encrypted material, and hash function output. When an encrypted email is generated to a recipient, PGP generates a random session key for a symmetric algorithm (usually the IDEA cipher) and encrypts the message using this key. The key is then encrypted using the recipient's public key and prepended to the message. A hash is generated on this encrypted message and appended, providing a basic integrity check. The recipient then decrypts the session key using their private key, decodes the message using the session key, and PGP verifies it arrived intact by computing a hash value for the message.

When messages are signed, a cryptographically strong hash (such as MD5) of the message is generated and then encrypted using the sender's private key. This is then appended to the message. When a recipient wants to verify the source of the message, they can use the public key of the sender to decode the hash and compare the output to the computed hash. When they match, the message is known to have originated from the indicated source.

Signing and encryption can be combined, of course, providing for even stronger security.

Keys are verified through a web of trust. By certifying the veracity of the key's owner, signers extend trust.

-------------------------------------< SSH

The secure shell, or SSH, protocol operates much like PGP but on a stream of data.
It provides for the same kinds of protection that PGP offers, including authentication, integrity and confidentiality. Account information, including passwords and session data, is protected using encryption.

Upon connection initiation, the client and server exchange public keys. The server's public key is then used to encrypt a session key, which is used with a symmetric cipher. The client sends this key to the server and communications are then done using this symmetric algorithm key. This provides for authentication of the server (if the server's public key is known beforehand), and privacy, as well as integrity of the session. This leads to protections against passive attacks, like password sniffing, and active attacks, like session hijacking.

Stronger authentication can be achieved as well, using public key cryptography on the part of the client. By presenting data that could only have been encrypted using the private key, with the server already knowing the public key for that user, the client can be authenticated as well.

Hashing functions are also used to provide integrity checking of the data in the stream. Upon detection of an invalid sum of the data, the connection is dropped to prevent any forged data from entering the system.

One additional feature of ssh that is not intuitively obvious is the defense against a replay attack. This helps keep an attacker from reusing an already used ssh stream to gain entry back into a network, for example after watching a legitimate session occur.

--------------< IPSec

IPSec is a set of extensions to the IPv4 protocol, and is standard in IPv6, which allows for network layer security. By providing this kind of facility at the network layer, applications can utilize the IPSec features without any modification. The underlying system, such as the kernel or various network devices, is the only part that needs to know about IPSec.
By utilizing all three cryptography components described above (public key cryptography, secret key cryptosystems and hash functions), IPSec provides for authentication, integrity, confidentiality, and replay protection. These can be combined to produce a strong network layer.

Two new protocols are specified, numbered 50 and 51, for ESP (Encapsulating Security Payload) and AH (Authentication Header), respectively. In its basic form, transport mode, IPSec adds the AH header information to an IP packet. This is then stripped off when the other endpoint is reached. By verifying that the packet originated with the source it claims to have, strong authentication and integrity checking at the packet level are achieved. Note that the AH protocol provides no protection for the confidentiality of the data.

In tunnel mode, IPSec uses both the AH and the ESP headers to provide for both authentication and confidentiality, as well as integrity checking. The entire packet is encrypted as well, providing for strong protection of the data and the stream.

IPSec systems usually use either IPSec gateways on a network, which transparently provide network level encryption to the client machines, or IPSec software installed on the nodes themselves. Routing tables determine if traffic is to be encapsulated within the IPSec stream. Because it uses encapsulation, any kind of IP packet can be protected by IPSec, including any TCP stream, UDP communication, or even ICMP. All of this occurs transparently to the upper layer applications and without any additional configuration once the IPSec routing is set up.

The encryption parameters are specified either automatically when a new IPSec connection is negotiated or manually. They include the encryption and hashing algorithms, plus keys to protect the data.
The Oakley key exchange protocol operates very similarly to the SSH connection setup, utilizing public key cryptography for mutual authentication, and then for the exchange of the symmetric cryptography keys used in the encryption and decryption of packets.

--------------[ Getting and Using Cryptography

Having read about the benefits and uses of cryptography, you may wish to introduce a more extensive use of encryption in your applications or your networking. This can be easily accomplished using a few tools.

One of the most extensive cryptographic toolkits, and one that is widely held in great esteem, is the OpenSSL toolkit. It provides not only SSL encryption features (the Secure Sockets Layer, ie HTTPS), but also a wide variety of encryption algorithms and hash functions. These are available on the command line, interactively, and via API calls to libcrypto.

The major problem with OpenSSL is that it operates only at the application layer, in userspace. While networking applications can take advantage of OpenSSL's encryption routines, they have to be aware of these calls and be built around them.

At the networking layer, where the applications do not have to be aware of the encryption mechanisms, you have to go to the kernel. To do this, IPSec functionality would have to be integrated into the kernel. On Linux, the FreeS/WAN effort has made great strides toward getting IPSec functions integrated into Linux. In BSD kernels, the KAME project has been especially popular and effective, providing not only IPSec but also IPv6.

Please see the resources section of this article for more information.

----------------[ Limitations of Cryptography

It's important to note that encryption will not stop a good number of security attacks, and that it's only a piece of a larger security suite. For instance, buffer overflow attacks, the popular format string attack, and most denial of service attacks cannot be thwarted by encryption.
One tempting solution is to require strong authentication for every service, including ones open to the world. This isn't realistic, though, as you may not need that kind of authentication from people. After all, it is an open resource for everyone. Secondly, most denial of service attacks don't require any form of authentication, thus defeating the whole purpose of using cryptography as an authentication mechanism.

It is often said that "with encryption, an attacker will be unable to sniff your data from the wire." It's vital to state, quite clearly, that this is not true. Encrypted data can still be sniffed from the network. The difference is that the attacker will then have to decode it to make use of the data, if they are after the payload. Due to weaknesses in certain algorithms or their implementations, an attacker could, in a timely fashion, decode the data and make use of the stolen information.

If they are interested solely in traffic analysis, ie noticing two companies talking more frequently, which may signal a business action between them, the standard use of cryptography will not be of assistance in thwarting that type of attack. While encryption can be integrated into a system to thwart traffic analysis, this is beyond the scope of this article.

-------------[ Conclusions

This has been a brief tour of the principles of cryptography and their application to the protection of data. By applying cryptography in the right areas, data and access can be protected and controlled. However, crypto is not the perfect solution, but only one component of a complex security policy.

-----------------------------[ Resources

As mentioned above, here are some cryptographic toolkits:

    OpenSSL:    http://www.openssl.org/
    KAME:       http://www.kame.net/
    FreeS/WAN:  http://www.freeswan.org/

To learn more about cryptography, check out this reading list:

    Applied Cryptography: considered a great, useful handbook on the subject.
    http://www.amazon.com/exec/obidos/ASIN/0471117099/ref=lm_lb_6/107-8383142-9279713

    Cryptography & Network Security: a great introduction to ciphers, their applications (ie SSL, Kerberos), and related topics.
    http://www.amazon.com/exec/obidos/ASIN/0138690170/ref=lm_lb_7/107-8383142-9279713

    PGP: Pretty Good Privacy: though dated, a thorough coverage of the subject.
    http://www.amazon.com/exec/obidos/ASIN/1565920988/ref=lm_lb_5/107-8383142-9279713

    Secure Shell: The Definitive Guide: a wonderful tome on this great tool.
    http://www.amazon.com/exec/obidos/ASIN/0596000111/ref=lm_lb_3/107-8383142-9279713

    IPSec: The New Security Standard: a good intro to IPSec and its complexities.
    http://www.amazon.com/exec/obidos/ASIN/0130118982/ref=lm_lb_1/107-8383142-9279713

    A brief introduction to differential and linear cryptanalysis (which is quite difficult and math intensive):
    http://www-computerlabor.math.uni-kiel.de/~fjacobs/dlcrypta/dlcrypta.html