                        What Crypto is Good For
                         (And Why it Matters)

                             Jose Nazario
                               May, 2001

        Copyright (C) 2001 Jose Nazario, all rights reserved.

----------------------------------------[ Abstract

With an increasing awareness of computer and network security, a lot of
people are hearing a lot of things about cryptography. Some of these
claims are poorly understood, and some are misrepresentations. This piece
is an attempt to introduce various cryptography terms and ideas, and to
illustrate where they can provide security. Three applications are
discussed: PGP, SSH and IPSec. No mathematics background is assumed.

-----------------------------------[ Introduction

In this day and age, as network security topics gain a higher profile, a
significant number of people are scrambling to provide some form of
security. Encryption has always represented a solution to many people,
though it's not always understood. Nor is it always applicable as a
security solution, sometimes leading to a wasted effort and a false sense
of protection.

Below, some basics of cryptography are introduced, and their uses
illustrated. This is meant only to serve as a brief overview of the
benefits of the application of cryptography, and the places where
encryption can be implemented.

----------------[ Conventions Used

To facilitate the illustration of message structures, and symbolize
various concepts, the following representations will be used:

term    meaning
--------------------------------------
C       ciphertext
P       plaintext
||      concatenation
H()     hash function
E ()    encryption with a secret key K
 K
D ()    decryption with a secret key K
 K
K ()    encryption with a public key
 U
K ()    encryption with a private key
 R

------------------[ Symmetric Cryptography

Classical cryptography, using a shared secret, is what we typically think
of when we think of encryption. This is called symmetric cryptography, as
both sides use a shared secret to encrypt and decrypt communications with
each other. Symmetric ciphers are usually fast when implemented in both
software and hardware.

The strength of a shared secret cipher is measured by the time it takes to
find the key, either by trying every possible key or by analyzing captured
transmissions to calculate it mathematically. As such, larger keys are
usually equated with a stronger encryption mechanism, everything else
being equal: to compromise the key, an attacker would have to search a
larger number of keys. The goal is to make this time much longer than the
period during which the data remains valuable.
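
To put rough numbers on the exhaustive search argument, here is a minimal
Python sketch of the arithmetic. The rate of one billion keys per second
is purely an assumption chosen for illustration, and the function and
constant names are mine.

```python
# Rough brute force arithmetic. ASSUMPTION: the attacker tests one
# billion keys per second; real rates vary enormously.
KEYS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def years_to_search(key_bits, rate=KEYS_PER_SECOND):
    """Years needed to try every key of the given size (worst case)."""
    return 2**key_bits / (rate * SECONDS_PER_YEAR)

for bits in (40, 56, 128):
    print(f"{bits:3d} bit key: about {years_to_search(bits):.3g} years")
```

Each additional key bit doubles the search time, which is why a modest
increase in key length can move an attack from practical to hopeless.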

Example ciphers include the ROT-13 cipher, Caesar cipher, and Enigma. Note
that these three examples range from weak encryption to 'secret decoder
ring' kinds of encryption, so don't use them. They just illustrate the
point. Popular ciphers in use today that utilize a shared secret include
DES, AES, Blowfish, Twofish, IDEA, and RC5. They range from somewhat weak
to quite strong when implemented properly. Their speed makes them
especially attractive for implementations.

The structure of a message 'M' using key 'K' transmitted over an encrypted
channel, using the above terminology, would be:

        sender                  recipient
        -------                 -------
        E (M)=C                 D (C)=M
         K                       K
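
The same exchange can be sketched in Python with the Caesar cipher
mentioned above. This is an illustration only: the cipher has just 25
useful keys and must never be used for real data, and the function names
are my own.

```python
import string

ALPHABET = string.ascii_uppercase

def encrypt(key, message):
    """E_K(M): shift each letter forward by `key` places."""
    out = []
    for ch in message:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            out.append(ch)          # leave spaces, digits, etc. alone
    return "".join(out)

def decrypt(key, ciphertext):
    """D_K(C): shift each letter back by the same `key`."""
    return encrypt(-key, ciphertext)

K = 3                                # the shared secret
C = encrypt(K, "ATTACK AT DAWN")     # sender side
M = decrypt(K, C)                    # recipient side
print(C)                             # DWWDFN DW GDZQ
print(M)                             # ATTACK AT DAWN
```

Both sides need the same K, which is exactly the key distribution problem
discussed next.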

Note that there are two classical problems in the use of shared secret
cryptography: key distribution and direct attacks on the key itself. The
first problem has been solved by a variety of means, including physical
distribution methods. In the past two decades, the use of public key
encryption (described more below) as a key exchange protocol has gained
popularity, and effectiveness. In this scenario, the key itself for the
session is encrypted using the intended recipient's public key and sent to
them. This provides protection for the key and a secure distribution
method. As described below, PGP, SSH, SSL and IPSec are just some of the
cryptosystems that utilize this methodology for symmetrical cipher key
exchanges.

The second problem associated with secret key cryptography has typically
been worked around by choosing a cipher that is resistant to various forms
of direct analysis of the messages (ie differential or linear
cryptanalysis, not covered here), and good key usage schedules. Often a
session key is generated randomly, used only once, and discarded. This
means that an attacker would have to begin attacking a new key when a new
session is initiated. This is one kind of measure used by SSH, SSL and
related protocol suites.

--------[ Public Key Cryptography

Asymmetric, or public key, cryptography utilizes two components of a key
to allow for security. The key's components, a public half and a private
half, are mathematically related. The strength of the mechanism lies in
the difficulty of calculating the private key from the public components.
Bear in mind that one often overlooked facet of the security is the
privacy of the private key, which is sometimes not protected at all. The
public key is designed to be shared with the world.

Public key cryptography is slower than secret key (symmetric)
cryptography, and is mainly used for the exchange of symmetrical
cryptography keys, authentication and verification.

Popular public key algorithms include RSA, Diffie-Hellman, and El Gamal.
Elliptic curve cryptography is also a public key cryptography scheme,
based on different mathematics which are also difficult to perform in the
reverse direction (such as deriving a private key from the public
component).

Because the two halves of the key are related, public key cryptography
provides an excellent mechanism for authentication. By using the sender's
private key to encrypt, and the sender's public key to decrypt, the
authenticity of the sender can be verified. This is the basis of a
significant number of cryptographic mechanisms, like PGP, SSH, and PKIs.

The structure of a message 'M' sent between a sender and a recipient,
using the public and private keys of the recipient as described above
would be:

        sender                  recipient
        -------                 -------
        K (M)=C                 K (C)=M
         U                       R
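
As a concrete (and deliberately tiny) instance of this structure, the
following Python sketch uses textbook RSA. The primes, the exponent and
the function names are assumptions chosen for readability. Real keys are
hundreds of digits long, and real systems pad the message before
encrypting it.

```python
# Textbook RSA with tiny primes. ASSUMPTION: these numbers are only
# for illustration; real keys are far larger and real systems add
# padding to the message first.
p, q = 61, 53
n = p * q                    # public modulus, 3233
phi = (p - 1) * (q - 1)      # 3120, kept secret
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (needs Python 3.8+)

def k_u(m):
    """K_U(M): encrypt with the recipient's public key (e, n)."""
    return pow(m, e, n)

def k_r(c):
    """K_R(C): decrypt with the recipient's private key (d, n)."""
    return pow(c, d, n)

M = 65                       # a message encoded as a number below n
C = k_u(M)                   # sender
recovered = k_r(C)           # recipient
print(C, recovered)
```

Anyone may know e and n, but recovering d requires factoring n into p and
q, which is trivial here and infeasible at realistic key sizes.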

Because of the computational complexity involved in both encryption and
decryption using public key cryptography, it is often used for small
messages only or for authentication purposes. By encrypting a 'nonce', or
a one time used phrase, using public key cryptography, the authenticity of
the sender can be established. Some suites, like SSH and SSL, use public
key cryptography to encrypt the session key, which is used for a secret
key cipher.

-----------------------------------[ Hash Functions

The third major component of a cryptography system is a hash function,
often called a 'message digest'. This is, in effect, a fingerprint of the
input. Hash functions take variable length inputs and, through a
mathematical transformation, produce a fixed length output. These
functions are one way, which is to say they cannot be reversed. Only a
re-computation of the hash output can be used for verification. This has
the effect of being useful to detect changes in the source data of the
function.
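
A short Python sketch shows both properties: the output length does not
depend on the input length, and a one character change produces an
unrelated digest. SHA-256 from the standard hashlib module is my choice
here for convenience; any of the functions discussed in this article
would behave the same way in this respect.

```python
import hashlib

def digest(data):
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()

short = digest(b"abc")
long_ = digest(b"abc" * 100000)
changed = digest(b"abd")          # one character differs

print(short)
print(changed)
print(len(short), len(long_))     # same length for any input size
```

Verification is always by recomputation: the recipient hashes the data
they hold and compares the result with the digest they were given.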

When two different messages, M and M', produce the same message digest
output, H, in the same function, it is called a 'collision'. Because the
output has a fixed length, collisions must exist. The goal is that finding
one should require on the order of 2^(n/2) attempts, where n is the size
of the output in bits (ie for MD5, which has a 128 bit output, this value
would be 2^64), though this goal is not always met. Depending on the
mathematics, some functions are more resistant than others. Note that for
most uses, popular functions like MD5 and SHA-1 suffice.
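
Collisions are easy to see once the output is made artificially small.
The sketch below truncates SHA-256 to 16 bits (my own toy construction,
not a real hash function) and searches for two messages with the same
output. Among randomly chosen messages a collision is expected after
roughly 2^(n/2) of them, the so called birthday effect, so for 16 bits
a few hundred tries usually suffice.

```python
import hashlib

def tiny_hash(data, bits=16):
    """SHA-256 cut down to `bits` bits, so collisions are easy to find."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits=16):
    """Hash the messages b'0', b'1', ... until two outputs match."""
    seen = {}                         # hash output -> message
    i = 0
    while True:
        msg = str(i).encode()
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg
        i += 1

m1, m2, tries = find_collision()
print(m1, m2, tries)
```

The same search against a full 128 or 160 bit output is what a strong
hash function is meant to make impractical.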

Popular hash functions include the message digest functions MD4 and MD5,
SHA-1 and even the CRC check. SHA-1 is considered to be the strongest one
listed, though MD5 suffices for most uses. The CRC function can be
trivially worked around, and MD4 has had some problems in its history.

However there is one problem in hash functions as their use is described
above. If an attacker wishes to circumvent the hash checking, all they
would have to do is to recalculate the hash output on their injected data,
and re-append that to the stream which they are altering. The target
system would be unable to detect the change, as the hash would be computed
properly for the altered data. One variant of the hash function gets
around this, and is called a keyed hash. These functions, often called
'HMAC' functions, mix a secret key into the hash calculation along with
the input. This key is another shared secret in the cryptosystem. Since
the attacker does not know the key, they cannot compute a valid value for
their altered data. When the receiving end calculates the HMAC, a
difference is noted and the alteration of data is detected. When used in
this manner, MD5 becomes HMAC-MD5, and so on.
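
The difference between a plain hash and a keyed hash can be sketched with
Python's standard hmac module. The key, the messages and the helper names
are hypothetical, and HMAC-SHA256 is used here in place of HMAC-MD5 simply
because it is the more common choice today.

```python
import hashlib
import hmac

KEY = b"secret shared by A and B"          # hypothetical key

def plain_tag(message):
    """Ordinary hash: anyone can compute it."""
    return hashlib.sha256(message).digest()

def keyed_tag(key, message):
    """Keyed hash: only holders of `key` can compute it."""
    return hmac.new(key, message, hashlib.sha256).digest()

def accept_plain(message, tag):
    return hmac.compare_digest(plain_tag(message), tag)

def accept_keyed(message, tag, key=KEY):
    return hmac.compare_digest(keyed_tag(key, message), tag)

genuine = b"pay 10 dollars to C"
forged = b"pay 9999 dollars to C"

# The attacker recomputes the plain hash over the forged data...
print(accept_plain(forged, plain_tag(forged)))                    # True
# ...but can only guess at the key needed for the keyed hash.
print(accept_keyed(forged, keyed_tag(b"attacker guess", forged))) # False
# A genuine message from A still verifies.
print(accept_keyed(genuine, keyed_tag(KEY, genuine)))             # True
```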

Hashes are typically appended to the data as a signature. A typical
message 'M' structure with a hash of function 'H' may look like this:

        sender          recipient
        -------         ------------------------------------------
        M||H(M)         calculates H(M) on received M and compares
                        it to the appended H(M).  

--------------------------[ One Time Pads

This is an extremely secure cryptosystem, but in practical usage it's
almost intractable to implement. The strength of this system comes from
the fact that no two 'keys' are ever the same.

In brief, a message is combined (typically with a bitwise XOR) with a pad
of random data that is as long as the message itself. The ciphertext is
then combined with the same 'key' on the receiving end and thus made
readable by the recipient. This key is never used again, leading to the
same input generating different output every time it is sent.
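
A minimal Python sketch of the idea follows. The combining step shown is
a bitwise XOR, which is the usual choice. The message and the names are
hypothetical.

```python
import secrets

def xor_bytes(data, pad):
    """Combine each byte of `data` with the matching byte of `pad`."""
    if len(pad) != len(data):
        raise ValueError("pad must be exactly as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))

message = b"MEET AT NOON"
pad = secrets.token_bytes(len(message))   # random, used once, then destroyed

ciphertext = xor_bytes(message, pad)      # sender
recovered = xor_bytes(ciphertext, pad)    # recipient, same pad
print(ciphertext.hex(), recovered)
```

The practical difficulty is visible in the code: both sides need an
identical copy of a pad as long as all the traffic they will ever
exchange, delivered securely in advance.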

One example of what happens when a pad is reused was illustrated in the
VENONA project, an NSA project that analyzed a significant number of USSR
transmissions. Because some one time pads were reused, repeats in the data
became apparent and some messages were readable.
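
Why reuse is fatal can be shown in a few lines of Python. This is only an
illustration of the underlying arithmetic with a made up pad and
messages, not a reconstruction of the actual VENONA analysis.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

pad = bytes.fromhex("a3f1097cd2445be0")      # hypothetical pad, reused
p1 = b"ATTACK!!"
p2 = b"RETREAT!"

c1 = xor_bytes(p1, pad)
c2 = xor_bytes(p2, pad)

# The eavesdropper never sees the pad, only c1 and c2.
leak = xor_bytes(c1, c2)
print(leak == xor_bytes(p1, p2))             # True: the pad cancelled out

# Knowing (or guessing) one plaintext then reveals the other.
print(xor_bytes(leak, p1))                   # b'RETREAT!'
```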

Very few systems ever use one time pads.

-------------[ Other Important Facets of a Cryptosystem

The above information is the core of any cryptosystem, but without a
decent support infrastructure, all of this work could be wasted.

One of the most important features of any cryptosystem is a strong
pseudorandom number or bit generator. This is vital for the strength of
the session keys (see below for examples where session keys are randomly
generated).

If someone could predict with reasonable certainty the sequence of bits,
they would be able to predict what keys would be used and compromise the
system. Even if this is a simple matter of selecting what keys to try
first due to a demonstrated bias in bit generation, this can make a
significant impact on the efficiency of breaking the encryption key in a
brute force, exhaustive search. The security of the messages encoded with
this key, or of any communications sessions (ie an SSH session), would
thus be lost.

Two popular random number generators are Yarrow and Blum-Blum-Shub. Each
produces strongly random appearing numbers. It's important to stress that
these are algorithms, and as such do not produce truly random numbers,
only numbers that cannot be predicted with reasonable certainty without
knowing the seed of the generator.
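
To make the point about algorithms and seeds concrete, here is a toy
Blum-Blum-Shub generator in Python. The primes 11 and 23 and the seed 3
are assumptions picked so the numbers can be checked by hand. A real
generator uses very large secret primes.

```python
def blum_blum_shub(seed, count, p=11, q=23):
    """Yield `count` bits from x -> x*x mod (p*q), one bit per step.

    ASSUMPTION: p and q are toy values. They must both be primes
    congruent to 3 mod 4, and in practice are very large and secret.
    """
    m = p * q
    x = seed
    for _ in range(count):
        x = (x * x) % m
        yield x & 1               # keep only the lowest bit

bits_a = list(blum_blum_shub(seed=3, count=6))
bits_b = list(blum_blum_shub(seed=3, count=6))
print(bits_a)                     # [1, 1, 0, 0, 1, 0]
print(bits_a == bits_b)           # True: same seed, same "random" bits
```

Anyone who learns the seed can regenerate the entire stream, which is why
the seed must be guarded as carefully as a key.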

A variety of tests exist to certify a generator as a strong source of
pseudorandom numbers or bits, though none of them is perfect. When used
in conjunction, however, it is possible to raise the confidence in the
entropy of a bit stream.
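
One of the simplest such tests counts how often each bit value occurs. The
Python sketch below is a crude version of that idea. The tolerance and the
sample data are arbitrary assumptions, and the last example shows why a
single test is never enough.

```python
import os

def ones_fraction(data):
    """Fraction of bits set to 1 in `data`; near 0.5 for a good source."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones / (8 * len(data))

def looks_balanced(data, tolerance=0.01):
    """Very crude frequency check. ASSUMPTION: the tolerance is arbitrary."""
    return abs(ones_fraction(data) - 0.5) <= tolerance

biased = bytes([0b11101111]) * 100000      # a badly biased stream
patterned = bytes([0b01010101]) * 100000   # balanced but trivially predictable
sample = os.urandom(100000)                # the operating system's generator

print(ones_fraction(biased), looks_balanced(biased))
print(ones_fraction(patterned), looks_balanced(patterned))
print(ones_fraction(sample), looks_balanced(sample))
```

The patterned stream passes the frequency check even though every bit can
be predicted, so other tests (runs, correlations and so on) are needed
alongside it.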

Lastly, it's vital to keep the secret components of any cryptosystem
secret. This includes secret keys in symmetrical algorithms, the private
keys in a public key cryptosystem, or the seed to a random number
generator.

Often overlooked is the message itself. The weakest point to attack is at
the endpoints, when someone encrypts or decrypts the data to be
transmitted. For example, decrypting a message received by PGP and leaving
the decrypted message world readable defeats the whole purpose of using
encryption.

------------[ What Cryptography Provides

Having introduced the basics of cryptography, now we can examine what
advantages cryptography can provide.

In summary, cryptography can provide you with a combination of four major
security goals:

        o Authentication
          strong proof of the user's identity

        o Confidentiality
          the encryption renders the data nearly impossible for an
          eavesdropper to read

        o Integrity
          the alteration of data can be readily detected

        o Nonrepudiation
          undisputable proof of the origin of a message

All four of these advantages can be readily achieved by the simple
application of cryptography in the right places.

To illustrate the basics, we will use a simple email transmission as the
basis for our secure communications model. Note that it can be extended
into almost any direction, including a stream based connection.

Let's look at a typical email based communication system. A message, M,
can traverse 7 possible routes:

        1 It can pass unmolested by anyone from A to B
          This is the path we expect it to take, though that's not always
          a safe assumption to make.

        2 It can be observed by a third party, C
          In this scenario, an attacker, C, eavesdrops on a communication
          between A and B and records the data. This occurs without
          knowledge of either party A or B as it doesn't affect the
          message.

        3 C can forge a message to B, claiming to be A
          In this situation, C pretends to be A and sends a message to 
          B. Because it claims to be from A, B may take a certain action.

        4 C can intercept and modify the message 
          In this case, C receives the message by some means, modifies the
          message to alter the content, and returns it on its path to B.

        5 C can perform a replay attack of the A<->B session
          C can capture the traffic as it passes from A to B and not
          modify it. Then C can resend the stream to B, pretending to be
          A, hoping to fool B into performing some action again. 

        6 A can send a message to B and later deny it
          In this situation, C has no role. A sends a message to B which
          they may later want to deny, such as a financial transaction.

        7 C can block delivery of the message to B from A
          In this case, C actively prevents the completion of the 
          transmission from A to B.

The use of cryptography would help prevent the attacks in scenarios 2, 3,
4, 5 and 6. Scenario 7 is a denial of service attack. Normally such an
attack can be carried out without much skill, and may include the feeding
of bogus data to a system to choke it, or the simple act of cutting some
wires (data or power). Cryptography wouldn't directly help in either case.

So, how can encryption protect this message? Quite simply, really.

The second scenario, where an eavesdropper observes a message
transmission, can easily be thwarted by the use of cryptography. If A
and B encrypted their messages to each other, the observing party, C,
would be unable to decode the messages and understand them. All C would
see is a bunch of encrypted bytes. Note that if C is able to get the key
used and knows the algorithm in use, the game is up, so a good system,
like PGP, must be used.

In the third situation described above, where an attacker forges a message
to B, claiming to be from A, the use of cryptography could help B notice
that it is not a message from A. Recall that the properties of public key
cryptography allow for both the directed encryption of a message as well
as the signing of a message, depending on which key was used for
encryption. As such, if A always signs their messages using their private
key, and B verifies them using A's public key, when a message arrives from
C claiming to be from A, the disparity in the sender's identification will
be noticed. Similarly, if A and B are communicating and A wishes to deny
that they are the origin of a message, B can use the cryptographic
signature to prove that A is indeed the origin, and only A could be that
origin. This is the point of nonrepudiation.

In the fourth point, when the message is intercepted and modified, two
forms of cryptography can be used to protect against this. In the first,
if the data was encrypted, the attacker would first have to compromise the
security of the encryption to decode the message before altering it.
Secondly, by using a hash of the message at the end of the transmission,
the integrity of the message can be verified. When the recipient compares
the hash they compute with the hash sent with the message, a difference
will be noted. Again, with nonrepudiation, A can prove to B that they did
not send the message that B received. 

In the fifth situation, the replay attack, A and B communicate over a
channel that C observes and captures. This could include a mail message
for, say, a financial transaction (such as "Please buy 1000 shares of the
WidgetCo preferred stock"), or an authentication stream. C can then
pretend to be A at a later time and fool B by replaying the stream from A,
hoping to bypass security measures with A's credentials. Cryptography can
protect against this through several means, including one time use
challenge-response mechanisms and hashed values which include the
timestamp information. The challenge-response mechanism is one of the most
popular, but relies on a very large and unpredictable pool of challenges
to be presented. 
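
A challenge-response exchange of this kind might be sketched in Python as
follows. The class, the key and the use of HMAC-SHA256 for the response
are all my own assumptions, chosen only to show why a captured answer is
worthless the second time.

```python
import hashlib
import hmac
import secrets

KEY = b"secret shared by A and B"       # hypothetical shared secret

class Verifier:
    """B's side: issue one time challenges and check the responses."""

    def __init__(self, key):
        self.key = key
        self.outstanding = set()

    def challenge(self):
        nonce = secrets.token_bytes(16)  # large, unpredictable pool
        self.outstanding.add(nonce)
        return nonce

    def check(self, nonce, response):
        if nonce not in self.outstanding:
            return False                 # unknown or already used
        self.outstanding.discard(nonce)  # each challenge works only once
        expected = hmac.new(self.key, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)

def respond(key, nonce):
    """A's side: prove knowledge of the key without sending it."""
    return hmac.new(key, nonce, hashlib.sha256).digest()

b = Verifier(KEY)
nonce = b.challenge()
answer = respond(KEY, nonce)
first = b.check(nonce, answer)           # genuine exchange
replayed = b.check(nonce, answer)        # C replays the captured answer
print(first, replayed)                   # True False
```

If the challenges were drawn from a small or predictable pool, C could
simply wait for a repeat, which is why the size and randomness of the
pool matter.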

In the sixth example, where A later denies a message earlier sent by them
to B, cryptography can be used to establish the identity of the origin of
the message. If A uses a cryptographic signature on their mail, or their
mail software issues one, B can use this to establish that A was the only
possible origin of the message. This is called 'nonrepudiation', where the
receiving party can confirm the identity of the origin of a message. Of
course, if A plans to later deny the message's origin, they may attempt to
avoid the use of a cryptographic signature in the first place. Situations
where this could be a problem should force the use of cryptographic
signatures to ensure this kind of protection.

In summary, cryptography is not a perfect solution. A dedicated attacker
can utilize their computational power, exploit weaknesses in the
algorithms or implementations you have chosen, or by some other means
obtain the data which you are trying to protect using encryption. The use
of cryptography only makes this attack more difficult.

Furthermore, it is important to recognize the value of the data when
compared to the security applied to it. Obviously a small email to a
friend to schedule a lunch date (where nothing sensitive would be
discussed) doesn't need to be encrypted. However, company data, personnel
information, administrative or even account passwords, or government
secrets obviously would draw attention from an adversary and need to be
protected from their eyes. These kinds of transmissions should utilize
some form of protection, which encryption may be of value in providing.

----------[ Crypto in the Real World

Having introduced the basics of encryption and cryptosystems, and outlined
their advantages, we will illustrate how three cryptosystems utilize these
facets of cryptography to achieve security.

--------------------< PGP

Pretty Good Privacy, or PGP, is a standards based email security solution.
It utilizes public key cryptography, together with symmetric cryptography
and hashing functions, to provide secure email at the message level. This
leads to privacy, authentication, and integrity of the email messages.

PGP users generate their asymmetric cryptography keys and make available
their public key for anyone to use. This key is then added to the sender's
key ring and used when a message is composed to that recipient. This key
ring is a collection of public keys of the people with whom you
communicate. By storing them on a ring, you keep them in a safe place that
is also convenient to access, rather than having to redownload them for
each message you wish to send.  Alternatively, when viewing a message
signed with a private key, this public key is used for decryption. PGP
messages are combinations of public key encrypted material, private key
encrypted material, and hash function output.

When an encrypted email is generated to a recipient, PGP generates a
random session key for a symmetrical algorithm (usually the IDEA cipher)
and encrypts the message using this key. The key is then encrypted using
the recipient's public key and prepended to the message. A hash is
generated on this encrypted message and appended, providing a basic
integrity check. The recipient then decrypts the session key using their
private key, decodes the message using the session key, and PGP verifies
it arrived intact by computing a hash value for the message.
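
The encryption steps just described can be sketched end to end in Python.
Everything here is a stand-in: the key pair is textbook RSA built from
two Mersenne primes, the symmetric cipher is a simple SHA-256 based
keystream rather than IDEA, and the function names are hypothetical. It
is meant to show the order of operations, not PGP's actual formats.

```python
import hashlib
import secrets

# Toy RSA key pair for the recipient. ASSUMPTION: two Mersenne primes
# are used only because they are easy to write down; real keys use
# large random primes and padded messages.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))        # private exponent, Python 3.8+

def keystream_xor(key, data):
    """Stand-in symmetric cipher: XOR with a SHA-256 based keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def pgp_like_encrypt(message):
    session_key = secrets.token_bytes(16)          # random, per message
    body = keystream_xor(session_key, message)     # E_K(M)
    wrapped = pow(int.from_bytes(session_key, "big"), E, N)  # K_U(K)
    check = hashlib.sha256(body).digest()          # hash of encrypted body
    return wrapped, body, check

def pgp_like_decrypt(wrapped, body, check):
    if hashlib.sha256(body).digest() != check:
        raise ValueError("message was altered in transit")
    session_key = pow(wrapped, D, N).to_bytes(16, "big")     # K_R(...)
    return keystream_xor(session_key, body)

packet = pgp_like_encrypt(b"Lunch at noon?")
print(pgp_like_decrypt(*packet))
```

Only the short session key goes through the slow public key operation.
The message itself, however long, is handled by the fast symmetric
cipher.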

When messages are signed, a cryptographically strong hash (such as MD5) of
the message is generated and then encrypted using the sender's private
key. This is then appended to the message. When a recipient wants to
verify the source of the message, they can use the public key of the
sender to decode the hash and compare the output to the hash they compute
themselves. When they match, the message is known to have originated from
the indicated source.
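
Signing and verification can be sketched the same way. Again this is
textbook RSA with toy parameters and SHA-256 standing in for the hash.
The names are hypothetical and real signatures use a padding scheme that
is omitted here.

```python
import hashlib

# Toy RSA key pair for the sender. ASSUMPTION: illustration only; real
# keys are much larger and signatures use a padding scheme.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                                 # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))         # private exponent, Python 3.8+

def message_hash(message):
    """Hash the message and reduce it to a number below N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message):
    """Sender: encrypt the hash with the private key."""
    return pow(message_hash(message), D, N)

def verify(message, signature):
    """Recipient: decode with the public key, compare to computed hash."""
    return pow(signature, E, N) == message_hash(message)

msg = b"Please buy 1000 shares of WidgetCo"
sig = sign(msg)
print(verify(msg, sig))                                   # True
print(verify(b"Please buy 9000 shares of WidgetCo", sig)) # False
```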

Signing and encryption can be combined, of course, providing for even
stronger security. 

Keys are verified through a web of trust. By certifying the veracity of
the key's owner, signers extend trust. 

-------------------------------------< SSH

The secure shell, or SSH, protocol operates much like PGP but on a stream
of data. It provides for similar kinds of protection that PGP offers,
including authentication, integrity and confidentiality. Account
information, including passwords and session data, is protected using
encryption.

Upon a connection initiation, the client and server exchange public keys.
This public key is then used to encrypt a session key, used with a
symmetrical cipher. The client sends this key to the server and
communications are then done using this symmetric algorithm key. This
provides for authentication of the server (if the server's public key is
known beforehand), and privacy, as well as integrity of the session. This
leads to protections against passive attacks, like password sniffing, and
active attacks, like session hijacking.

Stronger authentication can be achieved as well, using public key
cryptography on the part of the client. By presenting data that could only
have been encrypted using the private key, and the server knowing the
public key already for that user, the client can be authenticated, as
well.

Hashing functions are also used to provide integrity checking of the data
in the stream. Upon detection of an invalid sum of the data, the
connection is dropped to prevent any forged data from entering the system.

One additional feature of SSH not intuitively obvious is the defense
against a replay attack. This helps keep an attacker from using an already
used SSH stream to gain entry back into a network, for example after
watching a legitimate session occur.
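
One common way to get both the integrity check and the replay defense is
to include a running packet counter in a keyed hash over each packet. The
Python sketch below illustrates that general idea only. It is not the SSH
packet format, and the class, key and payload are hypothetical.

```python
import hashlib
import hmac

class Channel:
    """One direction of a stream protected by a per packet MAC."""

    def __init__(self, key):
        self.key = key
        self.seq = 0                      # packets handled so far

    def _mac(self, payload):
        data = self.seq.to_bytes(8, "big") + payload
        return hmac.new(self.key, data, hashlib.sha256).digest()

    def seal(self, payload):
        packet = (payload, self._mac(payload))
        self.seq += 1
        return packet

    def open(self, packet):
        payload, tag = packet
        if not hmac.compare_digest(tag, self._mac(payload)):
            raise ConnectionError("bad MAC, dropping connection")
        self.seq += 1
        return payload

key = b"hypothetical session key"
sender, receiver = Channel(key), Channel(key)

captured = sender.seal(b"rm -rf /tmp/scratch")
print(receiver.open(captured))            # accepted the first time
try:
    receiver.open(captured)               # replayed by an attacker
except ConnectionError as err:
    print(err)
```

The replayed packet carries a tag computed for counter value 0, while the
receiver now expects counter value 1, so the check fails even though the
attacker copied the packet bit for bit.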

--------------< IPSec

IPSec is a set of extensions to the IPv4 protocol, and is standard in
IPv6, which allows for network layer security. By providing this kind of
facility at the network layer, applications can utilize the IPSec features
without any modification. The underlying system, such as the kernel or
various network devices, is the only part that needs to know about IPSec.

By utilizing all three cryptography components described above, public key
cryptography, secret key cryptosystems and hash functions, IPSec provides
for authentication, integrity, confidentiality, and replay protection.
These can be combined to produce a strong network layer. Two new protocols
are specified, numbered 50 and 51, for ESP (or Encapsulating Security
Payload) and AH (Authentication Header), respectively.

In its basic form, transport mode, IPSec inserts the AH header
information between the IP header and the payload of a packet. This is
then stripped off when the other endpoint is reached. By verifying that
the packet originated with the source it claims to have, strong
authentication and integrity checking at the packet level are achieved.
Note that the AH protocol provides no protection for the confidentiality
of the data.

In tunnel mode, the entire original packet is encapsulated inside a new IP
packet. When the ESP header is used, alone or together with AH, this
provides for both authentication and confidentiality, as well as integrity
checking. The entire original packet is encrypted, providing for strong
protection of the data and the stream.

IPSec deployments usually either use IPSec gateways on a network to
transparently provide network level encryption to the client machines, or
install IPSec software on the individual nodes. Routing tables determine
if traffic is to be encapsulated within the IPSec stream. Because it uses
encapsulation, any kind of IP packet can be protected by IPSec, including
any TCP stream, UDP communication, or even ICMP. All of this occurs
transparently to the upper layer applications and without any additional
configuration once the IPSec routing is set up.
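
The decision a gateway makes for each outgoing packet can be pictured
with a small Python sketch. The policy table, network ranges and gateway
names below are entirely hypothetical. Real implementations keep this
information in the kernel's routing and security policy tables.

```python
import ipaddress

# HYPOTHETICAL policy table: destination network -> IPSec gateway.
POLICY = {
    ipaddress.ip_network("10.2.0.0/16"): "gw-branch.example.com",
    ipaddress.ip_network("192.168.50.0/24"): "gw-lab.example.com",
}

def select_tunnel(destination):
    """Return the gateway to encapsulate toward, or None to send as is."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in POLICY if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # most specific
    return POLICY[best]

for dst in ("10.2.7.9", "192.168.50.20", "198.51.100.4"):
    print(dst, "->", select_tunnel(dst) or "sent in the clear")
```

The application that generated the packet plays no part in this choice,
which is what makes the protection transparent to it.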

The encryption parameters are specified either automatically when a new
IPSec connection is negotiated or manually. They include the encryption
and hashing algorithms, plus keys to protect the data. The Oakley key
exchange protocol operates very similarly to the SSH connection setup,
utilizing public key cryptography for mutual authentication, and then the
exchange of symmetrical cryptography keys used in the encryption and
decryption of packets.

--------------[ Getting and Using Cryptography

Having explained the benefits and uses of cryptography, you may wish to
introduce a more extensive use of encryption in your applications or your
networking. This can be easily accomplished using a few tools.

One of the most extensive cryptographic toolkits, and one that is widely
regarded with great esteem, is the OpenSSL toolkit. It provides not only
SSL encryption features (the Secure Sockets Layer, ie HTTPS), but also a
wide variety of encryption algorithms and hash functions. These are
available on the command line, interactively, and via API calls to
libcrypto.

The major problem with OpenSSL is that it operates only at the application
layer, in userspace. While networking applications can take advantage of
OpenSSL's encryption routines, they have to be aware of these calls and be
built around them. At the networking layer, where the applications do not
have to be aware of the encryption mechanisms, you have to go to the
kernel. To do this, IPSec functionality would have to be integrated into
the kernel. On Linux, the FreeS/WAN effort has made great strides to
getting IPSec functions integrated into Linux. In BSD kernels, the KAME
project has been especially popular and effective, providing not only
IPSec but also IPv6.

Please see the resources section of this article for more information.

----------------[ Limitations of Cryptography

It's important to note that encryption will not stop a good number of
security attacks, and that it's only a piece of a larger security suite.
For instance, buffer overflow attacks, the popular format string attack,
and most denial of service attacks cannot be thwarted by encryption.

One tempting solution is to require strong authentication for every
service, including ones open to the world. This isn't realistic, though,
as you may not need that kind of authentication from people. After all, it
is an open resource for everyone. Secondly, most denial of service attacks
don't require any form of authentication, thus defeating the whole purpose
of using cryptography as an authentication mechanism.

It's also vital to address the often repeated claim that, "with
encryption, an attacker will be unable to sniff your data from the wire."
That's not true. In fact, encrypted data can still be sniffed from the
network. The difference is that the attacker will then have to decode it
to make use of the data, if they are after the payload of the data. Due to
weaknesses in certain algorithms or their implementations, an attacker
could, in a timely fashion, decode the data and make use of the stolen
information.

If they are interested solely in traffic analysis, ie two companies
talking more frequently, which may signal a business action between them,
the standard use of cryptography will not be of assistance in thwarting
that type of attack. While encryption can be integrated into a system to
thwart traffic analysis, this is beyond the scope of this article.

-------------[ Conclusions

This has been a brief tour of the principles of cryptography and their
application to the protection of data. By applying cryptography in the
right areas, data and access can be protected and controlled. However,
crypto is not a perfect solution; it is only one component of a complex
security policy.

-----------------------------[ Resources

As mentioned above, here are some cryptographic toolkits:

OpenSSL:        http://www.openssl.org/
KAME:           http://www.kame.net/
FreeS/WAN:      http://www.freeswan.org/

To learn more about cryptography, check out this reading list:

Applied Cryptography: considered a great, useful handbook on the subject.
http://www.amazon.com/exec/obidos/ASIN/0471117099/ref=lm_lb_6/107-8383142-9279713

Cryptography & Network Security: a great introduction to ciphers, their
applications (ie SSL, Kerberos), and related topics.
http://www.amazon.com/exec/obidos/ASIN/0138690170/ref=lm_lb_7/107-8383142-9279713

PGP : Pretty Good Privacy: though dated, a thorough coverage of the
subject.
http://www.amazon.com/exec/obidos/ASIN/1565920988/ref=lm_lb_5/107-8383142-9279713

Secure Shell : The Definitive Guide: a wonderful tome on this great tool.
http://www.amazon.com/exec/obidos/ASIN/0596000111/ref=lm_lb_3/107-8383142-9279713

IPSec: The New Security Standard: a good intro to IPSec and its
complexities.
http://www.amazon.com/exec/obidos/ASIN/0130118982/ref=lm_lb_1/107-8383142-9279713

A brief introduction to differential and linear cryptanalysis (which is
quite difficult and math intensive):
http://www-computerlabor.math.uni-kiel.de/~fjacobs/dlcrypta/dlcrypta.html