May 25, 2011, 5:45 a.m.
posted by gelassen
Just when you thought you had keys all figured out, here I go and throw another one in the works. Where the previous section focused on symmetric keys and algorithms, this section takes a look at asymmetric keys and algorithms.
One of the biggest problems with symmetric key algorithms is that it is difficult to both share the key and protect it. Because you are using the same key to encrypt and decrypt the data, there’s no way around the fact that you have to share the key. Like the key to your front door, it’s possible for someone to steal the key or copy it.
Asymmetric keys take care of the problem of distributing keys by making two separate keys that are mathematically connected. You use a “private” key that you never reveal to anyone to decrypt the data you’ve received and the recipient uses their corresponding “public” key that everyone can have to encrypt the data. Actually, that explanation is a bit simplistic, but you get the idea. A similar idea is a bank safety deposit box. The bank has one key that they keep private, and they have a public key that they give to you. When you go to unlock your box to put your valuables in it, you need both the bank’s key and your key in the locks at the same time to be able to open the door to the box.
Asymmetric keys use prime numbers as their starting point. The first part of the process is to create a very large number by multiplying two very long prime numbers together. Prime numbers, if you remember your high school algebra, are numbers that can be divided only by the number 1 or the number itself. One example of a prime number is the number 7. No matter how hard you try, 7 can be divided only by 1 or 7. Now let’s say that you multiply two prime numbers together. If you multiply 5 x 7, you get 35. It’s not too hard to work backwards from the number 35 to figure out which two prime numbers were used to create it. But, if you used a number that was over 100 numerals long, how hard do you think it would be to find the two numbers that were used to create it? Go ahead and get your paper and pencil. I’ll wait.
You probably came up with the same answer I did: It takes an awfully long time! Even if you go through all this trouble, you have only one number. You still have a long way to go. From that long number, you derive a portion of it (via a mathematical computation) for the private key, and from the private key you derive a portion of it for the public key. The elegance in this process is that you can’t reverse engineer the public key and obtain the private key.
Although this private/public key concept is considered strong, you need at least a –2,304-bit key to achieve the same level of security of a 128-bit symmetric algorithm. Asymmetric algorithms are slow in processing and it is impractical to use them to encrypt large amounts of data. Symmetric algorithms can be approximately 1,000 times faster than asymmetric ones. Therefore, you usually see asymmetric algorithms used to protect small amounts of data such as e-mail messages and - small data files such as attachments to messages.
This is probably the most recognizable asymmetric algorithm, due in part to the very large corporation that stands behind it — RSA Data Security. RSA comes from the last names of the inventors, Ron Rivest, Adi Shamir, and Leonard Adleman, who created the algorithm in 1978. To date, it is the only asymmetric algorithm in widespread general use that is used for private/public key generation and encryption. Two other algorithms call ElGamel and Rabin also generate two keys and encrypt, but you don’t see them used as often as RSA. Because most of the other asymmetric algorithms generate only two keys, many programs use a combination of asymmetric algorithms and symmetric algorithms to protect data.
RSA uses prime numbers to create each of the keys (private & public), but using those keys to encrypt a large amount of data is impractical due to the amount of time it takes a computer to process the encryption. More often than not, an encryption program that uses RSA encrypts the data with a symmetric algorithm such as RC4 (or DES, or IDEA, and so on). Then the symmetric key created by RC4 is encrypted with the recipient’s public key. When the recipient gets the message, she uses her private key to decrypt the RC4 key, and when the RC4 key is decrypted, the bulk of the message can be decrypted.
This algorithm is commonly known as DH, which represents the last names of two of the inventors. However, if you were to meet Whit Diffie or Martin Hellman, they would be sure to point out that they couldn’t have done what they did without the work of Ralph Merkle. It was only bad luck that Diffie and Hellman’s paper appeared in print before Merkle’s did, even though they all arrived at the same idea about the same time. So, out of respect for the authors, please remember Merkle.
There is one huge difference between RSA and DH in that the DH algorithm is not used for encryption. Huh? That’s right — DH is not an encryption algorithm; rather, it is a key exchange algorithm. Diffie, Hellman, and Merkle were more concerned with the problem of sharing a key over an insecure channel than they were about the encryption of data, so they came up with a solution that created a way to share a secret.
Here’s how it works:
Natasha has a DH key pair consisting of a private key that she keeps to herself and a public key that she sends to Boris.
Boris receives Natasha’s public key and uses the DH algorithm to create a temporary private key and a temporary public key for himself. (Note that Boris’s keys have something in common with Natasha’s public key.)
Boris now takes his newly created private key and Natasha’s public key and has the DH algorithm generate a secret number.
Boris uses the secret number (instead of a RNG or a PRNG) to generate a key just for this transaction. This is called a session key.
Boris uses the session key to encrypt the data and sends it to Natasha along with his temporary public key.
When Natasha receives the encrypted message she can derive the session key because her keys and Boris’s keys have the same derivative — her public key.
With the session key pulled out of the message, Natasha can now decrypt the message.
Please note that when I say, “Natasha does this,” and “Boris does that,” I really mean that they are using their encryption program to carry out these actions. Natasha’s and Boris’s only real actions are to respond to any dialog boxes that pop up during the process. Of course the encryption program has the DH key exchange algorithm included in it, or none of this would be possible.
Let me say this from the start: PGP (Pretty Good Privacy) is not an encryption algorithm, although many people tend to think of it that way. PGP is actually an encryption program that uses both symmetric and asymmetric algorithms to encrypt data. It is most often used for e-mail because it has some very nice e-mail program plug-ins, but it can also be used for disk encryption and to securely erase data from a disk.
PGP is a hybrid cryptosystem. When a user encrypts plaintext with PGP, PGP first compresses the plaintext. Then it creates a session key, which is a one-time-only secret key. This session key encrypts the data. When the data is encrypted, the session key is then encrypted to the recipient’s public key. This public key-encrypted session key is transmitted along with the ciphertext to the recipient.
I cover PGP in more detail in Chapter 8, where you create your private and public keys, encrypt a message, and send it on to a friend who will then decrypt the message.
Elliptical Curve Cryptography, also known as ECC, is a very different form of asymmetric cryptography. It’s not used very much but when it is, it tends to be used to encrypt large amounts of data. That’s because ECC computes very quickly, and it doesn’t tie up a lot of processing time. ECC is also newer and not as well understood as some of the other encryption algorithms. For that reason, fewer researches have spent very much time trying to attack them to find their weaknesses. I am mentioning them here because I’ve begun to see secure data storage programs use ECC to encrypt the huge amounts of data that sits in large server farms.
ECC starts with a curve drawn on a graph. Remember your high school math projects where you had to draw a curve and then plot its x and y axes? Well, ECC starts out the same way. After the curves have been created, lines are drawn to intersect with the curves. Have a look at Figure to see what I mean.
The intersecting points on the curve are given numbers. Of course they are plotted numbers, so there are two: the x-coordinate and the y-coordinate. Going back to school again, Figure shows an example of the plotted points, without the curves and intersecting lines marked:
Now, add a bunch more dots to the graph as the seed, give them all numbers, and start adding and adding and adding all the numbers. The trick to ECC is that you have to know the shape of the original curve, the points where the lines intersect, and which point is the starting point for the addition to begin. From all of that you can create a large number and then compute a Diffie-Hellman key pair. ECC is simple enough to create, but it seems very difficult to attack.