Public Key Encryption

24feb12 revised 8mar16 typos 12feb18
\begin{document} \maketitle \section{Introduction} Here we continue and conclude our lesson on modular arithmetic, culminating in a proof of the RSA public key encryption algorithm. We also take the opportunity to review the mathematics you learned in the past two weeks. But because rote repetition is boring, each of the three chief topics are reviewed by considering a different proof, sharpening the result, and finding an extension of the theorem respectively. Sections 2 and 3 are elaborations not strictly necessary to understand how the RSA public key encryption works. Section 2 takes another look at the Prime Factorization Theorem, and how it was well known before the rigorous proof by strong induction we studied was developed. Section 3 deals with a more explicit algorithm, suitable for a computer, to solve this the Diophantine equations, $ex - my=1$, given $1 < e < m $ and $gcd(e,m)=1$ , so that $1 < x < m$ and $y > 1 $. From what we have learned from the form of all solutions to a Diophantine equation we can deduce that such a solution can always be found. But, one wants more than an existence proof when dealing with a computer. However, once we accept this and Fermat's Little Theorem (FLT) treated in the previous lesson (which we quote using the current notation for the convenience of the reader) we can tackle the RSA concept directly. For this purpose, you can skip directly to Section 4. \section{Prime Factorization and the Sieve of Eratosthenes} The Greeks certainly know the prime decomposition theorem but they did not know (or care about about) the principle of strong induction. They used this argument dure to Eratosthenes of Cyrene, 276-195 BC. He was a mathematician, poet, athlete, geographer, astronomer and music theorist. He measured the circumference of the earth, the tilt of the earth's axis, the distance to the sun, and he invented the leapyear. Note that no educated person (until the Dark Ages) ever doubted that the earth was a sphere! Eratosthenes also enumerated the primes in the following ingenious way. His \textit{ Sieve} may be remembered by this ditty from Pohl and Kornbluth's \textit{ Drunkard's Walk} \begin{quote} Strike the twos, and strike the threes; The Sieve of Eratosthenes. And when the multiples sublime, The numbers that are left are prime. \end{quote} The word "sublime" is used here to mean "evaporate" or go away. The concept is this. Imagine all numbers in a row, $ 2,3,4,.....$. We start with the prime $2$ because $1$ is neither a prime nor a composite. Now "strike" every multiple of two. We shall also remember that it is a multiple of two, maybe by writing a $2$ under it. For typographical reasons, I won't strike the composites but overline them, I will boldface the primes already found. \begin{itemize} \item $ \mathbf{2}, 3, \bar{4}_2, 5, \bar{6}_2, 7, \bar{8}_2, 9, \bar{10}_2, 11, \bar{12}_2, ... $ \item $ \mathbf{2}, \mathbf{3}, \bar{4}_2, 5, \bar{6}_{23}, 7, \bar{8}_2, 9_3, \bar{10}_2, 11, \bar{12}_{23} ... $ \item $ \mathbf{2}, \mathbf{3}, \bar{4}_2, \mathbf{5}, \bar{6}_{23}, 7, \bar{8}_2, 9_3, \bar{10}_{25}, 11, \bar{12}_{23} ... $ \item $ \mathbf{2}, \mathbf{3}, \bar{4}_2, \mathbf{5}, \bar{6}_{23}, \mathbf{7}, \bar{8}_2, 9_3, \bar{10}_{25}, 11, \bar{12}_{23} ... $ \item ... \end{itemize} As you see, the \textit{ next prime} is always the first number not yet struck out. Also, by remembering the prime that struck, we get the prime factors. But we don't get the repeats. I suppose we could ammend the Sieve by insisting on finding how many time $2|4$, and $2|8$ etc. So the Sieve not only finds the primes but also the prime decomposition of the composite numbers. \section{Euclid's Algorithm and back substitution} This time we specify that $a > b > 1$ before we apply long division. \subsection{Reduction to solving ax-by=1 for coprime coefficients} Let start over by writing the Diophantine equation we want to solve as a difference. \begin{eqnarray*} ax - by &=& c \mbox{ where } (a,b)|c \mbox{ of course }\\ \frac{a}{d}x - \frac{b}{d}y &=& \frac{c}{d} \mbox{ where } d = (a,b)\\ a'x - b'y &=& c' \mbox{ where } (a,b)=1, \mbox{ this is solved by solving }\\ a'x' - b'y' &=& 1 \mbox{ and multiplying } x=cx', y= cy' \\ \end{eqnarray*} \subsection{Recursive long division with back substitution} \begin{eqnarray*} \mbox{For } a>b>1 & solve & ax - by = 1 \mbox{ for positive } x, y:\\ a= bq + r&\mbox{ substitute }& (bq+r)x - by = 1\\ &\mbox{ rearrange } & -b(y-qx) + rx = 1 \\ &\mbox{ relabel } & -bu + rv = 1 \\ &\mbox{ where } & u=y-qx \\ &\mbox{ and } & v =x \\ &\mbox{ so } & x = v \\ &\mbox{ and } & y = u + qv \\ \end{eqnarray*} Let's check that all the hypotheses are satisfied for $b,r$ replacing $a,b$. \begin{itemize} \item We have that $b > r$ like $ a > b$ \item Iff $r=1$ then Euclid's algorithm ends with $1=gcd(a,b)$. \item If $b > r > 1$ we are tempted to just "do it again", but.... \item We must also check the last hypothesis, that $(b,r)=1 \item $gcd(b,r)=gcd(b,a-bq) = gcd(b,a) = 1 $ proved earlier. \end{itemize} We have reduced the problem of solving the special Diophantine equation to a similar equation, but with both coefficients smaller than before: \begin{eqnarray*} \mbox{For } b>r>1 & solve & -bu + rv = 1 \mbox{ for positive } u, v:\\ \end{eqnarray*} There is an additional important observation. If $u,v$ are both positive then so are $x,y$. But note that the signs have changed from $+-$ to $-+$. They change back in the next step. So, depending whether there are an even number of steps or not, we may end up having solved $-ax_0 + by_0 = 1$ with positive solutions. Recall that, by another theorem we proved, all other soliutions have the form $ x_t=x_0 - tb, y_t=y_0 - ta.$ By successively subtracting $b$ from $x_0$ we reach a negative $x_t$, and so $ x = -x_t, y= -y_t $ solve the original problem with positive solutions.
Questions 1: Why must $y_t$ also be negative as soon as $x_t$ is?
\section{Fermat's Little Theorem generalized} Recall the FLT, which says that if $gcd(w,p)=1$ then $w^{p-1} \equiv_p 1$. We next generalize Fermat's Little Theorem by applying it with two different primes $p,q$ and $gcd(w,pq)=1$, so that neither $p$ nor $q$ divides $w$. Here is the line of reasoning that proves that $w^{(p-1)(q-1)}\equiv_{pq} 1 $. \begin{eqnarray*} w^{p-1}&\equiv_p& 1 \mbox{ applying FLT to } (w,p) \\ \therefore w^{(p-1)(q-1)}&\equiv_p& 1^{q-1} = 1 \\ \mbox{ But } (w^{p-1})^{q-1} &\equiv_q& 1 \mbox{ applying FLT to } (w^{p-1}, q) \\ \therefore p | (w^{(p-1)(q-1)}-1) & \mbox{ and }& q | (w^{(p-1)(q-1)}-1) \\ \mbox{ both being prime }& & pq | (w^{(p-1)(q-1)}-1) \\ \therefore w^{(p-1)(q-1)}&\equiv_{pq}1 \\ \end{eqnarray*} This is the same conclusion as FLT for a composite modulus $m=pq$ which has just two prime factors.
Question 2: State a further generalization for the case that $w$ is relatively prime to a modulus $m$ whose prime decomposition has no repeated primes.
\section{Application to the RSA encryption } In 1978, Ron Rivest, Adi Shamir, and Leonard Adleman of MIt published an ingenious method of encrypting messages sent over the internet. It is named \textit{ The RSA Public Key Encryption }. It works like this. Suppose you want to receive messages sent to you securely from whoever wishes to do so, so that no one but you can decipher them. Thus only you and your (possibly anonymous) correspondent knows what secrets she is sending you. Morever, everyone who wishes can read the encrypted message and not be able to do anything with it. Amazing, isn't it? Say you are collecting sensitive information from anyone willing to send some to you. As Snowden might have done if you worked for the newspapers. But you don't want the NSA, or the CIA, or anyone else to be able to decipher them, except you. And, you do this by publishing a \textit{ public key} consisting of three numbers $N,e,\ell$ where your correspondents (and any anybody else) can read them. \begin{itemize} \item $N$ is the product of the two largest primes you can afford to buy or generate: $N = pq$. \item The exponent $e$, which you have chosen to be relatively prime to $m = (p-1)(q-1)$ \item The bitlength $\ell$ of the words in the transmission, which you chose so that $2^\ell < \mbox{ both } p,q $. \end{itemize} Note that unless the third party can factor $N$ to find the $p,q$, they also cannot gain any useful information from the other two public numbers. For the exponent, you calculate a solution to this Diophantine equation \[ ed - mk = 1 \] with positive $d,k$. Note that $d,k$ and both $e,d < m$ are known only to you. You can forget $k$, but don't misplace $d$. Note that \[ d \equiv_m} e^{-1} \], the multiplicative inverse of $e$ modulo $m$.
Question 3: How would you find such a $d$ ?
\subsection{The encryption} To encode each word $w$ in stream of words, your correspondent's computer calculates \[s = w^e\%N \mbox{ i.e. the smallest } s \equiv_N w^e \mbox{ (encryption) }\], and sends you $s$, not $w$ which is supposed to be secret. This can safely be public because without more information, even the NSA supercomputers are powerless. Still, $s$ it encodes the secret $w$. Note, there are many potential solutions for $s$ because the same integer is congruent to many other integers for a given modulus $N$. But there is only one of them which is between $1 < s < N$. Why, and how would you find it? \subsection{The decryption} When you receive $s$, your computer calculates that \[ w = s^d \% N \mbox{ (decryption)} \], again choosing the only number congruent to $s^d$ between $1$ and $N$. Recall that s $s^d \equiv_N w$. But why should this be exactly $w$ and not something else congruent to it? (Duh, you want to know $w$ not some other number differing from it by a multiple of $N$.) \subsection{How does this work?} Recall that $e$ was choses relatively prime to $m=(p-1)(q-1)$. By the sharpened (computer friendly) version of Euclid's algorithm above, \begin{eqnarray*} s &\equiv_N& w^d \mbox{ as received }\\ \therefore s^d &\equiv_N& w^{ed} \\ &=& w^{ 1 + mk}\\ &=& w w^{mk}\\ &=& w (w^{(p-1)(q-1)})^k\\ &\equiv_{pq} &w (1)^k \mbox{ ..............by FLT++ }\\ &= &w \\ \therefore s^d &\equiv_N& w \\ \end{eqnarray*} Bingo. This whole procedure can be automated. Your correspondent writes a stream of ascii whose numerical equivalents are collected into a stream of numbers. Chopped into $\ell$ sized words $w$, her computer encrypts the $w$'s using your public key. She sends them to you, and your computer decrypts, knowing the secret powers to raise them to modulo $N$, that magical product of two huge primes $p,q$. But beware. Even if your Social Security number happens to be a prime, don't use it along with another prime whose product is your public N. Once both factors $p, q$ are known, the game is over, and you land in jail. \end{document}