# Base Conversion

• There are several algorithms that are needed to work with integers in a computer.
• Most obvious problem: computers store binary (0s and 1s). That's all we have to work with.
• All other data types must be build from bits.
• Theorem: For an integer $$b>1$$ (the base), a positive integer $$n$$ can be represented uniquely as $n=a_kb^k + a_{k-1}n^{k-1} +\cdots + a_1b^1 + a_0b^0\,,$ where $$k>0$$ is an integer and each $$a_i$$ is an intger $$0\le a_i\lt b$$.

• For $$b=10$$, you already knew this. The number 1253 is $1253 = 1\cdot10^{3} + 2\cdot10^{2} + 5\cdot 10^1+3\cdot 10^0\,.$
• That is, each $$a_i$$ is the decimal digit in each position.
• Decimal (the way you're used to counting) is also called “base 10”.
• The theorem implies that we can find an representation of any integer is any base.
• e.g. with base 7, we can represent 1257 as $1257 = 3\cdot 7^3 + 4\cdot 7^2 + 4\cdot 7^1 + 0\cdot 7^0\,.$
• We'll write this as a number with a subscript to indicate the base: $1253_{10} = 3440_7\,.$
• If we represent a number in base 2, we will get a binary representation: only 0s and 1s.
• With our earlier example, we get $$1253_{10} = 10011100101_2$$
• But how did we get it?
• This algorithm will convert a value ($$n$$) to any base ($$b$$):
• If we try it on the base 7 example above, the variable change as follows for each iteration: (all values base 10)
• We can read the base $$b$$ expansion up the $$n\mathop{\mathbf{mod}}b$$ column.
• If we do this with base 16, we get:
• We don't have enough digits to represent the 14. We'll resort to letters: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F.
• Then, $$1253_{10}=4\mathrm{E}5_{16}$$
• Base 16 is often called hexadecimal.
• Other base occasionally used in computing: base 8 or octal.

# Application: Base64 Encoding

• Many older email servers could only deal with ASCII characters: no non-English, no binary data (attachments, etc).
• When it became necessary to do those things in email, that data needed to be encoded somehow so that it wouldn't be mangled by older servers.
• Characters that are definitely safe in any system: letters A–Z and a–z, digits 0–9.
• That's 62. Add in “+” and “/” and we have 64 values.
• 64 unique values means we have a way to encode base 64 integers.
• Take the message you're trying to send, convert to base 64, and represent that using those characters.
• Since 64 is a power of 2, the algorithm can be simplified a little.
• Encoding in base64:
• Take the bits you have to encode. Break into 6-bit chunks. Each 6-bit string can be encoded as one base64 character.
• Convert the 6 bits (e.g. $$100110_2$$) to a number ($$2^5+2^2+2^1=38_{10}$$).
• Take the character that maps to that value (38 is “m”).
• Do that for each 6-bit chunk.
• Fix a little if you don't have a multiple of 6 bits in the original message.
• For example, the ASCII string “Hello world!” gets encoded as “SGVsbG8gd29ybGQh”.
• Longer messages are typically split across lines (long lines were another problem in old mail software):
RGlkIHlvdSByZWFsbHkgZGVjb2RlIHRoaXM/IEdvb2QgZm9yIHlvdS4gSXQncyBuaWNlIHRvIHNl
ZSBzdHVkZW50cyB0YWtpbmcgaW5pdGlhdGl2ZS4KCk1heWJlIHlvdSBleHBlY3RlZCBzb21lIG1p
ZHRlcm0gaGludHMgaGVyZT8gU29ycnkgYWJvdXQgdGhhdC4=
• Base64 is also used in data URLs in HTML: [from Wikipedia “Data URI scheme]
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
• Result: # Base-n Arithmetic

• It turns out, you already know how to do basic arithmetic on base-$$n$$ values.
• You learned it years ago: the teacher just always used base 10.
• We just have to adapt to other bases.
• Adding, the elementary school way: $$849+423$$
• What we did:
• Add up each column, starting from the right.
• If the total is 10 or more, subtract 10 and carry one.
• Include the carry when adding the next column.
• Adding any other base: replace “10” with “n” in the above.
• e.g. add $$1011_2+0010_2$$: (remember $$2_{10}=10_2$$ and $$3_{10}=11_2$$)
• Did that work?
• We started with $$11_{10}+2_{10}$$ and got $$01101_2=13_{10}$$. So, yes.
• e.g. add $$\mathrm{F}227_{16}+3\mathrm{A}65_{16}$$:
• Again, we can check by converting to decimal: $$\mathrm{F}227_{16}+3\mathrm{A}65_{16}=61991_{10}+14949_{10}$$ and got $$12\mathrm{C}8\mathrm{C}_{16}=76940_{10}$$.
• Multiplication can be adapted too: $$849\times 23$$ in base 10 looks like this
• Again, we do the same thing, using the new base instead of 10: multiply each digit, shifting once each time; add up the results.
• e.g. multiply $$1011_2+0101_2$$:
• That was $$1011_2=11_{10}$$ times $$101_2=5_{10}$$ with result $$110111_2=55_{10}$$. Again correct.

# The Euclidean Algorithm

• The Euclidean algorithm is an algorithm to find the greatest common divisor of two integers.
• i.e. given integers $$a$$ and $$b$$, find the largest integer that divides both.
• It relies on this theorem…
• Theorem: For integers $$a$$, $$b$$, $$q$$, and $$r$$ with $$a=bq+r$$, $$\gcd(a,b)=\gcd(b,r)$$.

Proof: Suppose $$d$$ is a common divisor of both $$a$$ and $$b$$. Then from earlier theorems about divisibility, $$d$$ also divides $$a-bq=r$$. Thus any common divisor of $$a$$ and $$b$$ is also a divisor of $$r$$.

Similarly, consider a factor $$e$$ that divides both $$b$$ and $$r$$. Then $$e$$ also divides $$bq+r=a$$. Thus the set of common divisors of $$a$$ and $$b$$ and of $$b$$ and $$r$$ are the same. Then the largest element of those sets must also be identical.

• Note that $$r$$ here could be $$a\mathop{\mathbf{mod}}b$$, but doesn't have to be.
• This is the algorithm:
• From the above theorem, the GCD of $$a$$ and $$b$$ doesn't change as the loop runs.
• We exit after $$a\mathop{\mathbf{mod}}b = 0$$. Then $$b\mid a$$, so the GCD of the two is $$b$$ (which has shifted into $$a$$ by the time we exit the loop).
• An example to find $$\gcd(123,345)$$:
• So, $$\gcd(123,345)=3$$.
• What is the running time of Euclid's algorithm?
• $$O(\log( \min(a,b)))$$, but not for any obvious reason. (The text does prove it in section 4.3.)