CPS100 • Introduction to Computers


Lakeland College • Japan Campus

Bits and Bytes

We just learned how to count in binary (base 2). All the digits are 0s and 1s. These are called "binary digits," or "bits" for short. One bit is a "0" or a "1."


[Image: a 3-digit number lock]

Combinations

Now we should think about how many number combinations can be made with a certain number of digits. For example, let's say that you have a suitcase with a 3-digit number lock. How many combinations are there? Easy: 1000. You start with 000, then go up through 001, 002, 003, etc. until you reach 999. From 000 to 999 is 1000 combinations.

There is a simpler way to put it: the number of combinations is the base raised to the power of the number of digits, or base^digits. If you are in base 10 and you have a 3-digit lock, then the number of combinations is 10^3, or 1000.
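The rule can be checked with a short Python sketch. The function name `combinations` is made up for this example; the second part counts every lock setting one by one to confirm the formula.

```python
from itertools import product

def combinations(base, digits):
    """Number of different values that fit in `digits` digits of the given base."""
    return base ** digits

# Count every possible setting of a 3-digit lock, one digit wheel at a time.
lock_settings = ["".join(wheels) for wheels in product("0123456789", repeat=3)]

print(combinations(10, 3))                 # answer from the formula
print(len(lock_settings))                  # answer from counting
print(lock_settings[0], lock_settings[-1]) # first and last settings
```

Both numbers should agree, which is the point of the formula: you never have to count by hand.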

Base 2 Combinations

Now let's do the same thing in base 2. If you have 4 bits, how many different combinations (numbers) can you make? We're in base 2, so that would be 2^4, or 2 x 2 x 2 x 2, or 16.

We can test that by just counting from 0000 to 1111 and seeing how many numbers we make:

0000   =   0
0001   =   1
0010   =   2
0011   =   3
0100   =   4
0101   =   5
0110   =   6
0111   =   7
1000   =   8
1001   =   9
1010   =   10
1011   =   11
1100   =   12
1101   =   13
1110   =   14
1111   =   15

That is 0-15, for a total of 16 numbers!

Now that we know this system, we can see the combinations more easily:

1 bit     2^1     2 combinations
2 bits    2^2     4 combinations
3 bits    2^3     8 combinations
4 bits    2^4     16 combinations
5 bits    2^5     32 combinations
6 bits    2^6     64 combinations
7 bits    2^7     128 combinations
8 bits    2^8     256 combinations
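The same list can be produced in Python, as a rough sketch. The variable name `four_bit_numbers` is only an illustration; it repeats the counting test from 0000 to 1111 shown earlier.

```python
# Print the number of combinations for 1 to 8 bits.
for bits in range(1, 9):
    print(f"{bits} bit(s): 2^{bits} = {2 ** bits} combinations")

# Write out every 4-bit number, padded with zeros on the left.
four_bit_numbers = [format(n, "04b") for n in range(2 ** 4)]
print(four_bit_numbers[0], "to", four_bit_numbers[-1])
print(len(four_bit_numbers), "numbers in total")
```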

The Byte

Now that we know about combinations, we can look at what a Byte is. Basic definition: a Byte is 8 bits. For example, 10010110 is an eight-bit number, and it is a Byte.

The next question is, "Why eight bits?" The simple answer is that it is good for typing.

Remember, a computer can only understand binary. So, what happens when you type the letter "M" on your keyboard? The computer does not know "M."

What happens is that the keyboard translates "M" into binary, specifically 1001101 (the number 77 in base 10). That code, 1001101, is sent to the computer, which can understand it.
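You can see the same translation in Python. This is only a sketch of the idea, not what a keyboard really does inside: `ord` gives the code number of a character, and `format` writes that number in base 2.

```python
letter = "M"
code = ord(letter)           # the character's code number in base 10
binary = format(code, "b")   # the same number written in base 2

print(letter, code, binary)  # prints: M 77 1001101

# Going the other way: from binary back to the letter.
print(chr(int(binary, 2)))
```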

Translating Characters

Think about this: how many letters and other characters do we need to give codes to? Let's see if we can count them up: 26 lowercase letters, 26 uppercase letters, 10 digits, maybe 30 or so punctuation marks and symbols... we're almost to 100. But then there are a lot of special characters for non-English western languages, like the ñ in Spanish, or vowels with accents like é.

All in all, 256 combinations are enough to cover all of those. 256 combinations is 8 bits, so 8 bits is one Byte!

One code used to translate this is called ASCII, and some of the codes look like this:

Character    ASCII Binary Code
A            100 0001
B            100 0010
C            100 0011
D            100 0100
a            110 0001
b            110 0010
c            110 0011
d            110 0100

You may have noticed that the ASCII numbers I have shown you are 7 bits, not 8 bits. That's because ASCII is an older system that was designed as a 7-bit code. Today, Bytes are 8 bits, so an ASCII character stored in a Byte simply gets a 0 added at the front (A becomes 0100 0001). An 8-bit Byte is also called an octet.
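The ASCII codes in the table can be reproduced with a few lines of Python. The helper name `ascii_bits` is made up for this sketch; it shows each letter as a 7-bit code and then as a full 8-bit Byte with a 0 in front.

```python
def ascii_bits(character, width):
    """Return the character's code as a binary string `width` bits long."""
    return format(ord(character), f"0{width}b")

for ch in "ABCDabcd":
    print(ch, ascii_bits(ch, 7), ascii_bits(ch, 8))
```

Notice that each lowercase letter differs from its uppercase letter by just one bit.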

A Mess of Text

One problem with computers is that there are dozens of different systems to translate text to binary code! ASCII is usually recognized as a historical base; Windows and Mac generally use the same ASCII codes for basic letters, numbers and symbols used on keyboards—but not exactly the same.

It gets worse: Mac and Windows use completely different codes for the non-ASCII characters. Mac OS X uses Mac OS Roman encoding, and Windows uses Windows-1252 encoding. More modern character encoding systems are even more complex, and there are so many variations that it is difficult to understand them!

However, there is hope: UTF-8 is a popular character encoding system widely used today. It is a system based on Unicode, a code which can represent almost any language. Any character, any symbol, any emoji can be expressed with Unicode, and with UTF-8. It is even compatible with ASCII.
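A small Python sketch can show what "compatible with ASCII" means. The sample characters are chosen only for illustration: a plain letter takes 1 Byte in UTF-8, exactly as in ASCII, while other characters take 2, 3, or 4 Bytes.

```python
# M, e with acute accent, katakana HO, and a smiling-face emoji.
samples = ["M", "\u00e9", "\u30db", "\U0001F600"]

utf8_lengths = []
for ch in samples:
    utf8 = ch.encode("utf-8")                        # the Bytes that get stored
    bits = " ".join(format(b, "08b") for b in utf8)  # each Byte as 8 bits
    utf8_lengths.append(len(utf8))
    print(ch, len(utf8), "Byte(s):", bits)

# For plain English letters, UTF-8 and ASCII give the same Bytes.
print("M".encode("utf-8") == "M".encode("ascii"))
```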

Tech in Your Life

Have you ever seen a browser page that looks like this:

[Image: a mojibake screen]

When you see that, you have to go to the "View" menu of your browser, and set the correct text (character) encoding. If you do it right, the page will clear up:

[Image: the mojibake screen cleared up to normal text]

What Is That?

In Japan, it is called "mojibake"; there is no common English term. Mojibake happens when the wrong character set is used to display a web page. For example, the page shown above uses the "Shift-JIS" encoding system, but is displayed as mojibake when viewed with the Mac OS Roman character set.

There is another reason as well. Take a look at one small example of text from the two versions of the page shown above:

[Image: clean text and mojibake text]

That is the exact same text, first clearly, then as mojibake. Notice that the Japanese version is 3 characters: ホーム; then notice that the mojibake version is 6 characters: ÉzÅ[ÉÄ.

That is not a coincidence. Remember, western encoding systems use 1 Byte, with 256 combinations, to encode text characters. Japanese, like other Asian languages, has far more than 256 characters! The joyo kanji alone number 2,136 characters. 1 Byte is not enough.

As a result, Japanese must be encoded with more than one Byte. Shift-JIS is a double-byte character system, meaning that 2 Bytes are used for each Japanese character. 2 Bytes is 16 bits; 16 bits has 65,536 combinations, enough for all kanji and a lot more.

As it happens, the character for ホ has the Shift-JIS code 1000001101111010.

However, when you switch to Mac OS Roman, your computer is looking for 1-Byte characters, so it splits 1000001101111010 into 10000011 and 01111010. In Mac OS Roman, those two Bytes are the codes for (you guessed it) "É" and "z."
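You can make mojibake yourself in Python. This sketch uses Python's built-in codec names `shift_jis` and `mac_roman`: it stores the word in Shift-JIS, then reads the same Bytes back with the wrong encoding.

```python
text = "\u30db\u30fc\u30e0"        # the three katakana characters of "home"

raw = text.encode("shift_jis")     # 2 Bytes for each character
print(" ".join(format(b, "08b") for b in raw))

garbled = raw.decode("mac_roman")  # wrong encoding: 1 Byte per character
print(garbled)
print(len(text), "characters became", len(garbled), "characters")

# Nothing was lost: reading the same Bytes as Shift-JIS restores the text.
restored = garbled.encode("mac_roman").decode("shift_jis")
print(restored)
```

This is the same fix as choosing the correct encoding in the browser: the Bytes never change, only the way they are read.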

B or b?

Now you know what a bit is, what a Byte is, and where they come from. Next, let's look at how they are used.

First, how they are written: bits are written as b (a small "b"), while Bytes are written as B (a capital "B").

Normally, bits are used to describe the speed of data transmission. For example, if you go to an ISP (Internet Service Provider) and get a connection to the Internet, you may ask, "How fast is it?" The ISP will answer you in bits per second, or bps. A common fiber-optic connection, for example, may be 100 Mbps, or 100 million bits per second.

Many people may mistake bps for Bps, but the two are very different. If you truly have a download speed of 100 million bits per second, that means you are getting 12.5 million Bytes per second—only 1/8th the speed you might think!

On the other hand, Bytes are used to describe an amount of data. For example, you might have a photograph which is 2 MB, or 2 million Bytes.

In everyday life, we almost always use Bytes. In the rare cases where we see "bits" used, we must translate. 1 Byte is 8 bits; 1 bit is 1/8th of a Byte.
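The translation between bits and Bytes is simple arithmetic, sketched below in Python. The function names are made up for this example, and the download time ignores everything except the raw connection speed.

```python
BITS_PER_BYTE = 8

def mbps_to_megabytes_per_second(mbps):
    """Convert a speed in megabits per second to megabytes per second."""
    return mbps / BITS_PER_BYTE

def download_seconds(size_mb, speed_mbps):
    """Rough time to download `size_mb` megabytes at `speed_mbps` megabits per second."""
    return size_mb * BITS_PER_BYTE / speed_mbps

print(mbps_to_megabytes_per_second(100))  # 100 Mbps in MB per second
print(download_seconds(2, 100))           # a 2 MB photo on a 100 Mbps line
```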

Going Metric

Next, there are the prefixes used for describing large numbers. We do not usually say "a million Bytes"; instead, we say "megabyte," and we spell it "MB." Here are the different prefixes:

Prefix    Term         Abbreviation    Metric Bytes
          Byte         B               1
Kilo      Kilobyte     KB              1,000
Mega      Megabyte     MB              1,000,000
Giga      Gigabyte     GB              1,000,000,000
Tera      Terabyte     TB              1,000,000,000,000
Peta      Petabyte     PB              1,000,000,000,000,000
Exa       Exabyte      EB              1,000,000,000,000,000,000
Zetta     Zettabyte    ZB              1,000,000,000,000,000,000,000
Yotta     Yottabyte    YB              1,000,000,000,000,000,000,000,000
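Each step in the table is 1,000 times the step before it. As a rough sketch, the Python below turns a plain number of Bytes into the short form people actually say. The names `PREFIXES`, `bytes_for`, and `human_readable` are made up for this example.

```python
# One letter for each row of the table; "" stands for plain Bytes.
PREFIXES = ["", "K", "M", "G", "T", "P", "E", "Z", "Y"]

def bytes_for(prefix):
    """Number of Bytes in one unit with this prefix letter, e.g. "G" for GB."""
    return 1000 ** PREFIXES.index(prefix)

def human_readable(num_bytes):
    """Write a Byte count using the largest prefix that keeps it at 1 or more."""
    value = float(num_bytes)
    for prefix in PREFIXES:
        if value < 1000 or prefix == PREFIXES[-1]:
            return f"{value:g} {prefix}B"
        value /= 1000

print(bytes_for("G"))             # Bytes in one gigabyte
print(human_readable(2_000_000))  # the 2 MB photograph
print(human_readable(1500))
```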

Generally, people do not know what these terms mean until they start being used for personal computers. The first few, kilo and mega, had been known for a long time because they were commonly used in words such as kilometer and megaton.

However, giga did not really become well-known until computer storage was big enough to hold a gigabyte, which was in the mid- to late 1990s.

Before that time, people did not know what "giga" meant, and often mispronounced it as "jiga." For example, in the 1985 movie Back to the Future, Doc Brown needed to produce 1.21 gigawatts of electricity; Marty McFly, meanwhile, had no idea what that meant.



How Much Does a Byte Weigh?

Now you know what the words are. But do you understand what they mean? For example, how many songs fit in a gigabyte? If you want to store 30 minutes of video recorded on your cell phone, will a 4 GB USB flash drive be enough?

The answer is not completely easy, because not every book, photo, song, or movie is the same size. However, here is a rough estimate:

Item              Size      Notes
Essay             15 KB     This might be a 1,500-word essay saved in .docx format.
Book              1 MB      The book would be plain text (no formatting, no images) and would be about the same as a 500-page paperback.
Photo             3 MB      Assuming an 8-megapixel image taken with an iPhone 5 and saved as a compressed JPG file.
Song              4.5 MB    This would be a 3-minute song saved in MP3 format at medium-high quality.
Personal Video    250 MB    Assuming a 2-minute video taken at Full HD resolution.
Movie             1.5 GB    Assuming a 120-minute movie at Full HD with strong H.264 compression.

From this chart, you can perhaps get a better idea of what the terms and amounts mean. For example, you could conclude that a 4 GB USB flash drive is just enough to hold half an hour of iPhone video. But it could also hold almost 900 songs, more than 1300 photos, about 4000 books, or more than 250,000 essays!
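These estimates are simple division, and you can check them with the Python sketch below. The sizes are the rough figures from the chart above, not exact measurements, and the names `SIZES` and `how_many_fit` are made up for this example.

```python
KB, MB, GB = 1000, 1000 ** 2, 1000 ** 3

# Rough sizes in Bytes, taken from the chart.
SIZES = {
    "essay": 15 * KB,
    "book": 1 * MB,
    "photo": 3 * MB,
    "song": 4_500_000,   # 4.5 MB
}

def how_many_fit(item, capacity=4 * GB):
    """How many whole items of this kind fit in the given capacity."""
    return capacity // SIZES[item]

for item in SIZES:
    print(item, how_many_fit(item))

# Personal video: 250 MB for every 2 minutes of recording.
video_bytes_per_minute = 250 * MB / 2
minutes_of_video = 4 * GB / video_bytes_per_minute
print("minutes of video:", minutes_of_video)
```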

Terms to Know

bit: a 0 or a 1; usually used to describe the speed of data transmission.
bps: bits per second.
Byte: 8 bits; usually used to describe amounts of data.
octet: an 8-bit Byte.
base: the number which is the foundation of a counting system.
character encoding: a system in which characters are represented by codes, such as binary numbers.
ASCII: one of the early character encoding systems; most modern character encodings include ASCII for the first 128 characters.
Unicode: a system with more than 110,000 characters over 100 writing systems.
UTF-8: the most popular current character encoding; it includes ASCII, and displays all Unicode characters.
mojibake: the "nonsense" characters that appear when you view characters using the wrong encoding system.
kilo: 1,000 (a thousand).
mega: 1,000,000 (a million).
giga: 1,000,000,000 (a billion).
tera: 1,000,000,000,000 (a trillion).
peta: 1,000,000,000,000,000 (a quadrillion).