Chapter 3: Binary and Data Representation
Introduction
Computers don't understand numbers the way humans do. At the lowest level, everything in a computer is represented as sequences of electrical signals: on or off, high or low, 1 or 0. This chapter explores how computers represent all types of data using just these two states: binary.
Why This Matters
Understanding binary and hexadecimal is essential for system programming. When you read memory dumps, debug low-level code, work with network protocols, or write device drivers, you'll encounter binary and hexadecimal constantly. Mastering these number systems and understanding data representation is non-negotiable for system programmers.
How to Study This Chapter
- Practice conversions - Convert between decimal, binary, and hex by hand
- Use tools - Python's bin(), hex(), int() functions are helpful
- Write it out - Drawing bit patterns helps understanding
- Think in patterns - Recognize common patterns (powers of 2, etc.)
Binary Number System
Base-2 System
Binary uses only two digits: 0 and 1. Each position represents a power of 2.
Decimal: Base-10 uses powers of 10
Binary: Base-2 uses powers of 2
Example: Binary 1011
Position: 3 2 1 0
Binary: 1 0 1 1
Power of 2: 2³ 2² 2¹ 2⁰
Value: 8 + 0 + 2 + 1 = 11 (decimal)
Binary to Decimal Conversion
Method: Multiply each bit by its position value and sum.
Example 1: 1101₂ to decimal
1×2³ + 1×2² + 0×2¹ + 1×2⁰
= 8 + 4 + 0 + 1
= 13₁₀
Example 2: 10110₂ to decimal
1×2⁴ + 0×2³ + 1×2² + 1×2¹ + 0×2⁰
= 16 + 0 + 4 + 2 + 0
= 22₁₀
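The positional sums above translate directly into code. Here's a small Python sketch (Python is suggested in the study tips; the function name is my own) that walks the bits from right to left and checks the result against the built-in int(s, 2):

```python
def binary_to_decimal(bits: str) -> int:
    """Sum each bit times its power-of-2 place value."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

print(binary_to_decimal("1101"))   # 13, matching Example 1
print(binary_to_decimal("10110"))  # 22, matching Example 2
print(int("1101", 2))              # Python's built-in agrees: 13
```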
Decimal to Binary Conversion
Method: Repeatedly divide by 2, track remainders.
Example: Convert 13₁₀ to binary
13 ÷ 2 = 6 remainder 1 (least significant bit)
6 ÷ 2 = 3 remainder 0
3 ÷ 2 = 1 remainder 1
1 ÷ 2 = 0 remainder 1 (most significant bit)
Reading remainders bottom to top: 1101₂
Example: Convert 42₁₀ to binary
42 ÷ 2 = 21 remainder 0
21 ÷ 2 = 10 remainder 1
10 ÷ 2 = 5 remainder 0
5 ÷ 2 = 2 remainder 1
2 ÷ 2 = 1 remainder 0
1 ÷ 2 = 0 remainder 1
Result: 101010₂
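The repeated-division method can be sketched the same way (again a plain-Python illustration, function name mine): remainders are collected in order and reversed at the end, which is the "reading bottom to top" step.

```python
def decimal_to_binary(n: int) -> str:
    """Repeatedly divide by 2; remainders, read bottom to top, are the bits."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))  # remainder is the next bit (LSB first)
        n //= 2
    return "".join(reversed(remainders))  # reverse = read bottom to top

print(decimal_to_binary(13))  # 1101
print(decimal_to_binary(42))  # 101010
```

The built-in bin(42) returns "0b101010"; slicing off the "0b" prefix gives the same string.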
Hexadecimal Number System
Base-16 System
Hexadecimal (hex) uses 16 digits: 0-9 and A-F.
Decimal: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Hex: 0 1 2 3 4 5 6 7 8 9 A B C D E F
Why hexadecimal?
- Compact representation of binary
- Each hex digit = exactly 4 binary bits
- Much easier to read than long binary strings
Hex to Decimal Conversion
Example 1: 2F₁₆ to decimal
2×16¹ + F×16⁰
= 2×16 + 15×1
= 32 + 15
= 47₁₀
Example 2: 1A3₁₆ to decimal
1×16² + A×16¹ + 3×16⁰
= 1×256 + 10×16 + 3×1
= 256 + 160 + 3
= 419₁₀
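The same place-value idea works for hex, except each step shifts by a factor of 16. A minimal sketch (the lookup string and function name are my own; int(s, 16) is the built-in equivalent):

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_decimal(s: str) -> int:
    total = 0
    for ch in s.upper():
        # Shift the running total one hex place left, then add this digit
        total = total * 16 + HEX_DIGITS.index(ch)
    return total

print(hex_to_decimal("2F"))   # 47, matching Example 1
print(hex_to_decimal("1A3"))  # 419, matching Example 2
```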
Binary to Hex Conversion
Method: Group binary digits into sets of 4 (from right), convert each group.
Example 1: 11010110₂ to hex
Binary: 1101 0110
Hex: D 6
Result: D6₁₆
Example 2: 10111101010₂ to hex
Add leading zeros to make groups of 4:
Binary: 0101 1110 1010
Hex: 5 E A
Result: 5EA₁₆
Hex to Binary Conversion
Method: Convert each hex digit to 4 binary bits.
Example 1: A7₁₆ to binary
A = 1010
7 = 0111
Result: 10100111₂
Example 2: 3CF₁₆ to binary
3 = 0011
C = 1100
F = 1111
Result: 001111001111₂ (leading zeros can be dropped: 1111001111₂)
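The nibble-grouping trick is mechanical enough to code up directly. A sketch (helper name is mine) that pads to a multiple of 4 bits, then converts each group:

```python
def binary_to_hex(bits: str) -> str:
    # Pad on the left so the length is a multiple of 4
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    # Each 4-bit group maps to exactly one hex digit
    return "".join(format(int(nib, 2), "X") for nib in nibbles)

print(binary_to_hex("11010110"))     # D6, matching Example 1
print(binary_to_hex("10111101010"))  # 5EA, matching Example 2

# The reverse direction (hex to binary) via built-ins:
print(format(int("A7", 16), "b"))    # 10100111
```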
Bits, Bytes, and Words
Terminology
Bit (Binary Digit)
- Single binary digit (0 or 1)
- Smallest unit of data
Nibble
- 4 bits
- One hex digit
- Rarely used term but good to know
Byte
- 8 bits
- Most common unit
- Can represent 0-255 (unsigned)
- Standard addressable unit in memory
Word
- Architecture-dependent size
- 16-bit system: word = 16 bits (2 bytes)
- 32-bit system: word = 32 bits (4 bytes)
- 64-bit system: word = 64 bits (8 bytes)
Value Ranges
| Size | Bits | Unsigned Range | Signed Range |
|---|---|---|---|
| Byte | 8 | 0 to 255 | -128 to 127 |
| Word (16-bit) | 16 | 0 to 65,535 | -32,768 to 32,767 |
| Double Word | 32 | 0 to 4,294,967,295 | -2,147,483,648 to 2,147,483,647 |
| Quad Word | 64 | 0 to 18,446,744,073,709,551,615 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
Formula: N bits can represent 2ᴺ different values.
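The ranges in the table all come from two formulas: unsigned is 0 to 2ᴺ−1, and two's-complement signed is −2ᴺ⁻¹ to 2ᴺ⁻¹−1. A quick check in Python (function name mine):

```python
def ranges(bits: int):
    """Unsigned and two's-complement signed ranges for a given bit width."""
    unsigned = (0, 2**bits - 1)
    signed = (-(2**(bits - 1)), 2**(bits - 1) - 1)
    return unsigned, signed

print(ranges(8))   # ((0, 255), (-128, 127))
print(ranges(16))  # ((0, 65535), (-32768, 32767))
print(ranges(32))  # ((0, 4294967295), (-2147483648, 2147483647))
```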
Representing Negative Numbers
Sign-Magnitude (simple, but not used for integers in practice)
- Use first bit for sign (0 = positive, 1 = negative)
- Remaining bits for magnitude
Problem: Two representations of zero (+0 and -0), complicated arithmetic.
Two's Complement (used in practice)
The standard way to represent signed integers.
Rules:
- Positive numbers: same as unsigned
- Negative numbers: invert all bits and add 1
Example: Represent -5 in 8 bits
1. Start with +5: 00000101
2. Invert all bits: 11111010
3. Add 1: 11111011
Result: -5 = 11111011 in two's complement
Verification: Add 5 + (-5) should = 0
00000101 (+5)
+ 11111011 (-5)
-----------
100000000 (carry out of the 8-bit range is discarded)
= 00000000 (0) ✓
Advantages:
- Only one representation of zero
- Addition/subtraction work the same for signed and unsigned
- Easy to determine sign (check MSB: 1 = negative)
Range for 8 bits:
Most negative: 10000000 = -128
...
Zero: 00000000 = 0
...
Most positive: 01111111 = +127
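The invert-and-add-1 rule can be verified in Python. One detail worth knowing: masking a negative value with 2ᴺ−1 performs the same wraparound as invert-and-add-1, which is why the encoder below is a one-liner (helper names are my own):

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Encode a signed value; masking with 2**bits - 1 wraps negatives
    exactly as invert-all-bits-then-add-1 would."""
    return format(value & (2**bits - 1), f"0{bits}b")

def from_twos_complement(pattern: str) -> int:
    """Decode: if the MSB is 1, the pattern represents value - 2**bits."""
    bits = len(pattern)
    value = int(pattern, 2)
    return value - 2**bits if pattern[0] == "1" else value

print(to_twos_complement(-5))            # 11111011, matching the worked example
print(from_twos_complement("11111011"))  # -5
print(to_twos_complement(-128))          # 10000000, the most negative 8-bit value
```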
Bitwise Operations
Bitwise operations manipulate individual bits and are essential for system programming.
AND (&)
The result bit is 1 only where both input bits are 1.
1010
& 1100
------
1000
Use: Masking bits (turn off specific bits)
// Turn off bit 2 (make it 0)
value = value & 0b11111011;
OR (|)
The result bit is 1 where at least one input bit is 1.
1010
| 1100
------
1110
Use: Setting bits (turn on specific bits)
// Turn on bit 3 (make it 1)
value = value | 0b00001000;
XOR (^)
The result bit is 1 where the two input bits differ.
1010
^ 1100
------
0110
Use: Toggling bits, encryption
// Toggle bit 1
value = value ^ 0b00000010;
NOT (~)
Invert all bits. (Shown here on 4 bits; in C, ~ inverts every bit of the operand's full width, so ~ on an int flips all 32 bits.)
~1010
-----
0101
Use: Creating bit masks
Left Shift (<<)
Shift bits left, fill with zeros.
1010 << 2
= 101000
Effect: Multiplies by 2ⁿ (n = shift amount)
int x = 5; // 0101
int y = x << 2; // 10100 = 20 (5 × 4)
Right Shift (>>)
Shift bits right.
1010 >> 2
= 0010
Effect: Divides by 2ⁿ, discarding the remainder (n = shift amount)
int x = 20; // 10100
int y = x >> 2; // 00101 = 5 (20 ÷ 4)
Note: Arithmetic vs. logical shift matters for signed numbers: an arithmetic right shift fills with copies of the sign bit, while a logical right shift fills with zeros.
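Python can illustrate the difference. Its >> on ints is arithmetic (the sign is preserved); a logical shift on an 8-bit pattern can be modeled by masking to 8 bits first so zeros shift in:

```python
x = -20  # 8-bit two's complement: 11101100

# Arithmetic right shift: sign is preserved, so -20 >> 2 rounds down to -5
print(x >> 2)  # -5

# Logical right shift on the 8-bit pattern: mask first, zeros fill from the left
pattern = x & 0xFF       # 0b11101100 = 236
print(pattern >> 2)      # 0b00111011 = 59
```

Same bit pattern, two very different answers: -5 as a signed value, 59 as an unsigned one.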
Character Encoding
ASCII (American Standard Code for Information Interchange)
- 7 bits (128 characters)
- Includes letters, digits, punctuation, control characters
Common ASCII values:
'0' = 48 (0x30)
'A' = 65 (0x41)
'a' = 97 (0x61)
' ' = 32 (0x20)
'\n' = 10 (0x0A)
'\0' = 0 (0x00) - null terminator
Example: "Hi" in ASCII
'H' = 72 = 01001000
'i' = 105 = 01101001
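Python's ord() and chr() expose these codes directly, and a format spec shows the bit pattern. Note the handy property that 'A' and 'a' differ by exactly one bit (0x20):

```python
print(ord('A'), hex(ord('A')))  # 65 0x41
print(ord('a') - ord('A'))      # 32: upper/lowercase differ only in bit 5 (0x20)

for ch in "Hi":
    print(ch, ord(ch), format(ord(ch), "08b"))
# H 72 01001000
# i 105 01101001
```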
Extended ASCII
- 8 bits (256 characters)
- Includes special characters for various languages
- Different code pages for different regions (problem!)
UTF-8 (Unicode Transformation Format - 8 bit)
Modern standard for text encoding.
Characteristics:
- Variable length: 1-4 bytes per character
- Backward compatible with ASCII
- Can represent all Unicode characters (over 1.1 million code points)
- Used everywhere on the web
Encoding rules:
ASCII (U+0000 to U+007F): 1 byte
0xxxxxxx
U+0080 to U+07FF: 2 bytes
110xxxxx 10xxxxxx
U+0800 to U+FFFF: 3 bytes
1110xxxx 10xxxxxx 10xxxxxx
U+10000 to U+10FFFF: 4 bytes
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Example: "Hello, 世界"
'H' = 0x48 (1 byte)
'e' = 0x65 (1 byte)
'l' = 0x6C (1 byte)
'l' = 0x6C (1 byte)
'o' = 0x6F (1 byte)
',' = 0x2C (1 byte)
' ' = 0x20 (1 byte)
'世' = 0xE4B896 (3 bytes)
'界' = 0xE7958C (3 bytes)
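Python's str.encode() makes the variable-length behavior concrete: the string above is 9 characters but 13 bytes, because the two CJK characters take 3 bytes each.

```python
text = "Hello, 世界"
encoded = text.encode("utf-8")

print(len(text))     # 9 characters
print(len(encoded))  # 13 bytes: 7 one-byte ASCII chars + 2 three-byte CJK chars
print("世".encode("utf-8").hex())  # e4b896, matching the table above
```

This is also why len() on text vs. len() on bytes can disagree, a frequent source of bugs in file and network code.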
Floating-Point Representation (Brief Overview)
How computers represent numbers with fractional parts.
IEEE 754 Standard
32-bit float (single precision):
Sign (1 bit) | Exponent (8 bits) | Mantissa (23 bits)
64-bit double (double precision):
Sign (1 bit) | Exponent (11 bits) | Mantissa (52 bits)
Key points:
- Can't represent all decimal numbers exactly
- Has special values: +∞, -∞, NaN (Not a Number)
- Precision limits cause rounding errors
Example: 0.1 + 0.2 ≠ 0.3 in floating point (due to representation limits)
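Both points can be demonstrated with Python's struct module, which packs a float into its raw bytes so the sign | exponent | mantissa fields become visible. The value -0.15625 is chosen because it is exactly representable (-1.25 × 2⁻³):

```python
import struct

# 0.1 and 0.2 have no exact binary representation, so their sum is not exactly 0.3
print(0.1 + 0.2 == 0.3)  # False
print(0.1 + 0.2)         # 0.30000000000000004

# Raw bit pattern of a 32-bit float: pack to bytes, reinterpret as an unsigned int
bits = struct.unpack(">I", struct.pack(">f", -0.15625))[0]
s = format(bits, "032b")
print(s[0], s[1:9], s[9:])  # 1 01111100 01000000000000000000000
# sign=1 (negative), exponent=124 (i.e. -3 + bias 127), mantissa=.01 (0.25)
# value = -(1 + 0.25) * 2**-3 = -0.15625
```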
Endianness
The order in which bytes are stored in memory.
Big-Endian
Most significant byte first (stored at the lowest address).
Example: 0x12345678 in memory
Address: 0x00 0x01 0x02 0x03
Value: 0x12 0x34 0x56 0x78
Used by: Network protocols, some CPUs (SPARC, older PowerPC)
Little-Endian
Least significant byte first (stored at the lowest address).
Example: 0x12345678 in memory
Address: 0x00 0x01 0x02 0x03
Value: 0x78 0x56 0x34 0x12
Used by: x86, x64, ARM (usually)
Why it matters: When exchanging data between systems, must account for endianness.
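Python's int.to_bytes() shows both layouts side by side, and sys.byteorder reports the host CPU's native order:

```python
value = 0x12345678

print(value.to_bytes(4, "big").hex())     # 12345678 (MSB at lowest address)
print(value.to_bytes(4, "little").hex())  # 78563412 (LSB at lowest address)

import sys
print(sys.byteorder)  # "little" on x86/x64 and most ARM systems
```

This is exactly the conversion that network code performs with functions like htonl() in C, since network byte order is big-endian.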
Key Concepts
- Binary is the fundamental number system for computers
- Hexadecimal provides compact binary representation
- Two's complement is the standard for signed integers
- Bitwise operations manipulate individual bits
- UTF-8 is the modern standard for text encoding
- Endianness affects multi-byte data storage
Common Mistakes
- Forgetting leading zeros - Binary groups should be 4 or 8 bits
- Mixing up hex letters - A=10, not 1
- Wrong two's complement - Must invert all bits, then add 1
- Assuming ASCII for all text - UTF-8 is now standard
- Ignoring endianness - Matters when reading binary files
Debugging Tips
- Use hex calculators - Verify your conversions
- Check bit patterns - Draw them out visually
- Test with small values - Easier to verify
- Know your ranges - 8 bits = 0-255 unsigned
- Watch for overflow - Adding 255 + 1 in 8 bits = 0
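The overflow tip can be checked directly. Python's ints never overflow, so masking with 0xFF models what an 8-bit register does (wraparound modulo 256):

```python
# 8-bit unsigned addition wraps around modulo 256
print((255 + 1) & 0xFF)    # 0, as the tip above says
print((200 + 100) & 0xFF)  # 44, since 300 mod 256 = 44
```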
Mini Exercises
- Convert 10110101₂ to decimal
- Convert 173₁₀ to binary
- Convert 2FA₁₆ to decimal
- Convert 11011101₂ to hexadecimal
- Represent -10 in 8-bit two's complement
- Calculate: 10101010₂ AND 11001100₂
- Calculate: 00001111₂ OR 11110000₂
- What is 00000110₂ << 2 in binary?
- Find the ASCII code for 'Z'
- How many bytes does "Hello" take in UTF-8?
Review Questions
- Why do computers use binary instead of decimal?
- What's the advantage of hexadecimal over binary?
- How does two's complement represent negative numbers?
- What's the difference between big-endian and little-endian?
- Why is UTF-8 better than ASCII?
Reference Checklist
By the end of this chapter, you should be able to:
- Convert between binary, decimal, and hexadecimal
- Understand bits, bytes, and words
- Represent negative numbers using two's complement
- Perform bitwise operations
- Understand ASCII and UTF-8 encoding
- Explain endianness
- Calculate value ranges for different bit sizes
Next Steps
Now that you understand how data is represented at the bit level, the next chapter explores operating system concepts. You'll learn about processes, memory management, file systems, and how the OS manages all the computer's resources.
Key Takeaway: Everything in a computer is binary. Mastering binary, hexadecimal, and data representation is essential for understanding how system software works at the lowest level.