Chapter 3: Binary and Data Representation
Introduction
Computers don't understand numbers the way humans do. At the lowest level, everything in a computer is represented as sequences of electrical signals: on or off, high or low, 1 or 0. This chapter explores how computers represent all types of data using just these two states: binary.
Why This Matters
Understanding binary and hexadecimal is essential for system programming. When you read memory dumps, debug low-level code, work with network protocols, or write device drivers, you'll encounter binary and hexadecimal constantly. Mastering these number systems and understanding data representation is non-negotiable for system programmers.
How to Study This Chapter
- Practice conversions - Convert between decimal, binary, and hex by hand
- Use tools - Python's bin(), hex(), int() functions are helpful
- Write it out - Drawing bit patterns helps understanding
- Think in patterns - Recognize common patterns (powers of 2, etc.)
Binary Number System
Base-2 System
Binary uses only two digits: 0 and 1. Each position represents a power of 2.
Decimal: Base-10 uses powers of 10
Binary: Base-2 uses powers of 2
Example: Binary 1011
Position: 3 2 1 0
Binary: 1 0 1 1
Power of 2: 2³ 2² 2¹ 2⁰
Value: 8 + 0 + 2 + 1 = 11 (decimal)
Binary to Decimal Conversion
Method: Multiply each bit by its position value and sum.
Example 1: 1101₂ to decimal
1×2³ + 1×2² + 0×2¹ + 1×2⁰
= 8 + 4 + 0 + 1
= 13₁₀
Example 2: 10110₂ to decimal
1×2⁴ + 0×2³ + 1×2² + 1×2¹ + 0×2⁰
= 16 + 0 + 4 + 2 + 0
= 22₁₀
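The positional sums above translate directly into code. Here's a small Python sketch (Python is suggested in the study tips; the function name is my own) that walks the bits from right to left and checks the result against the built-in int(s, 2):

```python
def binary_to_decimal(bits: str) -> int:
    """Sum each bit times its power-of-2 place value."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

print(binary_to_decimal("1101"))   # 13, matching Example 1
print(binary_to_decimal("10110"))  # 22, matching Example 2
print(int("1101", 2))              # Python's built-in agrees: 13
```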
Decimal to Binary Conversion
Method: Repeatedly divide by 2, track remainders.
Example: Convert 13₁₀ to binary
13 ÷ 2 = 6 remainder 1 (least significant bit)
6 ÷ 2 = 3 remainder 0
3 ÷ 2 = 1 remainder 1
1 ÷ 2 = 0 remainder 1 (most significant bit)
Reading remainders bottom to top: 1101₂
Example: Convert 42₁₀ to binary
42 ÷ 2 = 21 remainder 0
21 ÷ 2 = 10 remainder 1
10 ÷ 2 = 5 remainder 0
5 ÷ 2 = 2 remainder 1
2 ÷ 2 = 1 remainder 0
1 ÷ 2 = 0 remainder 1
Result: 101010₂
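The repeated-division method can be sketched the same way (again a plain-Python illustration, function name mine): remainders are collected in order and reversed at the end, which is the "reading bottom to top" step.

```python
def decimal_to_binary(n: int) -> str:
    """Repeatedly divide by 2; remainders, read bottom to top, are the bits."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))  # remainder is the next bit (LSB first)
        n //= 2
    return "".join(reversed(remainders))  # reverse = read bottom to top

print(decimal_to_binary(13))  # 1101
print(decimal_to_binary(42))  # 101010
```

The built-in bin(42) returns "0b101010"; slicing off the "0b" prefix gives the same string.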
Hexadecimal Number System
Base-16 System
Hexadecimal (hex) uses 16 digits: 0-9 and A-F.
Decimal: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Hex: 0 1 2 3 4 5 6 7 8 9 A B C D E F
Why hexadecimal?
- Compact representation of binary
- Each hex digit = exactly 4 binary bits
- Much easier to read than long binary strings
Hex to Decimal Conversion
Example 1: 2F₁₆ to decimal
2×16¹ + F×16⁰
= 2×16 + 15×1
= 32 + 15
= 47₁₀
Example 2: 1A3₁₆ to decimal
1×16² + A×16¹ + 3×16⁰
= 1×256 + 10×16 + 3×1
= 256 + 160 + 3
= 419₁₀
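The same place-value idea works for hex, except each step shifts by a factor of 16. A minimal sketch (the lookup string and function name are my own; int(s, 16) is the built-in equivalent):

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_decimal(s: str) -> int:
    total = 0
    for ch in s.upper():
        # Shift the running total one hex place left, then add this digit
        total = total * 16 + HEX_DIGITS.index(ch)
    return total

print(hex_to_decimal("2F"))   # 47, matching Example 1
print(hex_to_decimal("1A3"))  # 419, matching Example 2
```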
Binary to Hex Conversion
Method: Group binary digits into sets of 4 (from right), convert each group.
Example 1: 11010110₂ to hex
Binary: 1101 0110
Hex: D 6
Result: D6₁₆
Example 2: 10111101010₂ to hex
Add leading zeros to make groups of 4:
Binary: 0101 1110 1010
Hex: 5 E A
Result: 5EA₁₆
Hex to Binary Conversion
Method: Convert each hex digit to 4 binary bits.
Example 1: A7₁₆ to binary
A = 1010
7 = 0111
Result: 10100111₂
Example 2: 3CF₁₆ to binary
3 = 0011
C = 1100
F = 1111
Result: 001111001111₂ (leading zeros can be dropped: 1111001111₂)
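The nibble-grouping trick is mechanical enough to code up directly. A sketch (helper name is mine) that pads to a multiple of 4 bits, then converts each group:

```python
def binary_to_hex(bits: str) -> str:
    # Pad on the left so the length is a multiple of 4
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    # Each 4-bit group maps to exactly one hex digit
    return "".join(format(int(nib, 2), "X") for nib in nibbles)

print(binary_to_hex("11010110"))     # D6, matching Example 1
print(binary_to_hex("10111101010"))  # 5EA, matching Example 2

# The reverse direction (hex to binary) via built-ins:
print(format(int("A7", 16), "b"))    # 10100111
```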
Bits, Bytes, and Words
Terminology
Bit (Binary Digit)
- Single binary digit (0 or 1)
- Smallest unit of data
Nibble
- 4 bits
- One hex digit
- Rarely used term but good to know
Byte
- 8 bits
- Most common unit
- Can represent 0-255 (unsigned)
- Standard addressable unit in memory
Word
- Architecture-dependent size
- 16-bit system: word = 16 bits (2 bytes)
- 32-bit system: word = 32 bits (4 bytes)
- 64-bit system: word = 64 bits (8 bytes)
Value Ranges
| Size | Bits | Unsigned Range | Signed Range |
|---|---|---|---|
| Byte | 8 | 0 to 255 | -128 to 127 |
| Word (16-bit) | 16 | 0 to 65,535 | -32,768 to 32,767 |
| Double Word | 32 | 0 to 4,294,967,295 | -2,147,483,648 to 2,147,483,647 |
| Quad Word | 64 | 0 to 18,446,744,073,709,551,615 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
Formula: N bits can represent 2ᴺ different values.
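The ranges in the table all come from two formulas: unsigned is 0 to 2ᴺ−1, and two's-complement signed is −2ᴺ⁻¹ to 2ᴺ⁻¹−1. A quick check in Python (function name mine):

```python
def ranges(bits: int):
    """Unsigned and two's-complement signed ranges for a given bit width."""
    unsigned = (0, 2**bits - 1)
    signed = (-(2**(bits - 1)), 2**(bits - 1) - 1)
    return unsigned, signed

print(ranges(8))   # ((0, 255), (-128, 127))
print(ranges(16))  # ((0, 65535), (-32768, 32767))
print(ranges(32))  # ((0, 4294967295), (-2147483648, 2147483647))
```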
Representing Negative Numbers
Sign-Magnitude (simple, but not used for integers in practice)
- Use first bit for sign (0 = positive, 1 = negative)
- Remaining bits for magnitude
Problem: Two representations of zero (+0 and -0), complicated arithmetic.
Two's Complement (used in practice)
The standard way to represent signed integers.
Rules:
- Positive numbers: same as unsigned
- Negative numbers: invert all bits and add 1
Example: Represent -5 in 8 bits
1. Start with +5: 00000101
2. Invert all bits: 11111010
3. Add 1: 11111011
Result: -5 = 11111011 in two's complement
Verification: Add 5 + (-5) should = 0
00000101 (+5)
+ 11111011 (-5)
-----------
100000000 (carry out of the 8-bit range is discarded)
= 00000000 (0) ✓
Advantages:
- Only one representation of zero
- Addition/subtraction work the same for signed and unsigned
- Easy to determine sign (check MSB: 1 = negative)
Range for 8 bits:
Most negative: 10000000 = -128
...
Zero: 00000000 = 0
...
Most positive: 01111111 = +127
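The invert-and-add-1 rule can be verified in Python. One detail worth knowing: masking a negative value with 2ᴺ−1 performs the same wraparound as invert-and-add-1, which is why the encoder below is a one-liner (helper names are my own):

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Encode a signed value; masking with 2**bits - 1 wraps negatives
    exactly as invert-all-bits-then-add-1 would."""
    return format(value & (2**bits - 1), f"0{bits}b")

def from_twos_complement(pattern: str) -> int:
    """Decode: if the MSB is 1, the pattern represents value - 2**bits."""
    bits = len(pattern)
    value = int(pattern, 2)
    return value - 2**bits if pattern[0] == "1" else value

print(to_twos_complement(-5))            # 11111011, matching the worked example
print(from_twos_complement("11111011"))  # -5
print(to_twos_complement(-128))          # 10000000, the most negative 8-bit value
```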
Bitwise Operations
Bitwise operations manipulate individual bits and are essential for system programming.
AND (&)
The result bit is 1 only where both input bits are 1.
1010
& 1100
------
1000
Use: Masking bits (turn off specific bits)
// Turn off bit 2 (make it 0)
value = value & 0b11111011;
OR (|)
The result bit is 1 where at least one input bit is 1.
1010
| 1100
------
1110
Use: Setting bits (turn on specific bits)
// Turn on bit 3 (make it 1)
value = value | 0b00001000;
XOR (^)
The result bit is 1 where the two input bits differ.
1010
^ 1100
------
0110
Use: Toggling bits, encryption
// Toggle bit 1
value = value ^ 0b00000010;
NOT (~)
Invert all bits. (Shown here on 4 bits; in C, ~ inverts every bit of the operand's full width, so ~ on an int flips all 32 bits.)
~1010
-----
0101
Use: Creating bit masks
Left Shift (<<)
Shift bits left, fill with zeros.
1010 << 2
= 101000
Effect: Multiplies by 2ⁿ (n = shift amount)
int x = 5; // 0101
int y = x << 2; // 10100 = 20 (5 × 4)
Right Shift (>>)
Shift bits right.
1010 >> 2
= 0010
Effect: Divides by 2ⁿ, discarding the remainder (n = shift amount)
int x = 20; // 10100
int y = x >> 2; // 00101 = 5 (20 ÷ 4)
Note: Arithmetic vs. logical shift matters for signed numbers: an arithmetic right shift fills with copies of the sign bit, while a logical right shift fills with zeros.
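Python can illustrate the difference. Its >> on ints is arithmetic (the sign is preserved); a logical shift on an 8-bit pattern can be modeled by masking to 8 bits first so zeros shift in:

```python
x = -20  # 8-bit two's complement: 11101100

# Arithmetic right shift: sign is preserved, so -20 >> 2 rounds down to -5
print(x >> 2)  # -5

# Logical right shift on the 8-bit pattern: mask first, zeros fill from the left
pattern = x & 0xFF       # 0b11101100 = 236
print(pattern >> 2)      # 0b00111011 = 59
```

Same bit pattern, two very different answers: -5 as a signed value, 59 as an unsigned one.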
Character Encoding
ASCII (American Standard Code for Information Interchange)
- 7 bits (128 characters)
- Includes letters, digits, punctuation, control characters
Common ASCII values:
'0' = 48 (0x30)
'A' = 65 (0x41)
'a' = 97 (0x61)
' ' = 32 (0x20)
'\n' = 10 (0x0A)
'\0' = 0 (0x00) - null terminator
Example: "Hi" in ASCII
'H' = 72 = 01001000
'i' = 105 = 01101001
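Python's ord() and chr() expose these codes directly, and a format spec shows the bit pattern. Note the handy property that 'A' and 'a' differ by exactly one bit (0x20):

```python
print(ord('A'), hex(ord('A')))  # 65 0x41
print(ord('a') - ord('A'))      # 32: upper/lowercase differ only in bit 5 (0x20)

for ch in "Hi":
    print(ch, ord(ch), format(ord(ch), "08b"))
# H 72 01001000
# i 105 01101001
```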
Extended ASCII
- 8 bits (256 characters)
- Includes special characters for various languages
- Different code pages for different regions (problem!)
UTF-8 (Unicode Transformation Format - 8 bit)
Modern standard for text encoding.
Characteristics:
- Variable length: 1-4 bytes per character
- Backward compatible with ASCII
- Can represent all Unicode characters (over 1.1 million code points)
- Used everywhere on the web
Encoding rules:
ASCII (U+0000 to U+007F): 1 byte
0xxxxxxx
U+0080 to U+07FF: 2 bytes
110xxxxx 10xxxxxx
U+0800 to U+FFFF: 3 bytes
1110xxxx 10xxxxxx 10xxxxxx
U+10000 to U+10FFFF: 4 bytes
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Example: "Hello, 世界"
'H' = 0x48 (1 byte)
'e' = 0x65 (1 byte)
'l' = 0x6C (1 byte)
'l' = 0x6C (1 byte)
'o' = 0x6F (1 byte)
',' = 0x2C (1 byte)
' ' = 0x20 (1 byte)
'世' = 0xE4B896 (3 bytes)
'界' = 0xE7958C (3 bytes)
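Python's str.encode() makes the variable-length behavior concrete: the string above is 9 characters but 13 bytes, because the two CJK characters take 3 bytes each.

```python
text = "Hello, 世界"
encoded = text.encode("utf-8")

print(len(text))     # 9 characters
print(len(encoded))  # 13 bytes: 7 one-byte ASCII chars + 2 three-byte CJK chars
print("世".encode("utf-8").hex())  # e4b896, matching the table above
```

This is also why len() on text vs. len() on bytes can disagree, a frequent source of bugs in file and network code.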
Floating-Point Representation (Brief Overview)
How computers represent numbers with fractional parts.
IEEE 754 Standard
32-bit float (single precision):
Sign (1 bit) | Exponent (8 bits) | Mantissa (23 bits)
64-bit double (double precision):
Sign (1 bit) | Exponent (11 bits) | Mantissa (52 bits)
Key points:
- Can't represent all decimal numbers exactly
- Has special values: +∞, -∞, NaN (Not a Number)
- Precision limits cause rounding errors
Example: 0.1 + 0.2 ≠ 0.3 in floating point (due to representation limits)
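Both points can be demonstrated with Python's struct module, which packs a float into its raw bytes so the sign | exponent | mantissa fields become visible. The value -0.15625 is chosen because it is exactly representable (-1.25 × 2⁻³):

```python
import struct

# 0.1 and 0.2 have no exact binary representation, so their sum is not exactly 0.3
print(0.1 + 0.2 == 0.3)  # False
print(0.1 + 0.2)         # 0.30000000000000004

# Raw bit pattern of a 32-bit float: pack to bytes, reinterpret as an unsigned int
bits = struct.unpack(">I", struct.pack(">f", -0.15625))[0]
s = format(bits, "032b")
print(s[0], s[1:9], s[9:])  # 1 01111100 01000000000000000000000
# sign=1 (negative), exponent=124 (i.e. -3 + bias 127), mantissa=.01 (0.25)
# value = -(1 + 0.25) * 2**-3 = -0.15625
```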
Endianness
The order in which bytes are stored in memory.
Big-Endian
Most significant byte first (stored at the lowest address).
Example: 0x12345678 in memory
Address: 0x00 0x01 0x02 0x03
Value: 0x12 0x34 0x56 0x78
Used by: Network protocols, some CPUs (SPARC, older PowerPC)
Little-Endian
Least significant byte first (stored at the lowest address).
Example: 0x12345678 in memory
Address: 0x00 0x01 0x02 0x03
Value: 0x78 0x56 0x34 0x12
Used by: x86, x64, ARM (usually)
Why it matters: When exchanging data between systems, must account for endianness.
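Python's int.to_bytes() shows both layouts side by side, and sys.byteorder reports the host CPU's native order:

```python
value = 0x12345678

print(value.to_bytes(4, "big").hex())     # 12345678 (MSB at lowest address)
print(value.to_bytes(4, "little").hex())  # 78563412 (LSB at lowest address)

import sys
print(sys.byteorder)  # "little" on x86/x64 and most ARM systems
```

This is exactly the conversion that network code performs with functions like htonl() in C, since network byte order is big-endian.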
Key Concepts
- Binary is the fundamental number system for computers
- Hexadecimal provides compact binary representation
- Two's complement is the standard for signed integers
- Bitwise operations manipulate individual bits
- UTF-8 is the modern standard for text encoding
- Endianness affects multi-byte data storage
Common Mistakes
- Forgetting leading zeros - Binary groups should be 4 or 8 bits
- Mixing up hex letters - A=10, not 1
- Wrong two's complement - Must invert all bits, then add 1
- Assuming ASCII for all text - UTF-8 is now standard
- Ignoring endianness - Matters when reading binary files
Debugging Tips
- Use hex calculators - Verify your conversions
- Check bit patterns - Draw them out visually
- Test with small values - Easier to verify
- Know your ranges - 8 bits = 0-255 unsigned
- Watch for overflow - Adding 255 + 1 in 8 bits = 0
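The overflow tip can be checked directly. Python's ints never overflow, so masking with 0xFF models what an 8-bit register does (wraparound modulo 256):

```python
# 8-bit unsigned addition wraps around modulo 256
print((255 + 1) & 0xFF)    # 0, as the tip above says
print((200 + 100) & 0xFF)  # 44, since 300 mod 256 = 44
```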
Mini Exercises
- Convert 10110101₂ to decimal
- Convert 173₁₀ to binary
- Convert 2FA₁₆ to decimal
- Convert 11011101₂ to hexadecimal
- Represent -10 in 8-bit two's complement
- Calculate: 10101010₂ AND 11001100₂
- Calculate: 00001111₂ OR 11110000₂
- What is 00000110₂ << 2 in binary?
- Find the ASCII code for 'Z'
- How many bytes does "Hello" take in UTF-8?
Review Questions
- Why do computers use binary instead of decimal?
- What's the advantage of hexadecimal over binary?
- How does two's complement represent negative numbers?
- What's the difference between big-endian and little-endian?
- Why is UTF-8 better than ASCII?
Reference Checklist
By the end of this chapter, you should be able to:
- Convert between binary, decimal, and hexadecimal
- Understand bits, bytes, and words
- Represent negative numbers using two's complement
- Perform bitwise operations
- Understand ASCII and UTF-8 encoding
- Explain endianness
- Calculate value ranges for different bit sizes
Next Steps
Now that you understand how data is represented at the bit level, the next chapter explores operating system concepts. You'll learn about processes, memory management, file systems, and how the OS manages all the computer's resources.
Key Takeaway: Everything in a computer is binary. Mastering binary, hexadecimal, and data representation is essential for understanding how system software works at the lowest level.