Chapter 3: Binary and Data Representation

Introduction

Computers don't understand numbers the way humans do. At the lowest level, everything in a computer is represented as sequences of electrical signals - on or off, high or low, 1 or 0. This chapter explores how computers represent all types of data using just these two states: binary.

Why This Matters

Understanding binary and hexadecimal is essential for system programming. When you read memory dumps, debug low-level code, work with network protocols, or write device drivers, you'll encounter binary and hexadecimal constantly. Mastering these number systems and understanding data representation is non-negotiable for system programmers.

How to Study This Chapter

  1. Practice conversions - Convert between decimal, binary, and hex by hand
  2. Use tools - Python's bin(), hex(), int() functions are helpful
  3. Write it out - Drawing bit patterns helps understanding
  4. Think in patterns - Recognize common patterns (powers of 2, etc.)
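
Tip 2 can be put to work right away. A quick Python sketch for checking hand conversions with the built-in bin(), hex(), and int() functions:

```python
n = 0b1011               # binary literal: decimal 11
print(bin(n))            # '0b1011'
print(hex(255))          # '0xff'
print(int("1A", 16))     # 26: parse a hex string
print(int("1011", 2))    # 11: parse a binary string
```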

Binary Number System

Base-2 System

Binary uses only two digits: 0 and 1. Each position represents a power of 2.

Decimal: Base-10 uses powers of 10
Binary:  Base-2 uses powers of 2

Example: Binary 1011

Position:    3    2    1    0
Binary:      1    0    1    1
Power of 2:  2³   2²   2¹   2⁰
Value:       8  + 0  + 2  + 1  = 11 (decimal)

Binary to Decimal Conversion

Method: Multiply each bit by its position value and sum.

Example 1: 1101₂ to decimal

1×2³ + 1×2² + 0×2¹ + 1×2⁰
= 8 + 4 + 0 + 1
= 13₁₀

Example 2: 10110₂ to decimal

1×2⁴ + 0×2³ + 1×2² + 1×2¹ + 0×2⁰
= 16 + 0 + 4 + 2 + 0
= 22₁₀
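
The positional method above can be sketched in Python (the helper name bin_to_dec is ours, not a library function):

```python
def bin_to_dec(bits):
    """Sum bit x 2^position, as in the worked examples above."""
    total = 0
    for i, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** i
    return total

print(bin_to_dec("1101"))   # 13
print(bin_to_dec("10110"))  # 22
```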

Decimal to Binary Conversion

Method: Repeatedly divide by 2, track remainders.

Example: Convert 13₁₀ to binary

13 ÷ 2 = 6 remainder 1  (least significant bit)
 6 ÷ 2 = 3 remainder 0
 3 ÷ 2 = 1 remainder 1
 1 ÷ 2 = 0 remainder 1  (most significant bit)

Reading remainders bottom to top: 1101₂

Example: Convert 42₁₀ to binary

42 ÷ 2 = 21 remainder 0
21 ÷ 2 = 10 remainder 1
10 ÷ 2 = 5  remainder 0
 5 ÷ 2 = 2  remainder 1
 2 ÷ 2 = 1  remainder 0
 1 ÷ 2 = 0  remainder 1

Result: 101010₂
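
The repeated-division steps translate directly to code; a minimal Python sketch (dec_to_bin is an illustrative name):

```python
def dec_to_bin(n):
    """Repeated division by 2; remainders read bottom to top."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # each remainder is the next bit, LSB first
        n //= 2
    return "".join(reversed(digits))

print(dec_to_bin(13))  # '1101'
print(dec_to_bin(42))  # '101010'
```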

Hexadecimal Number System

Base-16 System

Hexadecimal (hex) uses 16 digits: 0-9 and A-F.

Decimal:  0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
Hex:      0  1  2  3  4  5  6  7  8  9   A   B   C   D   E   F

Why hexadecimal?

  • Compact representation of binary
  • Each hex digit = exactly 4 binary bits
  • Much easier to read than long binary strings

Hex to Decimal Conversion

Example 1: 2F₁₆ to decimal

2×16¹ + F×16⁰
= 2×16 + 15×1
= 32 + 15
= 47₁₀

Example 2: 1A3₁₆ to decimal

1×16² + A×16¹ + 3×16⁰
= 1×256 + 10×16 + 3×1
= 256 + 160 + 3
= 419₁₀
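
The same positional sum works in base 16; a short Python sketch (hex_to_dec is an illustrative helper, not a built-in):

```python
DIGITS = "0123456789ABCDEF"

def hex_to_dec(s):
    """Positional sum, base 16; A-F map to 10-15."""
    total = 0
    for ch in s:
        total = total * 16 + DIGITS.index(ch.upper())
    return total

print(hex_to_dec("2F"))   # 47
print(hex_to_dec("1A3"))  # 419
```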

Binary to Hex Conversion

Method: Group binary digits into sets of 4 (from right), convert each group.

Example 1: 11010110₂ to hex

Binary:  1101  0110
Hex:       D      6
Result: D6₁₆

Example 2: 10111101010₂ to hex

Add leading zeros to make groups of 4:
Binary:  0101  1110  1010
Hex:       5     E     A
Result: 5EA₁₆

Hex to Binary Conversion

Method: Convert each hex digit to 4 binary bits.

Example 1: A7₁₆ to binary

A = 1010
7 = 0111
Result: 10100111₂

Example 2: 3CF₁₆ to binary

3 = 0011
C = 1100
F = 1111
Result: 001111001111₂ (or just 1111001111₂)
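
The nibble-grouping method can be automated; a Python sketch (bin_to_hex is an illustrative name):

```python
DIGITS = "0123456789ABCDEF"

def bin_to_hex(bits):
    """Pad to a multiple of 4 with leading zeros, convert each nibble."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(DIGITS[int(nib, 2)] for nib in nibbles)

print(bin_to_hex("11010110"))     # 'D6'
print(bin_to_hex("10111101010"))  # '5EA'
```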

Bits, Bytes, and Words

Terminology

Bit (Binary Digit)

  • Single binary digit (0 or 1)
  • Smallest unit of data

Nibble

  • 4 bits
  • One hex digit
  • Rarely used term but good to know

Byte

  • 8 bits
  • Most common unit
  • Can represent 0-255 (unsigned)
  • Standard addressable unit in memory

Word

  • Architecture-dependent size
  • 16-bit system: word = 16 bits (2 bytes)
  • 32-bit system: word = 32 bits (4 bytes)
  • 64-bit system: word = 64 bits (8 bytes)

Value Ranges

Size           Bits  Unsigned Range                    Signed Range
Byte              8  0 to 255                          -128 to 127
Word (16-bit)    16  0 to 65,535                       -32,768 to 32,767
Double Word      32  0 to 4,294,967,295                -2,147,483,648 to 2,147,483,647
Quad Word        64  0 to 18,446,744,073,709,551,615   -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Formula: N bits can represent 2ᴺ different values.

Representing Negative Numbers

Sign-Magnitude (simple, but not used for integers in practice)

  • Use first bit for sign (0 = positive, 1 = negative)
  • Remaining bits for magnitude

Problem: Two representations of zero (+0 and -0), complicated arithmetic.

Two's Complement (used in practice)

The standard way to represent signed integers.

Rules:

  • Positive numbers: same as unsigned
  • Negative numbers: invert all bits and add 1

Example: Represent -5 in 8 bits

1. Start with +5:     00000101
2. Invert all bits:   11111010
3. Add 1:             11111011

Result: -5 = 11111011 in two's complement

Verification: Add 5 + (-5) should = 0

  00000101  (+5)
+ 11111011  (-5)
-----------
 100000000  (overflow discarded)
= 00000000  (0) ✓
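
The invert-and-add-1 rule is easy to verify in code; a Python sketch assuming an 8-bit width (the helper name twos_complement is ours):

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1          # 0xFF: keep only the low 8 bits

def twos_complement(x):
    """Bit pattern of -x in WIDTH bits: invert all bits, then add 1."""
    return (~x + 1) & MASK

neg5 = twos_complement(5)
print(format(neg5, "08b"))       # '11111011'
print((5 + neg5) & MASK)         # 0 (the overflow bit is discarded)
```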

Advantages:

  • Only one representation of zero
  • Addition/subtraction work the same for signed and unsigned
  • Easy to determine sign (check MSB: 1 = negative)

Range for 8 bits:

Most negative:  10000000 = -128
               ...
Zero:          00000000 = 0
               ...
Most positive: 01111111 = +127

Bitwise Operations

Bitwise operations manipulate individual bits and are essential in system programming.

AND (&)

Both bits must be 1.

  1010
& 1100
------
  1000

Use: Masking bits (turn off specific bits)

// Turn off bit 2 (make it 0)
value = value & 0b11111011;

OR (|)

At least one bit must be 1.

  1010
| 1100
------
  1110

Use: Setting bits (turn on specific bits)

// Turn on bit 3 (make it 1)
value = value | 0b00001000;

XOR (^)

Bits must be different.

  1010
^ 1100
------
  0110

Use: Toggling bits, encryption

// Toggle bit 1
value = value ^ 0b00000010;

NOT (~)

Invert all bits.

~1010
-----
 0101

Use: Creating bit masks
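
Note that Python's ~ operates on arbitrary-precision integers, so reproducing the 4-bit picture above requires masking to a fixed width; a short sketch:

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1          # 0b1111

x = 0b1010
print(format(~x & MASK, "04b"))  # '0101': NOT, confined to 4 bits

# Common mask idiom: ~ builds "all bits except this one"
value = 0b1111
value &= ~(1 << 2) & MASK        # clear bit 2
print(format(value, "04b"))      # '1011'
```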

Left Shift (<<)

Shift bits left, fill with zeros.

  1010 << 2
= 101000

Effect: Multiplies by 2ⁿ (n = shift amount)

int x = 5;        // 0101
int y = x << 2;   // 10100 = 20 (5 × 4)

Right Shift (>>)

Shift bits right.

  1010 >> 2
= 0010

Effect: Divides by 2ⁿ, using integer division (n = shift amount)

int x = 20;       // 10100
int y = x >> 2;   // 00101 = 5 (20 ÷ 4)

Note: Arithmetic vs logical shift matters for signed numbers.
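
The distinction can be demonstrated in Python, whose >> is arithmetic on negative integers; a logical shift can be emulated by masking to a fixed width first (8 bits assumed here):

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

x = -8                                   # 8-bit pattern 11111000
print(x >> 1)                            # -4: arithmetic shift preserves the sign
print((x & MASK) >> 1)                   # 124: logical shift fills with zeros
print(format((x & MASK) >> 1, "08b"))    # '01111100'
```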

Character Encoding

ASCII (American Standard Code for Information Interchange)

  • 7 bits (128 characters)
  • Includes letters, digits, punctuation, control characters

Common ASCII values:

'0' = 48 (0x30)
'A' = 65 (0x41)
'a' = 97 (0x61)
' ' = 32 (0x20)
'\n' = 10 (0x0A)
'\0' = 0  (0x00) - null terminator

Example: "Hi" in ASCII

'H' = 72 = 01001000
'i' = 105 = 01101001
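
Python's ord() and chr() expose these codes directly; a few checks of the values above:

```python
print(ord('Z'))              # 90
print(chr(72) + chr(105))    # 'Hi'
# A digit character minus '0' (48) yields its numeric value
print(ord('7') - ord('0'))   # 7
# Upper and lower case differ by exactly one bit (0x20)
print(chr(ord('A') | 0x20))  # 'a'
```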

Extended ASCII

  • 8 bits (256 characters)
  • Includes special characters for various languages
  • Different code pages for different regions (problem!)

UTF-8 (Unicode Transformation Format - 8 bit)

Modern standard for text encoding.

Characteristics:

  • Variable length: 1-4 bytes per character
  • Backward compatible with ASCII
  • Can represent every Unicode character (1,114,112 code points)
  • Used everywhere on the web

Encoding rules:

ASCII (U+0000 to U+007F):  1 byte
  0xxxxxxx

U+0080 to U+07FF:  2 bytes
  110xxxxx 10xxxxxx

U+0800 to U+FFFF:  3 bytes
  1110xxxx 10xxxxxx 10xxxxxx

U+10000 to U+10FFFF:  4 bytes
  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Example: "Hello, 世界"

'H' = 0x48 (1 byte)
'e' = 0x65 (1 byte)
'l' = 0x6C (1 byte)
'l' = 0x6C (1 byte)
'o' = 0x6F (1 byte)
',' = 0x2C (1 byte)
' ' = 0x20 (1 byte)
'世' = E4 B8 96 (3 bytes)
'界' = E7 95 8C (3 bytes)
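
Python's str.encode makes these byte counts easy to verify:

```python
s = "Hello, 世界"
data = s.encode("utf-8")
print(len(s), len(data))         # 9 characters, 13 bytes (7 ASCII + 2 x 3)
print(data[7:10].hex(" "))       # 'e4 b8 96': the bytes of '世'
print("Hello".encode("utf-8"))   # b'Hello': pure ASCII is unchanged
```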

Floating-Point Representation (Brief Overview)

How computers represent decimal numbers.

IEEE 754 Standard

32-bit float (single precision):

Sign (1 bit) | Exponent (8 bits) | Mantissa (23 bits)

64-bit double (double precision):

Sign (1 bit) | Exponent (11 bits) | Mantissa (52 bits)

Key points:

  • Can't represent all decimal numbers exactly
  • Has special values: +∞, -∞, NaN (Not a Number)
  • Precision limits cause rounding errors

Example: 0.1 + 0.2 ≠ 0.3 in floating point (due to representation limits)
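
This is easy to observe in Python; comparing with a tolerance (math.isclose) is the usual workaround:

```python
import math

print(0.1 + 0.2)                      # 0.30000000000000004
print(0.1 + 0.2 == 0.3)               # False
print(math.isclose(0.1 + 0.2, 0.3))   # True: compare with a tolerance instead
```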

Endianness

The order in which bytes are stored in memory.

Big-Endian

Most significant byte stored first (at the lowest memory address).

Example: 0x12345678 in memory

Address:  0x00  0x01  0x02  0x03
Value:    0x12  0x34  0x56  0x78

Used by: Network protocols, some CPUs (SPARC, older PowerPC)

Little-Endian

Least significant byte stored first (at the lowest memory address).

Example: 0x12345678 in memory

Address:  0x00  0x01  0x02  0x03
Value:    0x78  0x56  0x34  0x12

Used by: x86, x64, ARM (usually)

Why it matters: When exchanging data between systems, must account for endianness.
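
Python's struct module can demonstrate both byte orders (the format codes ">I" and "<I" mean big- and little-endian unsigned 32-bit, respectively):

```python
import struct
import sys

value = 0x12345678
print(struct.pack(">I", value).hex(" "))  # '12 34 56 78' (big-endian)
print(struct.pack("<I", value).hex(" "))  # '78 56 34 12' (little-endian)
print(sys.byteorder)                      # host byte order, e.g. 'little' on x86
```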

Key Concepts

  • Binary is the fundamental number system for computers
  • Hexadecimal provides compact binary representation
  • Two's complement is the standard for signed integers
  • Bitwise operations manipulate individual bits
  • UTF-8 is the modern standard for text encoding
  • Endianness affects multi-byte data storage

Common Mistakes

  1. Forgetting leading zeros - Binary groups should be 4 or 8 bits
  2. Mixing up hex letters - A=10, not 1
  3. Wrong two's complement - Must invert all bits, then add 1
  4. Assuming ASCII for all text - UTF-8 is now standard
  5. Ignoring endianness - Matters when reading binary files

Debugging Tips

  • Use hex calculators - Verify your conversions
  • Check bit patterns - Draw them out visually
  • Test with small values - Easier to verify
  • Know your ranges - 8 bits = 0-255 unsigned
  • Watch for overflow - Adding 255 + 1 in 8 bits = 0

Mini Exercises

  1. Convert 10110101₂ to decimal
  2. Convert 173₁₀ to binary
  3. Convert 2FA₁₆ to decimal
  4. Convert 11011101₂ to hexadecimal
  5. Represent -10 in 8-bit two's complement
  6. Calculate: 10101010₂ AND 11001100₂
  7. Calculate: 00001111₂ OR 11110000₂
  8. What is 00000110₂ << 2 in binary?
  9. Find the ASCII code for 'Z'
  10. How many bytes does "Hello" take in UTF-8?

Review Questions

  1. Why do computers use binary instead of decimal?
  2. What's the advantage of hexadecimal over binary?
  3. How does two's complement represent negative numbers?
  4. What's the difference between big-endian and little-endian?
  5. Why is UTF-8 better than ASCII?

Reference Checklist

By the end of this chapter, you should be able to:

  • Convert between binary, decimal, and hexadecimal
  • Understand bits, bytes, and words
  • Represent negative numbers using two's complement
  • Perform bitwise operations
  • Understand ASCII and UTF-8 encoding
  • Explain endianness
  • Calculate value ranges for different bit sizes

Next Steps

Now that you understand how data is represented at the bit level, the next chapter explores operating system concepts. You'll learn about processes, memory management, file systems, and how the OS manages all the computer's resources.


Key Takeaway: Everything in a computer is binary. Mastering binary, hexadecimal, and data representation is essential for understanding how system software works at the lowest level.