Chapter 17: Encoding Techniques (Base64, UTF-8, Hex)

Introduction

Encoding transforms data from one format to another for transmission, storage, or display. Unlike compression (which reduces size) or encryption (which hides meaning), encoding simply represents data in a different format. This chapter explores Base64, hexadecimal, UTF-8, and other encoding schemes essential for web development, file formats, and data interchange.

Why This Matters

Encoding enables data interchange:

Email attachments: Base64 for binary data in MIME
URLs: Percent-encoding for special characters
Web APIs: JSON with Base64-encoded binary
Images in HTML: Data URLs with Base64
Text files: UTF-8 for international characters
Color codes: Hexadecimal in CSS/HTML
Debugging: Hex dumps for binary inspection

How to Study This Chapter

Understand encoding vs encryption - Different purposes
Learn character encoding - ASCII, UTF-8, Unicode
Implement Base64 - Understand bit manipulation
Practice hex conversion - Binary to hex and back
Study URL encoding - Web-safe characters

ASCII and Character Encoding

ASCII (American Standard Code for Information Interchange)

ASCII: 7-bit encoding for 128 characters (0-127).

Character ranges:
0-31:    Control characters
32-47:   Space and punctuation
48-57:   Digits '0'-'9'
58-64:   Punctuation
65-90:   Uppercase 'A'-'Z'
91-96:   Punctuation
97-122:  Lowercase 'a'-'z'
123-126: Punctuation
127:     Delete (DEL) control character

Examples:
'A' = 65 = 0x41 = 01000001
'a' = 97 = 0x61 = 01100001
'0' = 48 = 0x30 = 00110000
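
Because letters and digits occupy contiguous ranges, simple arithmetic on character codes is enough for common conversions. The sketch below is one way to illustrate this; the helper names toUpperAscii and digitValue are made up for this example and are not standard library functions.

```cpp
// Convert an ASCII lowercase letter to uppercase by clearing bit 5 (0x20).
// 'a' (0x61) and 'A' (0x41) differ only in that bit.
// Characters outside 'a'-'z' are returned unchanged.
char toUpperAscii(char c) {
    if (c >= 'a' && c <= 'z') {
        return static_cast<char>(c & ~0x20);
    }
    return c;
}

// Convert an ASCII digit '0'-'9' to its numeric value 0-9.
// Returns -1 for any other character.
int digitValue(char c) {
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    return -1;
}
```

For example, toUpperAscii('a') yields 'A' and digitValue('7') yields 7. This only holds for ASCII; accented or non-Latin letters need a Unicode-aware library.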

C Implementation

#include <stdio.h>

void printAscii(char c) {
    printf("'%c' = %d = 0x%02X = ", c, c, (unsigned char)c);

    // Print binary
    for (int i = 7; i >= 0; i--) {
        printf("%d", (c >> i) & 1);
    }
    printf("\n");
}

int main() {
    printAscii('A');
    printAscii('a');
    printAscii('0');
    printAscii(' ');

    return 0;
}

UTF-8 (Unicode Transformation Format - 8-bit)

UTF-8: Variable-length encoding for Unicode (1-4 bytes per character).

1 byte:  0xxxxxxx                             U+0000-U+007F (ASCII compatible)
2 bytes: 110xxxxx 10xxxxxx                    U+0080-U+07FF
3 bytes: 1110xxxx 10xxxxxx 10xxxxxx           U+0800-U+FFFF
4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx  U+10000-U+10FFFF

Examples:
'A' (U+0041): 01000001 (1 byte)
'é' (U+00E9): 11000011 10101001 (2 bytes)
'€' (U+20AC): 11100010 10000010 10101100 (3 bytes)
'😀' (U+1F600): 11110000 10011111 10011000 10000000 (4 bytes)
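
The byte patterns above can be turned into a small encoder. The following is a minimal sketch, not a production routine: the function name encodeUtf8 is hypothetical, and returning an empty string for surrogates and out-of-range values is an assumption made for this example.

```cpp
#include <string>

// Encode one Unicode code point as a UTF-8 byte sequence.
// Returns an empty string for values that cannot be encoded:
// surrogates (U+D800-U+DFFF) and anything above U+10FFFF.
std::string encodeUtf8(char32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out; // surrogate: not encodable
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encodeUtf8(0x20AC) produces the three bytes E2 82 AC, matching the '€' example above.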

C++ UTF-8 Example

#include <iostream>
#include <string>
#include <iomanip>
using namespace std;

void printUtf8Bytes(const string& str) {
    cout << "UTF-8 encoding of \"" << str << "\":" << endl;

    for (unsigned char c : str) {
        cout << hex << setw(2) << setfill('0') << (int)c << " ";
    }
    cout << dec << endl;
}

int utf8CharLength(unsigned char firstByte) {
    if ((firstByte & 0x80) == 0) return 1;      // 0xxxxxxx
    if ((firstByte & 0xE0) == 0xC0) return 2;   // 110xxxxx
    if ((firstByte & 0xF0) == 0xE0) return 3;   // 1110xxxx
    if ((firstByte & 0xF8) == 0xF0) return 4;   // 11110xxx
    return 0; // Invalid
}

int main() {
    string text = "Hello 世界 😀";

    printUtf8Bytes(text);

    cout << "\nCharacter breakdown:" << endl;
    size_t i = 0;
    while (i < text.length()) {
        int len = utf8CharLength(text[i]);
        if (len == 0) len = 1; // Invalid lead byte: advance one byte so the loop terminates
        cout << "Character: ";
        for (int j = 0; j < len && i + j < text.length(); j++) {
            cout << hex << setw(2) << setfill('0')
                 << (int)(unsigned char)text[i + j] << " ";
        }
        cout << dec << "(" << len << " bytes)" << endl;
        i += len;
    }

    return 0;
}
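
A related point is that a string's byte length and its character count differ in UTF-8. One way to count code points, sketched below, is to count every byte that is not a continuation byte (10xxxxxx). The name countCodePoints is hypothetical, and the sketch assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string.
// Each code point has exactly one lead byte, so we count the bytes
// that do NOT match the continuation pattern 10xxxxxx.
// Assumption: the input is valid UTF-8; no validation is performed.
std::size_t countCodePoints(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string "Hello 世界 😀" used above, length() reports 17 bytes, while this function reports 10 code points.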

Hexadecimal Encoding

Hexadecimal: Base-16 representation (0-9, A-F).

Uses

Memory addresses
Color codes (#FF5733)
Hash values (SHA-256)
Binary file dumps
MAC addresses
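
As a small illustration of the color-code use case, the sketch below parses a "#RRGGBB" string into three channel values. The function names hexDigitValue and parseHexColor are invented for this example, and it deliberately handles only the six-digit form.

```cpp
#include <string>

// Value of one hex digit (0-15), or -1 if the character is not a hex digit.
int hexDigitValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parse a "#RRGGBB" color into red, green, and blue values (0-255).
// Returns false if the string is not '#' followed by exactly six hex digits;
// in that case r, g, and b are left unchanged.
bool parseHexColor(const std::string& s, int& r, int& g, int& b) {
    if (s.size() != 7 || s[0] != '#') return false;
    int channels[3];
    for (int i = 0; i < 3; i++) {
        int hi = hexDigitValue(s[1 + 2 * i]);
        int lo = hexDigitValue(s[2 + 2 * i]);
        if (hi < 0 || lo < 0) return false;
        channels[i] = hi * 16 + lo; // two hex digits form one byte
    }
    r = channels[0];
    g = channels[1];
    b = channels[2];
    return true;
}
```

With the color #FF5733 from the list above, the result is r = 255, g = 87, b = 51.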

C Implementation

#include <stdio.h>
#include <string.h>
#include <ctype.h>

void bytesToHex(const unsigned char* bytes, int len, char* hex) {
    for (int i = 0; i < len; i++) {
        sprintf(hex + (i * 2), "%02x", bytes[i]);
    }
    hex[len * 2] = '\0';
}

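int hexToBytes(const char* hex, unsigned char* bytes) {
    size_t len = strlen(hex);
    if (len % 2 != 0) return -1; // Invalid hex string: odd number of digits

    for (size_t i = 0; i < len / 2; i++) {
        // Reject non-hex characters instead of leaving bytes[i] unset
        if (!isxdigit((unsigned char)hex[i * 2]) ||
            !isxdigit((unsigned char)hex[i * 2 + 1])) return -1;
        sscanf(hex + (i * 2), "%2hhx", &bytes[i]);
    }

    return (int)(len / 2);
}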

int main() {
    unsigned char data[] = "Hello";
    char hex[100];

    bytesToHex(data, 5, hex);
    printf("Bytes to Hex: %s\n", hex);

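    unsigned char decoded[100];
    int decodedLen = hexToBytes(hex, decoded);
    if (decodedLen < 0) { // Do not index the buffer with an error code
        printf("Invalid hex string\n");
        return 1;
    }
    decoded[decodedLen] = '\0';
    printf("Hex to Bytes: %s\n", decoded);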

    return 0;
}

C++ Implementation

#include <iostream>
#include <string>
#include <sstream>
#include <iomanip>
#include <cstdlib>  // strtol
using namespace std;

string bytesToHex(const unsigned char* bytes, size_t len) {
    stringstream ss;
    for (size_t i = 0; i < len; i++) {
        ss << hex << setw(2) << setfill('0') << (int)bytes[i];
    }
    return ss.str();
}

string hexToBytes(const string& hex) {
    string bytes;
    for (size_t i = 0; i < hex.length(); i += 2) {
        string byteString = hex.substr(i, 2);
        unsigned char byte = (unsigned char)strtol(byteString.c_str(), nullptr, 16);
        bytes += byte;
    }
    return bytes;
}

int main() {
    string data = "Hello";

    string hex = bytesToHex((unsigned char*)data.c_str(), data.length());
    cout << "Bytes to Hex: " << hex << endl;

    string decoded = hexToBytes(hex);
    cout << "Hex to Bytes: " << decoded << endl;

    return 0;
}

Java Implementation

public class HexEncoding {
    private static final char[] HEX_ARRAY = "0123456789abcdef".toCharArray();

    public static String bytesToHex(byte[] bytes) {
        char[] hexChars = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;
            hexChars[i * 2] = HEX_ARRAY[v >>> 4];
            hexChars[i * 2 + 1] = HEX_ARRAY[v & 0x0F];
        }
        return new String(hexChars);
    }

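    public static byte[] hexToBytes(String hex) {
        int len = hex.length();
        if (len % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even length");
        }
        byte[] bytes = new byte[len / 2];

        for (int i = 0; i < len; i += 2) {
            int hi = Character.digit(hex.charAt(i), 16);
            int lo = Character.digit(hex.charAt(i + 1), 16);
            if (hi < 0 || lo < 0) { // Character.digit returns -1 for non-hex characters
                throw new IllegalArgumentException("Invalid hex digit at index " + i);
            }
            bytes[i / 2] = (byte) ((hi << 4) + lo);
        }

        return bytes;
    }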

    public static void main(String[] args) {
        String data = "Hello";
        byte[] bytes = data.getBytes();

        String hex = bytesToHex(bytes);
        System.out.println("Bytes to Hex: " + hex);

        byte[] decoded = hexToBytes(hex);
        System.out.println("Hex to Bytes: " + new String(decoded));
    }
}

Base64 Encoding

Base64: Encodes binary data as ASCII text using 64 characters (A-Z, a-z, 0-9, +, /).

How It Works

1. Take 3 bytes (24 bits)
2. Split into four 6-bit groups
3. Map each 6-bit value to Base64 character
4. Pad with '=' if needed
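
The steps above imply a fixed relationship between input size, output size, and padding. The sketch below computes both; the function names are hypothetical, and the formulas assume standard padded Base64.

```cpp
#include <cstddef>

// Number of Base64 characters, including '=' padding, for n input bytes.
// Every group of up to 3 input bytes becomes exactly 4 output characters.
std::size_t base64EncodedLength(std::size_t n) {
    return 4 * ((n + 2) / 3);
}

// Number of '=' characters at the end of the output.
// 0 when n is a multiple of 3, 1 when two bytes are left over,
// 2 when one byte is left over.
std::size_t base64PaddingCount(std::size_t n) {
    return (3 - n % 3) % 3;
}
```

For instance, a 13-byte input such as "Hello, World!" encodes to 20 characters, the last 2 of which are padding.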

Example: "Man"
M = 77 = 01001101
a = 97 = 01100001
n = 110 = 01101110

Combined: 010011010110000101101110

Split into 6-bit groups:
010011 = 19 = 'T'
010110 = 22 = 'W'
000101 = 5  = 'F'
101110 = 46 = 'u'

Result: "TWFu"
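
The same split can be expressed with shifts and masks. This is a minimal sketch of just the 3-byte-to-4-index step, with the hypothetical names kBase64Alphabet and splitIntoSextets; it is not a complete encoder.

```cpp
#include <array>
#include <cstdint>

// Standard Base64 alphabet: index 0 is 'A', index 63 is '/'.
const char kBase64Alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Split three bytes (24 bits) into four 6-bit indices, each in 0-63.
std::array<int, 4> splitIntoSextets(std::uint8_t a, std::uint8_t b, std::uint8_t c) {
    // Pack the bytes into one 24-bit value, first byte in the top bits.
    std::uint32_t triple = (static_cast<std::uint32_t>(a) << 16) |
                           (static_cast<std::uint32_t>(b) << 8) |
                           static_cast<std::uint32_t>(c);
    std::array<int, 4> out = {{
        static_cast<int>((triple >> 18) & 0x3F),
        static_cast<int>((triple >> 12) & 0x3F),
        static_cast<int>((triple >> 6) & 0x3F),
        static_cast<int>(triple & 0x3F)
    }};
    return out;
}
```

For the bytes 'M', 'a', 'n' this yields the indices 19, 22, 5, 46, which the alphabet maps to 'T', 'W', 'F', 'u'.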

C Implementation

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdint.h>  /* uint32_t */

static const char base64_chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

char* base64Encode(const unsigned char* data, size_t input_length) {
    size_t output_length = 4 * ((input_length + 2) / 3);
    char* encoded = (char*)malloc(output_length + 1);

    if (!encoded) return NULL;

    size_t i, j;
    for (i = 0, j = 0; i < input_length;) {
        uint32_t octet_a = i < input_length ? data[i++] : 0;
        uint32_t octet_b = i < input_length ? data[i++] : 0;
        uint32_t octet_c = i < input_length ? data[i++] : 0;

        uint32_t triple = (octet_a << 16) + (octet_b << 8) + octet_c;

        encoded[j++] = base64_chars[(triple >> 18) & 0x3F];
        encoded[j++] = base64_chars[(triple >> 12) & 0x3F];
        encoded[j++] = base64_chars[(triple >> 6) & 0x3F];
        encoded[j++] = base64_chars[triple & 0x3F];
    }

    // Padding
    for (i = 0; i < (3 - input_length % 3) % 3; i++) {
        encoded[output_length - 1 - i] = '=';
    }

    encoded[output_length] = '\0';
    return encoded;
}

int main() {
    const char* input = "Man";
    char* encoded = base64Encode((unsigned char*)input, strlen(input));

    printf("Original: %s\n", input);
    printf("Base64: %s\n", encoded);

    free(encoded);
    return 0;
}

C++ Implementation

#include <iostream>
#include <string>
#include <cctype>  // isalnum
using namespace std;

const string base64_chars =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

string base64Encode(const unsigned char* data, size_t len) {
    string encoded;
    int i = 0;
    unsigned char char_array_3[3];
    unsigned char char_array_4[4];

    while (len--) {
        char_array_3[i++] = *(data++);

        if (i == 3) {
            char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
            char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
            char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
            char_array_4[3] = char_array_3[2] & 0x3f;

            for (i = 0; i < 4; i++) {
                encoded += base64_chars[char_array_4[i]];
            }
            i = 0;
        }
    }

    if (i) {
        for (int j = i; j < 3; j++) {
            char_array_3[j] = '\0';
        }

        char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
        char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
        char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);

        for (int j = 0; j < i + 1; j++) {
            encoded += base64_chars[char_array_4[j]];
        }

        while (i++ < 3) {
            encoded += '=';
        }
    }

    return encoded;
}

string base64Decode(const string& encoded) {
    int in_len = encoded.size();
    int i = 0, j = 0, in = 0;
    unsigned char char_array_4[4], char_array_3[3];
    string decoded;

    // Parentheses matter here: && binds tighter than ||
    while (in_len-- && encoded[in] != '=' &&
           (isalnum((unsigned char)encoded[in]) || encoded[in] == '+' || encoded[in] == '/')) {
        char_array_4[i++] = encoded[in++];

        if (i == 4) {
            for (i = 0; i < 4; i++) {
                char_array_4[i] = base64_chars.find(char_array_4[i]);
            }

            char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
            char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
            char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];

            for (i = 0; i < 3; i++) {
                decoded += char_array_3[i];
            }
            i = 0;
        }
    }

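    if (i) {
        for (j = 0; j < i; j++) {
            char_array_4[j] = base64_chars.find(char_array_4[j]);
        }
        // Zero-fill the unused slots so the shifts below read defined values
        for (j = i; j < 4; j++) {
            char_array_4[j] = 0;
        }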

        char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
        char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);

        for (j = 0; j < i - 1; j++) {
            decoded += char_array_3[j];
        }
    }

    return decoded;
}

int main() {
    string input = "Hello, World!";

    cout << "Original: " << input << endl;

    string encoded = base64Encode((unsigned char*)input.c_str(), input.length());
    cout << "Base64: " << encoded << endl;

    string decoded = base64Decode(encoded);
    cout << "Decoded: " << decoded << endl;

    return 0;
}

Java Implementation (using built-in)

import java.util.Base64;
import java.nio.charset.StandardCharsets;

public class Base64Encoding {
    public static void main(String[] args) {
        String input = "Hello, World!";

        // Encode
        String encoded = Base64.getEncoder().encodeToString(input.getBytes(StandardCharsets.UTF_8));
        System.out.println("Original: " + input);
        System.out.println("Base64: " + encoded);

        // Decode
        byte[] decoded = Base64.getDecoder().decode(encoded);
        String decodedStr = new String(decoded, StandardCharsets.UTF_8);
        System.out.println("Decoded: " + decodedStr);
    }
}

URL Encoding (Percent Encoding)

URL encoding: Encodes special characters as %XX where XX is hexadecimal.

Characters Requiring Encoding

Space: %20
!: %21
#: %23
$: %24
%: %25
&: %26
': %27
(: %28
): %29
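
Each entry in the list above is simply the character's byte value written as two hex digits. The sketch below shows that mapping for a single byte and applies it to every byte of a string, which is how a multi-byte UTF-8 character ends up as several %XX groups. Both function names are hypothetical.

```cpp
#include <string>

// Percent-encode a single byte as "%XX" using uppercase hex digits.
std::string percentEncodeByte(unsigned char byte) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out = "%";
    out += kHex[byte >> 4];    // high nibble
    out += kHex[byte & 0x0F];  // low nibble
    return out;
}

// Percent-encode every byte of the input unconditionally.
// A real encoder would leave unreserved characters untouched;
// this helper only demonstrates the byte-to-%XX mapping.
std::string percentEncodeAllBytes(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        out += percentEncodeByte(b);
    }
    return out;
}
```

A space (byte 0x20) becomes %20, and 'é', which is the two UTF-8 bytes C3 A9, becomes %C3%A9.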

C++ Implementation

#include <iostream>
#include <string>
#include <sstream>
#include <iomanip>
#include <cctype>  // isalnum, isxdigit
using namespace std;

string urlEncode(const string& str) {
    ostringstream escaped;
    escaped.fill('0');
    escaped << hex;

    for (char c : str) {
        // Cast first: passing a negative char to isalnum is undefined behavior
        if (isalnum((unsigned char)c) || c == '-' || c == '_' || c == '.' || c == '~') {
            escaped << c;
        }
        else {
            escaped << uppercase;
            escaped << '%' << setw(2) << int((unsigned char)c);
            escaped << nouppercase;
        }
    }

    return escaped.str();
}

string urlDecode(const string& str) {
    string decoded;
    for (size_t i = 0; i < str.length(); i++) {
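        if (str[i] == '%') {
            if (i + 2 < str.length() &&
                isxdigit((unsigned char)str[i + 1]) &&
                isxdigit((unsigned char)str[i + 2])) {
                string hex = str.substr(i + 1, 2);
                int value = stoi(hex, nullptr, 16);
                decoded += static_cast<char>(value);
                i += 2;
            }
            else {
                decoded += '%'; // Malformed escape: keep the '%' literally
            }
        }
        else if (str[i] == '+') {
            // '+' means space only in form-encoded data (query strings)
            decoded += ' ';
        }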
        else {
            decoded += str[i];
        }
    }
    return decoded;
}

int main() {
    string url = "Hello World! @#$";

    cout << "Original: " << url << endl;

    string encoded = urlEncode(url);
    cout << "URL Encoded: " << encoded << endl;

    string decoded = urlDecode(encoded);
    cout << "Decoded: " << decoded << endl;

    return 0;
}

Comparison of Encoding Schemes

Encoding	Purpose	Output Size	Use Case
ASCII	Text	1 byte/char	English text
UTF-8	Unicode text	1-4 bytes/char	International text
Hex	Binary display	2 chars/byte	Debugging, hashes
Base64	Binary in text	~133% of input	Email, JSON, data URLs
URL Encoding	Web URLs	Varies	Query strings

Common Mistakes

Confusing encoding with encryption - Encoding is not secure
Not handling padding - Base64 requires proper padding
Character set issues - Assuming ASCII when UTF-8 is needed
Buffer overflows - Not allocating enough space for encoded data
Double encoding - Encoding already encoded data
Wrong URL encoding - Not encoding all special characters
Byte order issues - Endianness matters in UTF-16 and UTF-32; UTF-8 is a plain byte sequence and is not affected
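
The double-encoding mistake listed above is easy to reproduce. The sketch below uses a minimal percent-encoder (the name percentEncode is hypothetical, and it treats only letters, digits, '-', '_', '.', and '~' as safe) to show what happens when already-encoded text is encoded again.

```cpp
#include <string>

// Minimal percent-encoder: leaves unreserved ASCII characters
// (letters, digits, '-', '_', '.', '~') alone and writes every
// other byte as %XX with uppercase hex digits.
std::string percentEncode(const std::string& in) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') ||
                          c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += kHex[c >> 4];
            out += kHex[c & 0x0F];
        }
    }
    return out;
}
```

Encoding "a b" once gives "a%20b". Encoding that result a second time gives "a%2520b", because the '%' itself is encoded as %25. A single decode then returns "a%20b" rather than the original text.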

Debugging Tips

Use online tools - Verify encoding/decoding
Print byte values - Inspect actual bytes
Test with known inputs - "Man" → "TWFu" for Base64
Check padding - Base64 padding must be correct
Validate UTF-8 - Ensure valid byte sequences
Compare with standards - RFC specifications
Handle edge cases - Empty strings, special characters
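
For the "print byte values" tip, a small helper that renders a buffer as space-separated hex pairs is often enough. The sketch below is one possible version; the name toHexBytes is made up for this example.

```cpp
#include <cstddef>
#include <string>

// Render each byte of the input as two lowercase hex digits,
// separated by single spaces, e.g. "Man" -> "4d 61 6e".
std::string toHexBytes(const std::string& data) {
    static const char kHex[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < data.size(); i++) {
        unsigned char b = static_cast<unsigned char>(data[i]);
        if (i > 0) out += ' ';
        out += kHex[b >> 4];    // high nibble
        out += kHex[b & 0x0F];  // low nibble
    }
    return out;
}
```

Printing toHexBytes of a suspicious string makes invisible problems visible, such as a stray byte-order mark or a character stored as two UTF-8 bytes where one byte was expected.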

Mini Exercises

Implement Base32 encoding
Convert between different encodings
Validate UTF-8 sequences
Implement ROT13 cipher
Create ASCII art encoder
Build HTML entity encoder
Implement quoted-printable encoding
Create Unicode normalizer
Build Base85 (ASCII85) encoder
Implement punycode for IDN

Review Questions

What's the difference between encoding and encryption?
Why is Base64 used for binary data in emails?
How does UTF-8 maintain ASCII compatibility?
When should you use URL encoding?
What are the advantages of hexadecimal representation?

Reference Checklist

By the end of this chapter, you should be able to:

Convert between ASCII and binary
Understand UTF-8 encoding
Implement hexadecimal encoding/decoding
Implement Base64 encoding/decoding
Use URL encoding for web applications
Choose appropriate encoding for use case
Handle multi-byte character encodings
Debug encoding issues

Next Steps

Chapter 18 explores advanced algorithm paradigms: Divide & Conquer, Greedy, Dynamic Programming, and Backtracking. These fundamental techniques solve complex problems efficiently.

Key Takeaway: Encoding transforms data representation without changing meaning. Base64 enables binary data in text formats. UTF-8 supports international characters. Hexadecimal provides readable binary representation. Understanding encoding is essential for web development, data interchange, and system integration.