Chapter 17: Encoding Techniques (Base64, UTF-8, Hex)

Introduction

Encoding transforms data from one format to another for transmission, storage, or display. Unlike compression (which reduces size) or encryption (which hides meaning), encoding simply represents data in a different format. This chapter explores Base64, hexadecimal, UTF-8, and other encoding schemes essential for web development, file formats, and data interchange.

Why This Matters

Encoding enables data interchange:

  • Email attachments: Base64 for binary data in MIME
  • URLs: Percent-encoding for special characters
  • Web APIs: JSON with Base64-encoded binary
  • Images in HTML: Data URLs with Base64
  • Text files: UTF-8 for international characters
  • Color codes: Hexadecimal in CSS/HTML
  • Debugging: Hex dumps for binary inspection

How to Study This Chapter

  1. Understand encoding vs encryption - Different purposes
  2. Learn character encoding - ASCII, UTF-8, Unicode
  3. Implement Base64 - Understand bit manipulation
  4. Practice hex conversion - Binary to hex and back
  5. Study URL encoding - Web-safe characters

ASCII and Character Encoding

ASCII (American Standard Code for Information Interchange)

ASCII: 7-bit encoding for 128 characters (0-127).

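Character ranges:
0-31:    Control characters
32:      Space
33-47:   Punctuation and symbols
48-57:   Digits '0'-'9'
58-64:   Punctuation and symbols
65-90:   Uppercase 'A'-'Z'
91-96:   Punctuation and symbols
97-122:  Lowercase 'a'-'z'
123-126: Punctuation and symbols
127:     Delete (DEL) control character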

Examples:
'A' = 65 = 0x41 = 01000001
'a' = 97 = 0x61 = 01100001
'0' = 48 = 0x30 = 00110000

C Implementation

#include <stdio.h>

void printAscii(char c) {
    printf("'%c' = %d = 0x%02X = ", c, c, (unsigned char)c);

    // Print binary
    for (int i = 7; i >= 0; i--) {
        printf("%d", (c >> i) & 1);
    }
    printf("\n");
}

int main() {
    printAscii('A');
    printAscii('a');
    printAscii('0');
    printAscii(' ');

    return 0;
}

UTF-8 (Unicode Transformation Format - 8-bit)

UTF-8: Variable-length encoding for Unicode (1-4 bytes per character).

1 byte:  0xxxxxxx                             U+0000-U+007F (ASCII compatible)
2 bytes: 110xxxxx 10xxxxxx                    U+0080-U+07FF
3 bytes: 1110xxxx 10xxxxxx 10xxxxxx           U+0800-U+FFFF
4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx  U+10000-U+10FFFF

Examples:
'A' (U+0041): 01000001 (1 byte)
'é' (U+00E9): 11000011 10101001 (2 bytes)
'€' (U+20AC): 11100010 10000010 10101100 (3 bytes)
'😀' (U+1F600): 11110000 10011111 10011000 10000000 (4 bytes)
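
The byte layouts above can be turned into an encoder fairly directly. The following is a minimal sketch, not a complete Unicode library; the function name encodeUtf8 is chosen for illustration, and it assumes the caller passes a code point as a plain 32-bit integer.

```cpp
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8 bytes (hypothetical helper).
// Returns an empty string for values that cannot be encoded:
// surrogates (U+D800-U+DFFF) and anything above U+10FFFF.
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out; // surrogate: not encodable
        // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encodeUtf8(0x20AC) should yield the three bytes E2 82 AC, matching the '€' row in the examples.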

C++ UTF-8 Example

#include <iostream>
#include <string>
#include <iomanip>
using namespace std;

void printUtf8Bytes(const string& str) {
    cout << "UTF-8 encoding of \"" << str << "\":" << endl;

    for (unsigned char c : str) {
        cout << hex << setw(2) << setfill('0') << (int)c << " ";
    }
    cout << dec << endl;
}

int utf8CharLength(unsigned char firstByte) {
    if ((firstByte & 0x80) == 0) return 1;      // 0xxxxxxx
    if ((firstByte & 0xE0) == 0xC0) return 2;   // 110xxxxx
    if ((firstByte & 0xF0) == 0xE0) return 3;   // 1110xxxx
    if ((firstByte & 0xF8) == 0xF0) return 4;   // 11110xxx
    return 0; // Invalid
}

int main() {
    string text = "Hello 世界 😀";

    printUtf8Bytes(text);

    cout << "\nCharacter breakdown:" << endl;
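    size_t i = 0;
    while (i < text.length()) {
        int len = utf8CharLength((unsigned char)text[i]);
        if (len == 0) len = 1; // Invalid lead byte: step over it so the loop always advances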
        cout << "Character: ";
        for (int j = 0; j < len && i + j < text.length(); j++) {
            cout << hex << setw(2) << setfill('0')
                 << (int)(unsigned char)text[i + j] << " ";
        }
        cout << dec << "(" << len << " bytes)" << endl;
        i += len;
    }

    return 0;
}
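
Because every continuation byte has the form 10xxxxxx, the number of characters (code points) in a UTF-8 string can be found by counting the bytes that are not continuation bytes. The sketch below illustrates that idea; countCodePoints is a name made up for this example, and it assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Count code points by skipping continuation bytes (10xxxxxx).
// Assumption: the input is valid UTF-8; no validation is performed.
std::size_t countCodePoints(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count; // lead byte or single-byte (ASCII) character
        }
    }
    return count;
}
```

This is why text.length() on a std::string reports bytes, not characters: the two values differ as soon as the text contains anything outside ASCII.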

Hexadecimal Encoding

Hexadecimal: Base-16 representation (0-9, A-F). Each hex digit represents 4 bits, so one byte is always written as exactly two hex digits.

Uses

  • Memory addresses
  • Color codes (#FF5733)
  • Hash values (SHA-256)
  • Binary file dumps
  • MAC addresses
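
As a small illustration of one of these uses, a color code such as #FF5733 is three bytes (red, green, blue) written as six hex digits. The sketch below parses that form; the Rgb struct and the helper names are assumptions made for this example, and only the "#RRGGBB" form is handled.

```cpp
#include <cctype>
#include <string>

struct Rgb { int r, g, b; };

// Value of one hex digit, or -1 if the character is not a hex digit.
int hexDigitValue(char ch) {
    unsigned char c = static_cast<unsigned char>(ch);
    if (std::isdigit(c)) return c - '0';
    c = static_cast<unsigned char>(std::tolower(c));
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Parse "#RRGGBB" into its components. Returns false on malformed input.
bool parseHexColor(const std::string& text, Rgb& out) {
    if (text.size() != 7 || text[0] != '#') return false;
    int v[3];
    for (int i = 0; i < 3; i++) {
        int hi = hexDigitValue(text[1 + 2 * i]);
        int lo = hexDigitValue(text[2 + 2 * i]);
        if (hi < 0 || lo < 0) return false;
        v[i] = hi * 16 + lo; // high digit is worth 16x the low digit
    }
    out = {v[0], v[1], v[2]};
    return true;
}
```

For "#FF5733" the components work out to r = 255, g = 87, b = 51.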

C Implementation

#include <stdio.h>
#include <string.h>
#include <ctype.h>

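// Caller must provide room for len * 2 + 1 chars in hex
void bytesToHex(const unsigned char* bytes, int len, char* hex) {
    for (int i = 0; i < len; i++) {
        sprintf(hex + (i * 2), "%02x", bytes[i]);
    }
    hex[len * 2] = '\0';
}

// Returns the number of bytes written, or -1 if the input is not valid hex
int hexToBytes(const char* hex, unsigned char* bytes) {
    size_t len = strlen(hex);
    if (len % 2 != 0) return -1; // Invalid hex string: odd number of digits

    for (size_t i = 0; i < len / 2; i++) {
        // Reject non-hex characters before parsing the pair
        if (!isxdigit((unsigned char)hex[i * 2]) ||
            !isxdigit((unsigned char)hex[i * 2 + 1])) {
            return -1;
        }
        sscanf(hex + (i * 2), "%2hhx", &bytes[i]);
    }

    return (int)(len / 2);
}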

int main() {
    unsigned char data[] = "Hello";
    char hex[100];

    bytesToHex(data, 5, hex);
    printf("Bytes to Hex: %s\n", hex);

    unsigned char decoded[100];
    int decodedLen = hexToBytes(hex, decoded);
    decoded[decodedLen] = '\0';
    printf("Hex to Bytes: %s\n", decoded);

    return 0;
}

C++ Implementation

#include <iostream>
#include <string>
#include <sstream>
#include <iomanip>
#include <cstdlib>   // strtol
using namespace std;

string bytesToHex(const unsigned char* bytes, size_t len) {
    stringstream ss;
    for (size_t i = 0; i < len; i++) {
        ss << hex << setw(2) << setfill('0') << (int)bytes[i];
    }
    return ss.str();
}

string hexToBytes(const string& hex) {
    string bytes;
    if (hex.length() % 2 != 0) return bytes; // Invalid: odd number of hex digits
    for (size_t i = 0; i < hex.length(); i += 2) {
        string byteString = hex.substr(i, 2);
        unsigned char byte = (unsigned char)strtol(byteString.c_str(), nullptr, 16);
        bytes += byte;
    }
    return bytes;
}

int main() {
    string data = "Hello";

    string hex = bytesToHex(reinterpret_cast<const unsigned char*>(data.data()), data.length());
    cout << "Bytes to Hex: " << hex << endl;

    string decoded = hexToBytes(hex);
    cout << "Hex to Bytes: " << decoded << endl;

    return 0;
}

Java Implementation

public class HexEncoding {
    private static final char[] HEX_ARRAY = "0123456789abcdef".toCharArray();

    public static String bytesToHex(byte[] bytes) {
        char[] hexChars = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;
            hexChars[i * 2] = HEX_ARRAY[v >>> 4];
            hexChars[i * 2 + 1] = HEX_ARRAY[v & 0x0F];
        }
        return new String(hexChars);
    }

    public static byte[] hexToBytes(String hex) {
        int len = hex.length();
        if (len % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even length");
        }
        byte[] bytes = new byte[len / 2];

        for (int i = 0; i < len; i += 2) {
            bytes[i / 2] = (byte) ((Character.digit(hex.charAt(i), 16) << 4)
                                 + Character.digit(hex.charAt(i + 1), 16));
        }

        return bytes;
    }

    public static void main(String[] args) {
        String data = "Hello";
        byte[] bytes = data.getBytes();

        String hex = bytesToHex(bytes);
        System.out.println("Bytes to Hex: " + hex);

        byte[] decoded = hexToBytes(hex);
        System.out.println("Hex to Bytes: " + new String(decoded));
    }
}

Base64 Encoding

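Base64: Encodes binary data as ASCII text using 64 characters (A-Z, a-z, 0-9, +, /), plus '=' for padding. Every 3 input bytes become 4 output characters, so the output is about 33% larger than the input.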

How It Works

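1. Take 3 bytes (24 bits)
2. Split into four 6-bit groups
3. Map each 6-bit value (0-63) to its Base64 character
4. If the final group has only 1 or 2 bytes, fill the missing bits with zeros and append "==" or "=" so the output length stays a multiple of 4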

Example: "Man"
M = 77 = 01001101
a = 97 = 01100001
n = 110 = 01101110

Combined: 010011010110000101101110

Split into 6-bit groups:
010011 = 19 = 'T'
010110 = 22 = 'W'
000101 = 5  = 'F'
101110 = 46 = 'u'

Result: "TWFu"
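
The split in the worked example, and the length and padding rules, can be checked with a few lines of code. This is a sketch for illustration only; the function names are invented here and it does not produce the encoded text itself.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Split three bytes into the four 6-bit indices used to look up
// Base64 characters (hypothetical helper for illustration).
std::array<int, 4> splitIntoSextets(std::uint8_t a, std::uint8_t b, std::uint8_t c) {
    std::uint32_t triple = (std::uint32_t(a) << 16) | (std::uint32_t(b) << 8) | c;
    return {{ int((triple >> 18) & 0x3F),
              int((triple >> 12) & 0x3F),
              int((triple >> 6) & 0x3F),
              int(triple & 0x3F) }};
}

// Number of output characters (including '=' padding) for n input bytes.
std::size_t base64EncodedLength(std::size_t n) {
    return 4 * ((n + 2) / 3);
}

// Number of '=' characters at the end of the output: 0, 1, or 2.
std::size_t base64PaddingCount(std::size_t n) {
    return (3 - n % 3) % 3;
}
```

For 'M', 'a', 'n' the indices are 19, 22, 5, 46, matching the table above. With shorter inputs the padding rule applies: "Ma" (2 bytes) encodes to "TWE=" and "M" (1 byte) encodes to "TQ==".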

C Implementation

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdint.h>   // uint32_t

static const char base64_chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

char* base64Encode(const unsigned char* data, size_t input_length) {
    size_t output_length = 4 * ((input_length + 2) / 3);
    char* encoded = (char*)malloc(output_length + 1);

    if (!encoded) return NULL;

    size_t i, j;
    for (i = 0, j = 0; i < input_length;) {
        uint32_t octet_a = i < input_length ? data[i++] : 0;
        uint32_t octet_b = i < input_length ? data[i++] : 0;
        uint32_t octet_c = i < input_length ? data[i++] : 0;

        uint32_t triple = (octet_a << 16) + (octet_b << 8) + octet_c;

        encoded[j++] = base64_chars[(triple >> 18) & 0x3F];
        encoded[j++] = base64_chars[(triple >> 12) & 0x3F];
        encoded[j++] = base64_chars[(triple >> 6) & 0x3F];
        encoded[j++] = base64_chars[triple & 0x3F];
    }

    // Padding
    for (i = 0; i < (3 - input_length % 3) % 3; i++) {
        encoded[output_length - 1 - i] = '=';
    }

    encoded[output_length] = '\0';
    return encoded;
}

int main() {
    const char* input = "Man";
    char* encoded = base64Encode((unsigned char*)input, strlen(input));

    printf("Original: %s\n", input);
    printf("Base64: %s\n", encoded);

    free(encoded);
    return 0;
}

C++ Implementation

#include <iostream>
#include <string>
#include <cctype>   // isalnum
using namespace std;

const string base64_chars =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

string base64Encode(const unsigned char* data, size_t len) {
    string encoded;
    int i = 0;
    unsigned char char_array_3[3];
    unsigned char char_array_4[4];

    while (len--) {
        char_array_3[i++] = *(data++);

        if (i == 3) {
            char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
            char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
            char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
            char_array_4[3] = char_array_3[2] & 0x3f;

            for (i = 0; i < 4; i++) {
                encoded += base64_chars[char_array_4[i]];
            }
            i = 0;
        }
    }

    if (i) {
        for (int j = i; j < 3; j++) {
            char_array_3[j] = '\0';
        }

        char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
        char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
        char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);

        for (int j = 0; j < i + 1; j++) {
            encoded += base64_chars[char_array_4[j]];
        }

        while (i++ < 3) {
            encoded += '=';
        }
    }

    return encoded;
}

string base64Decode(const string& encoded) {
    int in_len = encoded.size();
    int i = 0, j = 0, in = 0;
    unsigned char char_array_4[4], char_array_3[3];
    string decoded;

    // Stop at padding or at the first character outside the Base64 alphabet
    while (in_len-- && (encoded[in] != '=') &&
           (isalnum((unsigned char)encoded[in]) || encoded[in] == '+' || encoded[in] == '/')) {
        char_array_4[i++] = encoded[in++];

        if (i == 4) {
            for (i = 0; i < 4; i++) {
                char_array_4[i] = base64_chars.find(char_array_4[i]);
            }

            char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
            char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
            char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];

            for (i = 0; i < 3; i++) {
                decoded += char_array_3[i];
            }
            i = 0;
        }
    }

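    if (i) {
        // Zero the unused slots so the bit arithmetic below reads defined values
        for (j = i; j < 4; j++) {
            char_array_4[j] = 0;
        }
        for (j = 0; j < i; j++) {
            char_array_4[j] = base64_chars.find(char_array_4[j]);
        }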

        char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
        char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);

        for (j = 0; j < i - 1; j++) {
            decoded += char_array_3[j];
        }
    }

    return decoded;
}

int main() {
    string input = "Hello, World!";

    cout << "Original: " << input << endl;

    string encoded = base64Encode((unsigned char*)input.c_str(), input.length());
    cout << "Base64: " << encoded << endl;

    string decoded = base64Decode(encoded);
    cout << "Decoded: " << decoded << endl;

    return 0;
}

Java Implementation (using built-in)

import java.util.Base64;
import java.nio.charset.StandardCharsets;

public class Base64Encoding {
    public static void main(String[] args) {
        String input = "Hello, World!";

        // Encode
        String encoded = Base64.getEncoder().encodeToString(input.getBytes(StandardCharsets.UTF_8));
        System.out.println("Original: " + input);
        System.out.println("Base64: " + encoded);

        // Decode
        byte[] decoded = Base64.getDecoder().decode(encoded);
        String decodedStr = new String(decoded, StandardCharsets.UTF_8);
        System.out.println("Decoded: " + decodedStr);
    }
}
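
The standard alphabet includes '+' and '/', which have special meaning in URLs. RFC 4648 defines a URL-safe variant ("base64url") that uses '-' and '_' instead. The sketch below converts between the two forms; the function names are illustrative, and dropping the '=' padding is a common convention rather than a requirement, so keep it if the consumer expects padded input.

```cpp
#include <string>

// Convert standard Base64 text to the URL-safe alphabet of RFC 4648.
// Assumption: trailing '=' padding is dropped, as many URL formats do.
std::string toBase64Url(std::string b64) {
    for (char& c : b64) {
        if (c == '+') c = '-';
        else if (c == '/') c = '_';
    }
    while (!b64.empty() && b64.back() == '=') {
        b64.pop_back();
    }
    return b64;
}

// Convert URL-safe Base64 back to the standard alphabet and restore
// padding so the length is a multiple of 4 again.
std::string fromBase64Url(std::string text) {
    for (char& c : text) {
        if (c == '-') c = '+';
        else if (c == '_') c = '/';
    }
    while (text.size() % 4 != 0) {
        text += '=';
    }
    return text;
}
```

The result of fromBase64Url can then be passed to an ordinary Base64 decoder.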

URL Encoding (Percent Encoding)

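URL encoding: Encodes special characters as %XX, where XX is the byte's value in hexadecimal. Unreserved characters (A-Z, a-z, 0-9, '-', '_', '.', '~') are left as-is; non-ASCII characters are first converted to UTF-8 bytes, and each byte is encoded separately.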

Characters Requiring Encoding

Space: %20
!: %21
#: %23
$: %24
%: %25
&: %26
': %27
(: %28
): %29

C++ Implementation

#include <iostream>
#include <string>
#include <sstream>
#include <iomanip>
#include <cctype>   // isalnum, isxdigit
using namespace std;

string urlEncode(const string& str) {
    ostringstream escaped;
    escaped.fill('0');
    escaped << hex;

    for (char c : str) {
        // Cast before isalnum: passing a negative char is undefined behavior
        if (isalnum((unsigned char)c) || c == '-' || c == '_' || c == '.' || c == '~') {
            escaped << c;
        }
        else {
            escaped << uppercase;
            escaped << '%' << setw(2) << int((unsigned char)c);
            escaped << nouppercase;
        }
    }

    return escaped.str();
}

string urlDecode(const string& str) {
    string decoded;
    for (size_t i = 0; i < str.length(); i++) {
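        if (str[i] == '%') {
            if (i + 2 < str.length() &&
                isxdigit((unsigned char)str[i + 1]) &&
                isxdigit((unsigned char)str[i + 2])) {
                string hex = str.substr(i + 1, 2);
                int value = stoi(hex, nullptr, 16);
                decoded += static_cast<char>(value);
                i += 2;
            }
            else {
                decoded += str[i]; // Malformed escape: keep the '%' literally
            }
        }
        else if (str[i] == '+') {
            decoded += ' '; // Form-encoding convention (application/x-www-form-urlencoded)
        }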
        else {
            decoded += str[i];
        }
    }
    return decoded;
}

int main() {
    string url = "Hello World! @#$";

    cout << "Original: " << url << endl;

    string encoded = urlEncode(url);
    cout << "URL Encoded: " << encoded << endl;

    string decoded = urlDecode(encoded);
    cout << "Decoded: " << decoded << endl;

    return 0;
}

Comparison of Encoding Schemes

Encoding      Purpose         Output Size     Use Case
ASCII         Text            1 byte/char     English text
UTF-8         Unicode text    1-4 bytes/char  International text
Hex           Binary display  2 chars/byte    Debugging, hashes
Base64        Binary in text  ~133%           Email, JSON, URLs
URL Encoding  Web URLs        Varies          Query strings

Common Mistakes

  1. Confusing encoding with encryption - Encoding is not secure
  2. Not handling padding - Base64 requires proper padding
  3. Character set issues - Assuming ASCII when UTF-8 is needed
  4. Buffer overflows - Not allocating enough space for encoded data
  5. Double encoding - Encoding already encoded data
  6. Wrong URL encoding - Not encoding all special characters
  7. Byte order issues - Endianness matters for UTF-16 and UTF-32; UTF-8 is a byte sequence and is not affected
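
Double encoding (mistake 5) is easy to reproduce. The sketch below uses a small percent-encoder written for this illustration (percentEncode is an assumed name, not a library function) to show what happens when already-encoded text is encoded again.

```cpp
#include <cctype>
#include <string>

// Percent-encode every byte except the unreserved characters
// A-Z, a-z, 0-9, '-', '_', '.', '~' (illustrative helper).
std::string percentEncode(const std::string& s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += digits[c >> 4];   // high nibble
            out += digits[c & 0x0F]; // low nibble
        }
    }
    return out;
}
```

Encoding "a b" once gives "a%20b". Encoding that result again gives "a%2520b", because the '%' itself is now escaped as %25. A receiver that decodes only once would see the literal text "a%20b" instead of "a b".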

Debugging Tips

  1. Use online tools - Verify encoding/decoding
  2. Print byte values - Inspect actual bytes
  3. Test with known inputs - "Man" → "TWFu" for Base64
  4. Check padding - Base64 padding must be correct
  5. Validate UTF-8 - Ensure valid byte sequences
  6. Compare with standards - RFC specifications
  7. Handle edge cases - Empty strings, special characters

Mini Exercises

  1. Implement Base32 encoding
  2. Convert between different encodings
  3. Validate UTF-8 sequences
  4. Implement ROT13 cipher
  5. Create ASCII art encoder
  6. Build HTML entity encoder
  7. Implement quoted-printable encoding
  8. Create Unicode normalizer
  9. Build Base85 (ASCII85) encoder
  10. Implement punycode for IDN

Review Questions

  1. What's the difference between encoding and encryption?
  2. Why is Base64 used for binary data in emails?
  3. How does UTF-8 maintain ASCII compatibility?
  4. When should you use URL encoding?
  5. What are the advantages of hexadecimal representation?

Reference Checklist

By the end of this chapter, you should be able to:

  • Convert between ASCII and binary
  • Understand UTF-8 encoding
  • Implement hexadecimal encoding/decoding
  • Implement Base64 encoding/decoding
  • Use URL encoding for web applications
  • Choose appropriate encoding for use case
  • Handle multi-byte character encodings
  • Debug encoding issues

Next Steps

Chapter 18 explores advanced algorithm paradigms—Divide & Conquer, Greedy, Dynamic Programming, and Backtracking. These fundamental techniques solve complex problems efficiently.


Key Takeaway: Encoding transforms data representation without changing meaning. Base64 enables binary data in text formats. UTF-8 supports international characters. Hexadecimal provides readable binary representation. Understanding encoding is essential for web development, data interchange, and system integration.