Git Architecture and Internals

To truly master Git, you must understand how it works under the hood. Git is not just a version control system; it is a content-addressable filesystem with a Version Control System (VCS) built on top of it. This chapter dives into the plumbing that makes Git so efficient and reliable.


1. The Heart of Git: The .git Folder

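When you run git init, Git creates a hidden directory named .git that holds the entire repository: every object, ref, and setting. If you delete this folder, your project becomes a plain directory of files again, and the local history is lost unless a copy exists elsewhere (for example, in another clone or on a remote).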

Key Components:

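  • objects/: The core "database." Every piece of content (files, directories, commits) is stored here under a name derived from its SHA-1 hash. New objects start out as individual zlib-compressed files ("loose" objects); Git later combines many of them into packfiles under objects/pack/.
  • refs/: Pointers to the tips of your branches and your tags. A branch is literally just a small text file containing the SHA-1 of its latest commit (Git may later consolidate these files into a single packed-refs file).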
  • HEAD: A reference to the branch you currently have checked out (e.g., ref: refs/heads/main).
  • index (The Staging Area): A binary file that stores a "map" of your project. It records what your next commit will look like.
  • config: Stores repository-specific settings, such as remote URLs, branch tracking information, and aliases that apply only to this repository.
  • hooks/: A collection of scripts that Git can run automatically before or after commands like commit or push.
  • info/exclude: Works like .gitignore, but the file itself is never committed, so its ignore patterns stay local to your copy of the repository.
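To make the relationship between HEAD and refs/ concrete, here is a minimal Python sketch that follows HEAD to a commit hash. The function name resolve_head is hypothetical, and the sketch assumes refs are stored as loose files; it does not read packed-refs.

```python
from pathlib import Path
from typing import Optional


def resolve_head(git_dir: Path) -> Optional[str]:
    """Return the commit hash HEAD points to, or None if no ref file is found.

    Simplification: only loose ref files are read; refs stored in
    .git/packed-refs are not handled.
    """
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Symbolic ref such as "ref: refs/heads/main": follow it to the branch file.
        ref_file = git_dir / head[len("ref: "):]
        if not ref_file.exists():
            return None  # no commits yet on this branch, or the ref is packed
        return ref_file.read_text().strip()
    # Detached HEAD: the file holds the commit hash directly.
    return head
```

Calling resolve_head(Path(".git")) from the top of a repository should return the same hash that git rev-parse HEAD prints, as long as the current branch has not been packed.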

2. The Four Git Objects

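Git stores its history using a hierarchy of immutable objects. Every object is identified by the SHA-1 hash of its contents, written as 40 hexadecimal characters. (Git can also create repositories that use SHA-256; this chapter assumes the default SHA-1.)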

1. Blobs (Binary Large Objects)

A blob stores the data of a file, but not its filename. Because a blob is named after its content, two files with exactly the same content share a single blob, and the trees that list those files both point to it. This deduplication is one reason Git repositories stay compact; packfile compression is another.
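The following Python sketch shows how a blob's name is derived: Git hashes a short header ("blob", the content length, and a NUL byte) followed by the file's bytes. The function name blob_id and the sample content are illustrative choices, and the sketch assumes a SHA-1 repository.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


readme = b"hello world\n"
copy_of_readme = b"hello world\n"  # same bytes under a different (hypothetical) filename

print(blob_id(readme))
print(blob_id(readme) == blob_id(copy_of_readme))  # True: one blob serves both files
```

The filename never enters the hash, which is why the two files resolve to the same object. The printed ID should match what git hash-object reports for a file with the same bytes.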

2. Trees

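A tree object is the Git equivalent of a directory. It lists filenames and pointers to the blobs (files) or other trees (subdirectories) contained within it. It also stores a mode for each entry, which records whether the entry is a regular file, an executable file, a symbolic link, or a subdirectory. Git does not track full Unix permissions, only whether a file is executable.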
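As a rough illustration of what a tree contains, the Python sketch below serializes a list of entries (mode, name, and object ID) and hashes the result the way Git names a tree. The names Entry and tree_id are hypothetical. It sorts entries by plain byte order of their names, which is a simplification: Git compares subdirectory names as if they ended in "/".

```python
import hashlib
from typing import List, Tuple

# (mode, name, hex object ID)
# "100644" = regular file, "100755" = executable file, "40000" = subdirectory
Entry = Tuple[str, str, str]


def tree_id(entries: List[Entry]) -> str:
    """Compute the SHA-1 object ID of a tree built from the given entries."""
    body = b""
    # Simplified ordering: plain byte order of the entry names.
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        # Each entry is "<mode> <name>\0" followed by the 20 raw bytes of the ID.
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(hex_id)
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


readme_blob = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"  # blob ID of b"hello world\n"
before = tree_id([("100644", "README", readme_blob)])
after = tree_id([("100644", "NOTES", readme_blob)])
print(before != after)  # True: a rename changes the tree, while the blob ID is untouched
```

Since names live in the tree and content lives in the blob, renaming a file produces a new tree but reuses the existing blob.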

3. Commits

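A commit object records a snapshot of the project by pointing to its root tree at a specific point in time. It includes:

  • A pointer to the root tree.
  • The author and committer names/emails, each with its own timestamp.
  • A log message.
  • Pointers to the parent commit(s): none for the first commit, one for an ordinary commit, and two or more for a merge.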
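The fields listed above can be sketched as a small Python function that assembles the text of a commit object and hashes it. The name commit_id, the identity string, and the timestamp are hypothetical, and the sketch leaves out optional headers such as signatures and encoding.

```python
import hashlib
from typing import Sequence


def commit_id(tree: str, parents: Sequence[str], author: str,
              committer: str, message: str) -> str:
    """Compute the SHA-1 object ID of a commit built from the given fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the log message.
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # well-known ID of the empty tree
WHO = "A U Thor <author@example.com> 1700000000 +0000"  # hypothetical identity and timestamp

root = commit_id(EMPTY_TREE, [], WHO, WHO, "Initial commit\n")
child = commit_id(EMPTY_TREE, [root], WHO, WHO, "Second commit\n")

# Rewording the root commit gives it a new ID, so a child built on it changes too.
reworded_root = commit_id(EMPTY_TREE, [], WHO, WHO, "Initial commit (reworded)\n")
rebuilt_child = commit_id(EMPTY_TREE, [reworded_root], WHO, WHO, "Second commit\n")
print(root != reworded_root, child != rebuilt_child)  # True True
```

Because the parent's ID is part of the hashed text, any change to an earlier commit ripples forward through every descendant.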

4. Annotated Tags

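A tag object usually points to a commit (although it can point to any object) and contains the tag name, the tagger's name and email, a timestamp, and a message. It is essentially a named pointer to a specific point in history that, by convention, does not move. A lightweight tag, by contrast, is only a ref under refs/tags/ and has no object of its own.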

[Diagram: a Commit points to a Tree, which points to two Blobs (files).]


3. The Content-Addressable Database

Every object in Git is named using a SHA-1 checksum. This means:

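  1. Data Integrity: If an object is corrupted on disk, its content no longer matches its name, and Git detects the mismatch when it reads or verifies the object (for example, with git fsck).
  2. Immutability: You cannot change a commit's message or content without changing its SHA-1 hash. Because each commit's hash also covers the hashes of its parents, changing any commit changes the hash of every commit that descends from it. This creates a chain of trust.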
  3. Deduplication: As mentioned before, identical files across different branches or history only take up space once.
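The three properties above can be illustrated with a toy in-memory object store in Python. The class ObjectStore and its methods are hypothetical and only model the idea; Git itself keeps objects on disk as loose files and packfiles.

```python
import hashlib
import zlib
from typing import Dict


class ObjectStore:
    """Toy content-addressable store: objects are keyed by the hash of their bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, kind: str, content: bytes) -> str:
        raw = kind.encode("ascii") + b" " + str(len(content)).encode("ascii") + b"\0" + content
        oid = hashlib.sha1(raw).hexdigest()
        # Deduplication: identical content maps to the same key, so it is stored once.
        self._objects.setdefault(oid, zlib.compress(raw))
        return oid

    def get(self, oid: str) -> bytes:
        raw = zlib.decompress(self._objects[oid])
        # Integrity check: the stored bytes must still hash to their own name.
        if hashlib.sha1(raw).hexdigest() != oid:
            raise ValueError(f"object {oid} is corrupt")
        _header, _, content = raw.partition(b"\0")
        return content

    def __len__(self) -> int:
        return len(self._objects)


store = ObjectStore()
first = store.put("blob", b"same content\n")
second = store.put("blob", b"same content\n")
print(first == second, len(store))  # True 1
```

Storing the same bytes twice leaves a single entry, and get refuses to return data whose hash no longer matches its key.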

4. The Three States of Git

Git manages your project across three distinct areas. Understanding the transitions between these areas is key to using Git effectively.

  1. Working Directory: The actual files on your disk that you are currently editing.
  2. Staging Area (Index): A "waiting room" for changes. When you run git add, Git records a snapshot of the file as it is at that moment; later edits are not staged until you run git add again.
  3. Repository (.git folder): Where Git permanently stores snapshots as commits.

The Lifecycle of a Change:

  • Modified: You change a file in your working directory.
  • Staged: You mark the modified file in its current version to go into your next commit.
  • Committed: Git takes the staged files and stores them permanently in the repository.

[Diagram: Working Directory to Staging Area via git add; Staging Area to Git Repository via git commit; Git Repository back to Working Directory via git checkout / switch.]
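The lifecycle can be modeled with a few dictionaries. The Python sketch below is a toy, not how Git is implemented: the class TinyRepo and its methods are hypothetical, file contents are plain strings, and a file that is both staged and edited again is reported only as "modified".

```python
from typing import Dict, List


class TinyRepo:
    """Toy model of Git's three areas; each maps a path to file contents."""

    def __init__(self) -> None:
        self.working_dir: Dict[str, str] = {}
        self.index: Dict[str, str] = {}
        self.commits: List[Dict[str, str]] = []

    def add(self, path: str) -> None:
        # Staging copies the file as it is right now into the index.
        self.index[path] = self.working_dir[path]

    def commit(self) -> None:
        # A commit is a snapshot of the index, not of the working directory.
        self.commits.append(dict(self.index))

    def status(self, path: str) -> str:
        if path not in self.index:
            return "untracked"
        if self.working_dir.get(path) != self.index[path]:
            return "modified"
        head = self.commits[-1] if self.commits else {}
        if head.get(path) != self.index[path]:
            return "staged"
        return "committed"


repo = TinyRepo()
states = []
repo.working_dir["notes.txt"] = "v1"
states.append(repo.status("notes.txt"))
repo.add("notes.txt")
states.append(repo.status("notes.txt"))
repo.commit()
states.append(repo.status("notes.txt"))
repo.working_dir["notes.txt"] = "v2"  # edit after committing
states.append(repo.status("notes.txt"))
print(states)  # ['untracked', 'staged', 'committed', 'modified']
```

The model makes one point explicit: commit reads from the index, so an edit made after git add is not part of the next commit until it is staged again.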


5. Summary: Why It Matters

Git’s architecture is designed for speed and reliability. Because history is a Directed Acyclic Graph (DAG) of immutable objects, creating a branch only requires writing a new ref that points at an existing commit, and a merge can locate the common ancestor by following parent pointers. In the next chapter, we will see how these internal mechanisms are triggered by the commands you use every day.