Chapter 0: Introduction to Version Control

Before diving into Git, it's essential to understand what Version Control Systems (VCS) are and why they are the foundation of modern software development.

What is Version Control?

Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. The examples in this book use software source code as the versioned files, though in practice you can version nearly any type of file on a computer.

If you are a graphic or web designer and want to keep every version of an image or layout (which you would most certainly want to), a Version Control System (VCS) is a very wise thing to use. It allows you to revert selected files back to a previous state, revert the entire project back to a previous state, compare changes over time, see who last modified something that might be causing a problem, who introduced an issue and when, and more. Using a VCS also generally means that if you screw things up or lose files, you can easily recover.

The Evolution of VCS

1. Local Version Control Systems

Many people's version-control method of choice is to copy files into another directory (perhaps a time-stamped directory, if they're clever). This approach is very common because it is so simple, but it is also incredibly error-prone. It is easy to forget which directory you're in and accidentally write to the wrong file or copy over files you don't mean to. To deal with this, programmers developed local VCSs, such as RCS, that keep all the changes to files under revision control in a simple database on the local machine.
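
The manual copy-into-a-directory approach can be sketched in a few lines of Python. This is only an illustration of the habit described above, not a recommended workflow; the `snapshot` function and the directory names are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot(project_dir: Path, backup_root: Path) -> Path:
    """Copy project_dir into a new time-stamped directory under backup_root."""
    # Microseconds keep directory names distinct for snapshots taken close together.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    target = backup_root / f"{project_dir.name}-{stamp}"
    # copytree creates the target (and missing parents) and fails if it already exists.
    shutil.copytree(project_dir, target)
    return target
```

Calling `snapshot(Path("my-project"), Path("backups"))` would produce a full copy of a hypothetical `my-project` directory each time. Nothing in this scheme records what changed or why, and nothing stops you from editing files inside an old snapshot by mistake.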

2. Centralized Version Control Systems (CVCS)

To deal with the problem of collaborating with developers on other systems, Centralized Version Control Systems (CVCSs) were developed. These systems, such as Subversion (SVN) and Perforce, have a single server that contains all the versioned files, and a number of clients that check out files from that central place.

For many years, this was the standard for version control. However, this setup has some serious downsides. The most obvious is the single point of failure that the centralized server represents. If that server goes down for an hour, then during that hour no one can collaborate at all or save versioned changes to anything they're working on. If the hard disk the central database is on becomes corrupted, and proper backups haven't been kept, you lose everything: the entire history of the project, except whatever single snapshots people happen to have on their local machines.

3. Distributed Version Control Systems (DVCS)

This is where Git comes in. In a DVCS (such as Git, Mercurial, or Bazaar), clients don't just check out the latest snapshot of the files; rather, they fully mirror the repository, including its full history. Thus, if any server dies, and these systems were collaborating via that server, any of the client repositories can be copied back up to the server to restore it. Every clone is really a full backup of all the data.

Why Use Git?

Speed: Most operations are performed locally, making them nearly instantaneous.
Offline Support: You can commit, branch, and browse history without an internet connection.
Data Integrity: Everything in Git is checksummed before it is stored and is then referred to by that checksum (a SHA-1 hash by default). It's impossible to change the contents of any file or directory without Git knowing about it.
Branching Model: Git allows and encourages you to have multiple local branches that can be entirely independent of each other. Creating, merging, and deleting those lines of development take seconds.
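
To make the Data Integrity point concrete, here is a minimal Python sketch of checksum-addressed storage. It is a toy model, not Git's implementation: `ToyObjectStore` and its methods are hypothetical names, and objects live in an in-memory dictionary rather than on disk. The `blob_id` function follows the way Git computes IDs for file contents in SHA-1 repositories, hashing a short `blob <size>` header, a NUL byte, and then the content.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 ID that Git assigns to file content (a "blob")."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory store that keys every object by its checksum."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        key = blob_id(content)
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        content = self._objects[key]
        # Re-hash on read: any change to the stored bytes changes the checksum.
        if blob_id(content) != key:
            raise ValueError(f"object {key} does not match its checksum")
        return content


store = ToyObjectStore()
key = store.put(b"hello world\n")
assert store.get(key) == b"hello world\n"

# Simulate silent corruption of the stored bytes.
store._objects[key] = b"hello w0rld\n"
try:
    store.get(key)
except ValueError as err:
    print("corruption detected:", err)
```

Because the key is derived from the content, identical content always gets the same key, and altered content can no longer pass for the original under its old key.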

In the next chapter, we'll explore the fascinating history of how Git was born out of a crisis in the Linux kernel community.