Internals of Git [TBD]
Git's internal workings are quite complex, involving a range of data structures and algorithms to manage the repository's history, branches, commits, and more. Here's a high-level overview of some key aspects of how Git works internally.
The object model in Git is a core component that underpins how Git stores and manages data. All of a repository's content (file data, directory listings, commits, and annotated tags) is stored as objects, and these objects are content-addressable, meaning each one is referenced by a hash of its content. The four object types are blobs, trees, commits, and tags. Here's a detailed look at each of these objects and how they work together.
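Blob:
Purpose: Represents the content of a file.
Content: Stores the raw file data (i.e., the contents of a file).
Identification: Identified by the SHA-1 hash of a short header (the word "blob", a space, the content size in bytes, and a NUL byte) followed by the file content.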
Metadata: Does not store any metadata such as the file name or permissions.
Example of creating a blob:
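As an illustration of the identification rule, here is a minimal Python sketch (not Git's own code) of how a blob's ID is computed: Git hashes a short header plus the raw bytes, and nothing else. The function name hash_blob is hypothetical; for the same bytes, the result should match the ID printed by git hash-object.

```python
import hashlib


def hash_blob(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes a header ("blob", a space, the byte length in decimal, a NUL
    # byte) followed by the raw file bytes. No file name or mode is included,
    # so two files with identical content share one blob.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(hash_blob(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
    print(hash_blob(b"hello world\n"))
```

Because the file name is not part of the hash, renaming a file without editing it reuses the existing blob.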
Tree:
Purpose: Represents a directory and its contents.
Content: Contains references (hashes) to blobs and other trees, along with the associated metadata (file names, types, and modes).
Identification: Identified by a SHA-1 hash of its contents.
Example of creating a tree:
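A minimal Python sketch of how a tree's bytes are laid out and hashed, assuming the object format Git uses for trees: each entry is an ASCII mode (100644 for a regular file, 40000 for a subdirectory), a space, the name, a NUL byte, and the 20-byte binary ID of the referenced blob or tree. The helper names hash_object and tree_body are hypothetical, and the sketch skips the validation Git performs.

```python
import hashlib
from typing import Iterable, Tuple


def hash_object(obj_type: str, body: bytes) -> str:
    """SHA-1 over the header "<type> <size>" plus a NUL byte, followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: Iterable[Tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex_id) entries the way a Git tree stores them."""
    def sort_key(entry: Tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Entries are ordered by name bytes, but a subtree's name is compared
        # as if it ended in "/", so directory "lib" sorts after file "lib.c".
        return name.encode("utf-8") + (b"/" if mode == "40000" else b"")

    out = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Mode and name are text; the object ID is 20 raw bytes, not hex.
        out += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(hex_id)
    return out


if __name__ == "__main__":
    blob_id = hash_object("blob", b"hello world\n")
    # A subdirectory "lib" containing one file, then a root tree listing it.
    subtree_id = hash_object("tree", tree_body([("100644", "util.c", blob_id)]))
    root = tree_body([("100644", "lib.c", blob_id), ("40000", "lib", subtree_id)])
    print(hash_object("tree", root))
```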
Commit:
Purpose: Represents a snapshot of the repository at a point in time.
Content: Contains metadata (author, committer, message), a reference to a tree object (the state of the file system), and references to zero or more parent commits (none for the first commit, two or more for a merge).
Identification: Identified by a SHA-1 hash of its contents.
Example of creating a commit:
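The structure of a commit object can be sketched in Python as plain text: a tree line, one parent line per parent, author and committer lines, a blank line, and the message. This is a simplified illustration with hypothetical helper names; real commits may carry extra headers (for example, a GPG signature) that it leaves out.

```python
import hashlib
from typing import Sequence


def hash_object(obj_type: str, body: bytes) -> str:
    """SHA-1 over the header "<type> <size>" plus a NUL byte, followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree_id: str, parent_ids: Sequence[str], author: str,
                committer: str, message: str) -> bytes:
    """Build the text of a commit object (simplified: no signature headers)."""
    lines = [f"tree {tree_id}"]
    # Zero parents for a root commit, one for a normal commit, two or more for a merge.
    lines += [f"parent {pid}" for pid in parent_ids]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the headers from the free-form message.
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


if __name__ == "__main__":
    # Identity format: name <email> unix-time timezone (values are hypothetical).
    ident = "A U Thor <author@example.com> 1700000000 +0000"
    empty_tree = hash_object("tree", b"")
    root = commit_body(empty_tree, [], ident, ident, "Initial commit\n")
    root_id = hash_object("commit", root)
    child = commit_body(empty_tree, [root_id], ident, ident, "Second commit\n")
    print(child.decode("utf-8"))
    print(hash_object("commit", child))
```

Since the parent's ID is part of the child's content, rewriting any earlier commit changes the IDs of all commits that follow it.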
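Tag:
Purpose: Marks a specific commit as significant, often used to mark release points. Only annotated tags are stored as tag objects; lightweight tags are plain references.
Content: Contains metadata (tagger, message) and a reference to the tagged object, which is usually a commit.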
Identification: Identified by a SHA-1 hash of its contents.
Example of creating an annotated tag:
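An annotated tag object follows the same text-with-headers pattern as a commit. The Python sketch below (helper names and the commit ID are hypothetical, and signatures are omitted) shows the object, type, tag, and tagger headers followed by the message.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """SHA-1 over the header "<type> <size>" plus a NUL byte, followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def tag_body(object_id: str, object_type: str, tag_name: str,
             tagger: str, message: str) -> bytes:
    """Build the text of an annotated tag object (simplified: no signature)."""
    return (
        f"object {object_id}\n"   # ID of the tagged object, usually a commit
        f"type {object_type}\n"   # type of the tagged object
        f"tag {tag_name}\n"       # the tag's name, e.g. v1.0
        f"tagger {tagger}\n"      # name <email> unix-time timezone
        f"\n{message}"            # blank line, then the tag message
    ).encode("utf-8")


if __name__ == "__main__":
    commit_id = "a" * 40  # hypothetical commit ID, for illustration only
    ident = "A U Thor <author@example.com> 1700000000 +0000"
    body = tag_body(commit_id, "commit", "v1.0", ident, "Release 1.0\n")
    print(body.decode("utf-8"))
    print(hash_object("tag", body))
```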
Content-Addressable Storage:
Git uses a SHA-1 hash to identify each object. The hash is computed from the object's type, size, and content, so identical content always produces the same ID and is stored only once.
When stored as a loose object, each object is zlib-compressed and written to the .git/objects directory, in a subdirectory named after the first two characters of the hash, with the remaining 38 characters as the filename.
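The on-disk layout can be sketched in Python as follows, assuming a throwaway directory stands in for .git. The helper write_loose_object is hypothetical, not a Git API: it hashes the header plus body, derives the path from the first two and remaining 38 hex characters, and stores the zlib-compressed bytes. Real Git additionally writes through a temporary file and makes object files read-only.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(git_dir: str, obj_type: str, body: bytes) -> str:
    """Store an object using the loose-object layout and return its hex ID."""
    store = f"{obj_type} {len(body)}\0".encode("ascii") + body
    oid = hashlib.sha1(store).hexdigest()
    # First two hex characters name the subdirectory; the other 38 name the file.
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    if not os.path.exists(path):  # identical content is stored only once
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(store))  # header and body are compressed together
    return oid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:
        oid = write_loose_object(git_dir, "blob", b"hello world\n")
        print(oid)
        print(os.path.join("objects", oid[:2], oid[2:]))
```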
Loose Objects and Packed Objects:
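Loose Objects: Individual zlib-compressed files, one per object, stored in the .git/objects directory. Objects that Git writes locally, for example during git add and git commit, usually start out as loose objects.
Packed Objects: Over time, loose objects are packed into packfiles (for example by git gc) to save space and improve performance. Inside a packfile, objects can be stored as deltas against similar objects. Packfiles are stored in the .git/objects/pack directory, each alongside an .idx index file.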
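Reading a loose object reverses the process. The Python sketch below (read_loose_object is a hypothetical name) inflates the bytes, splits the header from the body at the first NUL byte, checks the declared size, and re-hashes the data to confirm it matches the expected ID, which is how content addressing makes corruption detectable. Packfiles use a separate, delta-compressed format that this sketch does not attempt to parse.

```python
import hashlib
import zlib
from typing import Tuple


def read_loose_object(compressed: bytes, expected_oid: str) -> Tuple[str, bytes]:
    """Decode a loose object's bytes into (type, body), checking size and hash."""
    raw = zlib.decompress(compressed)
    header, sep, body = raw.partition(b"\0")
    if not sep:
        raise ValueError("missing NUL after object header")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    # Re-hash header and body together; a mismatch means the object is corrupt.
    if hashlib.sha1(raw).hexdigest() != expected_oid:
        raise ValueError("hash mismatch: object is corrupt")
    return obj_type, body


if __name__ == "__main__":
    raw = b"blob 12\0hello world\n"
    oid = hashlib.sha1(raw).hexdigest()
    print(oid)
    print(read_loose_object(zlib.compress(raw), oid))
```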
Blob Creation:
When you add a file to Git (git add), Git creates a blob object containing the file's content.
Tree Creation:
When you commit changes (git commit), Git creates a tree object representing the directory structure, containing references to blob objects for files and other tree objects for subdirectories.
Commit Creation:
A commit object is created referencing the tree object and containing metadata about the commit (author, message, parent commits).
Tag Creation:
When you create an annotated tag (git tag -a), Git creates a tag object that references a specific commit and records additional information (tagger, message).
Here's a simple visualization of how these objects might be linked together.
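The linkage can also be traced with a minimal Python sketch: it builds a blob, a tree that lists it, a root commit that points at the tree, and an annotated tag that points at the commit, then prints the chain from the tag down. The identity, file name, messages, and the build_chain helper are all hypothetical. Because each object's ID covers the IDs it embeds, changing the file content would change every ID above it.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """SHA-1 over the header "<type> <size>" plus a NUL byte, followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def build_chain() -> dict:
    """Create one object of each type, each embedding the ID of the previous one."""
    ident = "A U Thor <author@example.com> 1700000000 +0000"  # hypothetical identity

    blob_id = hash_object("blob", b"hello world\n")

    # A tree with a single regular file; the blob ID is stored as 20 raw bytes.
    tree_id = hash_object("tree", b"100644 hello.txt\0" + bytes.fromhex(blob_id))

    # A root commit (no parent lines) pointing at that tree.
    commit_text = (f"tree {tree_id}\nauthor {ident}\ncommitter {ident}\n"
                   f"\nInitial commit\n")
    commit_id = hash_object("commit", commit_text.encode("utf-8"))

    # An annotated tag pointing at the commit.
    tag_text = (f"object {commit_id}\ntype commit\ntag v1.0\ntagger {ident}\n"
                f"\nRelease 1.0\n")
    tag_id = hash_object("tag", tag_text.encode("utf-8"))

    return {"blob": blob_id, "tree": tree_id, "commit": commit_id, "tag": tag_id}


if __name__ == "__main__":
    ids = build_chain()
    print(f"tag {ids['tag'][:7]} (v1.0)")
    print(f"  -> commit {ids['commit'][:7]} (Initial commit)")
    print(f"       -> tree {ids['tree'][:7]}")
    print(f"            -> blob {ids['blob'][:7]} (hello.txt)")
```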