Git
day 9
gdb Demonstration
Here is the mailing list discussion that I based this on: https://lore.kernel.org/git/xmqqbjjqslgq.fsf@gitster.g/T/#m7d9288bd28a9f9a51781bf42330a9c15fe9016ff
There is a bug that offers the chance to learn how to use gdb. If you are feeling ambitious, go ahead and attempt solving this on your own before reading my proposed solution.
The bug is that the command `git restore -source branch` gives an error message that seems to contain a typo.
$ git checkout 66ce5f8e8872f0183bb137911c52b07f1f242d13
Go ahead and try it on that commit and then continue reading for a demonstration of how to use gdb to diagnose what is going wrong.
Tutorial
Build git with meson (or use the Makefile). See here.
$ meson setup build/
$ cd build
$ meson compile
Use gdb to find the bug
$ gdb ./git
(gdb) run restore -source branch
Aha! There is the error
fatal: could not resolve ource
which seems like a typo? …
It isn't. Set a breakpoint on the `die` function. Yes, that really is the name of the function responsible for bringing you that fatal error. Run until you get there and observe.
(gdb) break die
(gdb) r
Starting program: ~/repos/github.com/git/git/build/git restore -source branch
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Breakpoint 1, die (err=0x5555559d8b6a "could not resolve %s") at ../usage.c:202
202 {
(gdb)
You will see the reason for the issue if you have the coding guidelines in your marrow (further inspection is needed to understand why the bug happens).
From the `Documentation/CodingGuidelines.adoc` we have:
Enclose the subject of an error inside a pair of single quotes
among other tips. And that is how we fix it, wrapping that %s with single quotes.
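To see why the message says `ource`, here is a small sketch in C. My understanding is that `-s` is the short form of `--source`, and that a short option can have its value stuck directly onto it, so `-source` reads as `-s` with the value `ource`. The helper names `stuck_short_value` and `format_error` are made up for illustration and are not Git's actual option parsing code; the second one only shows the message before and after adding the single quotes.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's parse-options code): if arg looks like
 * "-<flag>VALUE", return a pointer to VALUE, the text stuck directly
 * after the flag. Return NULL for anything else, including "--long".
 */
const char *stuck_short_value(const char *arg, char flag)
{
	if (arg[0] != '-' || arg[1] != flag || arg[2] == '\0')
		return NULL;
	return arg + 2;
}

/* Format the error text with or without quotes around the subject. */
void format_error(char *buf, size_t size, const char *subject, int quoted)
{
	if (quoted)
		snprintf(buf, size, "could not resolve '%s'", subject);
	else
		snprintf(buf, size, "could not resolve %s", subject);
}
```

With this sketch, `stuck_short_value("-source", 's')` points at `ource`, and the quoted variant of the message reads `could not resolve 'ource'`, which makes it easier to spot that the subject is user input rather than a misspelled word.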
It feels like I am only scratching the surface of what is possible with gdb, but I am happy to demonstrate some usage based on a trivial patch. I have plans to look for more interesting one line changes and continue this journey in the future.
day 8
I am not sure how to best use this site. It seems that I need more structure. With that said, I am returning to the baby-git repository and the Decoding Git book in order to understand the initial commit of Git.
My goal is to get through this book before January 22nd which is when the next semester starts. I've gone through Part 1 already, so I am reviewing that and then moving on to Part 2.
It'd be kind of neat to traverse the entirety of Git's code by starting at the initial commit and checking out the next commit until getting to the HEAD.
Here is a summary of the history of Git:
https://about.gitlab.com/blog/journey-through-gits-20-year-history/
I used grep in the git.github.io repo in order to find where people were talking about "Emacs", "Magit", "GDB" or "gdb". These tools seem important to me, and I'd like to see if other people are using them. I installed the lei package, but haven't set much up with it or unsubscribed from the mailing list.
day 7
I am setting up an agenda to outline my studies leading up to March 31st when the GSoC applications are due. I also cloned the git.github.io repo so I can see the developer pages from Emacs. This is important because when they release those microprojects, I want to be ready to work on them, so I will be fetching the changes to this repo on Emacs startup. I found the fun Gitstery repository which is a murder mystery that you can solve by using Git commands.
day 6
I'm moving all of my private repositories off of GitHub and using Git over SSH to work on them from my laptop.
I learned about `git rebase` and feel enlightened. The words my high school math teacher told me when he found out that I wanted to study computer science echo in my head:
With great power comes great responsibility.
day 5
Today I learned how to install Git on Windows and helped install Git on a colleague's machine. They use Codium, which seems to be pretty similar to VSCode. It was satisfying to install Git and then see their editor become capable of committing their changes.
day 4
day 3
Using the baby-git repository and the Decoding Git book, I've found the following resources:
- https://www.gnu.org/software/libc/manual/html_node/index.html
- https://pubs.opengroup.org/onlinepubs/9699919799/
- https://www.cplusplus.com/reference/
- https://zlib.net/manual.html
- https://zlib.net/zlib_how.html
Git uses the SHA-1 hash function to map file contents to hash values.
There are the following four basic components in Git's initial commit:
- objects
- an object database
- a current directory cache
- a working directory
Objects
Object types:
- blob
- tree
- commit
An object is an abstraction of data and metadata. It is indexed and referenced through its hash value. The name of an object is its hash value. This hash value is used to refer to and look up the specific content.
The general structure of an object is:
- object tag
- ' ' (single space)
- size of object data (in bytes)
- '\0' (null character)
- object binary data
The first part of an object consists of the object metadata. The second part consists of the object data (the binary data). The null byte separates the two, and the single space separates the tag from the size within the metadata. The object tag is simply the type of the object (one of: blob, tree, commit), and the size is the length of the object data in bytes before deflation.
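The layout above can be sketched in a few lines of C. This is only my illustration of the byte layout: `build_object` is a hypothetical name, and compression and hashing are left out on purpose.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write "<tag> <size>\0<data>" into out and return the total number of
 * bytes written, or 0 if out is too small. Deflating and hashing the
 * result are deliberately not part of this sketch.
 */
size_t build_object(const char *tag, const void *data, size_t len,
		    unsigned char *out, size_t out_size)
{
	/* snprintf returns the header length, not counting the NUL. */
	int hdr = snprintf((char *)out, out_size, "%s %lu", tag,
			   (unsigned long)len);

	if (hdr < 0 || (size_t)hdr + 1 + len > out_size)
		return 0;
	/* Keep the terminating NUL as the separator, then append the data. */
	memcpy(out + hdr + 1, data, len);
	return (size_t)hdr + 1 + len;
}
```

For the six bytes of `hello\n` tagged as `blob`, this produces the seven metadata bytes `blob 6\0` followed by the data, 13 bytes in total.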
Blobs (binary large objects)
Whoa, that just blew my mind: I didn't know blobs are just binary large objects. Any file that the user adds could be a blob; it could be the binary representation of a video, plain text, or any other file. Git generates a blob object that is named, indexed and referenced through the deflated blob object's SHA-1 hash value.
Tree object
A tree object contains a list of files added to a repository. Each file has a mode, path and SHA-1 hash. The size of the tree is the sum of the sizes of the file information entries in the tree object data.
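Here is a rough sketch of how that size could be added up. I am assuming each entry is stored as the mode in octal text, a space, the path, a null byte and then the 20 raw bytes of the hash; the struct and function names are hypothetical and only meant to show the arithmetic.

```c
#include <stdio.h>
#include <string.h>

#define SHA1_RAW_LEN 20 /* a SHA-1 digest is 20 raw bytes */

/* Hypothetical in-memory description of one file listed in a tree. */
struct tree_entry {
	unsigned int mode;                /* file mode, e.g. 0100644 */
	const char *path;                 /* path of the file */
	unsigned char sha1[SHA1_RAW_LEN]; /* hash naming the file's blob */
};

/* Assumed layout: "<octal mode> <path>\0" followed by the raw hash. */
size_t tree_entry_size(const struct tree_entry *e)
{
	/* With a NULL buffer, snprintf only reports the needed length. */
	int mode_len = snprintf(NULL, 0, "%o", e->mode);

	return (size_t)mode_len + 1 + strlen(e->path) + 1 + SHA1_RAW_LEN;
}

/* The tree size is the sum of the sizes of its entries. */
size_t tree_size(const struct tree_entry *entries, size_t count)
{
	size_t total = 0, i;

	for (i = 0; i < count; i++)
		total += tree_entry_size(&entries[i]);
	return total;
}
```

Under these assumptions, an entry for `README` with mode `100644` takes 6 + 1 + 6 + 1 + 20 = 34 bytes.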
Commit object
A commit contains the hash value of a tree object being committed and the hash value of any parent tree objects specified by the user, metadata about the user who committed the tree, the time and date when the commit was made and a user-supplied comment known today as the commit message.
day 2
Today I am exploring the Git source code and trying to figure out how things work.
Finding list of commands
Git has lots of commands. Here is how you can find where the commands are in the source. I used the command
grep -nr "list of commands"
to find that there is a list of commands in the `git.c` file:
Documentation/MyFirstContribution.adoc:220:The list of commands lives in `git.c`.
In that file is the list of commands. Here are the first and last five:
"add" "am" "annotate" "apply" "archive" ... "verify-tag" "version" "whatchanged" "worktree" "write-tree"
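As far as I can tell, the idea behind such a list is a table that pairs each command name with the function that implements it. Below is a minimal sketch of that idea. The names `struct command`, `demo_add`, `demo_version` and `find_command` are placeholders of mine, and the real handlers in Git take more arguments than this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type; Git's real handlers take more arguments. */
typedef int (*command_fn)(int argc, const char **argv);

struct command {
	const char *name; /* what the user types after "git" */
	command_fn fn;    /* function that implements the command */
};

/* Placeholder handlers that only return a recognizable code. */
int demo_add(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	return 1;
}

int demo_version(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	return 2;
}

const struct command commands[] = {
	{ "add", demo_add },
	{ "version", demo_version },
};

/* Linear search through the table; NULL means "no such command". */
const struct command *find_command(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(commands) / sizeof(commands[0]); i++)
		if (!strcmp(commands[i].name, name))
			return &commands[i];
	return NULL;
}
```

Adding a new command in this sketch means writing one handler and adding one row to the table, which is why a single list in one file is enough to enumerate every command.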
How does `git add` work?
Let's focus on a command I've probably used hundreds of times already:
git add
We can find the following in `builtin/add.c`:
static struct option builtin_add_options[] = {
	OPT__DRY_RUN(&show_only, N_("dry run")),
	OPT__VERBOSE(&verbose, N_("be verbose")),
	OPT_GROUP(""),
	OPT_BOOL('i', "interactive", &add_interactive, N_("interactive picking")),
	OPT_BOOL('p', "patch", &patch_interactive, N_("select hunks interactively")),
	OPT_DIFF_UNIFIED(&add_p_opt.context),
	OPT_DIFF_INTERHUNK_CONTEXT(&add_p_opt.interhunkcontext),
	OPT_BOOL('e', "edit", &edit_interactive, N_("edit current diff and apply")),
	OPT__FORCE(&ignored_too, N_("allow adding otherwise ignored files"), 0),
	OPT_BOOL('u', "update", &take_worktree_changes, N_("update tracked files")),
	OPT_BOOL(0, "renormalize", &add_renormalize, N_("renormalize EOL of tracked files (implies -u)")),
	OPT_BOOL('N', "intent-to-add", &intent_to_add, N_("record only the fact that the path will be added later")),
	OPT_BOOL('A', "all", &addremove_explicit, N_("add changes from all tracked and untracked files")),
	OPT_CALLBACK_F(0, "ignore-removal", &addremove_explicit,
		NULL /* takes no arguments */,
		N_("ignore paths removed in the working tree (same as --no-all)"),
		PARSE_OPT_NOARG, ignore_removal_cb),
	OPT_BOOL( 0 , "refresh", &refresh_only, N_("don't add, only refresh the index")),
	OPT_BOOL( 0 , "ignore-errors", &ignore_add_errors, N_("just skip files which cannot be added because of errors")),
	OPT_BOOL( 0 , "ignore-missing", &ignore_missing, N_("check if - even missing - files are ignored in dry run")),
	OPT_BOOL(0, "sparse", &include_sparse, N_("allow updating entries outside of the sparse-checkout cone")),
	OPT_STRING(0, "chmod", &chmod_arg, "(+|-)x",
		N_("override the executable bit of the listed files")),
	OPT_HIDDEN_BOOL(0, "warn-embedded-repo", &warn_on_embedded_repo,
		N_("warn when adding an embedded repository")),
	OPT_PATHSPEC_FROM_FILE(&pathspec_from_file),
	OPT_PATHSPEC_FILE_NUL(&pathspec_file_nul),
	OPT_END(),
};
Okay … That is how it works.
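To convince myself what a table like the one above buys you, here is a much smaller sketch of table-driven option parsing. It only handles boolean flags, and `struct bool_option` and `parse_bool_options` are names I made up; Git's real option parser does far more (values, callbacks, help text, negation).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much smaller cousin of the option table above. */
struct bool_option {
	char short_name;       /* e.g. 'p', or 0 when there is no short form */
	const char *long_name; /* e.g. "patch" */
	int *value;            /* variable set to 1 when the flag is seen */
};

/* Return 0 on success, or -1 at the first argument no entry matches. */
int parse_bool_options(const struct bool_option *opts, size_t nopts,
		       int argc, const char **argv)
{
	int i;
	size_t j;

	for (i = 0; i < argc; i++) {
		const char *arg = argv[i];
		int matched = 0;

		for (j = 0; j < nopts && !matched; j++) {
			const struct bool_option *o = &opts[j];

			if (arg[0] == '-' && arg[1] == '-')
				matched = !strcmp(arg + 2, o->long_name);
			else if (arg[0] == '-' && o->short_name)
				matched = arg[1] == o->short_name &&
					  arg[2] == '\0';
			if (matched)
				*o->value = 1;
		}
		if (!matched)
			return -1;
	}
	return 0;
}
```

The parsing loop never mentions a specific flag. Each row of the table says which spelling maps to which variable, so supporting a new flag is a one-line change to the table.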
day 1
TIL about `git shortlog`
This feature is awesome: you can use it to easily see how many commits people are making to a repository. With the command
git shortlog -ns
you are able to see who has committed the most to a repository. Here is the output of that command on the Git repository:
27457 Junio C Hamano
4611 Jeff King
2390 Johannes Schindelin
1945 Ævar Arnfjörð Bjarmason
1824 Nguyễn Thái Ngọc Duy
1810 Patrick Steinhardt
1401 Shawn O. Pearce
1314 René Scharfe
1203 Elijah Newren
1118 Linus Torvalds
954 Michael Haggerty
902 brian m. carlson
Fascinating. Junio C Hamano is legendary!
git baby steps
My goal is to improve enough that I am able to communicate with and contribute to any project.
These notes are meant to serve as a road map for people who come after me and are looking to learn enough to be useful and contribute to Git.