Friday, January 8, 2016

12 advanced Git commands I wish my co-workers knew


Once you have internalized the basic workflow, Git is a powerful tool for distributed version control that offers a lot of advantages over clunkier alternatives like SVN. You clone, you pull, you commit, you push; nothing simpler than that. Right.

But then you find yourself stuck with a merge conflict, and git sends you down the rabbit hole. Or you accidentally added a commit to the wrong branch and already pushed it to the remote repo. Or you need to switch to a different branch (just for a second!) but git won't let you because you have uncommitted changes. And what if you need to patch your code with that one commit from a completely different branch (but nothing else)?

The following guide compiles a list of useful advanced Git commands that will make your everyday coding life easier. Oh, and make sure your co-workers know them, too...

1. Pull upstream changes with rebase instead of merge

Because branch merges are recorded with a merge commit, they are supposed to be meaningful. For example, they can be used to indicate that a new feature has been merged into a release branch. However, when multiple team members work on a single project and sync branches with a regular git pull, the commit timeline gets polluted with unnecessary merge commits. A better alternative is often to use git rebase to rebase a feature branch onto the master branch:

$ git checkout feature
$ git rebase master

This will move the entire feature branch so that it begins at the tip of the master branch, effectively incorporating all of the new commits in master. But instead of using a merge commit, rebasing rewrites the project history by creating brand-new commits for each commit in the original branch.

The major benefit of rebasing is that you get a much cleaner project history. The pitfalls of rebasing are discussed here.
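Since this section is about pulling, here is a minimal, self-contained sketch of git pull --rebase, assuming Git and a POSIX shell are installed. It builds a throwaway bare "upstream" repo and two hypothetical clones (alice and bob) in a temp directory; all names and identities are made up for the demo. Bob's local commit ends up replayed on top of Alice's, with no merge commit.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the sketch runs without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q --bare "$demo/upstream.git"

# Alice publishes a first commit on "main".
git clone -q "$demo/upstream.git" "$demo/alice" 2>/dev/null
echo base > "$demo/alice/base.txt"
git -C "$demo/alice" add base.txt
git -C "$demo/alice" commit -q -m "base"
git -C "$demo/alice" branch -M main
git -C "$demo/alice" push -q origin main

# Bob clones; then both of them commit in parallel.
git clone -q -b main "$demo/upstream.git" "$demo/bob"
echo a > "$demo/alice/a.txt"
git -C "$demo/alice" add a.txt
git -C "$demo/alice" commit -q -m "alice: add a.txt"
git -C "$demo/alice" push -q origin main

echo b > "$demo/bob/b.txt"
git -C "$demo/bob" add b.txt
git -C "$demo/bob" commit -q -m "bob: add b.txt"

# A plain "git pull" would add a merge commit here; --rebase replays
# Bob's local commit on top of the new upstream tip instead.
git -C "$demo/bob" pull -q --rebase origin main

git -C "$demo/bob" log --oneline
```

If you want this behavior for every pull, git config pull.rebase true makes it the default.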

2. Resolve merge conflicts after a Git rebase

With great power comes great responsibility. When you perform a git rebase, chances are you will run into a merge conflict at some point. A merge conflict means that two commits changed the same lines in the same file, and Git does not know which change to apply. Git then stops the rebase at the offending commit and prints an error message that names it.

You are given three choices to fix the commit that is causing a conflict (for example, fa39187):

  • You can run git rebase --abort to completely undo the rebase. This will unwind all rebase changes and put the branch back in the state it was in before git rebase was called.
  • You can run git rebase --skip to completely skip the commit. Hence none of the changes introduced by the problematic commit will be included in the history.
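  • You can fix the conflict using the standard procedure for merge conflict resolution: edit the conflicting files, mark them as resolved with git add, and then run git rebase --continue.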
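A minimal sketch of the third option, assuming Git and a POSIX shell: it deliberately creates a conflict between two hypothetical branches, then resolves it by editing the file, staging it with git add, and running git rebase --continue. GIT_EDITOR=true is set only so the demo never waits on an editor.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the sketch runs without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Accept default commit messages on "rebase --continue" without an editor.
export GIT_EDITOR=true

repo=$(mktemp -d)
git init -q "$repo"
echo "hello" > "$repo/greeting.txt"
git -C "$repo" add greeting.txt
git -C "$repo" commit -q -m "initial"
git -C "$repo" branch -M master

# feature and master both change the same line, so a conflict is guaranteed.
git -C "$repo" checkout -q -b feature
echo "hello from feature" > "$repo/greeting.txt"
git -C "$repo" commit -q -am "feature greeting"
git -C "$repo" checkout -q master
echo "hello from master" > "$repo/greeting.txt"
git -C "$repo" commit -q -am "master greeting"

git -C "$repo" checkout -q feature
if ! git -C "$repo" rebase master >/dev/null 2>&1; then
  # Resolve by hand, mark the file as resolved, then let the rebase continue.
  echo "hello from master and feature" > "$repo/greeting.txt"
  git -C "$repo" add greeting.txt
  git -C "$repo" rebase --continue >/dev/null 2>&1
fi

git -C "$repo" log --oneline
```

Replacing the body of the if block with git rebase --abort would instead put feature back exactly where it was before the rebase.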

3. Temporarily stash changes

With a work in progress, things are often in a messy state. So what if you need to switch to a different branch (just for a second!) to work on something else? Git won't let you if your uncommitted changes would be overwritten by the checkout, and frankly, you do not want to commit half-baked work just so that you can come back to it and fix it later. The answer to this dilemma is the git stash command.

Stashing takes the dirty state of your working directory (i.e., your modified tracked files and staged changes) and saves it on a stack of unfinished changes that you can re-apply at any time. Your work is stashed with the following command:

$ git stash
Saved working directory and index state WIP on feature: 3fc175f fix race condition
HEAD is now at 3fc175f fix race condition

The working directory is now clean:

$ git status
# On branch feature
nothing to commit, working directory clean

You can now safely switch branches and work on other stuff. But don't worry, the stashed changes are still around:

$ git stash list
stash@{0}: WIP on feature: 3fc175f fix race condition

Later on, when you are back on the feature branch, you can re-apply all stashed away changes:

$ git stash pop
On branch feature
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)

     modified:   index.html
Dropped refs/stash@{0} (ac2321cc3a33ba712b8e50c99a99d3c20da9d6b8)

There are a bunch of other handy options when it comes to stashing:

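$ git stash save "describe it"   # give the stash a name (newer Git: git stash push -m "describe it")
$ git stash drop stash@{0}       # delete a single stash
$ git stash clear                # delete all stashes
$ git stash save --keep-index    # stash changes, but leave staged changes in the index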
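The whole stash round trip can be sketched in a throwaway repo, assuming Git 2.13 or later (for git stash push -m) and a POSIX shell; the branch names feature and hotfix and the file contents are hypothetical.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the sketch runs without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
echo "v1" > "$repo/index.html"
git -C "$repo" add index.html
git -C "$repo" commit -q -m "initial"
git -C "$repo" branch -M feature

# Half-baked work in progress.
echo "v2 (half-baked)" > "$repo/index.html"

# Stash it under a descriptive name (push -m is the newer spelling of save).
git -C "$repo" stash push -q -m "half-baked index"
clean_status=$(git -C "$repo" status --porcelain)   # empty: working tree is clean
stash_list=$(git -C "$repo" stash list)

# Switch away "just for a second" and commit something else.
git -C "$repo" checkout -q -b hotfix
echo "fix" > "$repo/hotfix.txt"
git -C "$repo" add hotfix.txt
git -C "$repo" commit -q -m "hotfix"

# Back on feature: re-apply the stash and drop it from the list.
git -C "$repo" checkout -q feature
git -C "$repo" stash pop -q
```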

4. Clone a specific remote branch

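What if you want to clone only a specific branch from a remote repo? By default, git clone fetches all of the remote's branches. A handy alternative is to use git remote add with the -t option, which restricts fetching to the given branch (git clone --branch <remoteBranchName> --single-branch <remoteRepoUrlPath> achieves a similar result):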

$ git init
$ git remote add -t <remoteBranchName> -f origin <remoteRepoUrlPath>
$ git checkout -b <localBranchName> origin/<remoteBranchName>
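A sketch that checks this end to end, assuming Git and a POSIX shell: a hypothetical local source repo with main and feature branches stands in for <remoteRepoUrlPath>, and after git remote add -t only origin/feature is fetched.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the sketch runs without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)

# Hypothetical source repo with two branches: main and feature.
git init -q "$demo/source"
echo main > "$demo/source/readme.txt"
git -C "$demo/source" add readme.txt
git -C "$demo/source" commit -q -m "initial"
git -C "$demo/source" branch -M main
git -C "$demo/source" checkout -q -b feature
echo feature > "$demo/source/feature.txt"
git -C "$demo/source" add feature.txt
git -C "$demo/source" commit -q -m "feature work"

# init, add a remote restricted to one branch (-t) and fetch it right
# away (-f), then create a local branch tracking the fetched one.
git init -q "$demo/only-feature"
git -C "$demo/only-feature" remote add -t feature -f origin "$demo/source" >/dev/null 2>&1
git -C "$demo/only-feature" checkout -q -b feature origin/feature
```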

5. Merge a cherry-picked remote commit with your branch

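Even more wicked, what if you want only a specific commit from a remote repo on your branch? First fetch the remote so that the commit is available locally, then use git cherry-pick with the commit's SHA. This applies the changes introduced by that commit to the current branch as a new commit:

$ git fetch <remoteName>
$ git cherry-pick <commitSHA>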
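A minimal sketch in a single throwaway repo, assuming Git and a POSIX shell. The hypothetical branch other stands in for the remote branch (with a real remote you would run git fetch first), and only the second of its two commits is cherry-picked onto main. Note that the result is a new commit with a new hash, not a merge.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the sketch runs without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
echo base > "$repo/base.txt"
git -C "$repo" add base.txt
git -C "$repo" commit -q -m "base"
git -C "$repo" branch -M main

# "other" stands in for the remote branch: one unwanted commit, one wanted.
git -C "$repo" checkout -q -b other
echo one > "$repo/one.txt"
git -C "$repo" add one.txt
git -C "$repo" commit -q -m "unwanted change"
echo fix > "$repo/fix.txt"
git -C "$repo" add fix.txt
git -C "$repo" commit -q -m "the fix we want"
wanted=$(git -C "$repo" rev-parse HEAD)

# Back on main, apply only that one commit.
git -C "$repo" checkout -q main
git -C "$repo" cherry-pick "$wanted" >/dev/null
```

Adding -x (git cherry-pick -x <commitSHA>) appends a "(cherry picked from commit ...)" line to the commit message, which makes it easier to trace where the change came from.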

6. Apply a patch from an unrelated local repository

What if you need to apply a patch from a commit on some other unrelated local repository to your current repository? Here is a shortcut:

$ git --git-dir=<pathToOtherLocalRepo>/.git format-patch -k -1 --stdout <otherLocalCommitSHA> | git am -3 -k
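The flags in the one-liner do the following: -1 exports a single commit, --stdout pipes the patch instead of writing a file, -k on both sides keeps the subject line unchanged, and -3 lets git am fall back to a three-way merge. Here is a runnable sketch, assuming Git and a POSIX shell, with two hypothetical unrelated repos (other and mine) created in a temp directory:

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the sketch runs without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)

# Two unrelated repositories that happen to share a file.
for r in other mine; do
  git init -q "$demo/$r"
  printf 'line 1\nline 2\n' > "$demo/$r/shared.txt"
  git -C "$demo/$r" add shared.txt
  git -C "$demo/$r" commit -q -m "initial ($r)"
done

# The commit we want to transplant lives in "other".
printf 'line 1\nline 2 (patched)\n' > "$demo/other/shared.txt"
git -C "$demo/other" commit -q -am "patch line 2"
sha=$(git -C "$demo/other" rev-parse HEAD)

# Export exactly that commit as a patch and apply it in "mine".
git --git-dir="$demo/other/.git" format-patch -k -1 --stdout "$sha" |
  git -C "$demo/mine" am -3 -k >/dev/null
```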

7. Ignore changes in a tracked file

If you and your co-workers are operating on the same branch, chances are you are going to git merge or git rebase quite often. However, this may overwrite your environment-specific configuration files, which you would then have to change again after every merge. Instead, you can use the following command to tell Git to stop checking a certain tracked file for local changes (until you undo it with git update-index --no-assume-unchanged <pathToLocalFile>):

$ git update-index --assume-unchanged <pathToLocalFile>
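A small sketch of the effect, assuming Git and a POSIX shell, with a hypothetical config.ini: while the flag is set, git status stops reporting the local edit, and git ls-files -v marks the file with a lowercase h. Keep in mind that Git documents --assume-unchanged as a performance hint rather than a way to ignore tracked files: if the file changes upstream, a merge that needs to touch it will fail and you have to handle it manually. For local configuration files, --skip-worktree is often suggested as a closer fit.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the sketch runs without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
echo "db_host=prod" > "$repo/config.ini"   # hypothetical shared config file
git -C "$repo" add config.ini
git -C "$repo" commit -q -m "add config"

# Environment-specific local edit.
echo "db_host=localhost" > "$repo/config.ini"

git -C "$repo" update-index --assume-unchanged config.ini
status_while_assumed=$(git -C "$repo" status --porcelain)   # edit is not reported
flag=$(git -C "$repo" ls-files -v config.ini)               # lowercase "h" = assume-unchanged

# Undo: Git checks the file again and sees the edit.
git -C "$repo" update-index --no-assume-unchanged config.ini
status_after_undo=$(git -C "$repo" status --porcelain)
```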

8. Have git pull running every X seconds, with screen

Often merge conflicts happen simply because the local repo you are working on no longer reflects the current state of the remote repo. This is why doing a git pull first thing in the morning might be a good idea. Alternatively, you could have a script running in the background (for example, inside a GNU Screen session) that calls git pull every X seconds:

$ screen
$ for((i=1;i<=10000;i+=1)); do sleep X && git pull; done

Press Ctrl+a, then Ctrl+d, to detach from the screen session; screen -r reattaches it later.
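A bounded, testable variant of the loop, assuming Git and a POSIX shell. pull_every is a hypothetical helper, and it uses git pull --ff-only (a swap from the plain git pull above) so that an unattended pull can never create a merge commit or leave conflict markers in your working tree; if a fast-forward is impossible, it just reports the failure.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the sketch runs without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: run "git pull --ff-only" in repo $1 every $2 seconds, $3 times.
pull_every() {
  pe_repo=$1 pe_interval=$2 pe_count=$3
  pe_i=0
  while [ "$pe_i" -lt "$pe_count" ]; do
    sleep "$pe_interval"
    git -C "$pe_repo" pull -q --ff-only || echo "pull failed in $pe_repo" >&2
    pe_i=$((pe_i + 1))
  done
}

demo=$(mktemp -d)
git init -q "$demo/upstream"
echo 1 > "$demo/upstream/log.txt"
git -C "$demo/upstream" add log.txt
git -C "$demo/upstream" commit -q -m "first"
git clone -q "$demo/upstream" "$demo/work"

# Someone else commits upstream while we are busy.
echo 2 >> "$demo/upstream/log.txt"
git -C "$demo/upstream" commit -q -am "second"

pull_every "$demo/work" 1 2
```

Inside a screen session you would call it with a large count and an interval of your choice, then detach as described above.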

9. Split a subfolder out into a new repository

Sometimes you may want to turn a specific folder within your Git repo into a brand new repo. This can be done with git filter-branch:

$ git filter-branch --prune-empty --subdirectory-filter <folderName> master
# Filter the master branch to your directory and remove empty commits
Rewrite 48dc599c80e20527ed902928085e7861e6b3cbe6 (89/89)
Ref 'refs/heads/master' was rewritten

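The repository now contains only the files that were in the specified subfolder, moved up to the repository root, and the rewritten history keeps only the commits that touched that folder. The old history is not deleted right away: filter-branch keeps a backup of the original branch under refs/original/. You can now push your new local repo to the remote you created for it.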
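A self-contained sketch, assuming Git and a POSIX shell, with a hypothetical repo that has app.txt at the root and a lib/ folder. FILTER_BRANCH_SQUELCH_WARNING=1 only silences the warning (and pause) that newer Git versions print before filter-branch runs; Git's documentation now points to the separate git filter-repo tool (git filter-repo --subdirectory-filter <folderName>) for this kind of job.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the sketch runs without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
# Only silences filter-branch's "use filter-repo instead" warning and pause.
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
git init -q "$repo"
mkdir "$repo/lib"
echo "library code" > "$repo/lib/lib.txt"
echo "app code" > "$repo/app.txt"
git -C "$repo" add .
git -C "$repo" commit -q -m "add app and lib"
git -C "$repo" branch -M master
echo "more app" >> "$repo/app.txt"
git -C "$repo" commit -q -am "touch app only"
echo "more lib" >> "$repo/lib/lib.txt"
git -C "$repo" commit -q -am "extend lib"

# Keep only lib/ (moved to the root) and drop commits that become empty.
git -C "$repo" filter-branch --prune-empty --subdirectory-filter lib master >/dev/null 2>&1

git -C "$repo" ls-tree --name-only master
```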

10. Clean

Sometimes git might complain about "untracked working tree files" that would be "overwritten by checkout". This can happen for a variety of reasons. However, these issues can often be prevented by keeping your working tree clean with the following commands. Because git clean deletes files permanently, it is worth running the dry-run variant (-n) first:

$ git clean -f     # remove untracked files
$ git clean -fd    # remove untracked files/directories
$ git clean -nfd   # list all files/directories that would be removed
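A minimal sketch, assuming Git and a POSIX shell, with hypothetical stray files: the dry run only lists what would be deleted, and the real run then removes the untracked file and directory while leaving tracked files alone. Ignored files survive both runs unless you add -x.

```shell
#!/bin/sh
set -eu
# Hypothetical identity so the sketch runs without any global Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
echo tracked > "$repo/tracked.txt"
git -C "$repo" add tracked.txt
git -C "$repo" commit -q -m "initial"

# Untracked debris: a stray file and a whole untracked directory.
echo junk > "$repo/stray.log"
mkdir "$repo/build"
echo out > "$repo/build/output.bin"

# 1. Dry run: only lists what would be deleted.
preview=$(git -C "$repo" clean -nd)
echo "$preview"

# 2. The real thing: remove untracked files and directories.
git -C "$repo" clean -fdq
```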

11. Tar project files, excluding .git directory

Sometimes you may want to provide a copy of your project to an outside collaborator who does not have access to the repo on GitHub. The easiest way is to tar or zip up all project files. However, if you are not careful, the hidden .git directory will be included in the tarball, too, which can drastically inflate the file size and cause the recipient a lot of headaches if the files get mixed up with their own Git repo.

A simpler alternative is to exclude the .git directory automatically with GNU tar's --exclude-vcs option:

$ tar --exclude-vcs -cJf <projectName>.tar.xz <projectFolder>/   # option before the folder; newer GNU tar ignores it afterwards
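A sketch that verifies the exclusion, assuming GNU tar and a POSIX shell; the project layout is hypothetical, and gzip (z) replaces xz (J) only so the demo does not depend on xz being installed. Note the option order: newer GNU tar treats --exclude-vcs as positional, so it has to come before the folder it should apply to.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)

# Hypothetical project folder that is also a Git repo (so it contains .git/).
mkdir "$demo/myproject"
echo "print('hi')" > "$demo/myproject/main.py"
git init -q "$demo/myproject"

# --exclude-vcs goes first: it only applies to paths listed after it.
(cd "$demo" && tar --exclude-vcs -czf myproject.tar.gz myproject/)

tar -tzf "$demo/myproject.tar.gz"
```

If the recipient only needs the committed files, git archive --format=tar.gz -o <projectName>.tar.gz HEAD is another option; it never includes .git because it exports straight from a commit.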

12. Who f'ed it all up?

Finally, when all hell breaks loose, you may find no other way out than to assign blame. If your production server is broken, it is very easy to find the culprit: just do a git blame. For every line in the file, it shows the hash of the commit that last changed that line, the author of that commit, and its timestamp:

$ git blame <fileName>
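A small sketch, assuming Git and a POSIX shell, with two hypothetical authors (Alice and Bob) editing one file. -L 2,2 restricts blame to line 2, and --porcelain prints machine-readable output, which makes the author easy to extract; commit_as is a hypothetical helper for the demo.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical helper: commit as the given author (name reused for the email).
commit_as() {
  ca_name=$1
  shift
  GIT_AUTHOR_NAME=$ca_name GIT_AUTHOR_EMAIL="$ca_name@example.com" \
  GIT_COMMITTER_NAME=$ca_name GIT_COMMITTER_EMAIL="$ca_name@example.com" \
    git -C "$repo" commit -q "$@"
}

printf 'line one\n' > "$repo/app.txt"
git -C "$repo" add app.txt
commit_as Alice -m "first line"

printf 'line one\nline two\n' > "$repo/app.txt"
commit_as Bob -am "second line"

# Human-readable blame for the whole file.
git -C "$repo" blame app.txt

# Machine-readable blame for line 2 only; extract the "author" header.
line2_author=$(git -C "$repo" blame -L 2,2 --porcelain app.txt | sed -n 's/^author //p')
echo "line 2 was last changed by $line2_author"
```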

But actually... Come to think of it, it might be best to keep this last one to yourself... ;-)