Sunday, November 29, 2015

How to version large files with Git LFS

Versioning large files (such as audio samples, videos, datasets, and graphics) can be difficult when working with distributed version control systems like Git, because every clone carries the full history of every file. Fortunately, a new extension to Git makes handling large files easier: Git Large File Storage (LFS) is an open-source project that replaces large files with text pointers inside Git, while storing the file contents on a remote server such as GitHub or an AWS bucket.


Installation

Installers for Mac, Linux, and Windows are available at git-lfs.github.com, which also hosts a brief installation guide. In essence, you only need to download the installer, decompress it, and run the installation script. On a Mac, Git LFS is also available via Homebrew:

$ brew install git-lfs

After installing Git LFS (with the installation script or Homebrew), set it up once for your user account with the following command:

$ git lfs install
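git lfs install registers the Git LFS clean and smudge filters in your global Git configuration. If you want to confirm that this worked, a minimal sketch like the following queries the filter.lfs.clean setting. The helper name lfs_filter_configured is made up for illustration, and the check assumes Git is on your PATH.

```shell
# Sketch: check whether "git lfs install" has registered the LFS filter.
# lfs_filter_configured is a hypothetical helper name, not part of Git LFS.
lfs_filter_configured() {
  # git config --get exits non-zero when the key is not set.
  git config --get filter.lfs.clean >/dev/null
}

if command -v git-lfs >/dev/null 2>&1; then
  git lfs version            # prints the installed Git LFS version
else
  echo "git-lfs binary not found on PATH" >&2
fi

if lfs_filter_configured; then
  echo "Git LFS filter is configured"
else
  echo "Git LFS filter not found; run 'git lfs install'" >&2
fi
```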

Tracking file types

All you need to do now is tell Git LFS which file types to track. Navigate to your Git repository and issue a git lfs track command. For example, to have Git LFS automatically handle all .mat files in your repository (although it is rarely a smart idea to put binaries under version control), you would call:

$ git lfs track "*.mat"
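Behind the scenes, git lfs track records each pattern as a line in a .gitattributes file; for "*.mat" that line reads *.mat filter=lfs diff=lfs merge=lfs -text. The sketch below writes that line by hand into a throwaway repository (so it does not need Git LFS itself) and asks Git, via git check-attr, which attributes now apply to a .mat file. The file name data.mat is only an example.

```shell
# Sketch: reproduce by hand the .gitattributes entry that
# git lfs track "*.mat" records, then ask Git which attributes apply.
repo=$(mktemp -d)          # throwaway repository for the demonstration
git init -q "$repo"
printf '%s\n' '*.mat filter=lfs diff=lfs merge=lfs -text' > "$repo/.gitattributes"

# Prints one "path: attribute: value" line per requested attribute.
# data.mat does not have to exist; check-attr only evaluates the patterns.
git -C "$repo" check-attr filter diff merge text -- data.mat
```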

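Because Git applies a .gitattributes pattern that contains no slash at every directory level, "*.mat" already matches .mat files in subdirectories as well. If you prefer to make this explicit, you can use globbing: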

$ git lfs track "**/*.mat"
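You can verify the subdirectory behavior with plain Git. In this sketch, which again uses a throwaway repository and a made-up path, git check-attr reports that the single *.mat pattern also applies to a file two directories down.

```shell
# Sketch: a slash-free .gitattributes pattern such as *.mat applies at
# every directory level, so files in subdirectories are matched too.
repo=$(mktemp -d)
git init -q "$repo"
printf '%s\n' '*.mat filter=lfs diff=lfs merge=lfs -text' > "$repo/.gitattributes"

# results/2015/data.mat is a hypothetical path; it need not exist.
git -C "$repo" check-attr filter -- data.mat results/2015/data.mat
```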

Or you can track single files:

$ git lfs track myLargeFile.mat

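That's it! git lfs track stores your patterns in a .gitattributes file; commit that file along with your other changes so that everyone who clones the repository uses the same settings. Then continue your work using git commit and git push as usual.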

Storing large files

If you have tried pushing large files to GitHub before, you may have noticed a warning that GitHub does not recommend files larger than 50 MB, and files larger than 100 MB are rejected outright. With Git LFS installed, the file contents are instead uploaded to a dedicated LFS server, separate from your Git repository, and git push goes through as usual. Stage the large file together with .gitattributes, because git commit -a only picks up files that Git already tracks:

$ git add .gitattributes myLargeFile.mat
$ git commit -m "add large file"
$ git push origin master
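Before pushing, it can be worth checking that no file above GitHub's limits has slipped past your LFS patterns. The following sketch assumes a POSIX shell, find, and Git; the helper name list_large_non_lfs_files is hypothetical. It lists files larger than a given size whose path is not mapped to the LFS filter.

```shell
# Sketch: list files larger than a threshold that are NOT covered by an
# LFS filter. list_large_non_lfs_files is a hypothetical helper name.
# Usage: list_large_non_lfs_files DIR SIZE   (SIZE uses find syntax, e.g. +50M)
list_large_non_lfs_files() {
  (
    cd "$1" || exit 1
    # Walk the working tree, skipping the .git directory.
    find . -path ./.git -prune -o -type f -size "$2" -print |
      while IFS= read -r f; do
        f=${f#./}
        # check-attr prints "path: filter: lfs" for LFS-tracked paths.
        case "$(git check-attr filter -- "$f")" in
          *": filter: lfs") ;;          # handled by Git LFS, nothing to report
          *) printf '%s\n' "$f" ;;      # large file that would go into plain Git
        esac
      done
  )
}

# Example: list_large_non_lfs_files . +50M
```

Running it with +50M from the repository root before git push would flag anything that GitHub is likely to warn about.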

Instead of the file itself, the remote repository now stores only a small pointer file. If you inspect the file on GitHub, you will find only a short note in place of its contents.

Back in the local repository, you will notice that the file is still accessible, until you switch branches.

Retrieving large files

After switching branches, however, you may find that the locally stored binaries are gone. If you then inspect a file controlled by Git LFS, you may find nothing but a tiny text file that looks something like this:

version https://git-lfs.github.com/spec/v1
oid sha256:d63d7c81d9191f17263b0c65f97101083dade9637e069aea23c6be778cbf89bdf
size 68536835
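Because the pointer is plain text with one key and value per line, it is easy to recognize in a script. The sketch below, with made-up helper names, treats a file as a pointer if its first line is the spec version line shown above, and reads individual fields such as oid and size. It is a heuristic, not a full validator of the pointer format.

```shell
# Sketch: recognise a Git LFS pointer file and read its fields.
# is_lfs_pointer and pointer_field are hypothetical helper names.
LFS_SPEC_LINE='version https://git-lfs.github.com/spec/v1'

# Succeeds if the first line of FILE is exactly the spec version line.
# Only the first 100 bytes are read, so large binaries are cheap to test.
is_lfs_pointer() {
  head -c 100 "$1" | head -n 1 | grep -qxF "$LFS_SPEC_LINE"
}

# Prints the value stored under KEY (for example oid or size).
pointer_field() {
  awk -v k="$2" '$1 == k { print $2; exit }' "$1"
}

# Example:
#   is_lfs_pointer myLargeFile.mat && pointer_field myLargeFile.mat size
```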

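So where did your file go, you might ask? It is still on the LFS server. To download it and restore it in your working copy, use the following commands:

$ git lfs fetch
$ git lfs checkout

git lfs fetch downloads the file contents into your local Git LFS cache, and git lfs checkout replaces the pointer files in your working copy with those contents. The command git lfs pull combines both steps.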
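If you want to see which files in your working copy are still pointers, for example before or after downloading, a small loop over git ls-files does the job. This is a sketch with a hypothetical helper name, not a Git LFS command; it treats a file as a pointer if its first line is the spec version line, which is a heuristic.

```shell
# Sketch: list tracked files whose working-copy content is still a Git LFS
# pointer rather than the real data. list_pointer_files is hypothetical.
# Usage: list_pointer_files [DIR]   (DIR defaults to the current directory)
list_pointer_files() {
  (
    cd "${1:-.}" || exit 1
    git ls-files | while IFS= read -r f; do
      # Read at most 100 bytes and compare the first line with the spec line.
      if [ -f "$f" ] &&
         head -c 100 "$f" | head -n 1 |
           grep -qxF 'version https://git-lfs.github.com/spec/v1'; then
        printf '%s\n' "$f"
      fi
    done
  )
}
```

An empty result means every tracked file in the working copy holds real content.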

To see a list of all LFS-related commands, simply type:

$ git lfs