
Pro-tip: How To Backup All Of Your GitHub Repositories In One Go

March 27, 2012

Today, I’m going to present solutions for easily backing up all of your GitHub repositories in just a step or two. Minimal knowledge of GitHub fundamentals is assumed, and experience with Ruby is a plus. A lack of Ruby experience won't hinder your ability to get the most out of this post, however, so let's get started.

Discussion

We live in a great time for open-source development, and the availability of free plans for services like GitHub and Google Code has undeniably had an effect on our ability to push projects out as a community.

One side-effect of this is that with each new commit we make, we rely more heavily on these services always being available. So, crazy thought of the day: what if GitHub wasn’t available for a few hours? Or, more pragmatically, what if you were concerned about your cloud-hosted repositories simply disappearing?

I might not be a typical user. I have a (relatively) large number of GitHub repos, and almost no single system at work or at home has a local, up-to-date copy of all of them.

I was reminded about this the other day and thought it would be useful to share some solutions for backing up an account with just a few keystrokes at the terminal.

Solutions

Backup all repositories (but no additional content: e.g. wikis, forks)

We're initially going to review the simplest of solutions. These will allow you to get copies of every one of your repositories locally, but won't include additional content such as pull requests, comments, wikis, forks or issues. If backing up this information (and other meta-content) is important to you, please review Option 3 lower down. 

Option 1: GitBack

The first solution we're going to look at is a simple Ruby script I’m calling ‘GitBack’. You customize it with your GitHub username, and it then uses the GitHub API to list the repositories on your account and clones each one locally. By default, it includes a timestamp in the local directory name for each backup, but you can easily opt for something simpler if you prefer.

#!/usr/bin/env ruby
# gitback 0.1
# credits: Walter White, Updates: Addy Osmani

# dependencies
require "yaml"
require "open-uri"
require "fileutils"

# your github username
username = "addyosmani"

time = Time.new
# feel free to comment out the option you don't wish to use.
backupDirectory = "/backups/github/#{time.year}.#{time.month}.#{time.day}"
# or simply: backupDirectory = "/backups/github/"

# make sure the backup directory exists before cloning into it
FileUtils.mkdir_p(backupDirectory)

# fetch the list of repositories for the user and clone each one over SSH
YAML.load(open("http://github.com/api/v2/yaml/repos/show/#{username}"))['repositories'].each { |repository|
    puts "discovered repository: #{repository[:name]} ... backing up ..."
    system "git clone git@github.com:#{username}/#{repository[:name]}.git #{backupDirectory}/#{repository[:name]}"
}

This can then be run at the terminal as follows:

sudo ruby gitback.rb

Most of the credit for this solution goes to Walter White. I only made a few changes which I thought helped improve the overall readability of the code, so props to him! \o/
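
As far as I'm aware, GitHub has since retired the v2 API that GitBack calls, so here is a rough sketch of the same idea against the v3 REST API, which returns JSON. Treat it as a starting point rather than a drop-in replacement: the endpoint (https://api.github.com/users/USERNAME/repos with per_page and page parameters) and the helper names repos_url, repo_names, clone_command and backup_all are my assumptions and not part of the original script. It also only sees public repositories, since it makes unauthenticated requests.

```ruby
#!/usr/bin/env ruby
# Sketch only: list a user's public repositories via the GitHub v3 REST API
# and clone each one. Helper names are hypothetical, not part of GitBack.
require "json"
require "net/http"
require "uri"

# URL for one page of a user's repositories (assumed v3 endpoint, 100 per page).
def repos_url(username, page)
  URI("https://api.github.com/users/#{username}/repos?per_page=100&page=#{page}")
end

# Extract the repository names from one JSON page returned by the API.
def repo_names(json_body)
  JSON.parse(json_body).map { |repo| repo["name"] }
end

# Argument list for cloning one repository over SSH, as GitBack does.
def clone_command(username, name, backup_directory)
  ["git", "clone", "git@github.com:#{username}/#{name}.git",
   File.join(backup_directory, name)]
end

# Walk the pages until an empty one comes back, cloning as we go.
def backup_all(username, backup_directory)
  page = 1
  loop do
    response = Net::HTTP.get_response(repos_url(username, page))
    raise "GitHub API returned #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    names = repo_names(response.body)
    break if names.empty?

    names.each do |name|
      puts "discovered repository: #{name} ... backing up ..."
      system(*clone_command(username, name, backup_directory))
    end
    page += 1
  end
end

# Run only when a username and a target directory are given on the command line.
if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  backup_all(ARGV[0], ARGV[1])
end
```

Saved under a name of your choosing (say, gitback_v3.rb), it would be run as ruby gitback_v3.rb addyosmani /backups/github. Passing the clone command as an argument list rather than a single string keeps repository names with unusual characters away from the shell.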


Option 2: GitHub-Backup

If you prefer to simply specify the GitHub username and backup location at the terminal instead, there’s a Ruby Gem available that lets you achieve something very similar. It's called GitHub-Backup and it's written by @ddollar. (Should you need to install the RubyGems prerequisite, this will help).

To install GitHub-Backup, simply run:

$ gem install github-backup

and it can then be used as follows (where addyosmani is my username and /storage/backups is the target location for backups):

$ github-backup addyosmani /storage/backups
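
If you'd like the dated directory naming from GitBack with this gem too, a small wrapper along these lines might do. It is only a sketch: dated_target and backup_command are hypothetical helper names of mine, and the only thing it assumes about the gem is the github-backup USERNAME TARGET invocation shown above.

```ruby
#!/usr/bin/env ruby
# Sketch only: run the github-backup gem into a dated directory.
# Helper names are hypothetical and not part of the gem.
require "fileutils"

# Dated target such as /storage/backups/2012.3.27, matching GitBack's naming.
def dated_target(base, time = Time.new)
  File.join(base, "#{time.year}.#{time.month}.#{time.day}")
end

# Same invocation as shown above: github-backup USERNAME TARGET
def backup_command(username, target)
  ["github-backup", username, target]
end

# Run only when a username and a base directory are given on the command line.
if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  username, base = ARGV
  target = dated_target(base)
  FileUtils.mkdir_p(target)

  # system returns false on a non-zero exit and nil if the command is missing.
  ok = system(*backup_command(username, target))
  abort "github-backup failed for #{username}" unless ok
  puts "backup written to #{target}"
end
```

Because it exits with an error when the gem fails or isn't installed, a wrapper like this is a little friendlier to run from a scheduled job than the bare command.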

 

Backup everything (pull requests, forks, issues, wikis and more)

Option 3: GitHub Backup

A much more robust solution than the previous options is GitHub Backup by @joeyh. This is a tool that can either run against an account or run in a git repository cloned from GitHub, backing up everything published about the repository. Its backups include:

  • Forks
  • Branches
  • Issues
  • Comments
  • Wikis
  • Milestones
  • Pull requests
  • Watchers

To install GitHub Backup, clone the github-backup repo then run make.

git clone git://github.com/joeyh/github-backup
cd github-backup
make

You then have two options. Running github-backup without any additional parameters will simply back up the repository you are currently in, whilst github-backup username (e.g. github-backup addyosmani) will back up all of the repositories for a specific user. I personally find the latter the most useful.

All of the data backed up is stored in a branch named "github", which you can of course check out and store independently as needed. GitHub Backup keeps all backed up content relatively well organized. Each fork gets its own directory (e.g. jquery, modernizr) and within each directory are subdirectories such as jquery/watchers for meta-content.
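
To peek at what ended up in that branch without checking it out, something like the following could work. It is a sketch under my own assumptions: it relies only on the branch being named "github" as described above, shells out to git ls-tree --name-only, and uses two hypothetical helper names, top_level_entries and github_branch_entries.

```ruby
#!/usr/bin/env ruby
# Sketch only: list the top-level entries stored in the "github" branch
# of a backed-up repository. Helper names are hypothetical.

# Turn the output of `git ls-tree --name-only github` into a list of names.
def top_level_entries(ls_tree_output)
  ls_tree_output.lines.map(&:strip).reject(&:empty?)
end

# Ask git for the top-level tree of the "github" branch in the given repository.
def github_branch_entries(repo_path)
  command = ["git", "-C", repo_path, "ls-tree", "--name-only", "github"]
  output = IO.popen(command, &:read)
  raise "git ls-tree failed in #{repo_path}" unless $?.success?

  top_level_entries(output)
end

# Run only when a repository path is given on the command line.
if __FILE__ == $PROGRAM_NAME && ARGV.length == 1
  puts github_branch_entries(ARGV[0])
end
```

Running it against a backed-up clone should print one line per directory in the branch, which makes for a quick sanity check after a backup run.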

The only real limitations worth noting (although there are more in the project README) are that GitHub Backup will re-download all issues, comments and content each time it's run. Inline code comments are also not something that can be backed up as GitHub doesn't expose these through their API.

As long as you're okay with these limits, GitHub Backup really is the ideal redundancy solution for power-users. 

That's it!

Conclusions

Backing up all your repositories might seem like an exercise in overkill for some, but I've found the above particularly useful for anyone with concerns about files in the cloud not always being available.

If you disagree, that's completely understandable, but at minimum the solutions presented are useful for getting all your repositories local before setting off for a trip somewhere without a consistent connection.

In any case, I hope these solutions are useful to someone out there!