
Pro-tip: How To Backup All Of Your GitHub Repositories In One Go

Today, I’m going to present solutions for easily backing up all of your GitHub repositories in just a step or two. Minimal knowledge of GitHub fundamentals is assumed, and experience with Ruby will be a plus, although it won't hinder your ability to get the most out of this post. Let's get started.

Discussion

We live in a great time for open-source development, and the availability of free plans for services like GitHub and Google Code has undeniably improved our ability to push projects out as a community.

One side-effect of this is that with each new commit we make, we rely more heavily on these services always being available. So, crazy thought of the day: what if GitHub wasn’t available for a few hours? Or, more pragmatically, what if you were concerned about your cloud-hosted repositories simply disappearing?

I might not be a typical user. I have a (relatively) large number of GitHub repos, and almost none of my systems, at work or at home, has a local, up-to-date copy of all of them.

I was reminded of this the other day and thought it would be useful to share some solutions for backing up an account with just a few keystrokes at the terminal.

Solutions

Backup all repositories (but no additional content, e.g. wikis, forks)

We're initially going to review the simplest solutions. These will give you local copies of every one of your repositories, but won't include additional content such as pull requests, comments, wikis, forks or issues. If backing up this information (and other meta-content) is important to you, see Option 3 below.

Option 1: GitBack

The first solution we're going to look at is a simple Ruby script I’m calling ‘GitBack’. You customize it with your GitHub username, and it then asks the GitHub API for the list of repositories on your account and clones each one locally. By default, it includes the date in the name of each backup directory, but you can easily opt for a plain directory if you prefer.

#!/usr/bin/env ruby
# gitback 0.1
# credits: Walter White, updates: Addy Osmani

# dependencies
require "json"
require "open-uri"
require "fileutils"

# your github username
username = "addyosmani"

time = Time.new
# feel free to comment out the option you don't wish to use.
backupDirectory = "/backups/github/#{time.year}.#{time.month}.#{time.day}"
# or simply: backupDirectory = "/backups/github/"

# create the backup directory (and any missing parents) before cloning into it
FileUtils.mkdir_p(backupDirectory)

# ask the GitHub API for the user's public repositories (up to 100 per request)
url = "https://api.github.com/users/#{username}/repos?per_page=100"
repositories = JSON.parse(URI.open(url).read)

repositories.each do |repository|
  name = repository["name"]
  puts "discovered repository: #{name} ... backing up ..."
  # clones over SSH, so the key of the user running the script must be registered with GitHub
  system "git", "clone", "git@github.com:#{username}/#{name}.git", File.join(backupDirectory, name)
end

This can then be run at the terminal as follows:

sudo ruby gitback.rb

Most of the credit for this solution goes to Walter White. I only made a few changes which I thought helped improve the overall readability of the code, so props to him! \o/
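One caveat worth knowing about: the current GitHub API (v3) returns a user's repositories one page at a time, with at most 100 per request, so listing a larger account takes several requests. Below is a minimal sketch of paging through them, assuming the `https://api.github.com/users/USERNAME/repos` endpoint and its `per_page`/`page` parameters; `all_repositories` and its `fetch:` parameter are hypothetical names introduced here for illustration, not part of GitBack.

```ruby
require "json"
require "open-uri"

# Hypothetical helper (not part of GitBack): gathers every public repository
# for +username+ by requesting pages of up to 100 results until an empty
# page comes back.
# +fetch+ receives a URL and must return the response body as a String; it
# defaults to a plain HTTP GET via open-uri but can be swapped out, e.g. in tests.
def all_repositories(username, fetch: ->(url) { URI.open(url).read })
  repositories = []
  page = 1
  loop do
    url = "https://api.github.com/users/#{username}/repos?per_page=100&page=#{page}"
    batch = JSON.parse(fetch.call(url))
    break if batch.empty? # an empty page means there is nothing left to fetch
    repositories.concat(batch)
    page += 1
  end
  repositories
end
```

Calling `all_repositories("addyosmani")` returns an array of hashes with string keys, so each repository's name is available as `repository["name"]`, and the clone loop can iterate over that array instead of the result of a single request.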

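If you opt for the fixed /backups/github/ directory rather than the dated one, every `git clone` on a second run will fail, because git refuses to clone into a directory that already exists and isn't empty. A hedged sketch of one workaround, assuming you're happy keeping bare mirrors (`git clone --mirror`) rather than working copies: build a clone command on the first run and a refresh command on later runs. `backup_command` is a hypothetical helper, not part of GitBack.

```ruby
# Hypothetical helper (not part of GitBack): builds the git command, as an
# argv array, that backs up one repository into +target+.
# First run: `git clone --mirror` creates a bare copy holding every branch and tag.
# Later runs: the mirror already exists, so it is refreshed in place instead.
def backup_command(clone_url, target)
  if Dir.exist?(target)
    # fetch new refs from the remote and drop refs that were deleted upstream
    ["git", "-C", target, "remote", "update", "--prune"]
  else
    ["git", "clone", "--mirror", clone_url, target]
  end
end
```

Inside the loop, something like `system(*backup_command("git@github.com:#{username}/#{name}.git", File.join(backupDirectory, "#{name}.git")))` would run whichever command applies. A mirror has no checked-out files, but running `git clone` against the mirror directory gives you a working copy whenever you need one.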

Option 2: GitHub-Backup

If you'd prefer to simply specify the GitHub username and backup location at the terminal instead, there’s a Ruby Gem available that lets you achieve something very similar. It's called GitHub-Backup and it's written by @ddollar. (Should you need to install the Ruby Gems prerequisite, this will help.)

To install GitHub-Backup, simply run:

$ gem install github-backup

It can then be used as follows (where addyosmani is my username and /storage/backups is the target location for backups):

$ github-backup addyosmani /storage/backups

 

Backup everything (pull requests, forks, issues, wikis and more)

Option 3: GitHub Backup

A much more robust alternative to the previous options is GitHub Backup by @joeyh. This is a tool that can either run against an account or run in a git repository cloned from GitHub, backing up everything published about the repository. Its backups include:

  • Forks
  • Branches
  • Issues
  • Comments
  • Wikis
  • Milestones
  • Pull requests
  • Watchers

To install GitHub Backup, clone the github-backup repo, then run make:

git clone https://github.com/joeyh/github-backup
cd github-backup
make

You then have two options. Running github-backup without any additional parameters will simply back up the repository you are currently in, whilst github-backup username (e.g. github-backup addyosmani) will back up all of the repositories for a specific user. I personally find the latter more useful.

All of the backed-up data is stored in a branch named "github", which you can of course check out and store independently as needed. GitHub Backup keeps all backed-up content relatively well organized. Each fork gets its own directory (e.g. jquery, modernizr), and within each directory are subdirectories such as jquery/watchers for meta-content.
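To get an overview of that content without checking out the branch, its file tree can be listed and grouped by top-level directory. A minimal sketch, assuming the "github" branch name and the one-directory-per-fork layout described above; `backup_paths` and `group_by_top_level` are hypothetical helpers, and the first one shells out to `git ls-tree`.

```ruby
require "open3"

# Hypothetical helper: lists every file path stored on +branch+ (assumed to be
# the "github" branch written by GitHub Backup) by running `git ls-tree`
# inside +repo_dir+.
def backup_paths(repo_dir, branch: "github")
  out, status = Open3.capture2("git", "-C", repo_dir, "ls-tree", "-r", "--name-only", branch)
  raise "could not read branch #{branch} in #{repo_dir}" unless status.success?
  out.lines.map(&:chomp)
end

# Hypothetical helper: groups paths by their top-level directory (assumed to be
# one per fork), so "jquery/watchers/x" ends up under the "jquery" key.
def group_by_top_level(paths)
  paths.group_by { |path| path.split("/", 2).first }
end
```

For example, `group_by_top_level(backup_paths("."))`, run inside a backed-up repository, would return a hash keyed by directory names such as "jquery" or "modernizr".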

The only real limitations worth noting (there are more in the project README) are that GitHub Backup re-downloads all issues, comments and other content each time it's run, and that inline code comments can't be backed up, as GitHub doesn't expose these through its API.

As long as you're okay with these limits, GitHub Backup really is the ideal redundancy solution for power-users. 

That's it!

Conclusions

Backing up all your repositories might seem like overkill to some, but I've found the above particularly useful if you have concerns about files in the cloud not always being available.

If you disagree, that's completely understandable, but at a minimum the solutions presented are useful for getting all your repositories local before setting off on a trip somewhere without a consistent connection.

In any case, I hope these solutions are useful to someone out there!

13 Comments

  1. +1'ed, Bookmarked. Great post. Only thing that caught me (as a Windows user) is that the github username is case sensitive. After that it worked beautifully.

    Thank you!

  2. Hey Addy.

    Recently I have been thinking about this, and it has me worried. Every so often we hear stories of “cloud services” going offline. These companies usually have an ‘online status’ blog notifying users of when their servers have been hit, or otherwise taken offline. Usually when that happens, there is either a loss of revenue, or in most cases users end up not trusting the company they invested their time (or in some cases, money) in.

    My solution was to create an independent, separate repository (based on the cloud – but we’ll get to that later) … that curates, and helps expose repositories that I, personally thought deserve the most recognition.

    The solution? The Github Pirates…

    Last updated: 2/FEB/2012
    Number of Projects Plundered: 60+

    The Github Pirates is a constantly updated resource for those interested in HTML5, CSS3, and JavaScript. Despite the name, the project doesn’t actually pirate Github repositories, but gathers open-source content from Github and puts it into a single repository for the following reasons:

    Ease of reference
    Quality
    Preservation
    Exposure

    Your jQuery UI bootstrap tool was included in this. I sent a tweet to you regarding this, however I’d imagine you are over-run with @mentions with your new GC dev-relations thing.

    Anyways; here’s the link. Look forward to engaging more with you on twitter. BTW, are you on AIM?

    http://codepirat.es/

    PS * Multiple cloud services for extra security, and peace of mind
