April 23, 2013

Repoman

So here's the scenario. A year or two ago we had an in-house SVN server, because it was easy enough to set up and the options for hosted services weren't as abundant and cool as they are now. We had all sorts of space to commit anything and everything to SVN, plus the added benefit of local network speeds when checking out repositories.

I'm not sure if we're alone here at Fuse, but when we say we would commit everything, we mean everything! We were committing our clients' site files (sites/default/files), wireframes, database backups and even Photoshop mockups. Most of these were large binary files that got none of the real benefits of version control. We knew it was a bit taboo, but it seemed so nice and easy to check out a repo and have everything you needed to get going on a project in one place.

After our move from SVN to Git we realized the error of our ways, but how do we fix it now?

Enter the filter-branch command. It walks back through a branch's history and rewrites every commit along the way, so you can expunge files and directories as if they never existed. It's good for removing things like hard-coded passwords, compiled objects and, in our case, design files and databases.

It can take a while to run, and since we haven't really gotten into feature branches (or branching at all), it was fairly simple for us. The one thing to take note of is making sure every clone of the repo has its work committed and pushed first. Since we're rewriting history, merging isn't really an option afterwards, so you'll be creating new clones everywhere you need these files.
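A quick way to check a clone before the rewrite. This is a minimal sketch, assuming a POSIX shell and a branch that tracks an upstream such as origin/master; preflight_check is a hypothetical helper name, not a Git command.

```shell
#!/bin/sh
# Hypothetical helper: refuse to continue if this clone has uncommitted
# changes or commits that haven't been pushed to its upstream yet.
preflight_check() {
    # Any output from --porcelain means modified, staged or untracked files.
    if [ -n "$(git status --porcelain)" ]; then
        echo "uncommitted changes"
        return 1
    fi
    # Refresh remote-tracking refs, then count commits on HEAD that the
    # upstream branch doesn't have. Assumes an upstream is configured.
    git fetch -q || return 1
    ahead=$(git rev-list --count '@{u}..HEAD') || return 1
    if [ "$ahead" -ne 0 ]; then
        echo "$ahead unpushed commit(s)"
        return 1
    fi
    echo "clean"
}
```

Run it from the top of each clone; anything other than "clean" means there's work to commit or push before you rewrite.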

git filter-branch -f --tree-filter 'rm -rf source' HEAD

would get rid of our source directory, which included mockups and wireframes.

git filter-branch -f --tree-filter 'rm -rf db' HEAD

would get rid of our db backups.
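Each of those commands walks the history once, and --tree-filter checks out every commit to disk, so on a big repo it's slow. Below is a minimal sketch of doing both in one pass by swapping in --index-filter (which edits each commit's index without a checkout) plus --prune-empty; strip_paths is a hypothetical helper name, and like the commands above it only rewrites the branch HEAD points to.

```shell
#!/bin/sh
# Newer Git versions print a warning and pause before filter-branch runs;
# setting this variable skips that pause.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Hypothetical helper: remove the given paths (no spaces in the names)
# from every commit reachable from HEAD, in a single pass.
#   --index-filter    rewrites each commit's index instead of checking
#                     out its files, usually much faster than --tree-filter
#   --ignore-unmatch  don't fail on commits where a path doesn't exist
#   --prune-empty     drop commits that only touched the removed paths
strip_paths() {
    git filter-branch -f \
        --index-filter "git rm -r -q --cached --ignore-unmatch $*" \
        --prune-empty \
        HEAD
}
```

For example, strip_paths source db, run from the top of the repo, would cover both of the commands above.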

Then the always frightening:

git push origin master --force

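One final step that we went with (because it's easier) is cloning your newly cleaned repo and importing it into a new repo. The rewrite leaves the old objects behind in your pack files (filter-branch keeps a backup of the old history under refs/original, and the reflog still points at it too), so they keep taking up a bunch of space and can be tricky to get rid of. A fresh clone from the server only carries over the objects the rewritten history still uses. We did this by renaming the old repo after cleaning, cloning it, then importing the clone into a newly created repo with the original repo's name.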
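If you'd rather shrink a repo in place, the git-filter-branch documentation has a checklist for that. A sketch of it follows; shrink_in_place is a hypothetical helper name. It only cleans up your local copy: a hosted repo may hold on to the old objects until the host runs its own garbage collection, which is part of why a fresh repo is the easier route.

```shell
#!/bin/sh
# Hypothetical helper: run inside a repo after git filter-branch to delete
# objects that only the old history referenced.
shrink_in_place() {
    # 1. Drop the backup refs filter-branch saved under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done
    # 2. Expire every reflog entry; these still point at the old commits.
    git reflog expire --expire=now --all
    # 3. Repack and delete every object that is no longer reachable.
    git gc -q --prune=now
}
```

This is irreversible: once it runs, the backup under refs/original is gone, so only do it after you're happy with the rewritten history.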

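After all is said and done you'll have a trimmed-down repo with your commit history intact, minus the files and directories you just deleted. Two things to keep in mind: every rewritten commit gets a new hash, and commits that only touched the deleted files stay in the history as empty commits unless you pass --prune-empty to filter-branch.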
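To double-check the result, you can ask Git for any commit that still touches the deleted paths and look at how big the object store is. A minimal sketch; still_referenced and show_repo_size are hypothetical helper names wrapping standard git log and git count-objects calls.

```shell
#!/bin/sh
# Hypothetical check: print any commit on a local branch or tag that still
# touches the given paths. Empty output means they're gone from that history.
# --branches --tags is used instead of --all so the backup copy that
# filter-branch keeps under refs/original/ isn't searched.
still_referenced() {
    git log --branches --tags --format='%h %s' -- "$@"
}

# Rough size of the object database (loose objects and packs),
# in human-readable units.
show_repo_size() {
    git count-objects -v -H
}
```

For example, still_referenced source db should print nothing once the rewrite is done.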

I'm no expert with Git, so I may have missed something, or there may be better methods; your best bet is to read up on the filter-branch command. If you do find something, please leave a comment. We're only a couple of repos in, so if anything comes up on our end we'll do the same.