GitHub API repo size

This is something of a black hole: nobody can give you an exact answer, and the answer is not simple at all.
First of all, there’s a pretty big difference between how the size is calculated for GitHub.com and for GitHub Enterprise.

The short answer is that the size shown doesn’t match the size of the repo after you clone it. The size shown is as close as possible to the size of the .git folder in your repo.

Now the long answer:

NOTE: Git repositories on the server end are bare git repositories. They have no working directory and no working files. You can’t commit to a bare repository, and you can’t run anything that acts on a working tree of files. Bare repositories contain only repository metadata, including the configuration, and the objects/packfiles that all copies of the repository contain.
In short, a bare git repository is only the contents of the .git folder that a regular clone contains. That folder’s size makes up the size of the repository on the server side.
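To get a feel for that number on your own machine, here is a minimal Python sketch that adds up the files under the .git folder of a regular clone. The function names dir_size_kb and git_dir_size_kb are mine, not GitHub’s, and the sketch sums apparent file sizes, so expect it to differ a little from what du reports (du counts allocated disk blocks).

```python
import os


def dir_size_kb(path):
    """Sum the apparent sizes of all regular files under path, in KB (1024 bytes)."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so nothing outside the tree gets counted.
            if not os.path.islink(full):
                total_bytes += os.path.getsize(full)
    return total_bytes // 1024


def git_dir_size_kb(clone_path):
    """Size of the .git folder inside a regular (non-bare) clone."""
    return dir_size_kb(os.path.join(clone_path, ".git"))
```

Calling git_dir_size_kb("/path/to/your/clone") ignores the working files entirely, which is the part of a clone that a bare repository doesn’t have.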

For GitHub.com: The size attribute represents disk usage in KB; however, due to the way GitHub stores repositories, it may be way off for large repositories and for repositories with a lot of forks (i.e. large networks).
Currently, making a fresh clone of a repository and checking its size on disk gives a very good approximation.
Yeah, I know, pretty vague…
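For reference, this is roughly how the size attribute can be read from the API. It is a sketch, not an official client: the helper names are hypothetical, it assumes the public repository endpoint (GET /repos/{owner}/{repo}) and an unauthenticated request, and for GitHub Enterprise you would pass your own API root instead of api.github.com.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"  # assumption: public GitHub.com API


def size_kb_from_payload(payload):
    """Pull the size attribute (disk usage in KB) out of a repository payload."""
    return int(payload["size"])


def fetch_repo_size_kb(owner, repo, api_root=API_ROOT):
    """Fetch repository metadata and return its reported size in KB."""
    url = f"{api_root}/repos/{owner}/{repo}"
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "repo-size-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return size_kb_from_payload(payload)
```

Comparing the value returned by fetch_repo_size_kb with the on-disk size of a fresh clone shows how far off the reported number is for a given repository.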

For GitHub Enterprise:
The .git folder on the server and on a clone are not identical. Additional objects are stored on the server that are not transferred when cloning a repository. The most notable differences are that server side repos contain audit logs, tracking all pushes made against the repository, and the server side repository has additional objects in the form of objects used to power pull requests.

The push audit log is kept as two sets of logs in the repository’s directory, so its contents are duplicated. The first is an overall audit_log that records all pushes; the other is a per-branch version of the audit_log.

Pull requests work by inserting git objects from forks into the parent repository. Each pull request stores all of the objects from the base commit forward, including a test merge of the changes made in the branch into the base it was branched from. An actual git merge is run for each pull request, but the result is not stored in the repository’s history. The result of this merge is how GitHub determines whether or not a pull request can be merged.

These additional server-side objects are the likely reason why the API call returns larger sizes than you see locally. You mentioned that the older the project, the bigger the discrepancy. Given the growth of the audit log and the number of pull requests over time, that adds up as well.

Since none of these additional objects get transferred to clones of the repository, there will be discrepancies between a local clone’s disk usage and the server-side copy. If you have admin access to your organization’s GitHub Enterprise instance and have your public key installed at http(s)://[your-hostname]/setup/settings , you can compare repository disk usage by logging in via SSH ( ssh -i /path/to/your/private.key admin@[your-hostname] ) and running du against the repository’s directory ( du -sk /data/repositories/[owner]/[repository-name].git ).

Note that GNU/Linux’s du command doesn’t have -I; the closest equivalent is --exclude. However, as I previously stated, server-side repositories don’t have working directory files. The repository consists only of the contents of a .git directory, so there’s no need to ignore any files or paths anyway. One last related aside: Mac OS X uses the BSD versions of the core utilities, whereas Linux distributions use GNU’s.
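If you would rather script that check, here is a small Python sketch around the same two commands. Everything in it is an assumption on my part: the helper names are hypothetical, it assumes your instance lets you run du as a remote command over SSH exactly as in the login line above, and the path layout is the one quoted in this post.

```python
import subprocess


def parse_du_kb(output):
    """Return the size in KB from du -sk output such as '20480<TAB>/some/path'."""
    first_line = output.strip().splitlines()[0]
    size_field = first_line.split(None, 1)[0]
    return int(size_field)


def remote_du_command(key_path, hostname, owner, repo):
    """Build the ssh + du command line described in the text."""
    repo_dir = f"/data/repositories/{owner}/{repo}.git"
    return ["ssh", "-i", key_path, f"admin@{hostname}", "du", "-sk", repo_dir]


def remote_repo_size_kb(key_path, hostname, owner, repo):
    """Run du on the server and return the repository's disk usage in KB."""
    command = remote_du_command(key_path, hostname, owner, repo)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_du_kb(result.stdout)
```

The number returned by remote_repo_size_kb is in the same unit (KB) as the size attribute from the API, so the two can be compared directly.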

One last note is that a repository’s disk usage is cached, so the size returned by du via these instructions may not immediately match the sizes reported on the web in the admin pages, or via the API.

THE END: I know this may not be the best explanation, but it gives you a pretty good idea of what’s going on with the GitHub size.

Source: GitHub Staff
