Authentication
To access the GitHub API you need an access token. You can create one at https://github.com/settings/tokens.
The access token requires the repo scope. When working with this module locally, it is probably easiest to put the token in a .env file and use python-dotenv to load it. See the python-dotenv documentation for further details. Alternatively, you may want to save the token as a GitHub Secret, especially if you plan to use this code as part of a GitHub Action.
GH_TOKEN = os.getenv("GH_TOKEN")
create_github_session(GH_TOKEN)
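If GH_TOKEN is not set, os.getenv returns None and the problem only shows up later as an authentication error. The following is a minimal sketch of a fail-fast lookup using only the standard library. The helper name read_token is hypothetical and is not part of this module.

```python
import os


def read_token(env=None, name="GH_TOKEN"):
    """Return the token stored under `name`, or raise if it is missing or empty.

    `read_token` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    token = (env.get(name) or "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; add it to .env or export it first")
    return token


# Passing a plain dict keeps the example independent of the real environment.
print(read_token({"GH_TOKEN": "example-token"}))
```

In real use you would call read_token() with no arguments after load_dotenv() and pass the result to create_github_session.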
OrgStats
OrgStats is a class that contains functionality for getting statistics for a GitHub organization.
load_dotenv()
GH_TOKEN = os.getenv("GH_TOKEN")
To use OrgStats you need to pass in a token to authenticate with the GitHub API, and the name of a GitHub organization. We use ghorgstatstestorg for these examples.
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
test_org
get_repos returns all repositories associated with an organization. We can optionally filter by public status.
test_org.get_repos()
These can also be accessed via the repos, public_repos and private_repos attributes of OrgStats.
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
assert L(test_org.public_repos).map(lambda x: x.private).unique()[0] == False
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
assert L(test_org.private_repos).map(lambda x: x.private).unique()[0] == True
The repo attributes can be used to access repositories by type, for example accessing only public repos via public_repos.
test_org.public_repos
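The split between public and private repositories can be pictured as a filter on each repository's private flag, which is what the assertions above check. This is a sketch under assumptions: the Repo dataclass is a stand-in for the real repository objects, and split_by_visibility is a hypothetical helper, not how OrgStats is necessarily implemented.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    """Stand-in for a repository object; only the fields used here."""
    name: str
    private: bool


def split_by_visibility(repos):
    """Return (public, private) lists based on each repo's `private` flag."""
    public = [r for r in repos if not r.private]
    private = [r for r in repos if r.private]
    return public, private


repos = [Repo("repo1", False), Repo("repo2", True), Repo("repo3", False)]
public, private = split_by_visibility(repos)
print([r.name for r in public])   # ['repo1', 'repo3']
print([r.name for r in private])  # ['repo2']
```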
Files can also be accessed via the files, files_public and files_private attributes.
test_org.get_repo_file_ext_frequency('repo2')
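To make the idea of a file extension frequency concrete, here is a rough standard-library sketch that counts extensions in a list of file paths. It only illustrates the concept: file_ext_frequency is a hypothetical function, and the actual return type of get_repo_file_ext_frequency may differ.

```python
from collections import Counter
from pathlib import PurePosixPath


def file_ext_frequency(paths):
    """Count file extensions (lower-cased, including the dot) across paths.

    Files without an extension are grouped under "<none>".
    """
    return Counter(PurePosixPath(p).suffix.lower() or "<none>" for p in paths)


# Illustrative file list, not taken from a real repository.
paths = ["README.md", "src/app.py", "src/utils.py", "docs/index.MD", "LICENSE"]
for ext, count in file_ext_frequency(paths).most_common():
    print(ext, count)
```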
You can also access get_org_snapshot_stats via the snapshot_stats property of OrgStats.
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
test_org.snapshot_stats
Long view stats
These are the other flavour of GitHub stats: traffic stats, which include visits to a repository on GitHub and clones of an organization's repositories. By default GitHub only provides access to recent data for these stats (the traffic API covers the last 14 days). To access longer-term information we therefore need to store and update the data on a regular basis ourselves. The methods below do this in combination with GitHub Actions.
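The store-and-update idea can be sketched as merging each fresh window of daily rows into the history saved so far. The sketch below uses only the standard library and assumes rows with timestamp, count and uniques fields. The function names are hypothetical, and the choice to let fresh rows replace stored rows for the same day is an assumption, not a description of this module's behaviour.

```python
import csv
import io


def merge_traffic(stored, fresh):
    """Merge two lists of {"timestamp", "count", "uniques"} rows.

    Rows are keyed by timestamp. A fresh row replaces a stored row for the
    same day (assumption: the latest day may have been incomplete when saved).
    """
    by_day = {row["timestamp"]: row for row in stored}
    by_day.update({row["timestamp"]: row for row in fresh})
    # ISO dates sort correctly as plain strings.
    return [by_day[day] for day in sorted(by_day)]


def to_csv(rows):
    """Serialize rows to CSV text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["timestamp", "count", "uniques"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative numbers only.
stored = [
    {"timestamp": "2021-01-01", "count": 5, "uniques": 2},
    {"timestamp": "2021-01-02", "count": 1, "uniques": 1},
]
fresh = [
    {"timestamp": "2021-01-02", "count": 4, "uniques": 3},
    {"timestamp": "2021-01-03", "count": 7, "uniques": 4},
]
print(to_csv(merge_traffic(stored, fresh)))
```

Running a merge like this on a schedule, for example from a GitHub Action, keeps the stored history growing beyond the window GitHub exposes.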
Gets views traffic for repo and saves it as a CSV file in save_dir.
repo is a repository under the GitHub organization.
save_dir is the directory where the output CSV should be saved, by default view_data.
load is an optional flag which loads the data into a pandas DataFrame, by default False.
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
test_org.get_repo_views_traffic(test_org.repos[0], 'test_dir',load=True).head(3)
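For context, GitHub's REST API exposes views traffic at /repos/{owner}/{repo}/traffic/views, and the response contains a views list with one timestamp, count and uniques entry per day. The sketch below builds that URL and flattens a response into rows without making a network call. The helper names and the sample payload are made up for illustration, and this module may well fetch the data through a client library instead.

```python
import json

API_ROOT = "https://api.github.com"


def views_url(owner, repo):
    """Build the REST endpoint URL for a repository's views traffic."""
    return f"{API_ROOT}/repos/{owner}/{repo}/traffic/views"


def views_to_rows(payload):
    """Flatten a traffic/views response into (date, count, uniques) tuples."""
    return [
        (v["timestamp"][:10], v["count"], v["uniques"])
        for v in payload.get("views", [])
    ]


# Made-up payload in the shape of a traffic/views response.
sample = json.loads("""
{
  "count": 9,
  "uniques": 5,
  "views": [
    {"timestamp": "2021-01-01T00:00:00Z", "count": 4, "uniques": 2},
    {"timestamp": "2021-01-02T00:00:00Z", "count": 5, "uniques": 3}
  ]
}
""")

print(views_url("ghorgstatstestorg", "repo1"))
for row in views_to_rows(sample):
    print(row)
```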
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
test_org.get_org_views_traffic(load=True).head(3)
assert len(test_org.get_org_views_traffic(load=True).columns)/2 == len(test_org.public_repos)
assert len(test_org.get_org_views_traffic(repos=test_org.repos, load=True).columns)/2 == len(test_org.repos)
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
test_org.get_repo_clones_traffic('repo1',save_dir='test_dir', load=True)
assert len(test_org.get_repo_clones_traffic(test_org.public_repos[1], load=True).columns) == 2
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
assert isinstance(test_org.get_org_clones_traffic(repos=test_org.repos, save_dir='test_dir', load=True), pd.DataFrame)
assert (len(test_org.get_org_clones_traffic(save_dir='test_dir',load=True).columns) /2) == test_org.public_repo_count