Authentication
To access the GitHub API you need an access token. You can create one here: https://github.com/settings/tokens.
The access token will require the repo scope. When working with this module locally, it's probably easiest to put this token in a .env file and use python-dotenv to load it; see the python-dotenv documentation for details. Alternatively, you may want to save the token as a GitHub Secret, especially if you are planning to use this code as part of a GitHub Action.
import os
GH_TOKEN = os.getenv("GH_TOKEN")
create_github_session(GH_TOKEN)
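os.getenv returns None when the variable is missing, so a token that was never loaded is likely to surface only later as an authentication failure from the GitHub API. A minimal sketch of a fail-fast check, assuming the token lives in GH_TOKEN; get_github_token is a hypothetical helper, not part of this module:

```python
import os


def get_github_token(var_name: str = "GH_TOKEN") -> str:
    """Return the token stored in the environment variable var_name.

    Hypothetical helper (not part of this module): raises a clear error
    instead of passing None on to create_github_session.
    """
    token = os.getenv(var_name)
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; add it to your .env file or GitHub Secrets"
        )
    return token
```

Used as GH_TOKEN = get_github_token() before calling create_github_session(GH_TOKEN), a missing token is reported at the point where it was expected rather than at the first API call.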
OrgStats
OrgStats is a class that contains functionality for getting statistics for a GitHub organization.
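import os
from dotenv import load_dotenv
load_dotenv()  # read variables from a local .env file into the environment
GH_TOKEN = os.getenv("GH_TOKEN")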
To use OrgStats you need to pass in a token to authenticate with the GitHub API and the name of a GitHub organization. We use ghorgstatstestorg for these examples.
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
test_org
get_repos returns all repositories associated with an organization. We can optionally filter by public status.
test_org.get_repos()
These can also be accessed via the repos, public_repos and private_repos attributes of OrgStats.
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
# every repo in public_repos should be public, and every repo in private_repos private
assert all(not repo.private for repo in test_org.public_repos)
assert all(repo.private for repo in test_org.private_repos)
The repo attributes can be used to access repositories by type, for example accessing only public repos via public_repos.
test_org.public_repos
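The repository objects used here expose a boolean private attribute (it is what the public_repos and private_repos assertions check), so filtering by public status can key on that flag. A minimal sketch, using a stand-in Repo dataclass and a hypothetical public keyword; the actual signature of get_repos is not shown in this section:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Repo:
    """Hypothetical stand-in for a repository object; only name and private matter here."""
    name: str
    private: bool


def filter_repos(repos: List[Repo], public: Optional[bool] = None) -> List[Repo]:
    """Return all repos when public is None, otherwise only those with the matching visibility."""
    if public is None:
        return list(repos)
    # public=True keeps repos whose private flag is False, and vice versa
    return [repo for repo in repos if repo.private != public]
```

Leaving public as None returns every repository, like the unfiltered get_repos() call; splitting on one flag also guarantees that the public and private subsets never overlap.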
Files can also be accessed via the files, files_public and files_private attributes.
test_org.get_repo_file_ext_frequency('repo2')
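The output of get_repo_file_ext_frequency is not shown here. As a rough illustration of what counting file extensions involves, assuming the input is a plain list of file paths, a hypothetical file_ext_frequency helper could combine collections.Counter with os.path.splitext:

```python
import os
from collections import Counter
from typing import Iterable


def file_ext_frequency(paths: Iterable[str]) -> Counter:
    """Count how often each (lowercased) file extension appears in a list of paths.

    Hypothetical helper. Files without an extension (e.g. LICENSE, .gitignore)
    are counted under the empty string.
    """
    return Counter(os.path.splitext(path)[1].lower() for path in paths)
```

Lowercasing the extension (so .PY and .py count together) and treating dotfiles as having no extension are design choices of this sketch; the module's method may behave differently.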
You can also access get_org_snapshot_stats via the snapshot_stats property of OrgStats.
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
test_org.snapshot_stats
Long view stats
These are the other flavour of GitHub stats: traffic stats, which include visits to an organization's repositories on GitHub and clones of those repositories. GitHub only provides access to recent information for these stats (the last 14 days). This means that if we want longer-term information for these stats, we need to store and update it ourselves on a regular basis. The functions below do this in combination with GitHub Actions.
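Because each API call only covers the most recent window, a scheduled job (for example a GitHub Actions workflow on a cron trigger) has to fold each new window into the history it saved previously. A minimal sketch of that merge step, assuming traffic is held as a dict from ISO date strings to (count, uniques) pairs; merge_traffic and this data layout are hypothetical:

```python
from typing import Dict, Tuple

# Hypothetical layout: ISO date string (YYYY-MM-DD) -> (count, uniques)
Traffic = Dict[str, Tuple[int, int]]


def merge_traffic(history: Traffic, latest: Traffic) -> Traffic:
    """Fold the latest traffic window into the stored history.

    Days present in both are taken from latest, since the most recent day of an
    earlier fetch may only have been partially counted. Keys are returned in date order.
    """
    merged = {**history, **latest}
    # ISO date strings sort lexicographically in chronological order
    return dict(sorted(merged.items()))
```

Merging the same window twice leaves the result unchanged, so re-running the scheduled job does not double-count any day.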
get_repo_views_traffic gets the views traffic for repo and saves it as a CSV in save_dir.
- repo: a repository under the GitHub organization.
- save_dir: the directory where the output CSV should be saved, by default view_data.
- load: an optional flag that loads the data into a pandas DataFrame, by default False.
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
test_org.get_repo_views_traffic(test_org.repos[0], 'test_dir',load=True).head(3)
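As a rough sketch of the save_dir and load behaviour described above, assuming each day's traffic is a (count, uniques) pair keyed by date (mirroring the timestamp, count and uniques fields of GitHub's traffic API) and one CSV per repository named after it. save_traffic_csv and this file layout are hypothetical, not the module's actual implementation:

```python
import csv
import os
from typing import Dict, Tuple

# Hypothetical layout: ISO date string (YYYY-MM-DD) -> (count, uniques)
Traffic = Dict[str, Tuple[int, int]]


def save_traffic_csv(repo_name: str, traffic: Traffic, save_dir: str = "view_data", load: bool = False):
    """Write one repo's daily traffic to save_dir/<repo_name>.csv.

    Returns a pandas DataFrame indexed by timestamp when load is True, otherwise None.
    """
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, f"{repo_name}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "count", "uniques"])
        for day, (count, uniques) in sorted(traffic.items()):
            writer.writerow([day, count, uniques])
    if load:
        import pandas as pd  # imported lazily: only needed when a DataFrame is requested
        return pd.read_csv(path, index_col="timestamp")
    return None
```

Importing pandas only inside the load branch keeps a plain save (the path a scheduled job would normally take) free of that dependency.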
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
test_org.get_org_views_traffic(load=True).head(3)
assert len(test_org.get_org_views_traffic(load=True).columns)/2 == len(test_org.public_repos)
assert len(test_org.get_org_views_traffic(repos=test_org.repos, load=True).columns)/2 == len(test_org.repos)
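The assertions above expect two columns per repository in the organization-level table, with the public repositories used when repos is not passed. A sketch of how per-repo traffic could be combined into such a wide table, assuming hypothetical <repo>_count and <repo>_uniques column names and filling days a repo has no data for with 0 (both assumptions, not confirmed by the module):

```python
from typing import Dict, List, Tuple

# Hypothetical layout: ISO date string (YYYY-MM-DD) -> (count, uniques)
Traffic = Dict[str, Tuple[int, int]]


def combine_repo_traffic(per_repo: Dict[str, Traffic]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Build a wide table: one row per date, two columns (count, uniques) per repo.

    Returns the column names and a dict mapping each date to its row values.
    Days missing for a repo are filled with 0 (an assumption of this sketch).
    """
    columns = [f"{repo}_{field}" for repo in per_repo for field in ("count", "uniques")]
    dates = sorted({day for traffic in per_repo.values() for day in traffic})
    rows: Dict[str, List[int]] = {}
    for day in dates:
        row: List[int] = []
        for traffic in per_repo.values():
            row.extend(traffic.get(day, (0, 0)))
        rows[day] = row
    return columns, rows
```

With this layout, len(columns) / 2 equals the number of repositories passed in, which is the invariant the assertions check.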
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
test_org.get_repo_clones_traffic('repo1',save_dir='test_dir', load=True)
assert len(test_org.get_repo_clones_traffic(test_org.public_repos[1], load=True).columns) == 2
import pandas as pd
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
assert isinstance(test_org.get_org_clones_traffic(repos=test_org.repos, save_dir='test_dir', load=True), pd.DataFrame)
assert len(test_org.get_org_clones_traffic(save_dir='test_dir', load=True).columns) / 2 == test_org.public_repo_count