gh_orgstats is intended to provide some easy ways of getting stats for a GitHub organization. It does this by wrapping some functions around PyGithub. The code is mainly intended to help generate reports as part of a GitHub Actions pipeline that updates stats for a funder. For an example of it being used to generate a weekly report, see the Living with Machines GitHub stats report.
To use PyGithub we need to authenticate with GitHub, which is done via a token. The token requires at least the scope for public repositories. See https://github.com/settings/tokens to register a token.
from dotenv import load_dotenv
import os
In this case we use dotenv to load the token from a .env file.
load_dotenv()
GH_TOKEN = os.getenv("GH_TOKEN")
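Since os.getenv returns None when the variable is missing, it can help to fail early with a clear message rather than passing None on to the GitHub client. A minimal sketch of such a check follows; the helper name require_token is hypothetical and not part of gh_orgstats.

```python
import os


def require_token(name="GH_TOKEN"):
    """Return the token stored in the environment variable `name`.

    Hypothetical helper (not part of gh_orgstats): raises early with a
    clear message instead of handing None to the GitHub client.
    """
    token = os.getenv(name)
    if not token:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or environment"
        )
    return token
```

After load_dotenv() has run, GH_TOKEN = require_token() could stand in for the plain os.getenv call.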
Currently all functionality is contained within the stats module.
from gh_orgstats.stats import *
The OrgStats class is used to get stats for a GitHub organization. To create an instance of this class we need to pass a GitHub token to authenticate and the name of the organization we want stats for.
test_org = OrgStats(GH_TOKEN, "ghorgstatstestorg")
test_org
Organization repositories
As a start, we can grab the repositories for an organization via the repos attribute of our OrgStats instance.
test_org.repos
We can also get a sense of what an organization's repositories contain by looking at the frequency of file extensions in each repository.
test_org.get_org_file_ext_frequency()
test_org.get_org_file_ext_frequency(pub_status='public')
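To illustrate the idea behind an extension frequency count, here is a small standalone sketch that tallies extensions from a list of file paths. It is an assumption about the general approach, not the implementation used by gh_orgstats; the function name file_ext_frequency and the example paths are made up for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath


def file_ext_frequency(paths):
    """Count file extensions in an iterable of repository file paths.

    Illustrative only: gh_orgstats may count extensions differently.
    Files without an extension are grouped under the empty string.
    """
    return Counter(PurePosixPath(p).suffix.lower() for p in paths)


# Made-up example paths, as they might appear in a repository tree
example_paths = ["README.md", "src/stats.py", "src/utils.py", "LICENSE"]
freq = file_ext_frequency(example_paths)
```

Counter also offers most_common(), which is convenient for listing the dominant file types first.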
test_org.snapshot_stats.to_dict()
test_org.get_org_views_traffic(save_dir='readme_dir')
get_org_views_traffic will grab data via the GitHub API and update a CSV file of view counts for each repository under the organization (by default only public repositories). It is largely intended for updating these stats semi-regularly by running the code as part of a GitHub Action or cron job.
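The GitHub traffic API only reports about the last 14 days of views, so consecutive runs return overlapping days and the saved CSV has to be merged rather than simply appended to. The sketch below shows one way such an update step could work using only the standard library. It is not the gh_orgstats implementation; the function name update_views_csv and the column names are assumptions chosen to resemble the fields GitHub reports.

```python
import csv
from pathlib import Path

# Assumed column layout, modelled on the fields in GitHub's traffic data
FIELDS = ["timestamp", "count", "uniques"]


def update_views_csv(csv_path, new_rows):
    """Merge new daily view rows into a CSV, keyed by timestamp.

    Hypothetical helper: rows for an already stored day are overwritten
    with the newer values, and new days are added. Returns the number
    of days stored after the merge.
    """
    csv_path = Path(csv_path)
    merged = {}
    # Load whatever was saved by previous runs
    if csv_path.exists():
        with csv_path.open(newline="") as f:
            for row in csv.DictReader(f):
                merged[row["timestamp"]] = row
    # Newer data wins when a day appears in both
    for row in new_rows:
        merged[str(row["timestamp"])] = {k: str(row[k]) for k in FIELDS}
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        # ISO-formatted dates sort chronologically as plain strings
        for ts in sorted(merged):
            writer.writerow(merged[ts])
    return len(merged)
```

Keying on the timestamp keeps the file free of duplicate days however often the job runs, which matters when the schedule is shorter than the reporting window.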
If you want to load a DataFrame of traffic, you can pass load=True:
test_org.get_org_views_traffic(save_dir='readme_dir', load=True).to_dict()
The same can be done for clones.