Contributing
Please see our Code of Conduct for policies on contributing. We also broadly follow the Turing Way Code of Conduct to encourage a pleasant experience contributing and collaborating on this project.
Documentation
If you would only like to contribute to the documentation, the easiest way to deploy it and see changes rendered with each edit is to run the following outside docker:
$ git clone https://github.com/living-with-machines/lwmdb
$ cd lwmdb
$ poetry install --with dev --with docs
$ poetry run mkdocs serve --dev-addr=0.0.0.0:8080
Note
The --with dev and --with docs options are currently included by default, but they may be set as optional in the future.
Documentation should also be available at http://localhost:9000 when running via docker compose, but that build does not auto-update as local changes are made. Port 8080 is specified in the example above to avoid a conflict with a local docker compose run (which defaults to 0.0.0.0:9000).
Local docker test runs
Local environment
Tests are run via pytest within docker, using pytest-django. To run tests, ensure you have a local docker install, a local git checkout of lwmdb and a completed build (see the install instructions for details).
Running locally with local.yml in a terminal deploys the site and this documentation:
- Site at localhost:3000
- Docs at localhost:9000
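A typical invocation, assuming the local.yml compose file sits at the root of the checkout:

```shell
# Build (if needed) and start the local stack defined in local.yml
$ docker compose -f local.yml up --build
```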
Note
If there are issues starting the server, shutting it down and then starting it up again may help.
Running tests
To run tests, open another terminal while docker is running and run pytest within the django docker container.
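For example, assuming the compose service for the app is named django:

```shell
# Run the test suite inside the running django container
$ docker compose -f local.yml exec django pytest
```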
These will print out a summary of test results like:
Test session starts (platform: linux, Python 3.11.3, pytest 7.3.1, pytest-sugar 0.9.7)
django: settings: config.test_settings (from ini)
rootdir: /app
configfile: pyproject.toml
plugins: pyfakefs-5.2.2, anyio-3.6.2, sugar-0.9.7, cov-4.0.0, django-4.5.2
collected 33 items / 1 deselected / 32 selected
gazetteer/tests.py ✓ 3% ▍
lwmdb/tests/test_commands.py xx 9% ▉
mitchells/tests.py x✓ 100% ██████████
newspapers/tests.py ✓✓✓✓✓ 28% ██▊
lwmdb/utils.py ✓✓✓✓✓✓✓✓✓ 56% █████▋
lwmdb/tests/test_utils.py ✓✓✓✓✓✓✓✓✓✓✓✓✓ 97% █████████▊
------------ coverage: platform linux, python 3.11.3-final-0 ---------------
Name Stmts Miss Cover
----------------------------------------------------------------------------
lwmdb/management/commands/connect.py 10 3 70%
lwmdb/management/commands/createfixtures.py 42 30 29%
lwmdb/management/commands/fixtures.py 126 78 38%
lwmdb/management/commands/load_json_fixtures.py 20 11 45%
lwmdb/management/commands/loadfixtures.py 27 8 70%
lwmdb/management/commands/makeitemfixtures.py 78 62 21%
lwmdb/tests/test_commands.py 15 2 87%
lwmdb/tests/test_utils.py 25 7 72%
lwmdb/utils.py 120 48 60%
----------------------------------------------------------------------------
TOTAL 508 284 44%
8 files skipped due to complete coverage.
============================ slowest 3 durations ===========================
3.85s setup gazetteer/tests.py::TestGeoSpatial::test_create_place_and_distance
1.06s call lwmdb/tests/test_commands.py::test_mitchells
0.14s call lwmdb/utils.py::lwmdb.utils.download_file
Results (6.74s):
29 passed
3 xfailed
1 deselected
Adding all expected failed tests
In the previous example, 29 tests passed, 3 failed as expected (hence xfailed) and 1 test was skipped (deselected). To see the details of which tests failed, adding the --runxfail option will add reports like the following:
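For example, with the docker setup above (the django service name is assumed):

```shell
# Re-run the tests, reporting details for expected failures too
$ docker compose -f local.yml exec django pytest --runxfail
```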
...
def __getattr__(self, name: str):
    """
    After regular attribute access, try looking up the name
    This allows simpler access to columns for interactive use.
    """
    # Note: obj.x will always call obj.__getattribute__('x') prior to
    # calling obj.__getattr__('x').
    if (
        name not in self._internal_names_set
        and name not in self._metadata
        and name not in self._accessors
        and self._info_axis._can_hold_identifiers_and_holds_name(name)
    ):
        return self[name]
>       return object.__getattribute__(self, name)
E       AttributeError: 'Series' object has no attribute 'NLP'
/usr/local/lib/python3.11/site-packages/pandas/core/generic.py:5989: AttributeError
-------------------------- Captured stdout call ----------------------------
Warning: Model mitchells.Issue is missing a fixture file and will not load.
Warning: Model mitchells.Entry is missing a fixture file and will not load.
Warning: Model mitchells.PoliticalLeaning is missing a fixture file and will not load.
Warning: Model mitchells.Price is missing a fixture file and will not load.
Warning: Model mitchells.EntryPoliticalLeanings is missing a fixture file and will not load.
Warning: Model mitchells.EntryPrices is missing a fixture file and will not load.
lwmdb/tests/test_commands.py ⨯ 6% ▋
...
and summaries at the end of the report:
...
============================ slowest 3 durations ===========================
3.87s setup gazetteer/tests.py::TestGeoSpatial::test_create_place_and_distance
1.07s call lwmdb/tests/test_commands.py::test_mitchells
0.15s call lwmdb/utils.py::lwmdb.utils.download_file
========================== short test summary info =========================
FAILED lwmdb/tests/test_commands.py::test_mitchells - AttributeError: 'Series' object
has no attribute 'NLP'
FAILED lwmdb/tests/test_commands.py::test_gazzetteer - SystemExit: App(s) not allowed: ['gazzetteer']
FAILED mitchells/tests.py::MitchelsFixture::test_load_fixtures - assert 0 > 0
Results (6.90s):
29 passed
3 failed
- lwmdb/tests/test_commands.py:9 test_mitchells
- lwmdb/tests/test_commands.py:19 test_gazzetteer
- mitchells/tests.py:18 MitchelsFixture.test_load_fixtures
1 deselected
Terminal Interaction
Adding the --pdb option generates an ipython shell at the point a test fails:
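For example (again assuming the django compose service, and adding --runxfail so the expected failure above triggers the debugger):

```shell
# Drop into an interactive ipdb session at the point of failure
$ docker compose -f local.yml exec django pytest --runxfail --pdb
```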
def __getattr__(self, name: str):
    """
    After regular attribute access, try looking up the name
    This allows simpler access to columns for interactive use.
    """
    # Note: obj.x will always call obj.__getattribute__('x') prior to
    # calling obj.__getattr__('x').
    if (
        name not in self._internal_names_set
        and name not in self._metadata
        and name not in self._accessors
        and self._info_axis._can_hold_identifiers_and_holds_name(name)
    ):
        return self[name]
>       return object.__getattribute__(self, name)
E       AttributeError: 'Series' object has no attribute 'NLP'
/usr/local/lib/python3.11/site-packages/pandas/core/generic.py:5989: AttributeError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>>>
> /usr/local/lib/python3.11/site-packages/pandas/core/generic.py(5989)__getattr__()
5987 ):
5988 return self[name]
-> 5989 return object.__getattribute__(self, name)
5990
5991 def __setattr__(self, name: str, value) -> None:
ipdb>
Development
Commits
Pre-commit
The .pre-commit-config.yaml file manages configurations to ensure the quality of each git commit. Ensure this works by installing pre-commit before making any git commits.
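Assuming pre-commit is available via the dev dependency group (see the note below), a typical way to install the hooks is:

```shell
# Register the hooks from .pre-commit-config.yaml with this git checkout
$ poetry run pre-commit install
```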
Note
pre-commit is included in the pyproject.toml dev dependencies group, so it’s possible to run all git commands within a local poetry install of lwmdb without installing pre-commit globally.
This will automatically download and install dependencies specified in .pre-commit-config.yaml and then run all those checks for any git commit.
You can run all of these checks outside a commit with
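Assuming a standard pre-commit setup, the usual invocation is:

```shell
# Run every configured hook against all files, outside of a commit
$ poetry run pre-commit run --all-files
```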
Commit messages
For git commit messages we try to follow the conventional commits spec, where commits are prefixed by categories:
- fix: something fixed
- feat: a new feature
- doc: documentation
- refactor: a significant rearrangement of code structure
- test: adding tests
- ci: continuous integration
- chore: something relatively small like updating a dependency
App
Once docker compose is up, any local modifications should automatically be loaded in the local django docker container and immediately applied. This suits reloading web app changes (including css etc.) and writing and running tests. No additional docker build commands should be required unless very significant modifications are made, such as shifting between git branches.
Tests
Doctests
Including docstrings with example tests is an efficient way to add tests, document usage and help ensure documentation is consistent with code changes.
Pytest Tests
We use pytest for tests, and its documentation is quite comprehensive. The pytest-django plugin is crucial to the test functionality as well.
Pytest Configuration
The config for running tests is shared between pyproject.toml and lwmdb/tests/conftest.py.
The pyproject.toml section below provides automatic test configuration whenever pytest is run. An example config at the time of this writing:
[tool.pytest.ini_options]
DJANGO_SETTINGS_MODULE = "config.test_settings"
python_files = ["tests.py", "test_*.py"]
addopts = """
--cov=lwmdb
--cov-report=term:skip-covered
--pdbcls=IPython.terminal.debugger:TerminalPdb
--doctest-modules
--ignore=compose
--ignore=jupyterhub_config.py
--ignore=notebooks
--ignore=docs
--ignore=lwmdb/contrib/sites
-m "not slow"
--durations=3
"""
markers = [
"slow: marks tests as slow (deselect with '-m \"not slow\"')"
]
- --cov=lwmdb specifies the path to test (in this case the name of this project)
- --cov-report=term:skip-covered excludes files with full coverage from the coverage report
- --pdbcls=IPython.terminal.debugger:TerminalPdb enables the ipython terminal for debugging
- --doctest-modules indicates doctests are included in test running
- --ignore excludes folders from testing (eg: --ignore=compose skips the compose folder)
- -m "not slow" skips tests marked with @pytest.mark.slow
- --durations=3 lists the duration of the 3 slowest running tests
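Because addopts includes -m "not slow", tests marked slow are skipped by default. A -m expression passed on the command line should override the configured one, so (for example) only the slow tests can be run with:

```shell
# Run only the tests marked with @pytest.mark.slow
$ pytest -m "slow"
```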
Example Tests
Within each django app in the project there is either a tests.py file or a tests folder, where any file name beginning with test_ is included (like test_commands.py).
An example test from mitchells/tests.py:
def test_download_local_mitchells_excel(caplog, mitchells_data_path) -> None:
    """Test downloading `MITCHELLS_EXCEL_URL` fixture.

    Note:
        `assert LOG in caplog.messages` is designed to work whether the file is
        downloaded or not to ease caching and testing
    """
    caplog.set_level(INFO)
    success: bool = download_file(mitchells_data_path, MITCHELLS_EXCEL_URL)
    assert success
    LOG = f"{MITCHELLS_EXCEL_URL} file available from {mitchells_data_path}"
    assert LOG in caplog.messages
The mitchells_data_path fixture is defined in conftest.py and returns a Path for the folder where raw mitchells data is stored prior to processing into json.
Fixtures in pytest work by automatically populating the parameters of any function whose name begins with test_ with whatever is returned from registered fixture functions of the same name. Here the mitchells_data_path Path object is passed to the download_file function along with MITCHELLS_EXCEL_URL. download_file returns a bool to indicate if the download was successful, hence the test checks that the value returned is True via the line:
assert success
The lines involving caplog aid in testing logging. The logging level is set to INFO to capture levels lower than the default WARNING level. This means the logging is captured and can be tested on the final lines:
LOG = f"{MITCHELLS_EXCEL_URL} file available from {mitchells_data_path}"
assert LOG in caplog.messages
Note
To ease using python logging and django logging features, we use our log_and_django_terminal wrapper to manage logs that might also need to be printed at the terminal alongside commands.
Crediting Contributions
We use All Contributors via our semi-automated .all-contributorsrc file, and the Citation File Format via CITATION.cff, to help manage attribution of contributions to both this code base and datasets we release for use with lwmdb. We endeavour to harmonise contributions from collaborators across Living with Machines, whose copious, interdisciplinary collaboration led to lwmdb.
All Contributors
All Contributors is a service for managing credit for contributions to a git repository. .all-contributorsrc is a json file in the root directory of the lwmdb repository. It also specifies how contributors are rendered in README.md and the intro contributors section of this documentation.
The json structure follows the All Contributors specification. Below is an example of this format:
{
"files": [
"README.md"
],
"imageSize": 100,
"commit": false,
"commitType": "docs",
"commitConvention": "angular",
"contributors": [
{
"login": "github-user-name",
"name": "Person Name",
"avatar_url": "https://avatars.githubusercontent.com/u/1234567?v=4",
"profile": "http://www.a-website.org",
"contributions": [
"code",
"ideas",
"doc"
]
},
{
"login": "another-github-user-name",
"name": "Another Name",
"avatar_url": "https://avatars.githubusercontent.com/u/7654321?v=4",
"contributions": [
"code",
"ideas",
"doc",
"maintenance"
]
}
],
"contributorsPerLine": 7,
"skipCi": true,
"repoType": "github",
"repoHost": "https://github.com",
"projectName": "lwmdb",
"projectOwner": "Living-with-machines"
}
The contribution component per user indicates the type of contribution. At present we consider these:
- code
- ideas
- mentoring
- maintenance
- doc
At present we aren’t crediting other types of contribution but may expand in the future. For other contribution types provided by allcontributors by default, see the emoji-key table.
Adding credit, including types, via GitHub comments
For All Contributors, git accounts with at least moderator status on our GitHub repository should have permission to modify credit by posting in the following form on an lwmdb github ticket:
@all-contributors
please add @github-user for code, ideas, planning.
please add @github-other-user for code, ideas, planning.
This should cause the all-contributors bot to indicate success:
@ModUserWhoPosted
I've put up a pull request to add @github-user! 🎉
I've put up a pull request to add @github-other-user! 🎉
or report errors:
This project's configuration file has malformed JSON: .all-contributorsrc. Error:: Unexpected token : in JSON at position 2060
CITATION.CFF
We also maintain a Citation File Format (CFF) file for citable, academic credit for contributions via our zenodo registration. This helps automate the process of releasing an academically citable Digital Object Identifier (DOI) for each release of lwmdb.
CFF supports Open Researcher and Contributor IDs (ORCID), which eases automating academic credit for evolving contributions to academic work, even as individuals change academic positions.
For reference, a simplified example based on cff-version 1.2.0:
cff-version: 1.2.0
title: Living With Machines Database
message: >-
If you use this software, please cite it using the
metadata from this file.
type: software
authors:
- given-names: Person
family-names: Name
orcid: 'https://orcid.org/0000-0000-0000-0000'
affiliation: A UNI
- given-names: Another
family-names: Name
orcid: 'https://orcid.org/0000-0000-0000-0001'
affiliation: UNI A
identifiers:
- type: doi
value: 10.5281/zenodo.8208204
repository-code: 'https://github.com/Living-with-machines/lwmdb'
url: 'https://livingwithmachines.ac.uk/'
license: MIT
Troubleshooting
Unexpected lwmdb/static/css/project.css changes
At present (see issue #110 for updates) running docker compose is likely to truncate the last line of lwmdb/static/css/project.css, which can then appear as a local change in a git checkout:
$ git status
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: lwmdb/static/css/project.css
This should be automatically fixed via pre-commit, and if necessary you can run pre-commit directly to clean up that issue outside of a git commit. Given how frequently this may occur, it is safest to simply leave that until committing a change.
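If needed, the relevant hook can be run directly against the affected file. The hook id below (end-of-file-fixer, the standard pre-commit hook for trailing-newline fixes) is an assumption about this repository's configuration:

```shell
# Re-apply the end-of-file fix to the truncated css file (hook id assumed)
$ pre-commit run end-of-file-fixer --files lwmdb/static/css/project.css
```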