Contributing
Please see our Code of Conduct for policies on contributing. We also broadly follow the Turing Way Code of Conduct to encourage a pleasant experience contributing and collaborating on this project.
Documentation
If you would only like to contribute to the documentation, the easiest way to deploy it and see changes rendered with each edit is to run the following outside docker:
$ git clone https://github.com/living-with-machines/lwmdb
$ cd lwmdb
$ poetry install --with dev --with docs
$ poetry run mkdocs serve --dev-addr=0.0.0.0:8080
Note
The --with dev and --with docs options are currently included by default, but they may be set as optional in the future.
Documentation should also be available at http://localhost:9000 when running via docker compose, but that build does not auto-update as local changes are made. Port 8080 is specified in the example above to avoid a conflict with a local docker compose run (which defaults to 0.0.0.0:9000).
Local docker test runs
Local environment
Tests are run via pytest within docker, using pytest-django. To run tests, ensure you have a local docker install, a local git checkout of lwmdb and a completed build (see the install instructions for details).
Running locally with local.yml in a terminal deploys the site and this documentation:
- Site at localhost:3000
- Docs at localhost:9000
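A typical invocation, assuming the local.yml compose file sits at the root of the checkout:

```shell
# Build (if needed) and start the local stack defined in local.yml
$ docker compose -f local.yml up --build
```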
Note
If there are issues starting the server, shutting it down and then starting it up again may help.
Running tests
To run tests, open another terminal while docker is running and run pytest within the django docker container.
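For example, assuming the compose service for the app is named django:

```shell
# Run the test suite inside the running django container
$ docker compose -f local.yml exec django pytest
```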
These will print out a summary of test results like:
Test session starts (platform: linux, Python 3.11.3, pytest 7.3.1, pytest-sugar 0.9.7)
django: settings: config.test_settings (from ini)
rootdir: /app
configfile: pyproject.toml
plugins: pyfakefs-5.2.2, anyio-3.6.2, sugar-0.9.7, cov-4.0.0, django-4.5.2
collected 33 items / 1 deselected / 32 selected
gazetteer/tests.py ✓ 3% ▍
lwmdb/tests/test_commands.py xx 9% ▉
mitchells/tests.py x✓ 100% ██████████
newspapers/tests.py ✓✓✓✓✓ 28% ██▊
lwmdb/utils.py ✓✓✓✓✓✓✓✓✓ 56% █████▋
lwmdb/tests/test_utils.py ✓✓✓✓✓✓✓✓✓✓✓✓✓ 97% █████████▊
------------ coverage: platform linux, python 3.11.3-final-0 ---------------
Name Stmts Miss Cover
----------------------------------------------------------------------------
lwmdb/management/commands/connect.py 10 3 70%
lwmdb/management/commands/createfixtures.py 42 30 29%
lwmdb/management/commands/fixtures.py 126 78 38%
lwmdb/management/commands/load_json_fixtures.py 20 11 45%
lwmdb/management/commands/loadfixtures.py 27 8 70%
lwmdb/management/commands/makeitemfixtures.py 78 62 21%
lwmdb/tests/test_commands.py 15 2 87%
lwmdb/tests/test_utils.py 25 7 72%
lwmdb/utils.py 120 48 60%
----------------------------------------------------------------------------
TOTAL 508 284 44%
8 files skipped due to complete coverage.
============================ slowest 3 durations ===========================
3.85s setup gazetteer/tests.py::TestGeoSpatial::test_create_place_and_distance
1.06s call lwmdb/tests/test_commands.py::test_mitchells
0.14s call lwmdb/utils.py::lwmdb.utils.download_file
Results (6.74s):
29 passed
3 xfailed
1 deselected
Adding all expected failed tests
In the previous example, 29 tests passed, 3 failed as expected (hence xfailed) and 1 test was skipped (deselected). To see the details of which tests failed, adding the --runxfail option will add reports like the following:
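For example, with the docker setup above (the django service name is assumed):

```shell
# Re-run the tests, reporting details for expected failures too
$ docker compose -f local.yml exec django pytest --runxfail
```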
...
def __getattr__(self, name: str):
    """
    After regular attribute access, try looking up the name
    This allows simpler access to columns for interactive use.
    """
    # Note: obj.x will always call obj.__getattribute__('x') prior to
    # calling obj.__getattr__('x').
    if (
        name not in self._internal_names_set
        and name not in self._metadata
        and name not in self._accessors
        and self._info_axis._can_hold_identifiers_and_holds_name(name)
    ):
        return self[name]
>       return object.__getattribute__(self, name)
E       AttributeError: 'Series' object has no attribute 'NLP'
/usr/local/lib/python3.11/site-packages/pandas/core/generic.py:5989: AttributeError
-------------------------- Captured stdout call ----------------------------
Warning: Model mitchells.Issue is missing a fixture file and will not load.
Warning: Model mitchells.Entry is missing a fixture file and will not load.
Warning: Model mitchells.PoliticalLeaning is missing a fixture file and will not load.
Warning: Model mitchells.Price is missing a fixture file and will not load.
Warning: Model mitchells.EntryPoliticalLeanings is missing a fixture file and will not load.
Warning: Model mitchells.EntryPrices is missing a fixture file and will not load.
lwmdb/tests/test_commands.py ⨯ 6% ▋
...
and summaries at the end of the report:
...
============================ slowest 3 durations ===========================
3.87s setup gazetteer/tests.py::TestGeoSpatial::test_create_place_and_distance
1.07s call lwmdb/tests/test_commands.py::test_mitchells
0.15s call lwmdb/utils.py::lwmdb.utils.download_file
========================== short test summary info =========================
FAILED lwmdb/tests/test_commands.py::test_mitchells - AttributeError: 'Series' object
has no attribute 'NLP'
FAILED lwmdb/tests/test_commands.py::test_gazzetteer - SystemExit: App(s) not allowed: ['gazzetteer']
FAILED mitchells/tests.py::MitchelsFixture::test_load_fixtures - assert 0 > 0
Results (6.90s):
29 passed
3 failed
- lwmdb/tests/test_commands.py:9 test_mitchells
- lwmdb/tests/test_commands.py:19 test_gazzetteer
- mitchells/tests.py:18 MitchelsFixture.test_load_fixtures
1 deselected
Terminal Interaction
Adding the --pdb option generates an ipython shell at the point a test fails:
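For example (again assuming the django compose service, and adding --runxfail so the expected failure above triggers the debugger):

```shell
# Drop into an interactive ipdb session at the point of failure
$ docker compose -f local.yml exec django pytest --runxfail --pdb
```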
def __getattr__(self, name: str):
    """
    After regular attribute access, try looking up the name
    This allows simpler access to columns for interactive use.
    """
    # Note: obj.x will always call obj.__getattribute__('x') prior to
    # calling obj.__getattr__('x').
    if (
        name not in self._internal_names_set
        and name not in self._metadata
        and name not in self._accessors
        and self._info_axis._can_hold_identifiers_and_holds_name(name)
    ):
        return self[name]
>       return object.__getattribute__(self, name)
E       AttributeError: 'Series' object has no attribute 'NLP'
/usr/local/lib/python3.11/site-packages/pandas/core/generic.py:5989: AttributeError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>>>
> /usr/local/lib/python3.11/site-packages/pandas/core/generic.py(5989)__getattr__()
5987 ):
5988 return self[name]
-> 5989 return object.__getattribute__(self, name)
5990
5991 def __setattr__(self, name: str, value) -> None:
ipdb>
Development
Commits
Pre-commit
The .pre-commit-config.yaml file manages configurations to ensure the quality of each git commit. Ensure this works by installing pre-commit before making any git commits.
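Assuming pre-commit is available via the dev dependency group (see the note below), a typical way to install the hooks is:

```shell
# Register the hooks from .pre-commit-config.yaml with this git checkout
$ poetry run pre-commit install
```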
Note
pre-commit is included in the pyproject.toml dev dependencies group, so it’s possible to run all git commands within a local poetry install of lwmdb without installing pre-commit globally.
This will automatically download and install dependencies specified in .pre-commit-config.yaml and then run all those checks for any git commit.
You can run all of these checks outside a commit with
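Assuming a standard pre-commit setup, the usual invocation is:

```shell
# Run every configured hook against all files, outside of a commit
$ poetry run pre-commit run --all-files
```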
Commit messages
For git commit messages we try to follow the conventional commits spec, where commits are prefixed by categories:
- fix: something fixed
- feat: a new feature
- doc: documentation
- refactor: a significant rearrangement of code structure
- test: adding tests
- ci: continuous integration
- chore: something relatively small like updating a dependency
App
Once docker compose is up, any local modifications should automatically be loaded in the local django docker container and immediately applied. This suits reloading web app changes (including css etc.) and writing and running tests. No additional docker build commands should be required unless very significant modifications are made, such as shifting between git branches.
Tests
Doctests
Including docstrings with example tests is an efficient way to add tests, document usage and help ensure documentation is consistent with code changes.
Pytest Tests
We use pytest for tests, and its documentation is quite comprehensive. The pytest-django plugin is crucial to the test functionality as well.
Pytest Configuration
The config for running tests is shared between pyproject.toml and lwmdb/tests/conftest.py.
The pyproject.toml section below provides automatic test configuration whenever pytest is run. An example config at the time of this writing:
[tool.pytest.ini_options]
DJANGO_SETTINGS_MODULE = "config.test_settings"
python_files = ["tests.py", "test_*.py"]
addopts = """
--cov=lwmdb
--cov-report=term:skip-covered
--pdbcls=IPython.terminal.debugger:TerminalPdb
--doctest-modules
--ignore=compose
--ignore=jupyterhub_config.py
--ignore=notebooks
--ignore=docs
--ignore=lwmdb/contrib/sites
-m "not slow"
--durations=3
"""
markers = [
"slow: marks tests as slow (deselect with '-m \"not slow\"')"
]
- --cov=lwmdb specifies the path to test (in this case the name of this project)
- --cov-report=term:skip-covered excludes files with full coverage from the coverage report
- --pdbcls=IPython.terminal.debugger:TerminalPdb enables the ipython terminal for debugging
- --doctest-modules indicates doctests are included in test running
- --ignore excludes folders from testing (eg: --ignore=compose skips the compose folder)
- -m "not slow" skips tests marked with @pytest.mark.slow
- --durations=3 lists the duration of the 3 slowest running tests
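Because addopts includes -m "not slow", tests marked slow are skipped by default. A -m expression passed on the command line should override the configured one, so (for example) only the slow tests can be run with:

```shell
# Run only the tests marked with @pytest.mark.slow
$ pytest -m "slow"
```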
Example Tests
Within each django app in the project there is either a tests.py file or a tests folder, where any file name beginning with test_ is included (like test_commands.py).
An example test from mitchells/tests.py:
def test_download_local_mitchells_excel(caplog, mitchells_data_path) -> None:
    """Test downloading `MITCHELLS_EXCEL_URL` fixture.

    Note:
        `assert LOG in caplog.messages` is designed to work whether the file is
        downloaded or not to ease caching and testing
    """
    caplog.set_level(INFO)
    success: bool = download_file(mitchells_data_path, MITCHELLS_EXCEL_URL)
    assert success
    LOG = f"{MITCHELLS_EXCEL_URL} file available from {mitchells_data_path}"
    assert LOG in caplog.messages
The mitchells_data_path fixture is defined in conftest.py and returns a Path for the folder where raw mitchells data is stored prior to processing into json.
Fixtures in pytest work by automatically populating the parameters of any function whose name begins with test_ with whatever is returned from registered fixture functions of the same name. Here the mitchells_data_path Path object is passed to the download_file function along with MITCHELLS_EXCEL_URL. download_file returns a bool to indicate if the download was successful, hence the test checks that the value returned is True via the line:
assert success
The lines involving caplog aid in testing logging. The logging level is set to INFO to capture levels lower than the default WARNING level. This means the logging is captured and can be tested on the final lines:
LOG = f"{MITCHELLS_EXCEL_URL} file available from {mitchells_data_path}"
assert LOG in caplog.messages
Note
To ease using python logging and django logging features, we use our log_and_django_terminal wrapper to manage logs that might also need to be printed at the terminal alongside commands.
Crediting Contributions
We use All Contributors via our semi-automated .all-contributorsrc file, and the Citation File Format via CITATION.cff, to help manage attribution of contributions to both this code base and datasets we release for use with lwmdb. We endeavour to harmonise contributions from collaborators across Living with Machines, whose copious, interdisciplinary collaboration led to lwmdb.
All Contributors
All Contributors is a service for managing credit for contributions to a git repository. .all-contributorsrc is a json file in the root directory of the lwmdb repository. It also specifies how contributors are rendered in README.md and the intro contributors section of this documentation.
The json structure follows the All Contributors specification. Below is an example of this format:
{
"files": [
"README.md"
],
"imageSize": 100,
"commit": false,
"commitType": "docs",
"commitConvention": "angular",
"contributors": [
{
"login": "github-user-name",
"name": "Person Name",
"avatar_url": "https://avatars.githubusercontent.com/u/1234567?v=4",
"profile": "http://www.a-website.org",
"contributions": [
"code",
"ideas",
"doc"
]
},
{
"login": "another-github-user-name",
"name": "Another Name",
"avatar_url": "https://avatars.githubusercontent.com/u/7654321?v=4",
"contributions": [
"code",
"ideas",
"doc",
"maintenance"
]
}
],
"contributorsPerLine": 7,
"skipCi": true,
"repoType": "github",
"repoHost": "https://github.com",
"projectName": "lwmdb",
"projectOwner": "Living-with-machines"
}
The contribution component per user indicates the type of contribution. At present we consider these:
- code
- ideas
- mentoring
- maintenance
- doc
At present we aren’t crediting other types of contribution but may expand in the future. For other contribution types provided by allcontributors by default, see the emoji-key table.
Adding credit, including types, via GitHub comments
For All Contributors, git accounts with at least moderator status on our GitHub repository should have permission to modify credit by posting in the following form on an lwmdb github ticket:
@all-contributors
please add @github-user for code, ideas, planning.
please add @github-other-user for code, ideas, planning.
This should cause the all-contributors bot to indicate success:
@ModUserWhoPosted
I've put up a pull request to add @github-user! 🎉
I've put up a pull request to add @github-other-user! 🎉
or report errors:
This project's configuration file has malformed JSON: .all-contributorsrc. Error:: Unexpected token : in JSON at position 2060
CITATION.CFF
We also maintain a Citation File Format (CFF) file for citable, academic credit for contributions via our zenodo registration. This helps automate the process of releasing an academically citable Digital Object Identifier (DOI) for each release of lwmdb.
CFF supports Open Researcher and Contributor IDs (ORCID), which eases automating academic credit for evolving contributions to academic work, even as individuals change academic positions.
For reference, a simplified example based on cff-version 1.2.0:
cff-version: 1.2.0
title: Living With Machines Database
message: >-
If you use this software, please cite it using the
metadata from this file.
type: software
authors:
- given-names: Person
family-names: Name
orcid: 'https://orcid.org/0000-0000-0000-0000'
affiliation: A UNI
- given-names: Another
family-names: Name
orcid: 'https://orcid.org/0000-0000-0000-0001'
affiliation: UNI A
identifiers:
- type: doi
value: 10.5281/zenodo.8208204
repository-code: 'https://github.com/Living-with-machines/lwmdb'
url: 'https://livingwithmachines.ac.uk/'
license: MIT
Troubleshooting
Unexpected lwmdb/static/css/project.css changes
At present (see issue #110 for updates) running docker compose is likely to truncate the last line of lwmdb/static/css/project.css, which can then appear as a local change in a git checkout:
$ git status
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: lwmdb/static/css/project.css
This should be automatically fixed via pre-commit, and if necessary you can run pre-commit directly to clean up that issue outside of a git commit. Given how frequently this may occur, it is safest to simply leave that until committing a change.
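If needed, the relevant hook can be run directly against the affected file. The hook id below (end-of-file-fixer, the standard pre-commit hook for trailing-newline fixes) is an assumption about this repository's configuration:

```shell
# Re-apply the end-of-file fix to the truncated css file (hook id assumed)
$ pre-commit run end-of-file-fixer --files lwmdb/static/css/project.css
```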