Living With Machines Database: lwmdb
A package containing database access to the Living with Machines newspaper collection's metadata, designed to facilitate quicker and easier humanities research on heterogeneous and complex newspaper data.
Background on the development of the database is available in Metadata Enrichment in the Living with Machines Project: User-focused Collaborative Database Development in a Digital Humanities Context from the Digital Humanities 2023 book of abstracts.
Installation
Install Docker
It is possible to run this code without Docker, but at present we are only maintaining it via Docker containers, so we highly recommend installing Docker to run and/or test this code locally. Instructions are available for most operating systems here: https://docs.docker.com/desktop/
Clone the repository
Clone the repository via either
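For example, assuming the repository is hosted under the Living-with-machines GitHub organisation (check the actual repository URL before running):

```shell
git clone https://github.com/Living-with-machines/lwmdb.git
```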
or (using a GitHub ssh key)
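A sketch of the equivalent ssh clone, under the same assumed organisation name:

```shell
git clone git@github.com:Living-with-machines/lwmdb.git
```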
followed by:
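That is, change into the checkout directory:

```shell
cd lwmdb
```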
Local deploy of documentation
If you have a local install of poetry you can run the documentation locally without using docker:
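A typical invocation might look like the following; note that mkdocs is an assumption here, so check pyproject.toml for the documentation tool this project actually uses:

```shell
poetry install
poetry run mkdocs serve
```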
Running locally
Note: this uses the .envs/local file provided in the repo. This must not be used in production; it is provided for local development and to demonstrate what is required for .envs/production, which must be generated separately for deploying via production.yml.
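A minimal sketch of the start-up command, using the local.yml compose config referenced elsewhere in this README:

```shell
docker compose -f local.yml up --build
```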
It will take some time to download a set of docker images required to run locally, after which it should attempt to start the server in the django container. If successful, the console should print logs resembling
lwmdb_local_django | WARNING: This is a development server. Do not use it in a production
deployment. Use a production WSGI server instead.
lwmdb_local_django | * Running on all addresses (0.0.0.0)
lwmdb_local_django | * Running on http://127.0.0.1:8000
lwmdb_local_django | * Running on http://172.20.0.4:8000
lwmdb_local_django | Press CTRL+C to quit
lwmdb_local_django | * Restarting with stat
lwmdb_local_django | Performing system checks...
lwmdb_local_django |
lwmdb_local_django | System check identified no issues (0 silenced).
lwmdb_local_django |
lwmdb_local_django | Django version 4.2.1, using settings 'lwmdb.settings'
lwmdb_local_django | Development server is running at http://0.0.0.0:8000/
lwmdb_local_django | Using the Werkzeug debugger (http://werkzeug.pocoo.org/)
lwmdb_local_django | Quit the server with CONTROL-C.
lwmdb_local_django | * Debugger is active!
lwmdb_local_django | * Debugger PIN: 139-826-693
This indicates it is up and running. You should then be able to go to http://127.0.0.1:8000 in your local browser and see a start page.
To stop the app call the down command:
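Using the same local.yml compose config:

```shell
docker compose -f local.yml down
```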
Importing data
A previous version of the database may be available either as json fixtures or as raw sql produced via a pg_dump (or similar) command; both import routes are described below.
json import
json fixtures need to be placed in a fixtures folder in your local checkout:
cd lwmdb
mkdir fixtures
cp DataProvider-1.json Ingest-1.json Item-1.json Newspaper-1.json Digitisation-1.json Issue-1.json Item-2.json fixtures/
The files can then be imported via
docker compose -f local.yml exec django /app/manage.py loaddata fixtures/Newspaper-1.json
docker compose -f local.yml exec django /app/manage.py loaddata fixtures/Issue-1.json
docker compose -f local.yml exec django /app/manage.py loaddata fixtures/Item-2.json
...
:warning: Note the import order is important, specifically: Newspaper, Issue and any other data json files must be loaded prior to the Item json files.
Importing a postgres database
Importing from json can be very slow. If provided a postgres data file, it is possible to import that directly. First copy the database file(s) to a backups folder on the postgres instance (assuming you’ve run the build command)
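One way to copy a backup in is with docker cp; the container name lwmdb_local_postgres and the /backups mount point are assumptions here, so check docker ps and local.yml for the actual values:

```shell
docker cp backup_2023_04_03T07_22_10.sql.gz lwmdb_local_postgres:/backups/
```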
Next make sure the app is shut down, then start up with only the postgres container running:
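A sketch of those two steps:

```shell
docker compose -f local.yml down
docker compose -f local.yml up -d postgres
```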
Then run the restore command with the filename of the backup. By default, backup filenames indicate when the backup was made and are compressed with gzip (e.g. backup_2023_04_03T07_22_10.sql.gz in the example below):
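If the project follows the common cookiecutter-django convention of a restore maintenance script inside the postgres container, the call would resemble the following (treat this as an assumption and check the project's compose scripts):

```shell
docker compose -f local.yml exec postgres restore backup_2023_04_03T07_22_10.sql.gz
```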
:warning: There is a chance the default docker disk space allocated is not big enough for a full version of the dataset (especially if running on a desktop). If so, you may need to increase the allocated disk space. For example, see the Docker Mac FAQs for instructions on increasing available disk space.
:warning: If the version of the database you are loading is not compatible with the current version of the python package, this can cause significant errors.
Querying the database
Jupyter Notebook
In order to run the Django framework inside a notebook, once the app is running via docker as described above, open another terminal window and run:
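A sketch of such a command; whether the notebook port is reachable from your browser depends on the port mappings in local.yml, so treat this as a starting point rather than a definitive invocation:

```shell
docker compose -f local.yml exec django jupyter notebook --ip 0.0.0.0
```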
This should launch a normal Jupyter Notebook in your browser window where you can create any notebooks and access the database in different ways.
Important: Before importing any models and working with the database data, run import django_initialiser in a cell; this sets up all the dependencies needed.
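A first notebook cell might therefore look like this; the newspapers app path in the second import is an assumption for illustration, so adjust it to the actual app layout:

```python
# First cell: configure Django so models can be used in the notebook
import django_initialiser

# Later cells can then import models (app path assumed for illustration)
from newspapers.models import Newspaper, Issue, Item
```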
Note: For some users we provide two jupyter notebooks:

- getting-started.ipynb
- explore-newspapers.ipynb
Both will give some overview of how one can access the database's information and what one can do with it. They only scratch the surface of what is possible, of course, but will be a good entry point for someone who wants to orient themselves toward the database and Django database querying.
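As a taste of Django-style querying, a hedged sketch of what a notebook cell might contain (the title field name is hypothetical; inspect the model definitions for the real field names):

```python
# Count all newspapers in the database
Newspaper.objects.count()

# Filter by a (hypothetical) title field, case-insensitively
Newspaper.objects.filter(title__icontains="gazette")

# Fetch a single Item to inspect its fields
Item.objects.first()
```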
Upgrade development version
In order to upgrade the current development version that you have, make sure that you have synchronised the repository to your local drive:
Step 1: git pull
Step 2: docker compose -f local.yml up --build
Run on a server
To run in production, an .envs/production ENV file must be created. This must be filled in with new passwords for each key rather than a copy of .envs/local. The same keys set in .envs/local are needed, as well as the following two:
TRAEFIK_EMAIL="email.register.for.traefik.account@test.com"
HOST_URL="host.for.lwmdb.deploy.org"
A domain name (in this example "host.for.lwmdb.deploy.org") must be registered for https (encrypted) usage, and a TLS certificate is needed. See the traefik docs for details.