alto2txt2fixture
alto2txt2fixture
is a standalone tool to convert alto2txt
XML
output and other related datasets into JSON
(and where feasible CSV
) data with corresponding relational IDs to ease general use and ingestion into a relational database.
We target the the JSON
produced for importing into lwmdb
: a database built using the Django
python
webframework database fixture
structure.
Installation and simple use
We provide a command line interface to process alto2txt
XML
files stored locally (or mounted via azure
blobfuse
), and for additional public data we automate a means of downloading those automatically.
Installation
We recommend downloading a copy of the reposity or using git clone
. From a local copy use poetry
to install dependencies:
If you would like to test, render documentation and/or contribute to the code included dev
dependencies in a local install:
Simple use
To processing newspaper metadata with a local copy of alto2txt
XML
results, it's easiest to have that data in the same folder as your alto2txt2fixture
checkout and poetry
installed folder. One arranged, you should be able to begin the JSON
converstion with
To generate related data in JSON
and CSV
form, assuming you have an internet collection and access to a living-with-machines
azure
account, the following will download related data into JSON
and CSV
files. The JSON
results should be consistent with lwmdb
tables for ease of import.