Understanding the Results
The resulting file structure
The examples below follow standard settings
If you choose other settings when you run the program, your output directory may look different from what is shown on this page.
Reports
Reports are generated automatically inside a folder named with a unique hash. Inside the `reports` directory, you’ll find a JSON file for each `alto2txt` directory (organised by NLP identifier).
The report structure, thus, looks like this:
The JSON file contains useful troubleshooting information. Its contents are structured as a Python `dictionary` (or JavaScript `Object`). Here is an example:
Here is an explanation of each of the keys in the dictionary:
| Key | Explanation | Data type |
|---|---|---|
| `path` | The input path for the zip file that is being converted. | `string` |
| `bytes` | The size of the input zip file represented in bytes. | `integer` |
| `size` | The size of the input zip file represented in a human-readable string. | `string` |
| `contents` | #TODO #3 | `integer` |
| `start` | Date and time when processing started (see also `end` below). | `datestring` |
| `newspaper_paths` | #TODO #3 | `list` (`string`) |
| `publication_codes` | A list of the NLPs that are contained in the input zip file. | `list` (`string`) |
| `issue_paths` | A list of all the issue paths that are contained in the cache directory. | `list` (`string`) |
| `item_paths` | A list of all the item paths that are contained in the cache directory. | `list` (`string`) |
| `end` | Date and time when processing ended (see also `start` above). | `datestring` |
| `seconds` | Seconds that the script spent interpreting the zip file (should be added to the `microseconds` below). | `integer` |
| `microseconds` | Microseconds that the script spent interpreting the zip file (should be added to the `seconds` above). | `integer` |
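The `seconds` and `microseconds` keys are meant to be added together to get the full processing time. A minimal sketch of combining them in Python, assuming a hypothetical report whose path and values are invented purely for illustration (only the key names come from the table above):

```python
import json
from datetime import timedelta


def processing_duration(report: dict) -> timedelta:
    """Combine the report's `seconds` and `microseconds` keys into one duration."""
    return timedelta(
        seconds=report["seconds"],
        microseconds=report["microseconds"],
    )


# Hypothetical report content; a real report contains more keys and real values.
example_report = json.loads(
    '{"path": "/path/to/input.zip", "seconds": 3, "microseconds": 250000}'
)

duration = processing_duration(example_report)
print(duration.total_seconds())  # 3.25
```

Using `timedelta` rather than adding the raw numbers avoids unit mistakes, since the two keys are expressed in different units.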
Fixtures
The most important output of the script is contained in the `fixtures` directory. This directory contains JSON files for each of the tables in the corresponding Django metadata database (i.e. `DataProvider`, `Digitisation`, `Ingest`, `Issue`, `Newspaper`, and `Item`). The number at the end of each file name indicates the order of the files, as the data is divided into files of at most `2e6` elements*:
Each JSON file contains a Python-like `list` (JavaScript `Array`) of `dictionaries` (JavaScript `Objects`), which have a primary key (`pk`), the related database model (in the example below the Django `newspapers` app’s `newspaper` table), and a nested `dictionary`/`Object` which contains all the values for the database’s table entry:
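A minimal sketch of that shape in Python, assuming Django's standard fixture keys (`pk`, `model`, `fields`). The helper name `make_fixture_entry` and the `title` field are hypothetical and only serve as illustration; the real fixtures carry the actual columns of each table:

```python
import json


def make_fixture_entry(pk: int, model: str, fields: dict) -> dict:
    """Build one fixture entry: primary key, model label, nested field values."""
    return {"pk": pk, "model": model, "fields": fields}


# "newspapers.newspaper" follows Django's "<app_label>.<model_name>" convention.
# The `title` field and its value are hypothetical placeholders.
entry = make_fixture_entry(
    pk=1,
    model="newspapers.newspaper",
    fields={"title": "An Example Title"},
)

# A fixture file is a JSON array of such entries.
serialized = json.dumps([entry], indent=2)
print(serialized)
```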
* The maximum number of elements per file can be adjusted via the `MAX_ELEMENTS_PER_FILE` value of the `settings` object in the `settings.py` file.
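To illustrate how a cap on elements per file leads to numbered files, here is a rough sketch of the chunking idea. It is not the script's actual implementation: the function `split_into_chunks` and the `Item-<n>.json` naming pattern are assumptions made for this example, and a small cap is used so the result is easy to follow.

```python
from typing import Iterator, List, Sequence

# Mirrors the default of 2e6 mentioned above; the real value lives in settings.py.
MAX_ELEMENTS_PER_FILE = 2_000_000


def split_into_chunks(
    items: Sequence, max_per_file: int = MAX_ELEMENTS_PER_FILE
) -> Iterator[List]:
    """Yield consecutive slices of `items`, each at most `max_per_file` long."""
    if max_per_file < 1:
        raise ValueError("max_per_file must be at least 1")
    for start in range(0, len(items), max_per_file):
        yield list(items[start:start + max_per_file])


# Five entries with a cap of two per file produce three numbered files.
chunks = list(split_into_chunks(list(range(5)), max_per_file=2))

for number, chunk in enumerate(chunks, start=1):
    # Hypothetical file naming, for illustration only.
    print(f"Item-{number}.json would hold {len(chunk)} entries")
```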