In this notebook, we move on to the process of making predictions. As mentioned elsewhere, one of the aims of nnanno was to show a possible way of using computer vision/deep learning without requiring unconstrained resources. This notebook tries to demonstrate that approach in action. It was run entirely on Google Colab (full disclosure: I have a 'pro' account that bumps up the maximum runtime, but this approach will work fine on the free version of Colab).
If you are running this on Colab, you can run the cell below to mount your Google Drive. Saving progress on Colab can be a bit annoying, but it's simple to store things in Google Drive. This means you can avoid reloading data every time you interact with Colab, and you can save your model's progress so you don't have to retrain it multiple times.
# from google.colab import drive
# drive.mount("/content/drive")
!pip install ./nnanno/.
Install fastai. The latest version is likely to be fine, but if there is a breaking change to fastai you can fall back to the pinned version.
!pip install fastai -U
#!pip install fastai==2.2.5
For reference, this is the version of fastai used for this notebook:
import fastai
fastai.__version__
Check that we have CUDA (a GPU) available. If this returns False, click on 'Runtime', choose 'Change runtime type', and select GPU acceleration.
import torch
torch.cuda.is_available()
from pathlib import Path
import pandas as pd
from fastai.vision.all import *
So far in this series of example notebooks, we have used square resizing on all of our images. This may be okay, but since newspaper images tend to be taller than they are wide, it is worth considering a different way of resizing. We can use the handy helper function get_image_files to grab all of our images.
images = get_image_files(path / "images")
This function returns the image paths wrapped in an L instance. L is essentially a fancy Python list that includes some extra goodies. See the fastcore docs for a fuller explanation.
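As a quick aside (not part of the pipeline), here is a minimal illustration of how L behaves:

from fastcore.foundation import L

# L works like a list but adds conveniences such as map and filter
L(1, 2, 3).map(lambda x: x * 2)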
We'll use map to go through our images, load them, grab the shape, and then calculate the average height-to-width ratio.
# Average height-to-width ratio across the sampled images
images.map(lambda x: load_image(x).shape).map(lambda x: x[0] / x[1]).sum() / len(images)
# With a width of 256, this ratio gives a height of roughly 430 pixels
256 * 1.68
We'll use this ratio to reshape our images. This won't always make sense, but if, for example, you are working with photos of a standard size, or film clips with a consistent aspect ratio, it can make sense to batch them at that shape.
df = pd.read_json(first(path.rglob("images/*.json")))
df.pub_date = pd.to_datetime(df.pub_date)
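Before setting up the split, it's worth a quick glance at the sample itself, for example the distribution of the label column we'll be predicting below:

# How many examples do we have for each label in this sample?
df["label"].value_counts()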
Most of the next cells are the same as in our previous notebook. The main change is that we use a non-square size for our transforms.
from sklearn.model_selection import train_test_split
# Stratify the split by publication year; the 40% slice becomes our validation set
valid, train = train_test_split(df.copy(), train_size=0.4, stratify=df.pub_date.dt.year)
train["is_valid"] = False
valid["is_valid"] = True
df = pd.concat([train, valid])
df.reset_index(drop=True, inplace=True)
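A quick check that the split came out as expected:

# Proportion of rows in each split (True = validation set)
df["is_valid"].value_counts(normalize=True)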
full_data = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    splitter=ColSplitter(),
    get_x=ColReader("download_image_path", pref=path / "images/"),
    get_y=ColReader("label"),
    # Resize each image on the CPU first, then crop/augment the batch to 430x256
    item_tfms=Resize(1024, ResizeMethod.Squish),
    batch_tfms=[
        *aug_transforms(size=(430, 256), max_rotate=0.01, max_warp=0.0),
        Normalize.from_stats(*imagenet_stats),
    ],
)
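If anything in the pipeline misbehaves, DataBlock.summary is a handy (optional) check: it builds one sample and one batch step by step and prints what happens at each stage, which makes it easier to see where things go wrong.

# Optional: step through the pipeline on a few rows and print each stage
full_data.summary(df)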
Once we have created our DataBlock, we use the dataloaders method to load our data.
dls = full_data.dataloaders(df, bs=32, num_workers=2)
dls.show_batch()
We can see that this time our images are no longer square.
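We can also confirm this directly by grabbing a single batch and checking the tensor shape; with the transforms above, the images should come out tall rather than square.

# Grab one batch and inspect its shape: (batch size, channels, height, width)
xb, yb = dls.one_batch()
xb.shape, yb.shape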
metrics = [F1Score(average="weighted"), Precision(), Recall(), accuracy]
Choosing a model
There are various things we might care about when choosing a model. These include (but are not limited to):
- performance
- memory usage
- how data-hungry the model is
- training speed
- inference speed
Since we have committed to working with relatively constrained resources, we might want to try a model architecture with lower resource requirements. In addition, a smaller model should help speed up inference. One option is a variant of squeezenet.
The title of the paper SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size already tells us that this model aims to perform well with more constrained resources. Another paper, Benchmark Analysis of Representative Deep Neural Network Architectures, gives comparisons between a range of computer vision model architectures. Of particular interest here is the inference speed comparison.
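As a rough illustration of the 'fewer parameters' claim (this says nothing about accuracy or speed on our data), we can count the parameters of squeezenet1_0 and compare them with a commonly used baseline such as resnet34; both are available from torchvision.

from torchvision.models import resnet34, squeezenet1_0

def n_params(model):
    # Total number of parameters in the model
    return sum(p.numel() for p in model.parameters())

# squeezenet1_0 has roughly a million parameters; resnet34 has tens of millions
n_params(squeezenet1_0()), n_params(resnet34())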
We might have to trade some performance in terms of accuracy for speed; let's see how this model does:
learn = cnn_learner(
dls, squeezenet1_0, loss_func=LabelSmoothingCrossEntropy(), metrics=metrics
)
min_, steep = learn.lr_find()
learn.fine_tune(
    30,
    base_lr=1e-2,
    cbs=[
        # Stop early if the F1 score hasn't improved for 5 epochs
        EarlyStoppingCallback(monitor="f1_score", patience=5),
        # Reduce the learning rate when the F1 score plateaus
        ReduceLROnPlateau(monitor="f1_score", patience=4),
        # Keep a copy of the best model (by F1 score) seen during training
        SaveModelCallback(monitor="f1_score"),
        # Apply MixUp augmentation during training
        MixUp(),
    ],
)
The final performance we get isn't too bad, so we'll continue using this model for our inference process.
learn.path = path
learn.export()
from nnanno.inference import *
We can load our previously exported learner. Because the exported learner doesn't include our data, we reattach our DataLoaders to it below.
learn = load_learner(path / "export.pkl")
learn
learn.dls = dls
We now create an instance of nnPredict and pass in our learner and a flag to use the GPU.
predict = nnPredict(learn, try_gpu=True)
predict
We'll create a new folder for storing our inference outputs.
Path(path / "inference").mkdir(exist_ok=True)
How to sample?
When we created a sample for generating training data, we sampled (roughly) an equal number of images for each decade. We did this because we wanted our model to be equally good for all of the decades we're looking at; we evaluated how well this worked in the previous notebook.
Now that we are turning to predicting on the dataset, we might have different criteria. Since we are looking to explore trends across a broad period (1850-1950), we will replicate the strategy from our initial sample of taking snapshots every ten years. One thing we will change this time is to sample a percentage from each decade rather than taking a fixed number of images for each year. This will allow us to see the relative presence of visual vs non-visual ads for each year, as well as underlying changes in the total size of the dataset.
Going through our other parameters:

- ads is the class we want to predict.
- out_dir is the folder where we'll store the predictions. These predictions consist of the rows from the original newspaper navigator dataset with extra columns for the predicted classes. We'll explore this output in more detail in subsequent notebooks.
- bs is the size of each batch we want to use for inference. nnanno will grab a batch of 64 images at a time and then pass this to the GPU (if a GPU is being used).
- sample_size is the size of our sample as a float (a percentage). We could also pass in a specific number instead.
- step is the step between years, in this case every ten years.
- size is used to form the IIIF request. The IIIF request lets us ask for an image at a specific size; by default, it will return a size as close as possible to the one requested without warping the image. This means we only download images at the size we need.
- force_dir is a flag that stops nnanno complaining if you are using a non-empty directory to store your annotations. Usually, this check is there to help you avoid accidentally overwriting predictions.

It's often helpful to get a sense of how long things are going to take. When we start the predict_sample method, we will see two progress bars. The top bar shows the total progress through the entire sample; the second shows the progress through the current batch (if a year's sample fits into less than one batch, the second bar won't show for that year). We can use these progress bars to quickly estimate how long the full requested sample will take, and then adjust the size of our sample if necessary.
Considerations
As pointed out in other parts of the documentation, nnanno is intended to help with working on smaller samples of the newspaper navigator dataset. Although IIIF can handle large request loads, we should still be considerate. If you want to run predictions over the entire dataset, the newspaper navigator data is available via an s3 bucket.
predict.predict_sample(
"ads",
out_dir=path / "inference",
bs=64,
sample_size=0.003,
step=10,
size=(430, 256),
force_dir=True,
)
Now is an excellent time to get a coffee or have a nap...
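Once the run finishes (or while it's still going), we can check what has been written out so far. I'm only listing the folder contents here; we'll look at the files themselves in the next notebook.

# See what predict_sample has written to the inference folder so far
sorted((path / "inference").iterdir())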
For these long-running jobs in Colab, you may occasionally have to deal with timeouts or being kicked off Colab. This is one of the trade-offs of getting something 'free'. This is why it's good to save the results of training and inference as you go, so it's easier to pick up where you left off; for example, by saving to a Google Drive folder (or a mounted s3 bucket, etc.); there's a quick sketch of this below. In the next notebook, we'll start to explore our predictions 😀
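As a concrete sketch of that 'save as you go' habit (the destination folder here is just an example, and this assumes you mounted Drive at /content/drive as in the cell near the top of the notebook), you could copy the exported model and the inference outputs across like this:

import shutil
from pathlib import Path

# Hypothetical backup folder on the mounted Google Drive
backup_dir = Path("/content/drive/MyDrive/nnanno_backup")
backup_dir.mkdir(parents=True, exist_ok=True)

# Copy the exported model and the prediction outputs so a Colab timeout
# doesn't mean starting again from scratch
shutil.copy(path / "export.pkl", backup_dir / "export.pkl")
for f in (path / "inference").iterdir():
    shutil.copy(f, backup_dir / f.name)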
fin