Classifying Track Bikes Using Neural Networks

Teaching neural nets to guess the manufacturer of a track bike.

toc: true
branch: master
badges: false
comments: false
author: Nishan Mann
categories: [fastpages, jupyter, neural networks]

This notebook has various goals - document initial attempts at using a CNN to classify track bikes - demonstrate the use of fastai python library to quickly build models - demonstrate the use of fastpages to convert Jupyter notebooks into blog posts

At the time of writing, please be aware of this bug and do not run v2.2.7 of the fastai library.

#hide
# !pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

import pandas as pd

from fastbook import *
from fastai.vision.widgets import *

All Makes Multi-Class Classfier

Loading and Preparing Data

This dataset was collected by manual scraping various public facebook pages of track day photographers and manually labelling each image. This is a painstaking process hence the lack of scale.

path_all_makes = Path("/home/nishan/Datasets/TrackBikes/")

all_makes_db = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.3, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128))

all_makes_dls = all_makes_db.dataloaders(path_all_makes)

all_makes_dls.categorize.vocab

['aprilia', 'bmw', 'ducati', 'honda', 'kawasaki', 'not_a_bike', 'suzuki', 'triumph', 'yamaha']

8 classes of bikes and 1 class of not a bike. Hence this is a multiclass problem.

pd.Series([all_makes_db.get_y(fpath) for fpath in all_makes_db.get_items(path_all_makes)]).value_counts()

yamaha        106
not_a_bike     66
kawasaki       61
suzuki         43
honda          37
bmw            23
aprilia        19
ducati         18
triumph        13
dtype: int64

this breakdown is before the random train test split
class imbalance here is representative of reality: yamaha rules on the track! Yamaha makes excellent track bikesw with wide availabilty of parts. European branfs on the other hand such as Triumph or Ducati don’t fare well on the track due to their expensive upkeep.

all_makes_dls.show_batch()

as one can see it is a hard job to tell one track bike from another, even veterans sometimes have a hard time
there are also riders in the image that really don’t need to be there.
there is one exception, not_a_bike images are clearly distinct

Train

learner_all_makes = cnn_learner(all_makes_dls, 
                                resnet18, 
                                metrics=[error_rate, F1Score(average="weighted")], 
                                lr=0.001)
learner_all_makes.fine_tune(20)

epoch	train_loss	valid_loss	error_rate	f1_score	time
0	3.387504	3.434762	0.843478	0.150317	00:07

epoch	train_loss	valid_loss	error_rate	f1_score	time
0	2.458725	2.720732	0.782609	0.238803	00:06
1	2.288914	2.325032	0.721739	0.297734	00:06
2	2.100991	2.084926	0.660870	0.350149	00:06
3	1.928781	1.972242	0.600000	0.390711	00:06
4	1.728745	1.952721	0.582609	0.403826	00:06
5	1.554006	1.984210	0.565217	0.418602	00:06
6	1.371065	2.067232	0.556522	0.425561	00:06
7	1.210570	2.109502	0.539130	0.447560	00:06
8	1.068196	2.182838	0.530435	0.452751	00:06
9	0.949947	2.231217	0.547826	0.439832	00:06
10	0.855398	2.246051	0.530435	0.464127	00:06
11	0.767150	2.230837	0.530435	0.464127	00:06
12	0.687040	2.237967	0.539130	0.457851	00:06
13	0.617871	2.246048	0.539130	0.457851	00:06
14	0.560617	2.254689	0.530435	0.466674	00:06
15	0.511584	2.272471	0.530435	0.466674	00:07
16	0.473674	2.285174	0.539130	0.454330	00:06
17	0.434411	2.279321	0.539130	0.454330	00:06
18	0.396292	2.275615	0.539130	0.455911	00:06
19	0.365418	2.275396	0.530435	0.461386	00:06

due to class imbalance, error_rate is perhaps not the best measure of performance
valid_loss is stagnant after epoch 14 while train_loss is decreasing. Hence the model is starting to overfit to the train set after epoch 14
the weighted f1_score is ~0.46 which is quite bad

Eval

interp_all_bikes = ClassificationInterpretation.from_learner(learner_all_makes)

interp_all_bikes.print_classification_report()

              precision    recall  f1-score   support

     aprilia       0.40      0.40      0.40         5
         bmw       0.11      0.17      0.13         6
      ducati       0.00      0.00      0.00         3
       honda       0.33      0.19      0.24        16
    kawasaki       0.50      0.40      0.44        15
  not_a_bike       0.94      0.94      0.94        17
      suzuki       0.40      0.25      0.31        16
     triumph       0.50      0.33      0.40         3
      yamaha       0.45      0.62      0.52        34

    accuracy                           0.47       115
   macro avg       0.40      0.37      0.38       115
weighted avg       0.47      0.47      0.46       115

interp_all_bikes.plot_confusion_matrix(dpi=100)

the first thing to note is the lack of labelled data in the validation set for some classes
- for example, ducati has 3 images while yamaha has 34, bmw has 6
not_a_bike class does the best. this means the classifier has indeed figured out what constitutes a motorcycle. This could be because during the training of resnet, one class was actually motorcycle.
suzuki is often confused with yamaha
honda is the most confused of all classes
most bike classes get confused with yamaha

interp_all_bikes.most_confused(2)

[('suzuki', 'yamaha', 9),
 ('kawasaki', 'yamaha', 7),
 ('yamaha', 'bmw', 4),
 ('bmw', 'yamaha', 3),
 ('honda', 'aprilia', 3),
 ('honda', 'bmw', 3),
 ('honda', 'kawasaki', 3),
 ('yamaha', 'ducati', 3),
 ('aprilia', 'yamaha', 2),
 ('bmw', 'suzuki', 2),
 ('ducati', 'honda', 2),
 ('honda', 'yamaha', 2),
 ('kawasaki', 'honda', 2),
 ('triumph', 'yamaha', 2),
 ('yamaha', 'kawasaki', 2),
 ('yamaha', 'suzuki', 2)]

interp_all_bikes.plot_top_losses(9,figsize=(18, 18))

Challenges in classification - lack of data - the rider adds unnecessary noise - different orientations of the bike - there are genuinely very few features from the bodywork that separate the makes - bike is not always center in the picture - make contains different models R6 R1 just in yamaha for example

# interp_all_bikes.top_losses()

# interp_all_bikes.preds

# learner_all_makes.get_preds()
# y_probs, y_preds = learner_all_makes.get_preds(dl=dls_all_makes.valid)

# y_probs

# y_preds

# dls_all_makes.valid.vocab

# y_preds.where(y_preds == 4, torch.scalar_tensor(0))

# torch.scalar_tensor(0,)

# dls_all_makes.valid.categorize?

# dls_all_makes.valid.items

Yamaha or Not Multi-Class Classifier

We now try an easier problem, is the trackbike a {Yamaha, Not a Yamaha, Not a bike}. Since we’ve a lot of Yamahas in our labelled dataset, by bunching not Yamahas into one class, hopefully we can alleviate the class imbalance problem.

path_yamaha_binary = Path("/home/nishan/Datasets/YamahaBinary/")

yamaha_binary_db = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128))

# other possible datablocks. Mainly they differ in the transforms applied to the images
# bikes = DataBlock(
#     blocks=(ImageBlock, CategoryBlock), 
#     get_items=get_image_files, 
#     splitter=None,
#     get_y=parent_label,
#     item_tfms=Resize(128))

# bikes = bikes.new(
#     item_tfms=RandomResizedCrop(224, min_scale=0.5),
#     batch_tfms=aug_transforms())

# bikes = bikes.new(
#     item_tfms=RandomResizedCrop(224, min_scale=0.5),
#     batch_tfms=None)
# dls = bikes.dataloaders(path_trackbikes)

# bikes = bikes.new(
#     item_tfms=Resize(224, ResizeMethod.Squish),
#     batch_tfms=aug_transforms())

# bikes = bikes.new(
#     item_tfms=Resize(224, ResizeMethod.Pad, pad_mode='zeros'),
#     batch_tfms=None)

pd.Series([yamaha_binary_db.get_y(fpath) for fpath in yamaha_binary_db.get_items(path_yamaha_binary)]).value_counts()

not_yamaha    224
yamaha        165
not_a_bike     66
dtype: int64

the value count for yamaha doesn’t align with the previous section because I added more bikes when assembling this dataset
class imbalance is now partially alleviated as we have more not yamahas

yamaha_binary_dls = yamaha_binary_db.dataloaders(path_yamaha_binary)

yamaha_binary_dls.show_batch(max_n=6, nrows=2)

you can see that the cropping sometimes takes out the bike from the image

Train

learner_yamaha_binary = cnn_learner(yamaha_binary_dls, 
                                    resnet18, metrics=[error_rate, F1Score(average="weighted")], 
                                    lr=0.001)
learner_yamaha_binary.fine_tune(20)

epoch	train_loss	valid_loss	error_rate	f1_score	time
0	1.930770	1.247743	0.329670	0.624340	00:07

epoch	train_loss	valid_loss	error_rate	f1_score	time
0	1.424345	0.862586	0.329670	0.651071	00:07
1	1.304576	0.737989	0.296703	0.705542	00:07
2	1.214939	0.711476	0.252747	0.744928	00:07
3	1.092670	0.700428	0.263736	0.731569	00:07
4	0.979581	0.731643	0.263736	0.725127	00:07
5	0.886004	0.733914	0.274725	0.724728	00:07
6	0.779175	0.749347	0.274725	0.725338	00:07
7	0.675493	0.742499	0.274725	0.719206	00:07
8	0.592519	0.745616	0.296703	0.689342	00:07
9	0.531942	0.762820	0.296703	0.693753	00:07
10	0.467197	0.800540	0.296703	0.697521	00:07
11	0.413226	0.844249	0.296703	0.700701	00:07
12	0.369834	0.868356	0.285714	0.713129	00:07
13	0.328100	0.861939	0.274725	0.725338	00:07
14	0.294613	0.849984	0.263736	0.735216	00:07
15	0.267953	0.865282	0.285714	0.715429	00:07
16	0.241579	0.875345	0.274725	0.727325	00:07
17	0.219295	0.864667	0.285714	0.715429	00:07
18	0.198167	0.866620	0.274725	0.727325	00:07
19	0.182071	0.863331	0.274725	0.727325	00:07

the model now has an accepetable weighted f1_score of around ~0.72

Eval

interp_yamaha_binary = ClassificationInterpretation.from_learner(learner_yamaha_binary)

interp_yamaha_binary.plot_confusion_matrix(dpi=100)

interp_yamaha_binary.print_classification_report()

              precision    recall  f1-score   support

  not_a_bike       0.93      1.00      0.96        13
  not_yamaha       0.79      0.73      0.76        51
      yamaha       0.53      0.59      0.56        27

    accuracy                           0.73        91
   macro avg       0.75      0.77      0.76        91
weighted avg       0.73      0.73      0.73        91

its still challenging to predict a yamaha for a yamaha. the precision and recall for yamaha class is the worst
not a bike class is the easiest and brings up the weighted f1 score.
not_yamaha precision and recall are acceptable

interp_yamaha_binary.plot_top_losses(9, nrows=3, figsize=(18, 18))

how much does the rider affect the model?

Production Demo

For a quick interactive demo of the model, launch binder or see this repo for more information.

Conclusion and Future Work

TODO settle on a consistent set of image transforms
TODO remove rider from the image
TODO collect more data