I wrote this article on LinkedIn while I was enrolled in the Master's program in Data Science.
Fig 1. Sample Images
Imagine you are given 1.2 million images of different categories, colors and sizes and asked to memorize the features of all of them, so that you can correctly classify an unseen image based on what you have learned.
Google has released TensorFlow, an open-source machine learning library, together with a deep neural network pretrained for image classification. Google also provides Python scripts that let you retrain that network with your own images (in the background, only the last layer of the neural network is retrained).
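To make "retraining the last layer" concrete, here is a minimal sketch in plain Python. It is not Google's retraining script: the `FROZEN_WEIGHTS` matrix is a hypothetical stand-in for the pretrained layers, the 4-pixel "images" are synthetic, and every function name is made up for illustration. The point is only that the early layers stay fixed while a small softmax layer on top is fitted to the new labels.

```python
import math
import random

# Hypothetical stand-in for the pretrained layers: fixed weights, never updated.
FROZEN_WEIGHTS = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]

def frozen_features(pixels):
    """Map raw pixels to features with a fixed linear layer followed by ReLU."""
    return [max(0.0, sum(w * p for w, p in zip(row, pixels)))
            for row in FROZEN_WEIGHTS]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, biases, features):
    """Class probabilities from the trainable last layer."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

def retrain_last_layer(features, labels, num_classes, steps=500, lr=0.5):
    """Fit only the final softmax layer with batch gradient descent."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    n = len(features)
    for _ in range(steps):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = predict(weights, biases, x)
            for c in range(num_classes):
                # Gradient of the cross-entropy loss w.r.t. the logit of class c.
                err = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for j in range(dim):
                    grad_w[c][j] += err * x[j]
        for c in range(num_classes):
            biases[c] -= lr * grad_b[c] / n
            for j in range(dim):
                weights[c][j] -= lr * grad_w[c][j] / n
    return weights, biases

def classify(weights, biases, features):
    """Index of the most probable class."""
    probs = predict(weights, biases, features)
    return probs.index(max(probs))

# Synthetic 4-pixel "images" for two made-up classes (not the flower data).
rng = random.Random(0)
prototypes = {0: [1.0, 0.2, 0.0, 0.0], 1: [0.0, 0.0, 0.2, 1.0]}
images, labels = [], []
for label, proto in prototypes.items():
    for _ in range(20):
        images.append([p + rng.uniform(-0.1, 0.1) for p in proto])
        labels.append(label)

# The frozen layers run once per image; only the last layer is trained.
features = [frozen_features(img) for img in images]
weights, biases = retrain_last_layer(features, labels, num_classes=2)

predicted = [classify(weights, biases, f) for f in features]
accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

Because the expensive layers are frozen, their output can be computed once per image and reused at every training step, which is one reason this style of retraining is fast compared with training a full network. Note that the sketch reports accuracy on its own training data, which is a weaker check than accuracy on held-out images.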
I supplied 3500+ images of five different categories of flower: tulips, sunflowers, daisies, roses and dandelions. The training session lasted about 20 minutes, and the model reported above 89% accuracy after 500 training steps and 90.5% accuracy after 1000 steps.
After training, I asked the model to classify the following images as Daisy or Sunflower. I intentionally included three images that show sunflowers and daisies together (see the test images at the top).
The model was largely successful, but there were a few surprises:
- The model failed to identify sunflowers in the image ‘sunflower3’ and classified it as Daisy with 90%+ certainty. The spiral pattern likely confused the model.
- The model did an excellent job of recognizing sunflowers in Van Gogh’s painting.
- The model recognized roses in the image ‘daisyandsunflower1’ with 60% certainty. Sunflowers scored 33%, while dandelions (3.2%) scored higher than daisies (1.2%). There are no roses in that image.
- In the image ‘daisyandsunflower2’, the model overwhelmingly recognized sunflowers (91%), and daisies (3.2%) ranked higher than the remaining categories.
- In the case of the image ‘daisyandsunflower3’, the model gave a 59% score to sunflowers and 29% to roses. Daisy received merely 0.5% and was largely ignored. Daisies and roses are white in the image.
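One way to read the scores above: assuming the final layer is a softmax (typical for this kind of classifier, though not confirmed here), the five scores always sum to 100%. The model therefore cannot answer "both daisy and sunflower" or "none of these", and a flower that is present can be crowded out by a stronger one. The sketch below uses made-up raw scores (`example_logits`), not values from the experiment, to show how such a ranking is produced.

```python
import math

LABELS = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rank_labels(logits, labels=LABELS):
    """Return (label, probability) pairs, most probable first."""
    probs = softmax(logits)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)

# Hypothetical raw scores for an image containing a sunflower and a daisy.
example_logits = [1.0, 0.2, 0.1, 2.5, 0.0]
ranking = rank_labels(example_logits)
for label, prob in ranking:
    print(f"{label:<10} {prob:6.1%}")
```

Even though the daisy has the second-highest raw score in this made-up example, most of the probability goes to sunflowers, because softmax splits a fixed budget across all labels.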
Doubling the number of training steps had only a marginal impact, and for the majority of the test images the 1000-step results were slightly worse than the 500-step results.
This semester, I am learning about frequent pattern mining for various types of data, including images. This simple experiment demonstrated the promise, as well as the challenges, of image classification through machine learning.