How does a computer know the difference between two cats? An image classifier created from a trained neural network, of course! Seriously though, I’ve always figured the realm of machine learning is out of my reach, but some of the tools are becoming more accessible to lay-people. This video was quite reassuring:
Learning About Machine Learning
With a confident attitude I went out searching for some ideas on how to create my own classifier. Luckily I didn’t have to look far. Google’s most basic introductory tutorial happened to fit my problem like a glove. Tensorflow for Poets takes you from nothing to a custom image classifier in less than an hour. It also leaves you with all the tools you need to keep using it with your own images… which is what I did.
The basics of creating the classifier went like this…
I needed pictures of my cats. Lots of them. Like hundreds. So I fired up the picam on the Raspberry Pi to record for a couple minutes while ‘nudging’ each cat in front of the camera in various positions and with different lighting. For this I used the raspivid program that comes with Raspbian. The following command records ten seconds of video:
pre>raspivid -o video.h264 -t 10000
For image extraction, I used ffmpeg and the following command to output an image for every second of video:
ffmpeg -i video.h264 -vf fps=1 out%d.png
The ffmpeg docs have lots of useful snippets. Oh, and ffmpeg can be installed as a command line utility via Homebrew on Mac (brew install ffmpeg).
The images were all in PNG format, and I needed .jpeg for Tensorflow. Luckily, OS X has a built in batch conversion tool right in the Preview app. See OSXDaily for details. After the conversion, the images were sorted into two folders – one for Lola, the other for Maddie.
For the training, I simply repeated step 4 in the Tensorflow for Poets tutorial, but used ‘lola’ and ‘maddie’ pictures instead of roses and daisies. I used 4000 iterations for extra thorough training. The basic command is below. Of course, instead of flower photos, I obviously had cat photos.
# In Docker python tensorflow/examples/image_retraining/retrain.py \ --bottleneck_dir=/tf_files/bottlenecks \ --model_dir=/tf_files/inception \ --output_graph=/tf_files/retrained_graph.pb \ --output_labels=/tf_files/retrained_labels.txt \ --image_dir /tf_files/flower_photos
The main output of this training is the model file (retrained_graph.pb), and the accompanying labels file (retrained_labels.txt). These are the files used in the program that will examine new pictures of my cats and determine which one is which (if any cats are present at all).
Finally, I could take the label_image.py script from Step 5 of the tutorial, and use that to analyze new photos of my cats and tell me the confidence of Lola or Maddie being present in the image. All this runs from inside the original Docker image like so:
curl -L https://goo.gl/tx3dqg > $HOME/tf_files/label_image.py docker run -it -v $HOME/tf_files:/tf_files gcr.io/tensorflow/tensorflow:latest-devel python /tf_files/label_image.py /tf_files/new_lola_pic.jpg
Notice in the last line I used a picture of lola instead of a daisy. The output was something like:
lola (score = 0.99138) maddie (score = 0.35342)
Even though Maddie was not in the picture, the confidence was pretty high. I’m guessing because Maddie is a nearly solid black cat, any dark mass in the photo may appear to the model as Maddie. Luckily, if Maddie actually is in the photo, the confidence is much higher. Here’s a score with Maddie, but no Lola:
lola (score = 0.01483) maddie (score = 0.99683)
Good Job, computer!