Using Machine Learning to Identify Individual Ants

Machine Learning, Python, Computer Vision, OpenCV, PyTorch, C++

Overview

The goal of this project was to create a system that could identify individual ants in a colony. Using a machine vision camera equipped with a macro lens, I collected 27,244 images of 55 different harvester ants. These images were then used to train a machine learning network that produces a feature vector, or embedding, describing each ant. By comparing the embeddings of two images and applying a distance threshold, the system can determine whether the ants are the same or different with an accuracy of 86.04%.

Why Ants?

The first reaction when I tell people about this project is typically something along the lines of “how is this useful?” or “why would you want to do that?”. The answer is that for most people, it isn’t useful at all. For myrmecologists (ant scientists), however, it is very useful. Ants are highly social insects that are commonly studied to understand collective behavior, and those insights can inform how swarms of robots coordinate to accomplish tasks. The current way of tracking individual ants in a colony is an extremely tedious process that involves manually labeling each ant by painting a different pattern on it. Check out this video to see exactly how tedious this process can be. A system that can automatically identify individual ants saves researchers a great deal of time and effort.

Project Setup and Hardware

To collect images of the ants, I used a Blackfly S camera from FLIR equipped with a macro lens. The camera was mounted on a microscope stand and placed above a 3D printed channel that the ants could walk through. By using this setup, the ants were forced to walk in a straight line in the camera’s field of view. This entire setup was then placed inside of a light box to ensure consistent lighting.

The image below shows the entire setup. There were two main living chambers for the ants: one for ants that had already been imaged and one for ants that still needed to be imaged. Multiple 3D printed gates controlled the flow of ants from the living chambers to the data collection area.

Setup Overview

Data Collection

Using the setup described above, I was able to collect 27,244 images of 55 different harvester ants. The traffic control gates were used to isolate one ant in the data collection area, which allowed me to record video of one ant at a time.

Ant Imaging Channel and Image

To interact with the Blackfly camera and collect videos of the ants walking through the imaging channel, I first created a C++ library to interface with the camera. It allowed me to control all of the camera’s settings and record video. I also created scripts to semi-automatically collect videos of the ants.

When collecting videos, I would first let an ant walk into the data collection area and then start recording. Once the ant had passed under the camera enough times, I would stop recording and move the ant into the separate living chamber. I then repeated this process for each unfilmed ant. Videos were named sequentially as “ant_#.avi” so that I could keep track of which ant was in each video.
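The sequential naming convention can be automated with a small helper. This is a minimal sketch of that idea; next_video_name is a hypothetical function, not taken from the project’s actual collection scripts.

```python
import tempfile
from pathlib import Path

def next_video_name(out_dir):
    """Return the next name in the ant_#.avi sequence for a folder.

    Hypothetical helper: counts existing videos and increments by one.
    """
    existing = list(Path(out_dir).glob("ant_*.avi"))
    return f"ant_{len(existing) + 1}.avi"

# Demonstrate on a temporary folder standing in for the video directory.
tmp = tempfile.mkdtemp()
first = next_video_name(tmp)      # "ant_1.avi" in an empty folder
Path(tmp, first).touch()
second = next_video_name(tmp)     # "ant_2.avi" once one video exists
print(first, second)
```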

Methodology

System Pipeline

Once videos of the ants were collected, I used YOLOv8 to find video frames where ants were present. I then saved a cropped image of the ant from each YOLO detection to a folder corresponding to that ant’s identity.

Ant images are then passed through the same convolutional neural network, an arrangement known as a Siamese network, which outputs a feature vector, or embedding, for each image. Taking the norm of the difference between two images’ embeddings gives a distance. By comparing this distance to a threshold, I was able to determine whether two images show the same ant or different ants.
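The comparison step reduces to a norm of an embedding difference. A minimal sketch, where the 0.7 threshold is purely illustrative (the project’s actual threshold was tuned on validation data) and tiny 3-element vectors stand in for the real 128-element embeddings:

```python
import torch

def embedding_distance(emb_a, emb_b):
    """L2 distance between two embedding vectors."""
    return torch.norm(emb_a - emb_b).item()

def same_ant(emb_a, emb_b, threshold=0.7):
    """Declare two images the same ant if their embeddings are close."""
    return embedding_distance(emb_a, emb_b) < threshold

a = torch.tensor([1.0, 0.0, 0.0])
b = torch.tensor([0.0, 1.0, 0.0])
d = embedding_distance(a, b)   # sqrt(2), well above the toy threshold
print(d, same_ant(a, b), same_ant(a, a))
```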

Model Summary

The model outputs a 128-element feature vector. To train it, I used a loss function known as triplet loss. Triplet loss can be understood intuitively: at each training step, the model is given three images, an anchor, a positive, and a negative. The anchor is an image of any ant, the positive is a different picture of the same ant, and the negative is a picture of a different ant. During training, the model weights are updated so that the distance between the anchor and positive embeddings becomes smaller than the distance between the anchor and negative embeddings.
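One training step of this scheme can be sketched in PyTorch using its built-in nn.TripletMarginLoss. The tiny linear embedding network, the margin of 0.2, and the random tensors standing in for batches of ant images are all illustrative, not the project’s actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn

# Stand-in embedding network mapping a 3x64x64 image to a 128-d vector.
embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
loss_fn = nn.TripletMarginLoss(margin=0.2)  # margin value is illustrative
optimizer = torch.optim.Adam(embed.parameters(), lr=1e-3)

# One batch of triplets: anchors, positives (same ant as the anchor),
# and negatives (a different ant). Random tensors stand in for images.
anchor = torch.randn(8, 3, 64, 64)
positive = torch.randn(8, 3, 64, 64)
negative = torch.randn(8, 3, 64, 64)

loss = loss_fn(embed(anchor), embed(positive), embed(negative))
optimizer.zero_grad()
loss.backward()   # pulls anchor-positive pairs together, pushes negatives apart
optimizer.step()
print(loss.item())
```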

Triplet Loss Function

Cost Function with Triplet Loss
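For reference, the triplet loss takes this standard form, writing f for the embedding network and α for the margin (some formulations use squared norms instead; the overall cost is this loss summed over all triplets in a batch):

```latex
\mathcal{L}(a, p, n) = \max\left( \lVert f(a) - f(p) \rVert_2 - \lVert f(a) - f(n) \rVert_2 + \alpha,\; 0 \right)
```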

Triplet loss makes this work because the anchor-positive distance minus the anchor-negative distance, plus some margin, must be less than or equal to zero for the loss to vanish. Since that quantity must be non-positive, the anchor-negative distance must exceed the anchor-positive distance by at least the margin. The margin therefore pushes the anchor-positive and anchor-negative distances further apart than merely requiring them to differ. Because the loss takes the maximum of this quantity and zero, the loss is exactly zero once the two distances are separated by the margin.
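A quick numeric check of that behavior, using a margin of 0.2 chosen purely for illustration:

```python
def triplet_loss(d_ap, d_an, margin=0.2):
    """Triplet loss given anchor-positive and anchor-negative distances."""
    return max(d_ap - d_an + margin, 0.0)

# Negative is farther than the positive by more than the margin: zero loss.
zero_case = triplet_loss(0.3, 0.9)
# Negative is farther, but not by the full margin: a small loss remains.
small_case = triplet_loss(0.3, 0.4)
print(zero_case, small_case)
```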

The model was trained on 18,845 images of 45 different ants.

Results

To choose a threshold, I tested a range of values between the average anchor-positive distance and the average anchor-negative distance, picking the one that achieved the highest accuracy.
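That sweep can be sketched as follows. The six toy distances are illustrative stand-ins for the real distributions of anchor-positive and anchor-negative distances, and the grid of 100 candidate thresholds is an assumption.

```python
import numpy as np

def best_threshold(pos_dists, neg_dists, num=100):
    """Sweep thresholds between the mean positive and mean negative
    distance and return the one with the highest accuracy."""
    lo, hi = np.mean(pos_dists), np.mean(neg_dists)
    best_t, best_acc = lo, 0.0
    for t in np.linspace(lo, hi, num):
        tp = np.sum(pos_dists < t)   # same-ant pairs correctly accepted
        tn = np.sum(neg_dists >= t)  # different-ant pairs correctly rejected
        acc = (tp + tn) / (len(pos_dists) + len(neg_dists))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

pos = np.array([0.2, 0.3, 0.4])   # toy anchor-positive distances
neg = np.array([0.8, 0.9, 1.1])   # toy anchor-negative distances
t, acc = best_threshold(pos, neg)
print(t, acc)   # these toy distances are perfectly separable
```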

Threshold Test

The model was tested on two different test sets: one contained 2,357 images of the 45 ants the model had seen in training, and the other contained 3,687 images of 10 unseen ants. On the dataset of previously seen ants, the model achieved a true positive rate of 95.21%, a true negative rate of 99.6%, and an accuracy of 97.3%. On the dataset of unseen ants, the model achieved a true positive rate of 80.67%, a true negative rate of 93.68%, and an accuracy of 86.04%. Accuracy here is defined as the number of true positives plus the number of true negatives over the total number of predictions.