Michael Stanley, Researcher Data Science
A great start to the Christmas season for the Lynker Analytics team, in collaboration with Ai.Fish, was being selected for the NOAA Fisheries & Nvidia GPU Hackathon. The hackathon was a multi-day event designed to help teams of three to six developers accelerate their own code on high-end Nvidia GPUs using a programming model or machine learning framework of their choice.
The hackathon saw as many as nine teams, each with a different focus, ranging from mathematical modelling to machine learning and AI approaches. Our team’s goal was to build a fish detection and classification model using active learning. The overall objective was to demonstrate how quickly a machine learning solution can be built without hundreds of thousands of training examples.
The team’s main aim was to create a solution that could classify fish and invertebrate families and species across 11 classes, ideally saving time and human effort. As an AI- and ML-focused company, we were really excited to take up this challenge and try to build a sustainable solution that the industry could use in the long term.
Having worked previously with NOAA Fisheries, who are experts in monitoring aquatic life, we knew that they had gathered thousands of hours of stereo still imagery. Analyzing such imagery is data intensive and time consuming, and it requires a high level of domain knowledge.
When we began reviewing the dataset, we had little knowledge of the different fish species it contained. A huge shoutout to NOAA’s scientists, who were very helpful in explaining the different fish species and invertebrates present in the imagery.
Due to the nature of the hackathon, there would inevitably be a crunch to create the final solution, which had to be completed within 3 days. Annotating such a large dataset in that time would be virtually impossible. The team settled on active learning, and the help from NOAA’s scientists contributed to an error-free database for the active learning system.
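For readers new to active learning, the core idea is that the model chooses which unlabelled examples a human should annotate next. This post does not say which selection strategy we used, so the snippet below is only a minimal sketch of one common option, least-confidence sampling. The function name and the probability lists are hypothetical and purely for illustration.

```python
from typing import List, Sequence


def least_confident(probabilities: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Return indices of the samples the classifier is least sure about.

    `probabilities` holds one list of class probabilities per unlabelled
    sample (hypothetical classifier output). A sample's confidence is its
    highest class probability; the lowest-confidence samples are the ones
    sent to a human annotator first.
    """
    scored = [(max(probs), index) for index, probs in enumerate(probabilities)]
    scored.sort()  # lowest confidence first, ties broken by index
    return [index for _, index in scored[:batch_size]]


# Illustrative values only: three unlabelled chips, two classes.
example_probabilities = [[0.90, 0.10], [0.55, 0.45], [0.60, 0.40]]
to_annotate = least_confident(example_probabilities, batch_size=2)
```

The selected samples would then be labelled by an expert, added to the training set, and the model retrained, repeating until the accuracy is good enough.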
The best way to ensure we had a working solution by the end of the hackathon was to divide the core tasks between Ai.Fish and Lynker Analytics. Ai.Fish, having a strong background in fish detection, took up the task of creating a fish detection model from images, while Lynker Analytics took on the responsibility of creating a fish classifier system.
Data is the biggest contributor to a workable and accurate solution for any machine learning application, and we were short on annotated data. In total, the teams had 1,733 images and 15,000 bounding box annotations, which Ai.Fish used to create a fish detection system. Out of these, the team at Lynker Analytics used 7,000 image chips, with multiple chips coming from a single image frame.
The active learning process relied on these image chips, generated by Ai.Fish’s object detection model, to provide images cropped to an area containing a single sea animal for use as training or validation data.
A separate set of 300 images, the only pre-annotated data available to us, was used as validation data. The source images were approximately 4k each and the image chip sizes varied per species.
After three long days, late nights, an unhealthy amount of coffee and continuous hours of coding, the team ended up building a reliable and accurate model. Tune in to the next part of the article, where we discuss the technical details of the hackathon and the results in depth.
Find Lynker Analytics and me on LinkedIn to stay up to date on future exciting environment-related ML and AI projects.