Image Object Detection ‐ Detecto
Workshop sessions are recorded and posted to the UA Datalab YouTube Channel about one week after the live session.
Detecto is a lightweight Python library built on top of the deep learning framework PyTorch.
By default, Detecto uses the convolutional neural network architecture Faster R-CNN ResNet-50 FPN. This architecture was pre-trained on the COCO (Common Objects in Context) dataset, which contains over 330,000 images annotated with labels from 80 object classes (e.g., person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, kite, knife, spoon, tv, book). The images and labels are generally not from aerial viewpoints. Therefore, Detecto is not ready to identify objects in aerial images out of the box. It has to be trained to do so.
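As a minimal sketch of what the out-of-the-box model can do, the snippet below runs the pre-trained COCO model on a single image (the file name photo.jpg is a placeholder; Detecto itself can be installed with pip install detecto):

```python
from detecto import core, utils, visualize

# With no class list given, Detecto loads the default Faster R-CNN
# ResNet-50 FPN model pre-trained on COCO, so predictions use the
# COCO class names.
model = core.Model()

image = utils.read_image('photo.jpg')  # placeholder path to any RGB image

# predict() returns parallel outputs: class labels, bounding boxes in
# (xmin, ymin, xmax, ymax) pixel coordinates, and confidence scores.
labels, boxes, scores = model.predict(image)

# Draw the predicted boxes and labels on the image.
visualize.show_labeled_image(image, boxes, labels)
```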


Traditional image analysis approaches, such as classifying individual pixels by their color values or grouping adjacent pixels with similar values into objects, are both likely to fail at the task of identifying objects in high-resolution images.
A convolutional neural network (CNN) can be trained to identify objects in an image in a way that is more analogous to how humans identify objects. We generally look at multiple features of an object to decide what it is. For example, if presented with an image of a tree, we may look at features such as crown shape, leaf architecture, and color to help us identify the object as a tree.
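To make the idea of image features concrete, here is a small, self-contained PyTorch sketch (PyTorch is the framework Detecto runs on) that applies a hand-set edge-detection filter to a toy image tensor. In a trained CNN, many such filters are learned from the data rather than set by hand:

```python
import torch
import torch.nn as nn

# One convolutional layer with a single 3x3 filter (no bias, for clarity).
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, bias=False)

# Hand-set a vertical-edge filter. During training, a CNN learns many
# such filters (edges, textures, shapes) directly from the images.
with torch.no_grad():
    conv.weight[:] = torch.tensor([[[[-1.0, 0.0, 1.0],
                                     [-2.0, 0.0, 2.0],
                                     [-1.0, 0.0, 1.0]]]])

# A toy 8x8 "image": dark on the left half, bright on the right half.
image = torch.zeros(1, 1, 8, 8)
image[..., 4:] = 1.0

# The resulting feature map responds strongly at the vertical boundary.
feature_map = conv(image)
print(feature_map.squeeze())
```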
Please watch these VERY useful videos to understand how Convolutional Neural Networks operate:
- What are Convolutional Neural Networks (CNNs)? from IBM Technology
- But what is a neural network? from 3Blue1Brown
- How Convolutional Neural Networks Work from Brandon Rohrer
- Convolutional Neural Networks Explained (CNN Visualized) from Futurology
CNN models can take a few different approaches to analyzing images. Detecto specifically carries out object detection: it predicts a bounding box and a class label for each object it finds in an image.
The NWPU VHR-10 dataset (Cheng et al., 2014a) is a very high resolution (VHR) aerial imagery dataset that consists of 800 total images. The dataset has ten classes of labeled objects (object counts in parentheses): 1. airplane (757), 2. ship (302), 3. storage tank (655), 4. baseball diamond (390), 5. tennis court (524), 6. basketball court (159), 7. ground track field (163), 8. harbor (224), 9. bridge (124), and 10. vehicle (477). Labels were manually annotated with axis-aligned bounding boxes. 715 color images were acquired from Google Earth with spatial resolutions ranging from 0.5 m to 2 m, and 85 pan-sharpened color infrared (CIR) images were acquired from the Vaihingen data with a spatial resolution of 0.08 m.
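As a sketch of how such a dataset could be loaded for Detecto, assuming the NWPU VHR-10 annotations have been converted to PASCAL VOC XML files (the format Detecto reads; see the labeling note at the end of this page) and placed alongside the images in a folder named nwpu_vhr10/ (a hypothetical path):

```python
from detecto import core

# Hypothetical folder containing the images plus one PASCAL VOC
# .xml annotation file per image (same base file name).
dataset = core.Dataset('nwpu_vhr10/')

# Each item is an (image, target) pair; target holds the class
# labels and bounding boxes parsed from the XML annotations.
image, target = dataset[0]
print(target['labels'], target['boxes'])
```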


I have pre-written Python code in a Jupyter Notebook that will show you:
- How to use an existing training dataset to fine-tune the Detecto model in order to detect objects from aerial imagery (a minimal sketch of this step follows the list).
- How to output and assess the quality of the prediction model.
- How to upload your fine-tuned model to Hugging Face.
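As a rough sketch of the fine-tuning step, assuming a folder train_labels/ of PASCAL VOC XML annotations with matching images in train_images/ (hypothetical paths) and the ten NWPU VHR-10 class names, the Detecto workflow looks roughly like this:

```python
from detecto import core

classes = ['airplane', 'ship', 'storage tank', 'baseball diamond',
           'tennis court', 'basketball court', 'ground track field',
           'harbor', 'bridge', 'vehicle']

# Hypothetical folders: VOC XML annotations and the matching images.
dataset = core.Dataset('train_labels/', 'train_images/')
loader = core.DataLoader(dataset, batch_size=2, shuffle=True)

# Start from the COCO pre-trained Faster R-CNN and replace its
# prediction head with one for our ten aerial classes.
model = core.Model(classes)

# Fine-tune for a few epochs (fit() also accepts a validation
# dataset, in which case it returns the per-epoch losses).
model.fit(loader, epochs=5, verbose=True)

# Save the fine-tuned weights for later reuse or upload.
model.save('nwpu_vhr10_weights.pth')
```

The saved .pth file is what you would then upload to Hugging Face, e.g. with the huggingface_hub library.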
One option for running the notebook is Google Colab. This is probably the easiest way to get started: Google provides you with a virtual machine with optional (but limited) GPU resources.
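After enabling a GPU in Colab (Runtime > Change runtime type), a quick check that PyTorch can actually see it:

```python
import torch

# True if the Colab runtime has a GPU attached and visible to PyTorch;
# training falls back to the (much slower) CPU otherwise.
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU only')
```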
Intersection over Union (IoU) is a measure of how well a predicted bounding box aligns with the ground-truth box: it is the area of overlap between the two boxes divided by the area of their union. It is one of the main metrics for evaluating the accuracy of object detection algorithms and helps distinguish between "correct detection" and "incorrect detection".
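A minimal sketch of the computation for two axis-aligned boxes in (xmin, ymin, xmax, ymax) form (the same box format Detecto predicts); a common rule of thumb, which varies by benchmark, is to count a detection as correct when IoU is at least 0.5:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Zero if the boxes do not overlap at all.
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    if inter == 0:
        return 0.0

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])

    # Union = sum of the two areas minus the doubly counted intersection.
    return inter / (area_a + area_b - inter)

# Example: a prediction shifted slightly off the ground-truth box.
print(iou((10, 10, 50, 50), (15, 12, 55, 52)))  # ~0.71
```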


Object labels need to be in XML files using the PASCAL VOC format. Try the Label Studio tool to create labels.
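For reference, here is a minimal sketch of reading one PASCAL VOC annotation file with Python's standard library (annotation.xml is a placeholder file name); Detecto's Dataset class does this parsing for you:

```python
import xml.etree.ElementTree as ET

# Placeholder path to a single PASCAL VOC annotation file.
root = ET.parse('annotation.xml').getroot()

# Each <object> element holds a class name and an axis-aligned
# bounding box in <bndbox> as xmin/ymin/xmax/ymax pixel coordinates.
for obj in root.iter('object'):
    name = obj.find('name').text
    box = obj.find('bndbox')
    coords = [int(float(box.find(tag).text))
              for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
    print(name, coords)
```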
See also the SODA-A labeled aerial imagery dataset on Hugging Face: https://huggingface.co/datasets/satellite-image-deep-learning/SODA-A