7 Object Detection w/ DL A Review - Reg Frameworks


The first framework on the list, YOLO, was fairly seminal: it succeeded in merging the detection pipeline into a single regression + classification output.

The paper reviews the basics of YOLO, drawing on the original work. The authors note that many researchers had previously tried to model object detection (OD) as a regression task, but apparently with less success than YOLO and SSD.

YOLO essentially divides the image into an SxS grid, where each grid cell is tasked with predicting one main class and B bounding boxes. This is effective for speed, but it hurts the ability to detect small objects: the grid can only be divided so finely, so it is difficult to resolve objects below the resolution of the grid.
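To make the grid limitation concrete, here is a minimal sketch of the grid bookkeeping. The function names are hypothetical. The per-box layout (x, y, w, h, confidence) and the values S=7, B=2, C=20 follow the original YOLO paper's PASCAL VOC setup; they are assumptions here, not figures from these notes.

```python
def yolo_output_shape(S, B, C):
    """Shape of the prediction tensor for an S x S grid.

    Assumes each of the B boxes carries 5 numbers (x, y, w, h, confidence)
    and each cell carries a single set of C class scores.
    """
    return (S, S, B * 5 + C)


def responsible_cell(cx, cy, S):
    """Return (row, col) of the grid cell containing an object's center.

    cx, cy are the center coordinates normalized to [0, 1].
    The min() clamps a center lying exactly on the right/bottom edge.
    """
    col = min(int(cx * S), S - 1)
    row = min(int(cy * S), S - 1)
    return row, col


if __name__ == "__main__":
    S, B, C = 7, 2, 20  # assumed values, taken from the original YOLO setup
    print(yolo_output_shape(S, B, C))

    # Two small, nearby objects (hypothetical centers) land in the same cell,
    # so they must share that cell's single class prediction.
    print(responsible_cell(0.50, 0.50, S))
    print(responsible_cell(0.55, 0.52, S))
```

With S=7, both example centers map to the same cell, which is the small-object problem in miniature: the cell can only commit to one class no matter how many objects fall inside it.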

YOLOv2 fixes this problem, among others, in large part by building on the contributions of SSD.

Single Shot MultiBox Detector (SSD)

This is the other primary regression-only OD framework that enjoys large speed improvements. Instead of a grid framework, SSD takes advantage of a set of default anchor boxes. These boxes have been fine-tuned to perform well on the test sets, which might be acceptable given that typical objects come in standard aspect ratios, but they will certainly not generalize well to objects that deviate from those aspect ratios unless the deviations are well known before the model is productized.
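The dependence on fixed aspect ratios is easier to see in code. Below is a simplified sketch of how default boxes can be laid out over one feature map, using the width/height rule from the SSD paper (w = s * sqrt(ar), h = s / sqrt(ar)). The function name and the example scale and ratios are hypothetical, and the sketch omits the extra intermediate-scale box that SSD adds for aspect ratio 1.

```python
import math


def default_boxes(fmap_size, scale, aspect_ratios):
    """Generate default boxes (cx, cy, w, h) for one square feature map.

    All coordinates are normalized to the image size. One box per aspect
    ratio is centered on every feature-map cell. Simplified sketch: a real
    SSD model repeats this over several feature maps with different scales.
    """
    boxes = []
    for i in range(fmap_size):          # rows
        for j in range(fmap_size):      # columns
            cx = (j + 0.5) / fmap_size  # cell center, x
            cy = (i + 0.5) / fmap_size  # cell center, y
            for ar in aspect_ratios:
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes


if __name__ == "__main__":
    # Hypothetical settings: a 2x2 feature map, scale 0.5, three ratios.
    for box in default_boxes(2, 0.5, [1.0, 2.0, 0.5]):
        print(box)
```

Every box at a given scale has the same area and differs only in shape, so an object whose aspect ratio falls far outside the chosen list has no default box that matches it well. That is the generalization concern raised above.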

SSD300 (300x300 input) runs at 59 fps but, like YOLO, suffers on small objects. The paper notes that strengthening the backbone (e.g., ResNet-101) alleviates the small-object problem a little.