This paper takes a 30,000 ft view of the object detector landscape. It's interesting to read an outsider's less biased take on these models, compared with the obviously biased versions in their individual papers… YOLO, I’m looking at you (only once).
Object detection is the task of precisely estimating the concepts and locations of objects contained in images.
Now that we know what we are looking for, how specifically will it be approached? Three stages of the pipeline-style detectors are noted:
Informative Region Selection - the piece of the pipeline that predicts where objects may appear
Feature extraction - the extraction of visual features
Classification - predicting the object's identity
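
To make the three stages above concrete, here is a minimal toy sketch in plain Python. Everything in it is an assumption made for illustration, not something taken from the paper: the sliding-window region selector, the single mean-brightness "feature", the threshold "classifier", and all function names (`select_regions`, `extract_features`, `classify`, `detect`) are hypothetical stand-ins for the real components.

```python
from typing import Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); a hypothetical convention
Image = List[List[float]]        # grayscale image as rows of pixel values in [0, 1]


def select_regions(image: Image, win: int, stride: int) -> Iterator[Box]:
    """Stage 1: propose candidate regions with an exhaustive sliding window."""
    height, width = len(image), len(image[0])
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            yield (x, y, win, win)


def extract_features(image: Image, box: Box) -> List[float]:
    """Stage 2: describe a region with a toy feature vector (mean brightness)."""
    x, y, w, h = box
    pixels = [image[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    return [sum(pixels) / len(pixels)]


def classify(features: List[float], threshold: float = 0.5) -> Tuple[str, float]:
    """Stage 3: assign a label with a stand-in threshold rule (not a trained model)."""
    score = features[0]
    return ("object" if score >= threshold else "background", score)


def detect(image: Image, win: int = 2, stride: int = 2):
    """Run the three stages in order and keep only regions labelled 'object'."""
    detections = []
    for box in select_regions(image, win, stride):
        label, score = classify(extract_features(image, box))
        if label == "object":
            detections.append((box, label, score))
    return detections


# 4x4 toy image with a bright 2x2 patch in the bottom-right corner
toy_image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
detections = detect(toy_image)
print(detections)  # [((2, 2, 2, 2), 'object', 1.0)]
```

Note that every candidate region is featurized and classified independently, so shrinking the stride produces more overlapping windows and more repeated work. That is the kind of redundancy that comes up in the next remark.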
R-CNN seems like a wasteful, overly redundant methodology; however, I see it popping up a lot, so there must be something to it.