A major problem for many of these algorithms is scale variance in the image. An object's apparent size shrinks as it moves farther from the image sensor (roughly in inverse proportion to the distance), which changes how many pixels it covers and how those pixels are grouped. This is a problem for OD schemes. Feature Pyramid Networks (FPN) build upon the idea of image pyramids. An FPN is structured as a bottom-up pathway, a top-down pathway, and several lateral connections between pyramid levels. This structure, shown below, merges low-resolution, semantically strong features from the deeper layers with high-resolution, semantically weak features from the shallower layers. As a result, every level of the pyramid carries rich semantic information, and the network can be trained end-to-end across all scales. The FPN is also independent of the backbone architecture and can be applied at a variety of stages of OD.
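To make the top-down pathway and lateral connections more concrete, here is a minimal NumPy sketch of the merge step. It is an illustration under stated assumptions, not the reference implementation: the function names (`lateral_1x1`, `upsample_2x`, `fpn_top_down`), the channel counts, and the map sizes are all hypothetical, the weights are random rather than trained, and the 3x3 smoothing convolution that FPN applies after each merge is left out for brevity.

```python
import numpy as np


def lateral_1x1(feature, weight):
    """1x1 convolution: mixes channels independently at each pixel.

    feature has shape (C_in, H, W); weight has shape (C_out, C_in).
    """
    return np.einsum("oc,chw->ohw", weight, feature)


def upsample_2x(feature):
    """Nearest-neighbour upsampling by a factor of 2 in height and width."""
    return feature.repeat(2, axis=1).repeat(2, axis=2)


def fpn_top_down(backbone_features, lateral_weights):
    """Merge backbone feature maps into a pyramid with a shared channel count.

    backbone_features is ordered from finest to coarsest, and each level is
    assumed to have half the spatial resolution of the previous one.
    """
    # Lateral connections: project every level to the same number of channels.
    laterals = [lateral_1x1(f, w) for f, w in zip(backbone_features, lateral_weights)]

    # The top-down pathway starts at the coarsest (semantically strongest) level.
    merged = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        # Upsample the coarser merged map and add the finer lateral map.
        merged.append(lateral + upsample_2x(merged[-1]))

    # Return in the same fine-to-coarse order as the input.
    return merged[::-1]


# Hypothetical backbone outputs: three levels, halving in resolution.
rng = np.random.default_rng(0)
channels = [8, 16, 32]
sizes = [32, 16, 8]
out_channels = 4

features = [rng.standard_normal((c, s, s)) for c, s in zip(channels, sizes)]
weights = [rng.standard_normal((out_channels, c)) for c in channels]

pyramid = fpn_top_down(features, weights)
print([p.shape for p in pyramid])  # [(4, 32, 32), (4, 16, 16), (4, 8, 8)]
```

Every output level has the same channel count, so a single detection head could in principle be shared across all scales. The finest level keeps its high resolution while receiving information passed down from the coarser, semantically stronger levels.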
Aside from these architectures, the paper looks at emerging ideas that might prove useful to the field. These include addressing the imbalance caused by background examples, the importance of per-region classification, the imbalance in the number of examples across classes, and the compute expense of these heavy models.