Realtime System The paper stresses repeatedly that YOLO runs in real time. R-CNN, Fast R-CNN, Faster R-CNN, and the Fastest DPM variant all fell short of real-time speed at the time of writing.
Error Analysis YOLO's dominant failure mode is localization: its localization errors alone outnumber all of its other errors combined. This is not true for Fast R-CNN. Because the two models make different kinds of errors, they can be usefully ensembled.
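To make the error categories concrete, here is a minimal Python sketch of how a single detection might be sorted into an error type from its overlap with a ground-truth box. The thresholds (IOU above 0.5 for a correct detection, between 0.1 and 0.5 for a localization error, below 0.1 for a background error) follow the breakdown used in the YOLO paper's error analysis. The function names, the `(x1, y1, x2, y2)` box format, and the folding of the paper's separate "similar class" category into "other" are simplifying assumptions made for this illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def error_type(pred_box, pred_class, gt_box, gt_class):
    """Label one detection against one ground-truth box (hypothetical helper)."""
    overlap = iou(pred_box, gt_box)
    if overlap < 0.1:
        return "background"      # barely touches the object at all
    if pred_class == gt_class:
        # Right class: either a good box or a poorly placed one.
        return "correct" if overlap > 0.5 else "localization"
    return "other"               # wrong class with meaningful overlap


gt = (0, 0, 10, 10)
print(error_type((5, 0, 15, 10), "dog", gt, "dog"))    # shifted box, right class
print(error_type((20, 20, 30, 30), "dog", gt, "dog"))  # no overlap
```

In the first call the predicted box overlaps the ground truth with an IOU of 1/3, so the class is right but the box is poorly placed. This is the kind of error that dominates for YOLO.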
Ensemble Models Because YOLO sees the whole image at once, it makes better use of context and less often mistakes background for an object. Fast R-CNN is almost 3x more likely than YOLO to produce such background false positives, so combining the two helps: the paper reports mAP rising from 71.8% for Fast R-CNN alone to 75.0% for the combination.
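The summary above does not spell out how the two detectors are combined. One plausible scheme, sketched below in Python, is to keep Fast R-CNN's boxes and raise the score of any box that YOLO also predicts with the same class and a high overlap. Everything specific here is an assumption for illustration: the `rescore` function, the `(box, class, score)` tuple format, the 0.5 overlap threshold, and the boost computed as YOLO's score times the IOU. None of it should be read as the paper's exact rule.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rescore(frcnn_dets, yolo_dets, iou_threshold=0.5):
    """Boost Fast R-CNN detections that YOLO agrees with (assumed scheme).

    Each detection is a (box, class_name, score) tuple.
    """
    rescored = []
    for box, cls, score in frcnn_dets:
        boost = 0.0
        for y_box, y_cls, y_score in yolo_dets:
            if y_cls != cls:
                continue
            overlap = iou(box, y_box)
            if overlap >= iou_threshold:
                # Assumed boost: YOLO's confidence weighted by the overlap.
                boost = max(boost, y_score * overlap)
        rescored.append((box, cls, score + boost))
    return rescored


frcnn = [((0, 0, 10, 10), "dog", 0.6),     # real object, YOLO agrees
         ((20, 20, 30, 30), "cat", 0.6)]   # background patch, YOLO is silent
yolo = [((0, 0, 10, 10), "dog", 0.5)]
for det in rescore(frcnn, yolo):
    print(det)
```

The boosted scores can exceed 1, which is harmless for this purpose because average precision depends only on how detections are ranked. Detections that YOLO confirms move up the ranking, while background false positives that only Fast R-CNN fires on stay where they were.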
YOLO is fast because it replaces the multi-stage detection pipeline with a single streamlined neural network, and training that network is made possible by a loss function tailored to the task. Though the loss function is not perfect, it yields a quick and effective model.