The paper starts off by showing positive trends in computer vision research, particularly how CNNs are driving big gains in accuracy. For instance, the proposed architecture is more accurate than the winning architecture from two years earlier while using 12x fewer parameters.
Inception is becoming a fundamental architecture: many later ideas draw from it and reuse parts or all of it. I think it is interesting to consider the source of the paper and how Google, as a company, brings a slightly different focus compared to others, namely a focus on compute efficiency.
They note the importance of reducing computational cost because more and more of our world will depend on mobile computing, where the raw power available is naturally many times lower than in a data center.
This idea is raised in the introduction and will probably play a larger role in the body. Network in Network was an earlier, separate paper; the name Inception derives from it together with the internet meme about going deeper, which fits the trend in ML toward deeper models. They also propose the “Inception Module”, which is a way of organizing the network.
Figure 2 shows a glimpse of what is to come.
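To make the Inception Module and the parameter-efficiency point a bit more concrete, here is a rough sketch of how I understand the idea: several convolutions of different sizes run in parallel on the same input and their outputs are concatenated along the channel axis. It only counts weights; it does not run a network. The function names (`conv_params`, `naive_block`, `reduced_block`) and all channel counts are my own illustrative assumptions, and the 1x1 "reduction" variant is my reading of the paper rather than something spelled out in these notes.

```python
# Sketch only: count the weights in a parallel-branch, Inception-style block.
# Function names and channel counts are illustrative assumptions.


def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a k x k convolution layer."""
    return in_ch * out_ch * k * k + out_ch


def naive_block(in_ch, c1, c3, c5):
    """Parallel 1x1, 3x3 and 5x5 convolutions plus a pooling branch.

    Returns (parameter count, output channels after concatenation).
    """
    params = (
        conv_params(in_ch, c1, 1)
        + conv_params(in_ch, c3, 3)
        + conv_params(in_ch, c5, 5)
    )
    # Pooling has no weights and passes all input channels through.
    out_ch = c1 + c3 + c5 + in_ch
    return params, out_ch


def reduced_block(in_ch, c1, r3, c3, r5, c5, pool_proj):
    """Same layout, but with 1x1 convolutions that shrink the channel count
    before the expensive 3x3 and 5x5 branches and after the pooling branch.
    """
    params = (
        conv_params(in_ch, c1, 1)
        + conv_params(in_ch, r3, 1) + conv_params(r3, c3, 3)
        + conv_params(in_ch, r5, 1) + conv_params(r5, c5, 5)
        + conv_params(in_ch, pool_proj, 1)
    )
    out_ch = c1 + c3 + c5 + pool_proj
    return params, out_ch


if __name__ == "__main__":
    naive = naive_block(192, 64, 128, 32)
    reduced = reduced_block(192, 64, 96, 128, 16, 32, 32)
    print("naive   (params, out_channels):", naive)
    print("reduced (params, out_channels):", reduced)
```

With these made-up sizes the reduced variant needs well under half the weights of the naive one and also emits fewer channels to the next layer, which is the kind of saving the compute-efficiency argument is about.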