The ResNeXt architecture draws heavily on the VGG/ResNet style of stacking repeated blocks to build the network. Hyper-parameter selection is simplified by keeping the same parameters within each block, or template; since the template provides a controlled environment, research can focus on other variables. Two simple rules are carried over from the VGG/ResNet style: (i) blocks producing feature maps of the same size share the same width and filter sizes, and (ii) computational complexity is kept roughly constant for each block. Because downsampling would otherwise reduce a block's complexity, the width is typically scaled up to maintain consistent FLOPs per block.
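The width-scaling rule can be checked with a quick FLOP count. The sketch below (plain Python, with hypothetical layer shapes chosen for illustration) shows that halving the spatial resolution while doubling the channel width leaves the per-layer multiply-add count unchanged:

```python
def conv_flops(h, w, c_in, c_out, k=3):
    """Approximate multiply-adds for a k x k convolution on an h x w map."""
    return h * w * c_in * c_out * k * k

# Hypothetical stage: 56x56 feature map, 64 channels in and out.
before = conv_flops(56, 56, 64, 64)

# After 2x downsampling: spatial size halves, width doubles.
after = conv_flops(28, 28, 128, 128)

print(before == after)  # True: the two stages cost the same
```

The spatial area shrinks by 4x while the channel product grows by 4x, so the two factors cancel exactly.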
Figure 2 shows the basics of a simple neuron, and it provides some insight into the power of the split-transform-merge strategy. The paper proposes viewing the ResNeXt network as a "Network-in-Neuron," where the template block follows the split-transform-merge pattern.
In the simple neuron figure, you can see how the input x is split into D separate low-dimensional (here, scalar) components. Each component is transformed by scaling with its weight w_i, and the transformed values are then merged by summation at the bottom.
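The neuron's split-transform-merge decomposition can be verified numerically: summing the D scaled components is exactly the inner product the neuron computes. A minimal numpy check, with arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 5
x = rng.standard_normal(D)   # input split into D scalar components x_i
w = rng.standard_normal(D)   # each component transformed by its weight w_i

# Merge: aggregate the D transformed values by summation.
aggregated = sum(w[i] * x[i] for i in range(D))

# This matches the neuron's inner product w . x.
print(np.isclose(aggregated, x @ w))  # True
```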
The paper expands the simple neuron's split-transform-merge function into the broader context of the ResNeXt block. Cardinality is the size of the set of transforms to be aggregated: like D in the figure above, it determines the number of transforms used in each aggregating block. This is the paper's explanation of the "next" dimension, and their experiments show that cardinality can be a more essential dimension than width or depth.
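The aggregated transform generalizes the neuron's weighted sum to F(x) = sum over i = 1..C of T_i(x), where C is the cardinality and each T_i is a small bottleneck transform rather than a scalar weight. The sketch below is a simplified numpy illustration of this idea, assuming hypothetical widths and plain linear branches in place of the paper's bottleneck convolutions:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_bottleneck, C = 16, 4, 8   # hypothetical widths; C is the cardinality

# Each of the C branches is a low-dimensional transform T_i:
# reduce to d_bottleneck, then expand back to d_in.
branches = [
    (rng.standard_normal((d_in, d_bottleneck)),
     rng.standard_normal((d_bottleneck, d_in)))
    for _ in range(C)
]

def aggregated_transform(x):
    """F(x) = sum_i T_i(x): split, transform per branch, merge by summation."""
    return sum(x @ reduce @ expand for reduce, expand in branches)

def resnext_block(x):
    """Residual form of the block: y = x + F(x)."""
    return x + aggregated_transform(x)

x = rng.standard_normal(d_in)
y = resnext_block(x)
print(y.shape)  # (16,)
```

Raising C adds more parallel low-dimensional paths while each path stays cheap, which is how cardinality can grow capacity without simply widening or deepening the network.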