Platform-aware Progressive Search for Pareto-optimal Neural Architectures

Jin Dong (Mark) Dong*
An-Chieh Cheng*
Da Cheng Juan+
Wei Wei+
Min Sun*
*National Tsing-Hua University, Hsinchu, Taiwan
+Google, Mountain View, CA, USA
ICLR 2018 Workshop


Recent breakthroughs in Neural Architecture Search (NAS) have achieved state-of-the-art performance in many applications such as image recognition. However, these techniques typically ignore platform-related constraints (e.g., inference time and power consumption) that can be critical for portable devices with limited computing resources. We propose PPP-Net: a multi-objective architecture search framework that automatically generates networks achieving Pareto optimality. PPP-Net employs a compact search space inspired by operations used in state-of-the-art mobile CNNs. PPP-Net also adopts the progressive search strategy used in recent work (Liu et al.). Experimental results demonstrate that PPP-Net achieves both (a) higher accuracy and (b) shorter inference time than the state-of-the-art CondenseNet.

Search Space

Each block consists of multiple layers of two types: normalization (Norm) and convolutional (Conv) layers. We progressively add layers following the Norm-Conv-Norm-Conv order (Fig.1(a)-Right). The operations available for Norm (yellow boxes) and Conv (green boxes) layers are shown in Fig.1(a)-Left. The blocks of other efficient CNNs are shown in Fig.1(b). Our search space covers hand-crafted efficient operations to take advantage of prior human knowledge on designing efficient CNNs. This not only ensures the quality of our searched architectures but also reduces the search time for PPP-Net.
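The alternating Norm-Conv layout can be illustrated with a minimal Python sketch. The operation names below are hypothetical placeholders (the actual choices are the ones shown in Fig.1(a)-Left), and the helper names are assumptions made for illustration only, not part of PPP-Net's implementation.

```python
from itertools import product

# Hypothetical placeholder operations; the real ones are listed in Fig.1(a)-Left.
NORM_OPS = ("norm_a", "norm_b")
CONV_OPS = ("conv_a", "conv_b", "conv_c")


def candidate_ops(index):
    """Layers alternate Norm, Conv, Norm, Conv, ... starting at index 0."""
    return NORM_OPS if index % 2 == 0 else CONV_OPS


def extend_block(block):
    """Return every block obtained by appending one more layer to `block`.

    A block is a tuple of operation names, one per layer.
    """
    return [block + (op,) for op in candidate_ops(len(block))]


def enumerate_blocks(num_layers):
    """Enumerate all blocks with exactly `num_layers` layers."""
    per_layer = [candidate_ops(i) for i in range(num_layers)]
    return [tuple(ops) for ops in product(*per_layer)]
```

With these placeholder sets, a 4-layer block has 2 * 3 * 2 * 3 = 36 possible configurations, which shows how quickly the space grows with depth and why adding layers progressively keeps each search step small.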

Search Algorithm

  1. Mutate. For each l-layer block, we enumerate all possible (l+1)-layer blocks.
  2. Regress accuracy. We use a Recurrent Neural Network (RNN) to regress a network's accuracy from its architecture. This avoids the time-consuming training needed to obtain the true accuracy of a network, at the cost of some regression error.
  3. Select networks. Our main contribution is to use Pareto optimality over multiple objectives to select K networks (Fig. 2(b)), rather than simply selecting the K most accurate ones as in Liu et al. Note that the other objectives, such as the number of parameters, FLOPs, and actual inference time, can be computed very efficiently.
  4. Update regressor. We train each of the K selected networks for N epochs. Then we use the architectures (inputs) and their evaluation accuracies (outputs) to update the RNN regressor.
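The selection step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every objective is expressed so that smaller is better (for example, predicted error = 1 - predicted accuracy, and inference time), and the rule for reaching exactly K networks (peeling successive Pareto fronts and breaking ties by predicted error) is an assumption, since the text does not specify it. All function names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (all objectives minimized).

    `a` dominates `b` when it is no worse in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    `candidates` is a list of (name, objectives) pairs.
    """
    return [
        (name, obj)
        for name, obj in candidates
        if not any(dominates(other, obj) for _, other in candidates)
    ]


def select_k(candidates, k):
    """Select up to `k` networks by peeling successive Pareto fronts.

    Assumption: within a front, networks are ordered by the first
    objective (predicted error) when the front must be truncated.
    """
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        front = sorted(pareto_front(remaining), key=lambda c: c[1][0])
        selected.extend(front[: k - len(selected)])
        front_names = {name for name, _ in front}
        remaining = [c for c in remaining if c[0] not in front_names]
    return selected


# Illustrative values only: (predicted error, inference time in ms).
candidates = [
    ("A", (0.05, 30.0)),
    ("B", (0.06, 20.0)),
    ("C", (0.07, 25.0)),
    ("D", (0.04, 40.0)),
    ("E", (0.05, 35.0)),
]
front_names = [name for name, _ in pareto_front(candidates)]
```

In this made-up example, network C is dominated by B (B is both more accurate and faster) and E is dominated by A, so the Pareto front is A, B and D. A top-K-by-accuracy rule would instead prefer D, A and E, ignoring that E is slower than A at the same predicted error.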

Experimental Results