Complexity-Driven Model Compression for Resource-constrained Deep Learning on Edge

Muhammad Zawish, Steven Davy, Lizy Abraham

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Recent advances in artificial intelligence (AI) on Internet of Things (IoT) devices have enabled edge AI in several applications by delivering low latency and energy efficiency. However, deploying state-of-the-art convolutional neural networks (CNNs) such as VGG-16 and the ResNets on resource-constrained edge devices is practically infeasible due to their large parameter counts and floating-point operations (FLOPs). The concept of network pruning, a form of model compression, is therefore gaining attention for accelerating CNNs on low-power devices. State-of-the-art pruning approaches, whether structured or unstructured, follow a three-stage training-pruning-retraining pipeline, which incurs an unavoidable retraining overhead. In this work, we take an orthogonal approach: structured pruning at initialization, which finds sparse subnetworks with ≈60% less training time than conventional pruning. Moreover, conventional pruning identifies saliency at filter granularity while ignoring the importance of characteristics at layer granularity. In contrast, the proposed complexity-driven approach leverages the intrinsic complexities of CNN layers to guide the filter-pruning process without requiring dense pretraining of models. In particular, we characterize the importance of CNN layers with respect to parameter-, FLOPs-, and memory-based complexities, which work in tandem with filter pruning in a structured manner. Experiments show the competitive performance of our approach in terms of accuracy and acceleration across all three pruning modes, namely parameter-aware (PA), FLOPs-aware (FA), and memory-aware (MA). For example, removing ≈70% of parameters, ≈50% of FLOPs, and ≈50% of memory from MobileNetV2 caused no accuracy loss, unlike state-of-the-art approaches. Lastly, we present a tradeoff between different resources and accuracy, which can help developers make the right decisions in resource-constrained IoT environments.
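The complexity-driven idea described above can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's exact algorithm: it computes each convolutional layer's share of a chosen complexity metric (parameters, FLOPs, or activation memory) and scales a global pruning budget accordingly, so that layers dominating the chosen metric are pruned harder. All function names and the toy network shape are hypothetical.

```python
# Illustrative sketch (not the paper's algorithm): allocate per-layer
# filter-pruning ratios in proportion to each layer's share of a chosen
# complexity metric, mirroring the PA / FA / MA pruning modes.

def conv_complexity(c_in, c_out, k, h_out, w_out):
    """Return complexity metrics for one conv layer (bias ignored)."""
    params = c_in * c_out * k * k               # weight count
    flops = 2 * params * h_out * w_out          # 2x multiply-accumulates
    memory = c_out * h_out * w_out              # output activation elements
    return {"params": params, "flops": flops, "memory": memory}

def pruning_ratios(layers, mode="flops", global_ratio=0.5, cap=0.9):
    """Scale a global pruning budget by each layer's complexity share."""
    scores = [conv_complexity(**layer)[mode] for layer in layers]
    total = sum(scores)
    # Layers contributing more to the chosen complexity are pruned harder;
    # cap each ratio so no layer is pruned away entirely.
    return [min(cap, global_ratio * len(scores) * s / total) for s in scores]

# Toy 3-layer CNN: in/out channels, kernel size, output spatial size.
layers = [
    dict(c_in=3,  c_out=32,  k=3, h_out=32, w_out=32),
    dict(c_in=32, c_out=64,  k=3, h_out=16, w_out=16),
    dict(c_in=64, c_out=128, k=3, h_out=8,  w_out=8),
]

for mode in ("params", "flops", "memory"):
    print(mode, [round(r, 2) for r in pruning_ratios(layers, mode)])
```

Changing `mode` switches the complexity metric, which is the sketch's analogue of selecting between the parameter-aware, FLOPs-aware, and memory-aware pruning modes.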

Original language: English
Pages (from-to): 3886-3901
Number of pages: 16
Journal: IEEE Transactions on Artificial Intelligence
Volume: 5
Issue number: 8
Publication status: Published - 2024

Keywords

  • Convolutional neural networks (CNNs)
  • edge AI
  • edge computing
  • network compression

