Complexity-Driven Model Compression for Resource-constrained Deep Learning on Edge

Muhammad Zawish, Steven Davy, Lizy Abraham

Research output: Contribution to journal › Article › peer-review

Abstract

Recent advances in Artificial Intelligence (AI) on Internet of Things (IoT) devices have enabled Edge AI in several applications by delivering low latency and energy efficiency. However, deploying state-of-the-art Convolutional Neural Networks (CNNs) such as VGG-16 and ResNets on resource-constrained edge devices is practically infeasible due to their extensive parameter counts and floating-point operations (FLOPs). Network pruning, a form of model compression, is therefore gaining attention for accelerating CNNs on low-power devices. State-of-the-art pruning approaches, whether structured or unstructured, follow a three-stage training-pruning-retraining pipeline, which incurs an unavoidable retraining overhead. In this work, we posit an orthogonal conjecture: structured pruning at initialization can find sparse subnetworks with ≈ 60% less training time than conventional pruning. Moreover, conventional pruning identifies saliency at filter granularity while ignoring characteristics at layer granularity. In contrast, the proposed complexity-driven approach leverages the intrinsic complexities of CNN layers to guide the filter-pruning process without requiring dense pre-training of models. In particular, we characterize the importance of CNN layers with respect to parameter-, FLOPs-, and memory-based complexities, which work in tandem with filter pruning in a structured manner. Experiments show the competitive performance of our approach in terms of accuracy and acceleration for all three pruning modes, namely parameter-aware (PA), FLOPs-aware (FA), and memory-aware (MA). For example, removing ≈ 70% of parameters, ≈ 50% of FLOPs, and ≈ 50% of memory from MobileNetV2 caused no accuracy loss, unlike state-of-the-art approaches. Lastly, we present a trade-off between different resources and accuracy, which can help developers make the right decisions in resource-constrained IoT environments.
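The layer-level idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact formulation; the complexity formulas and the allocation rule (`layer_pruning_ratios`, which prunes the most complex layer at `max_ratio` and others proportionally less) are assumptions for illustration only.

```python
# Hypothetical sketch of complexity-driven pruning at initialization:
# score each CNN layer by its parameter, FLOP, and memory complexity,
# then assign higher filter-pruning ratios to more complex layers.
# No dense pre-training is needed, since the scores depend only on
# layer shapes, not on trained weights.

def conv_complexity(c_in, c_out, k, h_out, w_out):
    """Complexity of a k x k convolution producing an h_out x w_out
    output feature map (biases ignored for simplicity)."""
    params = c_out * c_in * k * k          # weight count
    flops = 2 * params * h_out * w_out     # multiply-adds per forward pass
    memory = c_out * h_out * w_out         # output activation footprint
    return {"params": params, "flops": flops, "memory": memory}

def layer_pruning_ratios(layers, mode="flops", max_ratio=0.5):
    """Assumed allocation rule: the most complex layer (in the chosen
    mode: 'params' for PA, 'flops' for FA, 'memory' for MA) is pruned
    at max_ratio; the rest are pruned proportionally less."""
    costs = [conv_complexity(**layer)[mode] for layer in layers]
    peak = max(costs)
    return [max_ratio * c / peak for c in costs]

# Example: three layers of a small VGG-like stack.
layers = [
    dict(c_in=3, c_out=64, k=3, h_out=32, w_out=32),
    dict(c_in=64, c_out=128, k=3, h_out=16, w_out=16),
    dict(c_in=128, c_out=256, k=3, h_out=8, w_out=8),
]
ratios = layer_pruning_ratios(layers, mode="flops", max_ratio=0.5)
```

Switching `mode` between `"params"`, `"flops"`, and `"memory"` mirrors the paper's PA, FA, and MA pruning modes: the same shape-derived statistics drive the per-layer ratios, only the complexity measure changes.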

Original language: English
Pages (from-to): 1-15
Number of pages: 15
Journal: IEEE Transactions on Artificial Intelligence
Publication status: Accepted/In press - 2024

Keywords

  • Artificial intelligence
  • Complexity theory
  • Computational modeling
  • Convolutional neural networks
  • Edge AI
  • Edge computing
  • Network compression
  • Pipelines
  • Solid modeling
  • Training
