TY - JOUR
T1 - Complexity-Driven Model Compression for Resource-Constrained Deep Learning on Edge
AU - Zawish, Muhammad
AU - Davy, Steven
AU - Abraham, Lizy
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Recent advances in artificial intelligence (AI) on Internet of Things (IoT) devices have realized edge AI in several applications by enabling low latency and energy efficiency. However, deploying state-of-the-art convolutional neural networks (CNNs) such as VGG-16 and ResNets on resource-constrained edge devices is practically infeasible due to their extensive parameter counts and floating-point operations (FLOPs). Thus, network pruning, a form of model compression, is gaining attention for accelerating CNNs on low-power devices. State-of-the-art pruning approaches, whether structured or unstructured, follow a three-stage training-pruning-retraining pipeline, which incurs an unavoidable retraining overhead. In this work, we posit an orthogonal conjecture on structured pruning at initialization to find sparse subnetworks, realizing ≈60% less training time than conventional pruning. Moreover, conventional pruning identifies saliency at filter granularity while ignoring layer-level characteristics. In contrast, the proposed complexity-driven approach leverages the intrinsic complexities of CNN layers to guide filter pruning without requiring dense pretraining of models. In particular, we characterize the importance of CNN layers with respect to parameter-, FLOPs-, and memory-based complexities, which work in tandem with filter pruning in a structured manner. Experiments show the competitive performance of our approach in terms of accuracy and acceleration for all three pruning modes, namely parameter-aware (PA), FLOPs-aware (FA), and memory-aware (MA). For example, removing ≈70% of parameters, ≈50% of FLOPs, and ≈50% of memory from MobileNetV2 resulted in no accuracy loss, unlike state-of-the-art approaches. Lastly, we present the tradeoff between different resources and accuracy, which can help developers make the right decisions in resource-constrained IoT environments.
KW - Convolutional neural networks (CNNs)
KW - edge AI
KW - edge computing
KW - network compression
UR - http://www.scopus.com/inward/record.url?scp=85182939806&partnerID=8YFLogxK
U2 - 10.1109/TAI.2024.3353157
DO - 10.1109/TAI.2024.3353157
M3 - Article
AN - SCOPUS:85182939806
SN - 2691-4581
VL - 5
SP - 3886
EP - 3901
JO - IEEE Transactions on Artificial Intelligence
JF - IEEE Transactions on Artificial Intelligence
IS - 8
ER -