MODEL COMPRESSION VIA STRUCTURAL PRUNING AND FEATURE DISTILLATION FOR ACCURATE MULTI-SPECTRAL OBJECT DETECTION ON EDGE-DEVICES

Abstract

Multi-spectral infrared object detection, which must work across different infrared wavelengths, is a challenging task. Although some full-sized object detection models, such as YOLOv4 and ScaledYOLO, achieve good infrared detection accuracy, they are resource-demanding and unsuitable for real-time detection on edge devices. Tiny versions of these detectors have been proposed to meet practical requirements, but they usually sacrifice accuracy and generalization for efficiency. We propose an accurate and efficient object detector capable of real-time inference under the hardware constraints of an edge device by leveraging structural pruning, feature distillation, and neural architecture search (NAS). Experiments on FLIR and multi-spectral object detection datasets show that our model achieves mAP comparable to full-sized models while having 14x fewer parameters and 3.5x fewer FLOPs. Our model also detects objects well across different infrared wavelengths. In addition, the CSPNet configurations selected by NAS for our detection network outperform the baseline.

Contributions

We propose a novel DoubleCSP module that improves both accuracy and inference speed over the baseline CSP module. Furthermore, we explore different values of the γ parameter to find the best accuracy/speed trade-off for the task.
Our object detection algorithm is compact and fast at inference, which makes it suitable for real-time object detection on edge devices (see the following figure). Moreover, the compressed model still generalizes well enough to achieve high detection accuracy on multi-spectral infrared data.
We use neural architecture search (NAS) to determine the model backbone and detection neck and to build an architecture with a better accuracy/speed trade-off. Our results demonstrate that the compressed network outperforms the basic human-designed network configuration in both performance and efficiency.

Our model achieves a good accuracy/speed trade-off and falls between the big and compact algorithms.

Proposed Method

Detailed illustration of the proposed overall workflow. The CSP channels are split in a 0.25 / 0.75 ratio. The full-size model undergoes filter and layer pruning to create a student network. Finally, the knowledge from the full-size network is distilled into the target (student) network via feature distillation.
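The pruning and distillation steps of this workflow can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the source names neither the filter-ranking criterion nor the distillation loss, so ranking filters by L1 norm and using a mean squared error between features are both assumptions, and the function names `select_filters` and `feature_distillation_loss` are hypothetical.

```python
from typing import List, Sequence

Filter = Sequence[float]  # one convolution filter, flattened to a list of weights


def l1_norm(weights: Filter) -> float:
    """Sum of absolute weights, used here as an (assumed) importance score."""
    return sum(abs(w) for w in weights)


def select_filters(filters: Sequence[Filter], keep_ratio: float) -> List[int]:
    """Return the indices of the filters to keep, in their original order.

    Assumption: filters with the largest L1 norm are the most important.
    """
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]),
                    reverse=True)
    return sorted(ranked[:n_keep])


def feature_distillation_loss(student: Sequence[float],
                              teacher: Sequence[float]) -> float:
    """Mean squared error between student and teacher feature vectors.

    Assumption: both feature maps are flattened and have the same size.
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher features must have equal length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


# Toy data: four filters with L1 norms 0.3, 1.5, 0.05 and 1.3.
toy_filters = [[0.1, -0.2], [1.0, 0.5], [0.0, 0.05], [-0.7, 0.6]]
kept = select_filters(toy_filters, keep_ratio=0.5)
loss = feature_distillation_loss([1.0, 2.0], [1.0, 4.0])
print(kept, loss)
```

In a real detector the pruned student would typically be fine-tuned while a loss of this kind is added to the detection loss; how the two are weighted is not stated in the source.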

An illustration of our DoubleCSP module. (a) shows the DoubleCSP module structure. (b) is the original CSP architecture. Inside the DoubleCSP, features are further split into γ/2 channels and then added to the main branch in the middle of the module.
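To make the split ratios concrete, the following sketch computes channel counts only. It assumes that γ is the fraction of input channels routed to one branch, as in the 0.25 / 0.75 split mentioned above. The source does not say which branch receives which share or how the γ/2 channels are wired inside DoubleCSP, so the helper `csp_split` is a hypothetical bookkeeping function and does not reproduce the module itself.

```python
from typing import Tuple


def csp_split(channels: int, gamma: float = 0.25) -> Tuple[int, int]:
    """Split `channels` into a gamma share and the remaining (1 - gamma) share."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    first = int(channels * gamma)
    return first, channels - first


# Standard CSP split at gamma = 0.25 for a 256-channel feature map.
csp_parts = csp_split(256, gamma=0.25)

# The gamma / 2 share mentioned for DoubleCSP, under the same reading.
half_gamma_parts = csp_split(256, gamma=0.25 / 2)

print(csp_parts, half_gamma_parts)
```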

Experimental Results

Model	AP	FPS	FLOPs	Parameters
YOLOv3	53.4	2.5	155.1G	61.5M
YOLOv4	54.4	2.5	141.5G	64M
ScaledYOLO	56.2	5	119G	52.5M
Ours	53.5	15.4	33.8G	3.6M

The table above demonstrates that our method achieves performance comparable to the full-sized architectures while running at least three times faster than every full-sized model listed (15.4 FPS versus at most 5 FPS). It also retains the capacity to generalize when detecting objects in images from different infrared spectra (see the following figure).
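The compression and speed-up factors can be recomputed directly from the table. The short sketch below uses ScaledYOLO as the reference model, which is an assumption made here because it is the strongest full-sized baseline in the table. The variable names are illustrative.

```python
# Values copied from the table above (ScaledYOLO vs. Ours).
scaled_yolo = {"ap": 56.2, "fps": 5.0, "gflops": 119.0, "params_m": 52.5}
ours = {"ap": 53.5, "fps": 15.4, "gflops": 33.8, "params_m": 3.6}


def ratio(reference: float, value: float) -> float:
    """Return reference / value, rounded to one decimal place."""
    return round(reference / value, 1)


param_reduction = ratio(scaled_yolo["params_m"], ours["params_m"])
flop_reduction = ratio(scaled_yolo["gflops"], ours["gflops"])
speedup = ratio(ours["fps"], scaled_yolo["fps"])
ap_gap = round(scaled_yolo["ap"] - ours["ap"], 1)

print(f"{param_reduction}x fewer parameters")
print(f"{flop_reduction}x fewer FLOPs")
print(f"{speedup}x faster, at a cost of {ap_gap} AP")
```

These ratios are consistent with the roughly 14x parameter and 3.5x FLOPs reductions quoted in the abstract.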

Model	AP50	FPS
ScaledYOLO	80.1	70
Ours	80.5	76

The results in the table above compare the proposed DoubleCSP module with the classical CSP module. The model built from DoubleCSP blocks outperforms the CSP-based model in both accuracy (80.5 vs. 80.1 AP50) and speed (76 vs. 70 FPS).

Model	AP50	Parameters	FPS
ScaledYOLO	80.1	52.4M	217
ScaledYOLO-NAS-Uniform	79.7	44.9M	238
ScaledYOLO-NAS-Synflow	80.6	40M	243

To demonstrate the benefit of using NAS to explore CSP configurations in different stages, we compare the human-designed architecture with our automatically discovered architectures, as shown in the table above.

Ablation Study

CSP γ = 0.25	Filters	Layers	KD	AP	FPS
✘	✘	✘	✘	56.2	5
✔	✘	✘	✘	55.5	7
✔	✔	✘	✘	54	9.5
✔	✔	✔	✘	52.4	15.4
✔	✔	✔	✔	53.5	15.4

We conduct an ablation study to show the effect of every step in the proposed design procedure.
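The contribution of each step can be read off the ablation table by differencing consecutive rows, as in the sketch below. The step labels are our reading of the column headers ("Filters" as filter pruning, "Layers" as layer pruning, "KD" as knowledge distillation) and are assumptions.

```python
from typing import List, Tuple

Row = Tuple[str, float, float]  # (step label, AP, FPS)

# Rows copied from the ablation table, in the order the steps are enabled.
ablation: List[Row] = [
    ("baseline", 56.2, 5.0),
    ("+ CSP gamma = 0.25", 55.5, 7.0),
    ("+ filter pruning", 54.0, 9.5),
    ("+ layer pruning", 52.4, 15.4),
    ("+ knowledge distillation", 53.5, 15.4),
]


def step_deltas(rows: List[Row]) -> List[Row]:
    """Return (label, AP change, FPS change) for each step vs. the previous row."""
    deltas = []
    for (_, ap_prev, fps_prev), (label, ap, fps) in zip(rows, rows[1:]):
        deltas.append((label, round(ap - ap_prev, 1), round(fps - fps_prev, 1)))
    return deltas


deltas = step_deltas(ablation)
for label, d_ap, d_fps in deltas:
    print(f"{label:26s} AP {d_ap:+.1f}  FPS {d_fps:+.1f}")
```

Read this way, the three compression steps trade 3.8 AP for 10.4 FPS in total, and distillation recovers 1.1 AP without changing the inference speed.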