This area covers techniques for improving the efficiency of deep learning methods, such as pruning, quantization, and the design of compact model architectures. Research here aims to reduce the computational cost and memory footprint of deep neural networks while preserving their effectiveness. Such advances are essential for deploying large neural models on hardware-constrained platforms.
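As a minimal illustration of two of these techniques, the sketch below applies magnitude-based pruning and symmetric uniform quantization to a weight matrix. This is a toy NumPy example under simplifying assumptions (unstructured pruning, per-tensor scale), not the API of any particular compression library:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    # Zero out the given fraction of weights with the smallest magnitudes.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def uniform_quantize(weights, num_bits=8):
    # Symmetric uniform quantization: map floats to signed integers,
    # then dequantize back to floats to show the approximation error.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    q = np.round(weights / scale).astype(np.int32)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))

pruned = magnitude_prune(w, sparsity=0.5)      # half the entries become zero
quantized = uniform_quantize(w, num_bits=8)    # 8-bit approximation of w
```

Pruning yields a sparse tensor that skips computation on zeroed weights, while quantization trades a small, bounded rounding error (at most half the scale per weight) for lower-precision storage and arithmetic.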