Abstract: Mixed-precision quantization is a popular approach for compressing deep neural networks (DNNs). However, it is challenging to scale the performance efficiently with mixed-precision DNNs ...
Abstract: Sparse Matrix-Matrix Multiplication(SpMM) is a commonly utilized operation in various domains, particularly in the increasingly popular Graph Neural Networks(GNN) framework. The current ...