Quantum-Inspired Sparse Attention Transformers for Accelerated Large Language Model Training

Authors

  • Karthik Mani, CB Richard Ellis, USA
  • Debasish Paul, JPMorgan Chase, USA
  • Vinopriya Vijayaboopathy, CVS Health, USA

Keywords:

Quantum-inspired attention, sparse transformers, large language models, scalable training, quantum superposition

Abstract

Quantum-Inspired Sparse Attention Transformers (QSAT) introduce an innovative model that optimizes the computational efficiency of large language models (LLMs) by integrating principles derived from quantum mechanics into attention mechanisms. By utilizing quantum superposition and entanglement-inspired methodologies, QSAT enhances sparsity within self-attention calculations, enabling the selective prioritization of semantically significant token interactions while minimizing redundant computations.
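The abstract does not specify the exact sparsification rule, so the following is a minimal illustrative sketch only, assuming the quantum-inspired step reduces to an amplitude-style reweighting of attention scores followed by top-k pruning of token interactions; the function name and all parameters are hypothetical and not taken from the paper.

# Minimal sketch of a quantum-inspired sparse attention step (NumPy).
# Assumption: softmax weights are treated like squared amplitudes and only the
# top-k interactions per query are kept; this is not the authors' exact method.
import numpy as np

def quantum_inspired_sparse_attention(Q, K, V, k=4):
    """Q, K, V: (seq_len, d) arrays; k: key positions kept per query."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # raw attention scores

    # "Superposition"-style weighting (assumption): square the softmax weights
    # to emphasize the strongest token interactions, then renormalize.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    amplitudes = weights ** 2
    amplitudes /= amplitudes.sum(axis=-1, keepdims=True)

    # Sparsify: keep only the k largest interactions per query, zero the rest.
    idx = np.argpartition(amplitudes, -k, axis=-1)[:, -k:]
    mask = np.zeros_like(amplitudes)
    np.put_along_axis(mask, idx, 1.0, axis=-1)
    sparse = amplitudes * mask
    sparse /= sparse.sum(axis=-1, keepdims=True)       # renormalize kept tokens

    return sparse @ V                                   # attended values

# Toy usage: 8 tokens, 16-dimensional embeddings.
rng = np.random.default_rng(0)
Q = rng.standard_normal((8, 16))
K = rng.standard_normal((8, 16))
V = rng.standard_normal((8, 16))
print(quantum_inspired_sparse_attention(Q, K, V, k=4).shape)  # (8, 16)

Under these assumptions, the cost of the value aggregation scales with the k retained interactions per query rather than with the full sequence length, which is the kind of saving the abstract attributes to sparsity.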




Published

08-02-2022

How to Cite

[1]
Karthik Mani, Debasish Paul, and Vinopriya Vijayaboopathy, “Quantum-Inspired Sparse Attention Transformers for Accelerated Large Language Model Training”, American J Auton Syst Robot Eng, vol. 2, pp. 313–351, Feb. 2022, Accessed: Dec. 12, 2025. [Online]. Available: https://ajasre.org/index.php/publication/article/view/48