Abstract: With the emergence of large language models (LLMs) and generative AI, which require an enormous number of model parameters, the required memory bandwidth and capacity for high-end systems is ...