Abstract: This paper addresses the challenge of real-time text streaming with large language models (LLMs) under unstable network conditions and server contention. To mitigate delays from packet loss and ...