International Journal on Science and Technology

E-ISSN: 2229-7677; Impact Factor: 9.88

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal


Memory-Efficient LLM Training and Inference: Balancing Capacity, Speed, and Environmental Impact

Author(s): Smitha Shivashankaraiah
Country: United States
Abstract: The memory requirements of large language models (LLMs) are a critical bottleneck for both training and inference. Numerous techniques exist to reduce memory usage, but most surveys organize them by mechanism or by the specific bottleneck they address. This paper takes a different approach. We argue that real-world LLM deployment must balance three fundamental constraints: memory capacity, memory speed, and environmental impact. Capacity-focused techniques such as CXL memory pooling prioritize fitting large models and long contexts. Speed-focused techniques such as near-memory compute prioritize low latency for real-time and agentic workloads. Environmental factors (carbon emissions, water use, noise, vibration, and e-waste) impose social and regulatory constraints that can override technical advantages. We present a comparative analysis of these three factors, a decision matrix for different workloads, and recommendations for engineers designing LLM infrastructure. We conclude that memory efficiency is not merely a technical problem but a systems problem that requires trade-offs across capacity, speed, and sustainability.
Keywords: Memory efficiency; Large language models; GPU memory; CXL; HBM; Environmental sustainability; AI infrastructure
Field: Engineering
Published In: Volume 17, Issue 2, April-June 2026
Published On: 2026-05-12
DOI: https://doi.org/10.71097/IJSAT.v17.i2.11323