Memory-Efficient LLM Training and Inference: Balancing Capacity, Speed, and Environmental Impact

Author(s)	Smitha Shivashankaraiah
Country	United States
Abstract	The memory requirements of large language models (LLMs) present a critical bottleneck for both training and inference. While numerous techniques exist to reduce memory usage, most surveys organize them by mechanism or by the specific bottleneck addressed. This paper takes a different approach. We argue that real-world LLM deployment must balance three fundamental constraints: memory capacity, memory speed, and environmental impact. Capacity-focused techniques such as CXL memory pooling prioritize fitting large models and long contexts. Speed-focused techniques such as near-memory compute prioritize low latency for real-time and agentic workloads. Environmental factors (carbon, water, noise, vibration, and e-waste) impose social and regulatory constraints that can override technical advantages. We present a comparative analysis of these three factors, a decision matrix for different workloads, and recommendations for engineers designing LLM infrastructure. Our conclusion is that memory efficiency is not merely a technical problem but a systems problem requiring trade-offs across capacity, speed, and sustainability.
Keywords	Memory efficiency; Large language models; GPU memory; CXL; HBM; Environmental sustainability; AI infrastructure
Field	Engineering
Published In	Volume 17, Issue 2, April-June 2026
Published On	2026-05-12
DOI	https://doi.org/10.71097/IJSAT.v17.i2.11323

International Journal on Science and Technology