
International Journal on Science and Technology (IJSAT)
E-ISSN: 2229-7677

Accelerating AI Inference on Edge Devices Using Customized Digital Hardware
| Author(s) | Mr. Karthik Wali |
| --- | --- |
| Country | United States |
| Abstract | The rapid growth of artificial intelligence (AI) has transformed the landscape of computation, particularly in sectors requiring real-time processing and intelligent decision-making at the edge of networks. Edge computing has emerged as a compelling alternative to traditional cloud-based systems by enabling low-latency, localized data processing. However, the inherent resource limitations of edge devices, including constrained memory, computational power, and energy availability, pose substantial challenges for executing AI inference workloads. To overcome these barriers, researchers have increasingly turned to customized digital hardware solutions such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and neural processing units (NPUs). These hardware accelerators are specifically tailored to meet the unique demands of AI inference tasks, offering significant improvements in energy efficiency, throughput, and latency. Customized digital hardware is designed to optimize specific computational patterns found in AI workloads, such as matrix multiplications and activation functions in neural networks. By streamlining operations and eliminating unnecessary general-purpose processing overhead, these platforms can deliver orders-of-magnitude improvements in performance per watt compared to traditional CPUs or GPUs. ASICs, for instance, provide unparalleled energy efficiency and throughput when optimized for fixed-function inference tasks. FPGAs offer reconfigurability, enabling designers to tailor the data flow and logic structure for diverse AI models, which is particularly beneficial in applications requiring flexibility and model updates. NPUs, purpose-built for deep learning, integrate dedicated tensor processing elements that accelerate the execution of convolutional and fully connected layers in neural networks. This paper presents a comprehensive investigation into the deployment of customized digital hardware for accelerating AI inference on edge devices. It evaluates the performance trade-offs among ASICs, FPGAs, and NPUs through benchmarking experiments involving representative edge AI workloads. The methodology includes the selection of real-world AI models, such as MobileNet and Tiny-YOLO, and deployment across commercially available edge hardware platforms. Key performance metrics, including inference latency, energy consumption, throughput, and model accuracy, are analyzed to assess the effectiveness of each hardware category. The results demonstrate that customized hardware not only improves inference speed and energy efficiency but also significantly enhances the feasibility of deploying sophisticated AI models on low-power, real-time edge devices. While ASICs lead in performance and power efficiency, FPGAs offer crucial adaptability for evolving workloads, and NPUs strike a balance between specialization and integration in modern system-on-chip architectures. The discussion also addresses practical considerations such as design complexity, cost, and integration challenges. Through this comparative study, the paper aims to guide hardware designers, AI practitioners, and system architects in selecting and optimizing digital hardware for edge AI inference. Ultimately, the research highlights that the co-design of AI algorithms and hardware architectures is essential for meeting the growing demand for intelligent, decentralized systems. |
| Keywords | Edge AI, AI inference acceleration, customized digital hardware, application-specific integrated circuits, field-programmable gate arrays, neural processing units, low-power computing, real-time AI, edge computing architectures, energy-efficient hardware, AI model deployment, hardware-software co-design, latency optimization, embedded AI systems, deep learning at the edge, lightweight AI models, tensor operations, digital signal processing, neural network optimization, on-device intelligence. |
| Field | Engineering |
| Published In | Volume 16, Issue 1, January-March 2025 |
| Published On | 2025-01-08 |
| DOI | https://doi.org/10.71097/IJSAT.v16.i1.6696 |
| Short DOI | https://doi.org/g9rnt6 |
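
The benchmarking methodology summarized in the abstract (MobileNet and Tiny-YOLO deployed on commercially available edge accelerators, with latency, throughput, and energy analyzed per platform) can be illustrated with a minimal timing harness. The sketch below is not the paper's code: it assumes a quantized TFLite-format MobileNet file and the tflite_runtime Python package, and the hardware delegate and power-measurement steps are only indicated in comments.

```python
# Minimal latency/throughput sketch for an edge inference benchmark.
# Assumptions: a quantized TFLite MobileNet file (hypothetical name below) and the
# tflite_runtime package; vendor NPU/ASIC delegates and power probes are not shown.
import time
import numpy as np
from tflite_runtime.interpreter import Interpreter

MODEL_PATH = "mobilenet_v1_1.0_224_quant.tflite"  # hypothetical model file
NUM_RUNS = 100

# Load the model. On NPU- or ASIC-backed boards a vendor delegate would be passed via
# experimental_delegates, e.g. load_delegate("libedgetpu.so.1") for a Coral Edge TPU.
interpreter = Interpreter(model_path=MODEL_PATH)
interpreter.allocate_tensors()

input_info = interpreter.get_input_details()[0]

# Random input matching the model's expected shape and dtype (uint8 for a quantized model).
shape = tuple(input_info["shape"])
if input_info["dtype"] == np.uint8:
    dummy_input = np.random.randint(0, 256, size=shape, dtype=np.uint8)
else:
    dummy_input = np.random.rand(*shape).astype(input_info["dtype"])

# Warm-up run so one-time allocation costs do not skew the measurements.
interpreter.set_tensor(input_info["index"], dummy_input)
interpreter.invoke()

latencies = []
for _ in range(NUM_RUNS):
    interpreter.set_tensor(input_info["index"], dummy_input)
    start = time.perf_counter()
    interpreter.invoke()
    latencies.append(time.perf_counter() - start)

mean_latency_ms = 1000 * sum(latencies) / len(latencies)
throughput_fps = len(latencies) / sum(latencies)
print(f"mean latency: {mean_latency_ms:.2f} ms, throughput: {throughput_fps:.1f} inferences/s")

# Energy efficiency additionally requires a board-level power reading taken externally,
# e.g. inferences_per_joule = throughput_fps / measured_power_watts.
```

The same loop, repeated per platform (CPU baseline, FPGA, NPU, or ASIC path), yields the latency and throughput comparisons of the kind the paper reports; accuracy would be checked separately against a labeled validation set rather than random inputs.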