International Journal on Science and Technology
E-ISSN: 2229-7677
Impact Factor: 9.88
A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
Experiential Reinforcement Learning for Stock Trading: An LLM-Based Agent Comparison of Mistral and Qwen Without Gradient Updates
| Author(s) | Ms. Sumedha Arya |
|---|---|
| Country | India |
| Abstract | Large Language Models (LLMs) have shown strong performance on decision-making tasks. However, they often struggle in financial markets, where profit-and-loss feedback is delayed and uncertain. In this study, we adapt the Experiential Reinforcement Learning (ERL) framework to single-asset stock trading using Dow Jones Industrial Average (DJIA) data and financial news from 2015 to 2020. Unlike traditional reinforcement learning (RL), our method does not update model weights; instead, learning happens through structured self-reflection, FAISS-based memory storage, and the reuse of successful trades as few-shot examples. We implement the ERL cycle (first decision, simulated outcome, reflection, improved second decision, and selective memory storage) using Mistral-7B-Instruct-v0.3 and Qwen2.5-7B-Instruct in a custom trading simulator with $10,000 initial capital. Results show that Mistral produced weak reward signals and ended with a −2.13% return, while Qwen stored more useful reflections and achieved a return of around +2–3% in partial runs, showing more stable improvement. Overall, the study highlights the importance of model capability and clean reward signals, and demonstrates that ERL can be an effective no-gradient alternative to traditional RL for delayed-reward financial trading tasks. |
| Keywords | Experiential Reinforcement Learning, Large Language Models, Stock Trading, Reflection, Prompt-based Learning, Zero-Gradient Learning |
| Published In | Volume 17, Issue 1, January-March 2026 |
| Published On | 2026-03-03 |
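
The ERL cycle named in the abstract (first decision, simulated outcome, reflection, improved second decision, selective memory storage) can be illustrated with a minimal Python sketch. This is not the author's implementation. Every name here (`Memory`, `Experience`, `simulate_reward`, `erl_step`, `toy_decide`, `toy_reflect`) is hypothetical, a plain list stands in for the FAISS index, the reward is assumed to be the next-day fractional price change signed by the action, and the two `toy_*` functions are rule-based placeholders for the LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Experience:
    context: str      # market/news summary shown to the model
    reflection: str   # self-reflection text produced after the first attempt
    action: str       # final action: "BUY", "SELL", or "HOLD"
    reward: float     # simulated reward for the final action


@dataclass
class Memory:
    """Plain-list stand-in (assumption) for a FAISS-backed experience store."""
    items: List[Experience] = field(default_factory=list)

    def retrieve(self, k: int = 3) -> List[Experience]:
        # Return the k highest-reward experiences for reuse as few-shot examples.
        return sorted(self.items, key=lambda e: e.reward, reverse=True)[:k]

    def store_if_useful(self, exp: Experience, threshold: float = 0.0) -> bool:
        # Selective storage: keep only experiences whose reward beats the threshold.
        if exp.reward > threshold:
            self.items.append(exp)
            return True
        return False


def simulate_reward(action: str, price_today: float, price_next: float) -> float:
    # Assumed reward: next-day fractional price change, signed by the action.
    change = (price_next - price_today) / price_today
    return {"BUY": change, "SELL": -change, "HOLD": 0.0}[action]


Decide = Callable[[str, List[Experience], Optional[str]], str]
Reflect = Callable[[str, str, float], str]


def erl_step(context: str, price_today: float, price_next: float,
             decide: Decide, reflect: Reflect, memory: Memory) -> Tuple[str, float]:
    examples = memory.retrieve()                           # few-shot examples
    first = decide(context, examples, None)                # 1. first decision
    r1 = simulate_reward(first, price_today, price_next)   # 2. simulated outcome
    reflection = reflect(context, first, r1)               # 3. reflection
    second = decide(context, examples, reflection)         # 4. improved second decision
    r2 = simulate_reward(second, price_today, price_next)
    memory.store_if_useful(Experience(context, reflection, second, r2))  # 5. storage
    return second, r2


def toy_decide(context: str, examples: List[Experience],
               reflection: Optional[str]) -> str:
    # Placeholder for an LLM call: follow the reflection's advice if it names SELL.
    if reflection and "SELL" in reflection:
        return "SELL"
    return "BUY"


def toy_reflect(context: str, action: str, reward: float) -> str:
    # Placeholder for an LLM reflection prompt.
    if reward < 0:
        return f"Loss after {action}; consider SELL on negative news."
    return f"Keep {action}."


if __name__ == "__main__":
    memory = Memory()
    action, reward = erl_step("Negative earnings news", 100.0, 98.0,
                              toy_decide, toy_reflect, memory)
    print(action, round(reward, 4), len(memory.items))
```

No model weights change anywhere in this loop; the only state carried between steps is the memory, which is the sense in which the approach is described as no-gradient.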
CrossRef DOI is assigned to each research paper published in our journal.
IJSAT DOI prefix is 10.71097/IJSAT
All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.