This project implements a pseudo "chain-of-thought" prompt-engineering approach in a small language model (TinyLlama 1.1B). The chatbot breaks its reasoning process into explicit steps when answering questions, similar to how larger models like ChatGPT approach problems. This project is:
- A demonstration of prompt engineering techniques for structured output
- An exploration of how formatting prompts affect small LLM responses
- A learning tool for understanding the limitations of model scale
- A pseudo chain-of-thought implementation that mimics reasoning structure
`comparision_demo.py` is a comparison bot that answers the same question twice, once in the prompt-engineered style and once without it, to show the difference; a minimal sketch of the idea appears below.
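The following is an illustrative sketch of the timing comparison, not the project's actual code: the model id is TinyLlama's public Hugging Face checkpoint, while the prompts and the `timed_generate` helper are assumptions.

```python
# Illustrative sketch of comparision_demo.py's idea: time the same
# question with and without the "thinking" scaffold.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

def timed_generate(prompt: str) -> tuple[str, float]:
    """Generate a response and report wall-clock time."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.time()
    output = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output[0], skip_special_tokens=True), time.time() - start

question = "What are the implications of AI on job opportunities?"
engineered = f"Let me think about this step by step:\n{question}"
for label, prompt in [("Direct", question), ("Thinking", engineered)]:
    text, seconds = timed_generate(prompt)
    print(f"{label} ({seconds:.2f}s):\n{text}\n")
```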
Looking at the evidence from the output files and code, here's the assessment:

**CoT here is mostly pseudo/coincidence. Here's why:**

**1. The Model Is Too Small**

TinyLlama (1.1B parameters) is fundamentally limited:
- It doesn't have the capacity for genuine "chain-of-thought" reasoning
- Research suggests true CoT reasoning emerges only around 10B+ parameters
- At 1.1B, it's mostly pattern matching, not reasoning
**2. Evidence from the Outputs**

Looking at `comparison_20241217_191529.txt`:
"Thinking Process" Response (9.31s):
implications of ai on job opportunities:
1. Automation: the rise of ai will lead to more automation...
2. Changes in skills: ai will require humans to have specific skills...
```
**Direct Response (3.77s):**
```
I don't have access to current events, but according to recent research...
```
**What actually happened:**
- The "thinking" version just generated MORE text (longer output = longer time)
- The content isn't actually better - it's just more verbose
- The direct response is actually more coherent and concise!
From `alice_analysis_20241217_202008.txt`:
```
"Alice followed the White Rabbits because they were the only ones who spoke to her"
"The White Ravens' voices were soothing..."
"The White rabbits' voices and presence..."
```
Notice:
- Inconsistent naming (Rabbit → Ravens → rabbits)
- Contradictory statements
- No actual reasoning chain visible
- It's just generating text that looks like analysis
**3. The Prompts Don't Actually Change Behavior**

The structured prompts like:

```python
"Let me think about this step by step:
- First, I need to understand what's being asked
- Then, I'll break down the key components..."
```

are just priming the model to generate numbered lists. The model sees:
- "step by step" → generates "1), 2), 3)"
- "analyze" → generates "analysis:", "implications:"

But there's no actual reasoning happening underneath. A sketch of how such a priming prompt might be assembled follows.
✅ **What IS Working**

**1. Output Structure**

The prompts DO successfully make the output more structured:
- Forces numbered formatting
- Creates section headers
- Makes responses longer and more "essay-like"
**2. Temperature/Sampling Effects**

The generation parameters matter more than the prompt:

```python
temperature=0.7,
top_p=0.9,
no_repeat_ngram_size=3,
repetition_penalty=1.2
```

These actually affect quality by reducing repetition; the sketch below shows where they plug in.
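For context, here is a sketch of where these parameters plug into Hugging Face's `generate()`; the loading code mirrors the earlier sketch and is an assumption rather than the project's exact implementation.

```python
# Sampling parameters passed to Hugging Face generate(); do_sample must
# be True for temperature/top_p to take effect.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

inputs = tokenizer("What makes a good leader?", return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,          # softens the token distribution
    top_p=0.9,                # nucleus sampling
    no_repeat_ngram_size=3,   # blocks repeated trigrams
    repetition_penalty=1.2,   # discourages loops
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```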
**3. Psychological Effect on Humans**

Seeing "Let me think step by step..." makes readers think the response is more thoughtful, even if the model isn't actually reasoning differently.
**Features**

- Step-by-step reasoning process
- Multiple thinking patterns (general, analysis, problem-solving)
- GPU acceleration support
- Response time tracking
- Customizable prompt templates
- Memory-efficient implementation suitable for consumer GPUs (make sure your CUDA version matches your PyTorch build)
**Requirements**

- Python 3.12
- CUDA-capable GPU with CUDA 11.8 (tested on an RTX 3070)
- 8GB+ GPU VRAM
- Windows
**Installation**

- Create and activate a virtual environment in a PowerShell terminal:

```powershell
# Create virtual environment
python -m virtualenv venv

# Activate virtual environment
.\venv\Scripts\activate
```
- Install dependencies:

```powershell
pip install -r requirements.txt
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
Note: the commands above work for CUDA 11.8. For other CUDA versions, find the matching PyTorch install command on the official site: https://pytorch.org/get-started/locally/

To check your CUDA version, start the Python interpreter in the terminal (type `python` and press Enter), then run:
```python
import torch
print(torch.version.cuda)
```
The above will print the CUDA version PyTorch is using.
It is crucial to install the three torch libraries with the CUDA version tags matching your setup so that your GPU is actually utilized; otherwise this project, although it uses the lower end of LLMs, will still be infeasible on consumer-grade CPUs. A GPU with correct drivers and compatible Python libraries is crucial for this project. A quick verification sketch follows.
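As a sanity check that the CUDA build of PyTorch can actually see your GPU, you can run the following (standard `torch.cuda` calls, not project code):

```python
import torch

# Confirm the CUDA build of PyTorch is installed and a GPU is visible.
if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)
    print("Device:", torch.cuda.get_device_name(0))
else:
    print("CUDA not available; generation will fall back to the CPU and be very slow.")
```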
**Usage**

Run the demo script:

```powershell
python PromptEngineering.py
```
The script will:
- Load the TinyLlama model
- Run through the test questions (you can edit these to add your own)
- Display thinking process and generation time for each response
**Example Output**

```
Initializing ThinkingLLM (this may take a moment)...
Test Type: general
Question: What makes a good leader?
- Understanding the question: What makes a good leader? - Breaking down key
components: Leadership, character, skills, and experience - Clarifying the
question: What makes a good leader who can inspire and motivate people? -
Providing a clear answer: A leader who has a strong character, a proven track
record of success, and a deep understanding of their team's needs. Example: A
company is looking for a new CEO. The CEO is asked, "What makes a good leader?"
Breaking down key components: Leadership, character, skills, and experience 1.
Leadership: The CEO needs to have the ability to inspire and motivate their
team. They need to have a strong personality and be able to connect with their
team members on a personal level. 2. Character: The CEO needs to be a role
model for their team. They need to have strong values and a commitment to doing
what is right for their team. 3. Skills: The CEO needs to be able to lead their
team through complex challenges. They need to have a deep understanding of their
team's needs and be able to bring together the right resources to overcome
obstacles. 4. Experience: The CEO needs to have a proven track record of
success. They need to have experience in leading a successful team and in
developing and executing strategies. Clarifying the question: A leader who has
a strong character, a proven track record of success, and a deep understanding
of their team's needs. My answer: A leader who has a strong character, a proven
track record of success, and a deep understanding of their team's needs. This
includes having a strong personality, commitment to doing what is right for
their team, ability to lead their team through complex challenges, and proven
track record of success.
Generation time: 12.16 seconds
==========================================================
Press Enter for next test...
```
**Customization**

You can modify the thinking patterns in the `generate_thinking_prompt` method of the `ThinkingLLM` class; a hypothetical example of adding a pattern follows.
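For instance, a new pattern could be registered alongside the existing ones. This continues the hypothetical `THINKING_PATTERNS` table sketched earlier; the actual structure inside `generate_thinking_prompt` may differ.

```python
# Hypothetical: add a problem-solving scaffold to the pattern table
# sketched earlier; the real method's structure may differ.
THINKING_PATTERNS["problem-solving"] = (
    "Let me work through this problem:\n"
    "- Restate the problem in my own words\n"
    "- List the constraints a solution must satisfy\n"
    "- Propose a solution and check it against the constraints\n"
)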
The project uses:
- TinyLlama 1.1B Chat model
- PyTorch with CUDA support
- Hugging Face Transformers library
- Half-precision (FP16) for efficient memory usage (rough footprint estimate below)
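As a rough back-of-envelope check of why FP16 matters (assumption: weights only at 2 bytes per parameter; activations and the KV cache add more on top):

```python
# Weights-only VRAM estimate for a 1.1B-parameter model in FP16.
params = 1.1e9
bytes_per_param = 2  # FP16
print(f"~{params * bytes_per_param / 1e9:.1f} GB")  # ~2.2 GB
```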
**Project Files**

- `PromptEngineering.py`: Main implementation
- `comparision_demo.py`: Comparison feature
- `requirements.txt`: Required Python packages
- `README.md`: Project documentation
- `alice_analysis_...txt`: Example test case
- `comparision_20...txt`: Example cases with answers in prompt-engineered and non-engineered responses
- `example_test_case.txt`: A test run of PromptEngineering.py
**Notes**

- Recommended: 8GB+ VRAM for comfortable operation
- Response quality limited by model size (1.1B parameters)
- May require further prompt engineering for best results
- Generation times vary based on input complexity
- A significantly faster experience requires specific, mutually compatible CUDA and torch versions.