Merged
121 commits
de6a706
Add RAG module under benchmarking
varshinivij Jul 29, 2025
b374ba3
Imported RAG - first edit to MultiAgentTester
varshinivij Jul 29, 2025
0a3e09c
changes to files rag
varshinivij Jul 30, 2025
80cd5a5
RAG model working draft
varshinivij Jul 30, 2025
60eb425
improved format
varshinivij Jul 30, 2025
09ed4d7
HTTPclient change - unfixed
varshinivij Jul 30, 2025
f662f57
more prelim changes to the rag model
varshinivij Aug 5, 2025
e3214b6
initial working prototype revised
varshinivij Aug 5, 2025
80969dd
added in functions.json and embeddings.json
varshinivij Aug 5, 2025
a8a09f5
Merge branch 'main' into AddingRetrievalAugmentedGeneration
varshinivij Aug 5, 2025
c077adb
extracted from scib lib
varshinivij Aug 5, 2025
23c604f
a better working draft of the rag
varshinivij Aug 5, 2025
222073f
more changes to rag
varshinivij Aug 6, 2025
fdc9b3e
small change
varshinivij Aug 6, 2025
c734ab0
finally no errors!
varshinivij Aug 6, 2025
f0f28c8
more changes made
varshinivij Aug 6, 2025
c07ffbf
added new extractor class
varshinivij Aug 7, 2025
f070b93
trying to get scanpy to work
varshinivij Aug 7, 2025
c6a6b17
scanpy works
varshinivij Aug 7, 2025
9bf8fb9
working proto for scanpy and scib-metrics
varshinivij Aug 7, 2025
1528bb8
just added type annotations
varshinivij Aug 7, 2025
8cca721
trying to resolve json error
varshinivij Aug 7, 2025
3c7c72a
decent working ver before signing off
varshinivij Aug 7, 2025
cf6f151
files
varshinivij Aug 7, 2025
a5f07be
Deleted embeddings.json and functions.json
varshinivij Aug 7, 2025
6c58044
remove json
varshinivij Aug 7, 2025
5ed95fe
Merge branch 'main' into AddingRetrievalAugmentedGeneration
varshinivij Aug 7, 2025
0524a7f
making sure file is safe
varshinivij Aug 7, 2025
2227b67
fixed issue by changing to jsonl
varshinivij Aug 7, 2025
c418fd2
experimenting around with wikipedia lib
varshinivij Aug 12, 2025
c430373
fixeS?
varshinivij Aug 12, 2025
8a0e951
added umap with new prompts
varshinivij Aug 12, 2025
711518f
fixed umap
varshinivij Aug 12, 2025
637b0a5
reverted back to the technique with urls
varshinivij Aug 12, 2025
8e9882f
fixed error
varshinivij Aug 12, 2025
db87dcb
added some visualization in umap and heatmap
varshinivij Aug 13, 2025
a5ff28c
added fixes to rag file
varshinivij Aug 13, 2025
5c364a0
file finalized for 2day
varshinivij Aug 13, 2025
225bdfc
added variations to chunking
varshinivij Aug 15, 2025
36e7b49
created a series of images to test
varshinivij Aug 15, 2025
cf96dd5
varying wiki +description contents
varshinivij Aug 15, 2025
a7ce5cf
diff sizes of wiki page
varshinivij Aug 15, 2025
f8e9c84
more variations
varshinivij Aug 15, 2025
5741d6a
trying with a bigger embedding model
varshinivij Aug 16, 2025
5b06c44
trying wiki api
varshinivij Aug 17, 2025
dc672f3
working implementation -not helpful
varshinivij Aug 17, 2025
46980a5
trying to aggresively remove stuff from the wiki result
varshinivij Aug 17, 2025
db72f9c
trying to aggresively remove more
varshinivij Aug 17, 2025
741987f
fixing bugs
varshinivij Aug 17, 2025
d803887
trying to make a new ver work
varshinivij Aug 17, 2025
4922b9c
small syntax error
varshinivij Aug 17, 2025
b7c3337
draft
varshinivij Aug 17, 2025
c939dbc
switched out wiki lib for beautiful soup extraction + switched model
varshinivij Aug 17, 2025
5b8d9b8
working version yayy
varshinivij Aug 17, 2025
948fd68
rag file
varshinivij Aug 18, 2025
ce11610
changes
varshinivij Aug 18, 2025
aa53ab9
working version of the rag class system
varshinivij Aug 18, 2025
8e4413a
new file for user purposes!
varshinivij Aug 18, 2025
7d42210
changes to user and skeleton ver
varshinivij Aug 18, 2025
551143a
added a new function to extract wiki content
varshinivij Aug 18, 2025
b7bf85f
attempt to fix
varshinivij Aug 18, 2025
08e2c39
fixes and clean up
varshinivij Aug 18, 2025
6f21790
fixes and clean ups ongoing
varshinivij Aug 18, 2025
9dec5cc
file improvisations with request failsafe
varshinivij Aug 18, 2025
b5f8191
resolved critical errors
varshinivij Aug 18, 2025
ff4e569
testing
varshinivij Aug 18, 2025
a0b16c2
new fixes
varshinivij Aug 18, 2025
794a350
removed embedding and functions file
varshinivij Aug 18, 2025
1241348
fixing shennanigans
varshinivij Aug 18, 2025
11300ee
moved files
varshinivij Aug 18, 2025
7385d06
changes to file in type annotations
varshinivij Aug 18, 2025
d0f0e4b
made more fixes to rag class by adding 1 function for the entire pipe…
varshinivij Aug 18, 2025
a2f3a0c
took back some fixes that introduced errors
varshinivij Aug 18, 2025
1a47544
changed text to string type in func_Def
varshinivij Aug 18, 2025
e1fd82e
regex for improvements to extracting html
varshinivij Aug 18, 2025
74ea614
introduced more aggressive regex for cleaning func def
varshinivij Aug 18, 2025
1a0d504
added support for dict objects
varshinivij Aug 18, 2025
be19d84
type annotations changed + dict incorporated
varshinivij Aug 18, 2025
7c687ba
errors fixed
varshinivij Aug 18, 2025
4a0db06
quick fixes to rag file
varshinivij Aug 18, 2025
38b78ad
more changes for error correction
varshinivij Aug 18, 2025
3555e4b
deleted folder from wrong location
varshinivij Aug 19, 2025
cf270e1
moved locations for rag folder
varshinivij Aug 19, 2025
5c8d67c
improper folder placement
varshinivij Aug 19, 2025
b5f5cd7
moved locations
varshinivij Aug 19, 2025
b445e22
making changes to the runner.py file system
varshinivij Aug 19, 2025
1f395d5
moved rag
varshinivij Aug 19, 2025
e0d0bc9
rag + changes to runner
varshinivij Aug 19, 2025
cdb6400
one working version of implementation of rag - using agents
varshinivij Aug 19, 2025
841373f
dylan's proposed version of the implementation
varshinivij Aug 19, 2025
86f1239
attempts at rag implementation
varshinivij Aug 19, 2025
eff1b47
Merge branch 'main' into AddingRetrievalAugmentedGeneration
varshinivij Aug 19, 2025
3a12eef
Added rag support to agent system
djriffle Aug 19, 2025
e1f157c
query function from database by function signature search
varshinivij Aug 20, 2025
cb48c6e
changed function definition search to function signature search
varshinivij Aug 20, 2025
8cfcdc7
fixed rag implementation
varshinivij Aug 20, 2025
f362e8e
may have fixed import, need to consult and fix file
varshinivij Aug 20, 2025
d59c341
potentially error fix
varshinivij Aug 20, 2025
c6b056a
fixed imports
djriffle Aug 20, 2025
1a22185
working with dylans location of rag folder
varshinivij Aug 21, 2025
2ebc549
userrag file deemed unnecessary and deleted
varshinivij Aug 21, 2025
b0290c2
change to file names and locations - more clean up, fixed imports
varshinivij Aug 21, 2025
44f0629
user rag file changed
varshinivij Aug 21, 2025
8b5ea20
Update system_blueprint.json
varshinivij Aug 21, 2025
75ff4a9
changes to imports and file structures
varshinivij Aug 21, 2025
ed61005
Merge remote-tracking branch 'origin/AddingRetrievalAugmentedGenerati…
varshinivij Aug 21, 2025
8bca72a
finally fixed import situation
varshinivij Aug 21, 2025
1888051
tested code
varshinivij Aug 21, 2025
f89b899
trivial errors
varshinivij Aug 21, 2025
b1a5fba
trivial errors
varshinivij Aug 21, 2025
e0dc7c3
Update RetrievalAugmentedGeneration.py
varshinivij Aug 22, 2025
9850afe
Fixed rag implementation
djriffle Aug 22, 2025
2fa2c0a
small UX fixes
djriffle Aug 22, 2025
5d695e9
added in new embeddings
varshinivij Aug 23, 2025
e541a19
Merge branch 'main' into AddingRetrievalAugmentedGeneration
varshinivij Aug 23, 2025
fb50daf
Merge remote-tracking branch 'refs/remotes/origin/AddingRetrievalAugm…
varshinivij Aug 23, 2025
abe5960
embeddings and functions file created - however the search results fr…
varshinivij Aug 23, 2025
559dcd8
fixed embedding file structure
varshinivij Aug 23, 2025
ceda693
need to fix wikipedia
varshinivij Aug 23, 2025
88dcc35
restructured embeddings.jsonl to signature and embedding + added in q…
varshinivij Aug 23, 2025
b3d6913
Synced New Embeddings and Functions
djriffle Aug 28, 2025
334 changes: 334 additions & 0 deletions cli/extra_tools/RetrievalAugmentedEmbedder.py

Large diffs are not rendered by default.

44 changes: 44 additions & 0 deletions cli/extra_tools/embeddings.jsonl

Large diffs are not rendered by default.

44 changes: 44 additions & 0 deletions cli/extra_tools/functions.jsonl

Large diffs are not rendered by default.

4 changes: 3 additions & 1 deletion cli/olaf/pyproject.toml
@@ -22,7 +22,9 @@ dependencies = [
"jupyter-client", # NOTE: PyPI name has a hyphen
"nbformat",
"typer",
-"platformdirs"
+"platformdirs",
+"sentence_transformers",
+"tf_keras"
]

# If you want a command like `olaf …`
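This hunk makes `sentence_transformers` and `tf_keras` hard dependencies of the `olaf` package. If you want to report a missing install up front without paying for the import itself (importing `sentence_transformers` also imports PyTorch), `importlib.util.find_spec` can locate modules without executing them. A minimal sketch; `missing_modules` is a hypothetical helper, not part of OLAF:

```python
import importlib.util
from typing import Iterable, List

# Top-level import names of the two dependencies added in this hunk
# (assumption: these are the names the packages install).
RAG_MODULES = ("sentence_transformers", "tf_keras")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names from `names` that Python cannot locate.

    importlib.util.find_spec only looks a top-level module up; it does not
    execute it, so the check stays cheap even for heavy packages.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(RAG_MODULES)
    if missing:
        print("Missing RAG dependencies:", ", ".join(missing))
    else:
        print("All RAG dependencies were found.")
```

Finding a module does not guarantee it imports cleanly, so this is a fast pre-check rather than a replacement for the `ImportError` handling already in `runner.py`.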
23 changes: 15 additions & 8 deletions cli/olaf/src/olaf/agents/AgentSystem.py
@@ -12,7 +12,6 @@
# 2. The package-internal directory (for default samples), found relative to this file
PACKAGE_CODE_SAMPLES_DIR = Path(__file__).resolve().parent.parent / "code_samples"


class Command:
"""Represents a command an agent can issue to a neighboring agent."""
def __init__(self, name: str, target_agent: str, description: str):
@@ -23,19 +22,18 @@ def __init__(self, name: str, target_agent: str, description: str):
def __repr__(self) -> str:
return (f"Command(name='{self.name}', target='{self.target_agent}', "
f"desc='{self.description[:30]}...')")


class Agent:
"""Represents a single agent in the system."""
-def __init__(self, name: str, prompt: str, commands: Dict[str, Command], code_samples: Dict[str, str]):
+def __init__(self, name: str, prompt: str, commands: Dict[str, Command], code_samples: Dict[str, str], is_rag_enabled: bool = False):
self.name = name
self.prompt = prompt
self.commands = commands
self.code_samples = code_samples
+self.is_rag_enabled = is_rag_enabled

def __repr__(self) -> str:
sample_keys = list(self.code_samples.keys())
-return f"Agent(name='{self.name}', commands={list(self.commands.keys())}, samples={sample_keys})"
+return f"Agent(name='{self.name}', commands={list(self.commands.keys())}, samples={sample_keys}, rag_enabled={self.is_rag_enabled})"

def get_full_prompt(self, global_policy=None) -> str:
"""Constructs the full prompt including the global policy and command descriptions."""
@@ -53,8 +51,14 @@ def get_full_prompt(self, global_policy=None) -> str:
full_prompt += f"\n - Target Agent: {command.target_agent}"
full_prompt += "\n\n**YOU MUST USE THESE EXACT COMMANDS TO DELEGATE TASKS. NO OTHER FORMATTING OR COMMANDS ARE ALLOWED.**"

+if self.is_rag_enabled:
+full_prompt += "\n\nYou can query your specialized knowledge base for more context with the following command:"
+full_prompt += f"\n- Command: `query_rag_<function>`"
+full_prompt += f"\n - Description: Retrieves relevant information about a specific <function> from your knowledge base. Replace <function> with a concise, descriptive search query (e.g., function names, task you are trying to complete)."
+full_prompt += f"\n - Example: `query_rag_<scvi model setup>`"
+
if self.code_samples:
-full_prompt += "\n - Code Samples Available:"
+full_prompt += "\n\n - Code Samples Available:"
for sample_name in self.code_samples.keys():
full_prompt += f"\n - `{sample_name}`"

@@ -101,7 +105,6 @@ def load_from_json(cls, file_path: str) -> 'AgentSystem':
user_path = USER_CODE_SAMPLES_DIR / filename
package_path = PACKAGE_CODE_SAMPLES_DIR / filename

-# Default to package path, but overwrite if user path exists
path_to_load = None
source_label = ""
if user_path.exists():
@@ -120,11 +123,15 @@ def load_from_json(cls, file_path: str) -> 'AgentSystem':
else:
print(f" ❌ WARNING: Code sample file '{filename}' not found in any location.")

+rag_config = agent_data.get("rag", {})
+is_rag_enabled = rag_config.get("enabled", False)
+
agent = Agent(
name=agent_name,
prompt=agent_data['prompt'],
commands=commands,
-code_samples=loaded_samples
+code_samples=loaded_samples,
+is_rag_enabled=is_rag_enabled
)
agents[agent_name] = agent

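`load_from_json` reads the flag with `agent_data.get("rag", {})` followed by `rag_config.get("enabled", False)`, so blueprints written before this PR, which have no `rag` key, still load with RAG off. The chain assumes `rag` is a JSON object: a hand-edited `"rag": true` or `"rag": null` would raise `AttributeError`, since neither `bool` nor `None` has `.get`. A hedged sketch of a more tolerant reader; `rag_enabled` is a hypothetical helper, and accepting a bare boolean is an extension not present in the diff:

```python
import json
from collections.abc import Mapping
from typing import Any


def rag_enabled(agent_data: Mapping[str, Any]) -> bool:
    """Read the per-agent RAG flag from one blueprint entry.

    Accepts the shape used in this PR, {"rag": {"enabled": true}}, plus (an
    assumption beyond the diff) a bare boolean, {"rag": true}. A missing key,
    null, or any other value means RAG is off, matching the diff's default.
    """
    rag = agent_data.get("rag")
    if isinstance(rag, bool):
        return rag
    if isinstance(rag, Mapping):
        # Only a real JSON true enables RAG; strings such as "yes" do not.
        return rag.get("enabled") is True
    return False


if __name__ == "__main__":
    # "legacy_agent" is a hypothetical pre-PR entry with no "rag" key.
    blueprint = json.loads(
        '{"agents": {'
        '"master_agent": {"prompt": "...", "rag": {"enabled": true}},'
        '"coder_agent": {"prompt": "...", "rag": {"enabled": false}},'
        '"legacy_agent": {"prompt": "..."}'
        "}}"
    )
    for name, data in blueprint["agents"].items():
        print(f"{name}: rag_enabled={rag_enabled(data)}")
```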
25 changes: 20 additions & 5 deletions cli/olaf/src/olaf/agents/create_agent_system.py
@@ -2,11 +2,11 @@
import os
from typing import Dict, Any
from pathlib import Path
-from platformdirs import PlatformDirs # pip install platformdirs
+from platformdirs import PlatformDirs
import tempfile

APP_NAME = "olaf"
-APP_AUTHOR = "OpenTechBio" # or your org
+APP_AUTHOR = "OpenTechBio"
dirs = PlatformDirs(APP_NAME, APP_AUTHOR)

# Root for user-specific OLAF files. Precedence: env -> platformdirs.
@@ -72,9 +72,24 @@ def define_agents() -> Dict[str, Dict[str, Any]]:
if agent_name in agents:
print(f"{Colors.FAIL}Agent '{agent_name}' already exists. Please use a unique name.{Colors.ENDC}")
continue

prompt = input(f"{Colors.WARNING}Enter the system prompt for '{Colors.OKCYAN}{agent_name}{Colors.WARNING}': {Colors.ENDC}").strip()
-agents[agent_name] = {"prompt": prompt, "neighbors": {}, "code_samples": []}
-print(f"{Colors.OKGREEN}Agent '{Colors.OKCYAN}{agent_name}{Colors.OKGREEN}' added successfully.{Colors.ENDC}")
+
+# --- New RAG Configuration Section ---
+rag_enabled_input = input(f"{Colors.WARNING}Enable Retrieval-Augmented Generation (RAG) for '{Colors.OKCYAN}{agent_name}{Colors.WARNING}'? (y/n): {Colors.ENDC}").strip().lower()
+is_rag_enabled = rag_enabled_input == 'y'
+
+# Add the new 'rag' key to the agent's data structure
+agents[agent_name] = {
+"prompt": prompt,
+"neighbors": {},
+"code_samples": [],
+"rag": {"enabled": is_rag_enabled}
+}
+
+rag_status = f"{Colors.OKGREEN}enabled" if is_rag_enabled else f"{Colors.FAIL}disabled"
+print(f"{Colors.OKGREEN}Agent '{Colors.OKCYAN}{agent_name}{Colors.OKGREEN}' added successfully (RAG: {rag_status}{Colors.OKGREEN}).{Colors.ENDC}")
+
print(f"\n{Colors.OKBLUE}--- All Agents Defined ---{Colors.ENDC}")
for name in agents:
print(f"- {Colors.OKCYAN}{name}{Colors.ENDC}")
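With `.strip().lower() == 'y'`, only a bare `y` enables RAG in `define_agents`; answering `yes` silently leaves it disabled. A small sketch of a more forgiving parser, assuming `yes`/`no` should also be accepted and anything else should fall back to a default; `parse_yes_no` and `new_agent_entry` are hypothetical helpers, and the dict mirrors the keys this hunk writes:

```python
from typing import Any, Dict


def parse_yes_no(answer: str, default: bool = False) -> bool:
    """Interpret a y/n reply; empty or unrecognised input returns `default`."""
    normalized = answer.strip().lower()
    if normalized in {"y", "yes"}:
        return True
    if normalized in {"n", "no"}:
        return False
    return default


def new_agent_entry(prompt: str, rag_enabled: bool) -> Dict[str, Any]:
    """Build an agent entry with the same keys define_agents() writes in this hunk."""
    return {
        "prompt": prompt,
        "neighbors": {},
        "code_samples": [],
        "rag": {"enabled": rag_enabled},
    }


if __name__ == "__main__":
    for reply in ["y", " Yes ", "n", "", "maybe"]:
        print(repr(reply), "->", parse_yes_no(reply))
    print(new_agent_entry("You are a helper agent.", parse_yes_no("yes")))
```

In `define_agents` the call would become `is_rag_enabled = parse_yes_no(input(...))`; re-prompting on unrecognised input is another reasonable design.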
@@ -162,7 +177,7 @@ def _atomic_write_json(obj: Any, path: Path) -> None:
with tempfile.NamedTemporaryFile("w", delete=False, dir=str(path.parent), prefix=path.stem, suffix=".tmp") as tmp:
json.dump(obj, tmp, indent=2)
tmp_path = Path(tmp.name)
-tmp_path.replace(path) # atomic on POSIX; safe on Windows
+tmp_path.replace(path)

def save_configuration(global_policy: str, agents_config: Dict[str, Any], output_dir: str) -> None:
if not agents_config:
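This hunk only drops a comment, but the helper it touches follows a standard pattern: `_atomic_write_json` writes to a temporary file created in the target's own directory and then renames it over the target, so on POSIX a reader sees either the old file or the complete new one. `Path.replace` delegates to `os.replace`, which can fail across filesystems, hence `dir=str(path.parent)`. A self-contained sketch of the same idea; the `fsync` before the rename and the temp-file cleanup on error are additions not present in the diff:

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json(obj: Any, path: Path) -> None:
    """Write `obj` as JSON to `path`; on POSIX readers see the old or the new file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target: os.replace may fail across
    # filesystems, and a same-filesystem rename is atomic on POSIX.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.stem, suffix=".tmp")
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(obj, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # what Path.replace() calls internally
    except BaseException:
        tmp_path.unlink(missing_ok=True)  # don't leave stray *.tmp files behind
        raise


if __name__ == "__main__":
    target = Path(tempfile.mkdtemp()) / "agent_system.json"
    atomic_write_json({"agents": {"master_agent": {"rag": {"enabled": True}}}}, target)
    print(json.loads(target.read_text(encoding="utf-8")))
```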
9 changes: 9 additions & 0 deletions cli/olaf/src/olaf/agents/integration_system.json
@@ -2,6 +2,9 @@
"global_policy": "Always be concise, professional, and helpful. Do not refuse to answer a request unless it is harmful.",
"agents": {
"master_agent": {
+"rag": {
+"enabled": true
+},
"prompt": "You are the master agent. Analyze every user request and delegate the task to the appropriate expert: the general coder for standard single-cell analysis or the integration expert for batch correction and data integration tasks. Respond ONLY with a delegation command.",
"neighbors": {
"delegate_to_general": {
@@ -16,6 +19,9 @@
},
"general_coder": {
"prompt": "You are the *general scRNA-seq coder*. You handle standard single-cell analysis tasks like data loading, QC, filtering, normalization, and basic plotting using scanpy. You are not an expert in data integration.\n\nExample of a task you would perform:\n```python\nimport scanpy as sc\n\n# Assume 'adata' is a loaded AnnData object\n# Basic QC and filtering\nsc.pp.filter_cells(adata, min_genes=200)\nsc.pp.filter_genes(adata, min_cells=3)\nadata.var['mt'] = adata.var_names.str.startswith('MT-')\nsc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], inplace=True)\n\n# Normalize and find highly variable genes\nsc.pp.normalize_total(adata, target_sum=1e4)\nsc.pp.log1p(adata)\nsc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)\n\n# Run PCA\nsc.tl.pca(adata, svd_solver='arpack')\n\nprint('Standard analysis complete. PCA is in adata.obsm[\"X_pca\"].')\n```",
+"rag": {
+"enabled": true
+},
"neighbors": {
"delegate_to_master": {
"target_agent": "master_agent",
@@ -29,6 +35,9 @@
},
"integration_expert": {
"prompt": "You are the *integration expert*. You specialize in combining multiple single-cell datasets and correcting for batch effects using scvi-tools.\n\nExample of a task you would perform:\n```python\nimport scvi\nimport scanpy as sc\n\n# Assume 'adata' is loaded and preprocessed with a 'batch' column\n# Find highly variable genes across batches for integration\nsc.pp.highly_variable_genes(\n adata,\n n_top_genes=2000,\n subset=True,\n layer='counts',\n flavor='seurat_v3',\n batch_key='batch'\n)\n\n# Set up the AnnData object for the scVI model\nscvi.model.SCVI.setup_anndata(adata, layer='counts', batch_key='batch')\n\n# Create and train the scVI model\nmodel = scvi.model.SCVI(adata, n_layers=2, n_latent=30)\nmodel.train()\n\n# Store the integrated latent representation in the AnnData object\nadata.obsm['X_scVI'] = model.get_latent_representation()\n\nprint('Integration complete. Integrated embedding is in adata.obsm[\"X_scVI\"].')\n``` you remeber to wrap your code in triple backticks and python. Please only include one code block per response. Remeber to keep responses short and to the point.",
+"rag": {
+"enabled": true
+},
"neighbors": {
"delegate_to_master": {
"target_agent": "master_agent",
11 changes: 10 additions & 1 deletion cli/olaf/src/olaf/agents/system_blueprint.json
@@ -3,6 +3,9 @@
"agents": {
"master_agent": {
"prompt": "You are the master agent. Your primary role is to analyze incoming user requests and delegate them to the appropriate specialist agent. You do not perform tasks yourself.",
+"rag": {
+"enabled": true
+},
"neighbors": {
"delegate_to_coder": {
"target_agent": "coder_agent",
@@ -16,14 +19,20 @@
},
"coder_agent": {
"prompt": "You are a specialist single cell RNA coder agent. Your job is to write high-quality, executable code based on the user's request. You do not delegate tasks. The machine you run on has write disabled. You should never save to disk or modify files. Prioritize small step responses and avoid large code dumps.",
+"rag": {
+"enabled": true
+},
"neighbors": {},
"code_samples": [
"load_adata.py"
]
},
"research_agent": {
"prompt": "You are a specialist research agent. You fulfill user requests by finding and synthesizing information from reliable sources. You do not write code or delegate tasks.",
+"rag": {
+"enabled": true
+},
"neighbors": {}
}
}
-}
+}
63 changes: 59 additions & 4 deletions cli/olaf/src/olaf/execution/runner.py
@@ -17,6 +17,7 @@
from olaf.config import OLAF_HOME
from olaf.agents.AgentSystem import Agent, AgentSystem
from olaf.core.io_helpers import display, extract_python_code, format_execute_response
+from olaf.rag.RetrievalAugmentedGeneration import RetrievalAugmentedGeneration
except ImportError as e:
print(f"Failed to import a required OLAF module: {e}", file=sys.stderr)
sys.exit(1)
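The import above is paired with a module-level `RAG = RetrievalAugmentedGeneration()` in the next hunk, and that class loads `SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")` as a class attribute. Class bodies run at import time, so importing `runner.py` loads the embedding model even for sessions in which no agent has RAG enabled. One way to defer that cost is a cached zero-argument accessor; the sketch below uses a hypothetical `get_rag()` and a stand-in `FakeRAG` so it runs without the model:

```python
import functools
from typing import Callable, TypeVar

T = TypeVar("T")


def lazy(factory: Callable[[], T]) -> Callable[[], T]:
    """Run `factory` on the first call only and hand back the same object afterwards."""
    return functools.lru_cache(maxsize=None)(factory)


class FakeRAG:
    """Stand-in for RetrievalAugmentedGeneration that counts constructions."""

    instances = 0

    def __init__(self) -> None:
        FakeRAG.instances += 1  # the real class would load the embedding model here

    def query(self, text_query: str) -> str:
        return f"<signature for {text_query}>"


# Hypothetical accessor; in runner.py the factory would be RetrievalAugmentedGeneration.
get_rag = lazy(FakeRAG)

if __name__ == "__main__":
    print("built before first use:", FakeRAG.instances)  # 0
    print(get_rag().query("pca"))
    print("same object:", get_rag() is get_rag(), "built:", FakeRAG.instances)  # True 1
```

For this to pay off, the `model = SentenceTransformer(...)` line would also have to move out of the class body (for example into `__init__`), and call sites such as `RAG.query(...)` would become `get_rag().query(...)`.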
@@ -39,6 +40,9 @@ def exec_code(self, code: str, timeout: int) -> dict:
_OUTPUTS_DIR = OLAF_HOME / "runs"
_SNIPPET_DIR = _OUTPUTS_DIR / "snippets"
_LEDGER_PATH = _OUTPUTS_DIR / f"benchmark_history_{datetime.utcnow().strftime('%Y%m%d-%H%M%S')}.jsonl"
+_RAG_RE = re.compile(r"query_rag_<([^>]+)>")
+RAG = RetrievalAugmentedGeneration()
+

def _init_paths():
"""Ensure output directories exist before writing."""
@@ -51,6 +55,11 @@ def detect_delegation(msg: str) -> Optional[str]:
m = _DELEG_RE.search(msg)
return f"delegate_to_{m.group(1)}" if m else None

+def detect_rag(msg: str) -> Optional[str]:
+"""Return the *partial* RAG command if present."""
+m = _RAG_RE.search(msg)
+return f"{m.group(1)}" if m else None
+
def _dump_code_snippet(run_id: str, code: str) -> str:
"""Write <run_id>.py under outputs/snippets/ and return the relative path."""
snippet_path = _SNIPPET_DIR / f"{run_id}.py"
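`detect_rag` returns whatever sits between the angle brackets of the first `query_rag_<...>` in a reply, which matches the literal-bracket form the prompt in `AgentSystem.py` asks for (`query_rag_<scvi model setup>`). Because the pattern is `[^>]+` and the code uses `re.search`, only the first command in a reply is used and the query stops at the first `>`. A short sketch of that behaviour, plus a hypothetical `detect_all_rag` variant for replies that ask several questions:

```python
import re
from typing import List, Optional

# Same pattern as _RAG_RE in runner.py.
_RAG_RE = re.compile(r"query_rag_<([^>]+)>")


def detect_rag(msg: str) -> Optional[str]:
    """Return the query inside the first query_rag_<...>, or None."""
    m = _RAG_RE.search(msg)
    return m.group(1) if m else None


def detect_all_rag(msg: str) -> List[str]:
    """Hypothetical variant: every query in the message, whitespace-trimmed."""
    return [q.strip() for q in _RAG_RE.findall(msg)]


if __name__ == "__main__":
    reply = "Let me check the API first: `query_rag_<scvi model setup>`"
    print(detect_rag(reply))                                          # scvi model setup
    print(detect_rag("query_rag_scvi setup"))                         # None: brackets are required
    print(detect_all_rag("query_rag_<pca> then query_rag_< umap >"))  # ['pca', 'umap']
```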
@@ -69,7 +78,8 @@ def _save_benchmark_record(*, run_id: str, results: dict, meta: dict, code: str
record["code_path"] = _dump_code_snippet(run_id, code)
with _LEDGER_PATH.open("a") as fh:
fh.write(json.dumps(record) + "\n")



# --- Core Runner Functions ---
def run_benchmark(
console: Console,
@@ -187,7 +197,21 @@ def run_agent_session(
break

history.append({"role": "assistant", "content": msg})
-display(console, f"assistant ({current_agent.name})", msg)
+display(console, f"assistant ({current_agent.name})", msg)
+
+# --- RAG handling ---
+query_from_re = detect_rag(msg)
+if query_from_re and current_agent.is_rag_enabled:
+console.print(f"[yellow]🔍 Triggering RAG query: {query_from_re}[/yellow]")
+retrieved_docs = RAG.query(query_from_re)
+if retrieved_docs:
+console.print(f"[green] RAG query successful. [/green]")
+feedback = retrieved_docs
+console.print(feedback)
+history.append({"role": "system", "content": feedback})
+else:
+console.print(f"[red] RAG query unsuccessful. [/red]")
+

cmd = detect_delegation(msg)
if cmd and cmd in current_agent.commands:
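In the session loop, the RAG branch fires only when the reply contains a `query_rag_<...>` command and `current_agent.is_rag_enabled` is true; a non-empty result is appended to `history` as a `system` message, and the loop then falls through to the delegation check (`detect_delegation`) for the same reply. A hedged sketch of that step pulled out into a pure function so it can be unit-tested without a console or the embedding model; `handle_rag` and the injected `retrieve` callable are hypothetical, and the console output is omitted:

```python
import re
from typing import Callable, Dict, List, Optional

# Same pattern as _RAG_RE in runner.py.
_RAG_RE = re.compile(r"query_rag_<([^>]+)>")

Message = Dict[str, str]


def handle_rag(
    msg: str,
    rag_enabled: bool,
    retrieve: Callable[[str], Optional[str]],
    history: List[Message],
) -> bool:
    """One RAG step: on a query_rag_<...> command from a RAG-enabled agent,
    append the retrieved text to `history` as a system message.

    Returns True only when something was appended.
    """
    if not rag_enabled:
        return False
    m = _RAG_RE.search(msg)
    if m is None:
        return False
    docs = retrieve(m.group(1))
    if not docs:
        return False  # mirrors the "RAG query unsuccessful" branch
    history.append({"role": "system", "content": docs})
    return True


if __name__ == "__main__":
    # Hypothetical stand-in for RAG.query: a dict lookup with placeholder text.
    fake_index = {"scvi model setup": "<placeholder signature text>"}
    history: List[Message] = []
    ok = handle_rag("First: `query_rag_<scvi model setup>`", True, fake_index.get, history)
    print(ok, history)
```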
@@ -211,8 +235,39 @@ def run_agent_session(
console.print("[cyan]Executing code in sandbox…[/cyan]")
exec_result = sandbox_manager.exec_code(code, timeout=300)
feedback = format_execute_response(exec_result, _OUTPUTS_DIR)
-history.append({"role": "user", "content": feedback})
-display(console, "user", feedback)
+history.append({"role": "assistant", "content": feedback})
+display(console, "assistant", feedback)
+
+stderr = exec_result.get('stderr', '')
+if stderr and current_agent.is_rag_enabled:
+func_error_patterns = [
+r"missing \d+ required positional argument", # TypeError: missing argument
+r"NameError: name '(\w+)' is not defined", # NameError
+r"AttributeError: '.*' object has no attribute '(\w+)'", # missing attribute
+r"got an unexpected keyword argument" # wrong keyword argument
+]
+function_name = ""
+retrieved_docs = ""
+if any(re.search(pat, stderr) for pat in func_error_patterns):
+lines = stderr.strip().splitlines()
+if len(lines) >= 2:
+code_line = lines[-2].strip() # second-to-last line: code that failed
+match = re.search(r'(\w+)\s*\(', code_line)
+if match:
+function_name = match.group(1)
+
+if function_name:
+retrieved_docs = RAG.retrieve_function(function_name)
+console.print(f"[yellow]🔍 Missing function detected: {function_name}, function database search...[/yellow]")
+if retrieved_docs:
+console.print(f"[green] Query successful - Function signature found. [/green]")
+feedback += f"\n {function_name} produced an error. The correct function signature for {function_name} is:\n{retrieved_docs}"
+console.print(feedback)
+history.append({"role": "system", "content": feedback})
+continue
+else:
+print(f"RAG Error Query unsuccessful - Function signature does not exist in the current database.")
+

if is_auto:
if benchmark_modules:
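The error-driven lookup assumes the second-to-last line of `stderr` is the failing source line and takes the first `name(` on it. On Python 3.11+ that line can instead be a `^^^^` marker row under the failing expression, and when only a `File ...` header precedes the error (for example code run from `<stdin>`) there is no source line at all. For `TypeError`, the callable's name is already in the message (`neighbors() got an unexpected keyword argument ...`). A hedged sketch of a slightly more defensive extractor, assuming plain-text tracebacks without ANSI colour codes; `failing_function_name` is hypothetical and, like the diff's version, still picks the first call on the line, which can be an outer wrapper such as `print`:

```python
import re
from typing import Optional

# TypeError messages already name the callable, e.g. "foo() missing 1 required positional argument".
_TYPEERROR_NAME_RE = re.compile(
    r"(\w+)\(\) (?:missing \d+ required positional argument|got an unexpected keyword argument)"
)
_CALL_RE = re.compile(r"(\w+)\s*\(")   # first "name(" on a source line, as in the diff
_MARKER_RE = re.compile(r"^[\s~^]+$")  # Python 3.11+ caret/tilde rows under the failing expression


def failing_function_name(stderr: str) -> Optional[str]:
    """Best-effort guess at the callable involved in the innermost traceback frame."""
    lines = [ln for ln in stderr.strip().splitlines() if ln.strip()]
    if not lines:
        return None
    # 1) Prefer a name embedded in the exception message itself.
    m = _TYPEERROR_NAME_RE.search(lines[-1])
    if m:
        return m.group(1)
    # 2) Otherwise walk up from the error line to the nearest source line.
    for ln in reversed(lines[:-1]):
        if _MARKER_RE.match(ln):
            continue  # skip ^^^^ / ~~~~ underline rows
        if ln.lstrip().startswith(("File ", "Traceback")):
            return None  # frame header only: no source text to inspect
        call = _CALL_RE.search(ln)
        return call.group(1) if call else None
    return None


if __name__ == "__main__":
    tb = (
        "Traceback (most recent call last):\n"
        '  File "snippet.py", line 2, in <module>\n'
        "    emb = run_umap(adata)\n"
        "          ^^^^^^^^\n"
        "NameError: name 'run_umap' is not defined"
    )
    print(failing_function_name(tb))  # run_umap
```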
84 changes: 84 additions & 0 deletions cli/olaf/src/olaf/rag/RetrievalAugmentedGeneration.py
@@ -0,0 +1,84 @@
import json
import sys
from pathlib import Path
from typing import List, Dict, Optional
from contextlib import redirect_stdout, redirect_stderr

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'
# ── Dependencies ─────────────────────────────────────────────
try:
    import re
    from sentence_transformers import SentenceTransformer
    from rich.console import Console
    import matplotlib.pyplot as plt
    import numpy as np

except ImportError as e:
    print(f"Missing dependency: {e}", file=sys.stderr)
    sys.exit(1)

# ── Paths and Constants ─────────────────────────────────────────────
console = Console()

RAG_DIR = Path(__file__).resolve().parent.parent / "rag"
EMBEDDING_FILE = RAG_DIR / "embeddings.jsonl"
FUNCTIONS_FILE = RAG_DIR / "functions.jsonl"

class RetrievalAugmentedGeneration():
    model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

    def __init__(self):
        self.embeddings = self.load_embeddings()
        self.functions = self.load_functions()
        self.queries = []

    def load_embeddings(self) -> List[np.ndarray]:
        try:
            with open(EMBEDDING_FILE, "r", encoding="utf-8") as f:
                return [np.array(json.loads(line)) for line in f if line.strip()]
        except FileNotFoundError:
            console.log("[red]Embeddings file not found.")
            return []
        except json.JSONDecodeError:
            console.log("[red]Embeddings file is not valid JSONL.")
            return []

    def load_functions(self) -> List[Dict[str, str]]:
        try:
            with open(FUNCTIONS_FILE, "r", encoding="utf-8") as f:
                return [json.loads(line) for line in f if line.strip()]
        except FileNotFoundError:
            console.log("[red]Functions file not found.")
            return []
        except json.JSONDecodeError:
            console.log("[red]Functions file is not valid JSONL.")
            return []

    @staticmethod
    def cosine_similarity(A: np.ndarray, B: List[np.ndarray]) -> List[float]:
        sims = [np.dot(A, emb) / (np.linalg.norm(A) * np.linalg.norm(emb)) for emb in B]
        return sims

    def retrieve_function(self, name:str) -> Optional[str]:
        for function in self.functions:
            if name in function["signature"]:
                return function["signature"]
        return None

    def query(self, text_query: str) -> Optional[np.ndarray]:
        self.queries.append(text_query)
        if not self.embeddings:
            console.log("[yellow]No embeddings to compare.")
            return None
        query_embedding = self.model.encode([text_query])[0]
        sims = self.cosine_similarity(query_embedding, self.embeddings)
        idx = np.argmax(sims)
        return self.functions[idx]["signature"]

# ──────Implementation──────────────────────────────────────────────────────────

if __name__ == "__main__":
    rag = RetrievalAugmentedGeneration()
    print(rag.query("What is pca"))
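In `RetrievalAugmentedGeneration`, `query` embeds the text, scores it against every stored embedding in a Python loop, and returns the `signature` of the single best match; this relies on line i of `embeddings.jsonl` describing line i of `functions.jsonl`. Its return annotation says `Optional[np.ndarray]`, but the value returned is a signature string. `retrieve_function` returns the first signature that contains `name` as a substring, so a short name also matches longer ones (`log` matches `log1p`) and file order decides the winner. Below is a NumPy-only sketch of both lookups, not the PR's code: the `SignatureIndex` class is hypothetical, embeddings are assumed to be pre-computed and row-aligned with the records, and name matching is exact on the last dotted component of the callable. It runs without loading the embedding model.

```python
import re
from typing import Dict, List, Optional, Sequence

import numpy as np

# Callable at the start of a signature, e.g. "scanpy.pp.pca(adata, ...)" -> "scanpy.pp.pca".
_SIG_NAME_RE = re.compile(r"^\s*([\w.]+)\s*\(")


class SignatureIndex:
    """Hypothetical in-memory index where row i of `embeddings` describes records[i]."""

    def __init__(self, records: Sequence[Dict[str, str]], embeddings: np.ndarray) -> None:
        matrix = np.asarray(embeddings, dtype=float)
        if matrix.ndim != 2 or matrix.shape[0] != len(records):
            raise ValueError("embeddings must be a 2-D array with one row per record")
        self.records = list(records)
        norms = np.linalg.norm(matrix, axis=1, keepdims=True)
        # Normalise rows once (zero rows stay zero) so cosine similarity is one mat-vec product.
        self.unit = matrix / np.where(norms == 0, 1.0, norms)

    def query_vector(self, q: np.ndarray, k: int = 1) -> List[str]:
        """Signatures of the k rows most cosine-similar to `q`, best first."""
        q = np.asarray(q, dtype=float)
        q_norm = np.linalg.norm(q)
        if q_norm == 0 or not self.records:
            return []
        sims = self.unit @ (q / q_norm)
        top = np.argsort(-sims)[:k]
        return [self.records[i]["signature"] for i in top]

    def retrieve_function(self, name: str) -> Optional[str]:
        """Exact match on the last dotted component, so 'log' does not hit 'log1p'."""
        wanted = name.rsplit(".", 1)[-1]
        for rec in self.records:
            m = _SIG_NAME_RE.match(rec["signature"])
            if m and m.group(1).rsplit(".", 1)[-1] == wanted:
                return rec["signature"]
        return None


if __name__ == "__main__":
    # Toy records and 3-D vectors standing in for functions.jsonl / embeddings.jsonl.
    records = [
        {"signature": "scanpy.pp.pca(adata, n_comps=None)"},
        {"signature": "scanpy.pp.log1p(adata)"},
        {"signature": "scanpy.tl.umap(adata)"},
    ]
    index = SignatureIndex(records, np.eye(3))
    print(index.query_vector(np.array([0.9, 0.1, 0.0])))  # ['scanpy.pp.pca(adata, n_comps=None)']
    print(index.retrieve_function("log"))    # None
    print(index.retrieve_function("log1p"))  # scanpy.pp.log1p(adata)
```

With the real model, the query vector would come from `model.encode([text_query])[0]`, as in the diff; `k > 1` would let an agent see a few candidate signatures instead of one.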
Empty file.
44 changes: 44 additions & 0 deletions cli/olaf/src/olaf/rag/embeddings.jsonl

Large diffs are not rendered by default.
