Merged
149 commits
de6a706
Add RAG module under benchmarking
varshinivij Jul 29, 2025
b374ba3
Imported RAG - first edit to MultiAgentTester
varshinivij Jul 29, 2025
0a3e09c
changes to files rag
varshinivij Jul 30, 2025
80cd5a5
RAG model working draft
varshinivij Jul 30, 2025
60eb425
improved format
varshinivij Jul 30, 2025
09ed4d7
HTTPclient change - unfixed
varshinivij Jul 30, 2025
f662f57
more prelim changes to the rag model
varshinivij Aug 5, 2025
e3214b6
initial working prototype revised
varshinivij Aug 5, 2025
80969dd
added in functions.json and embeddings.json
varshinivij Aug 5, 2025
a8a09f5
Merge branch 'main' into AddingRetrievalAugmentedGeneration
varshinivij Aug 5, 2025
c077adb
extracted from scib lib
varshinivij Aug 5, 2025
23c604f
a better working draft of the rag
varshinivij Aug 5, 2025
222073f
more changes to rag
varshinivij Aug 6, 2025
fdc9b3e
small change
varshinivij Aug 6, 2025
c734ab0
finally no errors!
varshinivij Aug 6, 2025
f0f28c8
more changes made
varshinivij Aug 6, 2025
c07ffbf
added new extractor class
varshinivij Aug 7, 2025
f070b93
trying to get scanpy to work
varshinivij Aug 7, 2025
c6a6b17
scanpy works
varshinivij Aug 7, 2025
9bf8fb9
working proto for scanpy and scib-metrics
varshinivij Aug 7, 2025
1528bb8
just added type annotations
varshinivij Aug 7, 2025
8cca721
trying to resolve json error
varshinivij Aug 7, 2025
3c7c72a
decent working ver before signing off
varshinivij Aug 7, 2025
cf6f151
files
varshinivij Aug 7, 2025
a5f07be
Deleted embeddings.json and functions.json
varshinivij Aug 7, 2025
6c58044
remove json
varshinivij Aug 7, 2025
5ed95fe
Merge branch 'main' into AddingRetrievalAugmentedGeneration
varshinivij Aug 7, 2025
0524a7f
making sure file is safe
varshinivij Aug 7, 2025
2227b67
fixed issue by changing to jsonl
varshinivij Aug 7, 2025
c418fd2
experimenting around with wikipedia lib
varshinivij Aug 12, 2025
c430373
fixeS?
varshinivij Aug 12, 2025
8a0e951
added umap with new prompts
varshinivij Aug 12, 2025
711518f
fixed umap
varshinivij Aug 12, 2025
637b0a5
reverted back to the technique with urls
varshinivij Aug 12, 2025
8e9882f
fixed error
varshinivij Aug 12, 2025
db87dcb
added some visualization in umap and heatmap
varshinivij Aug 13, 2025
a5ff28c
added fixes to rag file
varshinivij Aug 13, 2025
5c364a0
file finalized for 2day
varshinivij Aug 13, 2025
225bdfc
added variations to chunking
varshinivij Aug 15, 2025
36e7b49
created a series of images to test
varshinivij Aug 15, 2025
cf96dd5
varying wiki +description contents
varshinivij Aug 15, 2025
a7ce5cf
diff sizes of wiki page
varshinivij Aug 15, 2025
f8e9c84
more variations
varshinivij Aug 15, 2025
5741d6a
trying with a bigger embedding model
varshinivij Aug 16, 2025
5b06c44
trying wiki api
varshinivij Aug 17, 2025
dc672f3
working implementation -not helpful
varshinivij Aug 17, 2025
46980a5
trying to aggresively remove stuff from the wiki result
varshinivij Aug 17, 2025
db72f9c
trying to aggresively remove more
varshinivij Aug 17, 2025
741987f
fixing bugs
varshinivij Aug 17, 2025
d803887
trying to make a new ver work
varshinivij Aug 17, 2025
4922b9c
small syntax error
varshinivij Aug 17, 2025
b7c3337
draft
varshinivij Aug 17, 2025
c939dbc
switched out wiki lib for beautiful soup extraction + switched model
varshinivij Aug 17, 2025
5b8d9b8
working version yayy
varshinivij Aug 17, 2025
948fd68
rag file
varshinivij Aug 18, 2025
ce11610
changes
varshinivij Aug 18, 2025
aa53ab9
working version of the rag class system
varshinivij Aug 18, 2025
8e4413a
new file for user purposes!
varshinivij Aug 18, 2025
7d42210
changes to user and skeleton ver
varshinivij Aug 18, 2025
551143a
added a new function to extract wiki content
varshinivij Aug 18, 2025
b7bf85f
attempt to fix
varshinivij Aug 18, 2025
08e2c39
fixes and clean up
varshinivij Aug 18, 2025
6f21790
fixes and clean ups ongoing
varshinivij Aug 18, 2025
9dec5cc
file improvisations with request failsafe
varshinivij Aug 18, 2025
b5f8191
resolved critical errors
varshinivij Aug 18, 2025
ff4e569
testing
varshinivij Aug 18, 2025
a0b16c2
new fixes
varshinivij Aug 18, 2025
794a350
removed embedding and functions file
varshinivij Aug 18, 2025
1241348
fixing shennanigans
varshinivij Aug 18, 2025
11300ee
moved files
varshinivij Aug 18, 2025
7385d06
changes to file in type annotations
varshinivij Aug 18, 2025
d0f0e4b
made more fixes to rag class by adding 1 function for the entire pipe…
varshinivij Aug 18, 2025
a2f3a0c
took back some fixes that introduced errors
varshinivij Aug 18, 2025
1a47544
changed text to string type in func_Def
varshinivij Aug 18, 2025
e1fd82e
regex for improvements to extracting html
varshinivij Aug 18, 2025
74ea614
introduced more aggressive regex for cleaning func def
varshinivij Aug 18, 2025
1a0d504
added support for dict objects
varshinivij Aug 18, 2025
be19d84
type annotations changed + dict incorporated
varshinivij Aug 18, 2025
7c687ba
errors fixed
varshinivij Aug 18, 2025
4a0db06
quick fixes to rag file
varshinivij Aug 18, 2025
38b78ad
more changes for error correction
varshinivij Aug 18, 2025
3555e4b
deleted folder from wrong location
varshinivij Aug 19, 2025
cf270e1
moved locations for rag folder
varshinivij Aug 19, 2025
5c8d67c
improper folder placement
varshinivij Aug 19, 2025
b5f5cd7
moved locations
varshinivij Aug 19, 2025
b445e22
making changes to the runner.py file system
varshinivij Aug 19, 2025
1f395d5
moved rag
varshinivij Aug 19, 2025
e0d0bc9
rag + changes to runner
varshinivij Aug 19, 2025
cdb6400
one working version of implementation of rag - using agents
varshinivij Aug 19, 2025
841373f
dylan's proposed version of the implementation
varshinivij Aug 19, 2025
86f1239
attempts at rag implementation
varshinivij Aug 19, 2025
eff1b47
Merge branch 'main' into AddingRetrievalAugmentedGeneration
varshinivij Aug 19, 2025
3a12eef
Added rag support to agent system
djriffle Aug 19, 2025
e1f157c
query function from database by function signature search
varshinivij Aug 20, 2025
cb48c6e
changed function definition search to function signature search
varshinivij Aug 20, 2025
8cfcdc7
fixed rag implementation
varshinivij Aug 20, 2025
f362e8e
may have fixed import, need to consult and fix file
varshinivij Aug 20, 2025
d59c341
potentially error fix
varshinivij Aug 20, 2025
c6b056a
fixed imports
djriffle Aug 20, 2025
1a22185
working with dylans location of rag folder
varshinivij Aug 21, 2025
2ebc549
userrag file deemed unnecessary and deleted
varshinivij Aug 21, 2025
b0290c2
change to file names and locations - more clean up, fixed imports
varshinivij Aug 21, 2025
44f0629
user rag file changed
varshinivij Aug 21, 2025
8b5ea20
Update system_blueprint.json
varshinivij Aug 21, 2025
75ff4a9
changes to imports and file structures
varshinivij Aug 21, 2025
ed61005
Merge remote-tracking branch 'origin/AddingRetrievalAugmentedGenerati…
varshinivij Aug 21, 2025
8bca72a
finally fixed import situation
varshinivij Aug 21, 2025
1888051
tested code
varshinivij Aug 21, 2025
f89b899
trivial errors
varshinivij Aug 21, 2025
b1a5fba
trivial errors
varshinivij Aug 21, 2025
e0dc7c3
Update RetrievalAugmentedGeneration.py
varshinivij Aug 22, 2025
9850afe
Fixed rag implementation
djriffle Aug 22, 2025
2fa2c0a
small UX fixes
djriffle Aug 22, 2025
5d695e9
added in new embeddings
varshinivij Aug 23, 2025
e541a19
Merge branch 'main' into AddingRetrievalAugmentedGeneration
varshinivij Aug 23, 2025
fb50daf
Merge remote-tracking branch 'refs/remotes/origin/AddingRetrievalAugm…
varshinivij Aug 23, 2025
abe5960
embeddings and functions file created - however the search results fr…
varshinivij Aug 23, 2025
559dcd8
fixed embedding file structure
varshinivij Aug 23, 2025
ceda693
need to fix wikipedia
varshinivij Aug 23, 2025
88dcc35
restructured embeddings.jsonl to signature and embedding + added in q…
varshinivij Aug 23, 2025
b3d6913
Synced New Embeddings and Functions
djriffle Aug 28, 2025
243ff16
merge commit
varshinivij Sep 1, 2025
5527623
adding from remote
varshinivij Sep 1, 2025
88df487
Merge branch 'main' into AddingRetrievalAugmentedGeneration
varshinivij Sep 1, 2025
374fe23
Fixed implementation to include function name mismatch
varshinivij Sep 1, 2025
e225655
made changes to runner.py to allow comprehensive function signature d…
varshinivij Sep 1, 2025
2e0dc59
changes made to rag model
varshinivij Sep 2, 2025
a430147
Merge branch 'main' into AddingRetrievalAugmentedGeneration
varshinivij Sep 2, 2025
1fc155e
added functions and embeddings.jsonl
varshinivij Sep 2, 2025
f04066e
Update functions.jsonl
varshinivij Sep 3, 2025
93fb46b
Update functions.jsonl
varshinivij Sep 3, 2025
685166b
Update functions.jsonl
varshinivij Sep 3, 2025
538807c
embeddings and functions fixed
varshinivij Sep 3, 2025
54d7a63
added function to embed from functions.jsonl
varshinivij Sep 3, 2025
f2db568
fixed functions.jsonl
varshinivij Sep 3, 2025
ebd3699
fixed functions.jsonl
varshinivij Sep 3, 2025
3a30465
fixed incorrect version of embeddings with new embedding content
varshinivij Sep 3, 2025
33b37b9
Merge remote-tracking branch 'refs/remotes/origin/AddingRetrievalAugm…
varshinivij Sep 3, 2025
269dcaa
embeddings fixed in both folders
varshinivij Sep 3, 2025
4f0ff7e
Update RetrievalAugmentedGeneration.py
varshinivij Sep 6, 2025
7156a0e
Update integration_system.json
varshinivij Sep 6, 2025
79f62f7
Update AgentSystem.py
varshinivij Sep 6, 2025
5d669bc
Update integration_system.json
varshinivij Sep 6, 2025
3cb595d
Update runner.py
varshinivij Sep 6, 2025
9682e88
Update RetrievalAugmentedGeneration.py
varshinivij Sep 6, 2025
d38c6d2
Update AgentSystem.py
varshinivij Sep 6, 2025
811b5f4
Update RetrievalAugmentedGeneration.py
varshinivij Sep 6, 2025
76d4710
light cleanup
djriffle Sep 9, 2025
df02a7e
Fixed imports to improve cli speed
djriffle Sep 9, 2025
@@ -172,7 +172,11 @@ def embedding_pipeline(self, url:str) -> None:
            self.add_function({"signature": func_definition, "embedding": embedding_content})
        else:
            console.log(f"[yellow] Embedding for url {url} exists.")


    def embedding_pipeline_functions(self):
        for i in range(len(self.functions)):
            embedding_content = self.functions[i]["embedding"]
            self.add_embedding(embedding_content)

    @staticmethod
    def cosine_similarity(A: np.ndarray, B: List[np.ndarray]) -> List[float]:
@@ -277,58 +281,10 @@ def clear(self) -> None:

if __name__ == "__main__":
    rag = RetrievalAugmentedEmbedder()
    urls = [
        "https://docs.scvi-tools.org/en/stable/api/reference/scvi.model.SCVI.html",
        "https://docs.scvi-tools.org/en/stable/api/reference/scvi.model.SCANVI.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.calculate_qc_metrics.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.filter_cells.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.filter_genes.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.normalize_total.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.log1p.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.highly_variable_genes.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.regress_out.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.scale.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.pca.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.normalize_pearson_residuals.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.flag_gene_family.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.filter_highly_variable.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.harmony_integrate.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.scrublet.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.scrublet_simulate_doublets.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.neighbors.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.pp.bbknn.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.tl.umap.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.tl.tsne.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.tl.diffmap.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.tl.draw_graph.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.tl.mde.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.tl.embedding_density.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.tl.louvain.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.tl.leiden.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.tl.kmeans.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.tl.score_genes.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.tl.score_genes_cell_cycle.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.tl.rank_genes_groups_logreg.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.get.aggregate.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.get.anndata_to_GPU.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.get.anndata_to_CPU.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.get.X_to_GPU.html",
        "https://rapids-singlecell.readthedocs.io/en/latest/api/generated/rapids_singlecell.get.X_to_CPU.html",

        # CellTypist
        "https://celltypist.readthedocs.io/en/latest/celltypist.train.html",
        "https://celltypist.readthedocs.io/en/latest/celltypist.annotate.html#celltypist.annotate",
        "https://celltypist.readthedocs.io/en/latest/celltypist.dotplot.html",
        "https://celltypist.readthedocs.io/en/latest/celltypist.models.download_models.html",
        "https://celltypist.readthedocs.io/en/latest/celltypist.samples.downsample_adata.html",
        "https://celltypist.readthedocs.io/en/latest/celltypist.classifier.AnnotationResult.html",
        "https://celltypist.readthedocs.io/en/latest/celltypist.classifier.Classifier.html",
        "https://celltypist.readthedocs.io/en/latest/celltypist.models.Model.html",
    ]

    for url in urls:
        rag.embedding_pipeline(url)
    print(rag.query("What is pca"))
    #rag.embedding_pipeline_functions()
    print(rag.query("Find a function to download the model"))
    print(rag.query("AttributeError: module 'celltypist.models' has no attribute 'download_model'"))
    rag.cosine_distance_heatmap()
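
Note for readers following the `query` calls above: the diff shows only the signature of `cosine_similarity`, not its body. The sketch below is one plausible way a signature lookup by cosine similarity could work. It is not the PR's implementation. It uses plain Python lists rather than `np.ndarray` so that it runs without dependencies. `top_match`, the sample signatures, and the toy 2-D vectors are hypothetical, introduced only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(query: Sequence[float], candidates: List[Sequence[float]]) -> List[float]:
    """Return the cosine similarity between `query` and each vector in `candidates`."""
    query_norm = math.sqrt(sum(x * x for x in query))
    scores: List[float] = []
    for vec in candidates:
        vec_norm = math.sqrt(sum(x * x for x in vec))
        if query_norm == 0.0 or vec_norm == 0.0:
            # Cosine similarity is undefined for a zero vector; report 0.0 here.
            scores.append(0.0)
            continue
        dot = sum(q * v for q, v in zip(query, vec))
        scores.append(dot / (query_norm * vec_norm))
    return scores


def top_match(
    query: Sequence[float],
    signatures: List[str],
    embeddings: List[Sequence[float]],
) -> Tuple[str, float]:
    """Hypothetical helper: return the stored signature closest to `query`.

    Assumes `signatures` and `embeddings` are parallel, non-empty lists.
    """
    scores = cosine_similarity(query, embeddings)
    best = max(range(len(scores)), key=scores.__getitem__)
    return signatures[best], scores[best]


if __name__ == "__main__":
    # Toy data standing in for records like {"signature": ..., "embedding": ...}.
    signatures = ["pca(adata)", "download_models()"]
    embeddings = [[1.0, 0.0], [0.0, 1.0]]
    print(top_match([0.9, 0.1], signatures, embeddings))
```

In this sketch a zero-length vector scores 0.0 instead of raising, which keeps a single malformed embedding from aborting a whole query. Whether the PR's code handles that case the same way is not visible in the diff.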



44 changes: 44 additions & 0 deletions cli/extra_tools/RAG/embeddings.jsonl

Large diffs are not rendered by default.
