diff --git a/assets/med_data_k_graph.png b/assets/med_data_k_graph.png deleted file mode 100644 index babb781..0000000 Binary files a/assets/med_data_k_graph.png and /dev/null differ diff --git a/concepts/cloud-architecture.mdx b/concepts/cloud-architecture.mdx index c15a31d..cf0f331 100644 --- a/concepts/cloud-architecture.mdx +++ b/concepts/cloud-architecture.mdx @@ -15,7 +15,7 @@ This separation keeps your application management in the UI while all document d | Component | Primary role | Typical hosting | | --- | --- | --- | | Cloud UI | Auth, orgs, billing, app metadata, dashboards | Vercel (or your web host) | -| Morphik Core | Ingestion, storage, retrieval, search, graphs, chat | EC2 or Kubernetes | +| Morphik Core | Ingestion, storage, retrieval, search, chat | EC2 or Kubernetes | | Embedding GPU (optional) | Multimodal embeddings (ColPali API mode) | Lambda GPU, on-prem GPU | | Postgres + pgvector | Documents, embeddings, app isolation | Neon or any Postgres | | Object storage | Raw files and chunk payloads | S3 or local disk | @@ -102,4 +102,3 @@ Agent mode runs in a server route (Cloud UI) so it can call your LLM provider se - The UI calls `/api/agent/chat` on the Cloud UI. - The server route calls Morphik Core for retrieval (using the app token). - The server route streams the LLM response back to the browser. - diff --git a/concepts/colpali.mdx b/concepts/colpali.mdx index a79d638..71f1108 100644 --- a/concepts/colpali.mdx +++ b/concepts/colpali.mdx @@ -5,7 +5,7 @@ description: 'Using Late-interaction and Contrastive learning to achieve state-o ## Introduction -Upto now, we've seen RAG techniques that **i)** parse a given document, **ii)** convert it to text, and **iii)** embed the text for retrieval. These techniques have been particularly text-heavy. Embedding models expect text in, knowledge graphs expect text in, and parsers break down when provided with documents that aren't text-dominant. 
This motivates the question: +Upto now, we've seen RAG techniques that **i)** parse a given document, **ii)** convert it to text, and **iii)** embed the text for retrieval. These techniques have been particularly text-heavy. Embedding models expect text in, and parsers break down when provided with documents that aren't text-dominant. This motivates the question: > When was the last time you looked at a document and only saw text? @@ -57,7 +57,7 @@ from morphik import Morphik db = Morphik("YOUR-URI-HERE") -db.ingest_file("report_with_images_and_graphs.pdf", use_colpali=True) +db.ingest_file("report_with_images_and_charts.pdf", use_colpali=True) ``` Here is an example query pathway: @@ -109,5 +109,3 @@ If you're experiencing context limit issues with image-based retrieval, it may b - - diff --git a/concepts/knowledge-graphs.mdx b/concepts/knowledge-graphs.mdx deleted file mode 100644 index 374da16..0000000 --- a/concepts/knowledge-graphs.mdx +++ /dev/null @@ -1,492 +0,0 @@ ---- -title: 'Knowledge Graphs and Graph RAG' -description: 'Leveraging graph-based relationships for improved context and retrieval in RAG systems' ---- - -## Introduction -Traditional Retrieval-Augmented Generation (RAG) systems typically use vector-based similarity searches to find relevant documents. While effective for straightforward queries, vector searches often struggle with more nuanced information needs that involve understanding connections between entities dispersed across multiple documents. - -That's where knowledge graphs come into play. Unlike traditional vector-based approaches, knowledge graphs explicitly capture entities and their relationships, uncovering connections that otherwise might be missed. - -Consider three simple documents: - -1. "Elon Musk is the CEO of SpaceX." -2. "Starship is a spacecraft developed by SpaceX, designed for missions to Mars." -3. "Tesla produces electric vehicles, and Elon Musk serves as its CEO." 
- -If a user queries, "Who leads the companies involved in Mars exploration, and what other companies does this individual lead?", a traditional vector search might only identify the second document about mars, and space, potentially overlooking Elon Musk's relationship with Tesla and SpaceX. In contrast, a knowledge graph explicitly represents these interconnected relationships, providing a comprehensive, context-rich answer by traversing connections across all three documents. Let's dig into why and how? - -## Core Concepts - -### What is a Knowledge Graph? - -A knowledge graph is a structured representation of information that consists of: - -- **Entities**: Distinct objects, concepts, or things (e.g., people, organizations, products, technologies) -- **Relationships**: Connections between entities that describe how they relate to each other -- **Properties** (optional): Additional attributes that describe entities or relationships - -The example we will build will finally look like: - -![Knowledge Graph Visualization](/assets/med_data_k_graph.png) - -## Implementation in Morphik - -Morphik's knowledge graph implementation is built on several core components: - -### Entity and Relationship Extraction - -When you create a knowledge graph, Morphik processes your documents to extract entities and relationships. Entities and relationships are extracted for every chunk of the documents requested for creation. 
This is implemented in the `GraphService` class: - -```python -async def extract_entities_from_text(self, content: str, doc_id: str, chunk_number: int) -> Tuple[List[Entity], List[Relationship]]: - """Extract entities and relationships from text content using the LLM.""" - # Process content using a language model - # Returns structured data with entities and relationships -``` - -The system uses language models to identify entities and relationships between entities of various types: - -- People (e.g., "Sam Altman") -- Organizations (e.g., "OpenAI") -- Locations (e.g., "San Francisco") -- Technologies (e.g., "Machine Learning") -- Concepts (e.g., "Retrieval Augmented Generation") -- Products (e.g., "GPT-4") -- Events (e.g., "AI Conference 2025") -- And more... - -### Entity Resolution - -One challenge with extracting entities from text is that the same entity might be referenced in different ways. For example, "Sam Altman", "Samuel H. Altman", and "OpenAI CEO" might all refer to the same person. - -Morphik addresses this with entity resolution, implemented in the `EntityResolver` class: - -```python -async def resolve_entities(self, entities: List[Entity]) -> Tuple[List[Entity], Dict[str, str]]: - """Resolves entities by identifying and grouping similar entities.""" - # Use LLM to identify variants of the same entity - # Create mapping from variants to canonical forms - # Merge properties of duplicate entities -``` - -This ensures that the knowledge graph accurately represents unique entities and their relationships, even when they're referenced inconsistently across documents. - -### Custom Prompts and Examples - -Knowledge graphs in Morphik can be enhanced with custom prompts and examples to improve entity extraction and resolution: - -- **Entity Extraction Customization**: You can provide examples and custom prompt templates to guide how entities and relationships are identified, making the extraction process more domain-specific. 
-- **Entity Resolution Customization**: You can provide examples of how entity variants should be recognized and merged, helping the system correctly identify when different terms refer to the same entity. - -This customization allows for more precise knowledge graph creation, ensuring that the extracted entities and relationships align with your specific information needs and domain terminology. - -Here's an example of creating a knowledge graph with custom prompts and examples: - -```python -from morphik import Morphik -from morphik.models import ( - EntityExtractionExample, - EntityResolutionExample, - EntityExtractionPromptOverride, - EntityResolutionPromptOverride, - GraphPromptOverrides -) - -# Connect to Morphik -db = Morphik() - -# Create a knowledge graph with custom entity extraction and resolution -graph = db.create_graph( - name="medical_knowledge_graph", - filters={"domain": "medical"}, - prompt_overrides=GraphPromptOverrides( - # Customize entity extraction - entity_extraction=EntityExtractionPromptOverride( - # Custom examples to guide entity extraction toward medical domain - examples=[ - EntityExtractionExample( - label="Type 2 Diabetes", - type="CONDITION" - ), - EntityExtractionExample( - label="Metformin", - type="MEDICATION", - properties={"class": "biguanide"} - ), - EntityExtractionExample( - label="Cardiovascular Disease", - type="CONDITION" - ) - ], - # Optional custom prompt template - prompt_template=( - "Extract medical entities and relationships from the following text.\n" - "Focus on conditions, medications, treatments, and healthcare providers.\n" - "{examples}\n\n" - "Text to analyze:\n{content}\n\n" - "Return your analysis as JSON with 'entities' and 'relationships' arrays." 
- ) - ), - # Customize entity resolution - entity_resolution=EntityResolutionPromptOverride( - # Examples of how entity variants should be resolved - examples=[ - EntityResolutionExample( - canonical="Type 2 Diabetes", - variants=["T2DM", "type 2 diabetes", "Type II Diabetes"] - ), - EntityResolutionExample( - canonical="Metformin", - variants=["Glucophage", "metformin hydrochloride"] - ) - ] - ) - ) -) - -# Query the graph with relationship paths -response = db.query( - "What medications are used to treat diabetes and what complications are associated with it?", - graph_name="medical_knowledge_graph", - hop_depth=2, # Follow connections up to 2 relationships away - include_paths=True # Include relationship paths in response -) -``` - -In this example, we provide custom guidance for both entity extraction and entity resolution: - -1. **Entity Extraction**: We provide examples of medical entities to extract (conditions and medications) and a custom prompt template that focuses the extraction on medical terminology. - -2. **Entity Resolution**: We provide examples of how different terms for the same medical concept should be merged (e.g., "T2DM" and "Type II Diabetes" should be treated as "Type 2 Diabetes"). - -This approach is particularly valuable for specialized domains with complex terminology where the default extraction might miss important entities or relationships. 
- -### Graph Construction and Storage - -The extracted entities and relationships are stored in a graph structure: - -```python -class Entity(BaseModel): - """Represents an entity in a knowledge graph""" - id: str - label: str - type: str - properties: Dict[str, Any] - document_ids: List[str] - chunk_sources: Dict[str, List[int]] - -class Relationship(BaseModel): - """Represents a relationship between entities in a knowledge graph""" - id: str - source_id: str - target_id: str - type: str - document_ids: List[str] - chunk_sources: Dict[str, List[int]] - -class Graph(BaseModel): - """Represents a knowledge graph""" - id: str - name: str - entities: List[Entity] - relationships: List[Relationship] - metadata: Dict[str, Any] - document_ids: List[str] - filters: Optional[Dict[str, Any]] - # Additional fields... -``` - -Each entity and relationship maintains references to the documents and chunks where they were found, enabling the system to retrieve the original context when needed. - -### Graph-Enhanced Retrieval - -When querying with a knowledge graph, Morphik enhances the retrieval process: - -```mermaid -sequenceDiagram - participant User - participant System - participant VectorSearch - participant GraphProcessor - participant LLM - - User->>System: Query - System->>VectorSearch: Standard vector search - System->>GraphProcessor: Extract entities from query - GraphProcessor->>GraphProcessor: Find matching entities in graph - GraphProcessor->>GraphProcessor: Expand to related entities - GraphProcessor->>System: Return graph-based matches - System->>System: Combine vector and graph results - System->>LLM: Generate completion with enhanced context - LLM->>User: Response -``` - -This process is implemented in the `query_with_graph` method: - -```python -async def query_with_graph( - self, - query: str, - graph_name: str, - auth: AuthContext, - document_service, - filters: Optional[Dict[str, Any]] = None, - k: int = 20, - hop_depth: int = 1, - include_paths: bool = 
False, - # Other parameters... -) -> CompletionResponse: - """Generate completion using knowledge graph-enhanced retrieval.""" - # Extract entities from query - # Find similar entities in graph - # Expand to related entities (up to hop_depth) - # Retrieve chunks containing these entities - # Combine with vector search results - # Generate completion with enhanced context -``` - -The `hop_depth` parameter controls how far to traverse the graph from the initial entities, allowing you to balance between focused and comprehensive retrieval. - -## Using Knowledge Graphs in Morphik - -### Creating a Knowledge Graph - -You can create a knowledge graph from your documents using the Python SDK or the UI component. I'll show the SDK below, UI should be simpler. - -```python -from morphik import Morphik - -# Connect to Morphik -db = Morphik() - -# Create a knowledge graph from all documents with category "tech" -graph = db.create_graph( - name="tech_knowledge_graph", - filters={"category": "tech"} -) - -# Alternatively, create from specific documents -graph = db.create_graph( - name="project_knowledge_graph", - documents=["doc_id_1", "doc_id_2", "doc_id_3"] -) -``` - -Behind the scenes, Morphik: -1. Retrieves the matching documents -2. Processes each document to extract entities and relationships -3. Performs entity resolution to eliminate duplicates -4. 
Constructs the graph and saves it - -### Querying with a Knowledge Graph - -Once you've created a knowledge graph, you can use it to enhance your queries: - -```python -# Basic query with a knowledge graph -response = db.query( - "How is AI technology being used in healthcare?", - graph_name="tech_knowledge_graph" -) - -# Advanced query with custom hop depth and path information -response_with_paths = db.query( - "What technologies are used for analyzing electronic health records?", - graph_name="tech_knowledge_graph", - hop_depth=2, # Consider connections up to 2 hops away - include_paths=True # Include explanation of entity relationships -) - -# If path information is included, it will be in the response metadata -if response_with_paths.metadata and "graph" in response_with_paths.metadata: - print("\nGraph paths found:") - for path in response_with_paths.metadata["graph"]["paths"]: - print(" -> ".join(path)) -``` - -The `hop_depth` parameter determines how far to traverse the graph from the initial entities found in the query. A higher value casts a wider net but may include less relevant information. - -When `include_paths=True`, the response includes the paths through the graph that led to the retrieved documents, providing explainability for why certain information was included. 
- -## Example: Building a Healthcare Knowledge Graph - -Let's walk through a complete example of using knowledge graphs for a healthcare application: - -```python -import os -from morphik import Morphik - -# Connect to Morphik -db = Morphik() - -# Ingest healthcare documents -doc1 = db.ingest_file("medical_research.pdf", metadata={"domain": "healthcare", "type": "research"}) -doc2 = db.ingest_file("patient_data.pdf", metadata={"domain": "healthcare", "type": "clinical"}) -doc3 = db.ingest_file("treatment_protocols.pdf", metadata={"domain": "healthcare", "type": "protocol"}) - -# Create a healthcare knowledge graph -graph = db.create_graph( - name="healthcare_knowledge_graph", - filters={"domain": "healthcare"} -) - -print(f"Created graph with {len(graph.entities)} entities and {len(graph.relationships)} relationships") - -# Query using the knowledge graph -response = db.query( - "What treatments are effective for patients with diabetes and hypertension?", - graph_name="healthcare_knowledge_graph", - hop_depth=2, - include_paths=True -) - -print("\nResponse:") -print(response.completion) - -# Display relationship paths that informed the response -if response.metadata and "graph" in response.metadata: - print("\nEvidence paths:") - for path in response.metadata["graph"]["paths"]: - print(" -> ".join(path)) -``` - -In this example, the knowledge graph might identify entities like: -- Conditions: "Diabetes", "Hypertension" -- Treatments: "Insulin", "ACE inhibitors", "Lifestyle modifications" -- Outcomes: "Blood sugar control", "Blood pressure reduction" - -And relationships like: -- "Insulin" -> "treats" -> "Diabetes" -- "ACE inhibitors" -> "treats" -> "Hypertension" -- "Diabetes" -> "comorbid with" -> "Hypertension" -- "Lifestyle modifications" -> "improves" -> "Blood sugar control" -- "Lifestyle modifications" -> "improves" -> "Blood pressure reduction" - -The graph traversal might find that "Lifestyle modifications" is effective for both conditions, even if that 
connection wasn't explicitly stated in a single document. - -## Graph Visualization - -When working with knowledge graphs, visualization can provide valuable insights into the structure and connections within your data. - -![Knowledge Graph Visualization](/assets/med_data_k_graph.png) - - -### Updating Existing Graphs - -As your document collection grows, you can update existing graphs: - -```python -# Add new documents to an existing graph -updated_graph = db.update_graph( - name="tech_knowledge_graph", - additional_documents=["new_doc_id_1", "new_doc_id_2"] -) - -# Or add documents matching new filters -updated_graph = db.update_graph( - name="tech_knowledge_graph", - additional_filters={"source": "research_papers"} -) -``` - -Without any arguments, the function will check if something for the filter has been updated and if so will add the docs to the graph. - -## Even more Implementation Details (for the nerds) - -### Graph Traversal Algorithm - -The core of knowledge graph querying is the entity expansion algorithm, which traverses the graph to find related entities: - -```python -def _expand_entities(self, graph: Graph, seed_entities: List[Entity], hop_depth: int) -> List[Entity]: - """Expand entities by traversing relationships with improved connectivity.""" - if hop_depth <= 1: - return seed_entities - - # Create a set of entity IDs we've seen - seen_entity_ids = {entity.id for entity in seed_entities} - all_entities = list(seed_entities) - - # Create indices for efficient lookup - entity_map = {entity.id: entity for entity in graph.entities} - relationship_index = self._build_relationship_index(graph.relationships) - - # For each hop - for _ in range(hop_depth - 1): - new_entities = [] - - # For each entity we've found so far - for entity in all_entities: - # Find connected entities through relationships - connected_ids = self._get_connected_entity_ids( - relationship_index.get(entity.id, []), entity.id, seen_entity_ids - ) - - # Add new connected entities 
- for entity_id in connected_ids: - if target_entity := entity_map.get(entity_id): - new_entities.append(target_entity) - seen_entity_ids.add(entity_id) - - # Add new entities to our list - all_entities.extend(new_entities) - - # Stop if no new entities found - if not new_entities: - break - - return all_entities -``` - -This algorithm efficiently expands from the initial entities found in the query to related entities, gathering relevant context for the retrieval process. - -### Entity Resolution - -The entity resolution process is crucial for maintaining a clean, accurate knowledge graph: - -```python -async def _resolve_with_llm(self, entity_labels: List[str]) -> List[Dict[str, Any]]: - """Use LLM to resolve entities by identifying and grouping similar entities.""" - # Group similar entity labels using a language model - # Example output: - # [ - # {"canonical": "OpenAI", "variants": ["OpenAI Inc.", "OpenAI Corporation"]}, - # {"canonical": "GPT-4", "variants": ["GPT4", "GPT 4", "OpenAI GPT-4"]} - # ] -``` - -This approach allows the system to recognize different references to the same entity, improving retrieval accuracy. - -## Performance Considerations - -Knowledge graph operations involve several performance considerations: - -1. **Graph Creation Time**: Creating a graph involves processing all documents with LLMs, which can be time-consuming for large document collections. - -2. **Query Processing Overhead**: Graph-enhanced retrieval requires extra processing compared to standard vector search but often produces more comprehensive results. - -3. **Graph Size**: As the number of entities and relationships grows, memory usage increases, and traversal operations may become more expensive. 
- -Performance tips: - -- Use metadata filters to create focused graphs rather than one large graph for all documents -- Start with smaller hop depths (1 or 2) and increase only if needed -- Consider the tradeoff between processing time and retrieval quality - -## Conclusion - -Knowledge graphs in Morphik provide a powerful way to enhance retrieval by capturing and leveraging relationships between entities in your documents. By combining traditional vector search with graph-based retrieval, Morphik delivers more comprehensive and contextually relevant information for complex queries. - -Whether you're building applications in healthcare, finance, research, or any domain with complex information relationships, knowledge graphs can significantly improve the quality of information retrieval and generation. - -## Next Steps - -To get started with knowledge graphs in your Morphik applications: - -1. Review your document collection and identify domains that would benefit from relationship-aware retrieval -2. Create focused knowledge graphs for these domains -3. Experiment with different hop depths and query formulations -4. Consider including path information to understand how the system connects information - -For more advanced use cases, explore combining knowledge graphs with other Morphik features like ColPali for multi-modal retrieval. diff --git a/concepts/metadata-filtering.mdx b/concepts/metadata-filtering.mdx index 98a433a..df2886c 100644 --- a/concepts/metadata-filtering.mdx +++ b/concepts/metadata-filtering.mdx @@ -3,7 +3,7 @@ title: "Metadata Filtering" description: "Canonical reference for Morphik’s metadata filter DSL and typed comparisons." --- -Morphik lets you filter documents and chunks directly in the database using a concise JSON filter syntax. The same structure powers the REST API, Python SDK (sync + async), folder helpers, `UserScope`, and knowledge-graph builders, so you can define a filter once and reuse it everywhere. 
+Morphik lets you filter documents and chunks directly in the database using a concise JSON filter syntax. The same structure powers the REST API, Python SDK (sync + async), folder helpers, and `UserScope`, so you can define a filter once and reuse it everywhere. Prefer server-side filters over client-side post-processing. You’ll reduce bandwidth, improve performance, and keep behavior consistent between endpoints. @@ -14,7 +14,7 @@ Morphik lets you filter documents and chunks directly in the database using a co You can pass `filters` (or `document_filters`) to: - Retrieval endpoints: [`retrieve_chunks`](/python-sdk/retrieve_chunks), [`retrieve_docs`](/python-sdk/retrieve_docs), [`query`](/python-sdk/query), [`query_document`](/python-sdk/query_document) ingestion options. -- Listing/management: [`list_documents`](/python-sdk/list_documents), document/folder analytics, graph create/update, chat history, and anywhere an SDK method exposes a `filters` argument. +- Listing/management: [`list_documents`](/python-sdk/list_documents), document/folder analytics, chat history, and anywhere an SDK method exposes a `filters` argument. ## Quick Start diff --git a/concepts/naive-rag.mdx b/concepts/naive-rag.mdx index a0fe960..a2e95c0 100644 --- a/concepts/naive-rag.mdx +++ b/concepts/naive-rag.mdx @@ -20,7 +20,7 @@ Note how both answers recognized the issue correctly, but since the LLM had addi While the core concept itself is quite obvious, the complexity arises in _how_ we can effectively retrieve the correct information. In the following sections, we explain one way to effectively perform RAG based on the concept of vector embeddings and similarity search (we'll explain what these mean\!). - In reality, Morphik uses a combination of different RAG techniques to achieve the best solution. We intend to talk about each of the techniques we implement in the [concepts](/concepts/) section of our documentation. 
If you're looking for a particular RAG technique, such as [ColPali](/concepts/colpali) or [Knowledge Graphs](/concepts/knowledge-graphs), you'll find it there. In this explainer, however, we'll restrict ourselves to talk about single vector-search based retrieval. + In reality, Morphik uses a combination of different RAG techniques to achieve the best solution. In this explainer, we’ll restrict ourselves to single vector-search based retrieval. ## How does RAG work? @@ -95,4 +95,4 @@ In future articles, we'll delve deeper into specific RAG techniques, discuss opt ### Generate -## How can we implement RAG? */} \ No newline at end of file +## How can we implement RAG? */} diff --git a/cookbooks/knowledge-graphs.mdx b/cookbooks/knowledge-graphs.mdx deleted file mode 100644 index c4b638c..0000000 --- a/cookbooks/knowledge-graphs.mdx +++ /dev/null @@ -1,8 +0,0 @@ ---- -title: 'Knowledge Graph - Based RAG' -description: 'A collection of recipes to help you get started with Morphik' ---- - -## Introduction - -Morphik is a multi-modal RAG system that allows you to store, search, and retrieve data from various sources. It is built on top of the [LangChain](https://www.langchain.com/) framework and the [Chroma](https://www.chromadb.org/) vector store. 
\ No newline at end of file diff --git a/docs.json b/docs.json index 86440ea..acbc237 100644 --- a/docs.json +++ b/docs.json @@ -73,7 +73,6 @@ "concepts/user-folder-scoping", "concepts/cloud-architecture", "concepts/colpali", - "concepts/knowledge-graphs", "concepts/metadata-filtering" ] }, @@ -134,18 +133,20 @@ "tab": "Python SDK", "groups": [ { - "group": "Client", - "pages": [ - "python-sdk/morphik", - "python-sdk/close" - ] - }, + "group": "Client", + "pages": [ + "python-sdk/morphik", + "python-sdk/signin", + "python-sdk/close" + ] + }, { "group": "Document Ingestion", "pages": [ "python-sdk/ingest_text", "python-sdk/ingest_file", "python-sdk/ingest_files", + "python-sdk/ingest_directory", "python-sdk/query_document" ] }, @@ -169,6 +170,12 @@ "python-sdk/create_folder", "python-sdk/list_folders", "python-sdk/get_folder", + "python-sdk/get_folder_by_name", + "python-sdk/get_info", + "python-sdk/get_summary", + "python-sdk/upsert_summary", + "python-sdk/get_folder_summary", + "python-sdk/upsert_folder_summary", "python-sdk/get_folders_summary", "python-sdk/get_folders_details", "python-sdk/add_document_to_folder", @@ -185,7 +192,9 @@ "python-sdk/update_document_metadata", "python-sdk/update_document_by_filename_with_text", "python-sdk/update_document_by_filename_with_file", - "python-sdk/update_document_by_filename_metadata" + "python-sdk/update_document_by_filename_metadata", + "python-sdk/get_document_summary", + "python-sdk/upsert_document_summary" ] }, { @@ -195,19 +204,6 @@ "python-sdk/batch_get_chunks" ] }, - { - "group": "Knowledge Graph Operations", - "pages": [ - "python-sdk/create_graph", - "python-sdk/update_graph", - "python-sdk/get_graph", - "python-sdk/list_graphs", - "python-sdk/get_graph_visualization", - "python-sdk/get_graph_status", - "python-sdk/wait_for_graph_completion", - "python-sdk/check_workflow_status" - ] - }, { "group": "Chat & Conversation Management", "pages": [ @@ -222,15 +218,29 @@ "python-sdk/extract_document_pages", 
"python-sdk/get_document_download_url", "python-sdk/get_document_status", + "python-sdk/wait_for_document_completion", "python-sdk/delete_document", "python-sdk/delete_document_by_filename" ] }, { - "group": "Usage & Monitoring", + "group": "Apps & Tokens", + "pages": [ + "python-sdk/list_apps", + "python-sdk/create_app", + "python-sdk/generate_cloud_uri", + "python-sdk/rename_app", + "python-sdk/rotate_app_token", + "python-sdk/delete_app" + ] + }, + { + "group": "Ops & Monitoring", "pages": [ - "python-sdk/get_usage_stats", - "python-sdk/get_recent_usage", + "python-sdk/requeue_ingestion_jobs", + "python-sdk/get_logs", + "python-sdk/get_health", + "python-sdk/get_app_storage_usage", "python-sdk/ping" ] } diff --git a/images/ui-guide/knowledge-graph.png b/images/ui-guide/knowledge-graph.png deleted file mode 100644 index 63ae678..0000000 Binary files a/images/ui-guide/knowledge-graph.png and /dev/null differ diff --git a/knowledge-base/how-to-perform-search-over-documents.mdx b/knowledge-base/how-to-perform-search-over-documents.mdx index d81ee83..7e5a43e 100644 --- a/knowledge-base/how-to-perform-search-over-documents.mdx +++ b/knowledge-base/how-to-perform-search-over-documents.mdx @@ -3,7 +3,7 @@ title: 'How do I perform search over documents?' description: 'Techniques for searching document collections efficiently' --- -Effective document search relies on representing your data in a way that captures meaning. Common approaches include keyword search, vector similarity search, and semantic graph traversal. +Effective document search relies on representing your data in a way that captures meaning. Common approaches include keyword search and vector similarity search. With Morphik, you can ingest text, images, and other modalities. 
Use the `retrieve_docs` function for a simple vector similarity search or `query` to combine retrieval with language model generation: @@ -29,4 +29,4 @@ print(answer.text) **A:** Pass a `filters` dictionary when calling `retrieve_docs` or `query`, e.g. `filters={"category": "finance"}`, to restrict results to documents with matching metadata. - **Q:** When should I use `query` instead of `retrieve_docs`? - **A:** Use `query` when you need the language model to read the retrieved docs and generate a synthesized answer; use `retrieve_docs` when you only need the raw documents. \ No newline at end of file + **A:** Use `query` when you need the language model to read the retrieved docs and generate a synthesized answer; use `retrieve_docs` when you only need the raw documents. diff --git a/python-sdk/check_workflow_status.mdx b/python-sdk/check_workflow_status.mdx deleted file mode 100644 index ab58fe9..0000000 --- a/python-sdk/check_workflow_status.mdx +++ /dev/null @@ -1,37 +0,0 @@ ---- -title: "check_workflow_status" -description: "Poll the status of an asynchronous graph build/update workflow" ---- - - - - - ```python - def check_workflow_status(workflow_id: str, run_id: Optional[str] = None) -> Dict[str, Any] - ``` - - - - - ```python - async def check_workflow_status(workflow_id: str, run_id: Optional[str] = None) -> Dict[str, Any] - ``` - - - - -## Parameters - -- `workflow_id` (str): Identifier returned when the graph build/update was started. -- `run_id` (str, optional): For multi-run workflows, specify a particular run. - -## Returns - -- `Dict[str, Any]`: At minimum contains `status` (`"running"`, `"completed"` or `"failed"`). Additional keys such as `result` may be included when finished. 
- -## Example - -```python -status = db.check_workflow_status("build-update-research_graph-abc123") -print(status["status"]) -``` \ No newline at end of file diff --git a/python-sdk/create_app.mdx b/python-sdk/create_app.mdx new file mode 100644 index 0000000..0126699 --- /dev/null +++ b/python-sdk/create_app.mdx @@ -0,0 +1,57 @@ +--- +title: "create_app" +description: "Create a cloud app and return its authenticated URI" +--- + + + + ```python + def create_app( + name: str, + ) -> Dict[str, str] + ``` + + + ```python + async def create_app( + name: str, + ) -> Dict[str, str] + ``` + + + +## Parameters + +- `name` (str): App display name + +## Returns + +- `Dict[str, str]`: Response containing the authenticated URI and app metadata + +## Examples + + + + ```python + from morphik import Morphik + + db = Morphik() + resp = db.create_app(name="demo") + print(resp) + ``` + + + ```python + from morphik import AsyncMorphik + + async with AsyncMorphik() as db: + resp = await db.create_app(name="demo") + print(resp) + ``` + + + +## Notes + +- `generate_cloud_uri` is a deprecated alias for this method. +- The SDK only accepts `name` for app creation. 
diff --git a/python-sdk/create_graph.mdx b/python-sdk/create_graph.mdx deleted file mode 100644 index c701eb7..0000000 --- a/python-sdk/create_graph.mdx +++ /dev/null @@ -1,217 +0,0 @@ ---- -title: "create_graph" -description: "Create a graph from documents" ---- - - - - ```python - def create_graph( - name: str, - filters: Optional[Dict[str, Any]] = None, - documents: Optional[List[str]] = None, - prompt_overrides: Optional[Union[GraphPromptOverrides, Dict[str, Any]]] = None, - folder_name: Optional[Union[str, List[str]]] = None, - end_user_id: Optional[str] = None, - ) -> Graph - ``` - - - ```python - async def create_graph( - name: str, - filters: Optional[Dict[str, Any]] = None, - documents: Optional[List[str]] = None, - prompt_overrides: Optional[Union[GraphPromptOverrides, Dict[str, Any]]] = None, - folder_name: Optional[Union[str, List[str]]] = None, - end_user_id: Optional[str] = None, - ) -> Graph - ``` - - - -## Parameters - -- `name` (str): Name of the graph to create -- `filters` (Dict[str, Any], optional): Optional metadata filters to determine which documents to include -- `documents` (List[str], optional): Optional list of specific document IDs to include -- `prompt_overrides` (GraphPromptOverrides | Dict[str, Any], optional): Optional customizations for entity extraction and resolution prompts -- `folder_name` (str | List[str], optional): Optional folder scope (canonical path or list of paths/names) -- `end_user_id` (str, optional): Optional end-user scope - -## Returns - -Calling `create_graph` now returns a *placeholder* `Graph` immediately. - -- `graph` (Graph): Graph stub with `system_metadata["status"] = "processing"`. - Entities and relationships will be empty until processing completes. - -Use `db.wait_for_graph_completion("")` (sync) or -`await db.wait_for_graph_completion("")` (async) to block until the -graph is done, or poll `graph.is_processing` / `graph.is_completed`. 
- -## Examples - - - - ```python - from morphik import Morphik - - db = Morphik() - - # Start graph creation – returns immediately with status "processing" - graph = db.create_graph( - name="research_graph", - filters={"category": "research"}, - folder_name="/projects/alpha", - ) - - # Option 1: Block until finished - graph = db.wait_for_graph_completion("research_graph") - - # Option 2: Poll periodically - while graph.is_processing: - time.sleep(10) - graph = db.get_graph("research_graph") - print("Entities:", len(graph.entities)) - - # Create a graph from specific documents - graph = db.create_graph( - name="custom_graph", - documents=["doc1", "doc2", "doc3"] - ) - - # With custom entity extraction examples - from morphik.models import EntityExtractionPromptOverride, EntityExtractionExample, GraphPromptOverrides - - # Example with only entity extraction examples - graph = db.create_graph( - name="medical_graph", - filters={"category": "medical"}, - prompt_overrides=GraphPromptOverrides( - entity_extraction=EntityExtractionPromptOverride( - examples=[ - EntityExtractionExample(label="Insulin", type="MEDICATION"), - EntityExtractionExample(label="Diabetes", type="CONDITION") - ] - ) - ) - ) - - # Example with custom entity extraction prompt template and examples - graph = db.create_graph( - name="financial_graph", - documents=["doc1", "doc2"], - prompt_overrides=GraphPromptOverrides( - entity_extraction=EntityExtractionPromptOverride( - prompt_template="Extract financial entities from the following text:\n\n{content}\n\nFocus on these types of entities:\n{examples}\n\nReturn in JSON format.", - examples=[ - EntityExtractionExample(label="Apple Inc.", type="COMPANY", properties={"sector": "Technology"}), - EntityExtractionExample(label="Q3 2024", type="TIME_PERIOD"), - EntityExtractionExample(label="Revenue Growth", type="METRIC") - ] - ), - entity_resolution=EntityResolutionPromptOverride( - examples=[ - EntityResolutionExample( - canonical="Apple Inc.", - 
variants=["Apple", "AAPL", "Apple Computer"] - ) - ] - ) - ) - ) - - print(f"Created graph with {len(graph.entities)} entities and {len(graph.relationships)} relationships") - ``` - - - ```python - from morphik import AsyncMorphik - - async with AsyncMorphik() as db: - # Start graph creation (returns quickly) - graph = await db.create_graph( - name="research_graph", - filters={"category": "research"}, - folder_name="/projects/alpha", - ) - - # Wait for completion - graph = await db.wait_for_graph_completion("research_graph") - - print("Entities:", len(graph.entities)) - - # Create a graph from documents with category="research" - graph = await db.create_graph( - name="research_graph", - filters={"category": "research"} - ) - - # Create a graph from specific documents - graph = await db.create_graph( - name="custom_graph", - documents=["doc1", "doc2", "doc3"] - ) - - # With custom entity extraction examples - from morphik.models import EntityExtractionPromptOverride, EntityExtractionExample, GraphPromptOverrides - - # Example with only entity extraction examples - graph = await db.create_graph( - name="medical_graph", - filters={"category": "medical"}, - prompt_overrides=GraphPromptOverrides( - entity_extraction=EntityExtractionPromptOverride( - examples=[ - EntityExtractionExample(label="Insulin", type="MEDICATION"), - EntityExtractionExample(label="Diabetes", type="CONDITION") - ] - ) - ) - ) - - # Example with custom entity extraction prompt template and examples - graph = await db.create_graph( - name="financial_graph", - documents=["doc1", "doc2"], - prompt_overrides=GraphPromptOverrides( - entity_extraction=EntityExtractionPromptOverride( - prompt_template="Extract financial entities from the following text:\n\n{content}\n\nFocus on these types of entities:\n{examples}\n\nReturn in JSON format.", - examples=[ - EntityExtractionExample(label="Apple Inc.", type="COMPANY", properties={"sector": "Technology"}), - EntityExtractionExample(label="Q3 2024", 
type="TIME_PERIOD"), - EntityExtractionExample(label="Revenue Growth", type="METRIC") - ] - ), - entity_resolution=EntityResolutionPromptOverride( - examples=[ - EntityResolutionExample( - canonical="Apple Inc.", - variants=["Apple", "AAPL", "Apple Computer"] - ) - ] - ) - ) - ) - - print(f"Created graph with {len(graph.entities)} entities and {len(graph.relationships)} relationships") - ``` - - - -## Graph Properties - -The returned `Graph` object has the following properties: - -- `id` (str): Unique graph identifier -- `name` (str): Graph name -- `entities` (List[Entity]): List of entities in the graph -- `relationships` (List[Relationship]): List of relationships in the graph -- `metadata` (Dict[str, Any]): Graph metadata -- `document_ids` (List[str]): Source document IDs -- `filters` (Dict[str, Any], optional): Document filters used to create the graph -- `created_at` (datetime): Creation timestamp -- `updated_at` (datetime): Last update timestamp -- `owner` (Dict[str, str]): Graph owner information -- `folder_path` (Optional[str]): Canonical folder path for the graph (if scoped) diff --git a/python-sdk/delete_app.mdx b/python-sdk/delete_app.mdx new file mode 100644 index 0000000..5813301 --- /dev/null +++ b/python-sdk/delete_app.mdx @@ -0,0 +1,35 @@ +--- +title: "delete_app" +description: "Delete a cloud app by name" +--- + + + + ```python + def delete_app( + app_name: str, + ) -> Dict[str, Any] + ``` + + + ```python + async def delete_app( + app_name: str, + ) -> Dict[str, Any] + ``` + + + +## Parameters + +- `app_name` (str): Name of the app to delete + +## Returns + +- `Dict[str, Any]`: API response with delete status + +## Examples + +```python +db.delete_app("staging-app") +``` diff --git a/python-sdk/delete_document.mdx b/python-sdk/delete_document.mdx index 570ca42..eea6404 100644 --- a/python-sdk/delete_document.mdx +++ b/python-sdk/delete_document.mdx @@ -92,4 +92,3 @@ If you don't know the document ID, you can use other methods to find it: - For 
convenience, you can also use the [delete_document_by_filename](/python-sdk/delete_document_by_filename) method if you know the filename but not the ID. - This operation requires appropriate permissions for the document. -- Deleting a document that is part of an existing knowledge graph will not automatically update the graph. You may need to recreate or update the graph separately. \ No newline at end of file diff --git a/python-sdk/delete_document_by_filename.mdx b/python-sdk/delete_document_by_filename.mdx index 4da80a9..bad6380 100644 --- a/python-sdk/delete_document_by_filename.mdx +++ b/python-sdk/delete_document_by_filename.mdx @@ -93,4 +93,3 @@ If multiple documents have the same filename, this method will delete the most r - This operation requires appropriate permissions for the document. - If no document exists with the specified filename, a `ValueError` will be raised. -- Deleting a document that is part of an existing knowledge graph will not automatically update the graph. You may need to recreate or update the graph separately. \ No newline at end of file diff --git a/python-sdk/folders.mdx b/python-sdk/folders.mdx index 9f1667d..4577373 100644 --- a/python-sdk/folders.mdx +++ b/python-sdk/folders.mdx @@ -7,7 +7,7 @@ description: "Organize and isolate data into logical folder groups in Morphik" Folders in Morphik provide a way to organize documents into logical groups. This is particularly useful for multi-project environments where you want to maintain separation between different contexts. Documents within a folder are isolated from those in other folders, allowing for clean organization and data separation. -> ℹ️ All folder APIs accept **folder UUIDs, names, or canonical paths** (e.g., `"/projects/alpha/specs"`). Folder objects expose `full_path`, `parent_id`, `depth`, and `child_count`; documents and graphs expose `folder_path` to mirror server responses. 
+> ℹ️ All folder APIs accept **folder UUIDs, names, or canonical paths** (e.g., `"/projects/alpha/specs"`). Folder objects expose `full_path`, `parent_id`, `depth`, and `child_count`; documents expose `folder_path` to mirror server responses. ## Creating and Accessing Folders @@ -110,6 +110,20 @@ chunks = db.retrieve_chunks( Folder-scoped helpers inherit the path automatically, so `folder.retrieve_chunks(..., folder_depth=-1)` will include its children. +## Expanding Scope with Additional Folders + +Folder-scoped retrieval/list/query helpers accept `additional_folders` to include extra folders in the same request: + +```python +folder = db.get_folder("/projects/alpha") + +# Search across /projects/alpha plus shared archives +results = folder.retrieve_chunks( + "design notes", + additional_folders=["/shared", "/archive"], +) +``` + ## Folder Methods All the core document operations available on the main Morphik client are also available on folder objects, but they are automatically scoped to the specific folder: @@ -124,8 +138,9 @@ All the core document operations available on the main Morphik client are also a - `list_documents` - List all documents in this folder - `batch_get_documents` - Get multiple documents by their IDs from this folder - `batch_get_chunks` - Get specific chunks by source from this folder -- `create_graph` - Create a knowledge graph from documents in this folder -- `update_graph` - Update a knowledge graph with new documents from this folder +- `get_info` - Fetch the latest folder metadata from the API +- `get_summary` - Fetch the latest folder summary +- `upsert_summary` - Create or update the folder summary - `delete_document_by_filename` - Delete a document by filename from this folder ## Managing Existing Documents and Folders @@ -247,9 +262,6 @@ A common use case for folders is separating different projects. 
Here's an exampl # Query is scoped to just Project B documents project_b_response = project_b.query("What are the technical requirements?") - # Create project-specific knowledge graphs - project_a.create_graph("project_a_graph") - project_b.create_graph("project_b_graph") ``` @@ -277,9 +289,6 @@ A common use case for folders is separating different projects. Here's an exampl # Query is scoped to just Project B documents project_b_response = await project_b.query("What are the technical requirements?") - # Create project-specific knowledge graphs - await project_a.create_graph("project_a_graph") - await project_b.create_graph("project_b_graph") ``` diff --git a/python-sdk/generate_cloud_uri.mdx b/python-sdk/generate_cloud_uri.mdx new file mode 100644 index 0000000..f1bc7d5 --- /dev/null +++ b/python-sdk/generate_cloud_uri.mdx @@ -0,0 +1,25 @@ +--- +title: "generate_cloud_uri" +description: "Deprecated alias for create_app" +--- + +`generate_cloud_uri` is a deprecated alias for [`create_app`](./create_app). Use `create_app` for all new integrations. + +See `create_app` for security notes about which fields are honored based on the token type. 
+ + + + ```python + def generate_cloud_uri( + name: str, + ) -> Dict[str, str] + ``` + + + ```python + async def generate_cloud_uri( + name: str, + ) -> Dict[str, str] + ``` + + diff --git a/python-sdk/get_app_storage_usage.mdx b/python-sdk/get_app_storage_usage.mdx new file mode 100644 index 0000000..4b0b370 --- /dev/null +++ b/python-sdk/get_app_storage_usage.mdx @@ -0,0 +1,28 @@ +--- +title: "get_app_storage_usage" +description: "Return storage usage metrics for the authenticated app" +--- + + + + ```python + def get_app_storage_usage() -> AppStorageUsageResponse + ``` + + + ```python + async def get_app_storage_usage() -> AppStorageUsageResponse + ``` + + + +## Returns + +- `AppStorageUsageResponse`: Storage usage details (chunk, raw, multivector, total, document count) + +## Examples + +```python +usage = db.get_app_storage_usage() +print(usage.total_mb, usage.document_count) +``` diff --git a/python-sdk/get_document_summary.mdx b/python-sdk/get_document_summary.mdx new file mode 100644 index 0000000..0fea7a6 --- /dev/null +++ b/python-sdk/get_document_summary.mdx @@ -0,0 +1,52 @@ +--- +title: "get_document_summary" +description: "Fetch the stored summary for a document" +--- + + + + ```python + def get_document_summary( + document_id: str, + ) -> Summary + ``` + + + ```python + async def get_document_summary( + document_id: str, + ) -> Summary + ``` + + + +## Parameters + +- `document_id` (str): ID of the document + +## Returns + +- `Summary`: Stored summary payload for the document + +## Examples + + + + ```python + from morphik import Morphik + + db = Morphik() + summary = db.get_document_summary("doc_123") + print(summary.content) + ``` + + + ```python + from morphik import AsyncMorphik + + async with AsyncMorphik() as db: + summary = await db.get_document_summary("doc_123") + print(summary.content) + ``` + + diff --git a/python-sdk/get_folder_by_name.mdx b/python-sdk/get_folder_by_name.mdx new file mode 100644 index 0000000..87e9929 --- /dev/null +++ 
b/python-sdk/get_folder_by_name.mdx @@ -0,0 +1,52 @@ +--- +title: "get_folder_by_name" +description: "Create a folder scope from a name or path" +--- + + + + ```python + def get_folder_by_name( + name: str, + ) -> Folder + ``` + + + ```python + async def get_folder_by_name( + name: str, + ) -> Folder + ``` + + + +## Parameters + +- `name` (str): Folder name or canonical path + +## Returns + +- `Folder`: Folder scope object for subsequent operations + +## Examples + + + + ```python + from morphik import Morphik + + db = Morphik() + folder = db.get_folder_by_name("/projects/alpha") + docs = folder.list_documents() + ``` + + + ```python + from morphik import AsyncMorphik + + async with AsyncMorphik() as db: + folder = await db.get_folder_by_name("/projects/alpha") + docs = await folder.list_documents() + ``` + + diff --git a/python-sdk/get_folder_summary.mdx b/python-sdk/get_folder_summary.mdx new file mode 100644 index 0000000..b1c98de --- /dev/null +++ b/python-sdk/get_folder_summary.mdx @@ -0,0 +1,52 @@ +--- +title: "get_folder_summary" +description: "Fetch the stored summary for a folder" +--- + + + + ```python + def get_folder_summary( + folder_id_or_path: str, + ) -> Summary + ``` + + + ```python + async def get_folder_summary( + folder_id_or_path: str, + ) -> Summary + ``` + + + +## Parameters + +- `folder_id_or_path` (str): Folder identifier (UUID, name, or canonical path) + +## Returns + +- `Summary`: Stored summary payload for the folder + +## Examples + + + + ```python + from morphik import Morphik + + db = Morphik() + summary = db.get_folder_summary("/projects/alpha") + print(summary.content) + ``` + + + ```python + from morphik import AsyncMorphik + + async with AsyncMorphik() as db: + summary = await db.get_folder_summary("/projects/alpha") + print(summary.content) + ``` + + diff --git a/python-sdk/get_graph.mdx b/python-sdk/get_graph.mdx deleted file mode 100644 index 21f806b..0000000 --- a/python-sdk/get_graph.mdx +++ /dev/null @@ -1,146 +0,0 @@ ---- 
-title: "get_graph" -description: "Get a graph by name" ---- - - - - ```python - def get_graph( - name: str, - folder_name: Optional[Union[str, List[str]]] = None, - folder_depth: Optional[int] = None, - end_user_id: Optional[str] = None, - ) -> Graph - ``` - - - ```python - async def get_graph( - name: str, - folder_name: Optional[Union[str, List[str]]] = None, - folder_depth: Optional[int] = None, - end_user_id: Optional[str] = None, - ) -> Graph - ``` - - - -## Parameters - -- `name` (str): Name of the graph to retrieve -- `folder_name` (str | List[str], optional): Optional folder scope. Accepts canonical paths or a list of paths/names. -- `folder_depth` (int, optional): Folder scope depth. `None`/`0` = exact match, `-1` = include all descendants, `n > 0` = include descendants up to `n` levels deep. -- `end_user_id` (str, optional): Optional end-user scope. - -## Returns - -- `Graph`: The requested graph object. If the graph is still building it will - have `system_metadata["status"] == "processing"`. Use the convenience - helpers `graph.is_processing`, `graph.is_completed`, `graph.error`, or the - client-level `wait_for_graph_completion()` to monitor progress. 
- -## Examples - - - - ```python - from morphik import Morphik - - db = Morphik() - - # Get a graph by name - graph = db.get_graph("finance_graph") - - # Or fetch by path and include nested folders - nested_graph = db.get_graph("finance_graph", folder_name="/projects/alpha", folder_depth=-1) - - if graph.is_processing: - print("Graph still processing, waiting...") - graph = db.wait_for_graph_completion("finance_graph") - - # Now safe to access entities and relationships - print(f"Graph has {len(graph.entities)} entities and {len(graph.relationships)} relationships") - - # Access entities and relationships - for entity in graph.entities: - print(f"Entity: {entity.label} ({entity.type})") - - for relationship in graph.relationships: - source_entity = next((e for e in graph.entities if e.id == relationship.source_id), None) - target_entity = next((e for e in graph.entities if e.id == relationship.target_id), None) - if source_entity and target_entity: - print(f"Relationship: {source_entity.label} --{relationship.type}--> {target_entity.label}") - ``` - - - ```python - from morphik import AsyncMorphik - - async with AsyncMorphik() as db: - # Get a graph by name - graph = await db.get_graph("finance_graph") - - nested_graph = await db.get_graph( - "finance_graph", - folder_name="/projects/alpha", - folder_depth=-1, - ) - - if graph.is_processing: - print("Graph still processing, waiting...") - graph = await db.wait_for_graph_completion("finance_graph") - - # Now safe to access entities and relationships - print(f"Graph has {len(graph.entities)} entities and {len(graph.relationships)} relationships") - - # Access entities and relationships - for entity in graph.entities: - print(f"Entity: {entity.label} ({entity.type})") - - for relationship in graph.relationships: - source_entity = next((e for e in graph.entities if e.id == relationship.source_id), None) - target_entity = next((e for e in graph.entities if e.id == relationship.target_id), None) - if source_entity and 
target_entity: - print(f"Relationship: {source_entity.label} --{relationship.type}--> {target_entity.label}") - ``` - - - -## Graph Properties - -The returned `Graph` object has the following properties: - -- `id` (str): Unique graph identifier -- `name` (str): Graph name -- `entities` (List[Entity]): List of entities in the graph -- `relationships` (List[Relationship]): List of relationships in the graph -- `metadata` (Dict[str, Any]): Graph metadata -- `document_ids` (List[str]): Source document IDs -- `filters` (Dict[str, Any], optional): Document filters used to create the graph -- `created_at` (datetime): Creation timestamp -- `updated_at` (datetime): Last update timestamp -- `owner` (Dict[str, str]): Graph owner information -- `folder_path` (Optional[str]): Canonical folder path for the graph (if scoped) - -### Entity Properties - -Each `Entity` object has the following properties: - -- `id` (str): Unique entity identifier -- `label` (str): Display label for the entity -- `type` (str): Entity type -- `properties` (Dict[str, Any]): Entity properties -- `document_ids` (List[str]): Source document IDs -- `chunk_sources` (Dict[str, List[int]]): Source chunk numbers by document ID - -### Relationship Properties - -Each `Relationship` object has the following properties: - -- `id` (str): Unique relationship identifier -- `source_id` (str): Source entity ID -- `target_id` (str): Target entity ID -- `type` (str): Relationship type -- `document_ids` (List[str]): Source document IDs -- `chunk_sources` (Dict[str, List[int]]): Source chunk numbers by document ID diff --git a/python-sdk/get_graph_status.mdx b/python-sdk/get_graph_status.mdx deleted file mode 100644 index 10ce9d0..0000000 --- a/python-sdk/get_graph_status.mdx +++ /dev/null @@ -1,92 +0,0 @@ ---- -title: "get_graph_status" -description: "Get the current status of a graph with pipeline stage information" ---- - - - - ```python - def get_graph_status( - graph_name: str, - folder_name: Optional[Union[str, 
List[str]]] = None, - folder_depth: Optional[int] = None, - end_user_id: Optional[str] = None, - ) -> Dict[str, Any] - ``` - - - ```python - async def get_graph_status( - graph_name: str, - folder_name: Optional[Union[str, List[str]]] = None, - folder_depth: Optional[int] = None, - end_user_id: Optional[str] = None, - ) -> Dict[str, Any] - ``` - - - -## Parameters - -- `graph_name` (str): Name of the graph to check -- `folder_name` (str | List[str], optional): Optional folder scope (canonical path or list of paths/names) -- `folder_depth` (int, optional): Folder scope depth. `None`/`0` = exact match, `-1` = include all descendants, `n > 0` = include descendants up to `n` levels deep. -- `end_user_id` (str, optional): Optional end user ID for scoping - -## Returns - -- `Dict[str, Any]`: Status information containing status, pipeline_stage (if processing), and other metadata - -## Examples - - - - ```python - from morphik import Morphik - - db = Morphik() - - # Check graph status - status = db.get_graph_status("my_knowledge_graph") - - print(f"Status: {status.get('status')}") - if status.get('pipeline_stage'): - print(f"Pipeline stage: {status.get('pipeline_stage')}") - - # Check with folder scoping - status = db.get_graph_status( - graph_name="team_graph", - folder_name="/engineering/graphs", - folder_depth=-1, - ) - print(f"Graph status in folder: {status.get('status')}") - ``` - - - ```python - from morphik import AsyncMorphik - - async with AsyncMorphik() as db: - # Check graph status - status = await db.get_graph_status("my_knowledge_graph") - - print(f"Status: {status.get('status')}") - if status.get('pipeline_stage'): - print(f"Pipeline stage: {status.get('pipeline_stage')}") - - # Check with folder scoping - status = await db.get_graph_status( - graph_name="team_graph", - folder_name="/engineering/graphs", - folder_depth=-1, - ) - print(f"Graph status in folder: {status.get('status')}") - ``` - - - -## Notes - -- This is a lightweight endpoint that examines 
local status metadata and augments it with remote pipeline details when available. -- Use this to monitor graph creation or update progress. -- For polling until completion, consider using [`wait_for_graph_completion`](./wait_for_graph_completion) instead. diff --git a/python-sdk/get_graph_visualization.mdx b/python-sdk/get_graph_visualization.mdx deleted file mode 100644 index db68f4d..0000000 --- a/python-sdk/get_graph_visualization.mdx +++ /dev/null @@ -1,49 +0,0 @@ ---- -title: "get_graph_visualization" -description: "Retrieve nodes and links for visualizing a knowledge graph" ---- - - - - - ```python - def get_graph_visualization( - name: str, - folder_name: Optional[Union[str, List[str]]] = None, - folder_depth: Optional[int] = None, - end_user_id: Optional[str] = None, - ) -> Dict[str, Any] - ``` - - - - - ```python - async def get_graph_visualization( - name: str, - folder_name: Optional[Union[str, List[str]]] = None, - folder_depth: Optional[int] = None, - end_user_id: Optional[str] = None, - ) -> Dict[str, Any] - ``` - - - - -## Parameters - -- `name` (str): Graph name. -- `folder_name` (str | List[str], optional): Folder scope filter (canonical path or list of paths/names). -- `folder_depth` (int, optional): Folder scope depth. `None`/`0` = exact match, `-1` = include all descendants, `n > 0` = include descendants up to `n` levels deep. -- `end_user_id` (str, optional): End-user scope filter. - -## Returns - -- `Dict[str, Any]`: JSON with `nodes` and `links` arrays ready for d3-force or similar libraries. 
- -## Example - -```python -viz = db.get_graph_visualization("research_graph") -print(len(viz["nodes"]), "nodes", len(viz["links"]), "edges") -``` diff --git a/python-sdk/get_health.mdx b/python-sdk/get_health.mdx new file mode 100644 index 0000000..6654468 --- /dev/null +++ b/python-sdk/get_health.mdx @@ -0,0 +1,30 @@ +--- +title: "get_health" +description: "Return detailed health status for the API" +--- + + + + ```python + def get_health() -> DetailedHealthCheckResponse + ``` + + + ```python + async def get_health() -> DetailedHealthCheckResponse + ``` + + + +## Returns + +- `DetailedHealthCheckResponse`: Overall health plus per-service status details + +## Examples + +```python +health = db.get_health() +print(health.status) +for svc in health.services: + print(svc.name, svc.status) +``` diff --git a/python-sdk/get_info.mdx b/python-sdk/get_info.mdx new file mode 100644 index 0000000..15d7955 --- /dev/null +++ b/python-sdk/get_info.mdx @@ -0,0 +1,48 @@ +--- +title: "get_info" +description: "Fetch the latest folder metadata from the API" +--- + +This method is available on `Folder` objects. 
+ + + + ```python + def get_info() -> FolderInfo + ``` + + + ```python + async def get_info() -> FolderInfo + ``` + + + +## Returns + +- `FolderInfo`: Folder metadata including `full_path`, `parent_id`, `depth`, and `child_count` + +## Examples + + + + ```python + from morphik import Morphik + + db = Morphik() + folder = db.get_folder("/projects/alpha") + info = folder.get_info() + print(info.full_path, info.child_count) + ``` + + + ```python + from morphik import AsyncMorphik + + async with AsyncMorphik() as db: + folder = await db.get_folder("/projects/alpha") + info = await folder.get_info() + print(info.full_path, info.child_count) + ``` + + diff --git a/python-sdk/get_logs.mdx b/python-sdk/get_logs.mdx new file mode 100644 index 0000000..9860d5d --- /dev/null +++ b/python-sdk/get_logs.mdx @@ -0,0 +1,46 @@ +--- +title: "get_logs" +description: "Fetch recent API log events for the authenticated app" +--- + + + + ```python + def get_logs( + limit: int = 100, + hours: float = 4.0, + op_type: Optional[str] = None, + status: Optional[str] = None, + ) -> List[LogResponse] + ``` + + + ```python + async def get_logs( + limit: int = 100, + hours: float = 4.0, + op_type: Optional[str] = None, + status: Optional[str] = None, + ) -> List[LogResponse] + ``` + + + +## Parameters + +- `limit` (int, optional): Maximum number of log entries. Defaults to 100. +- `hours` (float, optional): Lookback window in hours. Defaults to 4.0. 
+- `op_type` (str, optional): Filter by operation type (for example `query`, `ingest`) +- `status` (str, optional): Filter by status (for example `ok`, `error`) + +## Returns + +- `List[LogResponse]`: Recent log entries + +## Examples + +```python +logs = db.get_logs(limit=20, hours=24) +for item in logs: + print(item.operation_type, item.status, item.timestamp) +``` diff --git a/python-sdk/get_recent_usage.mdx b/python-sdk/get_recent_usage.mdx deleted file mode 100644 index 42cf960..0000000 --- a/python-sdk/get_recent_usage.mdx +++ /dev/null @@ -1,49 +0,0 @@ ---- -title: "get_recent_usage" -description: "Retrieve a log of recent usage entries with optional filters" ---- - - - - - ```python - def get_recent_usage( - operation_type: Optional[str] = None, - since: Optional[datetime] = None, - status: Optional[str] = None, - ) -> List[Dict[str, Any]] - ``` - - - - - ```python - async def get_recent_usage( - operation_type: Optional[str] = None, - since: Optional[datetime] = None, - status: Optional[str] = None, - ) -> List[Dict[str, Any]] - ``` - - - - -## Parameters - -- `operation_type` (str, optional): Filter by operation (e.g. `"query"`, `"ingest"`). -- `since` (datetime | str, optional): Only return records newer than this timestamp (ISO string or `datetime`). -- `status` (str, optional): Filter by `"success"` or `"error"`. - -## Returns - -- `List[Dict[str, Any]]`: Each record includes `timestamp`, `operation_type`, `tokens_used`, `duration_ms`, and `status`. 
- -## Example - -```python -from datetime import datetime, timedelta - -one_hour_ago = datetime.utcnow() - timedelta(hours=1) -recent = db.get_recent_usage(since=one_hour_ago) -print(len(recent), "operations in the last hour") -``` \ No newline at end of file diff --git a/python-sdk/get_summary.mdx b/python-sdk/get_summary.mdx new file mode 100644 index 0000000..75c2dc9 --- /dev/null +++ b/python-sdk/get_summary.mdx @@ -0,0 +1,48 @@ +--- +title: "get_summary" +description: "Fetch the latest summary for a folder scope" +--- + +This method is available on `Folder` objects. + + + + ```python + def get_summary() -> Summary + ``` + + + ```python + async def get_summary() -> Summary + ``` + + + +## Returns + +- `Summary`: Stored summary payload for the folder + +## Examples + + + + ```python + from morphik import Morphik + + db = Morphik() + folder = db.get_folder("/projects/alpha") + summary = folder.get_summary() + print(summary.content) + ``` + + + ```python + from morphik import AsyncMorphik + + async with AsyncMorphik() as db: + folder = await db.get_folder("/projects/alpha") + summary = await folder.get_summary() + print(summary.content) + ``` + + diff --git a/python-sdk/get_usage_stats.mdx b/python-sdk/get_usage_stats.mdx deleted file mode 100644 index df339eb..0000000 --- a/python-sdk/get_usage_stats.mdx +++ /dev/null @@ -1,32 +0,0 @@ ---- -title: "get_usage_stats" -description: "Retrieve cumulative token usage for the current user or application" ---- - - - - - ```python - def get_usage_stats() -> Dict[str, int] - ``` - - - - - ```python - async def get_usage_stats() -> Dict[str, int] - ``` - - - - -## Returns - -- `Dict[str, int]`: Mapping of operation types (e.g. `"query"`, `"agent"`) to total tokens used. 
- -## Example - -```python -stats = db.get_usage_stats() -print(stats["query"], "tokens consumed for queries") -``` \ No newline at end of file diff --git a/python-sdk/ingest_directory.mdx b/python-sdk/ingest_directory.mdx new file mode 100644 index 0000000..147d05d --- /dev/null +++ b/python-sdk/ingest_directory.mdx @@ -0,0 +1,80 @@ +--- +title: "ingest_directory" +description: "Ingest all files in a directory" +--- + + + + ```python + def ingest_directory( + directory: Union[str, Path], + recursive: bool = False, + pattern: str = "*", + metadata: Optional[Dict[str, Any]] = None, + use_colpali: bool = True, + parallel: bool = True, + ) -> List[Document] + ``` + + + ```python + async def ingest_directory( + directory: Union[str, Path], + recursive: bool = False, + pattern: str = "*", + metadata: Optional[Dict[str, Any]] = None, + use_colpali: bool = True, + parallel: bool = True, + ) -> List[Document] + ``` + + + +## Parameters + +- `directory` (str | Path): Directory containing files to ingest +- `recursive` (bool, optional): Whether to recurse into subdirectories. Defaults to False. +- `pattern` (str, optional): Glob pattern to select files (for example `"*.pdf"`). Defaults to `"*"`. +- `metadata` (Dict[str, Any], optional): Metadata applied to each ingested file +- `use_colpali` (bool, optional): Whether to use ColPali-style embedding. Defaults to True. +- `parallel` (bool, optional): Whether to process files in parallel. Defaults to True. 
+ +## Returns + +- `List[Document]`: List of ingested document metadata + +## Examples + + + + ```python + from pathlib import Path + from morphik import Morphik + + db = Morphik() + + docs = db.ingest_directory( + Path("/data/contracts"), + recursive=True, + pattern="*.pdf", + metadata={"category": "contracts"}, + ) + print(f"Ingested {len(docs)} documents") + ``` + + + ```python + from pathlib import Path + from morphik import AsyncMorphik + + async with AsyncMorphik() as db: + docs = await db.ingest_directory( + Path("/data/contracts"), + recursive=True, + pattern="*.pdf", + metadata={"category": "contracts"}, + ) + print(f"Ingested {len(docs)} documents") + ``` + + diff --git a/python-sdk/list_apps.mdx b/python-sdk/list_apps.mdx new file mode 100644 index 0000000..f29f5f7 --- /dev/null +++ b/python-sdk/list_apps.mdx @@ -0,0 +1,67 @@ +--- +title: "list_apps" +description: "List cloud apps accessible to the current credentials" +--- + + + + ```python + def list_apps( + org_id: Optional[str] = None, + user_id: Optional[str] = None, + app_id_filter: Optional[Union[str, Dict[str, Any], List[Any]]] = None, + app_name_filter: Optional[Union[str, Dict[str, Any], List[Any]]] = None, + limit: int = 100, + offset: int = 0, + ) -> Dict[str, Any] + ``` + + + ```python + async def list_apps( + org_id: Optional[str] = None, + user_id: Optional[str] = None, + app_id_filter: Optional[Union[str, Dict[str, Any], List[Any]]] = None, + app_name_filter: Optional[Union[str, Dict[str, Any], List[Any]]] = None, + limit: int = 100, + offset: int = 0, + ) -> Dict[str, Any] + ``` + + + +## Parameters + +- `org_id` (str, optional): Filter by organization ID +- `user_id` (str, optional): Filter by user ID +- `app_id_filter` (str | dict | list, optional): JSON filter for app IDs (dict/list will be serialized) +- `app_name_filter` (str | dict | list, optional): JSON filter for app names (dict/list will be serialized) +- `limit` (int, optional): Max results per page. 
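To illustrate how `recursive` and `pattern` might combine when files are selected, here is a minimal sketch that uses only the standard library. It is not the SDK's implementation: `select_files` is a hypothetical helper, and the assumption that `ingest_directory` follows `pathlib` glob semantics is not confirmed by the docs above.

```python
from pathlib import Path
from typing import List


def select_files(directory: Path, recursive: bool = False, pattern: str = "*") -> List[Path]:
    """Hypothetical helper: approximate which files a directory ingest might pick up.

    Assumes pathlib glob semantics; the SDK's real selection logic may differ.
    """
    # rglob descends into subdirectories; glob stays at the top level.
    matches = directory.rglob(pattern) if recursive else directory.glob(pattern)
    # Keep regular files only (directories can also match "*"), sorted for a stable order.
    return sorted(p for p in matches if p.is_file())
```

Under that assumption, `recursive=False` with `pattern="*.pdf"` would only see PDFs directly inside the directory, while `recursive=True` would also pick up PDFs in nested folders.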
Defaults to 100 (clamped to 500). +- `offset` (int, optional): Pagination offset. Defaults to 0. + +## Returns + +- `Dict[str, Any]`: API response containing apps and pagination metadata + +## Examples + + + + ```python + from morphik import Morphik + + db = Morphik() + apps = db.list_apps(limit=20) + print(apps) + ``` + + + ```python + from morphik import AsyncMorphik + + async with AsyncMorphik() as db: + apps = await db.list_apps(app_name_filter={"$eq": "demo"}) + print(apps) + ``` + + diff --git a/python-sdk/list_graphs.mdx b/python-sdk/list_graphs.mdx deleted file mode 100644 index 4951f83..0000000 --- a/python-sdk/list_graphs.mdx +++ /dev/null @@ -1,102 +0,0 @@ ---- -title: "list_graphs" -description: "List all graphs the user has access to" ---- - - - - ```python - def list_graphs( - folder_name: Optional[Union[str, List[str]]] = None, - folder_depth: Optional[int] = None, - end_user_id: Optional[str] = None, - ) -> List[Graph] - ``` - - - ```python - async def list_graphs( - folder_name: Optional[Union[str, List[str]]] = None, - folder_depth: Optional[int] = None, - end_user_id: Optional[str] = None, - ) -> List[Graph] - ``` - - - -## Parameters - -- `folder_name` (str | List[str], optional): Optional folder scope. Accepts canonical paths or a list of paths/names. -- `folder_depth` (int, optional): Folder scope depth. `None`/`0` = exact match, `-1` = include all descendants, `n > 0` = include descendants up to `n` levels deep. -- `end_user_id` (str, optional): Optional end-user scope. 
- -## Returns - -- `List[Graph]`: List of graph objects - -## Examples - - - - ```python - from morphik import Morphik - - db = Morphik() - - # List all accessible graphs - graphs = db.list_graphs() - - for graph in graphs: - status = graph.status or "completed" - print( - f"Graph: {graph.name} (status={status}), " - f"Entities: {len(graph.entities)}, Relationships: {len(graph.relationships)}", - ) - - # Scope to a nested folder subtree - nested_graphs = db.list_graphs(folder_name="/projects/alpha", folder_depth=-1) - - # Find the most recent graph - latest_graph = max(graphs, key=lambda g: g.updated_at) - print(f"Most recently updated: {latest_graph.name} (updated {latest_graph.updated_at})") - ``` - - - ```python - from morphik import AsyncMorphik - - async with AsyncMorphik() as db: - # List all accessible graphs - graphs = await db.list_graphs() - - for graph in graphs: - status = graph.status or "completed" - print( - f"Graph: {graph.name} (status={status}), " - f"Entities: {len(graph.entities)}, Relationships: {len(graph.relationships)}", - ) - - nested_graphs = await db.list_graphs(folder_name="/projects/alpha", folder_depth=-1) - - # Find the most recent graph - latest_graph = max(graphs, key=lambda g: g.updated_at) - print(f"Most recently updated: {latest_graph.name} (updated {latest_graph.updated_at})") - ``` - - - -## Graph Properties - -Each `Graph` object in the returned list has the following properties: - -- `id` (str): Unique graph identifier -- `name` (str): Graph name -- `entities` (List[Entity]): List of entities in the graph -- `relationships` (List[Relationship]): List of relationships in the graph -- `metadata` (Dict[str, Any]): Graph metadata -- `document_ids` (List[str]): Source document IDs -- `filters` (Dict[str, Any], optional): Document filters used to create the graph -- `created_at` (datetime): Creation timestamp -- `updated_at` (datetime): Last update timestamp -- `owner` (Dict[str, str]): Graph owner information -- `folder_path` 
(Optional[str]): Canonical folder path for the graph (if scoped) diff --git a/python-sdk/morphik.mdx b/python-sdk/morphik.mdx index 2174654..314ca02 100644 --- a/python-sdk/morphik.mdx +++ b/python-sdk/morphik.mdx @@ -71,7 +71,7 @@ Morphik supports organizing and isolating data by user and folder. This provides -Nested folders are supported across the SDK. Use canonical paths (e.g., `"/projects/alpha/specs"`) when creating or scoping folders, and pass `folder_depth` on retrieval/list/graph helpers to include descendant folders. +Nested folders are supported across the SDK. Use canonical paths (e.g., `"/projects/alpha/specs"`) when creating or scoping folders, and pass `folder_depth` on retrieval/list helpers to include descendant folders. For detailed documentation and examples: - [Folder Management](/python-sdk/folders) - Organizing documents by logical groups @@ -84,12 +84,24 @@ Both clients share the same constructor parameters: ```python - Morphik(uri: Optional[str] = None, timeout: int = 30, is_local: bool = False) + Morphik( + uri: Optional[str] = None, + timeout: int = 30, + is_local: bool = False, + http2: Optional[bool] = None, + http2_fallback: bool = True, + ) ``` ```python - AsyncMorphik(uri: Optional[str] = None, timeout: int = 30, is_local: bool = False) + AsyncMorphik( + uri: Optional[str] = None, + timeout: int = 30, + is_local: bool = False, + http2: Optional[bool] = None, + http2_fallback: bool = True, + ) ``` @@ -99,46 +111,82 @@ Both clients share the same constructor parameters: - `uri` (str, optional): Morphik URI in format "morphik://<owner_id>:<token>@<host>". If not provided, connects to http://localhost:8000 without authentication. - `timeout` (int, optional): Request timeout in seconds. Defaults to 30. - `is_local` (bool, optional): Whether connecting to local development server. Defaults to False. +- `http2` (bool, optional): Enable HTTP/2 when possible. Defaults to None (auto-disabled for local). 
+- `http2_fallback` (bool, optional): Fall back to HTTP/1.1 if HTTP/2 fails. Defaults to True. ## Methods Morphik provides the following methods. Each method page includes both synchronous and asynchronous versions. -### Document Operations +### Document Ingestion - [ingest_text](/python-sdk/ingest_text) - [ingest_file](/python-sdk/ingest_file) - [ingest_files](/python-sdk/ingest_files) +- [ingest_directory](/python-sdk/ingest_directory) - [query_document](/python-sdk/query_document) + +### Document Retrieval - [retrieve_chunks](/python-sdk/retrieve_chunks) +- [retrieve_chunks_grouped](/python-sdk/retrieve_chunks_grouped) - [retrieve_docs](/python-sdk/retrieve_docs) - [query](/python-sdk/query) - [list_documents](/python-sdk/list_documents) +- [search_documents](/python-sdk/search_documents) - [get_document](/python-sdk/get_document) - [get_document_by_filename](/python-sdk/get_document_by_filename) + +### Document Updates & Summaries - [update_document_with_text](/python-sdk/update_document_with_text) - [update_document_with_file](/python-sdk/update_document_with_file) - [update_document_metadata](/python-sdk/update_document_metadata) - [update_document_by_filename_with_text](/python-sdk/update_document_by_filename_with_text) - [update_document_by_filename_with_file](/python-sdk/update_document_by_filename_with_file) - [update_document_by_filename_metadata](/python-sdk/update_document_by_filename_metadata) +- [get_document_summary](/python-sdk/get_document_summary) +- [upsert_document_summary](/python-sdk/upsert_document_summary) +- [get_document_status](/python-sdk/get_document_status) +- [wait_for_document_completion](/python-sdk/wait_for_document_completion) + +### Document Management +- [get_document_file](/python-sdk/get_document_file) +- [extract_document_pages](/python-sdk/extract_document_pages) +- [get_document_download_url](/python-sdk/get_document_download_url) - [delete_document](/python-sdk/delete_document) - 
[delete_document_by_filename](/python-sdk/delete_document_by_filename) - [batch_get_documents](/python-sdk/batch_get_documents) - [batch_get_chunks](/python-sdk/batch_get_chunks) -### Folder Operations +### Folder & User Scoping +- [folders](/python-sdk/folders) - [create_folder](/python-sdk/create_folder) - [list_folders](/python-sdk/list_folders) - [get_folder](/python-sdk/get_folder) +- [get_folder_by_name](/python-sdk/get_folder_by_name) +- [get_folder_summary](/python-sdk/get_folder_summary) +- [upsert_folder_summary](/python-sdk/upsert_folder_summary) +- [get_folders_summary](/python-sdk/get_folders_summary) +- [get_folders_details](/python-sdk/get_folders_details) - [add_document_to_folder](/python-sdk/add_document_to_folder) - [remove_document_from_folder](/python-sdk/remove_document_from_folder) - [delete_folder](/python-sdk/delete_folder) - -### Knowledge Graph Operations -- [create_graph](/python-sdk/create_graph) -- [get_graph](/python-sdk/get_graph) -- [list_graphs](/python-sdk/list_graphs) -- [update_graph](/python-sdk/update_graph) +- [users](/python-sdk/users) + +### Apps & Ops +- [list_apps](/python-sdk/list_apps) +- [create_app](/python-sdk/create_app) +- [generate_cloud_uri](/python-sdk/generate_cloud_uri) +- [rename_app](/python-sdk/rename_app) +- [rotate_app_token](/python-sdk/rotate_app_token) +- [delete_app](/python-sdk/delete_app) +- [requeue_ingestion_jobs](/python-sdk/requeue_ingestion_jobs) +- [get_logs](/python-sdk/get_logs) +- [get_health](/python-sdk/get_health) +- [get_app_storage_usage](/python-sdk/get_app_storage_usage) +- [ping](/python-sdk/ping) + +### Chat & Conversation +- [list_chat_conversations](/python-sdk/list_chat_conversations) +- [get_chat_history](/python-sdk/get_chat_history) ### Client Management - [close](/python-sdk/close) diff --git a/python-sdk/query.mdx b/python-sdk/query.mdx index 4d7a229..ba87195 100644 --- a/python-sdk/query.mdx +++ b/python-sdk/query.mdx @@ -19,9 +19,6 @@ Generate completion using relevant 
chunks as context. temperature: Optional[float] = None, use_colpali: bool = True, use_reranking: Optional[bool] = None, - graph_name: Optional[str] = None, - hop_depth: int = 1, - include_paths: bool = False, prompt_overrides: Optional[Union[QueryPromptOverrides, Dict[str, Any]]] = None, folder_name: Optional[Union[str, List[str]]] = None, folder_depth: Optional[int] = None, @@ -43,9 +40,6 @@ Generate completion using relevant chunks as context. temperature: Optional[float] = None, use_colpali: bool = True, use_reranking: Optional[bool] = None, - graph_name: Optional[str] = None, - hop_depth: int = 1, - include_paths: bool = False, prompt_overrides: Optional[Union[QueryPromptOverrides, Dict[str, Any]]] = None, folder_name: Optional[Union[str, List[str]]] = None, folder_depth: Optional[int] = None, @@ -68,9 +62,6 @@ Generate completion using relevant chunks as context. - `temperature` (float, optional): Model temperature - `use_colpali` (bool, optional): Whether to use ColPali-style embedding model to generate the completion (only works for documents ingested with `use_colpali=True`). Defaults to True. - `use_reranking` (bool, optional): Override workspace reranking configuration for this request. -- `graph_name` (str, optional): Optional name of the graph to use for knowledge graph-enhanced retrieval -- `hop_depth` (int, optional): Number of relationship hops to traverse in the graph (1-3). Defaults to 1. -- `include_paths` (bool, optional): Whether to include relationship paths in the response. Defaults to False. - `prompt_overrides` (QueryPromptOverrides | Dict[str, Any], optional): Optional customizations for entity extraction, resolution, and query prompts - `folder_name` (str | List[str], optional): Optional folder scope. Accepts canonical paths (e.g., `/projects/alpha/specs`) or a list of paths/names. - `folder_depth` (int, optional): Folder scope depth. 
`None`/`0` = exact match, `-1` = include all descendants, `n > 0` = include descendants up to `n` levels deep. @@ -179,54 +170,6 @@ For more advanced filtering patterns, see the [Complex Metadata Filtering cookbo -### Knowledge Graph Enhanced Query - - - - ```python - from morphik import Morphik - - db = Morphik() - - # Use a knowledge graph to enhance the query - response = db.query( - "How does product X relate to customer segment Y?", - graph_name="market_graph", - hop_depth=2, - include_paths=True - ) - - print(response.completion) - - # If include_paths=True, you can inspect the graph paths - if response.metadata and "graph" in response.metadata: - for path in response.metadata["graph"]["paths"]: - print(" -> ".join(path)) - ``` - - - ```python - from morphik import AsyncMorphik - - async with AsyncMorphik() as db: - # Knowledge graph enhanced query - kg_response = await db.query( - "How does product X relate to customer segment Y?", - graph_name="market_graph", - hop_depth=2, - include_paths=True - ) - - print(kg_response.completion) - - # If include_paths=True, you can inspect the graph paths - if kg_response.metadata and "graph" in kg_response.metadata: - for path in kg_response.metadata["graph"]["paths"]: - print(" -> ".join(path)) - ``` - - - ### With Custom Prompt Overrides @@ -271,7 +214,6 @@ For more advanced filtering patterns, see the [Complex Metadata Filtering cookbo # Example with both query and entity extraction customization response = await db.query( "How does the medication affect diabetes?", - graph_name="medical_graph", prompt_overrides=QueryPromptOverrides( # Customize how responses are generated query=QueryPromptOverride( @@ -299,7 +241,7 @@ The `CompletionResponse` object returned by this method has the following proper - `completion` (str | Dict[str, Any] | None): The generated completion text or the structured output dictionary. 
- `usage` (Dict[str, int]): Token usage information - `sources` (List[ChunkSource]): Sources of chunks used in the completion -- `metadata` (Dict[str, Any], optional): Additional metadata about the completion. When using a knowledge graph with `include_paths=True`, this contains graph traversal information. +- `metadata` (Dict[str, Any], optional): Additional metadata about the completion (if provided by the server). - `finish_reason` (Optional[str]): Reason the generation finished (e.g., 'stop', 'length') ### ChunkSource Properties diff --git a/python-sdk/rename_app.mdx b/python-sdk/rename_app.mdx new file mode 100644 index 0000000..7ee849a --- /dev/null +++ b/python-sdk/rename_app.mdx @@ -0,0 +1,41 @@ +--- +title: "rename_app" +description: "Rename a cloud app by ID or current name" +--- + + + + ```python + def rename_app( + new_name: str, + app_id: Optional[str] = None, + app_name: Optional[str] = None, + ) -> Dict[str, Any] + ``` + + + ```python + async def rename_app( + new_name: str, + app_id: Optional[str] = None, + app_name: Optional[str] = None, + ) -> Dict[str, Any] + ``` + + + +## Parameters + +- `new_name` (str): New app name +- `app_id` (str, optional): App ID to rename +- `app_name` (str, optional): Current app name to rename + +## Returns + +- `Dict[str, Any]`: API response with rename status + +## Examples + +```python +db.rename_app(new_name="prod-app", app_name="staging-app") +``` diff --git a/python-sdk/requeue_ingestion_jobs.mdx b/python-sdk/requeue_ingestion_jobs.mdx new file mode 100644 index 0000000..fe639b0 --- /dev/null +++ b/python-sdk/requeue_ingestion_jobs.mdx @@ -0,0 +1,57 @@ +--- +title: "requeue_ingestion_jobs" +description: "Requeue ingestion jobs for failed or stuck documents" +--- + + + + ```python + def requeue_ingestion_jobs( + *, + jobs: Optional[List[Union[RequeueIngestionJob, Dict[str, Any]]]] = None, + include_all: bool = False, + statuses: Optional[List[str]] = None, + limit: Optional[int] = None, + ) -> 
RequeueIngestionResponse + ``` + + + ```python + async def requeue_ingestion_jobs( + *, + jobs: Optional[List[Union[RequeueIngestionJob, Dict[str, Any]]]] = None, + include_all: bool = False, + statuses: Optional[List[str]] = None, + limit: Optional[int] = None, + ) -> RequeueIngestionResponse + ``` + + + +## Parameters + +- `jobs` (List[RequeueIngestionJob | Dict[str, Any]], optional): Specific jobs to requeue +- `include_all` (bool, optional): Requeue all matching jobs. Defaults to False. +- `statuses` (List[str], optional): Limit to specific statuses (for example `["failed"]`) +- `limit` (int, optional): Limit the number of jobs to requeue + +## Returns + +- `RequeueIngestionResponse`: Result details for each requeued job + +## Notes + +- You must provide either `jobs` or `include_all=True`. + +## Examples + +```python +from morphik import Morphik +from morphik.models import RequeueIngestionJob + +db = Morphik() +resp = db.requeue_ingestion_jobs( + jobs=[RequeueIngestionJob(external_id="doc_123")], +) +print(resp.results) +``` diff --git a/python-sdk/retrieve_chunks_grouped.mdx b/python-sdk/retrieve_chunks_grouped.mdx index 101f072..ff86d5a 100644 --- a/python-sdk/retrieve_chunks_grouped.mdx +++ b/python-sdk/retrieve_chunks_grouped.mdx @@ -18,9 +18,6 @@ description: "Retrieve relevant chunks with grouping for UI display" end_user_id: Optional[str] = None, padding: int = 0, output_format: Optional[str] = None, - graph_name: Optional[str] = None, - hop_depth: int = 1, - include_paths: bool = False, query_image: Optional[str] = None, ) -> GroupedChunkResponse ``` @@ -39,9 +36,6 @@ description: "Retrieve relevant chunks with grouping for UI display" end_user_id: Optional[str] = None, padding: int = 0, output_format: Optional[str] = None, - graph_name: Optional[str] = None, - hop_depth: int = 1, - include_paths: bool = False, query_image: Optional[str] = None, ) -> GroupedChunkResponse ``` @@ -64,9 +58,6 @@ description: "Retrieve relevant chunks with grouping for UI 
display" - `"base64"` (default): Returns base64-encoded image data - `"url"`: Returns presigned HTTPS URLs - `"text"`: Converts images to markdown text via OCR (faster inference, best for text-heavy documents) -- `graph_name` (str, optional): Name of the graph to use for knowledge graph-enhanced retrieval -- `hop_depth` (int, optional): Number of relationship hops to traverse in the graph. Defaults to 1. -- `include_paths` (bool, optional): Whether to include relationship paths in the response. Defaults to False. - `query_image` (str, optional): Base64-encoded image for reverse image search. Mutually exclusive with `query`. Requires `use_colpali=True`. ## Returns @@ -115,13 +106,6 @@ Filters follow the same JSON syntax across the API. See the [Metadata Filtering folder_depth=-1, ) - # With knowledge graph enhancement - response = db.retrieve_chunks_grouped( - query="product features", - graph_name="product_graph", - hop_depth=2, - include_paths=True, - ) ``` @@ -158,13 +142,6 @@ Filters follow the same JSON syntax across the API. See the [Metadata Filtering folder_depth=-1, ) - # With knowledge graph enhancement - response = await db.retrieve_chunks_grouped( - query="product features", - graph_name="product_graph", - hop_depth=2, - include_paths=True, - ) ``` @@ -192,7 +169,6 @@ Each `ChunkGroup` in `groups` has: - The `chunks` list provides backward compatibility with flat chunk lists. - The `groups` list organizes results with their padding context, ideal for building search result UIs. - When `padding` is specified, surrounding chunks are included in `padding_chunks` for each group. -- Knowledge graph parameters (`graph_name`, `hop_depth`, `include_paths`) enable graph-enhanced retrieval. 
## Reverse Image Search diff --git a/python-sdk/rotate_app_token.mdx b/python-sdk/rotate_app_token.mdx new file mode 100644 index 0000000..5c130b6 --- /dev/null +++ b/python-sdk/rotate_app_token.mdx @@ -0,0 +1,41 @@ +--- +title: "rotate_app_token" +description: "Rotate an app token by ID or name" +--- + + + + ```python + def rotate_app_token( + app_id: Optional[str] = None, + app_name: Optional[str] = None, + expiry_days: Optional[int] = None, + ) -> Dict[str, Any] + ``` + + + ```python + async def rotate_app_token( + app_id: Optional[str] = None, + app_name: Optional[str] = None, + expiry_days: Optional[int] = None, + ) -> Dict[str, Any] + ``` + + + +## Parameters + +- `app_id` (str, optional): App ID to rotate +- `app_name` (str, optional): App name to rotate +- `expiry_days` (int, optional): New token expiry in days + +## Returns + +- `Dict[str, Any]`: API response containing the rotated token and metadata + +## Examples + +```python +db.rotate_app_token(app_name="demo", expiry_days=30) +``` diff --git a/python-sdk/signin.mdx b/python-sdk/signin.mdx new file mode 100644 index 0000000..ba25488 --- /dev/null +++ b/python-sdk/signin.mdx @@ -0,0 +1,56 @@ +--- +title: "signin" +description: "Create a user scope for end-user isolation" +--- + + + + ```python + def signin( + end_user_id: str, + ) -> UserScope + ``` + + + ```python + async def signin( + end_user_id: str, + ) -> AsyncUserScope + ``` + + + +## Parameters + +- `end_user_id` (str): End-user identifier to scope all operations + +## Returns + +- `UserScope` / `AsyncUserScope`: Scoped client that automatically includes `end_user_id` + +## Examples + + + + ```python + from morphik import Morphik + + db = Morphik() + user = db.signin("user_123") + docs = user.list_documents() + ``` + + + ```python + from morphik import AsyncMorphik + + async with AsyncMorphik() as db: + user = db.signin("user_123") + docs = await user.list_documents() + ``` + + + +## Notes + +- You can also scope a folder to a user: 
`folder.signin("user_123")`. diff --git a/python-sdk/update_graph.mdx b/python-sdk/update_graph.mdx deleted file mode 100644 index 965b6e8..0000000 --- a/python-sdk/update_graph.mdx +++ /dev/null @@ -1,316 +0,0 @@ ---- -title: "update_graph" -description: "Update an existing knowledge graph with new documents" ---- - -## Usage - - - - ```python - from morphik import Morphik - - db = Morphik() - - # Update a graph with additional documents - updated_graph = db.update_graph( - name="research_graph", - additional_filters={"category": "new_research"}, - additional_documents=["doc_123", "doc_456"], - folder_name="/projects/alpha", - folder_depth=-1, - ) - - print(f"Graph now has {len(updated_graph.entities)} entities") - print(f"Graph now has {len(updated_graph.relationships)} relationships") - ``` - - - - ```python - from morphik import AsyncMorphik - - async with AsyncMorphik() as db: - # Update a graph with additional documents - updated_graph = await db.update_graph( - name="research_graph", - additional_filters={"category": "new_research"}, - additional_documents=["doc_123", "doc_456"], - folder_name="/projects/alpha", - folder_depth=-1, - ) - - print(f"Graph now has {len(updated_graph.entities)} entities") - print(f"Graph now has {len(updated_graph.relationships)} relationships") - ``` - - - -## Parameters - -- `name` (str): Name of the graph to update -- `additional_filters` (Dict[str, Any], optional): Optional additional metadata filters to determine which new documents to include -- `additional_documents` (List[str], optional): Optional list of additional document IDs to include -- `prompt_overrides` (GraphPromptOverrides | Dict[str, Any], optional): Optional customizations for entity extraction and resolution prompts -- `folder_name` (str | List[str], optional): Optional folder scope (canonical path or list of paths/names) -- `folder_depth` (int, optional): Folder scope depth. 
`None`/`0` = exact match, `-1` = include all descendants, `n > 0` = include descendants up to `n` levels deep. -- `end_user_id` (str, optional): Optional end-user scope - -## Returns - -A Graph object representing the updated knowledge graph. - -## Description - -This method processes additional documents matching the original or new filters, extracts entities and relationships, and updates the graph with new information. - -The graph update operation: -1. Retrieves additional documents based on filters and/or specific document IDs -2. Extracts entities and relationships from these documents -3. Intelligently merges new entities and relationships with the existing graph -4. Returns the updated graph with all entities and relationships - -## Advanced Examples - -### With Entity Resolution Examples - - - - ```python - from morphik.models import EntityResolutionPromptOverride, EntityResolutionExample, GraphPromptOverrides - - # Update with custom entity resolution examples - updated_graph = db.update_graph( - name="research_graph", - additional_documents=["doc_123"], - prompt_overrides=GraphPromptOverrides( - entity_resolution=EntityResolutionPromptOverride( - examples=[ - EntityResolutionExample( - canonical="Machine Learning", - variants=["ML", "machine learning", "AI/ML"] - ), - EntityResolutionExample( - canonical="Natural Language Processing", - variants=["NLP", "natural language processing", "text processing"] - ) - ] - ) - ) - ) - - # With custom entity resolution prompt template - updated_graph = db.update_graph( - name="research_graph", - additional_filters={"year": "2025"}, - prompt_overrides=GraphPromptOverrides( - entity_resolution=EntityResolutionPromptOverride( - prompt_template="""I have extracted the following entities from the text: - -{entities_str} - -Here are examples of how different entity references can be grouped together: - -{examples_json} - -Please resolve these entities by identifying which mentions refer to the same entity. 
-Group them together, selecting a canonical/preferred form for each group. -Return your resolution in JSON format with the canonical form and all its variants. -""", - examples=[ - EntityResolutionExample( - canonical="General AI", - variants=["AGI", "General Artificial Intelligence", "General AI"] - ) - ] - ) - ) - ) - ``` - - - - ```python - from morphik.models import EntityResolutionPromptOverride, EntityResolutionExample, GraphPromptOverrides - - # Update with custom entity resolution examples - updated_graph = await db.update_graph( - name="research_graph", - additional_documents=["doc_123"], - prompt_overrides=GraphPromptOverrides( - entity_resolution=EntityResolutionPromptOverride( - examples=[ - EntityResolutionExample( - canonical="Machine Learning", - variants=["ML", "machine learning", "AI/ML"] - ), - EntityResolutionExample( - canonical="Natural Language Processing", - variants=["NLP", "natural language processing", "text processing"] - ) - ] - ) - ) - ) - - # With custom entity resolution prompt template - updated_graph = await db.update_graph( - name="research_graph", - additional_filters={"year": "2025"}, - prompt_overrides=GraphPromptOverrides( - entity_resolution=EntityResolutionPromptOverride( - prompt_template="""I have extracted the following entities from the text: - -{entities_str} - -Here are examples of how different entity references can be grouped together: - -{examples_json} - -Please resolve these entities by identifying which mentions refer to the same entity. -Group them together, selecting a canonical/preferred form for each group. -Return your resolution in JSON format with the canonical form and all its variants. 
-""", - examples=[ - EntityResolutionExample( - canonical="General AI", - variants=["AGI", "General Artificial Intelligence", "General AI"] - ) - ] - ) - ) - ) - ``` - - - -### With Entity Extraction Examples - - - - ```python - from morphik.models import EntityExtractionPromptOverride, EntityExtractionExample, GraphPromptOverrides - - # Update with custom entity extraction examples - updated_graph = db.update_graph( - name="medical_graph", - additional_filters={"category": "new_medical_data"}, - prompt_overrides=GraphPromptOverrides( - entity_extraction=EntityExtractionPromptOverride( - examples=[ - EntityExtractionExample(label="Insulin", type="MEDICATION"), - EntityExtractionExample(label="Diabetes", type="CONDITION"), - EntityExtractionExample(label="Heart rate", type="VITAL_SIGN"), - EntityExtractionExample(label="Cardiology", type="SPECIALTY") - ] - ) - ) - ) - - # With custom entity extraction template - updated_graph = db.update_graph( - name="legal_graph", - additional_documents=["contract1", "contract2"], - prompt_overrides=GraphPromptOverrides( - entity_extraction=EntityExtractionPromptOverride( - prompt_template="""Extract legal entities from the following document: - -{content} - -Focus on these types of entities: -{examples} - -Return the extracted entities in JSON format with the following structure: -[ - {"label": "entity name", "type": "ENTITY_TYPE", "properties": {"key": "value"}} -] -""", - examples=[ - EntityExtractionExample( - label="John Smith", - type="PERSON", - properties={"role": "Plaintiff"} - ), - EntityExtractionExample( - label="Acme Corporation", - type="ORGANIZATION", - properties={"type": "Corporation"} - ), - EntityExtractionExample( - label="January 15, 2025", - type="DATE" - ) - ] - ) - ) - ) - ``` - - - - ```python - from morphik.models import EntityExtractionPromptOverride, EntityExtractionExample, GraphPromptOverrides - - # Update with custom entity extraction examples - updated_graph = await db.update_graph( - 
name="medical_graph", - additional_filters={"category": "new_medical_data"}, - prompt_overrides=GraphPromptOverrides( - entity_extraction=EntityExtractionPromptOverride( - examples=[ - EntityExtractionExample(label="Insulin", type="MEDICATION"), - EntityExtractionExample(label="Diabetes", type="CONDITION"), - EntityExtractionExample(label="Heart rate", type="VITAL_SIGN"), - EntityExtractionExample(label="Cardiology", type="SPECIALTY") - ] - ) - ) - ) - - # With custom entity extraction template - updated_graph = await db.update_graph( - name="legal_graph", - additional_documents=["contract1", "contract2"], - prompt_overrides=GraphPromptOverrides( - entity_extraction=EntityExtractionPromptOverride( - prompt_template="""Extract legal entities from the following document: - -{content} - -Focus on these types of entities: -{examples} - -Return the extracted entities in JSON format with the following structure: -[ - {"label": "entity name", "type": "ENTITY_TYPE", "properties": {"key": "value"}} -] -""", - examples=[ - EntityExtractionExample( - label="John Smith", - type="PERSON", - properties={"role": "Plaintiff"} - ), - EntityExtractionExample( - label="Acme Corporation", - type="ORGANIZATION", - properties={"type": "Corporation"} - ), - EntityExtractionExample( - label="January 15, 2025", - type="DATE" - ) - ] - ) - ) - ) - ``` - - - -## Notes - -- The graph name must match an existing graph that the user has access to. -- Either `additional_filters` or `additional_documents` (or both) should be provided; otherwise, no new content will be added to the graph. -- When using `additional_filters`, these are applied in addition to any filters used during graph creation. -- The `prompt_overrides` are applied only to this update operation and do not permanently change the configuration of the graph. 
diff --git a/python-sdk/upsert_document_summary.mdx b/python-sdk/upsert_document_summary.mdx new file mode 100644 index 0000000..e7a0303 --- /dev/null +++ b/python-sdk/upsert_document_summary.mdx @@ -0,0 +1,67 @@ +--- +title: "upsert_document_summary" +description: "Create or update a document summary" +--- + + + + ```python + def upsert_document_summary( + document_id: str, + content: str, + versioning: bool = True, + overwrite_latest: bool = False, + ) -> Summary + ``` + + + ```python + async def upsert_document_summary( + document_id: str, + content: str, + versioning: bool = True, + overwrite_latest: bool = False, + ) -> Summary + ``` + + + +## Parameters + +- `document_id` (str): ID of the document +- `content` (str): Summary content (markdown or plain text) +- `versioning` (bool, optional): Create a new version instead of overwriting. Defaults to True. +- `overwrite_latest` (bool, optional): Overwrite the latest summary when versioning is enabled. Defaults to False. + +## Returns + +- `Summary`: Updated summary payload + +## Examples + + + + ```python + from morphik import Morphik + + db = Morphik() + summary = db.upsert_document_summary( + document_id="doc_123", + content="This report summarizes Q2 performance.", + ) + print(summary.version) + ``` + + + ```python + from morphik import AsyncMorphik + + async with AsyncMorphik() as db: + summary = await db.upsert_document_summary( + document_id="doc_123", + content="This report summarizes Q2 performance.", + ) + print(summary.version) + ``` + + diff --git a/python-sdk/upsert_folder_summary.mdx b/python-sdk/upsert_folder_summary.mdx new file mode 100644 index 0000000..d232fb9 --- /dev/null +++ b/python-sdk/upsert_folder_summary.mdx @@ -0,0 +1,67 @@ +--- +title: "upsert_folder_summary" +description: "Create or update a folder summary" +--- + + + + ```python + def upsert_folder_summary( + folder_id_or_path: str, + content: str, + versioning: bool = True, + overwrite_latest: bool = False, + ) -> Summary + ``` + + 
+ ```python + async def upsert_folder_summary( + folder_id_or_path: str, + content: str, + versioning: bool = True, + overwrite_latest: bool = False, + ) -> Summary + ``` + + + +## Parameters + +- `folder_id_or_path` (str): Folder identifier (UUID, name, or canonical path) +- `content` (str): Summary content (markdown or plain text) +- `versioning` (bool, optional): Create a new version instead of overwriting. Defaults to True. +- `overwrite_latest` (bool, optional): Overwrite the latest summary when versioning is enabled. Defaults to False. + +## Returns + +- `Summary`: Updated summary payload + +## Examples + + + + ```python + from morphik import Morphik + + db = Morphik() + summary = db.upsert_folder_summary( + folder_id_or_path="/projects/alpha", + content="Summary of project alpha documents.", + ) + print(summary.version) + ``` + + + ```python + from morphik import AsyncMorphik + + async with AsyncMorphik() as db: + summary = await db.upsert_folder_summary( + folder_id_or_path="/projects/alpha", + content="Summary of project alpha documents.", + ) + print(summary.version) + ``` + + diff --git a/python-sdk/upsert_summary.mdx b/python-sdk/upsert_summary.mdx new file mode 100644 index 0000000..185ca93 --- /dev/null +++ b/python-sdk/upsert_summary.mdx @@ -0,0 +1,62 @@ +--- +title: "upsert_summary" +description: "Create or update a folder summary" +--- + +This method is available on `Folder` objects. + + + + ```python + def upsert_summary( + content: str, + versioning: bool = True, + overwrite_latest: bool = False, + ) -> Summary + ``` + + + ```python + async def upsert_summary( + content: str, + versioning: bool = True, + overwrite_latest: bool = False, + ) -> Summary + ``` + + + +## Parameters + +- `content` (str): Summary content (markdown or plain text) +- `versioning` (bool, optional): Create a new version instead of overwriting. Defaults to True. +- `overwrite_latest` (bool, optional): Overwrite the latest summary when versioning is enabled. 
Defaults to False. + +## Returns + +- `Summary`: Updated summary payload + +## Examples + + + + ```python + from morphik import Morphik + + db = Morphik() + folder = db.get_folder("/projects/alpha") + summary = folder.upsert_summary("Summary of project alpha.") + print(summary.version) + ``` + + + ```python + from morphik import AsyncMorphik + + async with AsyncMorphik() as db: + folder = await db.get_folder("/projects/alpha") + summary = await folder.upsert_summary("Summary of project alpha.") + print(summary.version) + ``` + + diff --git a/python-sdk/users.mdx b/python-sdk/users.mdx index 317d357..81212fc 100644 --- a/python-sdk/users.mdx +++ b/python-sdk/users.mdx @@ -96,10 +96,22 @@ The UserScope class provides the same document operations as the main Morphik cl - `list_documents` - List all documents owned by this user - `batch_get_documents` - Get multiple documents by their IDs for this user - `batch_get_chunks` - Get specific chunks by source for this user -- `create_graph` - Create a knowledge graph from this user's documents -- `update_graph` - Update a knowledge graph with new documents from this user - `delete_document_by_filename` - Delete a document by filename for this user +## Scoping to Multiple Folders + +User scopes can include additional folder filters using `additional_folders` on retrieval/list/query helpers: + +```python +user_scope = db.signin("user123") + +# Search within specific folders owned by this user +results = user_scope.retrieve_chunks( + "contract terms", + additional_folders=["/legal/contracts", "/legal/archives"], +) +``` + ## Using Custom LLM Configuration with User Scopes You can pass a custom LLM configuration when querying within a user scope: @@ -483,4 +495,4 @@ When developing applications that serve multiple users, you might need to proces -See [Folder Management](/python-sdk/folders) for more details on working with folder scopes. 
\ No newline at end of file +See [Folder Management](/python-sdk/folders) for more details on working with folder scopes. diff --git a/python-sdk/wait_for_document_completion.mdx b/python-sdk/wait_for_document_completion.mdx new file mode 100644 index 0000000..f9e160a --- /dev/null +++ b/python-sdk/wait_for_document_completion.mdx @@ -0,0 +1,63 @@ +--- +title: "wait_for_document_completion" +description: "Block until a document finishes processing" +--- + + + + ```python + def wait_for_document_completion( + document_id: str, + timeout_seconds: int = 300, + check_interval_seconds: int = 2, + progress_callback: Optional[Callable[[int, int, str, float], None]] = None, + ) -> Document + ``` + + + ```python + async def wait_for_document_completion( + document_id: str, + timeout_seconds: int = 300, + check_interval_seconds: int = 2, + progress_callback: Optional[Callable[[int, int, str, float], None]] = None, + ) -> Document + ``` + + + +## Parameters + +- `document_id` (str): ID of the document to wait for +- `timeout_seconds` (int, optional): Maximum time to wait for completion. Defaults to 300. +- `check_interval_seconds` (int, optional): Delay between status checks. Defaults to 2. 
+- `progress_callback` (callable, optional): Receives progress updates as `(current_step, total_steps, step_name, percentage)` + +## Returns + +- `Document`: Updated document metadata once processing completes + +## Examples + + + + ```python + from morphik import Morphik + + db = Morphik() + doc = db.ingest_text("Sample content") + ready = db.wait_for_document_completion(doc.external_id) + print(ready.status) + ``` + + + ```python + from morphik import AsyncMorphik + + async with AsyncMorphik() as db: + doc = await db.ingest_text("Sample content") + ready = await db.wait_for_document_completion(doc.external_id) + print(ready.status) + ``` + + diff --git a/python-sdk/wait_for_graph_completion.mdx b/python-sdk/wait_for_graph_completion.mdx deleted file mode 100644 index 5f6fc06..0000000 --- a/python-sdk/wait_for_graph_completion.mdx +++ /dev/null @@ -1,74 +0,0 @@ ---- -title: "wait_for_graph_completion" -description: "Block until a graph finishes processing" ---- - -When you call `create_graph()` the server immediately returns a **placeholder** -`Graph` with `system_metadata["status"] = "processing"`. Use -`wait_for_graph_completion()` to block until processing is done (or fails). - - - - ```python - def wait_for_graph_completion( - graph_name: str, - timeout_seconds: int = 300, - check_interval_seconds: int = 5, - ) -> Graph - ``` - - - ```python - async def wait_for_graph_completion( - graph_name: str, - timeout_seconds: int = 300, - check_interval_seconds: int = 5, - ) -> Graph - ``` - - - -## Parameters - -- `graph_name` (str): Name of the graph to monitor. -- `timeout_seconds` (int, default `300`): Maximum seconds to wait. -- `check_interval_seconds` (int, default `5`): Seconds between status checks. - -## Returns - -- `Graph`: The completed graph object. Raises `TimeoutError` if the graph does - not finish within the timeout or `RuntimeError` if processing fails. 
- -## Example (sync) - -```python -from morphik import Morphik -import time - -db = Morphik() - -graph = db.create_graph(name="research_graph", filters={"category": "research"}) -print("Submitted graph creation...") - -# Block until ready -completed = db.wait_for_graph_completion("research_graph") -print("Graph ready!", len(completed.entities), "entities") -``` - -## Example (async) - -```python -from morphik import AsyncMorphik -import asyncio - -async def main(): - async with AsyncMorphik() as db: - graph = await db.create_graph( - name="research_graph", filters={"category": "research"} - ) - print("Submitted graph creation...") - completed = await db.wait_for_graph_completion("research_graph") - print("Graph ready!", len(completed.entities), "entities") - -asyncio.run(main()) -``` \ No newline at end of file diff --git a/using-morphik/morphik-ui.mdx b/using-morphik/morphik-ui.mdx index 2b65c60..7be3bf4 100644 --- a/using-morphik/morphik-ui.mdx +++ b/using-morphik/morphik-ui.mdx @@ -3,7 +3,7 @@ title: 'Morphik UI' description: 'Learn how to use the Morphik UI interface' --- -Morphik provides a graphical user interface so that you can interact directly with your ingested data. This includes support for ingesting documents, retrieving relevant information, chatting with your documents, and visualizing/creating knowledge graphs. In this guide, we'll walk you through how you can set up the Morphik UI and start ingesting! +Morphik provides a graphical user interface so that you can interact directly with your ingested data. This includes support for ingesting documents, retrieving relevant information, and chatting with your documents. In this guide, we'll walk you through how you can set up the Morphik UI and start ingesting! 
## Prerequisites @@ -139,7 +139,3 @@ Morphik has support for **automatic metadata extraction** at file ingestion time **Chunk searching** - perfect for testing and iterating on retrieval strategies: ![Morphik UI chunk searching](/images/ui-guide/searching-chunks.png) - -Visualize your **knowledge graphs** to better understand the relationships between your data: - -![Morphik UI knowledge graph](/images/ui-guide/knowledge-graph.png)