
Conversation

@mnttnm (Owner) commented Nov 5, 2025

Summary by CodeRabbit

  • Documentation

    • Added extensive enterprise analysis, roadmap, and phased implementation plans with prioritized features, success metrics, and a 6‑month focus.
    • Added detailed plans for API, data platform integrations, search, compliance/security, observability, multi‑LLM, OCR, and deployment.
  • New Features

    • Multi‑LLM provider support with provider selection, fallback and cost/usage concepts.
    • New API endpoints: chat completions, OCR/extract, and health checks for service status and provider/quotas.

This analysis examines Slicely's current capabilities against 2025 IDP
market demands and provides strategic feature recommendations based on
the 80/20 principle.

Key findings:
- IDP market projected at $6.78B by 2025 (35-40% CAGR)
- 65% of enterprises actively implementing IDP initiatives
- Critical gaps: API access, integrations, compliance features

Top 5 priorities for enterprise readiness:
1. RESTful API layer (REST + GraphQL)
2. Audit logging & RBAC for compliance
3. Snowflake/Databricks integration
4. Elasticsearch integration for advanced search
5. Multi-LLM provider support

Includes:
- Market research and competitive analysis
- Feature recommendations across 4 tiers
- 12-month implementation roadmap
- ROI projections (4.1x return)
- Pricing strategy for enterprise tiers
Created detailed implementation plans for 5 critical enterprise features
based on 80/20 principle analysis:

1. RESTful & GraphQL API Layer (01-api-layer.md)
   - Complete REST API with CRUD operations
   - GraphQL endpoint for complex queries
   - API authentication, rate limiting, versioning
   - Client SDKs for JavaScript and Python
   - OpenAPI/Swagger documentation

2. Enterprise Data Platform Integrations (02-data-platform-integrations.md)
   - Snowflake, Databricks, AWS S3, Azure Blob, GCS
   - Zapier and Make.com integration
   - Job queue system with BullMQ
   - Scheduled exports and batch processing

3. Elasticsearch Integration (03-elasticsearch-integration.md)
   - Hybrid search (keyword + semantic)
   - Faceted search and autocomplete
   - Fuzzy matching and search analytics
   - Real-time indexing via CDC

4. Compliance & Security Suite (04-compliance-security-suite.md)
   - Comprehensive audit logging
   - RBAC with granular permissions
   - PII detection and redaction
   - Data retention policies and GDPR tools

5. Multi-LLM Provider Support (05-multi-llm-provider-support.md)
   - Anthropic Claude, Azure OpenAI, AWS Bedrock, Ollama
   - Provider abstraction layer
   - Automatic fallback and cost tracking
   - Unified interface across all providers

Plus comprehensive roadmap (ENTERPRISE_ROADMAP.md):
- 12-month timeline with 5 phases
- Resource requirements and budget ($726K investment)
- Expected ROI: 7.6x ($5.5M ARR)
- Success metrics and KPIs
- Competitive positioning analysis
- Risk mitigation strategies

All plans include:
- Technical architecture and implementation details
- Database schemas and code examples
- UI/UX designs
- Testing strategies
- Rollout plans
- Success metrics

vercel bot commented Nov 5, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| slicely | Error | Error | | Nov 19, 2025 8:33am |


coderabbitai bot commented Nov 5, 2025

Walkthrough

Adds enterprise strategy and roadmap docs plus detailed implementation plans (API layer, data integrations, Elasticsearch, compliance, multi‑LLM, OCR). Implements new v1 API utilities and endpoints (chat, extract, health), and adds multi‑provider LLM and multi‑method OCR service modules with types, providers, service logic, and UI selector scaffold.

Changes

  • Enterprise Strategy & Roadmap (ENTERPRISE_FEATURE_ANALYSIS.md, docs/ENTERPRISE_ROADMAP.md): New enterprise analysis and roadmap documents with gap analysis, tiered feature recommendations, five‑phase implementation timeline, GTM/pricing, ROI model, success metrics, and next steps.
  • Implementation Plans (API, Data, Search, Compliance, Multi‑LLM, Recommendations) (docs/implementation-plans/01-api-layer.md, docs/implementation-plans/02-data-platform-integrations.md, docs/implementation-plans/03-elasticsearch-integration.md, docs/implementation-plans/04-compliance-security-suite.md, docs/implementation-plans/05-multi-llm-provider-support.md, docs/UPDATED_RECOMMENDATIONS.md): Adds detailed implementation plans: hybrid API design (REST/GraphQL/tRPC), data destination plugins (S3, Snowflake, Databricks, Zapier), Elasticsearch hybrid search and index mappings, compliance/security suite (audit logs, RBAC, PII, retention/GDPR), multi‑LLM provider architecture, and updated multi‑LLM+OCR recommendations with migration/cost guidance.
  • API v1 Endpoints & Utilities (src/app/api/v1/chat/route.ts, src/app/api/v1/extract/route.ts, src/app/api/v1/health/route.ts, src/app/api/v1/utils/*): Adds v1 API routes: POST /api/v1/chat (chat completions, streaming/non‑streaming, validation, auth, rate limits), POST /api/v1/extract (OCR extraction from base64 or URL with options), GET /api/v1/health (service status). Adds v1 auth and error utilities: API key validation, permission checks, rate limiting, standardized error responses and re‑exports.
  • LLM Library & Provider Surface (src/lib/llm/types.ts, src/lib/llm/providers.ts, src/lib/llm/llm-service.ts, src/lib/llm/provider-registry.ts, src/lib/llm/index.ts): Adds multi‑provider LLM types, configs, model schemas, provider configs (PROVIDER_CONFIGS), provider factory/getModel helpers, cost calculation, provider availability, LLM service functions (generateCompletion, streaming, structured responses, embeddings, fallback orchestration, compareProviders), registry/initialization, and an index barrel exporting the surface.
  • OCR Library & Types (src/lib/ocr/types.ts, src/lib/ocr/ocr-service.ts, src/lib/ocr/index.ts): Adds multi‑method OCR types and service: supports LLMWhisperer, DeepSeek, AWS Textract, Tesseract, PDF.js paths; auto method selection, PDF handling, quota checks, structured outputs (tables/forms), errors, usage metadata, and an index barrel.
  • UI Component (Provider Selector) (components/slicer-settings/LLMProviderSelector.tsx): Adds a UI component scaffold for selecting LLM provider/models and per‑model pricing metadata.
  • Package dependencies (package.json): Adds new dependencies: @ai-sdk/anthropic, @ai-sdk/google, @ai-sdk/openai, @aws-sdk/client-textract, and ai with specified versions.
  • SQL Schema Examples (docs) (documented schemas: llm_configurations, llm_usage, llm_provider_credentials): Adds documented SQL schema sketches for LLM provider configurations, usage tracking, and provider credentials (in docs/implementation plans).

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant API as Backend/API
    participant Auth as Auth & RateLimit
    participant LLM as LLMService
    participant Primary as PrimaryProvider
    participant Fallback as FallbackProvider
    Note over LLM: Multi‑LLM orchestration and fallback (new)
    Client->>API: POST /api/v1/chat (payload)
    API->>Auth: validate API key & permissions
    Auth-->>API: allowed
    API->>LLM: requestCompletion(payload, preferredProvider)
    alt Primary succeeds
        LLM->>Primary: callCompletion
        Primary-->>LLM: response
        LLM-->>API: response (usage, provider meta)
        API-->>Client: 200 JSON / stream
    else Primary fails
        LLM->>Primary: callCompletion
        Primary--x LLM: error
        LLM->>Fallback: callCompletion
        Fallback-->>LLM: response
        LLM-->>API: response (logs failure, usage)
        API-->>Client: 200 JSON (with provider metadata)
    end
```
```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant API as Backend/API
    participant Auth as Auth & RateLimit
    participant OCR as OCRService
    participant LLMW as LLMWhisperer
    participant Textract as AWS Textract
    Note over OCR: Multi‑method OCR selection and extraction
    Client->>API: POST /api/v1/extract (file or url + options)
    API->>Auth: validate API key & permissions
    Auth-->>API: allowed
    API->>OCR: extractText(input, options)
    OCR->>OCR: selectOCRMethod (quota, availability, options)
    alt method == llmwhisperer
        OCR->>LLMW: callWhisperer
        LLMW-->>OCR: text + meta
    else method == textract
        OCR->>Textract: analyzeDocument
        Textract-->>OCR: text, tables, forms
    else other
        OCR-->>OCR: use DeepSeek/Tesseract/PDF.js
    end
    OCR-->>API: OCRResult (text, markdown, tables, forms)
    API-->>Client: 200 JSON (includes rate limit headers)
```

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

  • Reasons: Large volume of new documentation plus multiple new runtime modules and API endpoints that introduce cross‑cutting auth, error handling, rate limiting, multi‑provider orchestration, OCR backends, and added dependencies.
  • Areas needing extra attention:
    • Security: API key handling, permission model, credential storage/encryption, and env dependency checks.
    • Error and retry semantics across LLM providers and OCR backends (LLMError/OCRError handling and retryability).
    • Rate limiting logic and header semantics in v1 utilities and endpoints.
    • Consistency between TypeScript types in src/lib/llm/* and provider implementations/configs.
    • AWS Textract dynamic imports and dependency use (@aws-sdk/client-textract) and graceful fallback behavior.
    • Tests and rollout steps referenced in docs vs. implemented behavior in code (endpoints and services).

"🐰
I hopped through specs and roadmap glades,
APIs and LLMs in layered braids,
OCR and search, a carrot-sweet blend,
New providers dance — I cheer and mend! 🥕"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Title check: ⚠️ Warning. The PR title 'Research app features and market demands' is vague and does not accurately reflect the substantial implementation work. The changeset includes 14+ files with comprehensive API, OCR, LLM, compliance, and enterprise roadmap implementations—not research documentation. Resolution: Revise the title to reflect the actual deliverables, such as 'Add enterprise API layer, multi-LLM support, OCR integration, and compliance suite' or similar to accurately communicate the scope of implementation changes.

✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch claude/analyze-app-features-research-011CUpJpDTmF3pQSq778jaTb

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

🧹 Nitpick comments (14)
docs/implementation-plans/02-data-platform-integrations.md (5)

535-547: CSV escaping is incomplete (newlines, quotes, commas).

Current logic misses newline and leading/trailing spaces cases. Use a dedicated escaper.

```diff
-      keys.map((key) => {
-        const value = record[key];
-        // Escape quotes and wrap in quotes if contains comma
-        if (typeof value === 'string' && (value.includes(',') || value.includes('"'))) {
-          return `"${value.replace(/"/g, '""')}"`;
-        }
-        return value ?? '';
-      }).join(',')
+      keys.map((key) => {
+        const v = record[key] ?? '';
+        const s = typeof v === 'string' ? v : String(v);
+        const needsQuotes = /[",\n\r]/.test(s);
+        const escaped = s.replace(/"/g, '""');
+        return needsQuotes ? `"${escaped}"` : escaped;
+      }).join(',')
```

549-566: Parquet writer usage likely incorrect.

parquetjs typically requires a ParquetSchema and a concrete write target; openStream/openBuffer semantics differ. Double-check API, add explicit schema construction, and avoid returning writer.outputStream internals.

Example sketch:

```diff
-    const { ParquetWriter } = await import('parquetjs');
-    const schema = this.inferParquetSchema(records[0]);
-    const writer = await ParquetWriter.openStream(schema);
+    const { ParquetWriter, ParquetSchema } = await import('parquetjs');
+    const schema = new ParquetSchema(this.inferParquetSchema(records[0]));
+    const chunks: Buffer[] = [];
+    const writer = await ParquetWriter.openStream(schema, {
+      write: (b: Buffer) => chunks.push(b),
+      end: () => {},
+    } as any);
...
-    return writer.outputStream.getContents();
+    return Buffer.concat(chunks);
```

894-910: Databricks SQL Statement API likely needs polling.

POSTing a statement usually returns an id; you must poll until status is 'SUCCEEDED' (or use sync param if available). Current code returns immediately.
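A hedged sketch of the polling loop follows. The endpoint paths, `statement_id`/`status.state` fields, and terminal state names are assumptions based on the Databricks SQL Statement Execution API; verify them against the Databricks docs before relying on them.

```typescript
// Sketch: submit a SQL statement and poll until it reaches a terminal state.
// Paths and field names are assumptions; confirm against the Databricks API docs.
async function executeStatementAndWait(
  host: string,
  token: string,
  statement: string,
  warehouseId: string,
  timeoutMs = 120_000
): Promise<unknown> {
  const headers = { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' };

  const submit = await fetch(`${host}/api/2.0/sql/statements`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ statement, warehouse_id: warehouseId }),
  });
  let current = await submit.json();
  const statementId: string = current.statement_id;

  const deadline = Date.now() + timeoutMs;
  while (current.status?.state === 'PENDING' || current.status?.state === 'RUNNING') {
    if (Date.now() > deadline) throw new Error('Databricks statement timed out');
    await new Promise((r) => setTimeout(r, 2_000)); // back off between polls
    const poll = await fetch(`${host}/api/2.0/sql/statements/${statementId}`, { headers });
    current = await poll.json();
  }

  if (current.status?.state !== 'SUCCEEDED') {
    throw new Error(`Databricks statement ended in state ${current.status?.state}`);
  }
  return current.result;
}
```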


921-925: Coupling: reusing private method via bracket access.

Calling S3Integration['toParquet'] breaks encapsulation and minifies poorly. Extract a shared parquet utility.


229-248: Schemas: add guardrails (constraints, RLS, audit).

  • Add NOT NULL where appropriate (plugin_type, config).
  • Unique constraint per organization on (name) to prevent duplicates.
  • Add updated_at trigger ON UPDATE.
  • Consider RLS policies by organization_id/user_id.
  • export_jobs: add updated_at, an error_code field, and composite index (destination_id,status,created_at) for queues.

Also applies to: 268-301, 304-327

docs/implementation-plans/01-api-layer.md (1)

917-964: Error responses may leak internal messages.

Avoid echoing raw error.message to clients for 5xx and security-sensitive 4xx; use stable titles/details and map codes.
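A minimal sketch of the idea, assuming an `APIError`-style class like the one in the plan (all names here are illustrative, not the plan's exact utility):

```typescript
// Sketch: map internal errors to stable, non-leaky API responses.
// APIError and its fields are assumptions mirroring the plan's error utility.
class APIError extends Error {
  constructor(public title: string, public statusCode: number, public details?: string) {
    super(title);
  }
}

function toErrorResponse(err: unknown): { status: number; body: { title: string; detail?: string } } {
  if (err instanceof APIError && err.statusCode < 500) {
    // Known, client-safe errors: return the stable title/detail we chose.
    return { status: err.statusCode, body: { title: err.title, detail: err.details } };
  }
  // Unknown or server-side errors: log internally, return a generic message.
  console.error('internal error', err);
  return { status: 500, body: { title: 'Internal server error' } };
}
```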

docs/implementation-plans/04-compliance-security-suite.md (2)

845-875: GDPR deletion: review deletion of audit logs.

Some regimes require retaining certain logs; blanket deletion of audit_logs by user_id may violate retention/legal hold. Gate by policy and legal basis.


477-485: Use cloud-native credentials (IAM roles) over static keys.

Avoid long-lived AWS keys in env; prefer IAM roles (e.g., IRSA on Kubernetes, task roles on ECS/Lambda).
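A small sketch of the pattern with the AWS SDK v3: leaving `credentials` unset lets the default provider chain pick up the attached IAM role (IRSA/task role) instead of long-lived keys. The client and bucket here are illustrative.

```typescript
// Sketch: rely on the default credential provider chain (IAM role) rather than
// static AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY values in env.
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

// No `credentials` option: on EKS (IRSA), ECS, or Lambda the SDK resolves
// short-lived role credentials automatically and rotates them for you.
const s3 = new S3Client({ region: process.env.AWS_REGION });

export async function uploadExport(bucket: string, key: string, body: Buffer) {
  await s3.send(new PutObjectCommand({ Bucket: bucket, Key: key, Body: body }));
}
```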

docs/implementation-plans/05-multi-llm-provider-support.md (3)

444-457: Azure provider method incomplete.

completion() returns partial code with "same as OpenAI" comment; fill in or mark as TODO to prevent accidental use.


760-811: Add timeouts, retries, and per-provider error taxonomy.

LLM calls are flaky. Wrap provider.completion with bounded timeouts, limited retries, and map provider errors to consistent codes; also support cancellation.
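As a rough illustration of that wrapper (request/response shapes and error codes are placeholders, not the plan's actual types):

```typescript
// Sketch: bound each provider call with a timeout, retry transient failures with
// backoff, and surface a consistent error code. All names are illustrative.
type CompletionFn<Req, Res> = (req: Req, signal: AbortSignal) => Promise<Res>;

class ProviderError extends Error {
  constructor(public code: 'timeout' | 'rate_limited' | 'provider_error', message: string) {
    super(message);
  }
}

export async function callWithTimeoutAndRetry<Req, Res>(
  fn: CompletionFn<Req, Res>,
  req: Req,
  { timeoutMs = 30_000, retries = 2 } = {}
): Promise<Res> {
  let lastError: ProviderError | undefined;
  for (let attempt = 0; attempt <= retries; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await fn(req, controller.signal); // callers can also thread their own signal through
    } catch (err) {
      const code = controller.signal.aborted
        ? 'timeout'
        : (err as { status?: number })?.status === 429
          ? 'rate_limited'
          : 'provider_error';
      lastError = new ProviderError(code, String(err));
      // Only retry timeouts and rate limits; bail out immediately on other errors.
      if (code === 'provider_error' || attempt === retries) throw lastError;
      await new Promise((r) => setTimeout(r, 500 * 2 ** attempt)); // exponential backoff
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError!;
}
```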


813-823: Usage tracking: add org/user context and request type.

Insert organization_id, user_id, request_type to llm_usage for accurate cost allocation.
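For illustration, the insert could carry that context along these lines (columns beyond the documented llm_usage sketch are the suggestion, not the existing schema):

```typescript
// Sketch: record one LLM call with tenant context so cost can be allocated
// per organization/user. Extra column names are assumptions.
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!);

interface UsageContext {
  organizationId: string;
  userId: string;
  requestType: 'chat' | 'extract' | 'embedding';
}

export async function recordLlmUsage(
  provider: string,
  model: string,
  usage: { promptTokens: number; completionTokens: number },
  cost: number,
  ctx: UsageContext
) {
  await supabase.from('llm_usage').insert({
    provider,
    model,
    input_tokens: usage.promptTokens,
    output_tokens: usage.completionTokens,
    cost,
    organization_id: ctx.organizationId, // who pays for this call
    user_id: ctx.userId,                 // who triggered it
    request_type: ctx.requestType,
  });
}
```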

docs/implementation-plans/03-elasticsearch-integration.md (1)

232-236: Many indices (one per user) will strain ES.

Prefer a shared index with user_id as a keyword and filter queries, plus index-level RBAC if needed.
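A sketch of the shared-index pattern with the v8 JS client (index and field names are illustrative):

```typescript
// Sketch: one shared index, tenant isolation via a keyword filter on user_id.
import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: process.env.ELASTICSEARCH_URL! });

export async function searchUserDocuments(userId: string, query: string) {
  return es.search({
    index: 'documents', // shared index instead of one index per user
    query: {
      bool: {
        filter: [{ term: { user_id: userId } }], // cheap, cacheable tenant filter
        must: [{ match: { content: query } }],   // the actual relevance query
      },
    },
  });
}
```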

docs/ENTERPRISE_ROADMAP.md (1)

570-586: Optional: clarify infra cost assumptions.

Add footnotes for managed vs self-hosted costs and expected workload (docs/month, QPS) to make comparisons actionable.

ENTERPRISE_FEATURE_ANALYSIS.md (1)

1201-1201: Markdown hygiene: fix headings, lists, and bare URL.

  • Replace emphasized lines acting as headings with proper markdown headings (##/###).
  • Normalize list indentation (MD007).
  • Convert bare URLs to links.

Also applies to: 1487-1504

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9b54f2 and 551c12c.

📒 Files selected for processing (7)
  • ENTERPRISE_FEATURE_ANALYSIS.md (1 hunks)
  • docs/ENTERPRISE_ROADMAP.md (1 hunks)
  • docs/implementation-plans/01-api-layer.md (1 hunks)
  • docs/implementation-plans/02-data-platform-integrations.md (1 hunks)
  • docs/implementation-plans/03-elasticsearch-integration.md (1 hunks)
  • docs/implementation-plans/04-compliance-security-suite.md (1 hunks)
  • docs/implementation-plans/05-multi-llm-provider-support.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
ENTERPRISE_FEATURE_ANALYSIS.md

[uncategorized] ~1250-~1250: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... outputs) - Week 3: Authentication & rate limiting - Week 4: API documentation (Swagger...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 markdownlint-cli2 (0.18.1)
ENTERPRISE_FEATURE_ANALYSIS.md

152-152: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


158-158: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


165-165: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


171-171: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


177-177: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


336-336: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


365-365: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


385-385: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


404-404: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


411-411: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


430-430: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


438-438: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


477-477: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


500-500: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


501-501: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


502-502: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


609-609: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


629-629: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


654-654: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


678-678: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


687-687: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


703-703: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


753-753: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


773-773: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


805-805: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


821-821: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


828-828: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


858-858: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


869-869: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


879-879: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


898-898: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


906-906: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


951-951: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


969-969: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


990-990: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


999-999: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1036-1036: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1053-1053: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1081-1081: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1094-1094: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1123-1123: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1145-1145: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1168-1168: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1201-1201: Bare URL used

(MD034, no-bare-urls)


1487-1487: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1495-1495: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1504-1504: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/implementation-plans/03-elasticsearch-integration.md

52-52: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (1)
docs/implementation-plans/03-elasticsearch-integration.md (1)

220-236: indices.exists return handling.

@elastic/elasticsearch v8 may not return a boolean unless you use the transport option; verify and adapt to the actual return shape.

Comment on lines +153 to +156
```sql
CREATE INDEX idx_api_keys_user_id ON api_keys(user_id);
CREATE INDEX idx_api_keys_key_hash ON api_keys(key_hash);
CREATE UNIQUE INDEX idx_api_keys_key_prefix ON api_keys(key_prefix);
```


⚠️ Potential issue | 🔴 Critical

Unique index on key_prefix will block multiple keys.

key_prefix holds 'sk_live_'/'sk_test_'; making it UNIQUE allows only one key per environment.

Fix by dropping uniqueness or using a separate random prefix/id:

```diff
-CREATE UNIQUE INDEX idx_api_keys_key_prefix ON api_keys(key_prefix);
+-- REMOVE unique index; optionally:
+-- CREATE INDEX idx_api_keys_key_prefix ON api_keys(key_prefix);
```
🤖 Prompt for AI Agents
In docs/implementation-plans/01-api-layer.md around lines 153 to 156 the CREATE
UNIQUE INDEX on api_keys(key_prefix) is incorrect because key_prefix contains
environment prefixes like 'sk_live_'/'sk_test_' and will prevent storing
multiple keys per environment; remove the UNIQUE constraint by changing to a
non-unique index (CREATE INDEX idx_api_keys_key_prefix ON api_keys(key_prefix))
or alternatively add a dedicated, random unique identifier column (e.g., key_id
or key_prefix_random) and create a UNIQUE index on that new column while keeping
key_prefix non-unique.

Comment on lines +685 to +692
```typescript
// Look up API key in database
const apiKeyRecord = await supabase
  .from('api_keys')
  .select('*, user:users(*)')
  .eq('key_hash', keyHash)
  .eq('is_active', true)
  .single();
```


⚠️ Potential issue | 🟠 Major

Don’t join against auth.users; return user_id instead.

Supabase PostgREST cannot join to auth.users in most setups; selecting user:users(*) likely fails. Return user_id and fetch via Admin API if needed, or maintain an application users table.

```diff
- .select('*, user:users(*)')
+ .select('id, user_id, scopes, rate_limit_tier, expires_at, is_active')
...
-return apiKeyRecord.user;
+return { id: apiKeyRecord.user_id };
```

Also applies to: 709-710

🤖 Prompt for AI Agents
In docs/implementation-plans/01-api-layer.md around lines 685 to 692 (and also
apply the same change at lines 709-710), the code is joining to auth.users via
select('*, user:users(*)') which will fail in most Supabase/PostgREST setups;
change the query to return the foreign key (user_id) instead of trying to join
to auth.users, then fetch the full user record separately via the Supabase Admin
API or from your application users table as needed; update both occurrences to
select the user_id and add a follow-up step to retrieve the user using the
appropriate admin/user-table method.

Comment on lines +779 to +800
```typescript
export async function checkRateLimit(apiKeyId: string, tier: string): Promise<void> {
  const limiter = rateLimiters[tier] || rateLimiters.standard;

  const { success, limit, reset, remaining } = await limiter.limit(apiKeyId);

  if (!success) {
    throw new APIError('Rate limit exceeded', 429, {
      'X-RateLimit-Limit': limit.toString(),
      'X-RateLimit-Remaining': '0',
      'X-RateLimit-Reset': reset.toString(),
      'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
    });
  }

  // Set rate limit headers (return these in response)
  return {
    'X-RateLimit-Limit': limit.toString(),
    'X-RateLimit-Remaining': remaining.toString(),
    'X-RateLimit-Reset': reset.toString(),
  };
}
```

⚠️ Potential issue | 🟠 Major

Type mismatch: function returns headers but signature says void.

checkRateLimit currently returns a headers object; update return type and callers.

```diff
-export async function checkRateLimit(apiKeyId: string, tier: string): Promise<void> {
+export async function checkRateLimit(apiKeyId: string, tier: string): Promise<Record<string,string>> {
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-export async function checkRateLimit(apiKeyId: string, tier: string): Promise<void> {
+export async function checkRateLimit(apiKeyId: string, tier: string): Promise<Record<string,string>> {
   const limiter = rateLimiters[tier] || rateLimiters.standard;

   const { success, limit, reset, remaining } = await limiter.limit(apiKeyId);

   if (!success) {
     throw new APIError('Rate limit exceeded', 429, {
       'X-RateLimit-Limit': limit.toString(),
       'X-RateLimit-Remaining': '0',
       'X-RateLimit-Reset': reset.toString(),
       'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
     });
   }

   // Set rate limit headers (return these in response)
   return {
     'X-RateLimit-Limit': limit.toString(),
     'X-RateLimit-Remaining': remaining.toString(),
     'X-RateLimit-Reset': reset.toString(),
   };
 }
```
🤖 Prompt for AI Agents
In docs/implementation-plans/01-api-layer.md around lines 779 to 800, the
function checkRateLimit is declared to return Promise<void> but actually returns
an object of headers; change the signature to return
Promise<Record<string,string>> (or a named Headers type) and update all callers
to await the function and use the returned headers (or merge them into the
response) instead of ignoring the result; keep the existing APIError throw
behavior intact and ensure typing reflects header string values.

Comment on lines +344 to +348
```typescript
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY! // Service role key required
);
```


🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Service role key must never be exposed to clients.

Ensure createClient with SUPABASE_SERVICE_ROLE_KEY runs only in trusted server contexts (no edge/functions with potential leakage, no client bundles). Add a guardrail comment and environment segregation.


I'll verify this review comment by examining the file context and consulting Supabase security best practices.


🏁 Script executed:

```bash
#!/bin/bash
# Examine the file and context around lines 344-348
cat "docs/implementation-plans/02-data-platform-integrations.md" | sed -n '330,360p'
```

Length of output: 848


🏁 Script executed:

```bash
#!/bin/bash
# Search for other uses of SUPABASE_SERVICE_ROLE_KEY in the codebase
rg -n "SUPABASE_SERVICE_ROLE_KEY" -C 5
```

Length of output: 2074


🏁 Script executed:

```bash
#!/bin/bash
# Check for any environment configuration or guardrails
rg -n "createClient" -C 3 --type=ts --type=js --type=tsx --type=jsx
```

Length of output: 27


🏁 Script executed:

```bash
#!/bin/bash
# Verify the file type and its purpose
head -50 "docs/implementation-plans/02-data-platform-integrations.md"
```

Length of output: 1461


🌐 Web query:

What are Supabase service role key security best practices and risks?

💡 Result:

Summary — risks

  • Full-bypass of Row Level Security (RLS): service_role keys bypass RLS so a leak = full read/modify/delete of your DB. [1][2]
  • Data breach & compliance exposure: attacker access → exfiltration, tampering, regulatory/reporting incidents. [6]
  • Supply‑chain/public repo leaks: accidental commits or CI logs can expose keys. [7]
  • Operational impact: abused keys can break apps, cause cost spikes, or force emergency rotation/restarts. [4][7]

Security best practices (concise)

  • Never use service_role in client-side code—keep it strictly server-side or in trusted runtime (server, Edge Function, or private job). [1][2][5]
  • Minimize use: prefer anon key + RLS for normal app flows; only use service_role for backend jobs that truly need bypass. [1][6]
  • Use least privilege patterns: create narrow APIs (server endpoints, RPCs) that perform needed operations rather than giving direct DB access with the key. Consider Postgres roles/RPCs to limit scope. [5]
  • Protect secrets in a vault/secret manager or environment secrets (not in code/commits). Use cloud secret managers or Supabase Edge Function secrets. [0][1]
  • Rotate regularly and on suspicion: rotate service_role/JWT secrets promptly; be aware rotation invalidates active connections. Automate rotation if possible. [8]
  • Monitor & alert: enable auditing/logging, watch unusual queries or bulk exports, and alert on anomalous use. Integrate secret-scanning for repos. [7][6]
  • Revoke and remediate fast: have an incident playbook to revoke keys, rotate, and re-deploy with new secrets. [8][7]
  • Prefer Edge Functions / serverless with scoped secrets for third‑party webhooks/integrations (e.g., Stripe) instead of embedding service_role in backend code accessible to multiple teams. [0][5]
  • Use network controls where possible: IP allowlists for backend jobs or service accounts, and enforce TLS. [5]

Quick operational checklist

  • Audit where service_role is used and replace with RPCs or server endpoints where possible. [5]
  • Move keys to secret manager; remove from repos/CI logs; add secret scanning. [7]
  • Implement monitoring + alerts for abnormal queries or large exports. [7][6]
  • Document rotation process and test it (note: rotation restarts services). [8]

Sources

  • Supabase docs — securing data; service_role guidance (never expose, use server-side). [1]
  • Supabase docs — Edge Functions environment secrets & guidance (service_role usable in Edge Functions, not browsers). [0]
  • Supabase docs — API/serverless guidance; service_role explanations and safeguards. [5]
  • Supabase docs — rotating anon/service/JWT secrets (procedure & impact). [8]
  • Community/ops guidance on protecting keys, using server endpoints/RPCs and edge functions. [3][6]
  • GitGuardian / security posts on risks and detection of leaked service_role tokens. [7]

Add explicit server-only context and environment segregation guardrails to documentation.

The documentation currently shows correct backend pattern but lacks critical security warnings. Per Supabase best practices, service_role keys must never be used in client-side code and should be kept strictly server-side. The existing comment "Service role key required" does not adequately communicate this constraint.

Update the code comment and add a preceding caution block to explicitly state:

  • This function runs server-side only (no client bundles, no edge functions accessible to untrusted code)
  • Service role key must reside in .env with appropriate access controls
  • Environment segregation: never log, expose, or commit the key
  • Link to Supabase vault security best practices for implementers
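One way to encode that guardrail in a Next.js codebase is sketched below; the `server-only` marker import is the standard Next.js mechanism, and the runtime check is an extra, assumed safeguard.

```typescript
// Sketch: confine the service-role client to server code.
// The 'server-only' package turns any client-bundle import into a build-time error.
import 'server-only';
import { createClient } from '@supabase/supabase-js';

if (typeof window !== 'undefined') {
  // Defensive runtime check; should be unreachable with 'server-only' in place.
  throw new Error('supabase admin client must never run in the browser');
}

// Server-side only: do not expose the service_role key to clients, logs, or commits.
export const supabaseAdmin = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);
```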
🤖 Prompt for AI Agents
In docs/implementation-plans/02-data-platform-integrations.md around lines 344
to 348, the inline comment beside the Supabase service_role key is
insufficiently warning — add a clear server-only caution block immediately above
the code that states this function must run server-side only (no client bundles
or publicly accessible edge functions), that the service_role key must be stored
in a protected .env with strict access controls and never logged, exposed, or
committed, and update the inline comment to explicitly state "Server-side only —
do not expose service_role key to clients"; also include a link to Supabase
vault/security best practices for implementers and recommend environment
segregation guardrails (no client access, restrict CI/ops, and rotate keys).

Comment on lines +389 to +413
```typescript
  // Get credential record
  const { data: credRecord, error } = await supabase
    .from('encrypted_credentials')
    .select('vault_secret_id')
    .eq('id', credentialId)
    .single();

  if (error) {
    throw new Error(`Credential not found: ${error.message}`);
  }

  // Retrieve from Vault
  const { data: vaultSecret, error: vaultError } = await supabase
    .from('vault.secrets')
    .select('secret')
    .eq('id', credRecord.vault_secret_id)
    .single();

  if (vaultError) {
    throw new Error(`Failed to retrieve credentials: ${vaultError.message}`);
  }

  return JSON.parse(vaultSecret.decrypted_secret); // Automatically decrypted
}
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Supabase Vault retrieval bug and table access assumptions.

  • You select "secret" from vault.secrets but later read vaultSecret.decrypted_secret. Field mismatch; will throw. Also, accessing "vault.secrets" via supabase-js may not work depending on Vault exposure/permissions.

Suggested fix:

```diff
-  const { data: vaultSecret, error: vaultError } = await supabase
-    .from('vault.secrets')
-    .select('secret')
+  const { data: vaultSecret, error: vaultError } = await supabase
+    .from('vault.secrets')
+    .select('decrypted_secret')
     .eq('id', credRecord.vault_secret_id)
     .single();

-  return JSON.parse(vaultSecret.decrypted_secret); // Automatically decrypted
+  return JSON.parse(vaultSecret.decrypted_secret);
```

Also verify that Vault access is server-only and enabled in your project; consider a dedicated secrets API rather than querying the table directly.




🏁 Script executed:

```bash
#!/bin/bash
# Verify the file exists and check its actual content at the specified lines
if [ -f "docs/implementation-plans/02-data-platform-integrations.md" ]; then
  echo "=== File exists, showing lines 389-413 ==="
  sed -n '389,413p' "docs/implementation-plans/02-data-platform-integrations.md"
  echo ""
  echo "=== Context: lines 380-420 ==="
  sed -n '380,420p' "docs/implementation-plans/02-data-platform-integrations.md"
else
  echo "File not found"
fi
```

Length of output: 1688


🌐 Web query:

Supabase Vault API select fields decrypted_secret correct usage

💡 Result:

Short answer: query the provided view vault.decrypted_secrets and include the decrypted_secret column in your SELECT. The view decrypts on-the-fly, so usage is just normal SQL, but you must ensure the DB role you’re using has permission to read the view (and you should lock down access because anyone who can select from the view sees plaintext). Example:

  • Basic:
    select id, name, decrypted_secret from vault.decrypted_secrets;

  • By id:
    select id, name, decrypted_secret
    from vault.decrypted_secrets
    where id = '7095d222-efe5-4cd5-b5c6-5755b451e223';

Notes:

  • decrypted_secret is exposed only via the view (vault.secrets stores the ciphertext). The view decrypts at query time. Protect access with proper privileges/RLS. [1][2]

Sources:
[1] Supabase Vault docs — Using Vault / decrypted_secrets.
[2] supabase/vault GitHub README (examples showing decrypted_secret in the view).


Fix Supabase Vault table and field references.

The code queries vault.secrets but references a field that only exists on the vault.decrypted_secrets view. The view decrypts on-the-fly, so decrypted_secret is exposed only via the view, not the underlying table. Update to:

```diff
   const { data: vaultSecret, error: vaultError } = await supabase
-    .from('vault.secrets')
-    .select('secret')
+    .from('vault.decrypted_secrets')
+    .select('decrypted_secret')
     .eq('id', credRecord.vault_secret_id)
     .single();

-  return JSON.parse(vaultSecret.decrypted_secret); // Automatically decrypted
+  return JSON.parse(vaultSecret.decrypted_secret);
```

Also verify that Vault access is restricted to server-only contexts and that SERVICE_ROLE_KEY usage here is intentional; consider whether a dedicated secrets management API would be safer than direct table access.

🤖 Prompt for AI Agents
In docs/implementation-plans/02-data-platform-integrations.md around lines 389
to 413, the code queries the underlying vault.secrets table but then tries to
read decrypted_secret which only exists on the vault.decrypted_secrets view;
change the Supabase query to select from the vault.decrypted_secrets view and
select the decrypted_secret field (e.g.,
.from('vault.decrypted_secrets').select('decrypted_secret').eq('id',
credRecord.vault_secret_id).single()), and ensure this code runs only in a
server-only context using the SERVICE_ROLE_KEY (or replace direct DB access with
a dedicated secrets-management API) so decryption access is restricted to
trusted backend environments.

Comment on lines +792 to +795
```tsx
<p className="text-sm text-muted-foreground">
  {results.length} results ({took}ms)
</p>
```


⚠️ Potential issue | 🟡 Minor

UI uses took variable but it’s not defined in component state.

Add took to state from API response to avoid runtime errors.
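A minimal sketch of that fix, with component and API field names assumed from the snippet above:

```tsx
// Sketch: keep `took` in state alongside results so the render never sees undefined.
import { useState } from 'react';

interface SearchHit { id: string; title: string; }

export function SearchResults() {
  const [results, setResults] = useState<SearchHit[]>([]);
  const [took, setTook] = useState(0); // milliseconds reported by the search API

  async function runSearch(query: string) {
    const res = await fetch(`/api/search?q=${encodeURIComponent(query)}`);
    const data = await res.json();
    setResults(data.results ?? []);
    setTook(data.took ?? 0); // assumes the API echoes Elasticsearch's `took`
  }

  return (
    <div>
      <p className="text-sm text-muted-foreground">
        {results.length} results ({took}ms)
      </p>
      {/* ...render results, wire runSearch to a search box... */}
    </div>
  );
}
```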

🤖 Prompt for AI Agents
In docs/implementation-plans/03-elasticsearch-integration.md around lines 792 to
795, the JSX references a took variable that is not defined in the component
state causing runtime errors; add a took state property (e.g., initialize via
useState(0) or include in the component's state object) and set its value from
the search API response when results are fetched (parse the response.took and
update state alongside results), and ensure the UI uses this state variable so
took is never undefined.

Comment on lines +195 to +209
```typescript
private sanitizePayload(payload: any): any {
  if (!payload) return null;

  // Remove sensitive fields
  const sensitiveFields = ['password', 'apiKey', 'token', 'secret', 'creditCard'];
  const sanitized = { ...payload };

  for (const field of sensitiveFields) {
    if (field in sanitized) {
      sanitized[field] = '[REDACTED]';
    }
  }

  return sanitized;
}
```

⚠️ Potential issue | 🟠 Major

Sanitization should be deep and include nested secrets.

Current sanitizePayload only redacts top-level fields. Traverse recursively and handle nested arrays/objects.
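A rough sketch of a deep variant (the field list mirrors the snippet above; circular-reference handling is added):

```typescript
// Sketch: recursively redact sensitive keys in nested objects/arrays without
// mutating the input; a WeakSet guards against circular references.
const SENSITIVE_FIELDS = new Set(['password', 'apiKey', 'token', 'secret', 'creditCard']);

export function deepSanitize(value: unknown, seen = new WeakSet<object>()): unknown {
  if (value === null || typeof value !== 'object') return value;
  if (seen.has(value as object)) return '[CIRCULAR]';
  seen.add(value as object);

  if (Array.isArray(value)) {
    return value.map((item) => deepSanitize(item, seen));
  }

  const result: Record<string, unknown> = {};
  for (const [key, val] of Object.entries(value as Record<string, unknown>)) {
    result[key] = SENSITIVE_FIELDS.has(key) ? '[REDACTED]' : deepSanitize(val, seen);
  }
  return result;
}
```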

🤖 Prompt for AI Agents
In docs/implementation-plans/04-compliance-security-suite.md around lines 195 to
209, sanitizePayload only redacts top-level keys; update it to perform a deep
recursive traversal that handles nested objects and arrays, redacting any
sensitive keys (password, apiKey, token, secret, creditCard) at any depth and
returning a new sanitized structure (avoid mutating original), skip or safely
handle circular references (e.g., track visited nodes) and preserve
non-sensitive primitive values unchanged.

Comment on lines +220 to +271
```typescript
// Middleware for Next.js API routes
export function auditMiddleware(handler: any) {
  return async (req: NextRequest, res: NextResponse) => {
    const startTime = Date.now();
    const requestId = req.headers.get('X-Request-ID') || crypto.randomUUID();

    try {
      const response = await handler(req, res);

      // Log successful request
      await auditLogger.log({
        userId: req.user?.id,
        eventType: `api.${req.method.toLowerCase()}`,
        eventCategory: 'data',
        action: this.mapMethodToAction(req.method),
        resourceType: this.extractResourceType(req.url),
        ipAddress: req.ip,
        userAgent: req.headers.get('User-Agent'),
        requestId,
        requestMethod: req.method,
        requestPath: req.url,
        responseStatus: response.status,
        status: 'success',
        metadata: {
          responseTime: Date.now() - startTime,
        },
      });

      return response;
    } catch (error: any) {
      // Log failed request
      await auditLogger.log({
        userId: req.user?.id,
        eventType: `api.${req.method.toLowerCase()}.failed`,
        eventCategory: 'data',
        action: this.mapMethodToAction(req.method),
        resourceType: this.extractResourceType(req.url),
        ipAddress: req.ip,
        userAgent: req.headers.get('User-Agent'),
        requestId,
        requestMethod: req.method,
        requestPath: req.url,
        status: 'error',
        errorMessage: error.message,
        severity: error.statusCode === 403 ? 'warning' : 'info',
      });

      throw error;
    }
  };
}
```

⚠️ Potential issue | 🔴 Critical

Middleware uses `this.*` in a closure — will be undefined.

this.mapMethodToAction / this.extractResourceType will throw. Use module-level helpers or call static functions.

```diff
- action: this.mapMethodToAction(req.method),
- resourceType: this.extractResourceType(req.url),
+ action: mapMethodToAction(req.method),
+ resourceType: extractResourceType(req.url),
```

Provide those helpers in scope.
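For example, module-level helpers along these lines would satisfy that; the route-to-resource mapping is illustrative:

```typescript
// Sketch: plain module-level helpers so the middleware closure has no `this` dependency.
export function mapMethodToAction(method: string): string {
  switch (method.toUpperCase()) {
    case 'POST': return 'create';
    case 'GET': return 'read';
    case 'PUT':
    case 'PATCH': return 'update';
    case 'DELETE': return 'delete';
    default: return 'other';
  }
}

export function extractResourceType(url: string): string {
  // e.g. /api/v1/slicers/123 -> 'slicers'; adjust to the real route layout.
  const segments = new URL(url, 'http://localhost').pathname.split('/').filter(Boolean);
  const apiIndex = segments.indexOf('v1');
  return segments[apiIndex + 1] ?? 'unknown';
}
```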

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +574 to +583
```typescript
const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/g;
while ((match = emailRegex.exec(text)) !== null) {
  entities.push({
    type: 'EMAIL',
    value: match[0],
    start: match.index,
    end: match.index + match[0].length,
    confidence: 0.95,
  });
}
```

⚠️ Potential issue | 🟠 Major

Email regex includes a literal '|' in TLD class.

[A-Z|a-z]{2,} allows pipe char.

```diff
-const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/g;
+const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/g;
+const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;
 while ((match = emailRegex.exec(text)) !== null) {
   entities.push({
     type: 'EMAIL',
     value: match[0],
     start: match.index,
     end: match.index + match[0].length,
     confidence: 0.95,
   });
 }
```
🤖 Prompt for AI Agents
In docs/implementation-plans/04-compliance-security-suite.md around lines
574-583, the email regex character class [A-Z|a-z]{2,} wrongly allows the pipe
character; replace that class with a correct TLD character class (e.g.
[A-Za-z]{2,}) or remove the explicit case range and add the case-insensitive
flag (/i) to the regex, ensuring the pattern no longer permits '|' and continues
to match valid TLDs.

Comment on lines +767 to +775
```typescript
private async deleteRecords(resourceType: string, records: any[]): Promise<void> {
  const ids = records.map((r) => r.id);

  const tableName = resourceType === 'pdf' ? 'pdfs' :
                    resourceType === 'output' ? 'outputs' :
                    'audit_logs';

  await supabase.from(tableName).delete().in('id', ids);
}
```

⚠️ Potential issue | 🟠 Major

Retention deletion fallback risks deleting audit logs by mistake.

Defaulting unknown resourceType to 'audit_logs' is dangerous. Fail closed or throw on unknown type.

```diff
-  const tableName = resourceType === 'pdf' ? 'pdfs' :
-                    resourceType === 'output' ? 'outputs' :
-                    'audit_logs';
+  const tableName =
+    resourceType === 'pdf' ? 'pdfs' :
+    resourceType === 'output' ? 'outputs' :
+    resourceType === 'audit_log' ? 'audit_logs' :
+    null;
+  if (!tableName) throw new Error(`Unsupported resource type: ${resourceType}`);
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
 private async deleteRecords(resourceType: string, records: any[]): Promise<void> {
   const ids = records.map((r) => r.id);

-  const tableName = resourceType === 'pdf' ? 'pdfs' :
-                    resourceType === 'output' ? 'outputs' :
-                    'audit_logs';
+  const tableName =
+    resourceType === 'pdf' ? 'pdfs' :
+    resourceType === 'output' ? 'outputs' :
+    resourceType === 'audit_log' ? 'audit_logs' :
+    null;
+  if (!tableName) throw new Error(`Unsupported resource type: ${resourceType}`);

   await supabase.from(tableName).delete().in('id', ids);
 }
```
🤖 Prompt for AI Agents
In docs/implementation-plans/04-compliance-security-suite.md around lines 767 to
775, the deleteRecords function currently defaults unknown resourceType to
'audit_logs', which risks deleting audit logs by mistake; change the
implementation to explicitly map allowed resourceType values ('pdf' -> 'pdfs',
'output' -> 'outputs') and throw an error (or reject) for any unknown
resourceType, performing no deletion; also validate that ids is non-empty before
calling supabase.delete() to avoid unnecessary operations.

Based on industry research and best practices, updated implementation
recommendations to use battle-tested, widely-adopted frameworks:

## Multi-LLM Provider: Vercel AI SDK (Recommended)

Previous approach: Custom provider abstraction layer
Updated approach: Vercel AI SDK + optional LiteLLM

Benefits:
- 100+ providers with unified interface
- Native streaming, tool calling, structured outputs
- Model Context Protocol support
- Better maintained (Vercel backing)
- Perfect fit for TypeScript/Next.js stack
- 12K+ GitHub stars

Alternative: LiteLLM for Python pipelines
- 1.8K+ models, 80+ providers
- Built-in cost tracking, load balancing, guardrails
- 20K+ GitHub stars, Y Combinator backed
- Can run as AI Gateway proxy server

## OCR Solutions: Multi-Method Approach

Previous: AWS Textract only
Updated: LLMWhisperer + DeepSeek-OCR + AWS Textract + Tesseract

1. LLMWhisperer (Unstract) - Primary
   - Layout preservation for LLM preprocessing
   - Free tier: 100 pages/day
   - Multiple modes: native_text, high_quality, form
   - Document enhancement (de-skewing, contrast)

2. DeepSeek-OCR - High-Volume
   - 97% accuracy with 10x compression
   - 200K+ pages/day on single A100 GPU
   - Self-hosted, $0 per page
   - Best for complex layouts, formulas

3. AWS Textract - Forms & Tables
   - Enterprise-grade reliability
   - Best for forms, key-value extraction
   - Pay-per-use

4. Tesseract - Fallback
   - Free, open-source
   - Simple text extraction

Cost savings: 60-80% vs AWS Textract only

Includes:
- Detailed Vercel AI SDK implementation examples
- LiteLLM proxy server configuration
- OCR decision logic and method selection
- Code examples for all solutions
- Migration path from current implementation
- Cost comparison and architecture diagrams
- Updated timeline: 4-5 weeks vs 6 weeks previous

Recommendation: Adopt Vercel AI SDK + LLMWhisperer as primary,
with DeepSeek-OCR for high-volume and AWS Textract for forms.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (1)
docs/UPDATED_RECOMMENDATIONS.md (1)

690-694: Clarify DeepSeek-OCR hardware requirements.

The hardware requirements section (lines 690-693) mentions "NVIDIA A100-40G (minimum)" for production but also notes "RTX 4090 (slower but works)". This is ambiguous—clarify whether A100 is the recommended minimum or an aspirational target, and provide more realistic entry-level GPU options.

Revise to:

```diff
 **Hardware Requirements:**
-- **Minimum:** NVIDIA A100-40G (for production speed)
-- **Budget:** RTX 4090 (slower but works)
-- **Performance:** 2500 tokens/s on A100, 200K+ pages/day
+- **Recommended (Production):** NVIDIA A100-40G or H100 (2500+ tokens/s, 200K+ pages/day)
+- **Budget Alternative:** RTX 4090 (significantly slower, ~200-400 tokens/s)
+- **Minimum:** RTX 3090 or A10 (not recommended for high-volume production)
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 551c12c and 901020c.

📒 Files selected for processing (1)
  • docs/UPDATED_RECOMMENDATIONS.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/UPDATED_RECOMMENDATIONS.md

[grammar] ~800-~800: Use a hyphen to join words.
Context: ...support (self-hosted) - ✅ Implement cost tracking middleware - ✅ UI for provider ...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
docs/UPDATED_RECOMMENDATIONS.md

431-431: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


831-831: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (3)
docs/UPDATED_RECOMMENDATIONS.md (3)

302-329: Code example references undefined supabase import.

The cost-tracking middleware code (lines 316-323) calls supabase.from('llm_usage').insert() without showing the import or initialization. While this is documentation, it could mislead readers about the source of this object.

Add a comment or code snippet showing how supabase is imported and initialized, e.g.:

```typescript
// Add near the top of the middleware code example
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);
```

1-11: Document coordinates well with PR context.

The document appropriately frames itself as an update to previous recommendations and aligns well with the enterprise roadmap and implementation plans mentioned in the enriched summary. The structured approach (comparison → detailed implementation → costs → timeline) is logical and professional.


829-880: Architecture diagrams are clear and comprehensive.

The two ASCII architecture diagrams effectively illustrate the recommended system design (Vercel AI SDK frontend, OCR service backend, and optional LiteLLM proxy). They provide good visual reference for implementers.

Comment on lines +74 to +115
```bash
npm install ai @ai-sdk/openai @ai-sdk/anthropic @ai-sdk/google @ai-sdk/mistral
```

### 2.2 Provider Configuration

```typescript
// src/lib/llm/providers.ts

import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { createOpenAI } from '@ai-sdk/openai'; // For Azure, custom endpoints

// OpenAI
export const openaiProvider = openai;

// Anthropic Claude
export const claudeProvider = anthropic;

// Google Gemini
export const geminiProvider = google;

// Azure OpenAI (custom endpoint)
export const azureProvider = createOpenAI({
  baseURL: process.env.AZURE_OPENAI_ENDPOINT,
  apiKey: process.env.AZURE_OPENAI_KEY,
  organization: '',
});

// Ollama (local/self-hosted)
export const ollamaProvider = createOpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // Ollama doesn't require real API key
});

// AWS Bedrock (via custom provider)
export const bedrockProvider = createOpenAI({
  baseURL: process.env.BEDROCK_ENDPOINT,
  apiKey: process.env.AWS_ACCESS_KEY,
  // Note: Bedrock requires AWS signature, may need middleware
});
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify Vercel AI SDK provider package availability and naming.

The installation and provider configuration code uses specific package names (@ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google, @ai-sdk/mistral). Ensure these packages exist and are the correct names in the current npm registry.


🌐 Web query:

What are the correct and latest official Vercel AI SDK provider packages available as of November 2025?

💡 Result:

Here are the official Vercel AI SDK provider packages (first‑party / Vercel‑maintained) available as of November 13, 2025:

  • @ai-sdk/openai. [1][2]
  • @ai-sdk/anthropic. [1][2]
  • @ai-sdk/google (Google Generative AI). [1][2]
  • @ai-sdk/google-vertex (Google Vertex). [1][2]
  • @ai-sdk/azure (Azure AI). [3][2]
  • @ai-sdk/cohere. [1][2]
  • @ai-sdk/mistral. [4][2]
  • @ai-sdk/xai (xAI / Grok). [4][2]
  • @ai-sdk/vercel (Vercel provider / v0 API). [5][2]
  • @ai-sdk/gateway (AI Gateway — unified access to multiple providers). [6][2]

Notes:

  • Some community or third‑party provider packages (e.g., Ollama adapters, Mem0, Auth0 integrations) exist but are not first‑party Vercel packages — check their repos/docs if you need them. [1][5]

Sources: Vercel blog & changelog, Vercel Docs (AI Gateway / Models & Providers), and AI SDK provider documentation. [1][2][3][4][6]

References

  1. Vercel — Introducing Vercel AI SDK (providers added).
  2. Vercel Docs — Models & Providers / AI Gateway documentation.
  3. Vercel changelog — Azure AI provider announcement (Jun 12, 2024).
  4. AI SDK provider listings (examples showing xAI, Mistral, OpenAI, Anthropic, etc.).
  5. npm / @ai-sdk/vercel package listing.
  6. AI SDK docs — @ai-sdk/gateway provider.

Update npm install and use official @ai-sdk/azure package for Azure OpenAI.

The packages @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google, and @ai-sdk/mistral are correct and available. However, the provider configuration needs updates:

  • Add @ai-sdk/azure to the npm install command
  • Replace the Azure provider implementation with the official @ai-sdk/azure package instead of createOpenAI() with custom baseURL settings
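As a rough sketch of that direction (option names such as resourceName and the deployment-name call should be checked against the @ai-sdk/azure docs; env variable names are placeholders):

```typescript
// Sketch: use the first-party Azure provider instead of createOpenAI with a custom baseURL.
// Exact option names are assumptions; confirm against the @ai-sdk/azure documentation.
import { createAzure } from '@ai-sdk/azure';
import { generateText } from 'ai';

const azure = createAzure({
  resourceName: process.env.AZURE_OPENAI_RESOURCE!, // e.g. 'my-openai-resource'
  apiKey: process.env.AZURE_OPENAI_KEY!,
});

export async function summarize(text: string) {
  const { text: summary } = await generateText({
    model: azure('gpt-4o'), // deployment name in your Azure resource
    prompt: `Summarize:\n\n${text}`,
  });
  return summary;
}
```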
🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around lines 74 to 115, the npm install and
Azure provider usage are incorrect: add @ai-sdk/azure to the install command and
replace the current Azure provider that uses createOpenAI() with the official
@ai-sdk/azure package. Update the top imports to import the Azure factory from
@ai-sdk/azure (instead of using createOpenAI from @ai-sdk/openai), then
instantiate the azureProvider via that Azure package’s factory/function passing
the AZURE endpoint and key from environment variables (and any required
Azure-specific options), and remove the custom createOpenAI-based Azure
configuration. Ensure the npm install example includes @ai-sdk/azure and the
code example imports and constructs the Azure provider using the official
package and env vars.
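
A minimal sketch of the suggested change, assuming the official @ai-sdk/azure package and its createAzure factory (option names should be double-checked against the package docs before adopting):

```typescript
// Sketch only: replaces the createOpenAI()-based Azure provider with the
// official @ai-sdk/azure package. Verify option names against the docs.
// npm install @ai-sdk/azure   (alongside the other @ai-sdk/* packages)
import { createAzure } from '@ai-sdk/azure';

export const azureProvider = createAzure({
  baseURL: process.env.AZURE_OPENAI_ENDPOINT, // or resourceName: '<your-resource-name>'
  apiKey: process.env.AZURE_OPENAI_KEY,
});

// Usage: azureProvider('<deployment-name>') returns a model usable with generateText/streamText
```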

Comment on lines +300 to +336
### 2.6 Cost Tracking (Add Custom Middleware)

```typescript
// Middleware for cost tracking
import { experimental_wrapLanguageModel as wrapLanguageModel } from 'ai';

function withCostTracking(model: any, providerId: string) {
  return wrapLanguageModel({
    model,
    middleware: {
      transformParams: async ({ params }) => params,
      wrapGenerate: async ({ doGenerate, params }) => {
        const startTime = Date.now();
        const result = await doGenerate();

        // Track usage
        await supabase.from('llm_usage').insert({
          provider: providerId,
          model: params.model,
          input_tokens: result.usage.promptTokens,
          output_tokens: result.usage.completionTokens,
          cost: calculateCost(providerId, result.usage),
          duration_ms: Date.now() - startTime,
        });

        return result;
      },
    },
  });
}

// Usage
const trackedModel = withCostTracking(
  openai('gpt-4o-mini'),
  'openai'
);
```

⚠️ Potential issue | 🟠 Major

Experimental API usage requires stability warning.

The cost-tracking middleware uses experimental_wrapLanguageModel (line 304), which is an experimental Vercel AI SDK API. This may change or be deprecated in future versions, and code relying on it could break.

Consider adding a prominent warning in the documentation about the experimental nature of this API, and suggest monitoring Vercel AI SDK release notes for stability updates. Alternatively, propose an alternative approach that doesn't rely on experimental features, or note that users should pin the ai package version to a known-compatible release.

🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around lines 300 to 336, the example uses the
experimental experimental_wrapLanguageModel API without warning; add a prominent
stability notice immediately above the code block that states this API is
experimental and may change or be removed, advise readers to pin the ai package
to a known-compatible version, monitor Vercel AI SDK release notes for breaking
changes, and suggest an alternative approach (e.g., implement
middleware/wrapping at your application layer or use stable public SDK hooks) as
a fallback if the experimental API changes.
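
If avoiding the experimental API altogether, one option is to track cost at the application layer around each call. A minimal sketch, with recordUsage standing in for the llm_usage insert shown above and the usage field names handled defensively because they differ across AI SDK versions:

```typescript
// Sketch only: application-layer cost tracking around generateText, avoiding
// experimental_wrapLanguageModel. Usage field names differ across AI SDK
// versions (promptTokens/completionTokens vs inputTokens/outputTokens), so
// both are checked; recordUsage stands in for the llm_usage insert above.
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

type UsageSink = (record: {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  durationMs: number;
}) => Promise<void>;

async function generateWithCostTracking(
  modelId: string,
  prompt: string,
  recordUsage: UsageSink
) {
  const startTime = Date.now();
  const result = await generateText({ model: openai(modelId), prompt });

  const usage = result.usage as unknown as Record<string, number | undefined>;
  await recordUsage({
    provider: 'openai',
    model: modelId,
    inputTokens: usage.promptTokens ?? usage.inputTokens ?? 0,
    outputTokens: usage.completionTokens ?? usage.outputTokens ?? 0,
    durationMs: Date.now() - startTime,
  });

  return result;
}
```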

Comment on lines +358 to +407
cat > litellm_config.yaml <<EOF
model_list:
  # OpenAI Models
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

  # Anthropic Models
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

  # Azure OpenAI
  - model_name: azure-gpt4
    litellm_params:
      model: azure/gpt-4o
      api_base: os.environ/AZURE_OPENAI_ENDPOINT
      api_key: os.environ/AZURE_OPENAI_KEY

  # Ollama (self-hosted)
  - model_name: llama3
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434

# Load balancing
router_settings:
  routing_strategy: least-busy
  num_retries: 3
  timeout: 600

# Cost tracking (built-in)
litellm_settings:
  success_callback: ["langfuse"] # Track all requests
  failure_callback: ["langfuse"]
  set_verbose: true

# Guardrails
guardrails:
  - guardrail_name: "content-filter"
    litellm_params:
      mode: during_call
      guardrail: "azure-content-filter"
EOF

# Start proxy server
litellm --config litellm_config.yaml --port 4000
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify LiteLLM configuration syntax and available options.

The LiteLLM proxy configuration (lines 358-407) uses specific YAML structure and settings like routing_strategy: least-busy, success_callback: ["langfuse"], and guardrail: "azure-content-filter". These should be verified against the current LiteLLM documentation to ensure syntax accuracy.


🌐 Web query:

What is the current LiteLLM proxy server configuration syntax and available options as of November 2025?

💡 Result:

Summary of LiteLLM proxy configuration syntax & available options (as of Nov 13, 2025)

Quick overview

  • The proxy is configured via a YAML config file (passed with --config or via LITELLM_CONFIG_PATH / CONFIG_FILE_PATH / bucket env vars). The main top-level sections are: model_list, general_settings, litellm_settings, router_settings, environment_variables (and extras like UI/auth settings). [1][2][3]

Main sections and common keys (YAML)

  • model_list: list of model entries. Each entry commonly includes:
    • model_name: user-facing alias (what clients request).
    • litellm_params: provider params passed to litellm (model identifier, api_key, api_base, api_version, provider-specific fields).
    • any per-model overrides (temperature, max_tokens, stream, fallback, etc.). [1][2]
  • general_settings:
    • master_key (proxy admin key), database connection settings (DATABASE_URL, connection_pool_limit, connection_timeout), port, logging, NO_DOCS / NO_REDOC toggles, UI_USERNAME / UI_PASSWORD. [1][2]
  • litellm_settings:
    • module-level toggles like drop_params, set_verbose, api_base, cache settings and other litellm runtime flags. [1][3]
  • router_settings:
    • routing_strategy (e.g., "least-busy"), other routing params for load-balancing across model instances. [1]
  • environment_variables:
    • list or mapping to inject expected env vars (REDIS_HOST, REDIS_PORT, provider keys when referencing os.environ). [1]
  • Extras / deployment:
    • Support for reading config.yaml from S3 or GCS via LITELLM_CONFIG_BUCKET_TYPE/NAME/OBJECT_KEY env vars.
    • Docker/Helm/Kubernetes deployment examples; default proxy port 4000 (or 8000 in some docs), ability to run simple proxy mode with CLI flags --model, --port, --logs. [2][3]

Example minimal model_list entry (conceptual YAML):

    model_list:
      - model_name: gpt-4o
        litellm_params:
          model: openai/gpt-4o
          api_key: os.environ/OPENAI_API_KEY
          api_base: https://api.openai.com
          api_version: "2023-05-15"
          temperature: 0.0
          stream: True

(Use env var references for secrets.) [1][2]

Other notable options / behaviors

  • Admin UI & Swagger available at proxy root; Swagger exposes complete config schema at /#/config.yaml. Master key is required to use the admin UI. [2]
  • Logging: --logs returns recent API call logs (saved to api_logs.json). [3]
  • Two run modes: simple/basic proxy (CLI flags) and full proxy using config.yaml with DB-backed features (rate-limits, API key management, usage tracking). [3][1]

Sources
[1] LiteLLM proxy config docs — configs & examples.
[2] LiteLLM proxy Quick Start / UI docs (usage, master key, env vars).
[3] LiteLLM deploy / simple proxy docs (deployment, CLI flags, config-bucket support).


I need to search for more specific information about LiteLLM guardrails and callback features to determine if they are actually supported in the current version.

I need to search more specifically about the guardrails placement and the "azure-content-filter" guardrail type to fully verify the configuration.

Fix LiteLLM configuration structure and guardrail type.

The configuration has two critical issues and one item to verify:

  1. Guardrails placement: Guardrails must be nested within each model's litellm_params, not defined as a separate top-level section. Move the guardrails block into each model entry under model_list.

  2. Incorrect guardrail type: Use guardrail: azure/text_moderations or guardrail: azure/prompt_shield, not "azure-content-filter".

  3. Undocumented router settings: num_retries and timeout in router_settings are not confirmed as valid LiteLLM proxy options. Verify these keys against current LiteLLM documentation or remove them if unsupported.

Move the guardrails block (lines ~390-396) into the appropriate model entries, and update the guardrail type.

🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around lines 358 to 407, the LiteLLM YAML
places guardrails as a top-level section and uses an invalid guardrail type;
move the guardrail configuration into each model's litellm_params (i.e., under
each model entry in model_list), replace the guardrail value
"azure-content-filter" with a supported type such as "azure/text_moderations" or
"azure/prompt_shield", and verify or remove the unconfirmed router_settings keys
(num_retries and timeout) against current LiteLLM docs if they are not
supported.

Comment on lines +431 to +449
```
┌─────────────────────────────────────────┐
│         Next.js Application             │
└─────────────┬───────────────────────────┘
              │ HTTP (OpenAI format)
┌─────────────▼───────────────────────────┐
│          LiteLLM Proxy Server           │
│     (Load Balancing, Cost Tracking,     │
│        Guardrails, Retry Logic)         │
└─────────────┬───────────────────────────┘
    ┌─────────┼─────────┐
    │         │         │
┌───▼───┐  ┌──▼───┐  ┌──▼────┐
│OpenAI │  │Claude│  │Azure  │
└───────┘  └──────┘  └───────┘
```

⚠️ Potential issue | 🟡 Minor

Add language identifier to fenced code block.

The ASCII architecture diagram at lines 431-449 is a fenced code block without a language specifier, violating markdown linting standards (MD040).

Apply this diff to fix the linting violation:

-```
+```text
 ┌─────────────────────────────────────────┐
 │         Next.js Application             │
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

431-431: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around lines 431 to 449 the fenced code block
containing the ASCII architecture diagram lacks a language identifier (MD040);
fix it by changing the opening fence from ``` to ```text so the block is
explicitly marked as plain text, leaving the diagram content and closing fence
unchanged.
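
Because the proxy in the diagram above speaks the OpenAI wire format, the Next.js side only needs an OpenAI-compatible client pointed at it. A minimal sketch, assuming the proxy from the earlier config runs on localhost:4000 and that LITELLM_BASE_URL / LITELLM_API_KEY are hypothetical environment variable names:

```typescript
// Sketch only: calls the LiteLLM proxy through its OpenAI-compatible
// /v1/chat/completions endpoint using plain fetch.
const LITELLM_BASE_URL = process.env.LITELLM_BASE_URL ?? 'http://localhost:4000';

async function chatViaProxy(modelAlias: string, userMessage: string): Promise<string> {
  const res = await fetch(`${LITELLM_BASE_URL}/v1/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.LITELLM_API_KEY}`, // proxy master/virtual key
    },
    body: JSON.stringify({
      model: modelAlias, // e.g. 'gpt-4o-mini' or 'claude-sonnet' from model_list
      messages: [{ role: 'user', content: userMessage }],
    }),
  });

  if (!res.ok) {
    throw new Error(`LiteLLM proxy error: ${res.status} ${res.statusText}`);
  }

  const data = await res.json();
  return data.choices[0].message.content;
}
```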

return {
  text: result.text,
  layout: result.markdown,
  confidence: 0.97, // DeepSeek reports 97% accuracy at 10x compression

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify specific performance and accuracy claims.

The document makes several specific quantitative claims that should be validated:

  • Line 552: "DeepSeek reports 97% accuracy at 10x compression"
  • Line 693: "200K+ pages/day" throughput on A100
  • Line 770: Cost figures for various LLM providers

These figures significantly influence architectural decisions and should be verified against current official documentation.

Also applies to: 693-693


🌐 Web query:

What is the current reported accuracy and performance of DeepSeek-OCR model as of November 2025?

💡 Result:

Short answer — reported numbers (as of Nov 13, 2025):

  • OCR accuracy: ~97% decoding/precision at compression ratios <10×; ≈60% at ~20× compression. [1][2][3]
  • Compression / tokens: claims of using ~100 vision tokens/page to match models that use 256–6,000+ tokens (i.e., 7–20× token reduction). [1][2][4]
  • Throughput / latency: published production throughput ~200,000 pages/day on a single NVIDIA A100-40G (scales to millions on large clusters). [1][4]
  • Benchmarks / comparisons: reported to outperform GOT‑OCR2.0 and MinerU2.0 on OmniDocBench/Fox at much lower vision-token budgets. [1][2]

Sources: DeepSeek-OCR arXiv paper (Oct 2025) and press coverage / vendor pages summarizing reported benchmarks. [1][2][3][4]

References
[1] DeepSeek‑OCR arXiv paper (Contexts Optical Compression, Oct 2025).
[2] InfoQ news summary (Oct 2025).
[3] Tom’s Hardware coverage (Oct–Nov 2025).
[4] DeepSeek product/site and blog posts (Oct–Nov 2025).


🌐 Web query:

What are the current official pricing and token costs for OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet, and Google Gemini 1.5 Flash as of November 2025?

💡 Result:

Short answer (official sources, as of Nov 13, 2025):

  • OpenAI — GPT‑4o (official docs for GPT‑4o model snapshot gpt-4o-2024-08-06): inference listed at $3.75 per 1M input tokens and $15.00 per 1M output tokens; GPT‑4o‑mini: $0.15 per 1M input / $0.60 per 1M output. [OpenAI]. [1][2]

  • Anthropic — Claude 3.5 Sonnet: $3.00 per 1M input tokens; prompt-cache write $3.75/1M; cache read $0.30/1M; $15.00 per 1M output tokens. (Prices shown as $/MTok on Anthropic docs). [3]

  • Google — Gemini 1.5 Flash (Vertex AI / Google Cloud): Google publishes per‑character serving prices (example blog: $0.0375 per 1M characters input and $0.15 per 1M characters output for Gemini 1.5 Flash); that converts approximately (≈4 characters ≈1 token) to roughly $0.15 per 1M tokens input and $0.60 per 1M tokens output for typical token→character conversion — Vertex/Cloud docs note prices and recent Flash reductions. Check Google Cloud Vertex AI pricing for region-specific/product details. [4][5]

Sources

  • [1] OpenAI — “Introducing vision to the fine‑tuning API” / GPT‑4o model snapshot pricing. (OpenAI).
  • [2] OpenAI — GPT‑4o mini announcement (GPT‑4o mini pricing). (OpenAI).
  • [3] Anthropic — Claude pricing (Claude Sonnet / Claude 3.5 Sonnet prices). (Anthropic docs).
  • [4] Google Cloud blog / Vertex AI — Gemini 1.5 Flash pricing and cost reductions (per‑character pricing). (Google Cloud).
  • [5] Google Cloud Vertex AI pricing pages / model garden (Gemini Flash details).

If you want I can fetch the exact vendor pricing pages and paste the specific lines/links (or convert Google’s per‑character numbers to per‑token using a specific character→token ratio you prefer).


I've identified a factual inaccuracy in the document based on official DeepSeek-OCR benchmarks. Here's my analysis:

Line 552 — Accuracy claim requires correction:
The official benchmarks show ~97% accuracy at compression ratios <10×, with degraded performance (~60%) at ~20× compression. The current phrasing "97% accuracy at 10x compression" is misleading—97% applies only to compression ratios below 10×, not at the 10× threshold.

Line 693 — Throughput claim verified:
The 200K+ pages/day throughput on a single A100 matches the published production benchmark.

Line 770 — Pricing claims pending:
I cannot verify the cost figures on line 770 without seeing the actual document content. Current official pricing (Nov 2025) shows OpenAI GPT-4o at $3.75/$15.00 per 1M tokens, Anthropic Claude 3.5 Sonnet at $3.00/$15.00, and Google Gemini 1.5 Flash at approximately $0.15/$0.60 per 1M tokens—please share the specific figures from line 770 to confirm accuracy.

Action required: Correct line 552 to clarify that 97% accuracy applies to <10× compression ratios, not at 10× compression. Provide line 770 snippet for pricing verification.

🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around line 552, the statement "97% accuracy
at 10x compression" is misleading; update the wording to state that DeepSeek
reports ~97% accuracy for compression ratios below 10× (e.g., "<10×") and that
accuracy degrades (to ~60%) at around 20× compression, ensuring the phrasing
uses "<10×" or "below 10×" rather than "at 10×"; also, to verify the pricing on
line 770, paste the current line 770 snippet so we can confirm and correct the
cost figures as needed.

Comment on lines +831 to +880
```
┌─────────────────────────────────────────────────────────┐
│              Next.js Application (Frontend)             │
│                                                         │
│  ┌──────────────────────┐      ┌─────────────────────┐  │
│  │    Vercel AI SDK     │      │   React UI Hooks    │  │
│  │   (Multi-provider)   │◄─────│   (useChat, etc.)   │  │
│  └──────────┬───────────┘      └─────────────────────┘  │
└─────────────┼───────────────────────────────────────────┘
    ┌─────────┼─────────┐
    │         │         │
┌───▼──┐  ┌───▼──┐  ┌───▼───┐
│OpenAI│  │Claude│  │Gemini │
└──────┘  └──────┘  └───────┘


┌─────────────────────────────────────────────────────────┐
│            Next.js Application (Backend API)            │
│                                                         │
│  ┌──────────────────────┐      ┌─────────────────────┐  │
│  │     OCR Service      │      │   PDF Processing    │  │
│  │    (Multi-method)    │◄─────│      Pipeline       │  │
│  └──────────┬───────────┘      └─────────────────────┘  │
└─────────────┼───────────────────────────────────────────┘
    ┌─────────┼─────────┬──────────┐
    │         │         │          │
┌───▼────────┐│  ┌──────▼───┐  ┌───▼──────┐
│LLMWhisperer││  │DeepSeek  │  │Textract  │
│(API)       ││  │(Self-host│  │(AWS)     │
└────────────┘│  └──────────┘  └──────────┘
     ┌────────▼────┐
     │  Tesseract  │
     │  (Fallback) │
     └─────────────┘


Optional: LiteLLM Proxy (for advanced features)
┌─────────────────────────────────────────────────────────┐
│                  LiteLLM Proxy Server                   │
│      (Load Balancing, Cost Tracking, Guardrails)        │
└─────────────┬───────────────────────────────────────────┘
    ┌─────────┼─────────┬──────────┐
    │         │         │          │
┌───▼──┐  ┌───▼──┐  ┌───▼───┐  ┌───▼──┐
│OpenAI│  │Claude│  │Azure  │  │Ollama│
└──────┘  └──────┘  └───────┘  └──────┘
```

⚠️ Potential issue | 🟡 Minor

Add language identifier to fenced code block (architecture diagram).

The second ASCII architecture diagram (lines 831-880) is a fenced code block without a language specifier, violating markdown linting standards (MD040).

Apply this diff:

-```
+```text
 ┌─────────────────────────────────────────────────────────┐
 │              Next.js Application (Frontend)             │
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

831-831: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
docs/UPDATED_RECOMMENDATIONS.md around lines 831-880: the ASCII architecture
diagram is in a fenced code block with no language specifier (triggering MD040);
fix by changing the opening fence from ``` to ```text (or otherwise adding
"text" after the opening backticks) so the block becomes a labeled text code
fence, ensuring the closing fence remains ``` unchanged.

- Multi-provider LLM service using Vercel AI SDK
  - Support for OpenAI, Anthropic, Google, Azure, Ollama
  - Structured outputs, streaming, embeddings
  - Cost tracking and fallback providers

- Multi-method OCR service with smart routing
  - LLMWhisperer (primary - layout preservation)
  - DeepSeek-OCR (self-hosted, high-volume)
  - AWS Textract (forms and tables)
  - PDF.js fallback

- REST API v1 foundation
  - /api/v1/health - Service health check
  - /api/v1/extract - Document extraction
  - /api/v1/chat - Chat completion
  - API key auth and rate limiting

- Environment variables template (.env.example)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 19

🧹 Nitpick comments (17)
src/app/api/v1/utils/auth.ts (1)

97-131: Unify auth errors with centralized error utilities and harden retryAfter calculation

authError and rateLimitError hand‑roll JSON responses that largely mirror what APIErrorResponse / ErrorResponses in errors.ts already provide, and retryAfter is computed twice with no clamping.

Consider:

  • Reusing ErrorResponses.unauthorized / ErrorResponses.forbidden for authError, or delegating via createErrorResponse so all errors share the exact same payload shape (including future additions like requestId).
  • For rateLimitError, computing retryAfterSeconds once and clamping it to a non‑negative value:
 export function rateLimitError(resetAt: number) {
+  const retryAfterSeconds = Math.max(
+    0,
+    Math.ceil((resetAt - Date.now()) / 1000),
+  );
   return Response.json(
     {
       error: {
         code: 'RATE_LIMITED',
         message: 'Too many requests',
-        retryAfter: Math.ceil((resetAt - Date.now()) / 1000),
+        retryAfter: retryAfterSeconds,
       },
     },
     {
       status: 429,
       headers: {
-        'Retry-After': String(Math.ceil((resetAt - Date.now()) / 1000)),
+        'Retry-After': String(retryAfterSeconds),
       },
     }
   );
 }

This keeps behavior consistent and a bit more robust to any future changes in how resetAt is computed.

src/lib/ocr/types.ts (1)

1-175: Well-structured OCR type surface

The request/response, provider-specific, and error/usage types are laid out cleanly and should be straightforward to consume. I don’t see functional issues here; any further tweaks (e.g., changing Buffer | string or Date serialization in OCRUsageRecord) would be design choices rather than fixes.

src/app/api/v1/utils/errors.ts (1)

81-93: Consider preserving thrown Response objects in handleError

Right now, handleError always converts any thrown value into an INTERNAL_ERROR response. If a route ever adopts the pattern throw ErrorResponses.unauthorized() or throws a prebuilt Response, that intent will be lost.

You could make handleError more flexible with a small check:

 export function handleError(error: unknown): Response {
   console.error('API Error:', error);

-  if (error instanceof Error) {
+  if (error instanceof Response) {
+    return error;
+  }
+
+  if (error instanceof Error) {
     // Don't expose internal error details in production
     const message =
       process.env.NODE_ENV === 'development' ? error.message : 'An unexpected error occurred';

     return ErrorResponses.internalError(message);
   }

That keeps your current behavior for real exceptions while allowing explicit Response throws to pass through unchanged.

src/app/api/v1/chat/route.ts (1)

135-137: Improve error mapping for upstream LLM failures

All thrown errors are funneled through handleError, which currently always returns a generic 500 response. For provider-specific failures (e.g. 429 rate limits, 503 from an upstream LLM), this loses useful semantics for clients (retry behavior, UX). Consider extending handleError to detect your domain errors (e.g. LLMError) and surface appropriate HTTP status codes and error codes instead of always returning an internal error.
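
A minimal sketch of that mapping, assuming the LLMError field names described in this review (code, message, statusCode, retryable) and the ErrorResponses helpers defined in the same errors.ts module:

```typescript
// Sketch only: maps domain errors from the LLM layer onto meaningful HTTP
// statuses instead of a blanket 500. The field names follow the LLMError
// shape described in this review, not the final code.
interface LLMErrorLike {
  code: string;
  message: string;
  statusCode?: number;
  retryable?: boolean;
}

function isLLMErrorLike(error: unknown): error is LLMErrorLike {
  return (
    typeof error === 'object' &&
    error !== null &&
    'code' in error &&
    'message' in error &&
    'retryable' in error
  );
}

export function handleError(error: unknown): Response {
  console.error('API Error:', error);

  if (isLLMErrorLike(error)) {
    // 429/503-style upstream failures are usually retryable; others map to 502
    const status = error.statusCode ?? (error.retryable ? 503 : 502);
    return Response.json(
      { error: { code: error.code, message: error.message } },
      { status }
    );
  }

  if (error instanceof Error) {
    const message =
      process.env.NODE_ENV === 'development' ? error.message : 'An unexpected error occurred';
    return ErrorResponses.internalError(message); // ErrorResponses is defined earlier in errors.ts
  }

  return ErrorResponses.internalError('An unexpected error occurred');
}
```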

src/lib/ocr/ocr-service.ts (5)

131-198: LLMWhisperer integration looks consistent; only minor polish

The LLMWhisperer flow (API key validation, form-data upload, error wrapping, and typed response mapping) is coherent and matches the declared LLMWhispererResponse/OCRResult types.

One minor behavior to be aware of: extractText overwrites processingTimeMs afterward with end-to-end latency, so the API’s reported processing_time_ms is effectively discarded. If you care about distinguishing provider time vs total time, you may want to keep both (e.g., providerProcessingTimeMs vs processingTimeMs).


203-226: Quota calculation semantics may not match upstream API

getLLMWhispererQuota derives limit and remaining from overage_page_count and pages_extracted. If the upstream API’s semantics differ (e.g., one is overage already used vs. total allowance), the numbers exposed here could be misleading, which in turn affects selectOCRMethod.

Recommend double-checking the LLMWhisperer usage API docs and adjusting the computation or field names (e.g., dailyPageLimit, overageUsed) so the meaning is unambiguous.


314-391: Textract integration is reasonable but currently single-page and basic

The Textract integration is sound for simple PDFs, but note:

  • DetectDocumentText and AnalyzeDocument in synchronous mode are effectively single-document calls; pageCount is hard-coded to 1, which may not reflect the actual page count of multi-page PDFs.
  • Credentials are wired manually via credentials rather than relying on the default AWS credential chain, which may be fine but reduces flexibility.
  • Table/form extraction helpers currently return empty arrays, so callers enabling extractTables/detectForms won’t see useful structures yet.

If multi-page or richer structured extraction is in scope, consider:

  • Adding a comment that pageCount is a placeholder and/or computing it from Blocks.
  • Implementing extractTablesFromTextract / extractFormFieldsFromTextract or clearly documenting that they’re stubs for now.
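
Deriving the page count from the returned Blocks, as suggested in the first bullet, could look roughly like this (Textract emits one Block with BlockType 'PAGE' per page):

```typescript
import type { Block } from '@aws-sdk/client-textract';

// Count PAGE blocks instead of hard-coding pageCount to 1
function pageCountFromBlocks(blocks: Block[] | undefined): number {
  if (!blocks || blocks.length === 0) {
    return 1; // fall back to the current assumption when Blocks are missing
  }
  return blocks.filter((block) => block.BlockType === 'PAGE').length || 1;
}
```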

424-453: PDF.js extraction: potential page-range validation

The PDF.js path looks fine overall and correctly supports password and optional page selection. One edge case:

  • If options.pages includes values outside 1..pdf.numPages, pdf.getPage(pageNum) will throw and abort the whole extraction.

If you expect user-supplied page lists, consider clamping/filtering to valid pages or throwing a clearer OCRError when an invalid page is requested.
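
A small sketch of that validation, filtering the requested 1-based page numbers against pdf.numPages before iterating (names are illustrative, not the actual code):

```typescript
// Sketch only: keep only valid 1-based page numbers; throw a clear error
// (or fall back to all pages) when nothing valid remains.
function resolveRequestedPages(numPages: number, requested?: number[]): number[] {
  if (!requested || requested.length === 0) {
    return Array.from({ length: numPages }, (_, i) => i + 1);
  }

  const valid = requested.filter(
    (page) => Number.isInteger(page) && page >= 1 && page <= numPages
  );

  if (valid.length === 0) {
    throw new Error(
      `Requested pages ${requested.join(', ')} are outside the document range 1-${numPages}`
    );
  }

  return valid;
}
```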


486-497: Textract table/form helpers intentionally stubbed

Both extractTablesFromTextract and extractFormFieldsFromTextract currently return empty arrays. That’s fine as an MVP, but make sure consumers (e.g., API responses) don’t assume non-empty arrays to infer “feature enabled”.

A short TODO comment tying these to a follow-up task (and tests) would help avoid them being mistaken for complete implementations later.

src/lib/llm/types.ts (1)

187-198: Usage record type is sufficient for basic telemetry

LLMUsageRecord captures the core metrics (tokens, cost, provider/model, requestType, resource IDs, timestamp). If you later add multi-tenant support, consider adding a first-class tenantId/userId field rather than overloading resourceType/resourceId.

src/lib/llm/providers.ts (2)

16-179: Provider configs are well-structured; keep an eye on pricing/version drift

The PROVIDER_CONFIGS block is clear and matches LLMProviderConfig/LLMModelConfig. Capabilities and recommended flags are helpful for UI and default selection. Just be aware that model IDs and pricing can drift over time; a brief comment about their “as of” date or a central place to update them can help avoid silently stale values.


285-308: Provider availability check overlaps with isProviderConfigured

isProviderAvailable checks env vars in almost the same way isProviderConfigured does in llm-service.ts. That’s fine, but you might later want a single source of truth (e.g., export and reuse this function) to avoid divergence.

Also note that Ollama is always treated as “available”, which is OK as long as callers understand this doesn’t guarantee the daemon is actually running.

src/lib/llm/llm-service.ts (5)

33-77: Completion flow is solid; minor note on usage extraction

generateCompletion correctly wires provider/model into generateText, extracts token usage defensively, and calculates cost via calculateCost. Error wrapping into LLMError with retryable on 429 is sensible.

Be aware that different providers may expose usage under different field names; your (result.usage as any).promptTokens ?? inputTokens pattern handles this, but you may want a small helper to centralize this mapping if it grows more complex.


83-108: Streaming completion casting relies on textStream shape

generateStreamingCompletion returns result.textStream as unknown as ReadableStream. This is fine if streamText’s textStream is already a ReadableStream-compatible object, but brittle otherwise.

If the ai SDK exposes a canonical stream type/helper (e.g., toAIStream or ReadableStream directly), consider using that instead of double casting to avoid surprises if the library’s internal stream type changes.


114-181: Structured response + text fallback are well designed

generateStructuredResponse cleanly uses generateObject with LLMResponseSchema and provides a robust fallback to plain text via generateText when structured output fails. Returning a text-type LLMResponse keeps the output contract stable for callers.

Only minor nits:

  • You log the error on fallback; if this becomes noisy in prod, consider gating behind a debug flag or structured logging.
  • The confidence fallback is hard-coded to 0.7; you may want to make that configurable or at least document the choice.

264-294: chatCompletion’s context/instruction split is reasonable; note empty-message behavior

The transformation from messages → (context, instruction) is straightforward and delegates to generateStructuredResponse, which maintains the structured output contract.

One edge case: if messages has no non-system entries, instruction becomes '' and you effectively call the model with no user instruction. That’s probably fine, but if this function is user-facing, you may want to assert at least one non-system message and throw a clearer error.


394-441: getAvailableModels + isProviderConfigured largely duplicate provider availability logic

getAvailableModels uses isProviderConfigured (env checks), while providers.ts has a similar isProviderAvailable. This duplication is easy to drift over time.

Consider exporting and reusing a single availability helper from one module to keep behavior consistent across LLM service and any health endpoints.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 901020c and aa96908.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (14)
  • package.json (2 hunks)
  • src/app/api/v1/chat/route.ts (1 hunks)
  • src/app/api/v1/extract/route.ts (1 hunks)
  • src/app/api/v1/health/route.ts (1 hunks)
  • src/app/api/v1/utils/auth.ts (1 hunks)
  • src/app/api/v1/utils/errors.ts (1 hunks)
  • src/app/api/v1/utils/index.ts (1 hunks)
  • src/lib/llm/index.ts (1 hunks)
  • src/lib/llm/llm-service.ts (1 hunks)
  • src/lib/llm/providers.ts (1 hunks)
  • src/lib/llm/types.ts (1 hunks)
  • src/lib/ocr/index.ts (1 hunks)
  • src/lib/ocr/ocr-service.ts (1 hunks)
  • src/lib/ocr/types.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
src/app/api/v1/chat/route.ts (4)
src/app/api/v1/utils/auth.ts (5)
  • validateAPIKey (24-50)
  • authError (100-110)
  • hasPermission (55-57)
  • checkRateLimit (66-95)
  • rateLimitError (115-131)
src/app/api/v1/utils/errors.ts (2)
  • ErrorResponses (52-76)
  • handleError (81-93)
src/lib/llm/types.ts (1)
  • LLMProviderType (13-13)
src/lib/llm/llm-service.ts (2)
  • generateStreamingCompletion (83-108)
  • generateCompletion (33-77)
src/app/api/v1/health/route.ts (2)
src/lib/llm/providers.ts (1)
  • getAvailableProviders (285-287)
src/lib/ocr/ocr-service.ts (1)
  • getLLMWhispererQuota (203-226)
src/app/api/v1/extract/route.ts (4)
src/app/api/v1/utils/auth.ts (5)
  • validateAPIKey (24-50)
  • authError (100-110)
  • hasPermission (55-57)
  • checkRateLimit (66-95)
  • rateLimitError (115-131)
src/app/api/v1/utils/errors.ts (2)
  • ErrorResponses (52-76)
  • handleError (81-93)
src/lib/ocr/ocr-service.ts (1)
  • extractText (37-77)
src/lib/ocr/types.ts (1)
  • OCRMethod (11-11)
src/lib/llm/providers.ts (1)
src/lib/llm/types.ts (3)
  • LLMProviderType (13-13)
  • LLMProviderConfig (15-23)
  • LLMModelConfig (25-36)
src/lib/ocr/ocr-service.ts (2)
src/lib/ocr/index.ts (5)
  • extractText (12-12)
  • getLLMWhispererQuota (13-13)
  • simpleExtract (14-14)
  • extractWithLayout (15-15)
  • extractForms (16-16)
src/lib/ocr/types.ts (6)
  • OCRRequest (19-49)
  • OCRResult (51-78)
  • OCRError (149-160)
  • OCRMethod (11-11)
  • LLMWhispererResponse (116-123)
  • LLMWhispererQuota (125-130)
src/lib/llm/llm-service.ts (2)
src/lib/llm/types.ts (9)
  • LLMCompletionRequest (50-54)
  • LLMCompletionResponse (56-67)
  • LLMError (170-181)
  • LLMConfig (42-48)
  • LLMResponse (144-144)
  • LLMResponseSchema (130-136)
  • EmbeddingRequest (150-157)
  • EmbeddingResponse (159-164)
  • LLMProviderType (13-13)
src/lib/llm/providers.ts (4)
  • getModel (242-245)
  • calculateCost (267-280)
  • getDefaultModel (250-254)
  • PROVIDER_CONFIGS (16-179)
🪛 Biome (2.1.2)
src/lib/llm/providers.ts

[error] 214-214: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 215-215: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 228-228: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

🔇 Additional comments (13)
src/lib/ocr/index.ts (1)

1-17: OCR barrel file looks clean and focused

Re-exporting types and a curated set of service functions keeps the public surface tidy and easy to consume; no issues spotted here.

src/app/api/v1/utils/errors.ts (1)

52-76: Centralized error helpers look consistent

APIErrorCode, createErrorResponse, and ErrorResponses give you a clean, uniform error surface across v1 endpoints. Status codes and messages are sensible defaults and easy to override per call site.

src/app/api/v1/utils/index.ts (1)

1-6: API utils index is straightforward and useful

Re-exporting auth and errors from a single index keeps route imports tidy and consistent; this looks good as-is.

src/app/api/v1/health/route.ts (1)

16-55: Health payload structure and status derivation look solid

Parallel checks, clear per‑service status, and an aggregated health.status based on LLM/OCR availability are all reasonable. No blocking issues from a correctness or resilience standpoint.

src/lib/llm/index.ts (1)

1-34: Barrel exports are coherent and improve LLM API ergonomics

Centralizing types, provider utilities, and service functions behind @/lib/llm makes the LLM surface easier to consume and evolve. Export list matches the described providers and service functions; no issues spotted.

src/lib/ocr/ocr-service.ts (1)

291-303: Health check looks fine; be aware of AbortSignal.timeout compatibility

The DeepSeek health probe with AbortSignal.timeout(2000) is a good guard. Just ensure your runtime (Node/Next.js version) consistently supports AbortSignal.timeout; otherwise, consider a manual AbortController fallback if you ever target older Node versions.

src/lib/llm/types.ts (4)

13-37: Provider and model config types are clear and align with usage

LLMProviderType, LLMProviderConfig, and LLMModelConfig are well-structured and match how PROVIDER_CONFIGS and getModel consume them. The pricing comments (“$ per 1M tokens”) are especially useful for downstream cost calculations.


42-67: Completion request/response shapes look coherent

LLMConfig, LLMCompletionRequest, and LLMCompletionResponse align cleanly with how generateCompletion constructs and returns data (provider/model, usage, cost, finishReason). This should make it straightforward to track cost and usage across providers.


79-145: Structured output schema is well-factored

The discriminated union (FormattedResponseSchema) and LLMResponseSchema give you a solid, type-safe contract for structured responses (single value, chart, table, text). This matches how generateStructuredResponse is implemented and should make downstream rendering straightforward.


170-181: LLMError type is consistent with OCR error pattern

LLMError mirrors the OCRError structure (message, code, provider, statusCode, retryable), which keeps error handling consistent across LLM and OCR surfaces. This is a good foundation for centralized logging/telemetry.

src/lib/llm/providers.ts (1)

242-280: Model lookup and cost calculation are straightforward and correct

getModel, getDefaultModel, getModelConfig, and calculateCost are clean and do what they say. Cost is correctly scaled per 1M tokens using the configured pricing.

src/lib/llm/llm-service.ts (2)

234-259: simpleCompletion is a clean backward-compatible helper

The helper correctly defaults provider/model, forwards overrides, and delegates to generateCompletion. This should make it easy to migrate existing call sites without touching provider plumbing.


351-385: compareProviders is straightforward A/B harness

The parallel Promise.allSettled approach and filtering fulfilled results is clean and robust. Just remember that model IDs must be valid per provider (same concern as above, but here the caller passes explicit pairs, so it’s their responsibility).

Comment on lines +17 to 21
"@ai-sdk/anthropic": "^2.0.45",
"@ai-sdk/google": "^2.0.38",
"@ai-sdk/openai": "^2.0.68",
"@aws-sdk/client-textract": "^3.934.0",
"@radix-ui/react-checkbox": "^1.1.1",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

New AI/SDK dependencies align with the LLM/OCR features; verify version compatibility

The added @ai-sdk/*, ai, and @aws-sdk/client-textract dependencies match the new multi‑provider LLM and Textract usage. From a structure standpoint this looks fine; just ensure the chosen versions are compatible with your Next 14 / React 18 / Node.js runtime and each other (peer deps, bundling, ESM/CJS).

Also applies to: 40-40


🌐 Web query:

Check the supported Node.js / React / Next.js versions and peer dependency requirements for:
- `ai@5.0.95`
- `@ai-sdk/openai@2.0.68`
- `@ai-sdk/anthropic@2.0.45`
- `@ai-sdk/google@2.0.38`
- `@aws-sdk/client-textract@3.934.0`
and confirm they are compatible with a Next.js 14.2.8 + React 18 app.

💡 Result:

Short answer: Yes — these package versions are compatible with a Next.js 14.2.8 + React 18 app, provided your runtime Node.js meets the Next.js v14 minimum (Node 18.17+) and you satisfy the AI SDK peer deps (notably zod >= 3.25.0). Evidence:

  • Next.js v14 requires Node >= 18.17 and is intended to be used with React 18. (Next.js upgrade notes). [1]
  • The AI SDK v5 (ai@5.x and the @ai-sdk/* v2 series) requires Node 18+ in its docs and additionally lists zod (>= 3.25.0) as a peer dependency for the v5 migration / providers. The migration/getting-started docs for AI SDK v5 show the ai package and @ai-sdk/* v2 series and call out zod as a required peer dependency. (@AI-SDK provider packages are the v2 series used by ai@5.x). [2][3]
  • Individual @AI-SDK provider packages (openai / anthropic / google) are part of the AI SDK v2 provider releases and are published as companion provider packages for ai@5 — their docs and package pages align with the AI SDK v5 requirements above. (Examples: Anthropic provider listing; OpenAI/Google providers are the same provider-series). [4]
  • @aws-sdk/client-textract (AWS SDK for JavaScript v3) targets modern Node.js LTS and AWS has announced v3 no longer supports Node 16.x; v3 is intended for Node 18+ (AWS SDK v3 Node support guidance). The modular v3 clients (including client-textract@3.934.0) do not impose React/Next peer deps and are compatible in server/runtime code (Node) used by Next.js. [5][6]

What to check in your project

  • Use Node >= 18.17 (Next.js 14 minimum). If your environment uses a lower Node, upgrade. [1]
  • Add/ensure zod >= 3.25.0 in your package.json if you use ai@5.x / @ai-sdk/* v2 features that declare it as a peer dependency. [2][3]
  • @aws-sdk/client-textract has no React/Next peer deps; install it normally for server-side usage (Node runtime). [5]

Sources

  • Next.js v14 upgrade / requirements (Node >=18.17, React 18). [1]
  • AI SDK v5 migration / getting-started (ai@5.x, @ai-sdk/* v2, and zod peer dep note). [2][3]
  • @ai-sdk/anthropic package listing / provider info (example provider page). [4]
  • AWS SDK for JavaScript (v3) Node support / end-of-support notices (Node 16 EOL; v3 targets Node 18+). [5][6]

If you want, I can fetch the exact peerDependencies sections from each package.json for these exact versions and paste them here.



---

🏁 Script executed:

```shell
#!/bin/bash
# Check for zod dependency in package.json
rg '"zod"' package.json -A 1

Length of output: 25


Upgrade zod to satisfy ai@5.x peer dependency requirement

The added dependencies are compatible with Next.js 14.2.8 + React 18 (Node 18.17+), but zod needs to be upgraded. The AI SDK v5 package (ai@5.0.95) requires zod >= 3.25.0 as a peer dependency. Currently, package.json specifies zod@^3.23.8, which falls below the minimum. Update to zod@^3.25.0 or higher to prevent runtime failures when using the AI SDK providers.

🤖 Prompt for AI Agents
In package.json around lines 17 to 21, the project currently pins zod to ^3.23.8
which is below the ai@5.x peer requirement; update the zod entry to ^3.25.0 (or
a later 3.x release) in package.json, then run the package manager (npm install
or yarn install) to update the lockfile and ensure dependencies resolve; finally
run your test/build steps (or a simple start) to verify there are no runtime
peer-dependency issues.

Comment on lines +91 to +108
if (stream) {
  // Streaming response
  const textStream = await generateStreamingCompletion({
    prompt,
    config,
    systemPrompt: systemMessage,
  });

  // Return the stream with appropriate headers
  return new Response(textStream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
      'X-RateLimit-Remaining': String(rateLimit.remaining),
      'X-RateLimit-Reset': String(rateLimit.resetAt),
    },
  });

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Clarify streaming format vs text/event-stream header

generateStreamingCompletion returns a plain text ReadableStream, but the response is advertised as text/event-stream. Unless textStream is already SSE‑framed, EventSource/SSE clients may misinterpret the stream. Consider either:

  • Using the Vercel AI helper that returns a proper SSE Response (e.g. toDataStreamResponse() upstream in generateStreamingCompletion), or
  • Changing the content type to a generic streaming type (e.g. text/plain) if consumers will read raw chunks via fetch/ReadableStream.

🌐 Web query:

What is the recommended way to return a streamed chat completion with Vercel AI SDK `streamText` so that clients receive a proper `text/event-stream` SSE response?

💡 Result:

Use streamText and return its built-in SSE response helper — call result.toUIMessageStreamResponse() (or toDataStreamResponse for the simpler "data" protocol) and return that Response so the SDK sets the proper text/event-stream headers and protocol header for you.

Example (Node / Vercel function):

const result = streamText({ model: openai('gpt-4o-mini'), messages });
return result.toUIMessageStreamResponse({
  headers: {
    // optional extra headers
    'X-Chat-Id': chatId,
  },
});

This yields a Response with Content-Type: text/event-stream; charset=utf-8 and x-vercel-ai-ui-message-stream: v1 (uses the SDK's UI message stream conversion). [1][2][3]

Sources: Official SDK internals and examples. [1] [2] [3]





---

🏁 Script executed:

```shell
#!/bin/bash
# Find and inspect generateStreamingCompletion implementation
rg -n "generateStreamingCompletion" --type=ts -B2 -A10

Length of output: 2932


🏁 Script executed:

#!/bin/bash
# Search for imports of Vercel AI SDK to understand what's being used
rg -n "streamText\|toUIMessageStreamResponse\|toDataStreamResponse" --type=ts -B2 -A2

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Find where Response is constructed with text/event-stream
rg -n "text/event-stream" --type=ts -B5 -A5

Length of output: 736



---

🏁 Script executed:

```shell
#!/bin/bash
# View the full generateStreamingCompletion implementation
sed -n '83,110p' src/lib/llm/llm-service.ts
```

Length of output: 729


Fix streaming response to use Vercel AI SDK's proper SSE helper

The review comment is correct. generateStreamingCompletion() currently returns result.textStream, which is raw text without SSE framing. While the route handler sets the text/event-stream header, the actual stream content is not SSE-formatted—clients expecting proper Server-Sent Events protocol will fail.

Per Vercel AI SDK recommendations, use result.toUIMessageStreamResponse() which automatically sets proper text/event-stream headers and protocol framing.

Required changes:

  1. src/lib/llm/llm-service.ts (line 83–108): Change generateStreamingCompletion() to return Promise<Response> and use result.toUIMessageStreamResponse() instead of result.textStream.
  2. src/app/api/v1/chat/route.ts (line 93–107): Update to directly return the Response from generateStreamingCompletion() instead of manually wrapping it.
🤖 Prompt for AI Agents
In src/app/api/v1/chat/route.ts around lines 91–108 and
src/lib/llm/llm-service.ts around lines 83–108, the streaming flow currently
returns a raw text stream and wraps it with SSE headers, but the stream lacks
SSE framing; update generateStreamingCompletion in llm-service.ts to return a
Promise<Response> using result.toUIMessageStreamResponse() (so the SDK applies
proper SSE framing and headers), and in route.ts simply await and return that
Response directly instead of re-wrapping the stream and headers.
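
A minimal sketch of the suggested change, assuming AI SDK v5 where the streamText result exposes toUIMessageStreamResponse() (older releases used toDataStreamResponse(), so check the installed version), with the model hard-coded here instead of resolved via getModel():

```typescript
// Sketch only: return a ready-to-send SSE Response from the service layer.
// In the real service the model would come from getModel(provider, modelId).
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function generateStreamingCompletion(request: {
  prompt: string;
  systemPrompt?: string;
}): Promise<Response> {
  const result = streamText({
    model: openai('gpt-4o-mini'),
    system: request.systemPrompt,
    prompt: request.prompt,
  });

  // The SDK sets Content-Type: text/event-stream and the SSE framing itself
  return result.toUIMessageStreamResponse();
}

// Route handler: return the Response directly instead of re-wrapping the stream
// if (stream) {
//   return generateStreamingCompletion({ prompt, systemPrompt: systemMessage });
// }
```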

Comment on lines +65 to +83
const { file, url, method, preserveLayout, detectForms, extractTables, pages, language, password } =
  validation.data;

// Ensure either file or url is provided
if (!file && !url) {
  return ErrorResponses.validationError('Either file (base64) or url must be provided');
}

// Convert base64 to buffer if file provided
let input: Buffer | string;
if (file) {
  try {
    input = Buffer.from(file, 'base64');
  } catch {
    return ErrorResponses.validationError('Invalid base64 file data');
  }
} else {
  input = url!;
}

⚠️ Potential issue | 🟡 Minor

Tighten input-source and base64 handling; avoid unnecessary cast

A few small issues in the input handling block:

  • The comment says “Either file (base64) or url must be provided”, but the logic only checks that at least one is present. If both are sent, file silently wins and url is ignored. Consider enforcing “exactly one” source, e.g. with a refinement on the Zod schema or an explicit if (file && url) validation error.
  • Buffer.from(file, 'base64') seldom throws for malformed base64; your try/catch will almost never return “Invalid base64 file data”, and bad input is more likely to fail later inside extractText as a 500. If you care about returning a clear 4xx, consider validating the base64 string (length/charset) or using a small helper to detect obviously invalid data before calling OCR.
  • method: method as OCRMethod forces an optional field into the non‑optional OCRMethod type. Since extractText already chooses a method when options?.method is falsy, you can drop the cast and allow method to be undefined naturally for better type safety.
  • Minor: if pages are intended as page numbers, z.array(z.number().positive()) will accept non‑integers; z.array(z.number().int().positive()) would better model that constraint.

These refinements would make the endpoint behavior more predictable and validation errors clearer without changing the overall design.

Also applies to: 85-97
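
The "exactly one of file or url" rule can also live in the schema itself; a minimal zod sketch (field names mirror the destructuring above, and z.string().base64() requires zod 3.23+):

```typescript
import { z } from 'zod';

// Sketch only: .refine() rejects requests with neither or both sources,
// so the handler no longer needs the manual !file && !url check.
const extractRequestSchema = z
  .object({
    file: z.string().base64().optional(), // validates the base64 charset up front
    url: z.string().url().optional(),
    pages: z.array(z.number().int().positive()).optional(),
  })
  .refine((data) => Boolean(data.file) !== Boolean(data.url), {
    message: 'Provide exactly one of file (base64) or url',
  });
```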

Comment on lines +24 to +50
export async function validateAPIKey(request: NextRequest): Promise<APIKeyInfo | null> {
  const authHeader = request.headers.get('authorization');
  const apiKeyHeader = request.headers.get('x-api-key');

  const apiKey = apiKeyHeader || (authHeader?.startsWith('Bearer ') ? authHeader.slice(7) : null);

  if (!apiKey) {
    return null;
  }

  // TODO: Implement actual API key validation against database
  // For now, validate against environment variable for development
  const validKey = process.env.API_KEY;
  if (apiKey === validKey) {
    return {
      keyId: 'dev-key',
      userId: 'dev-user',
      permissions: ['read', 'write', 'extract', 'chat'],
      rateLimit: {
        requestsPerMinute: 60,
        requestsPerDay: 10000,
      },
    };
  }

  return null;
}

🛠️ Refactor suggestion | 🟠 Major

Dev-only API key stub should be guarded to avoid accidental production use

validateAPIKey currently treats a single process.env.API_KEY as the only valid key, with a TODO to back it by real storage. That’s fine for early development, but it’s easy to accidentally ship this into a multi-tenant or production environment.

Consider at least:

  • Gating this branch on NODE_ENV === 'development' or a dedicated feature flag, and
  • Failing fast (e.g., 500 / startup error) if API_KEY isn’t set when running in non‑dev.

That will make it harder to forget replacing this stub with proper key lookup and per-user/organization metadata.

🤖 Prompt for AI Agents
In src/app/api/v1/utils/auth.ts around lines 24-50, the dev-only API key stub is
currently unguarded; change it so the environment API_KEY is only accepted when
NODE_ENV === 'development' or a dedicated flag like ALLOW_DEV_API_KEY === 'true'
is set, and add a startup-time sanity check that throws an error (or exits) if
NODE_ENV !== 'development' and process.env.API_KEY is missing but the code would
otherwise rely on it; keep the validateAPIKey return behavior the same (return
dev key info only when the guard allows and the key matches, otherwise return
null).
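
A minimal sketch of that guard (ALLOW_DEV_API_KEY is an assumed flag name):

```typescript
// Sketch only: gate the env-var key behind development mode or an explicit
// opt-in flag, and fail fast at startup when a non-dev environment would
// otherwise depend on the stub.
export function isDevApiKeyAllowed(): boolean {
  return (
    process.env.NODE_ENV === 'development' ||
    process.env.ALLOW_DEV_API_KEY === 'true'
  );
}

export function assertAuthConfigured(): void {
  if (!isDevApiKeyAllowed() && !process.env.API_KEY) {
    // Replace with a real key-store lookup before production use
    throw new Error('API key storage is not configured for this environment');
  }
}

// In validateAPIKey, wrap the env-var comparison:
//   if (isDevApiKeyAllowed() && apiKey === process.env.API_KEY) { ...return dev key info... }
```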

Comment on lines +66 to +95
export function checkRateLimit(keyInfo: APIKeyInfo): {
  allowed: boolean;
  remaining: number;
  resetAt: number;
} {
  const now = Date.now();
  const windowMs = 60 * 1000; // 1 minute window
  const key = `rate:${keyInfo.keyId}`;

  let record = rateLimitStore.get(key);

  // Reset if window expired
  if (!record || record.resetAt < now) {
    record = {
      count: 0,
      resetAt: now + windowMs,
    };
  }

  record.count++;
  rateLimitStore.set(key, record);

  const remaining = Math.max(0, keyInfo.rateLimit.requestsPerMinute - record.count);

  return {
    allowed: record.count <= keyInfo.rateLimit.requestsPerMinute,
    remaining,
    resetAt: record.resetAt,
  };
}

⚠️ Potential issue | 🟡 Minor

Rate‑limit helper docs and shape are slightly out of sync

Two small inconsistencies here:

  • The JSDoc says “Returns true if request is allowed, false if rate limited”, but the function actually returns an object { allowed, remaining, resetAt }. The comment should be updated to match the real return type.
  • APIKeyInfo.rateLimit has requestsPerDay, but checkRateLimit only enforces requestsPerMinute. If daily limiting is a near‑term requirement, consider enforcing it here (or add a comment/TODO so the type doesn’t overpromise).

Neither is a runtime bug, but aligning the docs and type usage will reduce surprises for callers.

🤖 Prompt for AI Agents
In src/app/api/v1/utils/auth.ts around lines 66 to 95, the function returns an
object { allowed, remaining, resetAt } but the JSDoc still says it returns a
boolean and the APIKeyInfo.rateLimit type exposes requestsPerDay whereas the
function only enforces per-minute limits; update the JSDoc to describe the
actual return shape (allowed: boolean, remaining: number, resetAt: number) and
either implement daily limiting or explicitly document TODO and leave a short
comment in the function noting only requestsPerMinute is enforced (and if you
implement daily limiting: track a separate daily key/window, increment and check
against requestsPerDay, compute remaining accordingly, and set resetAt to the
smaller of minute/day reset times).

Comment on lines +462 to +469
async function fetchPDFBuffer(url: string): Promise<Buffer> {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch PDF: ${response.statusText}`);
  }
  const arrayBuffer = await response.arrayBuffer();
  return Buffer.from(arrayBuffer);
}

⚠️ Potential issue | 🟡 Minor

Non-URL string inputs are treated as URLs

fetchPDFBuffer unconditionally fetches any string input. If callers ever pass a filesystem path or raw PDF text instead of a URL, this will fail in a non-obvious way.

You may want to:

  • Either document that string inputs must be HTTP(S) URLs, or
  • Detect file:/relative paths and handle them differently (or reject them explicitly with a clearer error).
🤖 Prompt for AI Agents
In src/lib/ocr/ocr-service.ts around lines 462 to 469, fetchPDFBuffer currently
calls fetch on any string which treats non-HTTP(S) inputs (local file paths or
raw PDF text) as URLs and fails ambiguously; update the function to first
validate the input: if it is an HTTP or HTTPS URL (e.g., via new URL(input) and
checking protocol === 'http:' || 'https:'), use fetch as before; if it is a file
URL (protocol === 'file:') or a local/relative path, read the file from disk
with fs.promises.readFile and return that Buffer; otherwise reject with a clear,
specific Error stating that only HTTP(S) URLs or filesystem paths are accepted
(or add JSDoc to mandate HTTP(S) only and throw early for non-HTTP(S) inputs).
Ensure proper async/await and error messages include the original input for
easier debugging.

Comment on lines +474 to +481
async function pdfToImages(buffer: Buffer, pages?: number[]): Promise<Buffer[]> {
  // Note: This requires a PDF-to-image library like pdf-poppler or pdf2pic
  // For now, return empty array - would need to implement based on your needs

  // Placeholder implementation
  console.warn('pdfToImages not fully implemented - need pdf-to-image library');
  return [];
}

⚠️ Potential issue | 🟠 Major

pdfToImages placeholder should fail fast instead of returning empty

Returning an empty array while logging a warning makes upstream code believe conversion succeeded (DeepSeek path in particular). This is more of a silent failure than a deliberate “not implemented”.

Better to throw an OCRError('NOT_IMPLEMENTED', ...) here until a real implementation is added, and gate selection of image-based backends accordingly.

🤖 Prompt for AI Agents
In src/lib/ocr/ocr-service.ts around lines 474 to 481, replace the placeholder
that returns an empty array with a fast-failing behavior: throw an OCRError with
code 'NOT_IMPLEMENTED' and a clear message indicating pdfToImages is not
implemented (include optional details like required library name), so upstream
callers cannot treat a silent empty result as success; ensure the thrown error
uses the existing OCRError type/import and that callers which select image-based
backends handle or propagate this error appropriately.
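
A minimal fail-fast sketch; the real code would presumably throw the existing OCRError with code 'NOT_IMPLEMENTED', whose constructor isn't shown here, so a plain Error subclass stands in:

```typescript
// Sketch only: throw instead of silently returning [], so callers cannot
// mistake a missing implementation for an empty document.
class NotImplementedError extends Error {
  readonly code = 'NOT_IMPLEMENTED';
}

async function pdfToImages(buffer: Buffer, pages?: number[]): Promise<Buffer[]> {
  throw new NotImplementedError(
    'pdfToImages is not implemented: a PDF-to-image library (e.g. pdf2pic or pdf-poppler) is required'
  );
}
```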

Comment on lines +506 to +522
export async function simpleExtract(
  input: Buffer | string,
  options?: {
    preserveLayout?: boolean;
    password?: string;
  }
): Promise<string> {
  const result = await extractText({
    input,
    options: {
      preserveLayout: options?.preserveLayout,
      password: options?.password,
    },
  });

  return result.text;
}

⚠️ Potential issue | 🟡 Minor

simpleExtract ignores layout even if preserveLayout is set

simpleExtract forwards preserveLayout but then always returns result.text, discarding result.markdown even when layout-preserving extraction is used.

If the goal is truly “simple text only”, you could:

  • Drop the preserveLayout option from this helper, or
  • Return result.markdown ?? result.text when preserveLayout is true.

Right now the option can be misleading for callers.

🤖 Prompt for AI Agents
In src/lib/ocr/ocr-service.ts around lines 506 to 522, the simpleExtract helper
currently forwards preserveLayout but always returns result.text, so callers
requesting layout-preserving output never receive result.markdown; update the
function so that when options?.preserveLayout is true it returns result.markdown
?? result.text, otherwise return result.text; ensure the function signature
keeps or documents the preserveLayout option and handle the case where markdown
is undefined by falling back to plain text.

Comment on lines +527 to +537
export async function extractWithLayout(input: Buffer | string): Promise<string> {
  const result = await extractText({
    input,
    options: {
      preserveLayout: true,
      method: 'llmwhisperer',
    },
  });

  return result.markdown || result.text;
}

⚠️ Potential issue | 🟠 Major

extractWithLayout hard-codes LLMWhisperer and doesn’t fall back

extractWithLayout always uses method: 'llmwhisperer'. If LLMWhisperer is misconfigured or quota is exhausted, this will throw instead of falling back to another layout-preserving option (e.g., auto-selection with preserveLayout: true).

Consider delegating method selection (let extractText decide based on quota and availability) and only pinning preserveLayout: true, unless you explicitly want “fail if LLMWhisperer is unavailable”.

🤖 Prompt for AI Agents
In src/lib/ocr/ocr-service.ts around lines 527 to 537, extractWithLayout
currently hard-codes method: 'llmwhisperer' which causes failures if that
provider is misconfigured or out of quota; remove the explicit method option and
only pass preserveLayout: true so extractText can auto-select or fallback as
designed (alternatively, wrap the call in a try/catch and retry without the
pinned method to provide a fallback), ensuring the function still returns
result.markdown || result.text.
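
A sketch of the fallback variant, retrying without the pinned method so extractText can auto-select based on quota and availability (assumes the extractText contract quoted above):

```typescript
// Sketch only: prefer LLMWhisperer but fall back to auto-selection if it fails.
export async function extractWithLayout(input: Buffer | string): Promise<string> {
  try {
    const result = await extractText({
      input,
      options: { preserveLayout: true, method: 'llmwhisperer' },
    });
    return result.markdown || result.text;
  } catch (error) {
    console.warn('LLMWhisperer unavailable, retrying with auto-selected OCR method', error);
    const fallback = await extractText({
      input,
      options: { preserveLayout: true }, // let extractText choose based on quota/availability
    });
    return fallback.markdown || fallback.text;
  }
}
```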

Comment on lines +542 to +557
export async function extractForms(
  input: Buffer | string
): Promise<{ text: string; formFields: any[] }> {
  const result = await extractText({
    input,
    options: {
      detectForms: true,
      method: 'textract',
    },
  });

  return {
    text: result.text,
    formFields: result.formFields || [],
  };
}

⚠️ Potential issue | 🟠 Major

extractForms is hard-wired to Textract

extractForms forces method: 'textract', so:

  • It bypasses LLMWhisperer’s processing_mode = 'form' capabilities.
  • It will throw a config error outright when AWS isn’t configured, even if LLMWhisperer is available.

If you want best-effort forms extraction:

  • Let extractText pick the method based on detectForms: true, or
  • Implement a fallback chain: Textract first, then LLMWhisperer form mode.

Also consider returning typed form field structures instead of any[] once the Textract helpers are implemented.
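
A sketch of that fallback chain (Textract first, then auto-selection, which can route to LLMWhisperer's form mode), assuming the same extractText contract as above:

```typescript
// Sketch only: best-effort forms extraction with a fallback when AWS is not configured.
export async function extractForms(
  input: Buffer | string
): Promise<{ text: string; formFields: unknown[] }> {
  const attempts = [
    { detectForms: true, method: 'textract' as const },
    { detectForms: true }, // no pinned method: extractText picks an available provider
  ];

  let lastError: unknown;
  for (const options of attempts) {
    try {
      const result = await extractText({ input, options });
      return { text: result.text, formFields: result.formFields || [] };
    } catch (error) {
      lastError = error;
    }
  }

  throw lastError;
}
```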
