
Conversation

@mnttnm (Owner) commented Nov 5, 2025

Summary by CodeRabbit

  • Documentation

    • Added extensive enterprise analysis, roadmap, and phased implementation plans with prioritized features, success metrics, and a 6‑month focus.
    • Added detailed plans for API, data platform integrations, search, compliance/security, observability, multi‑LLM, OCR, and deployment.
  • New Features

    • Multi‑LLM provider support with provider selection, fallback and cost/usage concepts.
    • New API endpoints: chat completions, OCR/extract, and health checks for service status and provider/quotas.

This analysis examines Slicely's current capabilities against 2025 IDP
market demands and provides strategic feature recommendations based on
the 80/20 principle.

Key findings:
- IDP market projected at $6.78B by 2025 (35-40% CAGR)
- 65% of enterprises actively implementing IDP initiatives
- Critical gaps: API access, integrations, compliance features

Top 5 priorities for enterprise readiness:
1. RESTful API layer (REST + GraphQL)
2. Audit logging & RBAC for compliance
3. Snowflake/Databricks integration
4. Elasticsearch integration for advanced search
5. Multi-LLM provider support

Includes:
- Market research and competitive analysis
- Feature recommendations across 4 tiers
- 12-month implementation roadmap
- ROI projections (4.1x return)
- Pricing strategy for enterprise tiers
Created detailed implementation plans for 5 critical enterprise features
based on 80/20 principle analysis:

1. RESTful & GraphQL API Layer (01-api-layer.md)
   - Complete REST API with CRUD operations
   - GraphQL endpoint for complex queries
   - API authentication, rate limiting, versioning
   - Client SDKs for JavaScript and Python
   - OpenAPI/Swagger documentation

2. Enterprise Data Platform Integrations (02-data-platform-integrations.md)
   - Snowflake, Databricks, AWS S3, Azure Blob, GCS
   - Zapier and Make.com integration
   - Job queue system with BullMQ
   - Scheduled exports and batch processing

3. Elasticsearch Integration (03-elasticsearch-integration.md)
   - Hybrid search (keyword + semantic)
   - Faceted search and autocomplete
   - Fuzzy matching and search analytics
   - Real-time indexing via CDC

4. Compliance & Security Suite (04-compliance-security-suite.md)
   - Comprehensive audit logging
   - RBAC with granular permissions
   - PII detection and redaction
   - Data retention policies and GDPR tools

5. Multi-LLM Provider Support (05-multi-llm-provider-support.md)
   - Anthropic Claude, Azure OpenAI, AWS Bedrock, Ollama
   - Provider abstraction layer
   - Automatic fallback and cost tracking
   - Unified interface across all providers

Plus comprehensive roadmap (ENTERPRISE_ROADMAP.md):
- 12-month timeline with 5 phases
- Resource requirements and budget ($726K investment)
- Expected ROI: 7.6x ($5.5M ARR)
- Success metrics and KPIs
- Competitive positioning analysis
- Risk mitigation strategies

All plans include:
- Technical architecture and implementation details
- Database schemas and code examples
- UI/UX designs
- Testing strategies
- Rollout plans
- Success metrics

vercel bot commented Nov 5, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| slicely | Error | Error | | Nov 19, 2025 8:33am |


coderabbitai bot commented Nov 5, 2025

Walkthrough

Adds enterprise strategy and roadmap docs plus detailed implementation plans (API layer, data integrations, Elasticsearch, compliance, multi‑LLM, OCR). Implements new v1 API utilities and endpoints (chat, extract, health), and adds multi‑provider LLM and multi‑method OCR service modules with types, providers, service logic, and UI selector scaffold.

Changes

  • Enterprise Strategy & Roadmap (ENTERPRISE_FEATURE_ANALYSIS.md, docs/ENTERPRISE_ROADMAP.md): New enterprise analysis and roadmap documents with gap analysis, tiered feature recommendations, five‑phase implementation timeline, GTM/pricing, ROI model, success metrics, and next steps.
  • Implementation Plans (API, Data, Search, Compliance, Multi‑LLM, Recommendations) (docs/implementation-plans/01-api-layer.md, docs/implementation-plans/02-data-platform-integrations.md, docs/implementation-plans/03-elasticsearch-integration.md, docs/implementation-plans/04-compliance-security-suite.md, docs/implementation-plans/05-multi-llm-provider-support.md, docs/UPDATED_RECOMMENDATIONS.md): Adds detailed implementation plans: hybrid API design (REST/GraphQL/tRPC), data destination plugins (S3, Snowflake, Databricks, Zapier), Elasticsearch hybrid search and index mappings, compliance/security suite (audit logs, RBAC, PII, retention/GDPR), multi‑LLM provider architecture, and updated multi‑LLM+OCR recommendations with migration/cost guidance.
  • API v1 Endpoints & Utilities (src/app/api/v1/chat/route.ts, src/app/api/v1/extract/route.ts, src/app/api/v1/health/route.ts, src/app/api/v1/utils/*): Adds v1 API routes: POST /api/v1/chat (chat completions, streaming/non‑streaming, validation, auth, rate limits), POST /api/v1/extract (OCR extraction from base64 or URL with options), GET /api/v1/health (service status). Adds v1 auth and error utilities: API key validation, permission checks, rate limiting, standardized error responses and re‑exports.
  • LLM Library & Provider Surface (src/lib/llm/types.ts, src/lib/llm/providers.ts, src/lib/llm/llm-service.ts, src/lib/llm/provider-registry.ts, src/lib/llm/index.ts): Adds multi‑provider LLM types, configs, model schemas, provider configs (PROVIDER_CONFIGS), provider factory/getModel helpers, cost calculation, provider availability, LLM service functions (generateCompletion, streaming, structured responses, embeddings, fallback orchestration, compareProviders), registry/initialization, and an index barrel exporting the surface.
  • OCR Library & Types (src/lib/ocr/types.ts, src/lib/ocr/ocr-service.ts, src/lib/ocr/index.ts): Adds multi‑method OCR types and service: supports LLMWhisperer, DeepSeek, AWS Textract, Tesseract, PDF.js paths; auto method selection, PDF handling, quota checks, structured outputs (tables/forms), errors, usage metadata, and an index barrel.
  • UI Component (Provider Selector) (components/slicer-settings/LLMProviderSelector.tsx): Adds a UI component scaffold for selecting LLM provider/models and per‑model pricing metadata.
  • Package dependencies (package.json): Adds new dependencies: @ai-sdk/anthropic, @ai-sdk/google, @ai-sdk/openai, @aws-sdk/client-textract, and ai with specified versions.
  • SQL Schema Examples (docs) (documented schemas: llm_configurations, llm_usage, llm_provider_credentials): Adds documented SQL schema sketches for LLM provider configurations, usage tracking, and provider credentials (in docs/implementation plans).

Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant API as Backend/API
    participant Auth as Auth & RateLimit
    participant LLM as LLMService
    participant Primary as PrimaryProvider
    participant Fallback as FallbackProvider
    Note over LLM: Multi‑LLM orchestration and fallback (new)
    Client->>API: POST /api/v1/chat (payload)
    API->>Auth: validate API key & permissions
    Auth-->>API: allowed
    API->>LLM: requestCompletion(payload, preferredProvider)
    alt Primary succeeds
        LLM->>Primary: callCompletion
        Primary-->>LLM: response
        LLM-->>API: response (usage, provider meta)
        API-->>Client: 200 JSON / stream
    else Primary fails
        LLM->>Primary: callCompletion
        Primary--x LLM: error
        LLM->>Fallback: callCompletion
        Fallback-->>LLM: response
        LLM-->>API: response (logs failure, usage)
        API-->>Client: 200 JSON (with provider metadata)
    end
```
```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant API as Backend/API
    participant Auth as Auth & RateLimit
    participant OCR as OCRService
    participant LLMW as LLMWhisperer
    participant Textract as AWS Textract
    Note over OCR: Multi‑method OCR selection and extraction
    Client->>API: POST /api/v1/extract (file or url + options)
    API->>Auth: validate API key & permissions
    Auth-->>API: allowed
    API->>OCR: extractText(input, options)
    OCR->>OCR: selectOCRMethod (quota, availability, options)
    alt method == llmwhisperer
        OCR->>LLMW: callWhisperer
        LLMW-->>OCR: text + meta
    else method == textract
        OCR->>Textract: analyzeDocument
        Textract-->>OCR: text, tables, forms
    else other
        OCR-->>OCR: use DeepSeek/Tesseract/PDF.js
    end
    OCR-->>API: OCRResult (text, markdown, tables, forms)
    API-->>Client: 200 JSON (includes rate limit headers)
```

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

  • Reasons: Large volume of new documentation plus multiple new runtime modules and API endpoints that introduce cross‑cutting auth, error handling, rate limiting, multi‑provider orchestration, OCR backends, and added dependencies.
  • Areas needing extra attention:
    • Security: API key handling, permission model, credential storage/encryption, and env dependency checks.
    • Error and retry semantics across LLM providers and OCR backends (LLMError/OCRError handling and retryability).
    • Rate limiting logic and header semantics in v1 utilities and endpoints.
    • Consistency between TypeScript types in src/lib/llm/* and provider implementations/configs.
    • AWS Textract dynamic imports and dependency use (@aws-sdk/client-textract) and graceful fallback behavior.
    • Tests and rollout steps referenced in docs vs. implemented behavior in code (endpoints and services).

"🐰
I hopped through specs and roadmap glades,
APIs and LLMs in layered braids,
OCR and search, a carrot-sweet blend,
New providers dance — I cheer and mend! 🥕"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Title check: ⚠️ Warning. The PR title 'Research app features and market demands' is vague and does not accurately reflect the substantial implementation work. The changeset includes 14+ files with comprehensive API, OCR, LLM, compliance, and enterprise roadmap implementations—not research documentation. Resolution: Revise the title to reflect the actual deliverables, such as 'Add enterprise API layer, multi-LLM support, OCR integration, and compliance suite' or similar to accurately communicate the scope of implementation changes.

✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch claude/analyze-app-features-research-011CUpJpDTmF3pQSq778jaTb

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

🧹 Nitpick comments (14)
docs/implementation-plans/02-data-platform-integrations.md (5)

535-547: CSV escaping is incomplete (newlines, quotes, commas).

Current logic misses newline and leading/trailing spaces cases. Use a dedicated escaper.

```diff
-      keys.map((key) => {
-        const value = record[key];
-        // Escape quotes and wrap in quotes if contains comma
-        if (typeof value === 'string' && (value.includes(',') || value.includes('"'))) {
-          return `"${value.replace(/"/g, '""')}"`;
-        }
-        return value ?? '';
-      }).join(',')
+      keys.map((key) => {
+        const v = record[key] ?? '';
+        const s = typeof v === 'string' ? v : String(v);
+        const needsQuotes = /[",\n\r]/.test(s);
+        const escaped = s.replace(/"/g, '""');
+        return needsQuotes ? `"${escaped}"` : escaped;
+      }).join(',')
```

549-566: Parquet writer usage likely incorrect.

parquetjs typically requires a ParquetSchema and a concrete write target; openStream/openBuffer semantics differ. Double-check API, add explicit schema construction, and avoid returning writer.outputStream internals.

Example sketch:

```diff
-    const { ParquetWriter } = await import('parquetjs');
-    const schema = this.inferParquetSchema(records[0]);
-    const writer = await ParquetWriter.openStream(schema);
+    const { ParquetWriter, ParquetSchema } = await import('parquetjs');
+    const schema = new ParquetSchema(this.inferParquetSchema(records[0]));
+    const chunks: Buffer[] = [];
+    const writer = await ParquetWriter.openStream(schema, {
+      write: (b: Buffer) => chunks.push(b),
+      end: () => {},
+    } as any);
...
-    return writer.outputStream.getContents();
+    return Buffer.concat(chunks);
```

894-910: Databricks SQL Statement API likely needs polling.

POSTing a statement usually returns an id; you must poll until status is 'SUCCEEDED' (or use sync param if available). Current code returns immediately.
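A hedged sketch of the polling loop follows. The endpoint paths, `statement_id`/`status.state` fields, and terminal state names are assumptions based on the Databricks SQL Statement Execution API; verify them against the Databricks docs before relying on them.

```typescript
// Sketch: submit a SQL statement and poll until it reaches a terminal state.
// Paths and field names are assumptions; confirm against the Databricks API docs.
async function executeStatementAndWait(
  host: string,
  token: string,
  statement: string,
  warehouseId: string,
  timeoutMs = 120_000
): Promise<unknown> {
  const headers = { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' };

  const submit = await fetch(`${host}/api/2.0/sql/statements`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ statement, warehouse_id: warehouseId }),
  });
  let current = await submit.json();
  const statementId: string = current.statement_id;

  const deadline = Date.now() + timeoutMs;
  while (current.status?.state === 'PENDING' || current.status?.state === 'RUNNING') {
    if (Date.now() > deadline) throw new Error('Databricks statement timed out');
    await new Promise((r) => setTimeout(r, 2_000)); // back off between polls
    const poll = await fetch(`${host}/api/2.0/sql/statements/${statementId}`, { headers });
    current = await poll.json();
  }

  if (current.status?.state !== 'SUCCEEDED') {
    throw new Error(`Databricks statement ended in state ${current.status?.state}`);
  }
  return current.result;
}
```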


921-925: Coupling: reusing private method via bracket access.

Calling S3Integration['toParquet'] breaks encapsulation and minifies poorly. Extract a shared parquet utility.


229-248: Schemas: add guardrails (constraints, RLS, audit).

  • Add NOT NULL where appropriate (plugin_type, config).
  • Unique constraint per organization on (name) to prevent duplicates.
  • Add updated_at trigger ON UPDATE.
  • Consider RLS policies by organization_id/user_id.
  • export_jobs: add updated_at, an error_code field, and composite index (destination_id,status,created_at) for queues.

Also applies to: 268-301, 304-327

docs/implementation-plans/01-api-layer.md (1)

917-964: Error responses may leak internal messages.

Avoid echoing raw error.message to clients for 5xx and security-sensitive 4xx; use stable titles/details and map codes.
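A minimal sketch of the idea, assuming an `APIError`-style class like the one in the plan (all names here are illustrative, not the plan's exact utility):

```typescript
// Sketch: map internal errors to stable, non-leaky API responses.
// APIError and its fields are assumptions mirroring the plan's error utility.
class APIError extends Error {
  constructor(public title: string, public statusCode: number, public details?: string) {
    super(title);
  }
}

function toErrorResponse(err: unknown): { status: number; body: { title: string; detail?: string } } {
  if (err instanceof APIError && err.statusCode < 500) {
    // Known, client-safe errors: return the stable title/detail we chose.
    return { status: err.statusCode, body: { title: err.title, detail: err.details } };
  }
  // Unknown or server-side errors: log internally, return a generic message.
  console.error('internal error', err);
  return { status: 500, body: { title: 'Internal server error' } };
}
```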

docs/implementation-plans/04-compliance-security-suite.md (2)

845-875: GDPR deletion: review deletion of audit logs.

Some regimes require retaining certain logs; blanket deletion of audit_logs by user_id may violate retention/legal hold. Gate by policy and legal basis.


477-485: Use cloud-native credentials (IAM roles) over static keys.

Avoid long-lived AWS keys in env; prefer IAM roles (e.g., IRSA on Kubernetes, task roles on ECS/Lambda).
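A small sketch of the pattern with the AWS SDK v3: leaving `credentials` unset lets the default provider chain pick up the attached IAM role (IRSA/task role) instead of long-lived keys. The client and bucket here are illustrative.

```typescript
// Sketch: rely on the default credential provider chain (IAM role) rather than
// static AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY values in env.
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

// No `credentials` option: on EKS (IRSA), ECS, or Lambda the SDK resolves
// short-lived role credentials automatically and rotates them for you.
const s3 = new S3Client({ region: process.env.AWS_REGION });

export async function uploadExport(bucket: string, key: string, body: Buffer) {
  await s3.send(new PutObjectCommand({ Bucket: bucket, Key: key, Body: body }));
}
```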

docs/implementation-plans/05-multi-llm-provider-support.md (3)

444-457: Azure provider method incomplete.

completion() returns partial code with "same as OpenAI" comment; fill in or mark as TODO to prevent accidental use.


760-811: Add timeouts, retries, and per-provider error taxonomy.

LLM calls are flaky. Wrap provider.completion with bounded timeouts, limited retries, and map provider errors to consistent codes; also support cancellation.
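As a rough illustration of that wrapper (request/response shapes and error codes are placeholders, not the plan's actual types):

```typescript
// Sketch: bound each provider call with a timeout, retry transient failures with
// backoff, and surface a consistent error code. All names are illustrative.
type CompletionFn<Req, Res> = (req: Req, signal: AbortSignal) => Promise<Res>;

class ProviderError extends Error {
  constructor(public code: 'timeout' | 'rate_limited' | 'provider_error', message: string) {
    super(message);
  }
}

export async function callWithTimeoutAndRetry<Req, Res>(
  fn: CompletionFn<Req, Res>,
  req: Req,
  { timeoutMs = 30_000, retries = 2 } = {}
): Promise<Res> {
  let lastError: ProviderError | undefined;
  for (let attempt = 0; attempt <= retries; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await fn(req, controller.signal); // callers can also thread their own signal through
    } catch (err) {
      const code = controller.signal.aborted
        ? 'timeout'
        : (err as { status?: number })?.status === 429
          ? 'rate_limited'
          : 'provider_error';
      lastError = new ProviderError(code, String(err));
      // Only retry timeouts and rate limits; bail out immediately on other errors.
      if (code === 'provider_error' || attempt === retries) throw lastError;
      await new Promise((r) => setTimeout(r, 500 * 2 ** attempt)); // exponential backoff
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError!;
}
```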


813-823: Usage tracking: add org/user context and request type.

Insert organization_id, user_id, request_type to llm_usage for accurate cost allocation.
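For illustration, the insert could carry that context along these lines (columns beyond the documented llm_usage sketch are the suggestion, not the existing schema):

```typescript
// Sketch: record one LLM call with tenant context so cost can be allocated
// per organization/user. Extra column names are assumptions.
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!);

interface UsageContext {
  organizationId: string;
  userId: string;
  requestType: 'chat' | 'extract' | 'embedding';
}

export async function recordLlmUsage(
  provider: string,
  model: string,
  usage: { promptTokens: number; completionTokens: number },
  cost: number,
  ctx: UsageContext
) {
  await supabase.from('llm_usage').insert({
    provider,
    model,
    input_tokens: usage.promptTokens,
    output_tokens: usage.completionTokens,
    cost,
    organization_id: ctx.organizationId, // who pays for this call
    user_id: ctx.userId,                 // who triggered it
    request_type: ctx.requestType,
  });
}
```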

docs/implementation-plans/03-elasticsearch-integration.md (1)

232-236: Many indices (one per user) will strain ES.

Prefer a shared index with user_id as a keyword and filter queries, plus index-level RBAC if needed.
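A sketch of the shared-index pattern with the v8 JS client (index and field names are illustrative):

```typescript
// Sketch: one shared index, tenant isolation via a keyword filter on user_id.
import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: process.env.ELASTICSEARCH_URL! });

export async function searchUserDocuments(userId: string, query: string) {
  return es.search({
    index: 'documents', // shared index instead of one index per user
    query: {
      bool: {
        filter: [{ term: { user_id: userId } }], // cheap, cacheable tenant filter
        must: [{ match: { content: query } }],   // the actual relevance query
      },
    },
  });
}
```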

docs/ENTERPRISE_ROADMAP.md (1)

570-586: Optional: clarify infra cost assumptions.

Add footnotes for managed vs self-hosted costs and expected workload (docs/month, QPS) to make comparisons actionable.

ENTERPRISE_FEATURE_ANALYSIS.md (1)

1201-1201: Markdown hygiene: fix headings, lists, and bare URL.

  • Replace emphasized lines acting as headings with proper markdown headings (##/###).
  • Normalize list indentation (MD007).
  • Convert bare URLs to links.

Also applies to: 1487-1504

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9b54f2 and 551c12c.

📒 Files selected for processing (7)
  • ENTERPRISE_FEATURE_ANALYSIS.md (1 hunks)
  • docs/ENTERPRISE_ROADMAP.md (1 hunks)
  • docs/implementation-plans/01-api-layer.md (1 hunks)
  • docs/implementation-plans/02-data-platform-integrations.md (1 hunks)
  • docs/implementation-plans/03-elasticsearch-integration.md (1 hunks)
  • docs/implementation-plans/04-compliance-security-suite.md (1 hunks)
  • docs/implementation-plans/05-multi-llm-provider-support.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
ENTERPRISE_FEATURE_ANALYSIS.md

[uncategorized] ~1250-~1250: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... outputs) - Week 3: Authentication & rate limiting - Week 4: API documentation (Swagger...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 markdownlint-cli2 (0.18.1)
ENTERPRISE_FEATURE_ANALYSIS.md

152-152: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


158-158: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


165-165: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


171-171: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


177-177: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


336-336: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


365-365: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


385-385: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


404-404: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


411-411: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


430-430: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


438-438: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


477-477: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


500-500: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


501-501: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


502-502: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


609-609: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


629-629: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


654-654: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


678-678: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


687-687: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


703-703: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


753-753: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


773-773: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


805-805: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


821-821: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


828-828: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


858-858: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


869-869: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


879-879: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


898-898: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


906-906: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


951-951: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


969-969: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


990-990: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


999-999: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1036-1036: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1053-1053: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1081-1081: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1094-1094: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1123-1123: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1145-1145: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1168-1168: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1201-1201: Bare URL used

(MD034, no-bare-urls)


1487-1487: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1495-1495: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


1504-1504: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

docs/implementation-plans/03-elasticsearch-integration.md

52-52: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (1)
docs/implementation-plans/03-elasticsearch-integration.md (1)

220-236: indices.exists return handling.

@elastic/elasticsearch v8 may not return a boolean unless you use the transport option; verify and adapt to the actual return shape.

Comment on lines +153 to +156
```sql
CREATE INDEX idx_api_keys_user_id ON api_keys(user_id);
CREATE INDEX idx_api_keys_key_hash ON api_keys(key_hash);
CREATE UNIQUE INDEX idx_api_keys_key_prefix ON api_keys(key_prefix);
```


⚠️ Potential issue | 🔴 Critical

Unique index on key_prefix will block multiple keys.

key_prefix holds 'sk_live_'/'sk_test_'; making it UNIQUE allows only one key per environment.

Fix by dropping uniqueness or using a separate random prefix/id:

```diff
-CREATE UNIQUE INDEX idx_api_keys_key_prefix ON api_keys(key_prefix);
+-- REMOVE unique index; optionally:
+-- CREATE INDEX idx_api_keys_key_prefix ON api_keys(key_prefix);
```
🤖 Prompt for AI Agents
In docs/implementation-plans/01-api-layer.md around lines 153 to 156 the CREATE
UNIQUE INDEX on api_keys(key_prefix) is incorrect because key_prefix contains
environment prefixes like 'sk_live_'/'sk_test_' and will prevent storing
multiple keys per environment; remove the UNIQUE constraint by changing to a
non-unique index (CREATE INDEX idx_api_keys_key_prefix ON api_keys(key_prefix))
or alternatively add a dedicated, random unique identifier column (e.g., key_id
or key_prefix_random) and create a UNIQUE index on that new column while keeping
key_prefix non-unique.

Comment on lines +685 to +692
```typescript
// Look up API key in database
const apiKeyRecord = await supabase
  .from('api_keys')
  .select('*, user:users(*)')
  .eq('key_hash', keyHash)
  .eq('is_active', true)
  .single();
```


⚠️ Potential issue | 🟠 Major

Don’t join against auth.users; return user_id instead.

Supabase PostgREST cannot join to auth.users in most setups; selecting user:users(*) likely fails. Return user_id and fetch via Admin API if needed, or maintain an application users table.

```diff
- .select('*, user:users(*)')
+ .select('id, user_id, scopes, rate_limit_tier, expires_at, is_active')
...
-return apiKeyRecord.user;
+return { id: apiKeyRecord.user_id };
```

Also applies to: 709-710

🤖 Prompt for AI Agents
In docs/implementation-plans/01-api-layer.md around lines 685 to 692 (and also
apply the same change at lines 709-710), the code is joining to auth.users via
select('*, user:users(*)') which will fail in most Supabase/PostgREST setups;
change the query to return the foreign key (user_id) instead of trying to join
to auth.users, then fetch the full user record separately via the Supabase Admin
API or from your application users table as needed; update both occurrences to
select the user_id and add a follow-up step to retrieve the user using the
appropriate admin/user-table method.

Comment on lines +779 to +800
```typescript
export async function checkRateLimit(apiKeyId: string, tier: string): Promise<void> {
  const limiter = rateLimiters[tier] || rateLimiters.standard;

  const { success, limit, reset, remaining } = await limiter.limit(apiKeyId);

  if (!success) {
    throw new APIError('Rate limit exceeded', 429, {
      'X-RateLimit-Limit': limit.toString(),
      'X-RateLimit-Remaining': '0',
      'X-RateLimit-Reset': reset.toString(),
      'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
    });
  }

  // Set rate limit headers (return these in response)
  return {
    'X-RateLimit-Limit': limit.toString(),
    'X-RateLimit-Remaining': remaining.toString(),
    'X-RateLimit-Reset': reset.toString(),
  };
}
```

⚠️ Potential issue | 🟠 Major

Type mismatch: function returns headers but signature says void.

checkRateLimit currently returns a headers object; update return type and callers.

```diff
-export async function checkRateLimit(apiKeyId: string, tier: string): Promise<void> {
+export async function checkRateLimit(apiKeyId: string, tier: string): Promise<Record<string,string>> {
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-export async function checkRateLimit(apiKeyId: string, tier: string): Promise<void> {
+export async function checkRateLimit(apiKeyId: string, tier: string): Promise<Record<string,string>> {
   const limiter = rateLimiters[tier] || rateLimiters.standard;

   const { success, limit, reset, remaining } = await limiter.limit(apiKeyId);

   if (!success) {
     throw new APIError('Rate limit exceeded', 429, {
       'X-RateLimit-Limit': limit.toString(),
       'X-RateLimit-Remaining': '0',
       'X-RateLimit-Reset': reset.toString(),
       'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
     });
   }

   // Set rate limit headers (return these in response)
   return {
     'X-RateLimit-Limit': limit.toString(),
     'X-RateLimit-Remaining': remaining.toString(),
     'X-RateLimit-Reset': reset.toString(),
   };
 }
```
🤖 Prompt for AI Agents
In docs/implementation-plans/01-api-layer.md around lines 779 to 800, the
function checkRateLimit is declared to return Promise<void> but actually returns
an object of headers; change the signature to return
Promise<Record<string,string>> (or a named Headers type) and update all callers
to await the function and use the returned headers (or merge them into the
response) instead of ignoring the result; keep the existing APIError throw
behavior intact and ensure typing reflects header string values.

Comment on lines +344 to +348
```typescript
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY! // Service role key required
);
```


🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Service role key must never be exposed to clients.

Ensure createClient with SUPABASE_SERVICE_ROLE_KEY runs only in trusted server contexts (no edge/functions with potential leakage, no client bundles). Add a guardrail comment and environment segregation.


I'll verify this review comment by examining the file context and consulting Supabase security best practices.


🏁 Script executed:

```bash
#!/bin/bash
# Examine the file and context around lines 344-348
cat "docs/implementation-plans/02-data-platform-integrations.md" | sed -n '330,360p'
```

Length of output: 848


🏁 Script executed:

```bash
#!/bin/bash
# Search for other uses of SUPABASE_SERVICE_ROLE_KEY in the codebase
rg -n "SUPABASE_SERVICE_ROLE_KEY" -C 5
```

Length of output: 2074


🏁 Script executed:

```bash
#!/bin/bash
# Check for any environment configuration or guardrails
rg -n "createClient" -C 3 --type=ts --type=js --type=tsx --type=jsx
```

Length of output: 27


🏁 Script executed:

```bash
#!/bin/bash
# Verify the file type and its purpose
head -50 "docs/implementation-plans/02-data-platform-integrations.md"
```

Length of output: 1461


🌐 Web query:

What are Supabase service role key security best practices and risks?

💡 Result:

Summary — risks

  • Full-bypass of Row Level Security (RLS): service_role keys bypass RLS so a leak = full read/modify/delete of your DB. [1][2]
  • Data breach & compliance exposure: attacker access → exfiltration, tampering, regulatory/reporting incidents. [6]
  • Supply‑chain/public repo leaks: accidental commits or CI logs can expose keys. [7]
  • Operational impact: abused keys can break apps, cause cost spikes, or force emergency rotation/restarts. [4][7]

Security best practices (concise)

  • Never use service_role in client-side code—keep it strictly server-side or in trusted runtime (server, Edge Function, or private job). [1][2][5]
  • Minimize use: prefer anon key + RLS for normal app flows; only use service_role for backend jobs that truly need bypass. [1][6]
  • Use least privilege patterns: create narrow APIs (server endpoints, RPCs) that perform needed operations rather than giving direct DB access with the key. Consider Postgres roles/RPCs to limit scope. [5]
  • Protect secrets in a vault/secret manager or environment secrets (not in code/commits). Use cloud secret managers or Supabase Edge Function secrets. [0][1]
  • Rotate regularly and on suspicion: rotate service_role/JWT secrets promptly; be aware rotation invalidates active connections. Automate rotation if possible. [8]
  • Monitor & alert: enable auditing/logging, watch unusual queries or bulk exports, and alert on anomalous use. Integrate secret-scanning for repos. [7][6]
  • Revoke and remediate fast: have an incident playbook to revoke keys, rotate, and re-deploy with new secrets. [8][7]
  • Prefer Edge Functions / serverless with scoped secrets for third‑party webhooks/integrations (e.g., Stripe) instead of embedding service_role in backend code accessible to multiple teams. [0][5]
  • Use network controls where possible: IP allowlists for backend jobs or service accounts, and enforce TLS. [5]

Quick operational checklist

  • Audit where service_role is used and replace with RPCs or server endpoints where possible. [5]
  • Move keys to secret manager; remove from repos/CI logs; add secret scanning. [7]
  • Implement monitoring + alerts for abnormal queries or large exports. [7][6]
  • Document rotation process and test it (note: rotation restarts services). [8]

Sources

  • Supabase docs — securing data; service_role guidance (never expose, use server-side). [1]
  • Supabase docs — Edge Functions environment secrets & guidance (service_role usable in Edge Functions, not browsers). [0]
  • Supabase docs — API/serverless guidance; service_role explanations and safeguards. [5]
  • Supabase docs — rotating anon/service/JWT secrets (procedure & impact). [8]
  • Community/ops guidance on protecting keys, using server endpoints/RPCs and edge functions. [3][6]
  • GitGuardian / security posts on risks and detection of leaked service_role tokens. [7]

Add explicit server-only context and environment segregation guardrails to documentation.

The documentation currently shows correct backend pattern but lacks critical security warnings. Per Supabase best practices, service_role keys must never be used in client-side code and should be kept strictly server-side. The existing comment "Service role key required" does not adequately communicate this constraint.

Update the code comment and add a preceding caution block to explicitly state:

  • This function runs server-side only (no client bundles, no edge functions accessible to untrusted code)
  • Service role key must reside in .env with appropriate access controls
  • Environment segregation: never log, expose, or commit the key
  • Link to Supabase vault security best practices for implementers
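One way to encode that guardrail in a Next.js codebase is sketched below; the `server-only` marker import is the standard Next.js mechanism, and the runtime check is an extra, assumed safeguard.

```typescript
// Sketch: confine the service-role client to server code.
// The 'server-only' package turns any client-bundle import into a build-time error.
import 'server-only';
import { createClient } from '@supabase/supabase-js';

if (typeof window !== 'undefined') {
  // Defensive runtime check; should be unreachable with 'server-only' in place.
  throw new Error('supabase admin client must never run in the browser');
}

// Server-side only: do not expose the service_role key to clients, logs, or commits.
export const supabaseAdmin = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);
```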
🤖 Prompt for AI Agents
In docs/implementation-plans/02-data-platform-integrations.md around lines 344
to 348, the inline comment beside the Supabase service_role key is
insufficiently warning — add a clear server-only caution block immediately above
the code that states this function must run server-side only (no client bundles
or publicly accessible edge functions), that the service_role key must be stored
in a protected .env with strict access controls and never logged, exposed, or
committed, and update the inline comment to explicitly state "Server-side only —
do not expose service_role key to clients"; also include a link to Supabase
vault/security best practices for implementers and recommend environment
segregation guardrails (no client access, restrict CI/ops, and rotate keys).

Comment on lines +389 to +413
```typescript
  // Get credential record
  const { data: credRecord, error } = await supabase
    .from('encrypted_credentials')
    .select('vault_secret_id')
    .eq('id', credentialId)
    .single();

  if (error) {
    throw new Error(`Credential not found: ${error.message}`);
  }

  // Retrieve from Vault
  const { data: vaultSecret, error: vaultError } = await supabase
    .from('vault.secrets')
    .select('secret')
    .eq('id', credRecord.vault_secret_id)
    .single();

  if (vaultError) {
    throw new Error(`Failed to retrieve credentials: ${vaultError.message}`);
  }

  return JSON.parse(vaultSecret.decrypted_secret); // Automatically decrypted
}
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Supabase Vault retrieval bug and table access assumptions.

  • You select "secret" from vault.secrets but later read vaultSecret.decrypted_secret. Field mismatch; will throw. Also, accessing "vault.secrets" via supabase-js may not work depending on Vault exposure/permissions.

Suggested fix:

```diff
-  const { data: vaultSecret, error: vaultError } = await supabase
-    .from('vault.secrets')
-    .select('secret')
+  const { data: vaultSecret, error: vaultError } = await supabase
+    .from('vault.secrets')
+    .select('decrypted_secret')
     .eq('id', credRecord.vault_secret_id)
     .single();

-  return JSON.parse(vaultSecret.decrypted_secret); // Automatically decrypted
+  return JSON.parse(vaultSecret.decrypted_secret);
```

Also verify that Vault access is server-only and enabled in your project; consider a dedicated secrets API rather than querying the table directly.




🏁 Script executed:

```bash
#!/bin/bash
# Verify the file exists and check its actual content at the specified lines
if [ -f "docs/implementation-plans/02-data-platform-integrations.md" ]; then
  echo "=== File exists, showing lines 389-413 ==="
  sed -n '389,413p' "docs/implementation-plans/02-data-platform-integrations.md"
  echo ""
  echo "=== Context: lines 380-420 ==="
  sed -n '380,420p' "docs/implementation-plans/02-data-platform-integrations.md"
else
  echo "File not found"
fi
```

Length of output: 1688


🌐 Web query:

Supabase Vault API select fields decrypted_secret correct usage

💡 Result:

Short answer: query the provided view vault.decrypted_secrets and include the decrypted_secret column in your SELECT. The view decrypts on-the-fly, so usage is just normal SQL, but you must ensure the DB role you’re using has permission to read the view (and you should lock down access because anyone who can select from the view sees plaintext). Example:

  • Basic:
    select id, name, decrypted_secret from vault.decrypted_secrets;

  • By id:
    select id, name, decrypted_secret
    from vault.decrypted_secrets
    where id = '7095d222-efe5-4cd5-b5c6-5755b451e223';

Notes:

  • decrypted_secret is exposed only via the view (vault.secrets stores the ciphertext). The view decrypts at query time. Protect access with proper privileges/RLS. [1][2]

Sources:
[1] Supabase Vault docs — Using Vault / decrypted_secrets.
[2] supabase/vault GitHub README (examples showing decrypted_secret in the view).


Fix Supabase Vault table and field references.

The code queries vault.secrets but references a field that only exists on the vault.decrypted_secrets view. The view decrypts on-the-fly, so decrypted_secret is exposed only via the view, not the underlying table. Update to:

```diff
   const { data: vaultSecret, error: vaultError } = await supabase
-    .from('vault.secrets')
-    .select('secret')
+    .from('vault.decrypted_secrets')
+    .select('decrypted_secret')
     .eq('id', credRecord.vault_secret_id)
     .single();

-  return JSON.parse(vaultSecret.decrypted_secret); // Automatically decrypted
+  return JSON.parse(vaultSecret.decrypted_secret);
```

Also verify that Vault access is restricted to server-only contexts and that SERVICE_ROLE_KEY usage here is intentional; consider whether a dedicated secrets management API would be safer than direct table access.

🤖 Prompt for AI Agents
In docs/implementation-plans/02-data-platform-integrations.md around lines 389
to 413, the code queries the underlying vault.secrets table but then tries to
read decrypted_secret which only exists on the vault.decrypted_secrets view;
change the Supabase query to select from the vault.decrypted_secrets view and
select the decrypted_secret field (e.g.,
.from('vault.decrypted_secrets').select('decrypted_secret').eq('id',
credRecord.vault_secret_id).single()), and ensure this code runs only in a
server-only context using the SERVICE_ROLE_KEY (or replace direct DB access with
a dedicated secrets-management API) so decryption access is restricted to
trusted backend environments.

Comment on lines +792 to +795
```tsx
<p className="text-sm text-muted-foreground">
  {results.length} results ({took}ms)
</p>
```


⚠️ Potential issue | 🟡 Minor

UI uses took variable but it’s not defined in component state.

Add took to state from API response to avoid runtime errors.
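A minimal sketch of that fix, with component and API field names assumed from the snippet above:

```tsx
// Sketch: keep `took` in state alongside results so the render never sees undefined.
import { useState } from 'react';

interface SearchHit { id: string; title: string; }

export function SearchResults() {
  const [results, setResults] = useState<SearchHit[]>([]);
  const [took, setTook] = useState(0); // milliseconds reported by the search API

  async function runSearch(query: string) {
    const res = await fetch(`/api/search?q=${encodeURIComponent(query)}`);
    const data = await res.json();
    setResults(data.results ?? []);
    setTook(data.took ?? 0); // assumes the API echoes Elasticsearch's `took`
  }

  return (
    <div>
      <p className="text-sm text-muted-foreground">
        {results.length} results ({took}ms)
      </p>
      {/* ...render results, wire runSearch to a search box... */}
    </div>
  );
}
```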

🤖 Prompt for AI Agents
In docs/implementation-plans/03-elasticsearch-integration.md around lines 792 to
795, the JSX references a took variable that is not defined in the component
state causing runtime errors; add a took state property (e.g., initialize via
useState(0) or include in the component's state object) and set its value from
the search API response when results are fetched (parse the response.took and
update state alongside results), and ensure the UI uses this state variable so
took is never undefined.

Comment on lines +195 to +209
```typescript
private sanitizePayload(payload: any): any {
  if (!payload) return null;

  // Remove sensitive fields
  const sensitiveFields = ['password', 'apiKey', 'token', 'secret', 'creditCard'];
  const sanitized = { ...payload };

  for (const field of sensitiveFields) {
    if (field in sanitized) {
      sanitized[field] = '[REDACTED]';
    }
  }

  return sanitized;
}
```

⚠️ Potential issue | 🟠 Major

Sanitization should be deep and include nested secrets.

Current sanitizePayload only redacts top-level fields. Traverse recursively and handle nested arrays/objects.
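A rough sketch of a deep variant (the field list mirrors the snippet above; circular-reference handling is added):

```typescript
// Sketch: recursively redact sensitive keys in nested objects/arrays without
// mutating the input; a WeakSet guards against circular references.
const SENSITIVE_FIELDS = new Set(['password', 'apiKey', 'token', 'secret', 'creditCard']);

export function deepSanitize(value: unknown, seen = new WeakSet<object>()): unknown {
  if (value === null || typeof value !== 'object') return value;
  if (seen.has(value as object)) return '[CIRCULAR]';
  seen.add(value as object);

  if (Array.isArray(value)) {
    return value.map((item) => deepSanitize(item, seen));
  }

  const result: Record<string, unknown> = {};
  for (const [key, val] of Object.entries(value as Record<string, unknown>)) {
    result[key] = SENSITIVE_FIELDS.has(key) ? '[REDACTED]' : deepSanitize(val, seen);
  }
  return result;
}
```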

🤖 Prompt for AI Agents
In docs/implementation-plans/04-compliance-security-suite.md around lines 195 to
209, sanitizePayload only redacts top-level keys; update it to perform a deep
recursive traversal that handles nested objects and arrays, redacting any
sensitive keys (password, apiKey, token, secret, creditCard) at any depth and
returning a new sanitized structure (avoid mutating original), skip or safely
handle circular references (e.g., track visited nodes) and preserve
non-sensitive primitive values unchanged.

Comment on lines +220 to +271
```typescript
// Middleware for Next.js API routes
export function auditMiddleware(handler: any) {
  return async (req: NextRequest, res: NextResponse) => {
    const startTime = Date.now();
    const requestId = req.headers.get('X-Request-ID') || crypto.randomUUID();

    try {
      const response = await handler(req, res);

      // Log successful request
      await auditLogger.log({
        userId: req.user?.id,
        eventType: `api.${req.method.toLowerCase()}`,
        eventCategory: 'data',
        action: this.mapMethodToAction(req.method),
        resourceType: this.extractResourceType(req.url),
        ipAddress: req.ip,
        userAgent: req.headers.get('User-Agent'),
        requestId,
        requestMethod: req.method,
        requestPath: req.url,
        responseStatus: response.status,
        status: 'success',
        metadata: {
          responseTime: Date.now() - startTime,
        },
      });

      return response;
    } catch (error: any) {
      // Log failed request
      await auditLogger.log({
        userId: req.user?.id,
        eventType: `api.${req.method.toLowerCase()}.failed`,
        eventCategory: 'data',
        action: this.mapMethodToAction(req.method),
        resourceType: this.extractResourceType(req.url),
        ipAddress: req.ip,
        userAgent: req.headers.get('User-Agent'),
        requestId,
        requestMethod: req.method,
        requestPath: req.url,
        status: 'error',
        errorMessage: error.message,
        severity: error.statusCode === 403 ? 'warning' : 'info',
      });

      throw error;
    }
  };
}
```

⚠️ Potential issue | 🔴 Critical

Middleware uses `this.*` in a closure — will be undefined.

this.mapMethodToAction / this.extractResourceType will throw. Use module-level helpers or call static functions.

```diff
- action: this.mapMethodToAction(req.method),
- resourceType: this.extractResourceType(req.url),
+ action: mapMethodToAction(req.method),
+ resourceType: extractResourceType(req.url),
```

Provide those helpers in scope.
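For example, module-level helpers along these lines would satisfy that; the route-to-resource mapping is illustrative:

```typescript
// Sketch: plain module-level helpers so the middleware closure has no `this` dependency.
export function mapMethodToAction(method: string): string {
  switch (method.toUpperCase()) {
    case 'POST': return 'create';
    case 'GET': return 'read';
    case 'PUT':
    case 'PATCH': return 'update';
    case 'DELETE': return 'delete';
    default: return 'other';
  }
}

export function extractResourceType(url: string): string {
  // e.g. /api/v1/slicers/123 -> 'slicers'; adjust to the real route layout.
  const segments = new URL(url, 'http://localhost').pathname.split('/').filter(Boolean);
  const apiIndex = segments.indexOf('v1');
  return segments[apiIndex + 1] ?? 'unknown';
}
```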

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +574 to +583
```typescript
const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/g;
while ((match = emailRegex.exec(text)) !== null) {
  entities.push({
    type: 'EMAIL',
    value: match[0],
    start: match.index,
    end: match.index + match[0].length,
    confidence: 0.95,
  });
}
```

⚠️ Potential issue | 🟠 Major

Email regex includes a literal '|' in TLD class.

[A-Z|a-z]{2,} allows pipe char.

```diff
-const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/g;
+const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/g;
+const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;
 while ((match = emailRegex.exec(text)) !== null) {
   entities.push({
     type: 'EMAIL',
     value: match[0],
     start: match.index,
     end: match.index + match[0].length,
     confidence: 0.95,
   });
 }
```
🤖 Prompt for AI Agents
In docs/implementation-plans/04-compliance-security-suite.md around lines
574-583, the email regex character class [A-Z|a-z]{2,} wrongly allows the pipe
character; replace that class with a correct TLD character class (e.g.
[A-Za-z]{2,}) or remove the explicit case range and add the case-insensitive
flag (/i) to the regex, ensuring the pattern no longer permits '|' and continues
to match valid TLDs.

Comment on lines +767 to +775
```typescript
private async deleteRecords(resourceType: string, records: any[]): Promise<void> {
  const ids = records.map((r) => r.id);

  const tableName = resourceType === 'pdf' ? 'pdfs' :
                    resourceType === 'output' ? 'outputs' :
                    'audit_logs';

  await supabase.from(tableName).delete().in('id', ids);
}
```

⚠️ Potential issue | 🟠 Major

Retention deletion fallback risks deleting audit logs by mistake.

Defaulting unknown resourceType to 'audit_logs' is dangerous. Fail closed or throw on unknown type.

```diff
-  const tableName = resourceType === 'pdf' ? 'pdfs' :
-                    resourceType === 'output' ? 'outputs' :
-                    'audit_logs';
+  const tableName =
+    resourceType === 'pdf' ? 'pdfs' :
+    resourceType === 'output' ? 'outputs' :
+    resourceType === 'audit_log' ? 'audit_logs' :
+    null;
+  if (!tableName) throw new Error(`Unsupported resource type: ${resourceType}`);
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
 private async deleteRecords(resourceType: string, records: any[]): Promise<void> {
   const ids = records.map((r) => r.id);

-  const tableName = resourceType === 'pdf' ? 'pdfs' :
-                    resourceType === 'output' ? 'outputs' :
-                    'audit_logs';
+  const tableName =
+    resourceType === 'pdf' ? 'pdfs' :
+    resourceType === 'output' ? 'outputs' :
+    resourceType === 'audit_log' ? 'audit_logs' :
+    null;
+  if (!tableName) throw new Error(`Unsupported resource type: ${resourceType}`);

   await supabase.from(tableName).delete().in('id', ids);
 }
```
🤖 Prompt for AI Agents
In docs/implementation-plans/04-compliance-security-suite.md around lines 767 to
775, the deleteRecords function currently defaults unknown resourceType to
'audit_logs', which risks deleting audit logs by mistake; change the
implementation to explicitly map allowed resourceType values ('pdf' -> 'pdfs',
'output' -> 'outputs') and throw an error (or reject) for any unknown
resourceType, performing no deletion; also validate that ids is non-empty before
calling supabase.delete() to avoid unnecessary operations.

Based on industry research and best practices, updated implementation
recommendations to use battle-tested, widely-adopted frameworks:

## Multi-LLM Provider: Vercel AI SDK (Recommended)

Previous approach: Custom provider abstraction layer
Updated approach: Vercel AI SDK + optional LiteLLM

Benefits:
- 100+ providers with unified interface
- Native streaming, tool calling, structured outputs
- Model Context Protocol support
- Better maintained (Vercel backing)
- Perfect fit for TypeScript/Next.js stack
- 12K+ GitHub stars

Alternative: LiteLLM for Python pipelines
- 1.8K+ models, 80+ providers
- Built-in cost tracking, load balancing, guardrails
- 20K+ GitHub stars, Y Combinator backed
- Can run as AI Gateway proxy server

## OCR Solutions: Multi-Method Approach

Previous: AWS Textract only
Updated: LLMWhisperer + DeepSeek-OCR + AWS Textract + Tesseract

1. LLMWhisperer (Unstract) - Primary
   - Layout preservation for LLM preprocessing
   - Free tier: 100 pages/day
   - Multiple modes: native_text, high_quality, form
   - Document enhancement (de-skewing, contrast)

2. DeepSeek-OCR - High-Volume
   - 97% accuracy with 10x compression
   - 200K+ pages/day on single A100 GPU
   - Self-hosted, $0 per page
   - Best for complex layouts, formulas

3. AWS Textract - Forms & Tables
   - Enterprise-grade reliability
   - Best for forms, key-value extraction
   - Pay-per-use

4. Tesseract - Fallback
   - Free, open-source
   - Simple text extraction

Cost savings: 60-80% vs AWS Textract only

Includes:
- Detailed Vercel AI SDK implementation examples
- LiteLLM proxy server configuration
- OCR decision logic and method selection
- Code examples for all solutions
- Migration path from current implementation
- Cost comparison and architecture diagrams
- Updated timeline: 4-5 weeks vs 6 weeks previous

Recommendation: Adopt Vercel AI SDK + LLMWhisperer as primary,
with DeepSeek-OCR for high-volume and AWS Textract for forms.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (1)
docs/UPDATED_RECOMMENDATIONS.md (1)

690-694: Clarify DeepSeek-OCR hardware requirements.

The hardware requirements section (lines 690-693) mentions "NVIDIA A100-40G (minimum)" for production but also notes "RTX 4090 (slower but works)". This is ambiguous—clarify whether A100 is the recommended minimum or an aspirational target, and provide more realistic entry-level GPU options.

Revise to:

```diff
 **Hardware Requirements:**
-- **Minimum:** NVIDIA A100-40G (for production speed)
-- **Budget:** RTX 4090 (slower but works)
-- **Performance:** 2500 tokens/s on A100, 200K+ pages/day
+- **Recommended (Production):** NVIDIA A100-40G or H100 (2500+ tokens/s, 200K+ pages/day)
+- **Budget Alternative:** RTX 4090 (significantly slower, ~200-400 tokens/s)
+- **Minimum:** RTX 3090 or A10 (not recommended for high-volume production)
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 551c12c and 901020c.

📒 Files selected for processing (1)
  • docs/UPDATED_RECOMMENDATIONS.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/UPDATED_RECOMMENDATIONS.md

[grammar] ~800-~800: Use a hyphen to join words.
Context: ...support (self-hosted) - ✅ Implement cost tracking middleware - ✅ UI for provider ...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
docs/UPDATED_RECOMMENDATIONS.md

431-431: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


831-831: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (3)
docs/UPDATED_RECOMMENDATIONS.md (3)

302-329: Code example references undefined supabase import.

The cost-tracking middleware code (lines 316-323) calls supabase.from('llm_usage').insert() without showing the import or initialization. While this is documentation, it could mislead readers about the source of this object.

Add a comment or code snippet showing how supabase is imported and initialized, e.g.:

```typescript
// Add near the top of the middleware code example
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);
```

1-11: Document coordinates well with PR context.

The document appropriately frames itself as an update to previous recommendations and aligns well with the enterprise roadmap and implementation plans mentioned in the enriched summary. The structured approach (comparison → detailed implementation → costs → timeline) is logical and professional.


829-880: Architecture diagrams are clear and comprehensive.

The two ASCII architecture diagrams effectively illustrate the recommended system design (Vercel AI SDK frontend, OCR service backend, and optional LiteLLM proxy). They provide good visual reference for implementers.

Comment on lines +74 to +115
```bash
npm install ai @ai-sdk/openai @ai-sdk/anthropic @ai-sdk/google @ai-sdk/mistral
```

### 2.2 Provider Configuration

```typescript
// src/lib/llm/providers.ts

import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { createOpenAI } from '@ai-sdk/openai'; // For Azure, custom endpoints

// OpenAI
export const openaiProvider = openai;

// Anthropic Claude
export const claudeProvider = anthropic;

// Google Gemini
export const geminiProvider = google;

// Azure OpenAI (custom endpoint)
export const azureProvider = createOpenAI({
  baseURL: process.env.AZURE_OPENAI_ENDPOINT,
  apiKey: process.env.AZURE_OPENAI_KEY,
  organization: '',
});

// Ollama (local/self-hosted)
export const ollamaProvider = createOpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // Ollama doesn't require real API key
});

// AWS Bedrock (via custom provider)
export const bedrockProvider = createOpenAI({
  baseURL: process.env.BEDROCK_ENDPOINT,
  apiKey: process.env.AWS_ACCESS_KEY,
  // Note: Bedrock requires AWS signature, may need middleware
});
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify Vercel AI SDK provider package availability and naming.

The installation and provider configuration code uses specific package names (@ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google, @ai-sdk/mistral). Ensure these packages exist and are the correct names in the current npm registry.


🌐 Web query:

What are the correct and latest official Vercel AI SDK provider packages available as of November 2025?

💡 Result:

Here are the official Vercel AI SDK provider packages (first‑party / Vercel‑maintained) available as of November 13, 2025:

  • @ai-sdk/openai. [1][2]
  • @ai-sdk/anthropic. [1][2]
  • @ai-sdk/google (Google Generative AI). [1][2]
  • @ai-sdk/google-vertex (Google Vertex). [1][2]
  • @ai-sdk/azure (Azure AI). [3][2]
  • @ai-sdk/cohere. [1][2]
  • @ai-sdk/mistral. [4][2]
  • @ai-sdk/xai (xAI / Grok). [4][2]
  • @ai-sdk/vercel (Vercel provider / v0 API). [5][2]
  • @ai-sdk/gateway (AI Gateway — unified access to multiple providers). [6][2]

Notes:

  • Some community or third‑party provider packages (e.g., Ollama adapters, Mem0, Auth0 integrations) exist but are not first‑party Vercel packages — check their repos/docs if you need them. [1][5]

Sources: Vercel blog & changelog, Vercel Docs (AI Gateway / Models & Providers), and AI SDK provider documentation. [1][2][3][4][6]

References

  1. Vercel — Introducing Vercel AI SDK (providers added).
  2. Vercel Docs — Models & Providers / AI Gateway documentation.
  3. Vercel changelog — Azure AI provider announcement (Jun 12, 2024).
  4. AI SDK provider listings (examples showing xAI, Mistral, OpenAI, Anthropic, etc.).
  5. npm / @ai-sdk/vercel package listing.
  6. AI SDK docs — @ai-sdk/gateway provider.

Update npm install and use official @ai-sdk/azure package for Azure OpenAI.

The packages @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google, and @ai-sdk/mistral are correct and available. However, the provider configuration needs updates:

  • Add @ai-sdk/azure to the npm install command
  • Replace the Azure provider implementation with the official @ai-sdk/azure package instead of createOpenAI() with custom baseURL settings
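As a rough sketch of that direction (option names such as resourceName and the deployment-name call should be checked against the @ai-sdk/azure docs; env variable names are placeholders):

```typescript
// Sketch: use the first-party Azure provider instead of createOpenAI with a custom baseURL.
// Exact option names are assumptions; confirm against the @ai-sdk/azure documentation.
import { createAzure } from '@ai-sdk/azure';
import { generateText } from 'ai';

const azure = createAzure({
  resourceName: process.env.AZURE_OPENAI_RESOURCE!, // e.g. 'my-openai-resource'
  apiKey: process.env.AZURE_OPENAI_KEY!,
});

export async function summarize(text: string) {
  const { text: summary } = await generateText({
    model: azure('gpt-4o'), // deployment name in your Azure resource
    prompt: `Summarize:\n\n${text}`,
  });
  return summary;
}
```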
🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around lines 74 to 115, the npm install and
Azure provider usage are incorrect: add @ai-sdk/azure to the install command and
replace the current Azure provider that uses createOpenAI() with the official
@ai-sdk/azure package. Update the top imports to import the Azure factory from
@ai-sdk/azure (instead of using createOpenAI from @ai-sdk/openai), then
instantiate the azureProvider via that Azure package’s factory/function passing
the AZURE endpoint and key from environment variables (and any required
Azure-specific options), and remove the custom createOpenAI-based Azure
configuration. Ensure the npm install example includes @ai-sdk/azure and the
code example imports and constructs the Azure provider using the official
package and env vars.
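
A minimal sketch of the suggested change, assuming the official @ai-sdk/azure package and its createAzure factory (option names should be double-checked against the package docs before adopting):

```typescript
// Sketch only: replaces the createOpenAI()-based Azure provider with the
// official @ai-sdk/azure package. Verify option names against the docs.
// npm install @ai-sdk/azure   (alongside the other @ai-sdk/* packages)
import { createAzure } from '@ai-sdk/azure';

export const azureProvider = createAzure({
  baseURL: process.env.AZURE_OPENAI_ENDPOINT, // or resourceName: '<your-resource-name>'
  apiKey: process.env.AZURE_OPENAI_KEY,
});

// Usage: azureProvider('<deployment-name>') returns a model usable with generateText/streamText
```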

Comment on lines +300 to +336
### 2.6 Cost Tracking (Add Custom Middleware)

```typescript
// Middleware for cost tracking
import { experimental_wrapLanguageModel as wrapLanguageModel } from 'ai';

function withCostTracking(model: any, providerId: string) {
  return wrapLanguageModel({
    model,
    middleware: {
      transformParams: async ({ params }) => params,
      wrapGenerate: async ({ doGenerate, params }) => {
        const startTime = Date.now();
        const result = await doGenerate();

        // Track usage
        await supabase.from('llm_usage').insert({
          provider: providerId,
          model: params.model,
          input_tokens: result.usage.promptTokens,
          output_tokens: result.usage.completionTokens,
          cost: calculateCost(providerId, result.usage),
          duration_ms: Date.now() - startTime,
        });

        return result;
      },
    },
  });
}

// Usage
const trackedModel = withCostTracking(
  openai('gpt-4o-mini'),
  'openai'
);
```

⚠️ Potential issue | 🟠 Major

Experimental API usage requires stability warning.

The cost-tracking middleware uses experimental_wrapLanguageModel (line 304), which is an experimental Vercel AI SDK API. This may change or be deprecated in future versions, and code relying on it could break.

Consider adding a prominent warning in the documentation about the experimental nature of this API, and suggest monitoring Vercel AI SDK release notes for stability updates. Alternatively, propose an alternative approach that doesn't rely on experimental features, or note that users should pin the ai package version to a known-compatible release.

🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around lines 300 to 336, the example uses the
experimental experimental_wrapLanguageModel API without warning; add a prominent
stability notice immediately above the code block that states this API is
experimental and may change or be removed, advise readers to pin the ai package
to a known-compatible version, monitor Vercel AI SDK release notes for breaking
changes, and suggest an alternative approach (e.g., implement
middleware/wrapping at your application layer or use stable public SDK hooks) as
a fallback if the experimental API changes.
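
If avoiding the experimental API altogether, one option is to track cost at the application layer around each call. A minimal sketch, with recordUsage standing in for the llm_usage insert shown above and the usage field names handled defensively because they differ across AI SDK versions:

```typescript
// Sketch only: application-layer cost tracking around generateText, avoiding
// experimental_wrapLanguageModel. Usage field names differ across AI SDK
// versions (promptTokens/completionTokens vs inputTokens/outputTokens), so
// both are checked; recordUsage stands in for the llm_usage insert above.
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

type UsageSink = (record: {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  durationMs: number;
}) => Promise<void>;

async function generateWithCostTracking(
  modelId: string,
  prompt: string,
  recordUsage: UsageSink
) {
  const startTime = Date.now();
  const result = await generateText({ model: openai(modelId), prompt });

  const usage = result.usage as unknown as Record<string, number | undefined>;
  await recordUsage({
    provider: 'openai',
    model: modelId,
    inputTokens: usage.promptTokens ?? usage.inputTokens ?? 0,
    outputTokens: usage.completionTokens ?? usage.outputTokens ?? 0,
    durationMs: Date.now() - startTime,
  });

  return result;
}
```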

Comment on lines +358 to +407
cat > litellm_config.yaml <<EOF
model_list:
  # OpenAI Models
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

  # Anthropic Models
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

  # Azure OpenAI
  - model_name: azure-gpt4
    litellm_params:
      model: azure/gpt-4o
      api_base: os.environ/AZURE_OPENAI_ENDPOINT
      api_key: os.environ/AZURE_OPENAI_KEY

  # Ollama (self-hosted)
  - model_name: llama3
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434

# Load balancing
router_settings:
  routing_strategy: least-busy
  num_retries: 3
  timeout: 600

# Cost tracking (built-in)
litellm_settings:
  success_callback: ["langfuse"] # Track all requests
  failure_callback: ["langfuse"]
  set_verbose: true

# Guardrails
guardrails:
  - guardrail_name: "content-filter"
    litellm_params:
      mode: during_call
      guardrail: "azure-content-filter"
EOF

# Start proxy server
litellm --config litellm_config.yaml --port 4000
```

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify LiteLLM configuration syntax and available options.

The LiteLLM proxy configuration (lines 358-407) uses specific YAML structure and settings like routing_strategy: least-busy, success_callback: ["langfuse"], and guardrail: "azure-content-filter". These should be verified against the current LiteLLM documentation to ensure syntax accuracy.


🌐 Web query:

What is the current LiteLLM proxy server configuration syntax and available options as of November 2025?

💡 Result:

Summary of LiteLLM proxy configuration syntax & available options (as of Nov 13, 2025)

Quick overview

  • The proxy is configured via a YAML config file (passed with --config or via LITELLM_CONFIG_PATH / CONFIG_FILE_PATH / bucket env vars). The main top-level sections are: model_list, general_settings, litellm_settings, router_settings, environment_variables (and extras like UI/auth settings). [1][2][3]

Main sections and common keys (YAML)

  • model_list: list of model entries. Each entry commonly includes:
    • model_name: user-facing alias (what clients request).
    • litellm_params: provider params passed to litellm (model identifier, api_key, api_base, api_version, provider-specific fields).
    • any per-model overrides (temperature, max_tokens, stream, fallback, etc.). [1][2]
  • general_settings:
    • master_key (proxy admin key), database connection settings (DATABASE_URL, connection_pool_limit, connection_timeout), port, logging, NO_DOCS / NO_REDOC toggles, UI_USERNAME / UI_PASSWORD. [1][2]
  • litellm_settings:
    • module-level toggles like drop_params, set_verbose, api_base, cache settings and other litellm runtime flags. [1][3]
  • router_settings:
    • routing_strategy (e.g., "least-busy"), other routing params for load-balancing across model instances. [1]
  • environment_variables:
    • list or mapping to inject expected env vars (REDIS_HOST, REDIS_PORT, provider keys when referencing os.environ). [1]
  • Extras / deployment:
    • Support for reading config.yaml from S3 or GCS via LITELLM_CONFIG_BUCKET_TYPE/NAME/OBJECT_KEY env vars.
    • Docker/Helm/Kubernetes deployment examples; default proxy port 4000 (or 8000 in some docs), ability to run simple proxy mode with CLI flags --model, --port, --logs. [2][3]

Example minimal model_list entry (conceptual YAML):

    model_list:
      - model_name: gpt-4o
        litellm_params:
          model: openai/gpt-4o
          api_key: os.environ/OPENAI_API_KEY
          api_base: https://api.openai.com
          api_version: "2023-05-15"
          temperature: 0.0
          stream: True

(Use env var references for secrets.) [1][2]

Other notable options / behaviors

  • Admin UI & Swagger available at proxy root; Swagger exposes complete config schema at /#/config.yaml. Master key is required to use the admin UI. [2]
  • Logging: --logs returns recent API call logs (saved to api_logs.json). [3]
  • Two run modes: simple/basic proxy (CLI flags) and full proxy using config.yaml with DB-backed features (rate-limits, API key management, usage tracking). [3][1]

Sources
[1] LiteLLM proxy config docs — configs & examples.
[2] LiteLLM proxy Quick Start / UI docs (usage, master key, env vars).
[3] LiteLLM deploy / simple proxy docs (deployment, CLI flags, config-bucket support).


I need to search for more specific information about LiteLLM guardrails and callback features to determine if they are actually supported in the current version.

I need to search more specifically about the guardrails placement and the "azure-content-filter" guardrail type to fully verify the configuration.

Fix LiteLLM configuration structure and guardrail type.

The configuration has two critical issues and one item to verify:

  1. Guardrails placement: Guardrails must be nested within each model's litellm_params, not defined as a separate top-level section. Move the guardrails block into each model entry under model_list.

  2. Incorrect guardrail type: Use guardrail: azure/text_moderations or guardrail: azure/prompt_shield, not "azure-content-filter".

  3. Undocumented router settings: num_retries and timeout in router_settings are not confirmed as valid LiteLLM proxy options. Verify these keys against current LiteLLM documentation or remove them if unsupported.

Move the guardrails block (lines ~390-396) into the appropriate model entries, and update the guardrail type.

🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around lines 358 to 407, the LiteLLM YAML
places guardrails as a top-level section and uses an invalid guardrail type;
move the guardrail configuration into each model's litellm_params (i.e., under
each model entry in model_list), replace the guardrail value
"azure-content-filter" with a supported type such as "azure/text_moderations" or
"azure/prompt_shield", and verify or remove the unconfirmed router_settings keys
(num_retries and timeout) against current LiteLLM docs if they are not
supported.

Comment on lines +431 to +449
```
┌─────────────────────────────────────────┐
│         Next.js Application             │
└─────────────┬───────────────────────────┘
              │ HTTP (OpenAI format)
┌─────────────▼───────────────────────────┐
│          LiteLLM Proxy Server           │
│     (Load Balancing, Cost Tracking,     │
│        Guardrails, Retry Logic)         │
└─────────────┬───────────────────────────┘
    ┌─────────┼─────────┐
    │         │         │
┌───▼───┐  ┌──▼───┐  ┌──▼────┐
│OpenAI │  │Claude│  │Azure  │
└───────┘  └──────┘  └───────┘
```

⚠️ Potential issue | 🟡 Minor

Add language identifier to fenced code block.

The ASCII architecture diagram at lines 431-449 is a fenced code block without a language specifier, violating markdown linting standards (MD040).

Apply this diff to fix the linting violation:

-```
+```text
 ┌─────────────────────────────────────────┐
 │         Next.js Application             │
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

431-431: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around lines 431 to 449 the fenced code block
containing the ASCII architecture diagram lacks a language identifier (MD040);
fix it by changing the opening fence from ``` to ```text so the block is
explicitly marked as plain text, leaving the diagram content and closing fence
unchanged.
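
Because the proxy in the diagram above speaks the OpenAI wire format, the Next.js side only needs an OpenAI-compatible client pointed at it. A minimal sketch, assuming the proxy from the earlier config runs on localhost:4000 and that LITELLM_BASE_URL / LITELLM_API_KEY are hypothetical environment variable names:

```typescript
// Sketch only: calls the LiteLLM proxy through its OpenAI-compatible
// /v1/chat/completions endpoint using plain fetch.
const LITELLM_BASE_URL = process.env.LITELLM_BASE_URL ?? 'http://localhost:4000';

async function chatViaProxy(modelAlias: string, userMessage: string): Promise<string> {
  const res = await fetch(`${LITELLM_BASE_URL}/v1/chat/completions`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.LITELLM_API_KEY}`, // proxy master/virtual key
    },
    body: JSON.stringify({
      model: modelAlias, // e.g. 'gpt-4o-mini' or 'claude-sonnet' from model_list
      messages: [{ role: 'user', content: userMessage }],
    }),
  });

  if (!res.ok) {
    throw new Error(`LiteLLM proxy error: ${res.status} ${res.statusText}`);
  }

  const data = await res.json();
  return data.choices[0].message.content;
}
```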

return {
  text: result.text,
  layout: result.markdown,
  confidence: 0.97, // DeepSeek reports 97% accuracy at 10x compression

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify specific performance and accuracy claims.

The document makes several specific quantitative claims that should be validated:

  • Line 552: "DeepSeek reports 97% accuracy at 10x compression"
  • Line 693: "200K+ pages/day" throughput on A100
  • Line 770: Cost figures for various LLM providers

These figures significantly influence architectural decisions and should be verified against current official documentation.

Also applies to: 693-693


🌐 Web query:

What is the current reported accuracy and performance of DeepSeek-OCR model as of November 2025?

💡 Result:

Short answer — reported numbers (as of Nov 13, 2025):

  • OCR accuracy: ~97% decoding/precision at compression ratios <10×; ≈60% at ~20× compression. [1][2][3]
  • Compression / tokens: claims of using ~100 vision tokens/page to match models that use 256–6,000+ tokens (i.e., 7–20× token reduction). [1][2][4]
  • Throughput / latency: published production throughput ~200,000 pages/day on a single NVIDIA A100-40G (scales to millions on large clusters). [1][4]
  • Benchmarks / comparisons: reported to outperform GOT‑OCR2.0 and MinerU2.0 on OmniDocBench/Fox at much lower vision-token budgets. [1][2]

Sources: DeepSeek-OCR arXiv paper (Oct 2025) and press coverage / vendor pages summarizing reported benchmarks. [1][2][3][4]

References
[1] DeepSeek‑OCR arXiv paper (Contexts Optical Compression, Oct 2025).
[2] InfoQ news summary (Oct 2025).
[3] Tom’s Hardware coverage (Oct–Nov 2025).
[4] DeepSeek product/site and blog posts (Oct–Nov 2025).


🌐 Web query:

What are the current official pricing and token costs for OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet, and Google Gemini 1.5 Flash as of November 2025?

💡 Result:

Short answer (official sources, as of Nov 13, 2025):

  • OpenAI — GPT‑4o (official docs for GPT‑4o model snapshot gpt-4o-2024-08-06): inference listed at $3.75 per 1M input tokens and $15.00 per 1M output tokens; GPT‑4o‑mini: $0.15 per 1M input / $0.60 per 1M output. [OpenAI]. [1][2]

  • Anthropic — Claude 3.5 Sonnet: $3.00 per 1M input tokens; prompt-cache write $3.75/1M; cache read $0.30/1M; $15.00 per 1M output tokens. (Prices shown as $/MTok on Anthropic docs). [3]

  • Google — Gemini 1.5 Flash (Vertex AI / Google Cloud): Google publishes per‑character serving prices (example blog: $0.0375 per 1M characters input and $0.15 per 1M characters output for Gemini 1.5 Flash); that converts approximately (≈4 characters ≈1 token) to roughly $0.15 per 1M tokens input and $0.60 per 1M tokens output for typical token→character conversion — Vertex/Cloud docs note prices and recent Flash reductions. Check Google Cloud Vertex AI pricing for region-specific/product details. [4][5]

Sources

  • [1] OpenAI — “Introducing vision to the fine‑tuning API” / GPT‑4o model snapshot pricing. (OpenAI).
  • [2] OpenAI — GPT‑4o mini announcement (GPT‑4o mini pricing). (OpenAI).
  • [3] Anthropic — Claude pricing (Claude Sonnet / Claude 3.5 Sonnet prices). (Anthropic docs).
  • [4] Google Cloud blog / Vertex AI — Gemini 1.5 Flash pricing and cost reductions (per‑character pricing). (Google Cloud).
  • [5] Google Cloud Vertex AI pricing pages / model garden (Gemini Flash details).

If you want I can fetch the exact vendor pricing pages and paste the specific lines/links (or convert Google’s per‑character numbers to per‑token using a specific character→token ratio you prefer).


I've identified a factual inaccuracy in the document based on official DeepSeek-OCR benchmarks. Here's my analysis:

Line 552 — Accuracy claim requires correction:
The official benchmarks show ~97% accuracy at compression ratios <10×, with degraded performance (~60%) at ~20× compression. The current phrasing "97% accuracy at 10x compression" is misleading—97% applies only to compression ratios below 10×, not at the 10× threshold.

Line 693 — Throughput claim verified:
The 200K+ pages/day throughput on a single A100 matches the published production benchmark.

Line 770 — Pricing claims pending:
I cannot verify the cost figures on line 770 without seeing the actual document content. Current official pricing (Nov 2025) shows OpenAI GPT-4o at $3.75/$15.00 per 1M tokens, Anthropic Claude 3.5 Sonnet at $3.00/$15.00, and Google Gemini 1.5 Flash at approximately $0.15/$0.60 per 1M tokens—please share the specific figures from line 770 to confirm accuracy.

Action required: Correct line 552 to clarify that 97% accuracy applies to <10× compression ratios, not at 10× compression. Provide line 770 snippet for pricing verification.

🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around line 552, the statement "97% accuracy
at 10x compression" is misleading; update the wording to state that DeepSeek
reports ~97% accuracy for compression ratios below 10× (e.g., "<10×") and that
accuracy degrades (to ~60%) at around 20× compression, ensuring the phrasing
uses "<10×" or "below 10×" rather than "at 10×"; also, to verify the pricing on
line 770, paste the current line 770 snippet so we can confirm and correct the
cost figures as needed.

Comment on lines +831 to +880
```
┌─────────────────────────────────────────────────────────┐
│              Next.js Application (Frontend)             │
│                                                         │
│  ┌──────────────────────┐      ┌─────────────────────┐  │
│  │    Vercel AI SDK     │      │   React UI Hooks    │  │
│  │   (Multi-provider)   │◄─────│   (useChat, etc.)   │  │
│  └──────────┬───────────┘      └─────────────────────┘  │
└─────────────┼───────────────────────────────────────────┘
    ┌─────────┼─────────┐
    │         │         │
┌───▼──┐  ┌───▼──┐  ┌───▼───┐
│OpenAI│  │Claude│  │Gemini │
└──────┘  └──────┘  └───────┘


┌─────────────────────────────────────────────────────────┐
│            Next.js Application (Backend API)            │
│                                                         │
│  ┌──────────────────────┐      ┌─────────────────────┐  │
│  │     OCR Service      │      │   PDF Processing    │  │
│  │    (Multi-method)    │◄─────│      Pipeline       │  │
│  └──────────┬───────────┘      └─────────────────────┘  │
└─────────────┼───────────────────────────────────────────┘
    ┌─────────┼─────────┬──────────┐
    │         │         │          │
┌───▼────────┐│  ┌──────▼───┐  ┌───▼──────┐
│LLMWhisperer││  │DeepSeek  │  │Textract  │
│(API)       ││  │(Self-host│  │(AWS)     │
└────────────┘│  └──────────┘  └──────────┘
     ┌────────▼────┐
     │  Tesseract  │
     │  (Fallback) │
     └─────────────┘


Optional: LiteLLM Proxy (for advanced features)
┌─────────────────────────────────────────────────────────┐
│                  LiteLLM Proxy Server                   │
│      (Load Balancing, Cost Tracking, Guardrails)        │
└─────────────┬───────────────────────────────────────────┘
    ┌─────────┼─────────┬──────────┐
    │         │         │          │
┌───▼──┐  ┌───▼──┐  ┌───▼───┐  ┌───▼──┐
│OpenAI│  │Claude│  │Azure  │  │Ollama│
└──────┘  └──────┘  └───────┘  └──────┘
```

⚠️ Potential issue | 🟡 Minor

Add language identifier to fenced code block (architecture diagram).

The second ASCII architecture diagram (lines 831-880) is a fenced code block without a language specifier, violating markdown linting standards (MD040).

Apply this diff:

-```
+```text
 ┌─────────────────────────────────────────────────────────┐
 │              Next.js Application (Frontend)             │
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

831-831: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
docs/UPDATED_RECOMMENDATIONS.md around lines 831-880: the ASCII architecture
diagram is in a fenced code block with no language specifier (triggering MD040);
fix by changing the opening fence from ``` to ```text (or otherwise adding
"text" after the opening backticks) so the block becomes a labeled text code
fence, ensuring the closing fence remains ``` unchanged.

- Multi-provider LLM service using Vercel AI SDK
  - Support for OpenAI, Anthropic, Google, Azure, Ollama
  - Structured outputs, streaming, embeddings
  - Cost tracking and fallback providers

- Multi-method OCR service with smart routing
  - LLMWhisperer (primary - layout preservation)
  - DeepSeek-OCR (self-hosted, high-volume)
  - AWS Textract (forms and tables)
  - PDF.js fallback

- REST API v1 foundation
  - /api/v1/health - Service health check
  - /api/v1/extract - Document extraction
  - /api/v1/chat - Chat completion
  - API key auth and rate limiting

- Environment variables template (.env.example)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 19

🧹 Nitpick comments (17)
src/app/api/v1/utils/auth.ts (1)

97-131: Unify auth errors with centralized error utilities and harden retryAfter calculation

authError and rateLimitError hand‑roll JSON responses that largely mirror what APIErrorResponse / ErrorResponses in errors.ts already provide, and retryAfter is computed twice with no clamping.

Consider:

  • Reusing ErrorResponses.unauthorized / ErrorResponses.forbidden for authError, or delegating via createErrorResponse so all errors share the exact same payload shape (including future additions like requestId).
  • For rateLimitError, computing retryAfterSeconds once and clamping it to a non‑negative value:
 export function rateLimitError(resetAt: number) {
+  const retryAfterSeconds = Math.max(
+    0,
+    Math.ceil((resetAt - Date.now()) / 1000),
+  );
   return Response.json(
     {
       error: {
         code: 'RATE_LIMITED',
         message: 'Too many requests',
-        retryAfter: Math.ceil((resetAt - Date.now()) / 1000),
+        retryAfter: retryAfterSeconds,
       },
     },
     {
       status: 429,
       headers: {
-        'Retry-After': String(Math.ceil((resetAt - Date.now()) / 1000)),
+        'Retry-After': String(retryAfterSeconds),
       },
     }
   );
 }

This keeps behavior consistent and a bit more robust to any future changes in how resetAt is computed.

src/lib/ocr/types.ts (1)

1-175: Well-structured OCR type surface

The request/response, provider-specific, and error/usage types are laid out cleanly and should be straightforward to consume. I don’t see functional issues here; any further tweaks (e.g., changing Buffer | string or Date serialization in OCRUsageRecord) would be design choices rather than fixes.

src/app/api/v1/utils/errors.ts (1)

81-93: Consider preserving thrown Response objects in handleError

Right now, handleError always converts any thrown value into an INTERNAL_ERROR response. If a route ever adopts the pattern throw ErrorResponses.unauthorized() or throws a prebuilt Response, that intent will be lost.

You could make handleError more flexible with a small check:

 export function handleError(error: unknown): Response {
   console.error('API Error:', error);

-  if (error instanceof Error) {
+  if (error instanceof Response) {
+    return error;
+  }
+
+  if (error instanceof Error) {
     // Don't expose internal error details in production
     const message =
       process.env.NODE_ENV === 'development' ? error.message : 'An unexpected error occurred';

     return ErrorResponses.internalError(message);
   }

That keeps your current behavior for real exceptions while allowing explicit Response throws to pass through unchanged.

src/app/api/v1/chat/route.ts (1)

135-137: Improve error mapping for upstream LLM failures

All thrown errors are funneled through handleError, which currently always returns a generic 500 response. For provider-specific failures (e.g. 429 rate limits, 503 from an upstream LLM), this loses useful semantics for clients (retry behavior, UX). Consider extending handleError to detect your domain errors (e.g. LLMError) and surface appropriate HTTP status codes and error codes instead of always returning an internal error.
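
A minimal sketch of that mapping, assuming the LLMError field names described in this review (code, message, statusCode, retryable) and the ErrorResponses helpers defined in the same errors.ts module:

```typescript
// Sketch only: maps domain errors from the LLM layer onto meaningful HTTP
// statuses instead of a blanket 500. The field names follow the LLMError
// shape described in this review, not the final code.
interface LLMErrorLike {
  code: string;
  message: string;
  statusCode?: number;
  retryable?: boolean;
}

function isLLMErrorLike(error: unknown): error is LLMErrorLike {
  return (
    typeof error === 'object' &&
    error !== null &&
    'code' in error &&
    'message' in error &&
    'retryable' in error
  );
}

export function handleError(error: unknown): Response {
  console.error('API Error:', error);

  if (isLLMErrorLike(error)) {
    // 429/503-style upstream failures are usually retryable; others map to 502
    const status = error.statusCode ?? (error.retryable ? 503 : 502);
    return Response.json(
      { error: { code: error.code, message: error.message } },
      { status }
    );
  }

  if (error instanceof Error) {
    const message =
      process.env.NODE_ENV === 'development' ? error.message : 'An unexpected error occurred';
    return ErrorResponses.internalError(message); // ErrorResponses is defined earlier in errors.ts
  }

  return ErrorResponses.internalError('An unexpected error occurred');
}
```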

src/lib/ocr/ocr-service.ts (5)

131-198: LLMWhisperer integration looks consistent; only minor polish

The LLMWhisperer flow (API key validation, form-data upload, error wrapping, and typed response mapping) is coherent and matches the declared LLMWhispererResponse/OCRResult types.

One minor behavior to be aware of: extractText overwrites processingTimeMs afterward with end-to-end latency, so the API’s reported processing_time_ms is effectively discarded. If you care about distinguishing provider time vs total time, you may want to keep both (e.g., providerProcessingTimeMs vs processingTimeMs).


203-226: Quota calculation semantics may not match upstream API

getLLMWhispererQuota derives limit and remaining from overage_page_count and pages_extracted. If the upstream API’s semantics differ (e.g., one is overage already used vs. total allowance), the numbers exposed here could be misleading, which in turn affects selectOCRMethod.

Recommend double-checking the LLMWhisperer usage API docs and adjusting the computation or field names (e.g., dailyPageLimit, overageUsed) so the meaning is unambiguous.


314-391: Textract integration is reasonable but currently single-page and basic

The Textract integration is sound for simple PDFs, but note:

  • DetectDocumentText and AnalyzeDocument in synchronous mode are effectively single-document calls; pageCount is hard-coded to 1, which may not reflect the actual page count of multi-page PDFs.
  • Credentials are wired manually via credentials rather than relying on the default AWS credential chain, which may be fine but reduces flexibility.
  • Table/form extraction helpers currently return empty arrays, so callers enabling extractTables/detectForms won’t see useful structures yet.

If multi-page or richer structured extraction is in scope, consider:

  • Adding a comment that pageCount is a placeholder and/or computing it from Blocks.
  • Implementing extractTablesFromTextract / extractFormFieldsFromTextract or clearly documenting that they’re stubs for now.
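
Deriving the page count from the returned Blocks, as suggested in the first bullet, could look roughly like this (Textract emits one Block with BlockType 'PAGE' per page):

```typescript
import type { Block } from '@aws-sdk/client-textract';

// Count PAGE blocks instead of hard-coding pageCount to 1
function pageCountFromBlocks(blocks: Block[] | undefined): number {
  if (!blocks || blocks.length === 0) {
    return 1; // fall back to the current assumption when Blocks are missing
  }
  return blocks.filter((block) => block.BlockType === 'PAGE').length || 1;
}
```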

424-453: PDF.js extraction: potential page-range validation

The PDF.js path looks fine overall and correctly supports password and optional page selection. One edge case:

  • If options.pages includes values outside 1..pdf.numPages, pdf.getPage(pageNum) will throw and abort the whole extraction.

If you expect user-supplied page lists, consider clamping/filtering to valid pages or throwing a clearer OCRError when an invalid page is requested.
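
A small sketch of that validation, filtering the requested 1-based page numbers against pdf.numPages before iterating (names are illustrative, not the actual code):

```typescript
// Sketch only: keep only valid 1-based page numbers; throw a clear error
// (or fall back to all pages) when nothing valid remains.
function resolveRequestedPages(numPages: number, requested?: number[]): number[] {
  if (!requested || requested.length === 0) {
    return Array.from({ length: numPages }, (_, i) => i + 1);
  }

  const valid = requested.filter(
    (page) => Number.isInteger(page) && page >= 1 && page <= numPages
  );

  if (valid.length === 0) {
    throw new Error(
      `Requested pages ${requested.join(', ')} are outside the document range 1-${numPages}`
    );
  }

  return valid;
}
```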


486-497: Textract table/form helpers intentionally stubbed

Both extractTablesFromTextract and extractFormFieldsFromTextract currently return empty arrays. That’s fine as an MVP, but make sure consumers (e.g., API responses) don’t assume non-empty arrays to infer “feature enabled”.

A short TODO comment tying these to a follow-up task (and tests) would help avoid them being mistaken for complete implementations later.

src/lib/llm/types.ts (1)

187-198: Usage record type is sufficient for basic telemetry

LLMUsageRecord captures the core metrics (tokens, cost, provider/model, requestType, resource IDs, timestamp). If you later add multi-tenant support, consider adding a first-class tenantId/userId field rather than overloading resourceType/resourceId.

src/lib/llm/providers.ts (2)

16-179: Provider configs are well-structured; keep an eye on pricing/version drift

The PROVIDER_CONFIGS block is clear and matches LLMProviderConfig/LLMModelConfig. Capabilities and recommended flags are helpful for UI and default selection. Just be aware that model IDs and pricing can drift over time; a brief comment about their “as of” date or a central place to update them can help avoid silently stale values.


285-308: Provider availability check overlaps with isProviderConfigured

isProviderAvailable checks env vars in almost the same way isProviderConfigured does in llm-service.ts. That’s fine, but you might later want a single source of truth (e.g., export and reuse this function) to avoid divergence.

Also note that Ollama is always treated as “available”, which is OK as long as callers understand this doesn’t guarantee the daemon is actually running.

src/lib/llm/llm-service.ts (5)

33-77: Completion flow is solid; minor note on usage extraction

generateCompletion correctly wires provider/model into generateText, extracts token usage defensively, and calculates cost via calculateCost. Error wrapping into LLMError with retryable on 429 is sensible.

Be aware that different providers may expose usage under different field names; your (result.usage as any).promptTokens ?? inputTokens pattern handles this, but you may want a small helper to centralize this mapping if it grows more complex.


83-108: Streaming completion casting relies on textStream shape

generateStreamingCompletion returns result.textStream as unknown as ReadableStream. This is fine if streamText’s textStream is already a ReadableStream-compatible object, but brittle otherwise.

If the ai SDK exposes a canonical stream type/helper (e.g., toAIStream or ReadableStream directly), consider using that instead of double casting to avoid surprises if the library’s internal stream type changes.


114-181: Structured response + text fallback are well designed

generateStructuredResponse cleanly uses generateObject with LLMResponseSchema and provides a robust fallback to plain text via generateText when structured output fails. Returning a text-type LLMResponse keeps the output contract stable for callers.

Only minor nits:

  • You log the error on fallback; if this becomes noisy in prod, consider gating behind a debug flag or structured logging.
  • The confidence fallback is hard-coded to 0.7; you may want to make that configurable or at least document the choice.

264-294: chatCompletion’s context/instruction split is reasonable; note empty-message behavior

The transformation from messages → (context, instruction) is straightforward and delegates to generateStructuredResponse, which maintains the structured output contract.

One edge case: if messages has no non-system entries, instruction becomes '' and you effectively call the model with no user instruction. That’s probably fine, but if this function is user-facing, you may want to assert at least one non-system message and throw a clearer error.


394-441: getAvailableModels + isProviderConfigured largely duplicate provider availability logic

getAvailableModels uses isProviderConfigured (env checks), while providers.ts has a similar isProviderAvailable. This duplication is easy to drift over time.

Consider exporting and reusing a single availability helper from one module to keep behavior consistent across LLM service and any health endpoints.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 901020c and aa96908.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (14)
  • package.json (2 hunks)
  • src/app/api/v1/chat/route.ts (1 hunks)
  • src/app/api/v1/extract/route.ts (1 hunks)
  • src/app/api/v1/health/route.ts (1 hunks)
  • src/app/api/v1/utils/auth.ts (1 hunks)
  • src/app/api/v1/utils/errors.ts (1 hunks)
  • src/app/api/v1/utils/index.ts (1 hunks)
  • src/lib/llm/index.ts (1 hunks)
  • src/lib/llm/llm-service.ts (1 hunks)
  • src/lib/llm/providers.ts (1 hunks)
  • src/lib/llm/types.ts (1 hunks)
  • src/lib/ocr/index.ts (1 hunks)
  • src/lib/ocr/ocr-service.ts (1 hunks)
  • src/lib/ocr/types.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
src/app/api/v1/chat/route.ts (4)
src/app/api/v1/utils/auth.ts (5)
  • validateAPIKey (24-50)
  • authError (100-110)
  • hasPermission (55-57)
  • checkRateLimit (66-95)
  • rateLimitError (115-131)
src/app/api/v1/utils/errors.ts (2)
  • ErrorResponses (52-76)
  • handleError (81-93)
src/lib/llm/types.ts (1)
  • LLMProviderType (13-13)
src/lib/llm/llm-service.ts (2)
  • generateStreamingCompletion (83-108)
  • generateCompletion (33-77)
src/app/api/v1/health/route.ts (2)
src/lib/llm/providers.ts (1)
  • getAvailableProviders (285-287)
src/lib/ocr/ocr-service.ts (1)
  • getLLMWhispererQuota (203-226)
src/app/api/v1/extract/route.ts (4)
src/app/api/v1/utils/auth.ts (5)
  • validateAPIKey (24-50)
  • authError (100-110)
  • hasPermission (55-57)
  • checkRateLimit (66-95)
  • rateLimitError (115-131)
src/app/api/v1/utils/errors.ts (2)
  • ErrorResponses (52-76)
  • handleError (81-93)
src/lib/ocr/ocr-service.ts (1)
  • extractText (37-77)
src/lib/ocr/types.ts (1)
  • OCRMethod (11-11)
src/lib/llm/providers.ts (1)
src/lib/llm/types.ts (3)
  • LLMProviderType (13-13)
  • LLMProviderConfig (15-23)
  • LLMModelConfig (25-36)
src/lib/ocr/ocr-service.ts (2)
src/lib/ocr/index.ts (5)
  • extractText (12-12)
  • getLLMWhispererQuota (13-13)
  • simpleExtract (14-14)
  • extractWithLayout (15-15)
  • extractForms (16-16)
src/lib/ocr/types.ts (6)
  • OCRRequest (19-49)
  • OCRResult (51-78)
  • OCRError (149-160)
  • OCRMethod (11-11)
  • LLMWhispererResponse (116-123)
  • LLMWhispererQuota (125-130)
src/lib/llm/llm-service.ts (2)
src/lib/llm/types.ts (9)
  • LLMCompletionRequest (50-54)
  • LLMCompletionResponse (56-67)
  • LLMError (170-181)
  • LLMConfig (42-48)
  • LLMResponse (144-144)
  • LLMResponseSchema (130-136)
  • EmbeddingRequest (150-157)
  • EmbeddingResponse (159-164)
  • LLMProviderType (13-13)
src/lib/llm/providers.ts (4)
  • getModel (242-245)
  • calculateCost (267-280)
  • getDefaultModel (250-254)
  • PROVIDER_CONFIGS (16-179)
🪛 Biome (2.1.2)
src/lib/llm/providers.ts

[error] 214-214: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 215-215: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)


[error] 228-228: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

🔇 Additional comments (13)
src/lib/ocr/index.ts (1)

1-17: OCR barrel file looks clean and focused

Re-exporting types and a curated set of service functions keeps the public surface tidy and easy to consume; no issues spotted here.

src/app/api/v1/utils/errors.ts (1)

52-76: Centralized error helpers look consistent

APIErrorCode, createErrorResponse, and ErrorResponses give you a clean, uniform error surface across v1 endpoints. Status codes and messages are sensible defaults and easy to override per call site.

src/app/api/v1/utils/index.ts (1)

1-6: API utils index is straightforward and useful

Re-exporting auth and errors from a single index keeps route imports tidy and consistent; this looks good as-is.

src/app/api/v1/health/route.ts (1)

16-55: Health payload structure and status derivation look solid

Parallel checks, clear per‑service status, and an aggregated health.status based on LLM/OCR availability are all reasonable. No blocking issues from a correctness or resilience standpoint.

src/lib/llm/index.ts (1)

1-34: Barrel exports are coherent and improve LLM API ergonomics

Centralizing types, provider utilities, and service functions behind @/lib/llm makes the LLM surface easier to consume and evolve. Export list matches the described providers and service functions; no issues spotted.

src/lib/ocr/ocr-service.ts (1)

291-303: Health check looks fine; be aware of AbortSignal.timeout compatibility

The DeepSeek health probe with AbortSignal.timeout(2000) is a good guard. Just ensure your runtime (Node/Next.js version) consistently supports AbortSignal.timeout; otherwise, consider a manual AbortController fallback if you ever target older Node versions.

src/lib/llm/types.ts (4)

13-37: Provider and model config types are clear and align with usage

LLMProviderType, LLMProviderConfig, and LLMModelConfig are well-structured and match how PROVIDER_CONFIGS and getModel consume them. The pricing comments (“$ per 1M tokens”) are especially useful for downstream cost calculations.


42-67: Completion request/response shapes look coherent

LLMConfig, LLMCompletionRequest, and LLMCompletionResponse align cleanly with how generateCompletion constructs and returns data (provider/model, usage, cost, finishReason). This should make it straightforward to track cost and usage across providers.


79-145: Structured output schema is well-factored

The discriminated union (FormattedResponseSchema) and LLMResponseSchema give you a solid, type-safe contract for structured responses (single value, chart, table, text). This matches how generateStructuredResponse is implemented and should make downstream rendering straightforward.


170-181: LLMError type is consistent with OCR error pattern

LLMError mirrors the OCRError structure (message, code, provider, statusCode, retryable), which keeps error handling consistent across LLM and OCR surfaces. This is a good foundation for centralized logging/telemetry.

src/lib/llm/providers.ts (1)

242-280: Model lookup and cost calculation are straightforward and correct

getModel, getDefaultModel, getModelConfig, and calculateCost are clean and do what they say. Cost is correctly scaled per 1M tokens using the configured pricing.

src/lib/llm/llm-service.ts (2)

234-259: simpleCompletion is a clean backward-compatible helper

The helper correctly defaults provider/model, forwards overrides, and delegates to generateCompletion. This should make it easy to migrate existing call sites without touching provider plumbing.


351-385: compareProviders is straightforward A/B harness

The parallel Promise.allSettled approach and filtering fulfilled results is clean and robust. Just remember that model IDs must be valid per provider (same concern as above, but here the caller passes explicit pairs, so it’s their responsibility).

Comment on lines +17 to 21
"@ai-sdk/anthropic": "^2.0.45",
"@ai-sdk/google": "^2.0.38",
"@ai-sdk/openai": "^2.0.68",
"@aws-sdk/client-textract": "^3.934.0",
"@radix-ui/react-checkbox": "^1.1.1",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

New AI/SDK dependencies align with the LLM/OCR features; verify version compatibility

The added @ai-sdk/*, ai, and @aws-sdk/client-textract dependencies match the new multi‑provider LLM and Textract usage. From a structure standpoint this looks fine; just ensure the chosen versions are compatible with your Next 14 / React 18 / Node.js runtime and each other (peer deps, bundling, ESM/CJS).

Also applies to: 40-40


🌐 Web query:

Check the supported Node.js / React / Next.js versions and peer dependency requirements for:
- `ai@5.0.95`
- `@ai-sdk/openai@2.0.68`
- `@ai-sdk/anthropic@2.0.45`
- `@ai-sdk/google@2.0.38`
- `@aws-sdk/client-textract@3.934.0`
and confirm they are compatible with a Next.js 14.2.8 + React 18 app.

💡 Result:

Short answer: Yes — these package versions are compatible with a Next.js 14.2.8 + React 18 app, provided your runtime Node.js meets the Next.js v14 minimum (Node 18.17+) and you satisfy the AI SDK peer deps (notably zod >= 3.25.0). Evidence:

  • Next.js v14 requires Node >= 18.17 and is intended to be used with React 18. (Next.js upgrade notes). [1]
  • The AI SDK v5 (ai@5.x and the @ai-sdk/* v2 series) requires Node 18+ in its docs and additionally lists zod (>= 3.25.0) as a peer dependency for the v5 migration / providers. The migration/getting-started docs for AI SDK v5 show the ai package and @ai-sdk/* v2 series and call out zod as a required peer dependency. (@AI-SDK provider packages are the v2 series used by ai@5.x). [2][3]
  • Individual @AI-SDK provider packages (openai / anthropic / google) are part of the AI SDK v2 provider releases and are published as companion provider packages for ai@5 — their docs and package pages align with the AI SDK v5 requirements above. (Examples: Anthropic provider listing; OpenAI/Google providers are the same provider-series). [4]
  • @aws-sdk/client-textract (AWS SDK for JavaScript v3) targets modern Node.js LTS and AWS has announced v3 no longer supports Node 16.x; v3 is intended for Node 18+ (AWS SDK v3 Node support guidance). The modular v3 clients (including client-textract@3.934.0) do not impose React/Next peer deps and are compatible in server/runtime code (Node) used by Next.js. [5][6]

What to check in your project

  • Use Node >= 18.17 (Next.js 14 minimum). If your environment uses a lower Node, upgrade. [1]
  • Add/ensure zod >= 3.25.0 in your package.json if you use ai@5.x / @ai-sdk/* v2 features that declare it as a peer dependency. [2][3]
  • @aws-sdk/client-textract has no React/Next peer deps; install it normally for server-side usage (Node runtime). [5]

Sources

  • Next.js v14 upgrade / requirements (Node >=18.17, React 18). [1]
  • AI SDK v5 migration / getting-started (ai@5.x, @ai-sdk/* v2, and zod peer dep note). [2][3]
  • @ai-sdk/anthropic package listing / provider info (example provider page). [4]
  • AWS SDK for JavaScript (v3) Node support / end-of-support notices (Node 16 EOL; v3 targets Node 18+). [5][6]

If you want, I can fetch the exact peerDependencies sections from each package.json for these exact versions and paste them here.



---

🏁 Script executed:

```shell
#!/bin/bash
# Check for zod dependency in package.json
rg '"zod"' package.json -A 1

Length of output: 25


Upgrade zod to satisfy ai@5.x peer dependency requirement

The added dependencies are compatible with Next.js 14.2.8 + React 18 (Node 18.17+), but zod needs to be upgraded. The AI SDK v5 package (ai@5.0.95) requires zod >= 3.25.0 as a peer dependency. Currently, package.json specifies zod@^3.23.8, which falls below the minimum. Update to zod@^3.25.0 or higher to prevent runtime failures when using the AI SDK providers.

🤖 Prompt for AI Agents
In package.json around lines 17 to 21, the project currently pins zod to ^3.23.8
which is below the ai@5.x peer requirement; update the zod entry to ^3.25.0 (or
a later 3.x release) in package.json, then run the package manager (npm install
or yarn install) to update the lockfile and ensure dependencies resolve; finally
run your test/build steps (or a simple start) to verify there are no runtime
peer-dependency issues.

Comment on lines +91 to +108
if (stream) {
  // Streaming response
  const textStream = await generateStreamingCompletion({
    prompt,
    config,
    systemPrompt: systemMessage,
  });

  // Return the stream with appropriate headers
  return new Response(textStream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
      'X-RateLimit-Remaining': String(rateLimit.remaining),
      'X-RateLimit-Reset': String(rateLimit.resetAt),
    },
  });

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Clarify streaming format vs text/event-stream header

generateStreamingCompletion returns a plain text ReadableStream, but the response is advertised as text/event-stream. Unless textStream is already SSE‑framed, EventSource/SSE clients may misinterpret the stream. Consider either:

  • Using the Vercel AI helper that returns a proper SSE Response (e.g. toDataStreamResponse() upstream in generateStreamingCompletion), or
  • Changing the content type to a generic streaming type (e.g. text/plain) if consumers will read raw chunks via fetch/ReadableStream.

🌐 Web query:

What is the recommended way to return a streamed chat completion with Vercel AI SDK `streamText` so that clients receive a proper `text/event-stream` SSE response?

💡 Result:

Use streamText and return its built-in SSE response helper — call result.toUIMessageStreamResponse() (or toDataStreamResponse for the simpler "data" protocol) and return that Response so the SDK sets the proper text/event-stream headers and protocol header for you.

Example (Node / Vercel function):

const result = streamText({ model: openai('gpt-4o-mini'), messages });
return result.toUIMessageStreamResponse({
  headers: {
    // optional extra headers
    'X-Chat-Id': chatId,
  },
});

This yields a Response with Content-Type: text/event-stream; charset=utf-8 and x-vercel-ai-ui-message-stream: v1 (uses the SDK's UI message stream conversion). [1][2][3]

Sources: Official SDK internals and examples. [1] [2] [3]





---

🏁 Script executed:

```shell
#!/bin/bash
# Find and inspect generateStreamingCompletion implementation
rg -n "generateStreamingCompletion" --type=ts -B2 -A10

Length of output: 2932


🏁 Script executed:

#!/bin/bash
# Search for imports of Vercel AI SDK to understand what's being used
rg -n "streamText\|toUIMessageStreamResponse\|toDataStreamResponse" --type=ts -B2 -A2

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Find where Response is constructed with text/event-stream
rg -n "text/event-stream" --type=ts -B5 -A5

Length of output: 736



---

🏁 Script executed:

```shell
#!/bin/bash
# View the full generateStreamingCompletion implementation
sed -n '83,110p' src/lib/llm/llm-service.ts
```

Length of output: 729


Fix streaming response to use Vercel AI SDK's proper SSE helper

The review comment is correct. generateStreamingCompletion() currently returns result.textStream, which is raw text without SSE framing. While the route handler sets the text/event-stream header, the actual stream content is not SSE-formatted—clients expecting proper Server-Sent Events protocol will fail.

Per Vercel AI SDK recommendations, use result.toUIMessageStreamResponse() which automatically sets proper text/event-stream headers and protocol framing.

Required changes:

  1. src/lib/llm/llm-service.ts (line 83–108): Change generateStreamingCompletion() to return Promise<Response> and use result.toUIMessageStreamResponse() instead of result.textStream.
  2. src/app/api/v1/chat/route.ts (line 93–107): Update to directly return the Response from generateStreamingCompletion() instead of manually wrapping it.
🤖 Prompt for AI Agents
In src/app/api/v1/chat/route.ts around lines 91–108 and
src/lib/llm/llm-service.ts around lines 83–108, the streaming flow currently
returns a raw text stream and wraps it with SSE headers, but the stream lacks
SSE framing; update generateStreamingCompletion in llm-service.ts to return a
Promise<Response> using result.toUIMessageStreamResponse() (so the SDK applies
proper SSE framing and headers), and in route.ts simply await and return that
Response directly instead of re-wrapping the stream and headers.
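
A minimal sketch of the suggested change, assuming AI SDK v5 where the streamText result exposes toUIMessageStreamResponse() (older releases used toDataStreamResponse(), so check the installed version), with the model hard-coded here instead of resolved via getModel():

```typescript
// Sketch only: return a ready-to-send SSE Response from the service layer.
// In the real service the model would come from getModel(provider, modelId).
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function generateStreamingCompletion(request: {
  prompt: string;
  systemPrompt?: string;
}): Promise<Response> {
  const result = streamText({
    model: openai('gpt-4o-mini'),
    system: request.systemPrompt,
    prompt: request.prompt,
  });

  // The SDK sets Content-Type: text/event-stream and the SSE framing itself
  return result.toUIMessageStreamResponse();
}

// Route handler: return the Response directly instead of re-wrapping the stream
// if (stream) {
//   return generateStreamingCompletion({ prompt, systemPrompt: systemMessage });
// }
```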

Comment on lines +65 to +83
const { file, url, method, preserveLayout, detectForms, extractTables, pages, language, password } =
  validation.data;

// Ensure either file or url is provided
if (!file && !url) {
  return ErrorResponses.validationError('Either file (base64) or url must be provided');
}

// Convert base64 to buffer if file provided
let input: Buffer | string;
if (file) {
  try {
    input = Buffer.from(file, 'base64');
  } catch {
    return ErrorResponses.validationError('Invalid base64 file data');
  }
} else {
  input = url!;
}

⚠️ Potential issue | 🟡 Minor

Tighten input-source and base64 handling; avoid unnecessary cast

A few small issues in the input handling block:

  • The comment says “Either file (base64) or url must be provided”, but the logic only checks that at least one is present. If both are sent, file silently wins and url is ignored. Consider enforcing “exactly one” source, e.g. with a refinement on the Zod schema or an explicit if (file && url) validation error.
  • Buffer.from(file, 'base64') seldom throws for malformed base64; your try/catch will almost never return “Invalid base64 file data”, and bad input is more likely to fail later inside extractText as a 500. If you care about returning a clear 4xx, consider validating the base64 string (length/charset) or using a small helper to detect obviously invalid data before calling OCR.
  • method: method as OCRMethod forces an optional field into the non‑optional OCRMethod type. Since extractText already chooses a method when options?.method is falsy, you can drop the cast and allow method to be undefined naturally for better type safety.
  • Minor: if pages are intended as page numbers, z.array(z.number().positive()) will accept non‑integers; z.array(z.number().int().positive()) would better model that constraint.

These refinements would make the endpoint behavior more predictable and validation errors clearer without changing the overall design.

Also applies to: 85-97
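
The "exactly one of file or url" rule can also live in the schema itself; a minimal zod sketch (field names mirror the destructuring above, and z.string().base64() requires zod 3.23+):

```typescript
import { z } from 'zod';

// Sketch only: .refine() rejects requests with neither or both sources,
// so the handler no longer needs the manual !file && !url check.
const extractRequestSchema = z
  .object({
    file: z.string().base64().optional(), // validates the base64 charset up front
    url: z.string().url().optional(),
    pages: z.array(z.number().int().positive()).optional(),
  })
  .refine((data) => Boolean(data.file) !== Boolean(data.url), {
    message: 'Provide exactly one of file (base64) or url',
  });
```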

Comment on lines +24 to +50
export async function validateAPIKey(request: NextRequest): Promise<APIKeyInfo | null> {
  const authHeader = request.headers.get('authorization');
  const apiKeyHeader = request.headers.get('x-api-key');

  const apiKey = apiKeyHeader || (authHeader?.startsWith('Bearer ') ? authHeader.slice(7) : null);

  if (!apiKey) {
    return null;
  }

  // TODO: Implement actual API key validation against database
  // For now, validate against environment variable for development
  const validKey = process.env.API_KEY;
  if (apiKey === validKey) {
    return {
      keyId: 'dev-key',
      userId: 'dev-user',
      permissions: ['read', 'write', 'extract', 'chat'],
      rateLimit: {
        requestsPerMinute: 60,
        requestsPerDay: 10000,
      },
    };
  }

  return null;
}

🛠️ Refactor suggestion | 🟠 Major

Dev-only API key stub should be guarded to avoid accidental production use

validateAPIKey currently treats a single process.env.API_KEY as the only valid key, with a TODO to back it by real storage. That’s fine for early development, but it’s easy to accidentally ship this into a multi-tenant or production environment.

Consider at least:

  • Gating this branch on NODE_ENV === 'development' or a dedicated feature flag, and
  • Failing fast (e.g., 500 / startup error) if API_KEY isn’t set when running in non‑dev.

That will make it harder to forget replacing this stub with proper key lookup and per-user/organization metadata.

🤖 Prompt for AI Agents
In src/app/api/v1/utils/auth.ts around lines 24-50, the dev-only API key stub is
currently unguarded; change it so the environment API_KEY is only accepted when
NODE_ENV === 'development' or a dedicated flag like ALLOW_DEV_API_KEY === 'true'
is set, and add a startup-time sanity check that throws an error (or exits) if
NODE_ENV !== 'development' and process.env.API_KEY is missing but the code would
otherwise rely on it; keep the validateAPIKey return behavior the same (return
dev key info only when the guard allows and the key matches, otherwise return
null).
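
A minimal sketch of that guard (ALLOW_DEV_API_KEY is an assumed flag name):

```typescript
// Sketch only: gate the env-var key behind development mode or an explicit
// opt-in flag, and fail fast at startup when a non-dev environment would
// otherwise depend on the stub.
export function isDevApiKeyAllowed(): boolean {
  return (
    process.env.NODE_ENV === 'development' ||
    process.env.ALLOW_DEV_API_KEY === 'true'
  );
}

export function assertAuthConfigured(): void {
  if (!isDevApiKeyAllowed() && !process.env.API_KEY) {
    // Replace with a real key-store lookup before production use
    throw new Error('API key storage is not configured for this environment');
  }
}

// In validateAPIKey, wrap the env-var comparison:
//   if (isDevApiKeyAllowed() && apiKey === process.env.API_KEY) { ...return dev key info... }
```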

Comment on lines +66 to +95
export function checkRateLimit(keyInfo: APIKeyInfo): {
  allowed: boolean;
  remaining: number;
  resetAt: number;
} {
  const now = Date.now();
  const windowMs = 60 * 1000; // 1 minute window
  const key = `rate:${keyInfo.keyId}`;

  let record = rateLimitStore.get(key);

  // Reset if window expired
  if (!record || record.resetAt < now) {
    record = {
      count: 0,
      resetAt: now + windowMs,
    };
  }

  record.count++;
  rateLimitStore.set(key, record);

  const remaining = Math.max(0, keyInfo.rateLimit.requestsPerMinute - record.count);

  return {
    allowed: record.count <= keyInfo.rateLimit.requestsPerMinute,
    remaining,
    resetAt: record.resetAt,
  };
}

⚠️ Potential issue | 🟡 Minor

Rate‑limit helper docs and shape are slightly out of sync

Two small inconsistencies here:

  • The JSDoc says “Returns true if request is allowed, false if rate limited”, but the function actually returns an object { allowed, remaining, resetAt }. The comment should be updated to match the real return type.
  • APIKeyInfo.rateLimit has requestsPerDay, but checkRateLimit only enforces requestsPerMinute. If daily limiting is a near‑term requirement, consider enforcing it here (or add a comment/TODO so the type doesn’t overpromise).

Neither is a runtime bug, but aligning the docs and type usage will reduce surprises for callers.

🤖 Prompt for AI Agents
In src/app/api/v1/utils/auth.ts around lines 66 to 95, the function returns an
object { allowed, remaining, resetAt } but the JSDoc still says it returns a
boolean and the APIKeyInfo.rateLimit type exposes requestsPerDay whereas the
function only enforces per-minute limits; update the JSDoc to describe the
actual return shape (allowed: boolean, remaining: number, resetAt: number) and
either implement daily limiting or explicitly document TODO and leave a short
comment in the function noting only requestsPerMinute is enforced (and if you
implement daily limiting: track a separate daily key/window, increment and check
against requestsPerDay, compute remaining accordingly, and set resetAt to the
smaller of minute/day reset times).

Comment on lines +462 to +469
async function fetchPDFBuffer(url: string): Promise<Buffer> {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch PDF: ${response.statusText}`);
  }
  const arrayBuffer = await response.arrayBuffer();
  return Buffer.from(arrayBuffer);
}

⚠️ Potential issue | 🟡 Minor

Non-URL string inputs are treated as URLs

fetchPDFBuffer unconditionally fetches any string input. If callers ever pass a filesystem path or raw PDF text instead of a URL, this will fail in a non-obvious way.

You may want to:

  • Either document that string inputs must be HTTP(S) URLs, or
  • Detect file:/relative paths and handle them differently (or reject them explicitly with a clearer error).
🤖 Prompt for AI Agents
In src/lib/ocr/ocr-service.ts around lines 462 to 469, fetchPDFBuffer currently
calls fetch on any string which treats non-HTTP(S) inputs (local file paths or
raw PDF text) as URLs and fails ambiguously; update the function to first
validate the input: if it is an HTTP or HTTPS URL (e.g., via new URL(input) and
checking protocol === 'http:' || 'https:'), use fetch as before; if it is a file
URL (protocol === 'file:') or a local/relative path, read the file from disk
with fs.promises.readFile and return that Buffer; otherwise reject with a clear,
specific Error stating that only HTTP(S) URLs or filesystem paths are accepted
(or add JSDoc to mandate HTTP(S) only and throw early for non-HTTP(S) inputs).
Ensure proper async/await and error messages include the original input for
easier debugging.

Comment on lines +474 to +481
async function pdfToImages(buffer: Buffer, pages?: number[]): Promise<Buffer[]> {
  // Note: This requires a PDF-to-image library like pdf-poppler or pdf2pic
  // For now, return empty array - would need to implement based on your needs

  // Placeholder implementation
  console.warn('pdfToImages not fully implemented - need pdf-to-image library');
  return [];
}

⚠️ Potential issue | 🟠 Major

pdfToImages placeholder should fail fast instead of returning empty

Returning an empty array while logging a warning makes upstream code believe conversion succeeded (DeepSeek path in particular). This is more of a silent failure than a deliberate “not implemented”.

Better to throw an OCRError('NOT_IMPLEMENTED', ...) here until a real implementation is added, and gate selection of image-based backends accordingly.

🤖 Prompt for AI Agents
In src/lib/ocr/ocr-service.ts around lines 474 to 481, replace the placeholder
that returns an empty array with a fast-failing behavior: throw an OCRError with
code 'NOT_IMPLEMENTED' and a clear message indicating pdfToImages is not
implemented (include optional details like required library name), so upstream
callers cannot treat a silent empty result as success; ensure the thrown error
uses the existing OCRError type/import and that callers which select image-based
backends handle or propagate this error appropriately.
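
A minimal fail-fast sketch; the real code would presumably throw the existing OCRError with code 'NOT_IMPLEMENTED', whose constructor isn't shown here, so a plain Error subclass stands in:

```typescript
// Sketch only: throw instead of silently returning [], so callers cannot
// mistake a missing implementation for an empty document.
class NotImplementedError extends Error {
  readonly code = 'NOT_IMPLEMENTED';
}

async function pdfToImages(buffer: Buffer, pages?: number[]): Promise<Buffer[]> {
  throw new NotImplementedError(
    'pdfToImages is not implemented: a PDF-to-image library (e.g. pdf2pic or pdf-poppler) is required'
  );
}
```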

Comment on lines +506 to +522
export async function simpleExtract(
  input: Buffer | string,
  options?: {
    preserveLayout?: boolean;
    password?: string;
  }
): Promise<string> {
  const result = await extractText({
    input,
    options: {
      preserveLayout: options?.preserveLayout,
      password: options?.password,
    },
  });

  return result.text;
}

⚠️ Potential issue | 🟡 Minor

simpleExtract ignores layout even if preserveLayout is set

simpleExtract forwards preserveLayout but then always returns result.text, discarding result.markdown even when layout-preserving extraction is used.

If the goal is truly “simple text only”, you could:

  • Drop the preserveLayout option from this helper, or
  • Return result.markdown ?? result.text when preserveLayout is true.

Right now the option can be misleading for callers.

🤖 Prompt for AI Agents
In src/lib/ocr/ocr-service.ts around lines 506 to 522, the simpleExtract helper
currently forwards preserveLayout but always returns result.text, so callers
requesting layout-preserving output never receive result.markdown; update the
function so that when options?.preserveLayout is true it returns result.markdown
?? result.text, otherwise return result.text; ensure the function signature
keeps or documents the preserveLayout option and handle the case where markdown
is undefined by falling back to plain text.

Comment on lines +527 to +537
export async function extractWithLayout(input: Buffer | string): Promise<string> {
  const result = await extractText({
    input,
    options: {
      preserveLayout: true,
      method: 'llmwhisperer',
    },
  });

  return result.markdown || result.text;
}

⚠️ Potential issue | 🟠 Major

extractWithLayout hard-codes LLMWhisperer and doesn’t fall back

extractWithLayout always uses method: 'llmwhisperer'. If LLMWhisperer is misconfigured or quota is exhausted, this will throw instead of falling back to another layout-preserving option (e.g., auto-selection with preserveLayout: true).

Consider delegating method selection (let extractText decide based on quota and availability) and only pinning preserveLayout: true, unless you explicitly want “fail if LLMWhisperer is unavailable”.

🤖 Prompt for AI Agents
In src/lib/ocr/ocr-service.ts around lines 527 to 537, extractWithLayout
currently hard-codes method: 'llmwhisperer' which causes failures if that
provider is misconfigured or out of quota; remove the explicit method option and
only pass preserveLayout: true so extractText can auto-select or fallback as
designed (alternatively, wrap the call in a try/catch and retry without the
pinned method to provide a fallback), ensuring the function still returns
result.markdown || result.text.
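
A sketch of the fallback variant, retrying without the pinned method so extractText can auto-select based on quota and availability (assumes the extractText contract quoted above):

```typescript
// Sketch only: prefer LLMWhisperer but fall back to auto-selection if it fails.
export async function extractWithLayout(input: Buffer | string): Promise<string> {
  try {
    const result = await extractText({
      input,
      options: { preserveLayout: true, method: 'llmwhisperer' },
    });
    return result.markdown || result.text;
  } catch (error) {
    console.warn('LLMWhisperer unavailable, retrying with auto-selected OCR method', error);
    const fallback = await extractText({
      input,
      options: { preserveLayout: true }, // let extractText choose based on quota/availability
    });
    return fallback.markdown || fallback.text;
  }
}
```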

Comment on lines +542 to +557
export async function extractForms(
  input: Buffer | string
): Promise<{ text: string; formFields: any[] }> {
  const result = await extractText({
    input,
    options: {
      detectForms: true,
      method: 'textract',
    },
  });

  return {
    text: result.text,
    formFields: result.formFields || [],
  };
}

⚠️ Potential issue | 🟠 Major

extractForms is hard-wired to Textract

extractForms forces method: 'textract', so:

  • It bypasses LLMWhisperer’s processing_mode = 'form' capabilities.
  • It will throw a config error outright when AWS isn’t configured, even if LLMWhisperer is available.

If you want best-effort forms extraction:

  • Let extractText pick the method based on detectForms: true, or
  • Implement a fallback chain: Textract first, then LLMWhisperer form mode.

Also consider returning typed form field structures instead of any[] once the Textract helpers are implemented.
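
A sketch of that fallback chain (Textract first, then auto-selection, which can route to LLMWhisperer's form mode), assuming the same extractText contract as above:

```typescript
// Sketch only: best-effort forms extraction with a fallback when AWS is not configured.
export async function extractForms(
  input: Buffer | string
): Promise<{ text: string; formFields: unknown[] }> {
  const attempts = [
    { detectForms: true, method: 'textract' as const },
    { detectForms: true }, // no pinned method: extractText picks an available provider
  ];

  let lastError: unknown;
  for (const options of attempts) {
    try {
      const result = await extractText({ input, options });
      return { text: result.text, formFields: result.formFields || [] };
    } catch (error) {
      lastError = error;
    }
  }

  throw lastError;
}
```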
