Research app features and market demands #1
Conversation
This analysis examines Slicely's current capabilities against 2025 IDP market demands and provides strategic feature recommendations based on the 80/20 principle.

Key findings:
- IDP market projected at $6.78B by 2025 (35-40% CAGR)
- 65% of enterprises actively implementing IDP initiatives
- Critical gaps: API access, integrations, compliance features

Top 5 priorities for enterprise readiness:
1. RESTful API layer (REST + GraphQL)
2. Audit logging & RBAC for compliance
3. Snowflake/Databricks integration
4. Elasticsearch integration for advanced search
5. Multi-LLM provider support

Includes:
- Market research and competitive analysis
- Feature recommendations across 4 tiers
- 12-month implementation roadmap
- ROI projections (4.1x return)
- Pricing strategy for enterprise tiers
Created detailed implementation plans for 5 critical enterprise features based on 80/20 principle analysis:

1. RESTful & GraphQL API Layer (01-api-layer.md)
   - Complete REST API with CRUD operations
   - GraphQL endpoint for complex queries
   - API authentication, rate limiting, versioning
   - Client SDKs for JavaScript and Python
   - OpenAPI/Swagger documentation
2. Enterprise Data Platform Integrations (02-data-platform-integrations.md)
   - Snowflake, Databricks, AWS S3, Azure Blob, GCS
   - Zapier and Make.com integration
   - Job queue system with BullMQ
   - Scheduled exports and batch processing
3. Elasticsearch Integration (03-elasticsearch-integration.md)
   - Hybrid search (keyword + semantic)
   - Faceted search and autocomplete
   - Fuzzy matching and search analytics
   - Real-time indexing via CDC
4. Compliance & Security Suite (04-compliance-security-suite.md)
   - Comprehensive audit logging
   - RBAC with granular permissions
   - PII detection and redaction
   - Data retention policies and GDPR tools
5. Multi-LLM Provider Support (05-multi-llm-provider-support.md)
   - Anthropic Claude, Azure OpenAI, AWS Bedrock, Ollama
   - Provider abstraction layer
   - Automatic fallback and cost tracking
   - Unified interface across all providers

Plus comprehensive roadmap (ENTERPRISE_ROADMAP.md):
- 12-month timeline with 5 phases
- Resource requirements and budget ($726K investment)
- Expected ROI: 7.6x ($5.5M ARR)
- Success metrics and KPIs
- Competitive positioning analysis
- Risk mitigation strategies

All plans include:
- Technical architecture and implementation details
- Database schemas and code examples
- UI/UX designs
- Testing strategies
- Rollout plans
- Success metrics
Walkthrough

Adds enterprise strategy and roadmap docs plus detailed implementation plans (API layer, data integrations, Elasticsearch, compliance, multi-LLM, OCR). Implements new v1 API utilities and endpoints (chat, extract, health), and adds multi-provider LLM and multi-method OCR service modules with types, providers, service logic, and UI selector scaffold.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant Client
participant API as Backend/API
participant Auth as Auth & RateLimit
participant LLM as LLMService
participant Primary as PrimaryProvider
participant Fallback as FallbackProvider
Note over LLM: Multi‑LLM orchestration and fallback (new)
Client->>API: POST /api/v1/chat (payload)
API->>Auth: validate API key & permissions
Auth-->>API: allowed
API->>LLM: requestCompletion(payload, preferredProvider)
alt Primary succeeds
LLM->>Primary: callCompletion
Primary-->>LLM: response
LLM-->>API: response (usage, provider meta)
API-->>Client: 200 JSON / stream
else Primary fails
LLM->>Primary: callCompletion
Primary--x LLM: error
LLM->>Fallback: callCompletion
Fallback-->>LLM: response
LLM-->>API: response (logs failure, usage)
API-->>Client: 200 JSON (with provider metadata)
end
```

```mermaid
sequenceDiagram
autonumber
participant Client
participant API as Backend/API
participant Auth as Auth & RateLimit
participant OCR as OCRService
participant LLMW as LLMWhisperer
participant Textract as AWS Textract
Note over OCR: Multi‑method OCR selection and extraction
Client->>API: POST /api/v1/extract (file or url + options)
API->>Auth: validate API key & permissions
Auth-->>API: allowed
API->>OCR: extractText(input, options)
OCR->>OCR: selectOCRMethod (quota, availability, options)
alt method == llmwhisperer
OCR->>LLMW: callWhisperer
LLMW-->>OCR: text + meta
else method == textract
OCR->>Textract: analyzeDocument
Textract-->>OCR: text, tables, forms
else other
OCR-->>OCR: use DeepSeek/Tesseract/PDF.js
end
OCR-->>API: OCRResult (text, markdown, tables, forms)
API-->>Client: 200 JSON (includes rate limit headers)
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120+ minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning) | ✅ Passed checks (2 passed)
Actionable comments posted: 13
🧹 Nitpick comments (14)
docs/implementation-plans/02-data-platform-integrations.md (5)
535-547: CSV escaping is incomplete (newlines, quotes, commas). The current logic misses newlines and leading/trailing spaces. Use a dedicated escaper.
```diff
-      keys.map((key) => {
-        const value = record[key];
-        // Escape quotes and wrap in quotes if contains comma
-        if (typeof value === 'string' && (value.includes(',') || value.includes('"'))) {
-          return `"${value.replace(/"/g, '""')}"`;
-        }
-        return value ?? '';
-      }).join(',')
+      keys.map((key) => {
+        const v = record[key] ?? '';
+        const s = typeof v === 'string' ? v : String(v);
+        const needsQuotes = /[",\n\r]/.test(s);
+        const escaped = s.replace(/"/g, '""');
+        return needsQuotes ? `"${escaped}"` : escaped;
+      }).join(',')
```
549-566: Parquet writer usage likely incorrect. parquetjs typically requires a ParquetSchema and a concrete write target; openStream/openBuffer semantics differ. Double-check the API, add explicit schema construction, and avoid returning writer.outputStream internals.
Example sketch:
```diff
- const { ParquetWriter } = await import('parquetjs');
- const schema = this.inferParquetSchema(records[0]);
- const writer = await ParquetWriter.openStream(schema);
+ const { ParquetWriter, ParquetSchema } = await import('parquetjs');
+ const schema = new ParquetSchema(this.inferParquetSchema(records[0]));
+ const chunks: Buffer[] = [];
+ const writer = await ParquetWriter.openStream(schema, {
+   write: (b: Buffer) => chunks.push(b),
+   end: () => {},
+ } as any);
  ...
- return writer.outputStream.getContents();
+ return Buffer.concat(chunks);
```
894-910: Databricks SQL Statement API likely needs polling. POSTing a statement usually returns an id; you must poll until status is 'SUCCEEDED' (or use the sync param if available). The current code returns immediately.
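As a rough illustration of the polling this comment calls for (endpoint path and state names follow the public Statement Execution API; the helper name and timing constants are assumptions):

```typescript
// Hypothetical helper: poll a Databricks SQL statement until it reaches a terminal state.
async function waitForStatement(host: string, token: string, statementId: string): Promise<any> {
  const headers = { Authorization: `Bearer ${token}` };
  for (let attempt = 0; attempt < 60; attempt++) {
    const res = await fetch(`${host}/api/2.0/sql/statements/${statementId}`, { headers });
    const body = await res.json();
    const state = body?.status?.state;
    if (state === 'SUCCEEDED') return body.result;
    if (state === 'FAILED' || state === 'CANCELED' || state === 'CLOSED') {
      throw new Error(`Statement ${statementId} ended in state ${state}`);
    }
    // Still PENDING/RUNNING: back off before polling again
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
  throw new Error(`Timed out waiting for statement ${statementId}`);
}
```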
921-925: Coupling: reusing a private method via bracket access. Calling S3Integration['toParquet'] breaks encapsulation and minifies poorly. Extract a shared parquet utility.
229-248: Schemas: add guardrails (constraints, RLS, audit).
- Add NOT NULL where appropriate (plugin_type, config).
- Unique constraint per organization on (name) to prevent duplicates.
- Add updated_at trigger ON UPDATE.
- Consider RLS policies by organization_id/user_id.
- export_jobs: add updated_at, an error_code field, and composite index (destination_id,status,created_at) for queues.
Also applies to: 268-301, 304-327
docs/implementation-plans/01-api-layer.md (1)
917-964: Error responses may leak internal messages. Avoid echoing raw error.message to clients for 5xx and security-sensitive 4xx; use stable titles/details and map codes.
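One possible shape for such a mapping, sketched with invented code names (not taken from the plan):

```typescript
// Map internal failures to stable, client-safe payloads; keep raw details in server logs only.
const PUBLIC_MESSAGES: Record<string, string> = {
  VALIDATION_ERROR: 'The request payload failed validation.',
  RATE_LIMITED: 'Too many requests. Please retry later.',
  INTERNAL_ERROR: 'An unexpected error occurred.',
};

function toClientError(err: any): { status: number; body: { code: string; message: string } } {
  const code = err?.publicCode && PUBLIC_MESSAGES[err.publicCode] ? err.publicCode : 'INTERNAL_ERROR';
  const status = code === 'INTERNAL_ERROR' ? 500 : err?.statusCode ?? 400;
  console.error('API error', { code, detail: err?.message }); // raw message never reaches the client
  return { status, body: { code, message: PUBLIC_MESSAGES[code] } };
}
```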
docs/implementation-plans/04-compliance-security-suite.md (2)
845-875: GDPR deletion: review deletion of audit logs. Some regimes require retaining certain logs; blanket deletion of audit_logs by user_id may violate retention/legal hold. Gate by policy and legal basis.
477-485: Use cloud-native credentials (IAM roles) over static keys. Avoid long-lived AWS keys in env; prefer IAM roles (e.g., IRSA on Kubernetes, task roles on ECS/Lambda).
docs/implementation-plans/05-multi-llm-provider-support.md (3)
444-457: Azure provider method incomplete. completion() returns partial code with a "same as OpenAI" comment; fill it in or mark it as TODO to prevent accidental use.
760-811: Add timeouts, retries, and a per-provider error taxonomy. LLM calls are flaky. Wrap provider.completion with bounded timeouts, limited retries, and map provider errors to consistent codes; also support cancellation.
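A minimal sketch of that wrapper pattern; the names (callWithResilience, ProviderError) and the error taxonomy are illustrative, not part of the plan:

```typescript
// Bound each provider call with a timeout and a small retry budget,
// normalizing failures into a single error shape.
type ProviderError = { provider: string; code: 'timeout' | 'unknown'; cause: unknown };

async function callWithResilience<T>(
  provider: string,
  fn: (signal: AbortSignal) => Promise<T>,
  { timeoutMs = 30_000, retries = 2 } = {}
): Promise<T> {
  let lastError: ProviderError | undefined;
  for (let attempt = 0; attempt <= retries; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      return await fn(controller.signal); // provider call should honor the abort signal
    } catch (cause) {
      lastError = { provider, code: controller.signal.aborted ? 'timeout' : 'unknown', cause };
    } finally {
      clearTimeout(timer);
    }
  }
  throw lastError;
}
```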
813-823: Usage tracking: add org/user context and request type. Insert organization_id, user_id, and request_type into llm_usage for accurate cost allocation.
docs/implementation-plans/03-elasticsearch-integration.md (1)
232-236: Many indices (one per user) will strain ES. Prefer a shared index with user_id as a keyword and filter queries, plus index-level RBAC if needed.
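A sketch of the shared-index pattern, assuming a documents index with a user_id keyword field (index and field names are placeholders):

```typescript
import { Client } from '@elastic/elasticsearch';

const es = new Client({ node: process.env.ELASTICSEARCH_URL! });

// Query a single shared index, always filtering by the caller's user_id.
async function searchUserDocuments(userId: string, query: string) {
  return es.search({
    index: 'documents', // one shared index instead of one index per user
    query: {
      bool: {
        must: [{ match: { content: query } }],
        filter: [{ term: { user_id: userId } }], // user_id mapped as keyword
      },
    },
  });
}
```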
docs/ENTERPRISE_ROADMAP.md (1)
570-586: Optional: clarify infra cost assumptions. Add footnotes for managed vs self-hosted costs and expected workload (docs/month, QPS) to make comparisons actionable.
ENTERPRISE_FEATURE_ANALYSIS.md (1)
1201-1201: Markdown hygiene: fix headings, lists, and bare URL.
- Replace emphasized lines acting as headings with proper markdown headings (##/###).
- Normalize list indentation (MD007).
- Convert bare URLs to links.
Also applies to: 1487-1504
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- ENTERPRISE_FEATURE_ANALYSIS.md (1 hunks)
- docs/ENTERPRISE_ROADMAP.md (1 hunks)
- docs/implementation-plans/01-api-layer.md (1 hunks)
- docs/implementation-plans/02-data-platform-integrations.md (1 hunks)
- docs/implementation-plans/03-elasticsearch-integration.md (1 hunks)
- docs/implementation-plans/04-compliance-security-suite.md (1 hunks)
- docs/implementation-plans/05-multi-llm-provider-support.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
ENTERPRISE_FEATURE_ANALYSIS.md
[uncategorized] ~1250-~1250: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... outputs) - Week 3: Authentication & rate limiting - Week 4: API documentation (Swagger...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🪛 markdownlint-cli2 (0.18.1)
ENTERPRISE_FEATURE_ANALYSIS.md
152-152: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
158-158: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
165-165: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
171-171: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
177-177: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
336-336: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
365-365: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
385-385: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
404-404: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
411-411: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
430-430: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
438-438: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
477-477: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
500-500: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
501-501: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
502-502: Unordered list indentation
Expected: 0; Actual: 3
(MD007, ul-indent)
609-609: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
629-629: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
654-654: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
678-678: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
687-687: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
703-703: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
753-753: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
773-773: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
805-805: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
821-821: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
828-828: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
858-858: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
869-869: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
879-879: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
898-898: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
906-906: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
951-951: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
969-969: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
990-990: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
999-999: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1036-1036: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1053-1053: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1081-1081: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1094-1094: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1123-1123: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1145-1145: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1168-1168: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1201-1201: Bare URL used
(MD034, no-bare-urls)
1487-1487: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1495-1495: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
1504-1504: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
docs/implementation-plans/03-elasticsearch-integration.md
52-52: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (1)
docs/implementation-plans/03-elasticsearch-integration.md (1)
220-236: indices.exists return handling. @elastic/elasticsearch v8 may not return a boolean unless you use the transport option; verify and adapt to the actual return shape.
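For the indices.exists note above, a defensive sketch that normalizes the possible return shapes (the legacy shapes handled here are assumptions about older client versions):

```typescript
// With @elastic/elasticsearch v8, indices.exists typically resolves to a boolean,
// but older clients wrapped it in a response object; normalize defensively.
async function indexExists(
  es: import('@elastic/elasticsearch').Client,
  index: string
): Promise<boolean> {
  const result: unknown = await es.indices.exists({ index });
  if (typeof result === 'boolean') return result;
  // Legacy/transport shapes: { body: boolean } or { statusCode: number }
  const r = result as { body?: boolean; statusCode?: number };
  return r.body ?? r.statusCode === 200;
}
```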
```sql
CREATE INDEX idx_api_keys_user_id ON api_keys(user_id);
CREATE INDEX idx_api_keys_key_hash ON api_keys(key_hash);
CREATE UNIQUE INDEX idx_api_keys_key_prefix ON api_keys(key_prefix);
```
Unique index on key_prefix will block multiple keys.
key_prefix holds 'sk_live_'/'sk_test_'; making it UNIQUE allows only one key per environment.
Fix by dropping uniqueness or using a separate random prefix/id:
```diff
-CREATE UNIQUE INDEX idx_api_keys_key_prefix ON api_keys(key_prefix);
+-- REMOVE unique index; optionally:
+-- CREATE INDEX idx_api_keys_key_prefix ON api_keys(key_prefix);
```

🤖 Prompt for AI Agents
In docs/implementation-plans/01-api-layer.md around lines 153 to 156 the CREATE
UNIQUE INDEX on api_keys(key_prefix) is incorrect because key_prefix contains
environment prefixes like 'sk_live_'/'sk_test_' and will prevent storing
multiple keys per environment; remove the UNIQUE constraint by changing to a
non-unique index (CREATE INDEX idx_api_keys_key_prefix ON api_keys(key_prefix))
or alternatively add a dedicated, random unique identifier column (e.g., key_id
or key_prefix_random) and create a UNIQUE index on that new column while keeping
key_prefix non-unique.
```typescript
// Look up API key in database
const apiKeyRecord = await supabase
  .from('api_keys')
  .select('*, user:users(*)')
  .eq('key_hash', keyHash)
  .eq('is_active', true)
  .single();
```
Don’t join against auth.users; return user_id instead.
Supabase PostgREST cannot join to auth.users in most setups; selecting user:users(*) likely fails. Return user_id and fetch via Admin API if needed, or maintain an application users table.
```diff
- .select('*, user:users(*)')
+ .select('id, user_id, scopes, rate_limit_tier, expires_at, is_active')
  ...
-return apiKeyRecord.user;
+return { id: apiKeyRecord.user_id };
```

Also applies to: 709-710
🤖 Prompt for AI Agents
In docs/implementation-plans/01-api-layer.md around lines 685 to 692 (and also
apply the same change at lines 709-710), the code is joining to auth.users via
select('*, user:users(*)') which will fail in most Supabase/PostgREST setups;
change the query to return the foreign key (user_id) instead of trying to join
to auth.users, then fetch the full user record separately via the Supabase Admin
API or from your application users table as needed; update both occurrences to
select the user_id and add a follow-up step to retrieve the user using the
appropriate admin/user-table method.
```typescript
export async function checkRateLimit(apiKeyId: string, tier: string): Promise<void> {
  const limiter = rateLimiters[tier] || rateLimiters.standard;

  const { success, limit, reset, remaining } = await limiter.limit(apiKeyId);

  if (!success) {
    throw new APIError('Rate limit exceeded', 429, {
      'X-RateLimit-Limit': limit.toString(),
      'X-RateLimit-Remaining': '0',
      'X-RateLimit-Reset': reset.toString(),
      'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
    });
  }

  // Set rate limit headers (return these in response)
  return {
    'X-RateLimit-Limit': limit.toString(),
    'X-RateLimit-Remaining': remaining.toString(),
    'X-RateLimit-Reset': reset.toString(),
  };
}
```
Type mismatch: function returns headers but signature says void.
checkRateLimit currently returns a headers object; update return type and callers.
```diff
-export async function checkRateLimit(apiKeyId: string, tier: string): Promise<void> {
+export async function checkRateLimit(apiKeyId: string, tier: string): Promise<Record<string,string>> {
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
export async function checkRateLimit(apiKeyId: string, tier: string): Promise<Record<string,string>> {
  const limiter = rateLimiters[tier] || rateLimiters.standard;

  const { success, limit, reset, remaining } = await limiter.limit(apiKeyId);

  if (!success) {
    throw new APIError('Rate limit exceeded', 429, {
      'X-RateLimit-Limit': limit.toString(),
      'X-RateLimit-Remaining': '0',
      'X-RateLimit-Reset': reset.toString(),
      'Retry-After': Math.ceil((reset - Date.now()) / 1000).toString(),
    });
  }

  // Set rate limit headers (return these in response)
  return {
    'X-RateLimit-Limit': limit.toString(),
    'X-RateLimit-Remaining': remaining.toString(),
    'X-RateLimit-Reset': reset.toString(),
  };
}
```
🤖 Prompt for AI Agents
In docs/implementation-plans/01-api-layer.md around lines 779 to 800, the
function checkRateLimit is declared to return Promise<void> but actually returns
an object of headers; change the signature to return
Promise<Record<string,string>> (or a named Headers type) and update all callers
to await the function and use the returned headers (or merge them into the
response) instead of ignoring the result; keep the existing APIError throw
behavior intact and ensure typing reflects header string values.
```typescript
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY! // Service role key required
);
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Service role key must never be exposed to clients.
Ensure createClient with SUPABASE_SERVICE_ROLE_KEY runs only in trusted server contexts (no edge/functions with potential leakage, no client bundles). Add a guardrail comment and environment segregation.
I'll verify this review comment by examining the file context and consulting Supabase security best practices.
🏁 Script executed:

```bash
#!/bin/bash
# Examine the file and context around lines 344-348
cat "docs/implementation-plans/02-data-platform-integrations.md" | sed -n '330,360p'
```

Length of output: 848

🏁 Script executed:

```bash
#!/bin/bash
# Search for other uses of SUPABASE_SERVICE_ROLE_KEY in the codebase
rg -n "SUPABASE_SERVICE_ROLE_KEY" -C 5
```

Length of output: 2074

🏁 Script executed:

```bash
#!/bin/bash
# Check for any environment configuration or guardrails
rg -n "createClient" -C 3 --type=ts --type=js --type=tsx --type=jsx
```

Length of output: 27

🏁 Script executed:

```bash
#!/bin/bash
# Verify the file type and its purpose
head -50 "docs/implementation-plans/02-data-platform-integrations.md"
```

Length of output: 1461
🌐 Web query:
What are Supabase service role key security best practices and risks?
💡 Result:
Summary — risks
- Full-bypass of Row Level Security (RLS): service_role keys bypass RLS so a leak = full read/modify/delete of your DB. [1][2]
- Data breach & compliance exposure: attacker access → exfiltration, tampering, regulatory/reporting incidents. [6]
- Supply‑chain/public repo leaks: accidental commits or CI logs can expose keys. [7]
- Operational impact: abused keys can break apps, cause cost spikes, or force emergency rotation/restarts. [4][7]
Security best practices (concise)
- Never use service_role in client-side code—keep it strictly server-side or in trusted runtime (server, Edge Function, or private job). [1][2][5]
- Minimize use: prefer anon key + RLS for normal app flows; only use service_role for backend jobs that truly need bypass. [1][6]
- Use least privilege patterns: create narrow APIs (server endpoints, RPCs) that perform needed operations rather than giving direct DB access with the key. Consider Postgres roles/RPCs to limit scope. [5]
- Protect secrets in a vault/secret manager or environment secrets (not in code/commits). Use cloud secret managers or Supabase Edge Function secrets. [0][1]
- Rotate regularly and on suspicion: rotate service_role/JWT secrets promptly; be aware rotation invalidates active connections. Automate rotation if possible. [8]
- Monitor & alert: enable auditing/logging, watch unusual queries or bulk exports, and alert on anomalous use. Integrate secret-scanning for repos. [7][6]
- Revoke and remediate fast: have an incident playbook to revoke keys, rotate, and re-deploy with new secrets. [8][7]
- Prefer Edge Functions / serverless with scoped secrets for third‑party webhooks/integrations (e.g., Stripe) instead of embedding service_role in backend code accessible to multiple teams. [0][5]
- Use network controls where possible: IP allowlists for backend jobs or service accounts, and enforce TLS. [5]
Quick operational checklist
- Audit where service_role is used and replace with RPCs or server endpoints where possible. [5]
- Move keys to secret manager; remove from repos/CI logs; add secret scanning. [7]
- Implement monitoring + alerts for abnormal queries or large exports. [7][6]
- Document rotation process and test it (note: rotation restarts services). [8]
Sources
- Supabase docs — securing data; service_role guidance (never expose, use server-side). [1]
- Supabase docs — Edge Functions environment secrets & guidance (service_role usable in Edge Functions, not browsers). [0]
- Supabase docs — API/serverless guidance; service_role explanations and safeguards. [5]
- Supabase docs — rotating anon/service/JWT secrets (procedure & impact). [8]
- Community/ops guidance on protecting keys, using server endpoints/RPCs and edge functions. [3][6]
- GitGuardian / security posts on risks and detection of leaked service_role tokens. [7]
Add explicit server-only context and environment segregation guardrails to documentation.
The documentation currently shows correct backend pattern but lacks critical security warnings. Per Supabase best practices, service_role keys must never be used in client-side code and should be kept strictly server-side. The existing comment "Service role key required" does not adequately communicate this constraint.
Update the code comment and add a preceding caution block to explicitly state:
- This function runs server-side only (no client bundles, no edge functions accessible to untrusted code)
- Service role key must reside in .env with appropriate access controls
- Environment segregation: never log, expose, or commit the key
- Link to Supabase vault security best practices for implementers
🤖 Prompt for AI Agents
In docs/implementation-plans/02-data-platform-integrations.md around lines 344
to 348, the inline comment beside the Supabase service_role key is
insufficiently warning — add a clear server-only caution block immediately above
the code that states this function must run server-side only (no client bundles
or publicly accessible edge functions), that the service_role key must be stored
in a protected .env with strict access controls and never logged, exposed, or
committed, and update the inline comment to explicitly state "Server-side only —
do not expose service_role key to clients"; also include a link to Supabase
vault/security best practices for implementers and recommend environment
segregation guardrails (no client access, restrict CI/ops, and rotate keys).
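A hedged illustration of what that caution block plus guardrail could look like in code; the assertServerOnly helper is invented for this example:

```typescript
// SERVER-SIDE ONLY: never import this module from client components or
// publicly reachable edge code; the service role key bypasses RLS entirely.
import { createClient } from '@supabase/supabase-js';

function assertServerOnly() {
  if (typeof window !== 'undefined') {
    throw new Error('supabase admin client must not be bundled into the browser');
  }
}

assertServerOnly();

export const supabaseAdmin = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY! // keep in server-only env; never log or commit
);
```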
```typescript
// Get credential record
const { data: credRecord, error } = await supabase
  .from('encrypted_credentials')
  .select('vault_secret_id')
  .eq('id', credentialId)
  .single();

if (error) {
  throw new Error(`Credential not found: ${error.message}`);
}

// Retrieve from Vault
const { data: vaultSecret, error: vaultError } = await supabase
  .from('vault.secrets')
  .select('secret')
  .eq('id', credRecord.vault_secret_id)
  .single();

if (vaultError) {
  throw new Error(`Failed to retrieve credentials: ${vaultError.message}`);
}

return JSON.parse(vaultSecret.decrypted_secret); // Automatically decrypted
}
```
🧩 Analysis chain
Supabase Vault retrieval bug and table access assumptions.
- You select "secret" from vault.secrets but later read vaultSecret.decrypted_secret. Field mismatch; will throw. Also, accessing "vault.secrets" via supabase-js may not work depending on Vault exposure/permissions.
Suggested fix:
```diff
- const { data: vaultSecret, error: vaultError } = await supabase
-   .from('vault.secrets')
-   .select('secret')
+ const { data: vaultSecret, error: vaultError } = await supabase
+   .from('vault.secrets')
+   .select('decrypted_secret')
    .eq('id', credRecord.vault_secret_id)
    .single();
- return JSON.parse(vaultSecret.decrypted_secret); // Automatically decrypted
+ return JSON.parse(vaultSecret.decrypted_secret);
```

Also verify that Vault access is server-only and enabled in your project; consider a dedicated secrets API rather than querying the table directly.
🏁 Script executed:

```bash
#!/bin/bash
# Verify the file exists and check its actual content at the specified lines
if [ -f "docs/implementation-plans/02-data-platform-integrations.md" ]; then
  echo "=== File exists, showing lines 389-413 ==="
  sed -n '389,413p' "docs/implementation-plans/02-data-platform-integrations.md"
  echo ""
  echo "=== Context: lines 380-420 ==="
  sed -n '380,420p' "docs/implementation-plans/02-data-platform-integrations.md"
else
  echo "File not found"
fi
```

Length of output: 1688
🌐 Web query:
Supabase Vault API select fields decrypted_secret correct usage
💡 Result:
Short answer: query the provided view vault.decrypted_secrets and include the decrypted_secret column in your SELECT. The view decrypts on-the-fly, so usage is just normal SQL, but you must ensure the DB role you’re using has permission to read the view (and you should lock down access because anyone who can select from the view sees plaintext). Example:
- Basic:
  select id, name, decrypted_secret from vault.decrypted_secrets;
- By id:
  select id, name, decrypted_secret
  from vault.decrypted_secrets
  where id = '7095d222-efe5-4cd5-b5c6-5755b451e223';
Notes:
- decrypted_secret is exposed only via the view (vault.secrets stores the ciphertext). The view decrypts at query time. Protect access with proper privileges/RLS. [1][2]
Sources:
[1] Supabase Vault docs — Using Vault / decrypted_secrets.
[2] supabase/vault GitHub README (examples showing decrypted_secret in the view).
Fix Supabase Vault table and field references.
The code queries vault.secrets but references a field that only exists on the vault.decrypted_secrets view. The view decrypts on-the-fly, so decrypted_secret is exposed only via the view, not the underlying table. Update to:
```diff
  const { data: vaultSecret, error: vaultError } = await supabase
-   .from('vault.secrets')
-   .select('secret')
+   .from('vault.decrypted_secrets')
+   .select('decrypted_secret')
    .eq('id', credRecord.vault_secret_id)
    .single();
- return JSON.parse(vaultSecret.decrypted_secret); // Automatically decrypted
+ return JSON.parse(vaultSecret.decrypted_secret);
```

Also verify that Vault access is restricted to server-only contexts and that SERVICE_ROLE_KEY usage here is intentional; consider whether a dedicated secrets management API would be safer than direct table access.
🤖 Prompt for AI Agents
In docs/implementation-plans/02-data-platform-integrations.md around lines 389
to 413, the code queries the underlying vault.secrets table but then tries to
read decrypted_secret which only exists on the vault.decrypted_secrets view;
change the Supabase query to select from the vault.decrypted_secrets view and
select the decrypted_secret field (e.g.,
.from('vault.decrypted_secrets').select('decrypted_secret').eq('id',
credRecord.vault_secret_id).single()), and ensure this code runs only in a
server-only context using the SERVICE_ROLE_KEY (or replace direct DB access with
a dedicated secrets-management API) so decryption access is restricted to
trusted backend environments.
```tsx
<p className="text-sm text-muted-foreground">
  {results.length} results ({took}ms)
</p>
```
UI uses took variable but it’s not defined in component state.
Add took to state from API response to avoid runtime errors.
🤖 Prompt for AI Agents
In docs/implementation-plans/03-elasticsearch-integration.md around lines 792 to
795, the JSX references a took variable that is not defined in the component
state causing runtime errors; add a took state property (e.g., initialize via
useState(0) or include in the component's state object) and set its value from
the search API response when results are fetched (parse the response.took and
update state alongside results), and ensure the UI uses this state variable so
took is never undefined.
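A minimal sketch of wiring took into component state; the component shape and API response fields are assumptions, not taken from the plan:

```tsx
import { useState } from 'react';

export function SearchResults() {
  const [results, setResults] = useState<any[]>([]);
  const [took, setTook] = useState(0); // milliseconds reported by the search API

  async function runSearch(query: string) {
    const res = await fetch(`/api/v1/search?q=${encodeURIComponent(query)}`);
    const data = await res.json();
    setResults(data.results ?? []);
    setTook(data.took ?? 0); // keep `took` in state so the JSX never sees undefined
  }

  return (
    <div>
      <input onChange={(e) => runSearch(e.target.value)} placeholder="Search..." />
      <p className="text-sm text-muted-foreground">
        {results.length} results ({took}ms)
      </p>
    </div>
  );
}
```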
```typescript
private sanitizePayload(payload: any): any {
  if (!payload) return null;

  // Remove sensitive fields
  const sensitiveFields = ['password', 'apiKey', 'token', 'secret', 'creditCard'];
  const sanitized = { ...payload };

  for (const field of sensitiveFields) {
    if (field in sanitized) {
      sanitized[field] = '[REDACTED]';
    }
  }

  return sanitized;
}
```
Sanitization should be deep and include nested secrets.
Current sanitizePayload only redacts top-level fields. Traverse recursively and handle nested arrays/objects.
🤖 Prompt for AI Agents
In docs/implementation-plans/04-compliance-security-suite.md around lines 195 to
209, sanitizePayload only redacts top-level keys; update it to perform a deep
recursive traversal that handles nested objects and arrays, redacting any
sensitive keys (password, apiKey, token, secret, creditCard) at any depth and
returning a new sanitized structure (avoid mutating original), skip or safely
handle circular references (e.g., track visited nodes) and preserve
non-sensitive primitive values unchanged.
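A recursive variant along those lines; the sensitive-field list is copied from the original snippet, and the WeakSet cycle guard is one possible approach:

```typescript
const SENSITIVE_FIELDS = new Set(['password', 'apiKey', 'token', 'secret', 'creditCard']);

function deepSanitize(value: any, seen = new WeakSet<object>()): any {
  if (value === null || typeof value !== 'object') return value;
  if (seen.has(value)) return '[CIRCULAR]'; // avoid infinite recursion on cycles
  seen.add(value);

  if (Array.isArray(value)) {
    return value.map((item) => deepSanitize(item, seen));
  }

  const sanitized: Record<string, any> = {};
  for (const [key, val] of Object.entries(value)) {
    sanitized[key] = SENSITIVE_FIELDS.has(key) ? '[REDACTED]' : deepSanitize(val, seen);
  }
  return sanitized; // new object; the original payload is left untouched
}
```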
```typescript
// Middleware for Next.js API routes
export function auditMiddleware(handler: any) {
  return async (req: NextRequest, res: NextResponse) => {
    const startTime = Date.now();
    const requestId = req.headers.get('X-Request-ID') || crypto.randomUUID();

    try {
      const response = await handler(req, res);

      // Log successful request
      await auditLogger.log({
        userId: req.user?.id,
        eventType: `api.${req.method.toLowerCase()}`,
        eventCategory: 'data',
        action: this.mapMethodToAction(req.method),
        resourceType: this.extractResourceType(req.url),
        ipAddress: req.ip,
        userAgent: req.headers.get('User-Agent'),
        requestId,
        requestMethod: req.method,
        requestPath: req.url,
        responseStatus: response.status,
        status: 'success',
        metadata: {
          responseTime: Date.now() - startTime,
        },
      });

      return response;
    } catch (error: any) {
      // Log failed request
      await auditLogger.log({
        userId: req.user?.id,
        eventType: `api.${req.method.toLowerCase()}.failed`,
        eventCategory: 'data',
        action: this.mapMethodToAction(req.method),
        resourceType: this.extractResourceType(req.url),
        ipAddress: req.ip,
        userAgent: req.headers.get('User-Agent'),
        requestId,
        requestMethod: req.method,
        requestPath: req.url,
        status: 'error',
        errorMessage: error.message,
        severity: error.statusCode === 403 ? 'warning' : 'info',
      });

      throw error;
    }
  };
}
```
Middleware uses this.* in a closure — it will be undefined.
this.mapMethodToAction / this.extractResourceType will throw. Use module-level helpers or call static functions.
```diff
- action: this.mapMethodToAction(req.method),
- resourceType: this.extractResourceType(req.url),
+ action: mapMethodToAction(req.method),
+ resourceType: extractResourceType(req.url),
```
+ resourceType: extractResourceType(req.url),Provide those helpers in scope.
Committable suggestion skipped: line range outside the PR's diff.
```typescript
const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/g;
while ((match = emailRegex.exec(text)) !== null) {
  entities.push({
    type: 'EMAIL',
    value: match[0],
    start: match.index,
    end: match.index + match[0].length,
    confidence: 0.95,
  });
}
```
Email regex includes a literal '|' in TLD class.
[A-Z|a-z]{2,} allows pipe char.
```diff
-const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/g;
+const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;
while ((match = emailRegex.exec(text)) !== null) {
  entities.push({
    type: 'EMAIL',
    value: match[0],
    start: match.index,
    end: match.index + match[0].length,
    confidence: 0.95,
  });
}
```
🤖 Prompt for AI Agents
In docs/implementation-plans/04-compliance-security-suite.md around lines
574-583, the email regex character class [A-Z|a-z]{2,} wrongly allows the pipe
character; replace that class with a correct TLD character class (e.g.
[A-Za-z]{2,}) or remove the explicit case range and add the case-insensitive
flag (/i) to the regex, ensuring the pattern no longer permits '|' and continues
to match valid TLDs.
```typescript
private async deleteRecords(resourceType: string, records: any[]): Promise<void> {
  const ids = records.map((r) => r.id);

  const tableName = resourceType === 'pdf' ? 'pdfs' :
                    resourceType === 'output' ? 'outputs' :
                    'audit_logs';

  await supabase.from(tableName).delete().in('id', ids);
}
```
Retention deletion fallback risks deleting audit logs by mistake.
Defaulting unknown resourceType to 'audit_logs' is dangerous. Fail closed or throw on unknown type.
```diff
- const tableName = resourceType === 'pdf' ? 'pdfs' :
-                   resourceType === 'output' ? 'outputs' :
-                   'audit_logs';
+ const tableName =
+   resourceType === 'pdf' ? 'pdfs' :
+   resourceType === 'output' ? 'outputs' :
+   resourceType === 'audit_log' ? 'audit_logs' :
+   null;
+ if (!tableName) throw new Error(`Unsupported resource type: ${resourceType}`);
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
private async deleteRecords(resourceType: string, records: any[]): Promise<void> {
  const ids = records.map((r) => r.id);

  const tableName =
    resourceType === 'pdf' ? 'pdfs' :
    resourceType === 'output' ? 'outputs' :
    resourceType === 'audit_log' ? 'audit_logs' :
    null;
  if (!tableName) throw new Error(`Unsupported resource type: ${resourceType}`);

  await supabase.from(tableName).delete().in('id', ids);
}
```
🤖 Prompt for AI Agents
In docs/implementation-plans/04-compliance-security-suite.md around lines 767 to
775, the deleteRecords function currently defaults unknown resourceType to
'audit_logs', which risks deleting audit logs by mistake; change the
implementation to explicitly map allowed resourceType values ('pdf' -> 'pdfs',
'output' -> 'outputs') and throw an error (or reject) for any unknown
resourceType, performing no deletion; also validate that ids is non-empty before
calling supabase.delete() to avoid unnecessary operations.
Based on industry research and best practices, updated implementation recommendations to use battle-tested, widely-adopted frameworks:

## Multi-LLM Provider: Vercel AI SDK (Recommended)

Previous approach: Custom provider abstraction layer
Updated approach: Vercel AI SDK + optional LiteLLM

Benefits:
- 100+ providers with unified interface
- Native streaming, tool calling, structured outputs
- Model Context Protocol support
- Better maintained (Vercel backing)
- Perfect fit for TypeScript/Next.js stack
- 12K+ GitHub stars

Alternative: LiteLLM for Python pipelines
- 1.8K+ models, 80+ providers
- Built-in cost tracking, load balancing, guardrails
- 20K+ GitHub stars, Y Combinator backed
- Can run as AI Gateway proxy server

## OCR Solutions: Multi-Method Approach

Previous: AWS Textract only
Updated: LLMWhisperer + DeepSeek-OCR + AWS Textract + Tesseract

1. LLMWhisperer (Unstract) - Primary
   - Layout preservation for LLM preprocessing
   - Free tier: 100 pages/day
   - Multiple modes: native_text, high_quality, form
   - Document enhancement (de-skewing, contrast)
2. DeepSeek-OCR - High-Volume
   - 97% accuracy with 10x compression
   - 200K+ pages/day on single A100 GPU
   - Self-hosted, $0 per page
   - Best for complex layouts, formulas
3. AWS Textract - Forms & Tables
   - Enterprise-grade reliability
   - Best for forms, key-value extraction
   - Pay-per-use
4. Tesseract - Fallback
   - Free, open-source
   - Simple text extraction

Cost savings: 60-80% vs AWS Textract only

Includes:
- Detailed Vercel AI SDK implementation examples
- LiteLLM proxy server configuration
- OCR decision logic and method selection
- Code examples for all solutions
- Migration path from current implementation
- Cost comparison and architecture diagrams
- Updated timeline: 4-5 weeks vs 6 weeks previous

Recommendation: Adopt Vercel AI SDK + LLMWhisperer as primary, with DeepSeek-OCR for high-volume and AWS Textract for forms.
Actionable comments posted: 6
🧹 Nitpick comments (1)
docs/UPDATED_RECOMMENDATIONS.md (1)
690-694: Clarify DeepSeek-OCR hardware requirements. The hardware requirements section (lines 690-693) mentions "NVIDIA A100-40G (minimum)" for production but also notes "RTX 4090 (slower but works)". This is ambiguous; clarify whether A100 is the recommended minimum or an aspirational target, and provide more realistic entry-level GPU options.
Revise to:
```diff
 **Hardware Requirements:**
-- **Minimum:** NVIDIA A100-40G (for production speed)
-- **Budget:** RTX 4090 (slower but works)
-- **Performance:** 2500 tokens/s on A100, 200K+ pages/day
+- **Recommended (Production):** NVIDIA A100-40G or H100 (2500+ tokens/s, 200K+ pages/day)
+- **Budget Alternative:** RTX 4090 (significantly slower, ~200-400 tokens/s)
+- **Minimum:** RTX 3090 or A10 (not recommended for high-volume production)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- docs/UPDATED_RECOMMENDATIONS.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/UPDATED_RECOMMENDATIONS.md
[grammar] ~800-~800: Use a hyphen to join words.
Context: ...support (self-hosted) - ✅ Implement cost tracking middleware - ✅ UI for provider ...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
docs/UPDATED_RECOMMENDATIONS.md
431-431: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
831-831: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (3)
docs/UPDATED_RECOMMENDATIONS.md (3)
302-329: Code example references undefined supabase import. The cost-tracking middleware code (lines 316-323) calls supabase.from('llm_usage').insert() without showing the import or initialization. While this is documentation, it could mislead readers about the source of this object. Add a comment or code snippet showing how supabase is imported and initialized, e.g.:

```typescript
// Add near the top of the middleware code example
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_ANON_KEY!
);
```
1-11: Document coordinates well with PR context. The document appropriately frames itself as an update to previous recommendations and aligns well with the enterprise roadmap and implementation plans mentioned in the enriched summary. The structured approach (comparison → detailed implementation → costs → timeline) is logical and professional.
829-880: Architecture diagrams are clear and comprehensive. The two ASCII architecture diagrams effectively illustrate the recommended system design (Vercel AI SDK frontend, OCR service backend, and optional LiteLLM proxy). They provide a good visual reference for implementers.
```bash
npm install ai @ai-sdk/openai @ai-sdk/anthropic @ai-sdk/google @ai-sdk/mistral
```

### 2.2 Provider Configuration

```typescript
// src/lib/llm/providers.ts

import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { createOpenAI } from '@ai-sdk/openai'; // For Azure, custom endpoints

// OpenAI
export const openaiProvider = openai;

// Anthropic Claude
export const claudeProvider = anthropic;

// Google Gemini
export const geminiProvider = google;

// Azure OpenAI (custom endpoint)
export const azureProvider = createOpenAI({
  baseURL: process.env.AZURE_OPENAI_ENDPOINT,
  apiKey: process.env.AZURE_OPENAI_KEY,
  organization: '',
});

// Ollama (local/self-hosted)
export const ollamaProvider = createOpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // Ollama doesn't require real API key
});

// AWS Bedrock (via custom provider)
export const bedrockProvider = createOpenAI({
  baseURL: process.env.BEDROCK_ENDPOINT,
  apiKey: process.env.AWS_ACCESS_KEY,
  // Note: Bedrock requires AWS signature, may need middleware
});
```
🧩 Analysis chain
Verify Vercel AI SDK provider package availability and naming.
The installation and provider configuration code uses specific package names (@ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google, @ai-sdk/mistral). Ensure these packages exist and are the correct names in the current npm registry.
🌐 Web query:
What are the correct and latest official Vercel AI SDK provider packages available as of November 2025?
💡 Result:
Here are the official Vercel AI SDK provider packages (first‑party / Vercel‑maintained) available as of November 13, 2025:
- @ai-sdk/openai. [1][2]
- @ai-sdk/anthropic. [1][2]
- @ai-sdk/google (Google Generative AI). [1][2]
- @ai-sdk/google-vertex (Google Vertex). [1][2]
- @ai-sdk/azure (Azure AI). [3][2]
- @ai-sdk/cohere. [1][2]
- @ai-sdk/mistral. [4][2]
- @ai-sdk/xai (xAI / Grok). [4][2]
- @ai-sdk/vercel (Vercel provider / v0 API). [5][2]
- @ai-sdk/gateway (AI Gateway — unified access to multiple providers). [6][2]
Notes:
- Some community or third‑party provider packages (e.g., Ollama adapters, Mem0, Auth0 integrations) exist but are not first‑party Vercel packages — check their repos/docs if you need them. [1][5]
Sources: Vercel blog & changelog, Vercel Docs (AI Gateway / Models & Providers), and AI SDK provider documentation. [1][2][3][4][6]
References
- Vercel — Introducing Vercel AI SDK (providers added).
- Vercel Docs — Models & Providers / AI Gateway documentation.
- Vercel changelog — Azure AI provider announcement (Jun 12, 2024).
- AI SDK provider listings (examples showing xAI, Mistral, OpenAI, Anthropic, etc.).
- npm / @ai-sdk/vercel package listing.
- AI SDK docs — @ai-sdk/gateway provider.
Update npm install and use official @ai-sdk/azure package for Azure OpenAI.
The packages @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google, and @ai-sdk/mistral are correct and available. However, the provider configuration needs updates:
- Add @ai-sdk/azure to the npm install command
- Replace the Azure provider implementation with the official @ai-sdk/azure package instead of createOpenAI() with custom baseURL settings
🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around lines 74 to 115, the npm install and
Azure provider usage are incorrect: add @ai-sdk/azure to the install command and
replace the current Azure provider that uses createOpenAI() with the official
@ai-sdk/azure package. Update the top imports to import the Azure factory from
@ai-sdk/azure (instead of using createOpenAI from @ai-sdk/openai), then
instantiate the azureProvider via that Azure package’s factory/function passing
the AZURE endpoint and key from environment variables (and any required
Azure-specific options), and remove the custom createOpenAI-based Azure
configuration. Ensure the npm install example includes @ai-sdk/azure and the
code example imports and constructs the Azure provider using the official
package and env vars.
### 2.6 Cost Tracking (Add Custom Middleware)

```typescript
// Middleware for cost tracking
import { experimental_wrapLanguageModel as wrapLanguageModel } from 'ai';

function withCostTracking(model: any, providerId: string) {
  return wrapLanguageModel({
    model,
    middleware: {
      transformParams: async ({ params }) => params,
      wrapGenerate: async ({ doGenerate, params }) => {
        const startTime = Date.now();
        const result = await doGenerate();

        // Track usage
        await supabase.from('llm_usage').insert({
          provider: providerId,
          model: params.model,
          input_tokens: result.usage.promptTokens,
          output_tokens: result.usage.completionTokens,
          cost: calculateCost(providerId, result.usage),
          duration_ms: Date.now() - startTime,
        });

        return result;
      },
    },
  });
}

// Usage
const trackedModel = withCostTracking(
  openai('gpt-4o-mini'),
  'openai'
);
```
Experimental API usage requires stability warning.
The cost-tracking middleware uses experimental_wrapLanguageModel (line 304), which is an experimental Vercel AI SDK API. This may change or be deprecated in future versions, and code relying on it could break.
Consider adding a prominent warning in the documentation about the experimental nature of this API, and suggest monitoring Vercel AI SDK release notes for stability updates. Alternatively, propose an alternative approach that doesn't rely on experimental features, or note that users should pin the ai package version to a known-compatible release.
🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around lines 300 to 336, the example uses the
experimental experimental_wrapLanguageModel API without warning; add a prominent
stability notice immediately above the code block that states this API is
experimental and may change or be removed, advise readers to pin the ai package
to a known-compatible version, monitor Vercel AI SDK release notes for breaking
changes, and suggest an alternative approach (e.g., implement
middleware/wrapping at your application layer or use stable public SDK hooks) as
a fallback if the experimental API changes.
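One non-experimental alternative is to track cost at the application layer around generateText instead of via the middleware hook; this sketch reuses the supabase client and calculateCost helper from the earlier example and treats the exact usage field names as assumptions:

```typescript
import { generateText, type LanguageModel } from 'ai';

// Assumed to exist from the earlier cost-tracking example.
declare const supabase: { from(table: string): { insert(row: Record<string, unknown>): Promise<unknown> } };
declare function calculateCost(providerId: string, usage: unknown): number;

// Track cost without relying on experimental_wrapLanguageModel.
async function generateWithCostTracking(model: LanguageModel, providerId: string, prompt: string) {
  const startTime = Date.now();
  const result = await generateText({ model, prompt });

  await supabase.from('llm_usage').insert({
    provider: providerId,
    input_tokens: result.usage.promptTokens, // assumed field names; verify against the pinned SDK version
    output_tokens: result.usage.completionTokens,
    cost: calculateCost(providerId, result.usage),
    duration_ms: Date.now() - startTime,
  });

  return result;
}
```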
```bash
cat > litellm_config.yaml <<EOF
model_list:
  # OpenAI Models
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY

  # Anthropic Models
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY

  # Azure OpenAI
  - model_name: azure-gpt4
    litellm_params:
      model: azure/gpt-4o
      api_base: os.environ/AZURE_OPENAI_ENDPOINT
      api_key: os.environ/AZURE_OPENAI_KEY

  # Ollama (self-hosted)
  - model_name: llama3
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434

# Load balancing
router_settings:
  routing_strategy: least-busy
  num_retries: 3
  timeout: 600

# Cost tracking (built-in)
litellm_settings:
  success_callback: ["langfuse"]  # Track all requests
  failure_callback: ["langfuse"]
  set_verbose: true

# Guardrails
guardrails:
  - guardrail_name: "content-filter"
    litellm_params:
      mode: during_call
      guardrail: "azure-content-filter"
EOF

# Start proxy server
litellm --config litellm_config.yaml --port 4000
```
🧩 Analysis chain
Verify LiteLLM configuration syntax and available options.
The LiteLLM proxy configuration (lines 358-407) uses specific YAML structure and settings like routing_strategy: least-busy, success_callback: ["langfuse"], and guardrail: "azure-content-filter". These should be verified against the current LiteLLM documentation to ensure syntax accuracy.
🌐 Web query:
What is the current LiteLLM proxy server configuration syntax and available options as of November 2025?
💡 Result:
Summary of LiteLLM proxy configuration syntax & available options (as of Nov 13, 2025)
Quick overview
- The proxy is configured via a YAML config file (passed with --config or via LITELLM_CONFIG_PATH / CONFIG_FILE_PATH / bucket env vars). The main top-level sections are: model_list, general_settings, litellm_settings, router_settings, environment_variables (and extras like UI/auth settings). [1][2][3]
Main sections and common keys (YAML)
- model_list: list of model entries. Each entry commonly includes:
- model_name: user-facing alias (what clients request).
- litellm_params: provider params passed to litellm (model identifier, api_key, api_base, api_version, provider-specific fields).
- any per-model overrides (temperature, max_tokens, stream, fallback, etc.). [1][2]
- general_settings:
- master_key (proxy admin key), database connection settings (DATABASE_URL, connection_pool_limit, connection_timeout), port, logging, NO_DOCS / NO_REDOC toggles, UI_USERNAME / UI_PASSWORD. [1][2]
- litellm_settings:
- module-level toggles like drop_params, set_verbose, api_base, cache settings and other litellm runtime flags. [1][3]
- router_settings:
- routing_strategy (e.g., "least-busy"), other routing params for load-balancing across model instances. [1]
- environment_variables:
- list or mapping to inject expected env vars (REDIS_HOST, REDIS_PORT, provider keys when referencing os.environ). [1]
- Extras / deployment:
- Support for reading config.yaml from S3 or GCS via LITELLM_CONFIG_BUCKET_TYPE/NAME/OBJECT_KEY env vars.
- Docker/Helm/Kubernetes deployment examples; default proxy port 4000 (or 8000 in some docs), ability to run simple proxy mode with CLI flags --model, --port, --logs. [2][3]
Example minimal model_list entry (conceptual YAML):

    model_list:
      - model_name: gpt-4o
        litellm_params:
          model: openai/gpt-4o
          api_key: os.environ/OPENAI_API_KEY
          api_base: https://api.openai.com
          api_version: "2023-05-15"
          temperature: 0.0
          stream: True

(Use env var references for secrets). [1][2]
Other notable options / behaviors
- Admin UI & Swagger available at proxy root; Swagger exposes complete config schema at /#/config.yaml. Master key is required to use the admin UI. [2]
- Logging: --logs returns recent API call logs (saved to api_logs.json). [3]
- Two run modes: simple/basic proxy (CLI flags) and full proxy using config.yaml with DB-backed features (rate-limits, API key management, usage tracking). [3][1]
Sources
[1] LiteLLM proxy config docs — configs & examples.
[2] LiteLLM proxy Quick Start / UI docs (usage, master key, env vars).
[3] LiteLLM deploy / simple proxy docs (deployment, CLI flags, config-bucket support).
I need to search for more specific information about LiteLLM guardrails and callback features to determine if they are actually supported in the current version.
I need to search more specifically about the guardrails placement and the "azure-content-filter" guardrail type to fully verify the configuration.
Fix LiteLLM configuration structure and guardrail type.
The configuration has two critical issues, plus one setting to verify:
- Guardrails placement: Guardrails must be nested within each model's `litellm_params`, not defined as a separate top-level section. Move the guardrails block into each model entry under `model_list`.
- Incorrect guardrail type: Use `guardrail: azure/text_moderations` or `guardrail: azure/prompt_shield`, not `"azure-content-filter"`.
- Undocumented router settings: `num_retries` and `timeout` in `router_settings` are not confirmed as valid LiteLLM proxy options. Verify these keys against current LiteLLM documentation or remove them if unsupported.

Move the guardrails block (lines ~390-396) into the appropriate model entries, and update the guardrail type.
🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around lines 358 to 407, the LiteLLM YAML
places guardrails as a top-level section and uses an invalid guardrail type;
move the guardrail configuration into each model's litellm_params (i.e., under
each model entry in model_list), replace the guardrail value
"azure-content-filter" with a supported type such as "azure/text_moderations" or
"azure/prompt_shield", and verify or remove the unconfirmed router_settings keys
(num_retries and timeout) against current LiteLLM docs if they are not
supported.
```
┌─────────────────────────────────────────┐
│          Next.js Application            │
└─────────────┬───────────────────────────┘
              │
              │ HTTP (OpenAI format)
              │
┌─────────────▼───────────────────────────┐
│          LiteLLM Proxy Server           │
│   (Load Balancing, Cost Tracking,       │
│    Guardrails, Retry Logic)             │
└─────────────┬───────────────────────────┘
              │
    ┌─────────┼─────────┐
    │         │         │
┌───▼───┐  ┌──▼───┐  ┌──▼────┐
│OpenAI │  │Claude│  │Azure  │
└───────┘  └──────┘  └───────┘
```
Add language identifier to fenced code block.
The ASCII architecture diagram at lines 431-449 is a fenced code block without a language specifier, violating markdown linting standards (MD040).
Apply this diff to fix the linting violation:
    -```
    +```text
     ┌─────────────────────────────────────────┐
     │           Next.js Application           │

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
431-431: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around lines 431 to 449 the fenced code block
containing the ASCII architecture diagram lacks a language identifier (MD040);
fix it by changing the opening fence from ``` to ```text so the block is
explicitly marked as plain text, leaving the diagram content and closing fence
unchanged.
    return {
      text: result.text,
      layout: result.markdown,
      confidence: 0.97, // DeepSeek reports 97% accuracy at 10x compression
🧩 Analysis chain
Verify specific performance and accuracy claims.
The document makes several specific quantitative claims that should be validated:
- Line 552: "DeepSeek reports 97% accuracy at 10x compression"
- Line 693: "200K+ pages/day" throughput on A100
- Line 770: Cost figures for various LLM providers
These figures significantly influence architectural decisions and should be verified against current official documentation.
Also applies to: 693-693
🌐 Web query:
What is the current reported accuracy and performance of DeepSeek-OCR model as of November 2025?
💡 Result:
Short answer — reported numbers (as of Nov 13, 2025):
- OCR accuracy: ~97% decoding/precision at compression ratios <10×; ≈60% at ~20× compression. [1][2][3]
- Compression / tokens: claims of using ~100 vision tokens/page to match models that use 256–6,000+ tokens (i.e., 7–20× token reduction). [1][2][4]
- Throughput / latency: published production throughput ~200,000 pages/day on a single NVIDIA A100-40G (scales to millions on large clusters). [1][4]
- Benchmarks / comparisons: reported to outperform GOT‑OCR2.0 and MinerU2.0 on OmniDocBench/Fox at much lower vision-token budgets. [1][2]
Sources: DeepSeek-OCR arXiv paper (Oct 2025) and press coverage / vendor pages summarizing reported benchmarks. [1][2][3][4]
References
[1] DeepSeek‑OCR arXiv paper (Contexts Optical Compression, Oct 2025).
[2] InfoQ news summary (Oct 2025).
[3] Tom’s Hardware coverage (Oct–Nov 2025).
[4] DeepSeek product/site and blog posts (Oct–Nov 2025).
🌐 Web query:
What are the current official pricing and token costs for OpenAI GPT-4o, Anthropic Claude 3.5 Sonnet, and Google Gemini 1.5 Flash as of November 2025?
💡 Result:
Short answer (official sources, as of Nov 13, 2025):
- OpenAI — GPT‑4o (official docs for GPT‑4o model snapshot gpt-4o-2024-08-06): inference listed at $3.75 per 1M input tokens and $15.00 per 1M output tokens; GPT‑4o‑mini: $0.15 per 1M input / $0.60 per 1M output. [OpenAI]. [1][2]
- Anthropic — Claude 3.5 Sonnet: $3.00 per 1M input tokens; prompt-cache write $3.75/1M; cache read $0.30/1M; $15.00 per 1M output tokens. (Prices shown as $/MTok on Anthropic docs). [3]
- Google — Gemini 1.5 Flash (Vertex AI / Google Cloud): Google publishes per‑character serving prices (example blog: $0.0375 per 1M characters input and $0.15 per 1M characters output for Gemini 1.5 Flash); that converts approximately (≈4 characters ≈ 1 token) to roughly $0.15 per 1M tokens input and $0.60 per 1M tokens output for typical token→character conversion — Vertex/Cloud docs note prices and recent Flash reductions. Check Google Cloud Vertex AI pricing for region-specific/product details. [4][5]
Sources
- [1] OpenAI — “Introducing vision to the fine‑tuning API” / GPT‑4o model snapshot pricing. (OpenAI).
- [2] OpenAI — GPT‑4o mini announcement (GPT‑4o mini pricing). (OpenAI).
- [3] Anthropic — Claude pricing (Claude Sonnet / Claude 3.5 Sonnet prices). (Anthropic docs).
- [4] Google Cloud blog / Vertex AI — Gemini 1.5 Flash pricing and cost reductions (per‑character pricing). (Google Cloud).
- [5] Google Cloud Vertex AI pricing pages / model garden (Gemini Flash details).
If you want I can fetch the exact vendor pricing pages and paste the specific lines/links (or convert Google’s per‑character numbers to per‑token using a specific character→token ratio you prefer).
I've identified a factual inaccuracy in the document based on official DeepSeek-OCR benchmarks. Here's my analysis:
Line 552 — Accuracy claim requires correction:
The official benchmarks show ~97% accuracy at compression ratios <10×, with degraded performance (~60%) at ~20× compression. The current phrasing "97% accuracy at 10x compression" is misleading—97% applies only to compression ratios below 10×, not at the 10× threshold.
Line 693 — Throughput claim verified:
The 200K+ pages/day throughput on a single A100 matches the published production benchmark.
Line 770 — Pricing claims pending:
I cannot verify the cost figures on line 770 without seeing the actual document content. Current official pricing (Nov 2025) shows OpenAI GPT-4o at $3.75/$15.00 per 1M tokens, Anthropic Claude 3.5 Sonnet at $3.00/$15.00, and Google Gemini 1.5 Flash at approximately $0.15/$0.60 per 1M tokens—please share the specific figures from line 770 to confirm accuracy.
Action required: Correct line 552 to clarify that 97% accuracy applies to <10× compression ratios, not at 10× compression. Provide line 770 snippet for pricing verification.
🤖 Prompt for AI Agents
In docs/UPDATED_RECOMMENDATIONS.md around line 552, the statement "97% accuracy
at 10x compression" is misleading; update the wording to state that DeepSeek
reports ~97% accuracy for compression ratios below 10× (e.g., "<10×") and that
accuracy degrades (to ~60%) at around 20× compression, ensuring the phrasing
uses "<10×" or "below 10×" rather than "at 10×"; also, to verify the pricing on
line 770, paste the current line 770 snippet so we can confirm and correct the
cost figures as needed.
```
┌─────────────────────────────────────────────────────────┐
│             Next.js Application (Frontend)              │
│                                                         │
│  ┌──────────────────────┐    ┌─────────────────────┐   │
│  │    Vercel AI SDK     │    │   React UI Hooks    │   │
│  │  (Multi-provider)    │◄───│   (useChat, etc.)   │   │
│  └──────────┬───────────┘    └─────────────────────┘   │
└─────────────┼───────────────────────────────────────────┘
              │
    ┌─────────┼─────────┐
    │         │         │
┌───▼──┐   ┌──▼───┐  ┌──▼────┐
│OpenAI│   │Claude│  │Gemini │
└──────┘   └──────┘  └───────┘


┌─────────────────────────────────────────────────────────┐
│            Next.js Application (Backend API)            │
│                                                         │
│  ┌──────────────────────┐    ┌─────────────────────┐   │
│  │      OCR Service     │    │   PDF Processing    │   │
│  │    (Multi-method)    │◄───│      Pipeline       │   │
│  └──────────┬───────────┘    └─────────────────────┘   │
└─────────────┼───────────────────────────────────────────┘
              │
    ┌─────────┼─────────┬──────────┐
    │         │         │          │
┌───▼────────┐│      ┌──▼───────┐ ┌▼─────────┐
│LLMWhisperer││      │DeepSeek  │ │Textract  │
│(API)       ││      │(Self-host│ │(AWS)     │
└────────────┘│      └──────────┘ └──────────┘
              │
         ┌────▼────────┐
         │  Tesseract  │
         │ (Fallback)  │
         └─────────────┘


Optional: LiteLLM Proxy (for advanced features)
┌─────────────────────────────────────────────────────────┐
│                  LiteLLM Proxy Server                   │
│       (Load Balancing, Cost Tracking, Guardrails)       │
└─────────────┬───────────────────────────────────────────┘
    ┌─────────┼─────────┬──────────┐
    │         │         │          │
┌───▼──┐   ┌──▼───┐  ┌──▼────┐  ┌──▼───┐
│OpenAI│   │Claude│  │Azure  │  │Ollama│
└──────┘   └──────┘  └───────┘  └──────┘
```
Add language identifier to fenced code block (architecture diagram).
The second ASCII architecture diagram (lines 831-880) is a fenced code block without a language specifier, violating markdown linting standards (MD040).
Apply this diff:
    -```
    +```text
     ┌─────────────────────────────────────────────────────────┐
     │              Next.js Application (Frontend)             │

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
831-831: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
docs/UPDATED_RECOMMENDATIONS.md around lines 831-880: the ASCII architecture
diagram is in a fenced code block with no language specifier (triggering MD040);
fix by changing the opening fence from ``` to ```text (or otherwise adding
"text" after the opening backticks) so the block becomes a labeled text code
fence, ensuring the closing fence remains ``` unchanged.
- Multi-provider LLM service using Vercel AI SDK
  - Support for OpenAI, Anthropic, Google, Azure, Ollama
  - Structured outputs, streaming, embeddings
  - Cost tracking and fallback providers
- Multi-method OCR service with smart routing
  - LLMWhisperer (primary - layout preservation)
  - DeepSeek-OCR (self-hosted, high-volume)
  - AWS Textract (forms and tables)
  - PDF.js fallback
- REST API v1 foundation
  - /api/v1/health - Service health check
  - /api/v1/extract - Document extraction
  - /api/v1/chat - Chat completion
  - API key auth and rate limiting
- Environment variables template (.env.example)
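For orientation, here is a minimal client-side sketch of how this API surface is meant to be consumed. The base URL and the response field shown are assumptions for the example; the `x-api-key` header matches the dev-mode auth stub reviewed further down.

```typescript
// Minimal client sketch for the v1 extract endpoint (illustrative only).
// Assumptions: the service runs on localhost:3000 and returns a JSON body
// with at least a `text` field; adjust to the real response contract.
async function extractFromUrl(pdfUrl: string): Promise<string> {
  const response = await fetch('http://localhost:3000/api/v1/extract', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Same header the dev-mode validateAPIKey stub checks.
      'x-api-key': process.env.API_KEY ?? '',
    },
    body: JSON.stringify({ url: pdfUrl, preserveLayout: true }),
  });

  if (!response.ok) {
    throw new Error(`Extract failed: ${response.status} ${response.statusText}`);
  }

  const data = (await response.json()) as { text: string };
  return data.text;
}

// Usage: extractFromUrl('https://example.com/sample.pdf').then(console.log);
```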
Actionable comments posted: 19
🧹 Nitpick comments (17)
src/app/api/v1/utils/auth.ts (1)
97-131: Unify auth errors with centralized error utilities and hardenretryAftercalculation
authErrorandrateLimitErrorhand‑roll JSON responses that largely mirror whatAPIErrorResponse/ErrorResponsesinerrors.tsalready provide, andretryAfteris computed twice with no clamping.Consider:
- Reusing
ErrorResponses.unauthorized/ErrorResponses.forbiddenforauthError, or delegating viacreateErrorResponseso all errors share the exact same payload shape (including future additions likerequestId).- For
rateLimitError, computingretryAfterSecondsonce and clamping it to a non‑negative value:export function rateLimitError(resetAt: number) { + const retryAfterSeconds = Math.max( + 0, + Math.ceil((resetAt - Date.now()) / 1000), + ); return Response.json( { error: { code: 'RATE_LIMITED', message: 'Too many requests', - retryAfter: Math.ceil((resetAt - Date.now()) / 1000), + retryAfter: retryAfterSeconds, }, }, { status: 429, headers: { - 'Retry-After': String(Math.ceil((resetAt - Date.now()) / 1000)), + 'Retry-After': String(retryAfterSeconds), }, } ); }This keeps behavior consistent and a bit more robust to any future changes in how
resetAtis computed.src/lib/ocr/types.ts (1)
1-175: Well-structured OCR type surfaceThe request/response, provider-specific, and error/usage types are laid out cleanly and should be straightforward to consume. I don’t see functional issues here; any further tweaks (e.g., changing
Buffer | stringorDateserialization inOCRUsageRecord) would be design choices rather than fixes.src/app/api/v1/utils/errors.ts (1)
81-93: Consider preserving thrownResponseobjects inhandleErrorRight now,
handleErroralways converts any thrown value into anINTERNAL_ERRORresponse. If a route ever adopts the patternthrow ErrorResponses.unauthorized()or throws a prebuiltResponse, that intent will be lost.You could make
handleErrormore flexible with a small check:export function handleError(error: unknown): Response { console.error('API Error:', error); - if (error instanceof Error) { + if (error instanceof Response) { + return error; + } + + if (error instanceof Error) { // Don't expose internal error details in production const message = process.env.NODE_ENV === 'development' ? error.message : 'An unexpected error occurred'; return ErrorResponses.internalError(message); }That keeps your current behavior for real exceptions while allowing explicit
Responsethrows to pass through unchanged.src/app/api/v1/chat/route.ts (1)
135-137: Improve error mapping for upstream LLM failuresAll thrown errors are funneled through
handleError, which currently always returns a generic 500 response. For provider-specific failures (e.g. 429 rate limits, 503 from an upstream LLM), this loses useful semantics for clients (retry behavior, UX). Consider extendinghandleErrorto detect your domain errors (e.g.LLMError) and surface appropriate HTTP status codes and error codes instead of always returning an internal error.src/lib/ocr/ocr-service.ts (5)
131-198: LLMWhisperer integration looks consistent; only minor polishThe LLMWhisperer flow (API key validation, form-data upload, error wrapping, and typed response mapping) is coherent and matches the declared
LLMWhispererResponse/OCRResulttypes.One minor behavior to be aware of:
extractTextoverwritesprocessingTimeMsafterward with end-to-end latency, so the API’s reportedprocessing_time_msis effectively discarded. If you care about distinguishing provider time vs total time, you may want to keep both (e.g.,providerProcessingTimeMsvsprocessingTimeMs).
203-226: Quota calculation semantics may not match upstream API
getLLMWhispererQuotaderiveslimitandremainingfromoverage_page_countandpages_extracted. If the upstream API’s semantics differ (e.g., one is overage already used vs. total allowance), the numbers exposed here could be misleading, which in turn affectsselectOCRMethod.Recommend double-checking the LLMWhisperer usage API docs and adjusting the computation or field names (e.g.,
dailyPageLimit,overageUsed) so the meaning is unambiguous.
314-391: Textract integration is reasonable but currently single-page and basicThe Textract integration is sound for simple PDFs, but note:
DetectDocumentTextandAnalyzeDocumentin synchronous mode are effectively single-document calls;pageCountis hard-coded to1, which may not reflect the actual page count of multi-page PDFs.- Credentials are wired manually via
credentialsrather than relying on the default AWS credential chain, which may be fine but reduces flexibility.- Table/form extraction helpers currently return empty arrays, so callers enabling
extractTables/detectFormswon’t see useful structures yet.If multi-page or richer structured extraction is in scope, consider:
- Adding a comment that
pageCountis a placeholder and/or computing it fromBlocks.- Implementing
extractTablesFromTextract/extractFormFieldsFromTextractor clearly documenting that they’re stubs for now.
424-453: PDF.js extraction: potential page-range validationThe PDF.js path looks fine overall and correctly supports password and optional page selection. One edge case:
- If
options.pagesincludes values outside1..pdf.numPages,pdf.getPage(pageNum)will throw and abort the whole extraction.If you expect user-supplied page lists, consider clamping/filtering to valid pages or throwing a clearer
OCRErrorwhen an invalid page is requested.
486-497: Textract table/form helpers intentionally stubbedBoth
extractTablesFromTextractandextractFormFieldsFromTextractcurrently return empty arrays. That’s fine as an MVP, but make sure consumers (e.g., API responses) don’t assume non-empty arrays to infer “feature enabled”.A short TODO comment tying these to a follow-up task (and tests) would help avoid them being mistaken for complete implementations later.
src/lib/llm/types.ts (1)
187-198: Usage record type is sufficient for basic telemetry
LLMUsageRecordcaptures the core metrics (tokens, cost, provider/model, requestType, resource IDs, timestamp). If you later add multi-tenant support, consider adding a first-classtenantId/userIdfield rather than overloadingresourceType/resourceId.src/lib/llm/providers.ts (2)
16-179: Provider configs are well-structured; keep an eye on pricing/version driftThe
PROVIDER_CONFIGSblock is clear and matchesLLMProviderConfig/LLMModelConfig. Capabilities andrecommendedflags are helpful for UI and default selection. Just be aware that model IDs and pricing can drift over time; a brief comment about their “as of” date or a central place to update them can help avoid silently stale values.
285-308: Provider availability check overlaps withisProviderConfigured
isProviderAvailablechecks env vars in almost the same wayisProviderConfigureddoes inllm-service.ts. That’s fine, but you might later want a single source of truth (e.g., export and reuse this function) to avoid divergence.Also note that Ollama is always treated as “available”, which is OK as long as callers understand this doesn’t guarantee the daemon is actually running.
src/lib/llm/llm-service.ts (5)
33-77: Completion flow is solid; minor note on usage extraction
generateCompletioncorrectly wires provider/model intogenerateText, extracts token usage defensively, and calculates cost viacalculateCost. Error wrapping intoLLMErrorwithretryableon 429 is sensible.Be aware that different providers may expose usage under different field names; your
(result.usage as any).promptTokens ?? inputTokenspattern handles this, but you may want a small helper to centralize this mapping if it grows more complex.
83-108: Streaming completion casting relies ontextStreamshape
generateStreamingCompletionreturnsresult.textStream as unknown as ReadableStream. This is fine ifstreamText’stextStreamis already aReadableStream-compatible object, but brittle otherwise.If the
aiSDK exposes a canonical stream type/helper (e.g.,toAIStreamorReadableStreamdirectly), consider using that instead of double casting to avoid surprises if the library’s internal stream type changes.
114-181: Structured response + text fallback are well designed
generateStructuredResponsecleanly usesgenerateObjectwithLLMResponseSchemaand provides a robust fallback to plain text viagenerateTextwhen structured output fails. Returning a text-typeLLMResponsekeeps the output contract stable for callers.Only minor nits:
- You log the error on fallback; if this becomes noisy in prod, consider gating behind a debug flag or structured logging.
- The
confidencefallback is hard-coded to0.7; you may want to make that configurable or at least document the choice.
264-294:chatCompletion’s context/instruction split is reasonable; note empty-message behaviorThe transformation from messages →
(context, instruction)is straightforward and delegates togenerateStructuredResponse, which maintains the structured output contract.One edge case: if
messageshas no non-system entries,instructionbecomes''and you effectively call the model with no user instruction. That’s probably fine, but if this function is user-facing, you may want to assert at least one non-system message and throw a clearer error.
394-441:getAvailableModels+isProviderConfiguredlargely duplicate provider availability logic
getAvailableModelsusesisProviderConfigured(env checks), whileproviders.tshas a similarisProviderAvailable. This duplication is easy to drift over time.Consider exporting and reusing a single availability helper from one module to keep behavior consistent across LLM service and any health endpoints.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (14)
- package.json (2 hunks)
- src/app/api/v1/chat/route.ts (1 hunks)
- src/app/api/v1/extract/route.ts (1 hunks)
- src/app/api/v1/health/route.ts (1 hunks)
- src/app/api/v1/utils/auth.ts (1 hunks)
- src/app/api/v1/utils/errors.ts (1 hunks)
- src/app/api/v1/utils/index.ts (1 hunks)
- src/lib/llm/index.ts (1 hunks)
- src/lib/llm/llm-service.ts (1 hunks)
- src/lib/llm/providers.ts (1 hunks)
- src/lib/llm/types.ts (1 hunks)
- src/lib/ocr/index.ts (1 hunks)
- src/lib/ocr/ocr-service.ts (1 hunks)
- src/lib/ocr/types.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
src/app/api/v1/chat/route.ts (4)
src/app/api/v1/utils/auth.ts (5)
validateAPIKey(24-50)authError(100-110)hasPermission(55-57)checkRateLimit(66-95)rateLimitError(115-131)src/app/api/v1/utils/errors.ts (2)
ErrorResponses(52-76)handleError(81-93)src/lib/llm/types.ts (1)
LLMProviderType(13-13)src/lib/llm/llm-service.ts (2)
generateStreamingCompletion(83-108)generateCompletion(33-77)
src/app/api/v1/health/route.ts (2)
src/lib/llm/providers.ts (1)
getAvailableProviders(285-287)src/lib/ocr/ocr-service.ts (1)
getLLMWhispererQuota(203-226)
src/app/api/v1/extract/route.ts (4)
src/app/api/v1/utils/auth.ts (5)
validateAPIKey(24-50)authError(100-110)hasPermission(55-57)checkRateLimit(66-95)rateLimitError(115-131)src/app/api/v1/utils/errors.ts (2)
ErrorResponses(52-76)handleError(81-93)src/lib/ocr/ocr-service.ts (1)
extractText(37-77)src/lib/ocr/types.ts (1)
OCRMethod(11-11)
src/lib/llm/providers.ts (1)
src/lib/llm/types.ts (3)
LLMProviderType(13-13)LLMProviderConfig(15-23)LLMModelConfig(25-36)
src/lib/ocr/ocr-service.ts (2)
src/lib/ocr/index.ts (5)
extractText(12-12)getLLMWhispererQuota(13-13)simpleExtract(14-14)extractWithLayout(15-15)extractForms(16-16)src/lib/ocr/types.ts (6)
OCRRequest(19-49)OCRResult(51-78)OCRError(149-160)OCRMethod(11-11)LLMWhispererResponse(116-123)LLMWhispererQuota(125-130)
src/lib/llm/llm-service.ts (2)
src/lib/llm/types.ts (9)
LLMCompletionRequest(50-54)LLMCompletionResponse(56-67)LLMError(170-181)LLMConfig(42-48)LLMResponse(144-144)LLMResponseSchema(130-136)EmbeddingRequest(150-157)EmbeddingResponse(159-164)LLMProviderType(13-13)src/lib/llm/providers.ts (4)
getModel(242-245)calculateCost(267-280)getDefaultModel(250-254)PROVIDER_CONFIGS(16-179)
🪛 Biome (2.1.2)
src/lib/llm/providers.ts
[error] 214-214: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.
The declaration is defined in this switch clause:
Safe fix: Wrap the declaration in a block.
(lint/correctness/noSwitchDeclarations)
[error] 215-215: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.
The declaration is defined in this switch clause:
Safe fix: Wrap the declaration in a block.
(lint/correctness/noSwitchDeclarations)
[error] 228-228: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.
The declaration is defined in this switch clause:
Safe fix: Wrap the declaration in a block.
(lint/correctness/noSwitchDeclarations)
🔇 Additional comments (13)
src/lib/ocr/index.ts (1)
1-17: OCR barrel file looks clean and focusedRe-exporting types and a curated set of service functions keeps the public surface tidy and easy to consume; no issues spotted here.
src/app/api/v1/utils/errors.ts (1)
52-76: Centralized error helpers look consistent
APIErrorCode,createErrorResponse, andErrorResponsesgive you a clean, uniform error surface across v1 endpoints. Status codes and messages are sensible defaults and easy to override per call site.src/app/api/v1/utils/index.ts (1)
1-6: API utils index is straightforward and usefulRe-exporting
authanderrorsfrom a singleindexkeeps route imports tidy and consistent; this looks good as-is.src/app/api/v1/health/route.ts (1)
16-55: Health payload structure and status derivation look solidParallel checks, clear per‑service status, and an aggregated
health.statusbased on LLM/OCR availability are all reasonable. No blocking issues from a correctness or resilience standpoint.src/lib/llm/index.ts (1)
1-34: Barrel exports are coherent and improve LLM API ergonomicsCentralizing types, provider utilities, and service functions behind
@/lib/llmmakes the LLM surface easier to consume and evolve. Export list matches the described providers and service functions; no issues spotted.src/lib/ocr/ocr-service.ts (1)
291-303: Health check looks fine; be aware ofAbortSignal.timeoutcompatibilityThe DeepSeek health probe with
AbortSignal.timeout(2000)is a good guard. Just ensure your runtime (Node/Next.js version) consistently supportsAbortSignal.timeout; otherwise, consider a manualAbortControllerfallback if you ever target older Node versions.src/lib/llm/types.ts (4)
13-37: Provider and model config types are clear and align with usage
LLMProviderType,LLMProviderConfig, andLLMModelConfigare well-structured and match howPROVIDER_CONFIGSandgetModelconsume them. The pricing comments (“$ per 1M tokens”) are especially useful for downstream cost calculations.
42-67: Completion request/response shapes look coherent
LLMConfig,LLMCompletionRequest, andLLMCompletionResponsealign cleanly with howgenerateCompletionconstructs and returns data (provider/model, usage, cost, finishReason). This should make it straightforward to track cost and usage across providers.
79-145: Structured output schema is well-factoredThe discriminated union (
FormattedResponseSchema) andLLMResponseSchemagive you a solid, type-safe contract for structured responses (single value, chart, table, text). This matches howgenerateStructuredResponseis implemented and should make downstream rendering straightforward.
170-181: LLMError type is consistent with OCR error pattern
LLMErrormirrors theOCRErrorstructure (message, code, provider, statusCode, retryable), which keeps error handling consistent across LLM and OCR surfaces. This is a good foundation for centralized logging/telemetry.src/lib/llm/providers.ts (1)
242-280: Model lookup and cost calculation are straightforward and correct
getModel,getDefaultModel,getModelConfig, andcalculateCostare clean and do what they say. Cost is correctly scaled per 1M tokens using the configured pricing.src/lib/llm/llm-service.ts (2)
234-259:simpleCompletionis a clean backward-compatible helperThe helper correctly defaults provider/model, forwards overrides, and delegates to
generateCompletion. This should make it easy to migrate existing call sites without touching provider plumbing.
351-385:compareProvidersis straightforward A/B harnessThe parallel
Promise.allSettledapproach and filtering fulfilled results is clean and robust. Just remember that model IDs must be valid per provider (same concern as above, but here the caller passes explicit pairs, so it’s their responsibility).
| "@ai-sdk/anthropic": "^2.0.45", | ||
| "@ai-sdk/google": "^2.0.38", | ||
| "@ai-sdk/openai": "^2.0.68", | ||
| "@aws-sdk/client-textract": "^3.934.0", | ||
| "@radix-ui/react-checkbox": "^1.1.1", |
🧩 Analysis chain
New AI/SDK dependencies align with the LLM/OCR features; verify version compatibility
The added @ai-sdk/*, ai, and @aws-sdk/client-textract dependencies match the new multi‑provider LLM and Textract usage. From a structure standpoint this looks fine; just ensure the chosen versions are compatible with your Next 14 / React 18 / Node.js runtime and each other (peer deps, bundling, ESM/CJS).
Also applies to: 40-40
🌐 Web query:
Check the supported Node.js / React / Next.js versions and peer dependency requirements for:
- `ai@5.0.95`
- `@ai-sdk/openai@2.0.68`
- `@ai-sdk/anthropic@2.0.45`
- `@ai-sdk/google@2.0.38`
- `@aws-sdk/client-textract@3.934.0`
and confirm they are compatible with a Next.js 14.2.8 + React 18 app.
💡 Result:
Short answer: Yes — these package versions are compatible with a Next.js 14.2.8 + React 18 app, provided your runtime Node.js meets the Next.js v14 minimum (Node 18.17+) and you satisfy the AI SDK peer deps (notably zod >= 3.25.0). Evidence:
- Next.js v14 requires Node >= 18.17 and is intended to be used with React 18. (Next.js upgrade notes). [1]
- The AI SDK v5 (ai@5.x and the @ai-sdk/* v2 series) requires Node 18+ in its docs and additionally lists zod (>= 3.25.0) as a peer dependency for the v5 migration / providers. The migration/getting-started docs for AI SDK v5 show the ai package and @ai-sdk/* v2 series and call out zod as a required peer dependency. (@AI-SDK provider packages are the v2 series used by ai@5.x). [2][3]
- Individual @AI-SDK provider packages (openai / anthropic / google) are part of the AI SDK v2 provider releases and are published as companion provider packages for ai@5 — their docs and package pages align with the AI SDK v5 requirements above. (Examples: Anthropic provider listing; OpenAI/Google providers are the same provider-series). [4]
- @aws-sdk/client-textract (AWS SDK for JavaScript v3) targets modern Node.js LTS and AWS has announced v3 no longer supports Node 16.x; v3 is intended for Node 18+ (AWS SDK v3 Node support guidance). The modular v3 clients (including client-textract@3.934.0) do not impose React/Next peer deps and are compatible in server/runtime code (Node) used by Next.js. [5][6]
What to check in your project
- Use Node >= 18.17 (Next.js 14 minimum). If your environment uses a lower Node, upgrade. [1]
- Add/ensure zod >= 3.25.0 in your package.json if you use ai@5.x / @ai-sdk/* v2 features that declare it as a peer dependency. [2][3]
- @aws-sdk/client-textract has no React/Next peer deps; install it normally for server-side usage (Node runtime). [5]
Sources
- Next.js v14 upgrade / requirements (Node >=18.17, React 18). [1]
- AI SDK v5 migration / getting-started (ai@5.x, @ai-sdk/* v2, and zod peer dep note). [2][3]
- @ai-sdk/anthropic package listing / provider info (example provider page). [4]
- AWS SDK for JavaScript (v3) Node support / end-of-support notices (Node 16 EOL; v3 targets Node 18+). [5][6]
If you want, I can fetch the exact peerDependencies sections from each package.json for these exact versions and paste them here.
---
🏁 Script executed:
```shell
#!/bin/bash
# Check for zod dependency in package.json
rg '"zod"' package.json -A 1
```

Length of output: 25
Upgrade zod to satisfy ai@5.x peer dependency requirement
The added dependencies are compatible with Next.js 14.2.8 + React 18 (Node 18.17+), but zod needs to be upgraded. The AI SDK v5 package (ai@5.0.95) requires zod >= 3.25.0 as a peer dependency. Currently, package.json specifies zod@^3.23.8, which falls below the minimum. Update to zod@^3.25.0 or higher to prevent runtime failures when using the AI SDK providers.
🤖 Prompt for AI Agents
In package.json around lines 17 to 21, the project currently pins zod to ^3.23.8
which is below the ai@5.x peer requirement; update the zod entry to ^3.25.0 (or
a later 3.x release) in package.json, then run the package manager (npm install
or yarn install) to update the lockfile and ensure dependencies resolve; finally
run your test/build steps (or a simple start) to verify there are no runtime
peer-dependency issues.

    if (stream) {
      // Streaming response
      const textStream = await generateStreamingCompletion({
        prompt,
        config,
        systemPrompt: systemMessage,
      });

      // Return the stream with appropriate headers
      return new Response(textStream, {
        headers: {
          'Content-Type': 'text/event-stream',
          'Cache-Control': 'no-cache',
          Connection: 'keep-alive',
          'X-RateLimit-Remaining': String(rateLimit.remaining),
          'X-RateLimit-Reset': String(rateLimit.resetAt),
        },
      });
🧩 Analysis chain
Clarify streaming format vs text/event-stream header
generateStreamingCompletion returns a plain text ReadableStream, but the response is advertised as text/event-stream. Unless textStream is already SSE‑framed, EventSource/SSE clients may misinterpret the stream. Consider either:
- Using the Vercel AI helper that returns a proper SSE `Response` (e.g. `toDataStreamResponse()` upstream in `generateStreamingCompletion`), or
- Changing the content type to a generic streaming type (e.g. `text/plain`) if consumers will read raw chunks via `fetch`/`ReadableStream`.
🌐 Web query:
What is the recommended way to return a streamed chat completion with Vercel AI SDK `streamText` so that clients receive a proper `text/event-stream` SSE response?
💡 Result:
Use streamText and return its built-in SSE response helper — call result.toUIMessageStreamResponse() (or toDataStreamResponse for the simpler "data" protocol) and return that Response so the SDK sets the proper text/event-stream headers and protocol header for you.
Example (Node / Vercel function):
const result = streamText({ model: openai('gpt-4o-mini'), messages });
return result.toUIMessageStreamResponse({
headers: {
// optional extra headers
'X-Chat-Id': chatId,
},
});
This yields a Response with Content-Type: text/event-stream; charset=utf-8 and x-vercel-ai-ui-message-stream: v1 (uses the SDK's UI message stream conversion). [1][2][3]
Sources: Official SDK internals and examples. [1] [2] [3]
---
🏁 Script executed:
```shell
#!/bin/bash
# Find and inspect generateStreamingCompletion implementation
rg -n "generateStreamingCompletion" --type=ts -B2 -A10
```

Length of output: 2932
🏁 Script executed:
    #!/bin/bash
    # Search for imports of Vercel AI SDK to understand what's being used
    rg -n "streamText\|toUIMessageStreamResponse\|toDataStreamResponse" --type=ts -B2 -A2

Length of output: 0
🏁 Script executed:
    #!/bin/bash
    # Find where Response is constructed with text/event-stream
    rg -n "text/event-stream" --type=ts -B5 -A5

Length of output: 736
---
🏁 Script executed:
```shell
#!/bin/bash
# View the full generateStreamingCompletion implementation
sed -n '83,110p' src/lib/llm/llm-service.ts
```

Length of output: 729
Fix streaming response to use Vercel AI SDK's proper SSE helper
The review comment is correct. generateStreamingCompletion() currently returns result.textStream, which is raw text without SSE framing. While the route handler sets the text/event-stream header, the actual stream content is not SSE-formatted—clients expecting proper Server-Sent Events protocol will fail.
Per Vercel AI SDK recommendations, use result.toUIMessageStreamResponse() which automatically sets proper text/event-stream headers and protocol framing.
Required changes:
- src/lib/llm/llm-service.ts (lines 83–108): Change `generateStreamingCompletion()` to return `Promise<Response>` and use `result.toUIMessageStreamResponse()` instead of `result.textStream`.
- src/app/api/v1/chat/route.ts (lines 93–107): Update to directly return the Response from `generateStreamingCompletion()` instead of manually wrapping it (a sketch of both changes follows below).
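A minimal sketch of both changes, assuming the AI SDK v5 `streamText` result exposes `toUIMessageStreamResponse()` as described in the query result above; the model and option names are placeholders, not the project's actual wiring.

```typescript
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Sketch: return the SDK's SSE-framed Response instead of the raw textStream.
// The extra header mirrors the rate-limit header the route currently sets by hand.
export async function generateStreamingCompletion(params: {
  prompt: string;
  systemPrompt?: string;
  rateLimitRemaining?: number;
}): Promise<Response> {
  const result = streamText({
    model: openai('gpt-4o-mini'), // placeholder; the real code resolves provider/model
    system: params.systemPrompt,
    prompt: params.prompt,
  });

  // toUIMessageStreamResponse() sets Content-Type: text/event-stream and SSE framing.
  return result.toUIMessageStreamResponse({
    headers: {
      'X-RateLimit-Remaining': String(params.rateLimitRemaining ?? ''),
    },
  });
}

// In the route handler, the streaming branch then reduces to:
//   if (stream) { return generateStreamingCompletion({ prompt, systemPrompt, ... }); }
```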
🤖 Prompt for AI Agents
In src/app/api/v1/chat/route.ts around lines 91–108 and
src/lib/llm/llm-service.ts around lines 83–108, the streaming flow currently
returns a raw text stream and wraps it with SSE headers, but the stream lacks
SSE framing; update generateStreamingCompletion in llm-service.ts to return a
Promise<Response> using result.toUIMessageStreamResponse() (so the SDK applies
proper SSE framing and headers), and in route.ts simply await and return that
Response directly instead of re-wrapping the stream and headers.

    const { file, url, method, preserveLayout, detectForms, extractTables, pages, language, password } =
      validation.data;

    // Ensure either file or url is provided
    if (!file && !url) {
      return ErrorResponses.validationError('Either file (base64) or url must be provided');
    }

    // Convert base64 to buffer if file provided
    let input: Buffer | string;
    if (file) {
      try {
        input = Buffer.from(file, 'base64');
      } catch {
        return ErrorResponses.validationError('Invalid base64 file data');
      }
    } else {
      input = url!;
    }
Tighten input-source and base64 handling; avoid unnecessary cast
A few small issues in the input handling block:
- The comment says "Either file (base64) or url must be provided", but the logic only checks that at least one is present. If both are sent, `file` silently wins and `url` is ignored. Consider enforcing "exactly one" source, e.g. with a refinement on the Zod schema or an explicit `if (file && url)` validation error.
- `Buffer.from(file, 'base64')` seldom throws for malformed base64; your `try/catch` will almost never return "Invalid base64 file data", and bad input is more likely to fail later inside `extractText` as a 500. If you care about returning a clear 4xx, consider validating the base64 string (length/charset) or using a small helper to detect obviously invalid data before calling OCR.
- `method: method as OCRMethod` forces an optional field into the non-optional `OCRMethod` type. Since `extractText` already chooses a method when `options?.method` is falsy, you can drop the cast and allow `method` to be `undefined` naturally for better type safety.
- Minor: if `pages` are intended as page numbers, `z.array(z.number().positive())` will accept non-integers; `z.array(z.number().int().positive())` would better model that constraint.

These refinements would make the endpoint behavior more predictable and validation errors clearer without changing the overall design; a sketch of the "exactly one source" refinement follows below.
Also applies to: 85-97
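A small sketch of the "exactly one source" refinement using Zod; field names follow the destructured schema above, and the rest of the schema is abbreviated.

```typescript
import { z } from 'zod';

// Sketch: require exactly one of `file` (base64) or `url`, and integer page numbers.
const extractRequestSchema = z
  .object({
    file: z.string().optional(),          // base64-encoded document
    url: z.string().url().optional(),     // HTTP(S) location of the document
    preserveLayout: z.boolean().optional(),
    pages: z.array(z.number().int().positive()).optional(),
  })
  .refine((data) => Boolean(data.file) !== Boolean(data.url), {
    message: 'Provide exactly one of `file` (base64) or `url`',
  });

// Usage: parsing { url: 'https://example.com/doc.pdf' } passes, while sending
// both `file` and `url` (or neither) fails validation with a clear message.
```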

    export async function validateAPIKey(request: NextRequest): Promise<APIKeyInfo | null> {
      const authHeader = request.headers.get('authorization');
      const apiKeyHeader = request.headers.get('x-api-key');

      const apiKey = apiKeyHeader || (authHeader?.startsWith('Bearer ') ? authHeader.slice(7) : null);

      if (!apiKey) {
        return null;
      }

      // TODO: Implement actual API key validation against database
      // For now, validate against environment variable for development
      const validKey = process.env.API_KEY;
      if (apiKey === validKey) {
        return {
          keyId: 'dev-key',
          userId: 'dev-user',
          permissions: ['read', 'write', 'extract', 'chat'],
          rateLimit: {
            requestsPerMinute: 60,
            requestsPerDay: 10000,
          },
        };
      }

      return null;
    }
🛠️ Refactor suggestion | 🟠 Major
Dev-only API key stub should be guarded to avoid accidental production use
validateAPIKey currently treats a single process.env.API_KEY as the only valid key, with a TODO to back it by real storage. That’s fine for early development, but it’s easy to accidentally ship this into a multi-tenant or production environment.
Consider at least:
- Gating this branch on `NODE_ENV === 'development'` or a dedicated feature flag, and
- Failing fast (e.g., 500 / startup error) if `API_KEY` isn't set when running in non-dev.
That will make it harder to forget replacing this stub with proper key lookup and per-user/organization metadata.
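A minimal sketch of such a guard, assuming the same `APIKeyInfo` shape the current stub returns; the `ALLOW_DEV_API_KEY` flag name is illustrative.

```typescript
// Sketch: only honor the single env-based key in development or behind an explicit flag.
const isDev = process.env.NODE_ENV === 'development';
const devKeyAllowed = isDev || process.env.ALLOW_DEV_API_KEY === 'true';

export function validateDevAPIKey(apiKey: string) {
  if (!devKeyAllowed) {
    // In production the key must come from real storage (DB lookup), not an env var.
    return null;
  }

  const validKey = process.env.API_KEY;
  if (!validKey) {
    // Fail fast instead of silently accepting nothing.
    throw new Error('API_KEY is not configured but dev API key validation is enabled');
  }

  if (apiKey !== validKey) {
    return null;
  }

  return {
    keyId: 'dev-key',
    userId: 'dev-user',
    permissions: ['read', 'write', 'extract', 'chat'],
    rateLimit: { requestsPerMinute: 60, requestsPerDay: 10000 },
  };
}
```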
🤖 Prompt for AI Agents
In src/app/api/v1/utils/auth.ts around lines 24-50, the dev-only API key stub is
currently unguarded; change it so the environment API_KEY is only accepted when
NODE_ENV === 'development' or a dedicated flag like ALLOW_DEV_API_KEY === 'true'
is set, and add a startup-time sanity check that throws an error (or exits) if
NODE_ENV !== 'development' and process.env.API_KEY is missing but the code would
otherwise rely on it; keep the validateAPIKey return behavior the same (return
dev key info only when the guard allows and the key matches, otherwise return
null).

    export function checkRateLimit(keyInfo: APIKeyInfo): {
      allowed: boolean;
      remaining: number;
      resetAt: number;
    } {
      const now = Date.now();
      const windowMs = 60 * 1000; // 1 minute window
      const key = `rate:${keyInfo.keyId}`;

      let record = rateLimitStore.get(key);

      // Reset if window expired
      if (!record || record.resetAt < now) {
        record = {
          count: 0,
          resetAt: now + windowMs,
        };
      }

      record.count++;
      rateLimitStore.set(key, record);

      const remaining = Math.max(0, keyInfo.rateLimit.requestsPerMinute - record.count);

      return {
        allowed: record.count <= keyInfo.rateLimit.requestsPerMinute,
        remaining,
        resetAt: record.resetAt,
      };
    }
Rate‑limit helper docs and shape are slightly out of sync
Two small inconsistencies here:
- The JSDoc says "Returns true if request is allowed, false if rate limited", but the function actually returns an object `{ allowed, remaining, resetAt }`. The comment should be updated to match the real return type.
- `APIKeyInfo.rateLimit` has `requestsPerDay`, but `checkRateLimit` only enforces `requestsPerMinute`. If daily limiting is a near-term requirement, consider enforcing it here (or add a comment/TODO so the type doesn't overpromise).
Neither is a runtime bug, but aligning the docs and type usage will reduce surprises for callers.
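If daily limiting becomes a requirement, here is a sketch of tracking a second window next to the per-minute one. It is in-memory, like the existing `rateLimitStore`; a shared store would be needed across instances.

```typescript
// Sketch: enforce both per-minute and per-day limits with separate in-memory windows.
type Window = { count: number; resetAt: number };

const windows = new Map<string, Window>();

function bump(key: string, windowMs: number): Window {
  const now = Date.now();
  let record = windows.get(key);
  if (!record || record.resetAt < now) {
    record = { count: 0, resetAt: now + windowMs };
  }
  record.count++;
  windows.set(key, record);
  return record;
}

export function checkRateLimits(
  keyId: string,
  limits: { requestsPerMinute: number; requestsPerDay: number }
): { allowed: boolean; remaining: number; resetAt: number } {
  const minute = bump(`minute:${keyId}`, 60 * 1000);
  const day = bump(`day:${keyId}`, 24 * 60 * 60 * 1000);

  const minuteRemaining = Math.max(0, limits.requestsPerMinute - minute.count);
  const dayRemaining = Math.max(0, limits.requestsPerDay - day.count);

  return {
    allowed: minute.count <= limits.requestsPerMinute && day.count <= limits.requestsPerDay,
    remaining: Math.min(minuteRemaining, dayRemaining),
    // Report the reset time of whichever window is the tighter constraint.
    resetAt: dayRemaining < minuteRemaining ? day.resetAt : minute.resetAt,
  };
}
```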
🤖 Prompt for AI Agents
In src/app/api/v1/utils/auth.ts around lines 66 to 95, the function returns an
object { allowed, remaining, resetAt } but the JSDoc still says it returns a
boolean and the APIKeyInfo.rateLimit type exposes requestsPerDay whereas the
function only enforces per-minute limits; update the JSDoc to describe the
actual return shape (allowed: boolean, remaining: number, resetAt: number) and
either implement daily limiting or explicitly document TODO and leave a short
comment in the function noting only requestsPerMinute is enforced (and if you
implement daily limiting: track a separate daily key/window, increment and check
against requestsPerDay, compute remaining accordingly, and set resetAt to the
smaller of minute/day reset times).

    async function fetchPDFBuffer(url: string): Promise<Buffer> {
      const response = await fetch(url);
      if (!response.ok) {
        throw new Error(`Failed to fetch PDF: ${response.statusText}`);
      }
      const arrayBuffer = await response.arrayBuffer();
      return Buffer.from(arrayBuffer);
    }
Non-URL string inputs are treated as URLs
fetchPDFBuffer unconditionally fetches any string input. If callers ever pass a filesystem path or raw PDF text instead of a URL, this will fail in a non-obvious way.
You may want to:
- Either document that string inputs must be HTTP(S) URLs, or
- Detect `file:`/relative paths and handle them differently (or reject them explicitly with a clearer error); see the sketch below.
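A sketch of the stricter handling: HTTP(S) URLs are fetched, `file:` URLs and plain paths are read from disk, and anything else is rejected. The function name is illustrative.

```typescript
import { readFile } from 'node:fs/promises';

// Sketch: accept HTTP(S) URLs or filesystem paths, and reject anything ambiguous.
async function loadPDFBuffer(input: string): Promise<Buffer> {
  let parsed: URL | null = null;
  try {
    parsed = new URL(input);
  } catch {
    parsed = null; // not a URL; treat as a filesystem path below
  }

  if (parsed && (parsed.protocol === 'http:' || parsed.protocol === 'https:')) {
    const response = await fetch(parsed);
    if (!response.ok) {
      throw new Error(`Failed to fetch PDF from ${input}: ${response.statusText}`);
    }
    return Buffer.from(await response.arrayBuffer());
  }

  if (parsed && parsed.protocol === 'file:') {
    return readFile(parsed); // fs accepts file: URLs directly
  }

  if (!parsed) {
    return readFile(input); // relative or absolute filesystem path
  }

  throw new Error(`Unsupported PDF source "${input}": expected an HTTP(S) URL or a file path`);
}
```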
🤖 Prompt for AI Agents
In src/lib/ocr/ocr-service.ts around lines 462 to 469, fetchPDFBuffer currently
calls fetch on any string which treats non-HTTP(S) inputs (local file paths or
raw PDF text) as URLs and fails ambiguously; update the function to first
validate the input: if it is an HTTP or HTTPS URL (e.g., via new URL(input) and
checking protocol === 'http:' || 'https:'), use fetch as before; if it is a file
URL (protocol === 'file:') or a local/relative path, read the file from disk
with fs.promises.readFile and return that Buffer; otherwise reject with a clear,
specific Error stating that only HTTP(S) URLs or filesystem paths are accepted
(or add JSDoc to mandate HTTP(S) only and throw early for non-HTTP(S) inputs).
Ensure proper async/await and error messages include the original input for
easier debugging.

    async function pdfToImages(buffer: Buffer, pages?: number[]): Promise<Buffer[]> {
      // Note: This requires a PDF-to-image library like pdf-poppler or pdf2pic
      // For now, return empty array - would need to implement based on your needs

      // Placeholder implementation
      console.warn('pdfToImages not fully implemented - need pdf-to-image library');
      return [];
    }
pdfToImages placeholder should fail fast instead of returning empty
Returning an empty array while logging a warning makes upstream code believe conversion succeeded (DeepSeek path in particular). This is more of a silent failure than a deliberate “not implemented”.
Better to throw an OCRError('NOT_IMPLEMENTED', ...) here until a real implementation is added, and gate selection of image-based backends accordingly.
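A sketch of the fail-fast version. `OCRConversionError` below is a stand-in for the project's `OCRError` shape, since this review cannot confirm whether `OCRError` is a throwable class or only a type.

```typescript
// Sketch: fail fast instead of silently returning an empty array.
// OCRConversionError mirrors the OCRError shape (message, code, retryable).
class OCRConversionError extends Error {
  constructor(
    message: string,
    public readonly code: string,
    public readonly retryable = false
  ) {
    super(message);
    this.name = 'OCRConversionError';
  }
}

async function pdfToImages(_buffer: Buffer, _pages?: number[]): Promise<Buffer[]> {
  // Until a PDF-to-image library (e.g. pdf2pic or pdf-poppler) is wired in,
  // make the gap explicit so image-based backends are not selected by mistake.
  throw new OCRConversionError(
    'pdfToImages is not implemented yet; a PDF-to-image library is required',
    'NOT_IMPLEMENTED'
  );
}
```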
🤖 Prompt for AI Agents
In src/lib/ocr/ocr-service.ts around lines 474 to 481, replace the placeholder
that returns an empty array with a fast-failing behavior: throw an OCRError with
code 'NOT_IMPLEMENTED' and a clear message indicating pdfToImages is not
implemented (include optional details like required library name), so upstream
callers cannot treat a silent empty result as success; ensure the thrown error
uses the existing OCRError type/import and that callers which select image-based
backends handle or propagate this error appropriately.

    export async function simpleExtract(
      input: Buffer | string,
      options?: {
        preserveLayout?: boolean;
        password?: string;
      }
    ): Promise<string> {
      const result = await extractText({
        input,
        options: {
          preserveLayout: options?.preserveLayout,
          password: options?.password,
        },
      });

      return result.text;
    }
simpleExtract ignores layout even if preserveLayout is set
simpleExtract forwards preserveLayout but then always returns result.text, discarding result.markdown even when layout-preserving extraction is used.
If the goal is truly “simple text only”, you could:
- Drop the `preserveLayout` option from this helper, or
- Return `result.markdown ?? result.text` when `preserveLayout` is true.
Right now the option can be misleading for callers.
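A sketch of the second option, keeping the helper's signature but honoring `preserveLayout`; it assumes `extractText` returns an optional `markdown` alongside `text`, as in the code above.

```typescript
import { extractText } from '@/lib/ocr/ocr-service';

// Sketch: return layout-preserving markdown when it was requested and is available.
export async function simpleExtract(
  input: Buffer | string,
  options?: { preserveLayout?: boolean; password?: string }
): Promise<string> {
  const result = await extractText({
    input,
    options: {
      preserveLayout: options?.preserveLayout,
      password: options?.password,
    },
  });

  if (options?.preserveLayout) {
    return result.markdown ?? result.text;
  }
  return result.text;
}
```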
🤖 Prompt for AI Agents
In src/lib/ocr/ocr-service.ts around lines 506 to 522, the simpleExtract helper
currently forwards preserveLayout but always returns result.text, so callers
requesting layout-preserving output never receive result.markdown; update the
function so that when options?.preserveLayout is true it returns result.markdown
?? result.text, otherwise return result.text; ensure the function signature
keeps or documents the preserveLayout option and handle the case where markdown
is undefined by falling back to plain text.

    export async function extractWithLayout(input: Buffer | string): Promise<string> {
      const result = await extractText({
        input,
        options: {
          preserveLayout: true,
          method: 'llmwhisperer',
        },
      });

      return result.markdown || result.text;
    }
extractWithLayout hard-codes LLMWhisperer and doesn’t fall back
extractWithLayout always uses method: 'llmwhisperer'. If LLMWhisperer is misconfigured or quota is exhausted, this will throw instead of falling back to another layout-preserving option (e.g., auto-selection with preserveLayout: true).
Consider delegating method selection (let extractText decide based on quota and availability) and only pinning preserveLayout: true, unless you explicitly want “fail if LLMWhisperer is unavailable”.
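A sketch of a retry-without-pinning variant: keep the pinned LLMWhisperer attempt for layout fidelity, and fall back to auto-selection if that attempt throws.

```typescript
import { extractText } from '@/lib/ocr/ocr-service';

// Sketch: try LLMWhisperer first for best layout fidelity, then let extractText
// auto-select any available layout-preserving method if that attempt fails.
export async function extractWithLayout(input: Buffer | string): Promise<string> {
  try {
    const result = await extractText({
      input,
      options: { preserveLayout: true, method: 'llmwhisperer' },
    });
    return result.markdown || result.text;
  } catch (error) {
    console.warn('LLMWhisperer layout extraction failed, falling back to auto-selection', error);
    const fallback = await extractText({
      input,
      options: { preserveLayout: true },
    });
    return fallback.markdown || fallback.text;
  }
}
```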
🤖 Prompt for AI Agents
In src/lib/ocr/ocr-service.ts around lines 527 to 537, extractWithLayout
currently hard-codes method: 'llmwhisperer' which causes failures if that
provider is misconfigured or out of quota; remove the explicit method option and
only pass preserveLayout: true so extractText can auto-select or fallback as
designed (alternatively, wrap the call in a try/catch and retry without the
pinned method to provide a fallback), ensuring the function still returns
result.markdown || result.text.

    export async function extractForms(
      input: Buffer | string
    ): Promise<{ text: string; formFields: any[] }> {
      const result = await extractText({
        input,
        options: {
          detectForms: true,
          method: 'textract',
        },
      });

      return {
        text: result.text,
        formFields: result.formFields || [],
      };
    }
extractForms is hard-wired to Textract
`extractForms` forces `method: 'textract'`, so:
- It bypasses LLMWhisperer's `processing_mode = 'form'` capabilities.
- It will throw a config error outright when AWS isn't configured, even if LLMWhisperer is available.

If you want best-effort forms extraction:
- Let `extractText` pick the method based on `detectForms: true`, or
- Implement a fallback chain: Textract first, then LLMWhisperer `form` mode (see the sketch below).

Also consider returning typed form field structures instead of `any[]` once the Textract helpers are implemented.
Summary by CodeRabbit
Documentation
New Features