
Conversation


@github-actions github-actions bot commented Dec 5, 2025

This is an automated pull request to release the candidate branch into production, which will trigger a deployment.
It was created by the [Production PR] action.


comp-ai-code-review bot commented Dec 5, 2025

🔒 Comp AI - Security Review

🔴 Risk Level: HIGH

2 high CVEs in xlsx (Prototype Pollution, ReDoS) and 1 low CVE in ai; .env.example exposes sensitive variable names and plain-HTTP endpoints; the code accepts unsanitized uploads and user content (Buffer.from, filenames, LLM prompts).


📦 Dependency Vulnerabilities

🟠 NPM Packages (HIGH)

Risk Score: 8/10 | Summary: 2 high, 1 low CVEs found

| Package | Version | CVE | Severity | CVSS | Summary | Fixed In |
| --- | --- | --- | --- | --- | --- | --- |
| xlsx | 0.18.5 | GHSA-4r6h-8v6p-xvw6 | HIGH | N/A | Prototype Pollution in sheetJS | No fix yet |
| xlsx | 0.18.5 | GHSA-5pgg-2g8v-p4x9 | HIGH | N/A | SheetJS Regular Expression Denial of Service (ReDoS) | No fix yet |
| ai | 5.0.0 | GHSA-rwvc-j5jr-mgvh | LOW | N/A | Vercel’s AI SDK's filetype whitelists can be bypassed when uploading files | 5.0.52 |

🛡️ Code Security Analysis

View 4 file(s) with issues

🟡 apps/api/.env.example (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | Multiple sensitive env variable names (AWS, OPENAI, DB, UPSTASH, TRIGGER) exposed | MEDIUM |
| 2 | Plaintext env file risk if committed to repo or shared | MEDIUM |
| 3 | Insecure HTTP endpoints for auth (BASE_URL, BETTER_AUTH_URL use http) | MEDIUM |
| 4 | Empty critical vars (DATABASE_URL, API keys) may trigger insecure defaults | MEDIUM |

Recommendations:

  1. Remove runtime .env from version control and add it to .gitignore. Keep only a template (.env.example) with no real secrets and non-sensitive placeholder values.
  2. Rotate any credentials that may have been committed historically and run a secret-scan across the repo and its history (git-secrets, truffleHog, GitHub secret scanning).
  3. Use a secrets manager (AWS Secrets Manager, HashiCorp Vault, or equivalent) or environment-specific secure storage for production credentials rather than plaintext files.
  4. Enforce HTTPS/TLS for production endpoints. Ensure BASE_URL and BETTER_AUTH_URL are configured per-environment (localhost http only for dev, https required for staging/production).
  5. Validate and fail fast on startup if required environment variables are missing or empty (no insecure defaults). Explicitly document required vars and their acceptable formats. A sketch of such a startup check follows this list.
  6. Remove or neutralize any hardcoded keys from repository files and replace with references to secure config. If credentials were leaked, rotate them immediately and revoke old tokens/keys.
  7. Ensure CI/CD and deployment pipelines inject secrets at deploy-time and do not write secrets into logs or artifacts.
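
As an illustration of item 5, a minimal fail-fast startup check might look like the sketch below; the helper name and the exact variable list are assumptions, not taken from this codebase.

```typescript
// Hypothetical startup guard; REQUIRED_ENV_VARS is an illustrative list, not the project's real set.
const REQUIRED_ENV_VARS = ['DATABASE_URL', 'BETTER_AUTH_URL', 'OPENAI_API_KEY'] as const;

export function assertRequiredEnv(): void {
  const missing = REQUIRED_ENV_VARS.filter(
    (name) => !process.env[name] || process.env[name]!.trim() === '',
  );
  if (missing.length > 0) {
    // Fail fast instead of continuing with insecure defaults.
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  // Reject plain-HTTP auth endpoints outside local development.
  if (process.env.NODE_ENV === 'production' && process.env.BETTER_AUTH_URL?.startsWith('http://')) {
    throw new Error('BETTER_AUTH_URL must use https in production');
  }
}
```

Calling assertRequiredEnv() at the top of the bootstrap function turns a missing or empty variable into a startup error rather than a silent fallback.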

🟡 apps/api/src/questionnaire/questionnaire.service.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | No file size/type validation on uploads | MEDIUM |
| 2 | Buffer.from(dto.fileData) used without base64 validation or try/catch | MEDIUM |
| 3 | User-controlled fileName/vendorName used in exports without sanitization | MEDIUM |
| 4 | Raw file content or metadata logged via content/storage loggers | MEDIUM |
| 5 | DTO numeric fields (questionIndex, totalQuestions) lack validation | MEDIUM |
| 6 | Uploaded files persisted without malware scanning or content checks | MEDIUM |

Recommendations:

  1. Validate file type and size before processing or uploading
  2. Validate base64 and wrap Buffer.from in try/catch (see the sketch after this list)
  3. Sanitize/escape filenames and vendorName before export or display
  4. Redact/truncate file contents in logs; avoid logging raw data
  5. Validate DTO fields and add auth, rate-limiting, and malware scan
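
As an illustration of items 1 and 2, assuming dto.fileData arrives as a base64 string, a decode helper might look like the sketch below; the size cap and helper name are illustrative.

```typescript
// Hypothetical helper; MAX_UPLOAD_BYTES and the error messages are assumptions.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // 10 MB cap; adjust to real requirements
const BASE64_RE = /^[A-Za-z0-9+\/]+={0,2}$/;

export function decodeUpload(fileData: string): Buffer {
  // Base64 encodes roughly 3/4 of its length in bytes, so reject oversized payloads before decoding.
  if (!fileData || fileData.length * 0.75 > MAX_UPLOAD_BYTES) {
    throw new Error('Upload is missing or exceeds the maximum allowed size');
  }
  if (!BASE64_RE.test(fileData)) {
    throw new Error('fileData is not valid base64');
  }
  try {
    // Node mostly ignores invalid base64 characters rather than throwing, so the regex above does
    // the real validation; the try/catch is a defensive backstop.
    return Buffer.from(fileData, 'base64');
  } catch {
    throw new Error('Failed to decode uploaded file data');
  }
}
```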

🔴 apps/api/src/questionnaire/utils/content-extractor.ts (HIGH Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | No file size limit before base64 decoding (memory/DoS risk) | HIGH |
| 2 | No MIME/magic-bytes validation for uploaded files | HIGH |
| 3 | Unvalidated Excel ZIP parsing may allow zip bombs or malformed ZIPs | HIGH |
| 4 | Parallel chunked AI calls lack rate limiting (resource exhaustion) | HIGH |
| 5 | User file content sent to external AI services without sanitization | HIGH |
| 6 | No limits on extracted text length before AI parsing (large input) | HIGH |
| 7 | No validation of CSV/text encoding or input charset | HIGH |

Recommendations:

  1. Enforce a maximum upload size and validate it before base64 decoding. For example, check the base64 string length or the Content-Length header and reject/trim requests above an allowed limit. Use streaming decoding for large files instead of decoding the entire file into memory.
  2. Validate file contents with magic-bytes (file signature) in addition to the provided MIME type (fileType). Reject files that don't match expected signatures for declared types; see the sketch after this list.
  3. Harden ZIP handling: inspect ZIP central directory entries, cap the number of entries, cap individual entry uncompressed sizes, and detect high compression ratios (potential zip bomb). Prefer using streaming unzip parsers that enforce size limits and timeouts.
  4. Limit concurrency when processing chunks (use a worker pool / p-limit) instead of Promise.all on arbitrary chunk counts. Add request-level rate limits and circuit breakers for external AI calls, and enforce per-request and per-model timeouts.
  5. Redact or classify sensitive data (PII, credentials, financial data) before sending content to external AI providers. Provide an opt-out or consent flow if PII may be sent. Log only metadata and remove sensitive fields from the prompt.
  6. Enforce absolute caps on total text length sent to AI (not just chunk size). For large documents, sample, summarize, or require user confirmation before processing. Reject or queue overly large inputs for offline/manual processing.
  7. Detect and validate text encodings for CSV/text (e.g., BOM detection, chardet), and normalize to UTF-8. Handle malformed encodings gracefully rather than assuming UTF-8 everywhere.
  8. Additional defensive measures: add input timeouts, memory-usage monitoring, safe parsing libraries where possible, unit tests for malformed/large inputs, and monitoring/alerts for unusually large or frequent processing requests.
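
As an illustration of item 2, a magic-bytes check limited to the formats this review mentions (XLSX, which is a ZIP container, and CSV/plain text) might look like the sketch below; the helper name and accepted types are assumptions.

```typescript
// Hypothetical helper; extend the accepted types to whatever the endpoint actually supports.
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04", the ZIP local-file header used by XLSX

export function matchesDeclaredType(buffer: Buffer, declaredType: string): boolean {
  const startsWith = (sig: number[]) =>
    buffer.length >= sig.length && sig.every((byte, i) => buffer[i] === byte);

  // OOXML spreadsheets must begin with the ZIP signature.
  if (declaredType.includes('spreadsheetml') || declaredType.includes('zip')) {
    return startsWith(ZIP_SIGNATURE);
  }
  // CSV/plain text has no magic bytes; reject content that looks binary (NUL bytes in the first 512 bytes).
  if (declaredType.startsWith('text/')) {
    return !buffer.subarray(0, 512).includes(0);
  }
  // Unknown declared types are rejected by default.
  return false;
}
```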

🟡 apps/api/src/questionnaire/utils/question-parser.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | Unvalidated user content sent directly to LLM prompt | MEDIUM |
| 2 | buildQuestionAwareChunks ignores maxQuestionsPerChunk option | MEDIUM |
| 3 | Processing all chunks with Promise.all can exhaust resources (DoS) | MEDIUM |
| 4 | No rate limiting or retry/backoff for LLM API calls | MEDIUM |
| 5 | console.error may leak error details to logs | MEDIUM |

Recommendations:

  1. Sanitize and validate content before embedding it into LLM prompts. Apply size limits and strip/escape any control sequences or prompt-injection patterns you consider harmful. Consider treating the incoming content as untrusted and avoid trusting delimiters inside it.
  2. Honor the options passed into buildQuestionAwareChunks: enforce maxQuestionsPerChunk, maxChunkChars and minChunkChars. Also enforce an absolute cap on total chunks/questions returned to avoid unbounded work.
  3. Replace Promise.all over an unbounded array of chunks with controlled concurrency (e.g., p-limit, Bottleneck, or an async pool) to limit simultaneous LLM calls and memory/CPU usage. Consider batching or sequential fallback for very large inputs; a sketch combining this with retry/backoff follows this list.
  4. Add retry with exponential backoff and jitter for transient LLM API errors, and implement rate limiting (per-tenant or global) to avoid throttling/DoS. Respect provider rate limits and consider circuit-breaking on repeated failures.
  5. Avoid printing raw error objects to stdout/stderr. Use structured logging that redacts sensitive details (e.g., input content, stack traces, API keys). Log minimal contextual info (chunk index, failure reason category) and send detailed errors to secured internal error monitoring only.
  6. Enforce overall input size limits early (reject or truncate extremely large uploads) and validate the provenance of content where applicable (e.g., limit uploads per user, require authentication).
  7. Consider adding content provenance/ownership checks and monitoring/alerts for sudden spikes in chunk counts or LLM errors to detect abuse.
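
As an illustration of items 3 and 4 combined, bounded concurrency plus retry with exponential backoff and jitter might look like the sketch below; the concurrency level, retry counts, and the processChunk signature are assumptions, and p-limit is one of the libraries suggested above.

```typescript
import pLimit from 'p-limit';

const limit = pLimit(4); // at most 4 LLM calls in flight at once

async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= attempts - 1) throw err;
      // Exponential backoff with jitter: ~1s, 2s, 4s plus up to 250 ms of noise.
      const delay = 1000 * 2 ** attempt + Math.random() * 250;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// processChunk stands in for whatever function performs the LLM call for one chunk.
export async function parseChunks<TChunk, TResult>(
  chunks: TChunk[],
  processChunk: (chunk: TChunk) => Promise<TResult>,
): Promise<TResult[]> {
  return Promise.all(
    chunks.map((chunk) => limit(() => withRetry(() => processChunk(chunk)))),
  );
}
```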

💡 Recommendations

View 3 recommendation(s)
  1. Upgrade/patch the vulnerable packages reported by the OSV scan: update xlsx to a patched release that addresses GHSA-4r6h-8v6p-xvw6 and GHSA-5pgg-2g8v-p4x9, and upgrade ai to >=5.0.52 which fixes GHSA-rwvc-j5jr-mgvh.
  2. Sanitize/remove secrets in tracked env files: replace any real/placeholder secret values in apps/api/.env.example with non-sensitive placeholders and change BASE_URL/BETTER_AUTH_URL to https:// style placeholders in the example so no credentials or insecure HTTP endpoints are present in committed files.
  3. Fix upload/parsing and injection code paths: validate file type/size and magic-bytes before decoding; validate base64 and wrap Buffer.from(dto.fileData) in try/catch; whitelist/sanitize filenames and vendorName prior to exporting or displaying (see the sanitizer sketch below); and sanitize or redact user-provided content before embedding it into LLM prompts.
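
As an illustration of the filename/vendorName whitelisting in item 3, a sanitizer might look like the sketch below; the allowed character set and length cap are assumptions, not project policy.

```typescript
// Hypothetical sanitizer applied to fileName and vendorName before they reach export paths or headers.
export function sanitizeForExport(name: string, maxLength = 100): string {
  const cleaned = name
    .replace(/[^\w .()-]/g, '_') // keep letters, digits, underscore, space, dot, parentheses, hyphen
    .replace(/\.{2,}/g, '.') // collapse runs of dots so ".." cannot form a path traversal sequence
    .trim();
  return (cleaned || 'untitled').slice(0, maxLength);
}
```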

Powered by Comp AI - AI that handles compliance for you. Reviewed Dec 5, 2025


vercel bot commented Dec 5, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| app (staging) | Ready | Preview | Comment | Dec 5, 2025 1:42am |
| portal (staging) | Error | | | Dec 5, 2025 1:42am |


for (const si of siMatches) {
// Extract text from both <t>...</t> and <d:t>...</d:t> (or any namespace:t)
const textMatches = si.match(/<[^>]*:?t[^>]*>([^<]*)<\/[^>]*:?t>/g) || [];

Check failure

Code scanning / CodeQL

Bad HTML filtering regexp (High)

This regular expression does not match upper case <SCRIPT> tags.

Copilot Autofix

AI about 2 months ago

To fix the issue without changing existing functionality:

  • Update the regular expression at line 538 to use the case-insensitive i flag, so that it matches any casing of t (e.g., <t>, <T>, <d:t>, <D:T>).
  • In JavaScript's String.match() function, this means changing the regexp from /.../g to /.../gi.
  • Only this line (538) needs to be changed. No imports or other file edits are required.
Suggested changeset 1
apps/api/src/questionnaire/utils/content-extractor.ts

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/apps/api/src/questionnaire/utils/content-extractor.ts b/apps/api/src/questionnaire/utils/content-extractor.ts
--- a/apps/api/src/questionnaire/utils/content-extractor.ts
+++ b/apps/api/src/questionnaire/utils/content-extractor.ts
@@ -535,7 +535,7 @@
     
     for (const si of siMatches) {
       // Extract text from both <t>...</t> and <d:t>...</d:t> (or any namespace:t)
-      const textMatches = si.match(/<[^>]*:?t[^>]*>([^<]*)<\/[^>]*:?t>/g) || [];
+      const textMatches = si.match(/<[^>]*:?t[^>]*>([^<]*)<\/[^>]*:?t>/gi) || [];
       
       let fullText = '';
       for (const match of textMatches) {
EOF
Unable to commit as this autofix suggestion is now outdated
let fullText = '';
for (const match of textMatches) {
// Extract just the text content between tags
const textContent = match.replace(/<[^>]*>/g, '');

Check failure

Code scanning / CodeQL

Incomplete multi-character sanitization (High)

This string may still contain "<script", which may cause an HTML element injection vulnerability.

Copilot Autofix

AI about 2 months ago

To fully sanitize the extracted string content, especially from possibly hostile XML input, we should ensure all tags are removed, including cases where overlapping fragments can hide a tag. The best approach in this context is to repeatedly apply the tag-removal replacement until no further tags remain, as described in the CodeQL recommendation. Alternatively, a library could be used to properly handle XML/HTML decoding; however, since the content appears to be simple text within <t> tags, a repeated .replace() loop is effective and non-intrusive.

Required changes:

  • Update the sanitization at line 543 in apps/api/src/questionnaire/utils/content-extractor.ts to repeatedly remove all tags until none remain.
  • No new imports are necessary unless opting for a third-party library. In this fix, use the repeated .replace() approach.
  • The only change is in the block inside the extractSharedStrings function where the text content is extracted by stripping tags, i.e., replace line 543 and possibly update the logic for full clarity.

Suggested changeset 1
apps/api/src/questionnaire/utils/content-extractor.ts

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/apps/api/src/questionnaire/utils/content-extractor.ts b/apps/api/src/questionnaire/utils/content-extractor.ts
--- a/apps/api/src/questionnaire/utils/content-extractor.ts
+++ b/apps/api/src/questionnaire/utils/content-extractor.ts
@@ -539,8 +539,13 @@
       
       let fullText = '';
       for (const match of textMatches) {
-        // Extract just the text content between tags
-        const textContent = match.replace(/<[^>]*>/g, '');
+        // Extract just the text content between tags, safely remove all tags by repeating replace
+        let textContent = match;
+        let previous;
+        do {
+          previous = textContent;
+          textContent = textContent.replace(/<[^>]*>/g, '');
+        } while (textContent !== previous);
         fullText += textContent;
       }
       
EOF

Unable to commit as this autofix suggestion is now outdated
* chore(bun.lock): update package versions and lockfile configuration

* chore(bun.lock): update @jridgewell/trace-mapping to version 0.3.31

---------

Co-authored-by: Tofik Hasanov <annexcies@gmail.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ Marfuen
❌ github-actions[bot]
You have signed the CLA already but the status is still pending? Let us recheck it.

#1861)

* chore(api): update @ai-sdk/anthropic to version 2.0.53 and adjust tsconfig

* chore(bun.lock): update package versions and lockfile configuration

* chore(bun.lock): update @jridgewell/trace-mapping to version 0.3.31

* chore(api): add adm-zip dependency to package.json
@Marfuen Marfuen merged commit 1830a8c into release Dec 5, 2025
10 of 13 checks passed
@claudfuen
Contributor

🎉 This PR is included in version 1.67.0 🎉

The release is available on the GitHub releases page

Your semantic-release bot 📦🚀
