
Conversation


@github-actions github-actions bot commented Dec 5, 2025

This is an automated pull request to release the candidate branch into production, which will trigger a deployment.
It was created by the [Production PR] action.


comp-ai-code-review bot commented Dec 5, 2025

🔒 Comp AI - Security Review

🔴 Risk Level: HIGH

2 high CVEs in xlsx (Prototype Pollution, ReDoS) and 1 low CVE in ai; .env.example exposes sensitive variable names and plain-HTTP endpoints; the code accepts unsanitized uploads and user content (Buffer.from, filenames, LLM prompts).


📦 Dependency Vulnerabilities

🟠 NPM Packages (HIGH)

Risk Score: 8/10 | Summary: 2 high, 1 low CVEs found

| Package | Version | CVE | Severity | CVSS | Summary | Fixed In |
| --- | --- | --- | --- | --- | --- | --- |
| xlsx | 0.18.5 | GHSA-4r6h-8v6p-xvw6 | HIGH | N/A | Prototype Pollution in sheetJS | No fix yet |
| xlsx | 0.18.5 | GHSA-5pgg-2g8v-p4x9 | HIGH | N/A | SheetJS Regular Expression Denial of Service (ReDoS) | No fix yet |
| ai | 5.0.0 | GHSA-rwvc-j5jr-mgvh | LOW | N/A | Vercel’s AI SDK's filetype whitelists can be bypassed when uploading files | 5.0.52 |

🛡️ Code Security Analysis

View 4 file(s) with issues

🟡 apps/api/.env.example (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | Multiple sensitive env variable names (AWS, OPENAI, DB, UPSTASH, TRIGGER) exposed | MEDIUM |
| 2 | Plaintext env file risk if committed to repo or shared | MEDIUM |
| 3 | Insecure HTTP endpoints for auth (BASE_URL, BETTER_AUTH_URL use http) | MEDIUM |
| 4 | Empty critical vars (DATABASE_URL, API keys) may trigger insecure defaults | MEDIUM |

Recommendations:

  1. Remove runtime .env from version control and add it to .gitignore. Keep only a template (.env.example) with no real secrets and non-sensitive placeholder values.
  2. Rotate any credentials that may have been committed historically and run a secret-scan across the repo and its history (git-secrets, truffleHog, GitHub secret scanning).
  3. Use a secrets manager (AWS Secrets Manager, HashiCorp Vault, or equivalent) or environment-specific secure storage for production credentials rather than plaintext files.
  4. Enforce HTTPS/TLS for production endpoints. Ensure BASE_URL and BETTER_AUTH_URL are configured per-environment (localhost http only for dev, https required for staging/production).
  5. Validate and fail fast on startup if required environment variables are missing or empty (no insecure defaults). Explicitly document required vars and their acceptable formats. A sketch of such a startup check follows this list.
  6. Remove or neutralize any hardcoded keys from repository files and replace with references to secure config. If credentials were leaked, rotate them immediately and revoke old tokens/keys.
  7. Ensure CI/CD and deployment pipelines inject secrets at deploy-time and do not write secrets into logs or artifacts.
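
As an illustration of item 5, a minimal fail-fast startup check might look like the sketch below; the helper name and the exact variable list are assumptions, not taken from this codebase.

```typescript
// Hypothetical startup guard; REQUIRED_ENV_VARS is an illustrative list, not the project's real set.
const REQUIRED_ENV_VARS = ['DATABASE_URL', 'BETTER_AUTH_URL', 'OPENAI_API_KEY'] as const;

export function assertRequiredEnv(): void {
  const missing = REQUIRED_ENV_VARS.filter(
    (name) => !process.env[name] || process.env[name]!.trim() === '',
  );
  if (missing.length > 0) {
    // Fail fast instead of continuing with insecure defaults.
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  // Reject plain-HTTP auth endpoints outside local development.
  if (process.env.NODE_ENV === 'production' && process.env.BETTER_AUTH_URL?.startsWith('http://')) {
    throw new Error('BETTER_AUTH_URL must use https in production');
  }
}
```

Calling assertRequiredEnv() at the top of the bootstrap function turns a missing or empty variable into a startup error rather than a silent fallback.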

🟡 apps/api/src/questionnaire/questionnaire.service.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | No file size/type validation on uploads | MEDIUM |
| 2 | Buffer.from(dto.fileData) used without base64 validation or try/catch | MEDIUM |
| 3 | User-controlled fileName/vendorName used in exports without sanitization | MEDIUM |
| 4 | Raw file content or metadata logged via content/storage loggers | MEDIUM |
| 5 | DTO numeric fields (questionIndex, totalQuestions) lack validation | MEDIUM |
| 6 | Uploaded files persisted without malware scanning or content checks | MEDIUM |

Recommendations:

  1. Validate file type and size before processing or uploading
  2. Validate base64 and wrap Buffer.from in try/catch (see the sketch after this list)
  3. Sanitize/escape filenames and vendorName before export or display
  4. Redact/truncate file contents in logs; avoid logging raw data
  5. Validate DTO fields and add auth, rate-limiting, and malware scan
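
As an illustration of items 1 and 2, assuming dto.fileData arrives as a base64 string, a decode helper might look like the sketch below; the size cap and helper name are illustrative.

```typescript
// Hypothetical helper; MAX_UPLOAD_BYTES and the error messages are assumptions.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // 10 MB cap; adjust to real requirements
const BASE64_RE = /^[A-Za-z0-9+\/]+={0,2}$/;

export function decodeUpload(fileData: string): Buffer {
  // Base64 encodes roughly 3/4 of its length in bytes, so reject oversized payloads before decoding.
  if (!fileData || fileData.length * 0.75 > MAX_UPLOAD_BYTES) {
    throw new Error('Upload is missing or exceeds the maximum allowed size');
  }
  if (!BASE64_RE.test(fileData)) {
    throw new Error('fileData is not valid base64');
  }
  try {
    // Node mostly ignores invalid base64 characters rather than throwing, so the regex above does
    // the real validation; the try/catch is a defensive backstop.
    return Buffer.from(fileData, 'base64');
  } catch {
    throw new Error('Failed to decode uploaded file data');
  }
}
```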

🔴 apps/api/src/questionnaire/utils/content-extractor.ts (HIGH Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | No file size limit before base64 decoding (memory/DoS risk) | HIGH |
| 2 | No MIME/magic-bytes validation for uploaded files | HIGH |
| 3 | Unvalidated Excel ZIP parsing may allow zip bombs or malformed ZIPs | HIGH |
| 4 | Parallel chunked AI calls lack rate limiting (resource exhaustion) | HIGH |
| 5 | User file content sent to external AI services without sanitization | HIGH |
| 6 | No limits on extracted text length before AI parsing (large input) | HIGH |
| 7 | No validation of CSV/text encoding or input charset | HIGH |

Recommendations:

  1. Enforce a maximum upload size and validate it before base64 decoding. For example, check the base64 string length or the Content-Length header and reject/trim requests above an allowed limit. Use streaming decoding for large files instead of decoding the entire file into memory.
  2. Validate file contents with magic-bytes (file signature) in addition to the provided MIME type (fileType). Reject files that don't match expected signatures for declared types; see the sketch after this list.
  3. Harden ZIP handling: inspect ZIP central directory entries, cap the number of entries, cap individual entry uncompressed sizes, and detect high compression ratios (potential zip bomb). Prefer using streaming unzip parsers that enforce size limits and timeouts.
  4. Limit concurrency when processing chunks (use a worker pool / p-limit) instead of Promise.all on arbitrary chunk counts. Add request-level rate limits and circuit breakers for external AI calls, and enforce per-request and per-model timeouts.
  5. Redact or classify sensitive data (PII, credentials, financial data) before sending content to external AI providers. Provide an opt-out or consent flow if PII may be sent. Log only metadata and remove sensitive fields from the prompt.
  6. Enforce absolute caps on total text length sent to AI (not just chunk size). For large documents, sample, summarize, or require user confirmation before processing. Reject or queue overly large inputs for offline/manual processing.
  7. Detect and validate text encodings for CSV/text (e.g., BOM detection, chardet), and normalize to UTF-8. Handle malformed encodings gracefully rather than assuming UTF-8 everywhere.
  8. Additional defensive measures: add input timeouts, memory-usage monitoring, safe parsing libraries where possible, unit tests for malformed/large inputs, and monitoring/alerts for unusually large or frequent processing requests.
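
As an illustration of item 2, a magic-bytes check limited to the formats this review mentions (XLSX, which is a ZIP container, and CSV/plain text) might look like the sketch below; the helper name and accepted types are assumptions.

```typescript
// Hypothetical helper; extend the accepted types to whatever the endpoint actually supports.
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04]; // "PK\x03\x04", the ZIP local-file header used by XLSX

export function matchesDeclaredType(buffer: Buffer, declaredType: string): boolean {
  const startsWith = (sig: number[]) =>
    buffer.length >= sig.length && sig.every((byte, i) => buffer[i] === byte);

  // OOXML spreadsheets must begin with the ZIP signature.
  if (declaredType.includes('spreadsheetml') || declaredType.includes('zip')) {
    return startsWith(ZIP_SIGNATURE);
  }
  // CSV/plain text has no magic bytes; reject content that looks binary (NUL bytes in the first 512 bytes).
  if (declaredType.startsWith('text/')) {
    return !buffer.subarray(0, 512).includes(0);
  }
  // Unknown declared types are rejected by default.
  return false;
}
```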

🟡 apps/api/src/questionnaire/utils/question-parser.ts (MEDIUM Risk)

| # | Issue | Risk Level |
| --- | --- | --- |
| 1 | Unvalidated user content sent directly to LLM prompt | MEDIUM |
| 2 | buildQuestionAwareChunks ignores maxQuestionsPerChunk option | MEDIUM |
| 3 | Processing all chunks with Promise.all can exhaust resources (DoS) | MEDIUM |
| 4 | No rate limiting or retry/backoff for LLM API calls | MEDIUM |
| 5 | console.error may leak error details to logs | MEDIUM |

Recommendations:

  1. Sanitize and validate content before embedding it into LLM prompts. Apply size limits and strip/escape any control sequences or prompt-injection patterns you consider harmful. Consider treating the incoming content as untrusted and avoid trusting delimiters inside it.
  2. Honor the options passed into buildQuestionAwareChunks: enforce maxQuestionsPerChunk, maxChunkChars and minChunkChars. Also enforce an absolute cap on total chunks/questions returned to avoid unbounded work.
  3. Replace Promise.all over an unbounded array of chunks with controlled concurrency (e.g., p-limit, Bottleneck, or an async pool) to limit simultaneous LLM calls and memory/CPU usage. Consider batching or sequential fallback for very large inputs; a sketch combining this with retry/backoff follows this list.
  4. Add retry with exponential backoff and jitter for transient LLM API errors, and implement rate limiting (per-tenant or global) to avoid throttling/DoS. Respect provider rate limits and consider circuit-breaking on repeated failures.
  5. Avoid printing raw error objects to stdout/stderr. Use structured logging that redacts sensitive details (e.g., input content, stack traces, API keys). Log minimal contextual info (chunk index, failure reason category) and send detailed errors to secured internal error monitoring only.
  6. Enforce overall input size limits early (reject or truncate extremely large uploads) and validate the provenance of content where applicable (e.g., limit uploads per user, require authentication).
  7. Consider adding content provenance/ownership checks and monitoring/alerts for sudden spikes in chunk counts or LLM errors to detect abuse.
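
As an illustration of items 3 and 4 combined, bounded concurrency plus retry with exponential backoff and jitter might look like the sketch below; the concurrency level, retry counts, and the processChunk signature are assumptions, and p-limit is one of the libraries suggested above.

```typescript
import pLimit from 'p-limit';

const limit = pLimit(4); // at most 4 LLM calls in flight at once

async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= attempts - 1) throw err;
      // Exponential backoff with jitter: ~1s, 2s, 4s plus up to 250 ms of noise.
      const delay = 1000 * 2 ** attempt + Math.random() * 250;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// processChunk stands in for whatever function performs the LLM call for one chunk.
export async function parseChunks<TChunk, TResult>(
  chunks: TChunk[],
  processChunk: (chunk: TChunk) => Promise<TResult>,
): Promise<TResult[]> {
  return Promise.all(
    chunks.map((chunk) => limit(() => withRetry(() => processChunk(chunk)))),
  );
}
```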

💡 Recommendations

View 3 recommendation(s)
  1. Upgrade/patch the vulnerable packages reported by the OSV scan: update xlsx to a patched release that addresses GHSA-4r6h-8v6p-xvw6 and GHSA-5pgg-2g8v-p4x9, and upgrade ai to >=5.0.52 which fixes GHSA-rwvc-j5jr-mgvh.
  2. Sanitize/remove secrets in tracked env files: replace any real/placeholder secret values in apps/api/.env.example with non-sensitive placeholders and change BASE_URL/BETTER_AUTH_URL to https:// style placeholders in the example so no credentials or insecure HTTP endpoints are present in committed files.
  3. Fix upload/parsing and injection code paths: validate file type/size and magic-bytes before decoding; validate base64 and wrap Buffer.from(dto.fileData) in try/catch; whitelist/sanitize filenames and vendorName prior to exporting or displaying (see the sanitizer sketch below); and sanitize or redact user-provided content before embedding it into LLM prompts.
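
As an illustration of the filename/vendorName whitelisting in item 3, a sanitizer might look like the sketch below; the allowed character set and length cap are assumptions, not project policy.

```typescript
// Hypothetical sanitizer applied to fileName and vendorName before they reach export paths or headers.
export function sanitizeForExport(name: string, maxLength = 100): string {
  const cleaned = name
    .replace(/[^\w .()-]/g, '_') // keep letters, digits, underscore, space, dot, parentheses, hyphen
    .replace(/\.{2,}/g, '.') // collapse runs of dots so ".." cannot form a path traversal sequence
    .trim();
  return (cleaned || 'untitled').slice(0, maxLength);
}
```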

Powered by Comp AI - AI that handles compliance for you. Reviewed Dec 5, 2025


vercel bot commented Dec 5, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| app (staging) | Ready | Preview | Comment | Dec 5, 2025 1:42am |
| portal (staging) | Error | | | Dec 5, 2025 1:42am |


for (const si of siMatches) {
// Extract text from both <t>...</t> and <d:t>...</d:t> (or any namespace:t)
const textMatches = si.match(/<[^>]*:?t[^>]*>([^<]*)<\/[^>]*:?t>/g) || [];

Check failure

Code scanning / CodeQL

Bad HTML filtering regexp (High)

This regular expression does not match upper case <SCRIPT> tags.

Copilot Autofix

AI about 2 months ago

To fix the issue without changing existing functionality:

  • Update the regular expression at line 538 to use the case-insensitive i flag, so that it matches any casing of t (e.g., <t>, <T>, <d:t>, <D:T>).
  • In JavaScript's String.match() function, this means changing the regexp from /.../g to /.../gi.
  • Only this line (538) needs to be changed. No imports or other file edits are required.
Suggested changeset 1
apps/api/src/questionnaire/utils/content-extractor.ts

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/apps/api/src/questionnaire/utils/content-extractor.ts b/apps/api/src/questionnaire/utils/content-extractor.ts
--- a/apps/api/src/questionnaire/utils/content-extractor.ts
+++ b/apps/api/src/questionnaire/utils/content-extractor.ts
@@ -535,7 +535,7 @@
     
     for (const si of siMatches) {
       // Extract text from both <t>...</t> and <d:t>...</d:t> (or any namespace:t)
-      const textMatches = si.match(/<[^>]*:?t[^>]*>([^<]*)<\/[^>]*:?t>/g) || [];
+      const textMatches = si.match(/<[^>]*:?t[^>]*>([^<]*)<\/[^>]*:?t>/gi) || [];
       
       let fullText = '';
       for (const match of textMatches) {
EOF
Unable to commit as this autofix suggestion is now outdated
let fullText = '';
for (const match of textMatches) {
// Extract just the text content between tags
const textContent = match.replace(/<[^>]*>/g, '');

Check failure

Code scanning / CodeQL

Incomplete multi-character sanitization (High)

This string may still contain "<script", which may cause an HTML element injection vulnerability.

Copilot Autofix

AI about 2 months ago

To fully sanitize the extracted string content, especially from possibly hostile XML input, we should ensure all tags are removed, including cases where overlapping fragments can hide a tag. The best approach in this context is to repeatedly apply the tag-removal replacement until no further tags remain, as described in the CodeQL recommendation. Alternatively, a library could be used to properly handle XML/HTML decoding; however, since the content appears to be simple text within <t> tags, a repeated .replace() loop is effective and non-intrusive.

Required changes:

  • Update the sanitization at line 543 in apps/api/src/questionnaire/utils/content-extractor.ts to repeatedly remove all tags until none remain.
  • No new imports are necessary unless opting for a third-party library. In this fix, use the repeated .replace() approach.
  • The only change is in the block inside the extractSharedStrings function where the text content is extracted by stripping tags, i.e., replace line 543 and possibly update the logic for full clarity.

Suggested changeset 1
apps/api/src/questionnaire/utils/content-extractor.ts

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/apps/api/src/questionnaire/utils/content-extractor.ts b/apps/api/src/questionnaire/utils/content-extractor.ts
--- a/apps/api/src/questionnaire/utils/content-extractor.ts
+++ b/apps/api/src/questionnaire/utils/content-extractor.ts
@@ -539,8 +539,13 @@
       
       let fullText = '';
       for (const match of textMatches) {
-        // Extract just the text content between tags
-        const textContent = match.replace(/<[^>]*>/g, '');
+        // Extract just the text content between tags, safely remove all tags by repeating replace
+        let textContent = match;
+        let previous;
+        do {
+          previous = textContent;
+          textContent = textContent.replace(/<[^>]*>/g, '');
+        } while (textContent !== previous);
         fullText += textContent;
       }
       
EOF

Unable to commit as this autofix suggestion is now outdated
* chore(bun.lock): update package versions and lockfile configuration

* chore(bun.lock): update @jridgewell/trace-mapping to version 0.3.31

---------

Co-authored-by: Tofik Hasanov <annexcies@gmail.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ Marfuen
❌ github-actions[bot]
You have signed the CLA already but the status is still pending? Let us recheck it.

#1861)

* chore(api): update @ai-sdk/anthropic to version 2.0.53 and adjust tsconfig

* chore(bun.lock): update package versions and lockfile configuration

* chore(bun.lock): update @jridgewell/trace-mapping to version 0.3.31

* chore(api): add adm-zip dependency to package.json
@Marfuen Marfuen merged commit 1830a8c into release Dec 5, 2025
10 of 13 checks passed
@claudfuen
Contributor

🎉 This PR is included in version 1.67.0 🎉

The release is available on the GitHub releases page

Your semantic-release bot 📦🚀
