Skip to content

Conversation

@sudo-tee
Copy link
Owner

  • update the reference parser to be more generic instead of relying on a system prompt
  • add edit/read files to the references

This should fix #194

- update the reference parser to be more generic instead of relying on a
  system prompt
- add edit/read files to the references

This should fix #194
@sudo-tee
Copy link
Owner Author

@aweis89 I refactored the references collection to not rely on a system prompt, and prefer parsing common file patterns.

Let me know if this works as you expect.

@sudo-tee sudo-tee force-pushed the refactor/references-picker-parser branch 3 times, most recently from 7c26716 to d4519b7 Compare January 22, 2026 18:06
- Update formatter to show relative paths when inside cwd, basename when outside
- Normalize file paths in replay test snapshots to be cwd-relative
- Update expected test data to use relative paths instead of absolute paths
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the code reference detection system to work generically without relying on system prompts, making it compatible with all LLM providers including those without system prompt support. It also adds support for detecting file references from tool operations (read/edit).

Changes:

  • Refactored reference parser to support multiple patterns (backtick-wrapped, file:// URIs, plain paths) with deduplication logic
  • Removed system prompt instructions from core.lua that were previously guiding LLMs to use file:// URIs
  • Added file path normalization in formatter to show relative paths within project
  • Enhanced test infrastructure with getcwd mocking and file path normalization utilities
  • Updated documentation to reflect new reference detection formats

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
lua/opencode/ui/reference_picker.lua Major refactoring: added multiple parsing strategies, deduplication, and tool-based reference extraction
lua/opencode/ui/formatter.lua Modified file path display logic to show relative paths for files within project
lua/opencode/core.lua Removed system prompt instructions for file:// URI format
AGENTS.md Added comprehensive documentation on supported file reference formats
tests/manual/renderer_replay.lua Added normalize_file_paths helper and simplified load_events to use raw file paths
tests/helpers.lua Added MOCK_CWD constant and mock_getcwd function for consistent test paths
tests/data/*.expected.json Updated test snapshots with normalized timestamps and field ordering

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@aweis89
Copy link
Contributor

aweis89 commented Jan 23, 2026

False Positive Concerns

Thanks for this refactor! Removing the system prompt dependency improves provider compatibility. However, I have concerns about false positives with the more permissive patterns:

Potential False Positives

1. Plain path pattern is very broad
The pattern ([%w_%./-]+%.[%w]+) will match many non-path strings:

  • URLs: github.com/user/repo.git, api.openai.com
  • Method calls in prose: Array.from, Object.keys, process.env
  • Dotted identifiers: com.example.App, org.apache.Log4j

2. Backtick pattern too greedy
Pattern `([^`]+%.[%w]+)` matches anything in backticks ending with a dot-extension:

  • config.timeout - matches if file config.timeout exists
  • process.env - .env looks like an extension
  • npm install package.json - extracts trailing package.json

3. Email addresses
user@example.com - the example.com portion could match.

Suggestion

Without the explicit file:// prefix, there may be significant overhead from checking file_exists() on many false candidates, plus residual false positives when those strings happen to match existing files.

Consider whether the added complexity and edge cases are worth it vs. keeping the file:// prefix requirement (perhaps just making it optional in the system prompt rather than removing it entirely). The file:// scheme provides unambiguous intent that a string is meant to be a file reference.

The tool part extraction (filePath from read/edit tools) is a nice addition regardless!

@sudo-tee
Copy link
Owner Author

sudo-tee commented Jan 23, 2026

False Positive Concerns

Thanks for this refactor! Removing the system prompt dependency improves provider compatibility. However, I have concerns about false positives with the more permissive patterns:

Potential False Positives

1. Plain path pattern is very broad The pattern ([%w_%./-]+%.[%w]+) will match many non-path strings:

  • URLs: github.com/user/repo.git, api.openai.com
  • Method calls in prose: Array.from, Object.keys, process.env
  • Dotted identifiers: com.example.App, org.apache.Log4j

2. Backtick pattern too greedy Pattern `([^`]+%.[%w]+)` matches anything in backticks ending with a dot-extension:

  • config.timeout - matches if file config.timeout exists
  • process.env - .env looks like an extension
  • npm install package.json - extracts trailing package.json

3. Email addresses user@example.com - the example.com portion could match.

Suggestion

Without the explicit file:// prefix, there may be significant overhead from checking file_exists() on many false candidates, plus residual false positives when those strings happen to match existing files.

Consider whether the added complexity and edge cases are worth it vs. keeping the file:// prefix requirement (perhaps just making it optional in the system prompt rather than removing it entirely). The file:// scheme provides unambiguous intent that a string is meant to be a file reference.

The tool part extraction (filePath from read/edit tools) is a nice addition regardless!

Thanks for the comments,

I understand your issue with false positives. and Will try to work on a better parsing solution. The file_exists is a fail safe on his own for most of the matches so I'm not to worry.

The issue I have with only the file:// is staht it's to restrictive and requires a system prompt, and even with the system prompt it misses a lot of paths.

I will also ad as part of this MR a config for specifying a system prompt

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

sudo-tee and others added 3 commits January 23, 2026 08:48
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@sudo-tee sudo-tee force-pushed the refactor/references-picker-parser branch from af8982c to ff8f103 Compare January 23, 2026 14:01
@sudo-tee
Copy link
Owner Author

@aweis89

I did add more checks around urls, and better patterns to avoid false positives.

I also added a system prompt config.

I will merge this MR as it is right now and we can always update this later on if we see perf issues.

I did some profiling on large outputs and it didn't impact the performance.

@sudo-tee sudo-tee merged commit 6940ea6 into main Jan 23, 2026
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Confusing system prompt

3 participants