Add Echo Rule watermark analysis for LLM learning article #8

Conversation

johnzfitch commented Nov 22, 2025
- Add manual_analysis.py script for watermark detection when spaCy model unavailable
- Include sample article text (LLM learning research) for analysis
- Generate detailed JSON analysis output with 46 clause pairs analyzed
- Result: LIKELY_HUMAN verdict with 0.209 final score (below 0.45 threshold)
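For reference, the final classification step described in the bullets above reduces to a threshold check (a minimal sketch: the 0.45 threshold and the LIKELY_HUMAN label come from this PR, while the function name and the label for the above-threshold case are illustrative):

```python
THRESHOLD = 0.45  # decision threshold reported in this PR

def verdict(final_score: float) -> str:
    # "LIKELY_WATERMARKED" is a hypothetical label; the PR only shows
    # the below-threshold outcome (LIKELY_HUMAN at a 0.209 score).
    return "LIKELY_WATERMARKED" if final_score >= THRESHOLD else "LIKELY_HUMAN"
```

With the PR's reported score, `verdict(0.209)` yields `LIKELY_HUMAN`.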
Pull request overview
This PR adds a manual Echo Rule watermark analysis capability for detecting AI-generated text when the full spaCy NLP model is unavailable. The implementation analyzes phonetic, structural, and semantic "echoes" at clause boundaries to determine if text exhibits watermark patterns characteristic of LLM-generated content.
- Implements a standalone watermark detection script with fallback dependencies (cmudict, Levenshtein)
- Includes sample analysis of an LLM learning research article with 46 clause pairs analyzed
- Provides detailed JSON output showing LIKELY_HUMAN verdict (0.209 score below 0.45 threshold)
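As a rough illustration of the phonetic-echo idea at clause boundaries (a minimal sketch, not code from the PR: the function name is invented, and stdlib `difflib` stands in for the cmudict/Levenshtein fallback dependencies so the snippet is self-contained):

```python
import difflib

def phonetic_echo(word_a: str, word_b: str) -> float:
    """Orthographic stand-in for a phonetic similarity score in [0, 1].

    A fuller implementation would first look up CMU phoneme sequences
    (cmudict) and compare those, falling back to Levenshtein distance
    on spellings; this sketch compares spellings only.
    """
    return difflib.SequenceMatcher(None, word_a.lower(), word_b.lower()).ratio()
```

Clause pairs whose boundary words score high on echoes like this would push the aggregate score toward the watermark threshold.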
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| scripts/manual_analysis.py | Core analysis script implementing phonetic, structural, and semantic echo detection with pattern matching algorithms |
| data/analysis_output.json | Generated analysis results with detailed scoring for 46 clause pairs and final classification verdict |
| data/analysis_input.txt | Sample input text (research article on LLM learning) used for watermark analysis demonstration |
Comments suppressed due to low confidence (1)
scripts/manual_analysis.py:39
Import of 'np' is not used.

```python
import numpy as np
```
```python
input_file = Path('/home/user/specHO/data/analysis_input.txt')

if not input_file.exists():
    print("Error: Input file not found")
    sys.exit(1)

text = input_file.read_text()
print(f"Analyzing text ({len(text):,} characters)...")

# Run analysis
report = analyze_text(text)

# Print report
print_report(report, verbose=True)

# Save JSON report
output_file = Path('/home/user/specHO/data/analysis_output.json')
```
Hardcoded absolute paths make the script non-portable. Consider using relative paths or command-line arguments to allow users to specify input/output file locations.
Example fix:

```python
import argparse

def main():
    parser = argparse.ArgumentParser(description='Manual Echo Rule Watermark Analysis')
    parser.add_argument('--input', default='data/analysis_input.txt', help='Input text file')
    parser.add_argument('--output', default='data/analysis_output.json', help='Output JSON file')
    args = parser.parse_args()

    input_file = Path(args.input)
    output_file = Path(args.output)
    # ...
```

```python
try:
    import numpy as np
    HAS_NUMPY = True
except ImportError:
    HAS_NUMPY = False
```
The numpy import is unused throughout the script. The HAS_NUMPY flag is set but never checked or used in any conditional logic. Consider removing this unused import to reduce dependencies.
Suggested change:

```diff
-try:
-    import numpy as np
-    HAS_NUMPY = True
-except ImportError:
-    HAS_NUMPY = False
```
```python
from pathlib import Path
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass, field
from collections import Counter
```
The Counter import from collections is unused in the script. Consider removing this import.
Suggested change:

```diff
-from collections import Counter
```
```python
for sentence in sentences:
    # Rule A: Punctuation-linked clauses (semicolon, em-dash, colon)
    for sep in [';', ' – ', ': ', ' — ']:
        if sep in sentence:
            parts = sentence.split(sep)
            for i in range(len(parts) - 1):
                if len(parts[i].strip()) > 10 and len(parts[i+1].strip()) > 10:
                    pairs.append(create_clause_pair(parts[i], parts[i+1], sep))

    # Rule B: Conjunction-linked clauses
    conj_pattern = r',?\s*(but|and|or|yet)\s+'
    matches = list(re.finditer(conj_pattern, sentence, re.IGNORECASE))
    for match in matches:
        before = sentence[:match.start()]
        after = sentence[match.end():]
        if len(before.strip()) > 10 and len(after.strip()) > 10:
            pairs.append(create_clause_pair(before, after, match.group(1)))
```
The clause splitting logic can create duplicate pairs when a sentence contains multiple separators. For example, if a sentence has both a semicolon and an "and" conjunction, it may generate overlapping clause pairs. Consider deduplicating pairs or tracking already-processed text spans to avoid analyzing the same text multiple times.
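One way to implement the suggested deduplication (a sketch under the assumption that candidate pairs carry character offsets within the sentence; `dedupe_pairs` is an illustrative name, not a function from the PR):

```python
def dedupe_pairs(candidates):
    """Drop clause pairs fully contained in an already-kept span.

    candidates: iterable of (start, end, pair) tuples, where start/end
    are character offsets of the pair's text within the sentence.
    """
    seen = []    # spans already consumed by a kept pair
    result = []
    for start, end, pair in sorted(candidates):
        if any(s <= start and end <= e for s, e in seen):
            continue  # fully inside a span we already analyzed
        seen.append((start, end))
        result.append(pair)
    return result
```

Sorting first ensures the widest span starting earliest is kept, so a conjunction-linked pair nested inside a semicolon-linked pair is skipped rather than double-counted.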
```python
with open(output_file, 'w') as f:
    json.dump(report_dict, f, indent=2, default=str)

print(f"\nJSON report saved to: {output_file}")
```
Missing error handling for file I/O operations. If the output directory doesn't exist or if there are permission issues, the script will crash with an unhandled exception. Consider adding proper error handling:

```python
try:
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with open(output_file, 'w') as f:
        json.dump(report_dict, f, indent=2, default=str)
    print(f"\nJSON report saved to: {output_file}")
except (IOError, PermissionError) as e:
    print(f"Error saving output file: {e}", file=sys.stderr)
    sys.exit(1)
```

Suggested change:

```diff
-with open(output_file, 'w') as f:
-    json.dump(report_dict, f, indent=2, default=str)
-print(f"\nJSON report saved to: {output_file}")
+try:
+    output_file.parent.mkdir(parents=True, exist_ok=True)
+    with open(output_file, 'w') as f:
+        json.dump(report_dict, f, indent=2, default=str)
+    print(f"\nJSON report saved to: {output_file}")
+except (IOError, PermissionError) as e:
+    print(f"Error saving output file: {e}", file=sys.stderr)
+    sys.exit(1)
```