-
Notifications
You must be signed in to change notification settings - Fork 230
feat(reporting): add passive voice detection to TipTap editor #796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat(reporting): add passive voice detection to TipTap editor #796
Conversation
Implement server-side passive voice detection using spaCy NLP library with visual highlighting in the TipTap rich text editor. Signed-off-by: marc fuller <gogita99@gmail.com>
Signed-off-by: marc fuller <gogita99@gmail.com>
Signed-off-by: marc fuller <gogita99@gmail.com>
6547886 to
c4c4f67
Compare
Signed-off-by: marc fuller <gogita99@gmail.com>
dfd0385 to
ac0545b
Compare
ColonelThirtyTwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about adding in an entire ML model to the server, both from a performance angle and in regards to how much this will be used. Obviously there's push from SpecterOps to reduce passive voice in reports but I'm not sure how much other GW users care about it.
During Robby's Roundup, there appears to be many who would care about. In regards to the ML model, the model itself is small (50mb). I also made some changes to disable certain functions of spaCy such that the overhead is kept to a minimum. The only other solution i could see is creating an entire new service separate from the django service. That way the performance of the django service would be less of an issue. |
Signed-off-by: marc fuller <gogita99@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds server-side passive voice detection to the TipTap rich text editor using spaCy NLP. Users can scan editor content via a button that highlights passive voice sentences with yellow background and wavy red underline. Detection uses spaCy's en_core_web_sm model loaded once per Django worker via singleton pattern.
Changes:
- Added spaCy dependency and
ghostwriter/modules/passive_voice/module with NLP-based passive voice detection - Created REST API endpoint
POST /api/v1/passive-voice/detectwith authentication and validation - Implemented TipTap extension with React button component for scan/clear functionality and visual decorations
Reviewed changes
Copilot reviewed 19 out of 20 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| requirements/base.txt | Adds spaCy 3.8.11 dependency |
| javascript/src/tiptap_gw/passive_voice_decoration.ts | TipTap extension for passive voice decorations with hover effects |
| javascript/src/tiptap_gw/index.ts | Registers PassiveVoiceDecoration extension |
| javascript/src/services/passive_voice_api.ts | API service for passive voice detection |
| javascript/src/services/csrf.ts | CSRF token utility extracted for reuse |
| javascript/src/frontend/collab_forms/rich_text_editor/passive_voice.tsx | React button component for scanning and clearing highlights |
| javascript/src/frontend/collab_forms/rich_text_editor/index.tsx | Adds PassiveVoiceButton to toolbar |
| javascript/src/frontend/collab_forms/rich_text_editor/evidence/upload.tsx | Refactored to use shared getCsrfToken utility |
| javascript/src/frontend/collab_forms/editor.scss | Styles for passive voice highlights |
| ghostwriter/modules/passive_voice/tests/test_detector.py | Unit tests for PassiveVoiceDetector class |
| ghostwriter/modules/passive_voice/tests/test_api.py | Unit tests for API endpoint |
| ghostwriter/modules/passive_voice/tests/init.py | Test module initialization |
| ghostwriter/modules/passive_voice/detector.py | Singleton detector class using spaCy |
| ghostwriter/modules/passive_voice/init.py | Module initialization |
| ghostwriter/api/views.py | Adds detect_passive_voice endpoint |
| ghostwriter/api/urls.py | Registers passive voice detection URL |
| config/settings/base.py | Adds spaCy configuration settings |
| compose/production/django/Dockerfile | Downloads spaCy model during build |
| compose/local/django/Dockerfile | Downloads spaCy model during build |
Files not reviewed (1)
- javascript/package-lock.json: Language not supported
javascript/src/frontend/collab_forms/rich_text_editor/evidence/upload.tsx
Show resolved
Hide resolved
Signed-off-by: marc fuller <gogita99@gmail.com>
|
I don't think we need to run the model in a separate container - it would be running on the same computer anyway (unless someone went through the effort of distributing the containers across machines). I'm just a bit concerned with adding machine learning stuff during this LLM craze, but it seems like spacy predates it and is focused on the specific topic of natural language processing, so it doesn't seem too crazy. |
Signed-off-by: marc fuller <gogita99@gmail.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #796 +/- ##
=======================================
Coverage 91.37% 91.38%
=======================================
Files 368 368
Lines 20933 20955 +22
=======================================
+ Hits 19127 19149 +22
Misses 1806 1806 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: marc fuller <gogita99@gmail.com>
Signed-off-by: marc fuller <gogita99@gmail.com>
Signed-off-by: marc fuller <gogita99@gmail.com>
…ler/Ghostwriter into passive_voice_detection
Signed-off-by: marc fuller <gogita99@gmail.com>
Issue
N/A
Description of the Change
Added server-side passive voice detection to the TipTap rich text editor using spaCy NLP library. Created
ghostwriter/modules/passive_voice/module withPassiveVoiceDetectorsingleton class that loads spaCy'sen_core_web_smmodel once per Django worker. The detector identifies passive voice by finding auxiliary passive dependencies (auxpass) combined with past participle verb tags (VBN). Added REST API endpointPOST /api/v1/passive-voice/detectwith@login_requiredauthentication that accepts text and returns character ranges of passive sentences. Created TipTapPassiveVoiceMarkextension withinclusive: falseto prevent highlights from extending to new text. Added React button component with scan/clear functionality and yellow highlight styling with red wavy underline.Alternate Designs
Client-Side JavaScript NLP: Rejected due to lower accuracy than spaCy, and client-side computational load.
Real-Time Detection on Typing: Rejected due to excessive API traffic, distracting UX with constantly updating highlights, and network dependency during editing.
Background Task Processing: Rejected as unnecessary complexity for sub-2-second operations on typical report text.
Selected: On-demand server-side detection for accuracy, lean bundle size, user-controlled timing, and simple architecture.
Possible Drawbacks
Memory: spaCy model adds ~50MB per Django worker (100-200MB total for typical 2-4 worker deployments). Build Time: Adds ~30 seconds to Docker image builds for model download. Text Limits: Default 100KB max may restrict extremely long reports (configurable via
SPACY_MAX_TEXT_LENGTH). Accuracy: NLP detection not 100% accurate but acceptable as writing aid, not validation.Verification Process
Unit Tests: Created 22 tests (11 detector logic, 11 API endpoint) covering singleton pattern, passive/active detection, authentication, validation, and error handling. All tests pass with 100% detector coverage, 95% API coverage.
Manual Testing: Verified basic detection in finding editor, clear highlights functionality, mixed active/passive text handling, 50KB document performance (<2 seconds), empty editor handling, network failure error messages, authentication enforcement, and unicode character support. Tested in Chrome 131, Firefox 133, Safari 18 on Mac.
Regression: Full test suite passes (1,247 tests, +3s runtime). No regression in existing modules.
Docker: Verified model download in both local and production Dockerfiles, confirmed model loads at startup.
Release Notes
Added passive voice detection to the rich text editor with one-click scanning and visual highlighting of passive voice sentences using spaCy NLP