Automated Screenplay Annotation for Extracting Storytelling Knowledge
ScreenPy is a Python package for parsing and analyzing screenplays to extract structured narrative elements and storytelling patterns. Based on research presented at the Intelligent Narrative Technologies Workshop, it provides tools for automated screenplay annotation and knowledge extraction.
- Screenplay Parsing: Parse raw screenplays into structured elements
- Shot Heading Analysis: Extract location, shot type, subject, and time information
- Dialogue Extraction: Identify speakers and their dialogue with parentheticals
- Stage Direction Processing: Parse action descriptions and stage directions
- Verb Sense Disambiguation: Map actions to FrameNet frames and WordNet synsets
- Hierarchical Structure: Maintain scene and sub-scene relationships
- JSON Export: Export parsed screenplays in machine-readable format
This project implements the methodology described in:
"Automated Screenplay Annotation for Extracting Storytelling Knowledge" David R. Winer and R. Michael Young Intelligent Narrative Technologies Workshop (INT17), 2017
- Grammar-based parsing of shot headings following industry standards
- Hierarchical segmentation of screenplay structure
- Verb sense disambiguation for action extraction
- Large-scale corpus analysis of 1000+ IMSDb screenplays
```bash
# Clone the repository
git clone https://github.com/drwiner/ScreenPy.git
cd ScreenPy
# Install package
pip install -e .
# Or install with development dependencies
pip install -e ".[dev]"
# For NLP features
pip install -e ".[nlp]"from screenpy import ScreenplayParser
# Initialize parser
parser = ScreenplayParser()
# Parse a screenplay file
screenplay = parser.parse_file("path/to/screenplay.txt")
# Access structured elements
for segment in screenplay.master_segments:
    print(f"Scene: {segment.heading.raw_text}")
    if segment.heading.location_type:
        print(f" Location: {' - '.join(segment.heading.locations)}")
    if segment.heading.time_of_day:
        print(f" Time: {segment.heading.time_of_day}")
# Export to JSON
screenplay_json = screenplay.to_json()
```
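The serialized output can then go straight to disk for downstream tools. This is a minimal sketch continuing the session above; it assumes `to_json()` returns a JSON-formatted string (if your version returns a dict, use `json.dump` instead):

```python
from pathlib import Path

# Assumption: to_json() returns a JSON string rather than a dict.
Path("screenplay.json").write_text(screenplay.to_json(), encoding="utf-8")
```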
```bash
# Parse a single screenplay
screenpy parse screenplay.txt -o output.json
# Batch process screenplays
screenpy batch data/screenplays/ -o data/outputs/
# Extract verb senses with VSD
screenpy vsd screenplay.txt --frames --synsets
# Generate statistics
screenpy stats data/outputs/ -o stats.csv
```

```text
ScreenPy/
├── src/screenpy/          # Main package code
│   ├── parser/            # Parsing modules
│   │   ├── grammar.py     # Shot heading grammar
│   │   ├── segmenter.py   # Screenplay segmentation
│   │   └── elements.py    # Element extraction
│   ├── vsd/               # Verb Sense Disambiguation
│   │   ├── frames.py      # FrameNet integration
│   │   ├── synsets.py     # WordNet integration
│   │   └── clausie.py     # Clause extraction
│   ├── models.py          # Data models (Pydantic)
│   ├── utils.py           # Utilities
│   └── cli.py             # Command-line interface
├── tests/                 # Test suite
├── data/                  # Data files
│   ├── screenplays/       # Raw screenplay files
│   └── outputs/           # Parsed outputs
├── docs/                  # Documentation
└── examples/              # Example scripts
```
Shot headings follow a standardized grammar:
```text
INT. LOCATION - SHOT_TYPE - SUBJECT - TIME_OF_DAY
```
Examples:
```text
INT. CENTRAL PARK - DAY
EXT. WHITE HOUSE - SOUTH LAWN - CLOSE ON CNN CORRESPONDENT - SUNSET
WIDE SHOT - RACETRACK AND EMPTY STANDS
```
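To make the field layout concrete, here is an illustrative regex-based splitter for headings of this shape. It is only a sketch of the grammar's behavior, not the package's actual grammar.py; the `split_heading` helper and its return keys are hypothetical:

```python
import re

# Sketch only: a simplified take on the heading layout shown above.
# The real grammar module covers many more shot types and edge cases.
PREFIX_RE = re.compile(r"^(?P<prefix>INT\./EXT\.|INT\.|EXT\.)?\s*(?P<body>.+)$")
TIMES = {"DAY", "NIGHT", "MORNING", "EVENING", "SUNSET", "SUNRISE",
         "DUSK", "DAWN", "CONTINUOUS", "LATER"}

def split_heading(raw: str) -> dict:
    """Split a heading into an INT/EXT prefix, hyphen-delimited fields, and a time of day."""
    match = PREFIX_RE.match(raw.strip())
    fields = [field.strip() for field in match.group("body").split(" - ")]
    time_of_day = fields.pop() if fields and fields[-1].upper() in TIMES else None
    return {"prefix": match.group("prefix"), "fields": fields, "time_of_day": time_of_day}

print(split_heading("EXT. WHITE HOUSE - SOUTH LAWN - CLOSE ON CNN CORRESPONDENT - SUNSET"))
# {'prefix': 'EXT.', 'fields': ['WHITE HOUSE', 'SOUTH LAWN', 'CLOSE ON CNN CORRESPONDENT'],
#  'time_of_day': 'SUNSET'}
```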
| Element | Description | Example |
|---|---|---|
| Master Headings | Scene beginnings with INT/EXT | INT. OFFICE - DAY |
| Shot Types | Camera shot specifications | CLOSE, WIDE, TRACKING |
| Stage Direction | Action descriptions | John enters the room. |
| Dialogue | Character speech | JOHN: Hello there! |
| Transitions | Scene changes | CUT TO:, FADE OUT |
| In-line Caps | Emphasized elements | Sound effects, character intros |
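For a feel of how these element types differ on the page, a toy line classifier along these lines already separates most of them. This is purely a heuristic illustration; the hypothetical `classify_line` below is not the package's segmenter, which is grammar-based and far more careful:

```python
import re

def classify_line(line: str) -> str:
    """Very rough heuristics for the element types listed in the table above."""
    stripped = line.strip()
    if re.match(r"^(INT\.|EXT\.|INT\./EXT\.)\s", stripped):
        return "master heading"
    if re.match(r"^(CUT TO:|FADE IN:|FADE OUT|DISSOLVE TO:)", stripped):
        return "transition"
    if stripped.isupper() and len(stripped.split()) <= 4:
        return "character cue / shot type / in-line caps"
    return "stage direction or dialogue text"

for line in ["INT. OFFICE - DAY", "CUT TO:", "JOHN", "John enters the room."]:
    print(f"{line!r:26} -> {classify_line(line)}")
```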
The VSD module maps verbs in stage directions to semantic frames:
```python
from screenpy.vsd import VerbSenseAnalyzer
analyzer = VerbSenseAnalyzer()
# Analyze stage direction
text = "Indy sails through sideways and rolls to a stop"
actions = analyzer.extract_actions(text)
for action in actions:
    print(f"Verb: {action.verb}")
    print(f"Frames: {action.verb_sense.frames}")
print(f"Synsets: {action.verb_sense.synsets}")Analysis of 1000+ IMSDb screenplays:
Analysis of 1000+ IMSDb screenplays:

| Genre | Films | Avg Segments | Avg Headings | Avg Dialogue |
|---|---|---|---|---|
| Action | 272 | 1240 | 621 | 538 |
| Comedy | 310 | 1370 | 582 | 720 |
| Drama | 541 | 1328 | 591 | 667 |
| Horror | 134 | 1150 | 632 | 451 |
| Sci-Fi | 140 | 1161 | 607 | 472 |
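A rough sketch of how counts like these could be recomputed from the batch outputs. It assumes each exported JSON mirrors the in-memory model with a top-level `master_segments` list; check the actual output schema before relying on it:

```python
import json
from pathlib import Path
from statistics import mean

# Assumed layout: one parsed-screenplay JSON per file under data/outputs/.
output_dir = Path("data/outputs")
segment_counts = []
for path in sorted(output_dir.glob("*.json")):
    with path.open(encoding="utf-8") as handle:
        document = json.load(handle)
    # Assumption: the export exposes master segments under this key.
    segment_counts.append(len(document.get("master_segments", [])))

if segment_counts:
    print(f"Films: {len(segment_counts)}, average segments per film: {mean(segment_counts):.0f}")
```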
```bash
# Install development dependencies
pip install -e ".[dev]"
# Set up pre-commit hooks
pre-commit install
# Run tests
pytest
# Format code
black src/ tests/
# Type checking
mypy src/
```

To contribute:

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- IMSDb - Internet Movie Script Database
- FrameNet - Frame semantic annotations
- WordNet - Lexical database
- spaCy - Industrial-strength NLP
- sense2vec - Semantic similarity
If you use ScreenPy in your research, please cite:
```bibtex
@inproceedings{winer2017screenpy,
title={Automated Screenplay Annotation for Extracting Storytelling Knowledge},
author={Winer, David R. and Young, R. Michael},
booktitle={Intelligent Narrative Technologies Workshop (INT17)},
year={2017},
organization={AAAI}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
- David R. Winer - drwiner
- R. Michael Young - Advisor
- University of Utah School of Computing
- Entertainment Arts and Engineering Program
- National Science Foundation Grant No. 1654651
For questions or support, please open an issue on GitHub.