Build-1.0.0-Alpha
- 1. Overview
- 2. System Requirements
- 3. Input Format
- 4. Output Format
- 5. Error Handling & Validation
- 6. Step-by-Step Guide
- 7. Notes on Reliability
- 8. Source & Ownership
- 9. Development Overview
1. Overview
NEP (Normalized Entity Parser) is a purpose-built application that helps Lead Invigilators coordinate daily exam accommodations. It converts the detailed PDF rosters generated by SARS, the university’s scheduling system, into a clear, structured summary of scheduled exams. NEP extracts and groups key fields such as course codes, exam times, and locations. It then produces concise reports showing how many students are scheduled for each course at each time and venue. This significantly reduces the manual effort of reviewing rosters and makes exam planning and logistics more efficient.
2. System Requirements
To run this software, ensure your system meets the following minimum requirements:
- Operating System: Windows, macOS, or Linux
- Java Runtime Environment (JRE): Version 8 or higher
- RAM: 1 GB or more
- Storage: At least 100 MB of free disk space
- Dependencies:
  - Apache PDFBox (bundled with the application)
  - Swing GUI support (built into standard Java)
3. Input Format
NEP accepts one specific input format:
- File Type: PDF
- Document Type: Daily Report generated by SARS
- Required Information:
  - Student Name
  - Student Dal ID
  - Location
  - Time
  - Course Code
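A quick sanity check can confirm that a file is actually a PDF before it is queued. The sketch below is a hypothetical pre-check and is not part of NEP. The class PdfInputCheck and its methods are illustrative names. It only tests for a .pdf extension and for the "%PDF-" header that PDF files normally begin with. It cannot tell whether the document is a SARS Daily Report; that still depends on the parser finding the required fields.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Locale;

/**
 * Hypothetical pre-check (not part of NEP): accepts a file only if it has a
 * .pdf extension and starts with the "%PDF-" header.
 */
final class PdfInputCheck {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the stream begins with the bytes "%PDF-". */
    static boolean hasPdfHeader(InputStream in) throws IOException {
        byte[] head = new byte[PDF_MAGIC.length];
        int read = 0;
        // read() may return fewer bytes than requested, so keep reading.
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                return false; // stream ended before a full header was read
            }
            read += n;
        }
        return Arrays.equals(head, PDF_MAGIC);
    }

    /** Checks the extension first, then the header bytes of a file on disk. */
    static boolean looksLikePdf(Path file) throws IOException {
        Path name = file.getFileName();
        if (name == null
                || !name.toString().toLowerCase(Locale.ROOT).endsWith(".pdf")
                || !Files.isRegularFile(file)) {
            return false;
        }
        try (InputStream in = Files.newInputStream(file)) {
            return hasPdfHeader(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PdfInputCheck roster1.pdf roster2.pdf ...
        for (String arg : args) {
            System.out.println(arg + " -> " + looksLikePdf(Paths.get(arg)));
        }
    }
}
```

A header check is cheap and catches files that were renamed to .pdf by mistake. Anything beyond that, such as confirming the report layout, is left to the PDF parser.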
4. Output Format
NEP generates five types of output files during every full pass through the application. All of them are plain-text (.txt) files.
- Normalized Objects
  A normalized record of every line detected in each individual roster. Each line is saved in the format:
  [DAL ID | LAST NAME, FIRST NAME | COURSE CODE | LOCATION | TIME]
- Combined Objects
  All individual Normalized Object files are combined into one file for further cleaning. Entries with missing details or errors are removed.
- Removed Objects
  Entries removed from the Combined Objects file are saved here and can be re-added using the GUI.
  Format:
  [DalID: DAL ID | Name: LAST NAME, FIRST NAME | Code: COURSE CODE | Location: LOCATION | Time: TIME]
- Grouped Objects
  The final file, created for practical use. It mimics the pen-and-paper booklet format.
  Format:
  Course Code: COURSE CODE
  STUDENT COUNT – LOCATION – TIME
- Filtered Objects
  Generated during Grouped Object creation, for debugging.
  Format:
  Time: TIME, Course Code: COURSE CODE
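A short Java sketch shows how the Normalized Objects format relates to the Grouped Objects format. It makes two assumptions:
- Each normalized entry sits on its own line, in exactly the bracketed, pipe-separated form shown above.
- Grouping counts students per course code, location, and time.

GroupingSketch, Entry, parse, and group are hypothetical names, not NEP's actual classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Illustrative sketch (not NEP's actual code): parses Normalized Objects lines
 * and counts students per course code, location, and time, printing the result
 * in the Grouped Objects layout. All names here are hypothetical.
 */
final class GroupingSketch {

    // En dash separator used in the Grouped Objects format.
    static final String SEP = " \u2013 ";

    /** One normalized roster entry. */
    static final class Entry {
        final String dalId, name, courseCode, location, time;

        Entry(String dalId, String name, String courseCode, String location, String time) {
            this.dalId = dalId;
            this.name = name;
            this.courseCode = courseCode;
            this.location = location;
            this.time = time;
        }
    }

    /** Parses "[DAL ID | LAST NAME, FIRST NAME | COURSE CODE | LOCATION | TIME]"; null if malformed. */
    static Entry parse(String line) {
        String s = line.trim();
        if (s.length() < 2 || !s.startsWith("[") || !s.endsWith("]")) {
            return null;
        }
        // Limit -1 keeps trailing empty fields, so "[a | b | c | d | ]" still has 5 parts.
        String[] f = s.substring(1, s.length() - 1).split("\\|", -1);
        if (f.length != 5) {
            return null;
        }
        for (int i = 0; i < f.length; i++) {
            f[i] = f[i].trim();
        }
        return new Entry(f[0], f[1], f[2], f[3], f[4]);
    }

    /** Builds Grouped Objects text: one block per course, one count line per (location, time). */
    static String group(List<Entry> entries) {
        // course code -> ("LOCATION - TIME" -> student count). TreeMap keys sort as plain strings.
        Map<String, Map<String, Integer>> byCourse = new TreeMap<>();
        for (Entry e : entries) {
            byCourse.computeIfAbsent(e.courseCode, k -> new TreeMap<>())
                    .merge(e.location + SEP + e.time, 1, Integer::sum);
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Map<String, Integer>> course : byCourse.entrySet()) {
            out.append("Course Code: ").append(course.getKey()).append('\n');
            for (Map.Entry<String, Integer> slot : course.getValue().entrySet()) {
                out.append(slot.getValue()).append(SEP).append(slot.getKey()).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "[B00123456 | DOE, JANE | MATH 1000 | 2000 MCCAIN | 12:00 PM]",
                "[B00654321 | ROE, RICHARD | MATH 1000 | 2000 MCCAIN | 12:00 PM]",
                "[B00111222 | POE, ANNA | CSCI 1100 | G28 | 9:00 AM]");
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            Entry e = parse(line);
            if (e != null) {
                entries.add(e);
            }
        }
        System.out.print(group(entries));
    }
}
```

Running main prints one "Course Code:" block per course, each followed by STUDENT COUNT – LOCATION – TIME lines. The TreeMap only makes the output order stable. Its keys sort as plain strings, so times are ordered lexicographically, not chronologically.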
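5. Error Handling & Validation
Each entry is checked against the rules below. Entries with missing or invalid fields are removed from Combined Objects and saved to Removed Objects.
- Student Dal ID: Starts with 'B' followed by 8 digits.
- Student Name: Alphabetic characters, spaces, and commas only.
- Location: Valid string from the exam location column (e.g., 2000 MCCAIN).
- Time: Matches the h:mm AM/PM format (e.g., 12:00 PM).
- Course Code: [DEPT] [NUMBER] or [DEPT] [NUMBER] [SECTION], with a 4-letter department code, a 4-digit course number, and an optional section.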
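The validation rules above map naturally onto regular expressions. The sketch below shows how such checks could look; it is an assumption, not NEP's source. FieldRules and isValid are hypothetical names. It also assumes the following:
- Names are limited to ASCII letters.
- The optional section is a single alphanumeric token, because its exact format is not specified.
- Location is only checked for being non-empty, because the list of valid exam locations is not given here.

```java
import java.util.regex.Pattern;

/**
 * Illustrative field validator (an assumption, not NEP's source) for the
 * Dal ID, name, location, time, and course code rules.
 */
final class FieldRules {

    // 'B' followed by exactly 8 digits, e.g. B00123456.
    static final Pattern DAL_ID = Pattern.compile("B\\d{8}");
    // ASCII letters, spaces, and commas only, e.g. "DOE, JANE".
    static final Pattern NAME = Pattern.compile("[A-Za-z ,]+");
    // h:mm AM/PM: hour 1-12 without a leading zero, minutes 00-59.
    static final Pattern TIME = Pattern.compile("(1[0-2]|[1-9]):[0-5]\\d (AM|PM)");
    // 4-letter department, 4-digit number, optional section (assumed alphanumeric).
    static final Pattern COURSE = Pattern.compile("[A-Z]{4} \\d{4}( [A-Za-z0-9]+)?");

    private static boolean matches(Pattern p, String value) {
        // null means the parser could not fill the field; matches() tests the whole string.
        return value != null && p.matcher(value).matches();
    }

    /** Returns true only if every field passes its rule. */
    static boolean isValid(String dalId, String name, String courseCode,
                           String location, String time) {
        return matches(DAL_ID, dalId)
                && matches(NAME, name)
                && matches(COURSE, courseCode)
                && location != null && !location.trim().isEmpty() // non-empty check only
                && matches(TIME, time);
    }

    public static void main(String[] args) {
        System.out.println(isValid("B00123456", "DOE, JANE", "MATH 1000", "2000 MCCAIN", "12:00 PM"));
        System.out.println(isValid("B0012345", "DOE, JANE", "MATH 1000", "2000 MCCAIN", "12:00 PM"));
    }
}
```

Using Matcher.matches() rather than find() forces each pattern to cover the entire field. For example, "B001234567" (9 digits) is rejected instead of partially matching.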
6. Step-by-Step Guide
The front end consists of four components:
- Input Panel Section (Top Left: Exam Location, Select a Date…)
- Output Panel Section (Top Right: Selected Exams)
- Settings Tab Section (Bottom Left: Selection, Panel…)
- Files Tab Section (Bottom Right: Combined, Removed…)
Task 1: Queue and submit rosters
- Select the exam location, date, and SARS roster PDF. Supported locations: Mark A. Hill, G28, Alternate Location.
- Go to Settings Tab > Selection > Add Exam to queue a roster.
- Repeat for all PDFs. Use Settings Tab > Panel > Clear or Undo for corrections.
- When done, press Settings Tab > Panel > Submit to process all queued exams.
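The Add Exam, Undo, Clear, and Submit controls behave like a small stack of pending rosters. The following is a hypothetical model, not NEP's implementation. It assumes Undo removes the most recently added roster, and Submit hands over all queued rosters in the order they were added. RosterQueue, QueuedRoster, and their methods are illustrative names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical model of the Add Exam / Undo / Clear / Submit controls
 * (not NEP's actual class).
 */
final class RosterQueue {

    /** One queued roster: the selections made in the Input Panel. */
    static final class QueuedRoster {
        final String location, date, pdfPath;

        QueuedRoster(String location, String date, String pdfPath) {
            this.location = location;
            this.date = date;
            this.pdfPath = pdfPath;
        }
    }

    // ArrayDeque used as a stack, so Undo removes the most recent Add Exam.
    private final Deque<QueuedRoster> pending = new ArrayDeque<>();

    void addExam(String location, String date, String pdfPath) {
        pending.push(new QueuedRoster(location, date, pdfPath));
    }

    /** Removes the last queued roster; returns false if nothing was queued. */
    boolean undo() {
        return pending.pollFirst() != null;
    }

    void clear() {
        pending.clear();
    }

    int size() {
        return pending.size();
    }

    /** Returns every queued roster, oldest first, and empties the queue. */
    List<QueuedRoster> submit() {
        List<QueuedRoster> batch = new ArrayList<>();
        // descendingIterator walks from the oldest push to the newest.
        pending.descendingIterator().forEachRemaining(batch::add);
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        RosterQueue queue = new RosterQueue();
        queue.addExam("G28", "2024-04-10", "roster-a.pdf");
        queue.addExam("Mark A. Hill", "2024-04-10", "roster-b.pdf");
        queue.undo(); // drops roster-b.pdf
        for (QueuedRoster r : queue.submit()) {
            System.out.println(r.location + " " + r.date + " " + r.pdfPath);
        }
    }
}
```

Backing the queue with an ArrayDeque keeps Add Exam and Undo constant-time. Reading it with descendingIterator at Submit time restores the original insertion order.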
Task 2: Correct removed entries
- If the popup "Popup Code 1001: Removed Entries Saved" appears, some entries had missing or invalid fields.
- Navigate to Files Tab > Removed > Edit Entry to open the Removed Objects Viewer.
- Edit the null fields. Pressing OK moves the fixed entry back to Combined Objects.
- Confirm the changes via Files Tab > Combined > Open File.
- Repeat for each removed entry as needed.
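The Edit Entry step can be sketched as follows. This is a hypothetical illustration based on two assumptions:
- Missing fields appear as the literal text "null" in the Removed Objects file.
- Combined Objects stores entries in the Normalized Objects format.

RemovedEntryEditor, parseRemoved, and applyEdits are illustrative names. An entry is turned back into a normalized line only once none of its fields are null.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the Edit Entry step (not NEP's code). Assumes missing
 * fields are written as the literal text "null" in Removed Objects and that
 * Combined Objects stores entries in the Normalized Objects format.
 */
final class RemovedEntryEditor {

    // Field labels in Removed Objects order, which matches the Normalized order.
    static final String[] LABELS = {"DalID", "Name", "Code", "Location", "Time"};

    /** Parses "[DalID: ... | Name: ... | Code: ... | Location: ... | Time: ...]"; missing values become null. */
    static Map<String, String> parseRemoved(String line) {
        String s = line.trim();
        if (s.length() < 2 || !s.startsWith("[") || !s.endsWith("]")) {
            throw new IllegalArgumentException("Not a Removed Objects line: " + line);
        }
        String[] parts = s.substring(1, s.length() - 1).split("\\|", -1);
        if (parts.length != LABELS.length) {
            throw new IllegalArgumentException("Expected 5 fields: " + line);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < LABELS.length; i++) {
            String part = parts[i].trim();
            // Everything after the first colon is the value, so "Time: 9:00 AM" keeps its own colon.
            String value = part.substring(part.indexOf(':') + 1).trim();
            fields.put(LABELS[i], (value.isEmpty() || value.equals("null")) ? null : value);
        }
        return fields;
    }

    /**
     * Fills only the fields that are still null, using the user's edits.
     * Returns the entry as a Normalized Objects line, or null if a field is still missing.
     */
    static String applyEdits(Map<String, String> fields, Map<String, String> edits) {
        for (String label : LABELS) {
            if (fields.get(label) == null && edits.get(label) != null) {
                fields.put(label, edits.get(label));
            }
        }
        if (fields.containsValue(null)) {
            return null; // still incomplete, so the entry stays in Removed Objects
        }
        return "[" + String.join(" | ", fields.values()) + "]";
    }

    public static void main(String[] args) {
        Map<String, String> entry = parseRemoved(
                "[DalID: B00123456 | Name: DOE, JANE | Code: null | Location: G28 | Time: 9:00 AM]");
        Map<String, String> edits = new HashMap<>();
        edits.put("Code", "CSCI 1100");
        System.out.println(applyEdits(entry, edits));
    }
}
```

A LinkedHashMap keeps the fields in label order. Joining its values with " | " therefore reproduces the Normalized Objects column order without extra bookkeeping.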
Task 3: Create the Grouped Object file
- After finalizing Combined Objects, press Files Tab > Grouped > Create.
- Open the file via Files Tab > Grouped > Open to view or print it.
- Before printing, proceed to Task 4 for validation.
Task 4: Validate with Fault Trace
- Press Settings Tab > DevMode > Trace to enter Fault Trace mode.
  - Note: Fault Trace excludes date selection, as it is considered arbitrary.
- Follow Tasks 1–3 inside Fault Trace and generate the Grouped Object file.
- The Cumulative Info panel refreshes every 2 seconds.
Metrics to observe:
- Rosters Added: Number of rosters selected
- Courses Found: Number of course entries
- Combined Entries: Total valid entries
- Grouped Entries: Final entry count
- Removed Entries: Discarded entries
Combined Entries must equal Grouped Entries; any discrepancy indicates a problem. Diagnostic tips:
- Open Combined Objects and check for uncorrected errors.
- Manually count the entries in the Grouped Object file and verify them.
- Cross-check that count against the Combined Objects entry total.
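The counting and cross-checking tips can be automated with a short check. This sketch is hypothetical and makes two assumptions:
- Combined Objects holds one entry per non-blank line.
- Every count line in the Grouped Objects file follows the STUDENT COUNT – LOCATION – TIME layout with en dash separators.

FaultTraceCheck and its methods are illustrative names, not part of NEP's DevMode.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical consistency check (not part of NEP): compares the number of
 * Combined Objects entries with the sum of student counts in Grouped Objects.
 */
final class FaultTraceCheck {

    // Count lines look like "2 - 2000 MCCAIN - 12:00 PM" with en dash separators.
    private static final Pattern COUNT_LINE = Pattern.compile("(\\d+) \u2013 .+");

    /** Counts non-blank lines, assuming one entry per line. */
    static int combinedEntries(List<String> combinedLines) {
        int n = 0;
        for (String line : combinedLines) {
            if (!line.trim().isEmpty()) {
                n++;
            }
        }
        return n;
    }

    /** Sums the STUDENT COUNT value of every count line; "Course Code:" lines are skipped. */
    static int groupedEntries(List<String> groupedLines) {
        int total = 0;
        for (String line : groupedLines) {
            Matcher m = COUNT_LINE.matcher(line.trim());
            if (m.matches()) {
                total += Integer.parseInt(m.group(1));
            }
        }
        return total;
    }

    /** True when both totals agree, which is the condition Fault Trace expects. */
    static boolean totalsAgree(List<String> combinedLines, List<String> groupedLines) {
        return combinedEntries(combinedLines) == groupedEntries(groupedLines);
    }

    public static void main(String[] args) {
        List<String> combined = Arrays.asList(
                "[B00123456 | DOE, JANE | MATH 1000 | 2000 MCCAIN | 12:00 PM]",
                "[B00654321 | ROE, RICHARD | MATH 1000 | 2000 MCCAIN | 12:00 PM]");
        List<String> grouped = Arrays.asList(
                "Course Code: MATH 1000",
                "2 \u2013 2000 MCCAIN \u2013 12:00 PM");
        int c = combinedEntries(combined);
        int g = groupedEntries(grouped);
        System.out.println(totalsAgree(combined, grouped)
                ? "OK: " + c + " entries"
                : "Mismatch: combined=" + c + ", grouped=" + g);
    }
}
```

In practice the two lists would be read from the Combined and Grouped .txt files, for example with Files.readAllLines. A mismatch points to the same problems the manual tips target: uncorrected entries or a miscounted group.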
If problems persist, email the roster PDF to Zawad.Atif@dal.ca with a brief description.
7. Notes on Reliability
No software is perfect. NEP handles most cases reliably, but occasional errors may occur due to edge cases or unexpected input formats. Human oversight remains essential to ensure accuracy, particularly when the output is used for exam logistics.
The application spans over 5,100 lines of code and handles many formatting styles and edge cases. Even so, because inputs vary, final outputs still require human validation.
8. Source & Ownership
This software was developed and is maintained by Zawad Atif and Nafisah Nubah for internal use at the Dalhousie Student Accessibility Center.
- GitHub Repository: https://github.com/NepSauce/Normalized-Entity-Parser
- For issues or inquiries: Zawad.Atif@dal.ca
9. Development Overview
Zawad Atif led the project technically: he managed the overall architecture, selected the technology stack, and implemented developer tools for debugging and diagnostics.
Nafisah Nubah developed the backend parsing logic to accurately extract and normalize data from SARS-generated PDFs.
Both authors contributed to testing, quality assurance, and creating a reliable user experience.
Technical Stack:
- Language: Java
- UI Framework: Swing
- PDF Parsing Library: Apache PDFBox
- IDE: IntelliJ IDEA
- Packaging: Launch4j
- Version Control: Git
- Repository Hosting: GitHub