Normalized Entity Parser (NEP)

Build-1.0.0-Alpha

1. Overview

NEP (Normalized Entity Parser) is a purpose-built application that helps Lead Invigilators coordinate daily exam accommodations. It converts the detailed PDF rosters generated by SARS, the university’s scheduling system, into a clear, structured summary of scheduled exams. By extracting and grouping course codes, exam times, and locations, NEP produces concise reports showing how many students are scheduled per course at each time and venue. This reduces the manual effort of reviewing rosters and speeds up exam planning and logistics.

2. System Requirements

To run this software, ensure your system meets the following minimum requirements:

Operating System: Windows, macOS, or Linux
Java Runtime Environment (JRE): Version 8 or higher
RAM: 1 GB or higher
Storage: Minimum 100 MB of free disk space
Dependencies:
- Apache PDFBox (Bundled with the application)
- Swing GUI support (Built into standard Java)

3. Input Format

NEP is designed to take in a specific file format.

File Type: PDF
Document Type: Daily Report generated by SARS
Required Information:
1. Student Name
2. Student Dal ID
3. Location
4. Time
5. Course Code

4. Output Format

NEP generates five types of files on every full pass through the application. All are plain-text files ending in .txt.

Normalized Objects
Generalized form of every line detected in each individual roster. Each line is saved in the format:
[DAL ID | LAST NAME, FIRST NAME | COURSE CODE | LOCATION | TIME]
Combined Objects
All individual Normalized Object files are combined into one file for further cleaning. Entries with missing details or errors are removed.
Removed Objects
Entries removed from the Combined Objects file are saved here and can be re-added using the GUI.
Format:
[DalID: DAL ID | Name: LAST NAME, FIRST NAME | Code: COURSE CODE | Location: LOCATION | Time: TIME]
Grouped Objects
Final file created for practical use. Mimics pen-and-paper booklet format.
Format:
Course Code: COURSE CODE
STUDENT COUNT – LOCATION – TIME
Filtered Objects
Generated during Grouped Object creation for debugging.
Format:
Time: TIME, Course Code: COURSE CODE
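
The relationship between the Normalized Objects format and the Grouped Objects format can be sketched as below. This is a minimal illustration, not NEP's actual implementation: the class name `GroupedObjectsSketch` and the method `group` are hypothetical, and the sketch assumes fields are separated by ` | ` inside square brackets exactly as shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: turns Normalized Object lines into Grouped Object lines. */
final class GroupedObjectsSketch {

    static List<String> group(List<String> normalizedLines) {
        // course code -> ("LOCATION – TIME" -> student count).
        // LinkedHashMap keeps first-seen order so the output is deterministic.
        Map<String, Map<String, Integer>> byCourse = new LinkedHashMap<>();
        for (String line : normalizedLines) {
            String t = line.trim();
            if (!t.startsWith("[") || !t.endsWith("]")) {
                continue; // not a bracketed entry
            }
            // Expected fields: DAL ID | NAME | COURSE CODE | LOCATION | TIME
            String[] f = t.substring(1, t.length() - 1).split("\\|");
            if (f.length != 5) {
                continue; // malformed entry
            }
            String course = f[2].trim();
            String slot = f[3].trim() + " \u2013 " + f[4].trim(); // \u2013 is the en dash
            byCourse.computeIfAbsent(course, k -> new LinkedHashMap<>())
                    .merge(slot, 1, Integer::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> c : byCourse.entrySet()) {
            out.add("Course Code: " + c.getKey());
            for (Map.Entry<String, Integer> s : c.getValue().entrySet()) {
                out.add(s.getValue() + " \u2013 " + s.getKey());
            }
        }
        return out;
    }
}
```

Two students in CSCI 1105 at 2000 MCCAIN, 9:00 AM would collapse into the header line `Course Code: CSCI 1105` followed by `2 – 2000 MCCAIN – 9:00 AM`.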

5. Error Handling & Validation

Student Dal ID: Starts with 'B' followed by 8 digits.
Student Name: Alphabetic characters, spaces, and commas only.
Location: Valid string from the exam location column (e.g., 2000 MCCAIN).
Time: Matches h:mm AM/PM format (e.g., 12:00 PM).
Course Code: [DEPT] [NUMBER] or [DEPT] [NUMBER] [SECTION], where DEPT is a 4-letter code, NUMBER is a 4-digit number, and SECTION is optional.
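
The rules above map naturally onto regular expressions. The following is a minimal sketch, not the checks NEP actually runs: the class `FieldValidators` and its method names are hypothetical, and a few details the rules leave open are assumptions (Unicode letters in names, uppercase department codes, an alphanumeric section, a 12-hour clock).

```java
import java.util.regex.Pattern;

/** Hypothetical field checks derived from the documented rules. */
final class FieldValidators {
    // 'B' followed by exactly 8 digits, e.g. B00123456.
    private static final Pattern DAL_ID = Pattern.compile("B\\d{8}");
    // Letters, spaces, and commas only (Unicode letters are an assumption).
    private static final Pattern NAME = Pattern.compile("[\\p{L} ,]+");
    // h:mm AM/PM on a 12-hour clock, e.g. 9:30 AM or 12:00 PM.
    private static final Pattern TIME =
            Pattern.compile("(1[0-2]|[1-9]):[0-5]\\d (AM|PM)");
    // 4-letter department, 4-digit number, optional section.
    // Uppercase letters and an alphanumeric section are assumptions.
    private static final Pattern COURSE =
            Pattern.compile("[A-Z]{4} \\d{4}( [A-Z0-9]+)?");

    static boolean isValidDalId(String s)  { return matches(DAL_ID, s); }
    static boolean isValidName(String s)   { return matches(NAME, s) && !s.trim().isEmpty(); }
    static boolean isValidTime(String s)   { return matches(TIME, s); }
    static boolean isValidCourse(String s) { return matches(COURSE, s); }

    // The rule only asks for a string from the location column,
    // so this sketch checks that it is present and non-blank.
    static boolean isValidLocation(String s) {
        return s != null && !s.trim().isEmpty();
    }

    private static boolean matches(Pattern p, String s) {
        // matches() requires the whole string to fit the pattern.
        return s != null && p.matcher(s).matches();
    }
}
```

Under these assumptions, `isValidDalId("B00123456")` and `isValidCourse("CSCI 1105 01")` pass, while `isValidTime("13:00 PM")` and `isValidCourse("CS 1105")` fail.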

6. Step-by-Step Guide

The front-end consists of four components:

Input Panel Section (Top Left: Exam Location, Select a Date…)
Output Panel Section (Top Right: Selected Exams)
Settings Tab Section (Bottom Left: Selection, Panel…)
Files Tab Section (Bottom Right: Combined, Removed…)

Task 1: Input Panel and Submission

Select the exam location, date, and the SARS roster PDF. Supported locations: Mark A. Hill, G28, Alternate Location.
Go to Settings Tab > Selection > Add Exam to queue a roster.
Repeat for all PDFs. Use Settings Tab > Panel > Clear or Undo for corrections.
When done, press Settings Tab > Panel > Submit to process all queued exams.

Task 2: Roster Validation and Object Check

If the popup "Popup Code 1001: Removed Entries Saved" appears, some entries had missing or invalid fields and were removed.
Navigate to Files Tab > Removed > Edit Entry to open the Removed Objects Viewer.
Edit null fields. Pressing OK moves the fixed entry back to Combined Objects.
Confirm changes via Files Tab > Combined > Open File.
Repeat for all entries if needed.
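
What the Removed Objects Viewer does with an edited entry can be pictured roughly as follows. This is only a sketch under stated assumptions: the class `RemovedEntrySketch` is hypothetical, missing values are assumed to appear as the literal text `null` or as an empty field, and the Combined Objects file is assumed to use the same bracketed layout as the Normalized Objects format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: read a Removed Objects line, fix it, re-emit it. */
final class RemovedEntrySketch {
    private static final String[] LABELS = {"DalID", "Name", "Code", "Location", "Time"};

    /** Parses "[DalID: ... | Name: ... | Code: ... | Location: ... | Time: ...]". */
    static Map<String, String> parse(String line) {
        String body = line.trim();
        if (body.startsWith("[") && body.endsWith("]")) {
            body = body.substring(1, body.length() - 1);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : body.split("\\|")) {
            // Only the first colon ends the label, so "9:00 AM" stays intact.
            int colon = part.indexOf(':');
            if (colon < 0) {
                continue;
            }
            fields.put(part.substring(0, colon).trim(), part.substring(colon + 1).trim());
        }
        return fields;
    }

    /** Lists the labels whose value is absent, empty, or the text "null". */
    static List<String> missing(Map<String, String> fields) {
        List<String> gaps = new ArrayList<>();
        for (String label : LABELS) {
            String v = fields.get(label);
            if (v == null || v.isEmpty() || v.equalsIgnoreCase("null")) {
                gaps.add(label);
            }
        }
        return gaps;
    }

    /** Emits the bracketed layout once every field has a value. */
    static String toNormalized(Map<String, String> fields) {
        List<String> gaps = missing(fields);
        if (!gaps.isEmpty()) {
            throw new IllegalStateException("Missing fields: " + gaps);
        }
        List<String> values = new ArrayList<>();
        for (String label : LABELS) {
            values.add(fields.get(label));
        }
        return "[" + String.join(" | ", values) + "]";
    }
}
```

A typical flow would be `parse(line)`, fill in whatever `missing(...)` reports, then call `toNormalized(...)` to get the line that returns to Combined Objects.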

Task 3: Grouped Object and Printing

After finalizing Combined Objects, press Files Tab > Grouped > Create.
Open the file via Files Tab > Grouped > Open to view or print.
Before printing, proceed to Task 4 for validation.

Task 4: Fault Trace and Validity

Press Settings Tab > DevMode > Trace to enter Fault Trace mode.
- Note: Fault Trace excludes date selection as it is considered arbitrary.
Follow Tasks 1–3 inside Fault Trace and generate the Grouped Objects file.
- The Cumulative Info panel refreshes every 2 seconds.

Metrics to observe:

Rosters Added: Number of rosters selected
Courses Found: Number of course entries
Combined Entries: Total valid entries
Grouped Entries: Final entry count
Removed Entries: Discarded entries

Combined Entries must equal Grouped Entries. Discrepancies indicate issues. Diagnostic tips:

Open the Combined Objects file and check for uncorrected errors.
Manually count and verify the entries in the Grouped Objects file.
Cross-check the Combined Objects entry total.
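
The manual count can also be scripted. The sketch below is an illustration rather than part of NEP: the class `TraceCheckSketch` is hypothetical, it assumes each Combined Objects entry is one bracketed line, and it assumes "Grouped Entries" means the sum of the student counts that lead each `STUDENT COUNT – LOCATION – TIME` line.

```java
import java.util.List;

/** Hypothetical sketch: compare Combined and Grouped entry totals. */
final class TraceCheckSketch {

    /** Counts bracketed entries, one per line. */
    static int countCombined(List<String> combinedLines) {
        int count = 0;
        for (String line : combinedLines) {
            if (line.trim().startsWith("[")) {
                count++;
            }
        }
        return count;
    }

    /** Sums the leading student count of every grouped line. */
    static int sumGrouped(List<String> groupedLines) {
        int total = 0;
        for (String line : groupedLines) {
            String t = line.trim();
            int end = 0;
            // "Course Code: ..." headers start with a letter and are skipped.
            while (end < t.length() && Character.isDigit(t.charAt(end))) {
                end++;
            }
            if (end > 0) {
                total += Integer.parseInt(t.substring(0, end));
            }
        }
        return total;
    }

    /** True when both files account for the same number of students. */
    static boolean isConsistent(List<String> combinedLines, List<String> groupedLines) {
        return countCombined(combinedLines) == sumGrouped(groupedLines);
    }
}
```

Reading both files with `java.nio.file.Files.readAllLines` and passing the results to `isConsistent` gives a quick second opinion on the figures shown in the Cumulative Info panel.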

If problems persist, email the roster PDF to Zawad.Atif@dal.ca with a brief description.

7. Notes on Reliability

No software is perfect. While NEP handles most cases reliably, occasional hiccups may occur due to edge cases or unexpected input formats. Human oversight remains essential to ensure accuracy, particularly for logistics use.

The application spans over 5,100 lines of code and handles numerous formatting styles and edge cases, but the variability of its inputs means final outputs should always be checked by a person.

8. Source & Ownership

This software was developed and is maintained by Zawad Atif and Nafisah Nubah for internal use at the Dalhousie Student Accessibility Center.

GitHub Repository: https://github.com/NepSauce/Normalized-Entity-Parser
For issues or inquiries: Zawad.Atif@dal.ca

9. Development Overview

Zawad Atif assumed technical leadership for the project, managing overall architecture, selecting the technology stack, and implementing developer tools for debugging and diagnostics.
Nafisah Nubah developed the backend parsing logic to accurately extract and normalize data from SARS-generated PDFs.

Both authors contributed to testing, quality assurance, and creating a reliable user experience.

Technical Stack:

Language: Java
UI Framework: Swing
PDF Parsing Library: Apache PDFBox
IDE: IntelliJ IDEA
Packaging: Launch4j
Version Control: Git
Repository Hosting: GitHub
