[Build-1.1.0] A Java-based tool that converts SARS exam rosters into clean, structured summaries for the Dalhousie University Accessibility Center.

NepSauce/normalized-entity-parser

Normalized Entity Parser (NEP)

Build-1.0.0-Alpha



1. Overview

NEP (Normalized Entity Parser) is a purpose-built tool that helps Lead Invigilators coordinate daily exam accommodations. It converts the detailed PDF rosters generated by the university's scheduling system, SARS, into a clear, structured summary of scheduled exams. By extracting and grouping key information such as course codes, exam times, and locations, NEP produces concise reports showing how many students are scheduled per course at each time and venue. This reduces the manual effort of reviewing rosters and speeds up exam planning and logistics.


2. System Requirements

To run this software, ensure your system meets the following minimum requirements:

  • Operating System: Windows, macOS, or Linux
  • Java Runtime Environment (JRE): Version 8 or higher
  • RAM: 1 GB or higher
  • Storage: Minimum 100 MB of free disk space
  • Dependencies:
    • Apache PDFBox (Bundled with the application)
    • Swing GUI support (Built into standard Java)

3. Input Format

NEP is designed to take in a specific file format.

  • File Type: PDF
  • Document Type: Daily Report generated by SARS
  • Required Information:
    1. Student Name
    2. Student Dal ID
    3. Location
    4. Time
    5. Course Code

4. Output Format

NEP generates five distinct types of files on every full pass through the application. All files use the .txt extension.

  1. Normalized Objects
    Generalized form of every line detected in each individual roster. Each line is saved in the format:
    [DAL ID | LAST NAME, FIRST NAME | COURSE CODE | LOCATION | TIME]

  2. Combined Objects
    All individual Normalized Object files are combined into one file for further cleaning. Entries with missing details or errors are removed.

  3. Removed Objects
    Entries removed from the Combined Objects file are saved here. They can be re-added using the GUI.
    Format:
    [DalID: DAL ID | Name: LAST NAME, FIRST NAME | Code: COURSE CODE | Location: LOCATION | Time: TIME]

  4. Grouped Objects
    Final file created for practical use. Mimics pen-and-paper booklet format.
    Format:
    Course Code: COURSE CODE
    STUDENT COUNT – LOCATION – TIME

  5. Filtered Objects
    Generated during Grouped Object creation for debugging.
    Format:
    Time: TIME, Course Code: COURSE CODE
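
The line formats above can be illustrated with a short Java sketch. It is not NEP's actual source: the class `OutputFormatSketch`, the `RosterEntry` type, its field names, and the sample data are all hypothetical, and the grouping rule (count students per course, location, and time) is inferred from the Grouped Objects format. The code sticks to Java 8 features to match the stated minimum JRE.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the output line formats; not NEP's real classes. */
class OutputFormatSketch {

    /** One roster entry. Field names are assumptions made for this sketch. */
    static final class RosterEntry {
        final String dalId, lastName, firstName, courseCode, location, time;

        RosterEntry(String dalId, String lastName, String firstName,
                    String courseCode, String location, String time) {
            this.dalId = dalId;
            this.lastName = lastName;
            this.firstName = firstName;
            this.courseCode = courseCode;
            this.location = location;
            this.time = time;
        }
    }

    // En dash used as the separator in the Grouped Objects format.
    private static final String DASH = " \u2013 ";

    /** [DAL ID | LAST NAME, FIRST NAME | COURSE CODE | LOCATION | TIME] */
    static String toNormalizedLine(RosterEntry e) {
        return "[" + e.dalId + " | " + e.lastName + ", " + e.firstName + " | "
                + e.courseCode + " | " + e.location + " | " + e.time + "]";
    }

    /** Same fields as the normalized line, with a label in front of each one. */
    static String toRemovedLine(RosterEntry e) {
        return "[DalID: " + e.dalId + " | Name: " + e.lastName + ", " + e.firstName
                + " | Code: " + e.courseCode + " | Location: " + e.location
                + " | Time: " + e.time + "]";
    }

    /** One "Course Code:" header per course, then one count line per location and time. */
    static String toGroupedText(List<RosterEntry> entries) {
        // course code -> ("LOCATION - TIME" -> student count), in first-seen order
        Map<String, Map<String, Integer>> groups = new LinkedHashMap<>();
        for (RosterEntry e : entries) {
            Map<String, Integer> slots =
                    groups.computeIfAbsent(e.courseCode, k -> new LinkedHashMap<>());
            slots.merge(e.location + DASH + e.time, 1, Integer::sum);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Map<String, Integer>> group : groups.entrySet()) {
            sb.append("Course Code: ").append(group.getKey()).append('\n');
            for (Map.Entry<String, Integer> slot : group.getValue().entrySet()) {
                sb.append(slot.getValue()).append(DASH).append(slot.getKey()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from a real roster.
        List<RosterEntry> entries = Arrays.asList(
                new RosterEntry("B00123456", "DOE", "JANE", "CSCI 1100", "2000 MCCAIN", "12:00 PM"),
                new RosterEntry("B00654321", "ROE", "SAM", "CSCI 1100", "2000 MCCAIN", "12:00 PM"),
                new RosterEntry("B00111222", "POE", "ALEX", "CSCI 1100", "G28", "9:00 AM"));
        System.out.println(toNormalizedLine(entries.get(0)));
        System.out.println(toRemovedLine(entries.get(0)));
        System.out.print(toGroupedText(entries));
    }
}
```

For the three sample entries, the grouped text is a `Course Code: CSCI 1100` header followed by `2 – 2000 MCCAIN – 12:00 PM` and `1 – G28 – 9:00 AM`.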


5. Error Handling & Validation

Each field is checked against the rules below. Entries with missing or invalid fields are removed from Combined Objects and saved to Removed Objects.

  1. Student Dal ID: Starts with 'B', followed by 8 digits.
  2. Student Name: Alphabetic characters, spaces, and commas only.
  3. Location: Valid string from the exam location column (e.g., 2000 MCCAIN).
  4. Time: Matches the h:mm AM/PM format (e.g., 12:00 PM).
  5. Course Code: [DEPT] [NUMBER] or [DEPT] [NUMBER] [SECTION], where DEPT is a 4-letter code, NUMBER is a 4-digit number, and SECTION is optional.
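
The rules above map naturally onto regular expressions. The following is a minimal sketch under stated assumptions, not NEP's actual validation code: the class name `FieldValidationSketch` and its method names are hypothetical, department letters are assumed to be uppercase, the section is assumed to be a single token of uppercase letters or digits, and a location is treated as valid whenever it is non-blank.

```java
import java.util.regex.Pattern;

/** Hypothetical regex-based field checks; the patterns are assumptions, not NEP's source. */
class FieldValidationSketch {

    // 'B' followed by exactly 8 digits.
    private static final Pattern DAL_ID = Pattern.compile("B\\d{8}");

    // Alphabetic characters, spaces, and commas only.
    private static final Pattern NAME = Pattern.compile("[A-Za-z ,]+");

    // h:mm AM/PM with a 12-hour clock and no leading zero on the hour.
    private static final Pattern TIME = Pattern.compile("(1[0-2]|[1-9]):[0-5]\\d (AM|PM)");

    // 4-letter department, 4-digit number, optional section token (assumed shape).
    private static final Pattern COURSE_CODE = Pattern.compile("[A-Z]{4} \\d{4}( [A-Z0-9]+)?");

    static boolean isValidDalId(String s) {
        return s != null && DAL_ID.matcher(s).matches();
    }

    static boolean isValidName(String s) {
        return s != null && NAME.matcher(s).matches();
    }

    // Assumption: any non-blank string counts as a location.
    static boolean isValidLocation(String s) {
        return s != null && !s.trim().isEmpty();
    }

    static boolean isValidTime(String s) {
        return s != null && TIME.matcher(s).matches();
    }

    static boolean isValidCourseCode(String s) {
        return s != null && COURSE_CODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Made-up sample values.
        System.out.println(isValidDalId("B00123456"));     // true
        System.out.println(isValidTime("13:00 PM"));       // false
        System.out.println(isValidCourseCode("CSCI 1100")); // true
    }
}
```

`matches()` requires the whole string to fit the pattern, so stray characters before or after a field cause it to fail rather than pass silently.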

6. Step-by-Step Guide

The front-end consists of four components:

  1. Input Panel Section (Top Left: Exam Location, Select a Date…)
  2. Output Panel Section (Top Right: Selected Exams)
  3. Settings Tab Section (Bottom Left: Selection, Panel…)
  4. Files Tab Section (Bottom Right: Combined, Removed…)

Task 1: Input Panel and Submission

  1. Select the exam location, date, and the SARS roster PDF. Supported locations: Mark A. Hill, G28, Alternate Location.
  2. Go to Settings Tab > Selection > Add Exam to queue a roster.
  3. Repeat for all PDFs. Use Settings Tab > Panel > Clear or Undo for corrections.
  4. When done, press Settings Tab > Panel > Submit to process all queued exams.
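
One way to picture the Add Exam, Undo, Clear, and Submit actions is as operations on a queue of roster files. The sketch below is purely illustrative and makes several assumptions that the steps above do not confirm: that Undo removes the most recently added roster, that Submit empties the queue after handing the batch over, and that a roster is identified by its file path. The class `RosterQueueSketch` and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of the roster queue behind the Settings Tab buttons. */
class RosterQueueSketch {

    // Queued roster PDF paths, oldest first.
    private final Deque<String> queued = new ArrayDeque<>();

    /** Add Exam: queue one roster PDF. */
    void addExam(String pdfPath) {
        queued.addLast(pdfPath);
    }

    /** Undo (assumed behaviour): drop the most recently added roster. */
    boolean undo() {
        return queued.pollLast() != null;
    }

    /** Clear: drop every queued roster. */
    void clear() {
        queued.clear();
    }

    /** Submit (assumed behaviour): return the batch in insertion order and empty the queue. */
    List<String> submit() {
        List<String> batch = new ArrayList<>(queued);
        queued.clear();
        return batch;
    }

    int size() {
        return queued.size();
    }

    public static void main(String[] args) {
        RosterQueueSketch queue = new RosterQueueSketch();
        queue.addExam("roster-a.pdf"); // made-up file names
        queue.addExam("roster-b.pdf");
        queue.undo();                  // removes roster-b.pdf
        System.out.println(queue.submit());
    }
}
```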

Task 2: Roster Validation and Object Check

  1. If the popup "Popup Code 1001: Removed Entries Saved" appears, some entries had missing or invalid fields and were removed.
  2. Navigate to Files Tab > Removed > Edit Entry to open the Removed Objects Viewer.
  3. Fill in the null fields. Pressing OK moves the fixed entry back to Combined Objects.
  4. Confirm changes via Files Tab > Combined > Open File.
  5. Repeat for all entries if needed.

Task 3: Grouped Object and Printing

  1. After finalizing Combined Objects, press Files Tab > Grouped > Create.
  2. Open the file via Files Tab > Grouped > Open to view or print.
  3. Before printing, proceed to Task 4 for validation.

Task 4: Fault Trace and Validity

  1. Press Settings Tab > DevMode > Trace to enter Fault Trace mode.
    • Note: Fault Trace skips date selection because the date is considered arbitrary in this mode.
  2. Follow Tasks 1–3 inside Fault Trace and generate the Grouped Objects file.
    • The Cumulative Info panel refreshes every 2 seconds.

Metrics to observe:

  • Rosters Added: Number of rosters selected
  • Courses Found: Number of course entries
  • Combined Entries: Total valid entries
  • Grouped Entries: Final entry count
  • Removed Entries: Discarded entries

Combined Entries must equal Grouped Entries. A mismatch means something went wrong between the Combined Objects and Grouped Objects steps. To diagnose:

  • Open the Combined Objects file and check for uncorrected errors.
  • Manually count and verify the entries in the Grouped Objects file.
  • Cross-check the result against the Combined Objects entry total.
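
The manual count can also be sketched in code. This example assumes that "Grouped Entries" is the sum of the student counts in the Grouped Objects file, and that each count line follows the `STUDENT COUNT – LOCATION – TIME` format with an en dash as the separator. The class `GroupedCountCheckSketch` and its methods are hypothetical and not part of NEP.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical re-count of a Grouped Objects file for comparison with Combined Entries. */
class GroupedCountCheckSketch {

    // En dash separator, as used in the Grouped Objects format.
    private static final String DASH = " \u2013 ";

    /** Sums the leading student count of every "COUNT - LOCATION - TIME" line. */
    static int sumGroupedCounts(List<String> groupedLines) {
        int total = 0;
        for (String line : groupedLines) {
            String trimmed = line.trim();
            // Skip blank lines and "Course Code:" headers.
            if (trimmed.isEmpty() || trimmed.startsWith("Course Code:")) {
                continue;
            }
            int sep = trimmed.indexOf(DASH);
            if (sep <= 0) {
                continue; // not a count line
            }
            try {
                total += Integer.parseInt(trimmed.substring(0, sep).trim());
            } catch (NumberFormatException ignored) {
                // Lines without a numeric count are left out of the total.
            }
        }
        return total;
    }

    /** True when the grouped total matches the number of combined entries. */
    static boolean isConsistent(int combinedEntries, List<String> groupedLines) {
        return combinedEntries == sumGroupedCounts(groupedLines);
    }

    public static void main(String[] args) {
        // Made-up sample content.
        List<String> grouped = Arrays.asList(
                "Course Code: CSCI 1100",
                "3 \u2013 2000 MCCAIN \u2013 12:00 PM",
                "2 \u2013 G28 \u2013 9:00 AM",
                "Course Code: MATH 1000",
                "1 \u2013 G28 \u2013 9:00 AM");
        System.out.println(sumGroupedCounts(grouped)); // 6
        System.out.println(isConsistent(6, grouped));  // true
    }
}
```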

If problems persist, email the roster PDF to Zawad.Atif@dal.ca with a brief description.


7. Notes on Reliability

No software is perfect. NEP handles most cases reliably, but edge cases or unexpected input formats can occasionally cause errors.

The application spans over 5,100 lines of code and handles numerous formatting styles and edge cases. Even so, the variability of inputs means a human should validate the final outputs, particularly when they are used for logistics.


8. Source & Ownership

This software was developed and is maintained by Zawad Atif and Nafisah Nubah for internal use at the Dalhousie Student Accessibility Center.


9. Development Overview

Zawad Atif assumed technical leadership for the project, managing overall architecture, selecting the technology stack, and implementing developer tools for debugging and diagnostics.
Nafisah Nubah developed the backend parsing logic to accurately extract and normalize data from SARS-generated PDFs.

Both authors contributed to testing, quality assurance, and creating a reliable user experience.

Technical Stack:

  • Language: Java
  • UI Framework: Swing
  • PDF Parsing Library: Apache PDFBox
  • IDE: IntelliJ IDEA
  • Packaging: Launch4j
  • Version Control: Git
  • Repository Hosting: GitHub
