String Matching Algorithms - Homework Assignment

Overview

This homework assignment focuses on implementing and analyzing string matching algorithms. You will implement two algorithms and create a pre-analysis system to intelligently choose the best algorithm for different scenarios.

Test cases are loaded from JSON files, making it easy to add, modify, and share test cases.

Project Structure

StringMatching/
├── src/
│   ├── Solution.java              - Base class for all algorithms
│   ├── Analysis.java              - Contains all algorithm implementations
│   ├── PreAnalysis.java           - Pre-analysis interface and implementations
│   ├── TestCase.java              - Test case data structure
│   ├── TestCaseLoader.java        - Loads test cases from JSON files
│   ├── TestRunner.java            - Test execution and comparison
│   ├── PreAnalysisComparison.java - Pre-analysis performance comparison
│   ├── ManualTest.java            - Main program entry point
│   └── ManualTestRunner.java      - Detailed test execution with timing
├── testcases/
│   ├── shared/                    - Test cases for students (JSON files)
│   │   ├── 01_simple_match.json
│   │   ├── 02_no_match.json
│   │   └── ... (10 test files)
│   └── hidden/                    - Test cases for grading (JSON files)
│       ├── 11_case_sensitive.json
│       └── ... (5 test files)
├── run.sh                         - Quick run script
└── test.sh                        - Main test runner script

Your Tasks

Task 1: Implement Boyer-Moore Algorithm (Required)

File: src/Analysis.java - Class BoyerMoore

Implement the Boyer-Moore string matching algorithm. This algorithm is known for its efficiency, especially with large alphabets.

Key concepts:

Bad character rule
Good suffix rule
Preprocessing phase

Current status: Throws UnsupportedOperationException

Task 2: Implement Your Own Algorithm (Not Required)

File: src/Analysis.java - Class GoCrazy

Create your own string matching algorithm! Be creative and try to optimize for specific cases.

Ideas:

Combine multiple algorithms
Optimize for specific pattern characteristics
Use heuristics to skip unnecessary comparisons

Current status: Throws UnsupportedOperationException

Task 3: Implement Pre-Analysis Logic (Required)

File: src/PreAnalysis.java - Class StudentPreAnalysis

Implement the chooseAlgorithm() method to analyze the text and pattern, then choose the best algorithm.

Goal: Maximize the number of times you choose the fastest algorithm!

Considerations:

Pattern length
Text length
Pattern characteristics (repeating characters, etc.)
Alphabet size
Algorithm time complexities

Current status: Returns null (no algorithm selection)

Already Implemented Algorithms

The following algorithms are already implemented for reference:

Naive Algorithm - Simple brute force approach
- Time Complexity: O(n*m)
- Good for: Short patterns, small texts
KMP (Knuth-Morris-Pratt) - Uses failure function
- Time Complexity: O(n+m)
- Good for: Patterns with repeating prefixes
Rabin-Karp - Uses rolling hash
- Time Complexity: O(n+m) average, O(n*m) worst case
- Good for: Multiple pattern matching, long patterns

How to Run

Quick Start:

# Run all tests with full comparison tables
./run.sh

# Or use test.sh directly
./test.sh

Run Specific Tests:

# Run specific test indices
./test.sh 0 1 2

# Run a range of tests
./test.sh 0-5

# Run only shared tests (for students)
./test.sh share

# Run only hidden tests (for grading)
./test.sh hidden

# List all available tests
./test.sh list

# Run with pre-analysis comparison
./test.sh preanalysis

Manual Compilation and Run:

# Compile all files
javac src/*.java

# Run from src directory
cd src
java ManualTest [arguments]

Output

The program generates multiple comparison tables:

Table 1: Detailed Test Results - Execution Time Comparison

Shows execution time for each algorithm on each test:

Test Case - Name of the test
Algorithm columns - Status and execution time in microseconds (μs)
✓ PASS - Algorithm solved the test correctly
✗ FAIL - Algorithm produced incorrect output
✗ ERROR - Algorithm threw an exception
N/A - Algorithm not implemented

Table 2: Summary Statistics

Shows aggregate statistics for each algorithm:

Passed/Failed - Number of tests passed and failed
Not Implemented - Number of tests where algorithm is not implemented
Avg/Min/Max - Average, minimum, and maximum execution times

Table 3: Fastest Algorithm Comparison

Shows which algorithm was fastest for each test case

Table 4: Algorithm Comparison Table

Shows which algorithms successfully solved each test case (Pass/Fail/N/A)

Table 5: Pre-Analysis Performance Comparison (when using `preanalysis` mode)

Shows the performance of your pre-analysis system:

Analysis(μs) - Time spent in pre-analysis
Exec(μs) - Time spent executing chosen algorithm
Total(μs) - Total time (Analysis + Execution)
Fastest Alg - The actually fastest algorithm
Time Diff(μs) - Time saved (positive) or lost (negative)
✓ - Pre-analysis chose the fastest algorithm
✗ - Pre-analysis did NOT choose the fastest

Test Cases

Test cases are stored as JSON files in the testcases/ directory.

Shared Tests (testcases/shared/)

These tests are shared with students for development and debugging:

Simple Match
No Match
Single Character
Pattern at End
Pattern at Beginning
Overlapping Patterns
Long Text Multiple Matches
Pattern Longer Than Text
Entire Text Match
Repeating Pattern

Hidden Tests (testcases/hidden/)

These tests are kept hidden for instructor grading: 11. Case Sensitive 12. Numbers and Special Characters

JSON Test Case Format

Each test case is a JSON file with the following structure:

{
  "name": "Test Name",
  "text": "The text to search in",
  "pattern": "pattern",
  "expected": "0,5,10"
}

Adding New Test Cases

To add a new test case:

Create a new JSON file in testcases/shared/ or testcases/hidden/
Follow the naming convention: ##_test_name.json
Fill in the name, text, pattern, and expected result
The system will automatically load it on next run

Grading Criteria

Boyer-Moore Implementation (30%)
- Correctness: Passes all test cases
- Efficiency: Proper implementation of bad character rule
- Code quality: Clean, well-commented code
Your Research (30%)
- Documentation: Document how you used the internet or LLMs to create your strategy, and share all resources you used
- Transparency: Be honest about your research process and sources
Pre-Analysis Implementation (30%)
- Accuracy: Percentage of times fastest algorithm is chosen
- Efficiency: Pre-analysis overhead is minimal
- Logic: Well-reasoned algorithm selection strategy
- Documentation: Clear explanation of strategy
Your Journey (10%) - MANDATORY
- ⚠️ IMPORTANT: This section is MANDATORY. Without it, your homework will NOT be graded.
- Be honest and share your experience - there are no wrong answers here
- Tell us what you learned, what challenges you faced, or anything you want to share
- Examples of what you can write:
  - "I learned X and Y, and found Z challenging"
  - "I only had 30 minutes because my dog died"
  - "I couldn't finish because of [reason], but I learned [something]"
  - "This homework was [your honest opinion]"
- Feedback: Any constructive comments about the homework assignment are welcome

Tips

For Boyer-Moore:

Start with the bad character rule
Test with simple cases first
Handle edge cases (pattern longer than text, empty pattern)

For GoCrazy:

Study the existing algorithms
Think about their strengths and weaknesses
Consider hybrid approaches

For Pre-Analysis:

Look at the ExamplePreAnalysis class for inspiration
Analyze the test results to understand algorithm performance
Consider multiple factors, not just pattern length
Balance accuracy with analysis overhead

Example Pre-Analysis Strategy

@Override
public String chooseAlgorithm(String text, String pattern) {
    int textLen = text.length();
    int patternLen = pattern.length();
    
    if (patternLen <= 3) {
        return "Naive"; // Fast for very short patterns
    } else if (hasRepeatingPrefix(pattern)) {
        return "KMP"; // Good for patterns with repeating prefixes
    } else if (patternLen > 10 && textLen > 1000) {
        return "RabinKarp"; // Good for long patterns in long texts
    } else {
        return "BoyerMoore"; // Generally efficient
    }
}

Submission

Submit the following files:

Analysis.java - With your Boyer-Moore and GoCrazy implementations
PreAnalysis.java - With your StudentPreAnalysis implementation
A brief report (PDF) explaining:
- Your Boyer-Moore implementation approach
- Your GoCrazy algorithm design and rationale
- Your pre-analysis strategy and why you chose it
- Analysis of your results

Resources

Boyer-Moore Algorithm
String Matching Algorithms Overview
Course lecture notes on string matching

Questions?

Contact your instructor or TA for help! Harun Yahya Öztürk harunyahozturk@gmail.com (Mail's should be sent via subject of 'String Matching') 05413763007

Good luck! 🚀

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.idea		.idea
src		src
testcases/shared		testcases/shared
.gitignore		.gitignore
README.md		README.md
StringMatching.iml		StringMatching.iml
my_journey.md		my_journey.md
run.sh		run.sh
test.sh		test.sh

ozdmrmustafa/StringMatching

Folders and files

Latest commit

History

Repository files navigation

String Matching Algorithms - Homework Assignment

Overview

Project Structure

Your Tasks

Task 1: Implement Boyer-Moore Algorithm (Required)

Task 2: Implement Your Own Algorithm (Not Required)

Task 3: Implement Pre-Analysis Logic (Required)

Already Implemented Algorithms

How to Run

Quick Start:

Run Specific Tests:

Manual Compilation and Run:

Output

Table 1: Detailed Test Results - Execution Time Comparison

Table 2: Summary Statistics

Table 3: Fastest Algorithm Comparison

Table 4: Algorithm Comparison Table

Table 5: Pre-Analysis Performance Comparison (when using preanalysis mode)

Test Cases

Shared Tests (testcases/shared/)

Hidden Tests (testcases/hidden/)

JSON Test Case Format

Adding New Test Cases

Grading Criteria

Tips

For Boyer-Moore:

For GoCrazy:

For Pre-Analysis:

Example Pre-Analysis Strategy

Submission

Resources

Questions?

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Table 5: Pre-Analysis Performance Comparison (when using `preanalysis` mode)

Packages