This project is part of the CS 245 course assignment titled "Friends in a Scandal". The goal of this assignment is to demonstrate mastery of graphs, graph algorithms, and object-oriented design by efficiently handling a large dataset of Enron emails.
The program reads valid mail files from the provided Enron dataset. The path to the dataset is given as the first argument to the program. The dataset can be obtained from this link. Uncompress the dataset using:
tar -xvzf enron_mail_20150507.tar.gz
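Each file under the uncompressed maildir is a plain-text message with a header block followed by a blank line and the body. A minimal sketch of pulling the sender and recipients out of one message could look like the following. The class and method names are hypothetical, and treating a message as "valid" only when it has both a `From:` and a non-empty `To:` header is an assumption, not necessarily the rule the assignment uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the project's actual parser.
class HeaderSketch {
    /**
     * Returns [sender, recipient1, recipient2, ...] in lower case,
     * or an empty list when From: or To: is missing (assumed "invalid").
     */
    static List<String> parseAddresses(List<String> lines) {
        String from = null;
        StringBuilder to = new StringBuilder();
        boolean inTo = false;
        for (String line : lines) {
            if (line.isEmpty()) {
                break; // the first blank line ends the header block
            }
            if (line.startsWith(" ") || line.startsWith("\t")) {
                // Folded header line: it continues the previous header.
                if (inTo) {
                    to.append(',').append(line.trim());
                }
                continue;
            }
            inTo = false;
            if (line.startsWith("From:")) {
                from = line.substring(5).trim().toLowerCase();
            } else if (line.startsWith("To:")) {
                to.append(line.substring(3).trim());
                inTo = true;
            }
        }
        List<String> result = new ArrayList<>();
        if (from == null || from.isEmpty()) {
            return result;
        }
        for (String part : to.toString().split(",")) {
            String address = part.trim().toLowerCase();
            if (!address.isEmpty()) {
                result.add(address);
            }
        }
        if (result.isEmpty()) {
            return result; // no recipients: treated as invalid here
        }
        result.add(0, from);
        return result;
    }
}
```

The lines of a file can be obtained with `Files.readAllLines(path, charset)`; which charset suits the dataset is left open here.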
The program identifies connectors in the friendship graph constructed from the dataset. A connector is a vertex whose removal would increase the number of connected components (also known as an articulation point). The identified connectors are printed to stdout and, optionally, written to the file given as the second argument.
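One common way to find such vertices is a depth-first search that records, for each vertex, its discovery number and the lowest discovery number reachable from its subtree. The sketch below illustrates that technique on a small undirected graph. The class `ConnectorSketch` and its methods are hypothetical names chosen for illustration, and this is not necessarily how the project's own code is organized.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of DFS-based connector (articulation point) detection.
class ConnectorSketch {
    private final Map<String, Set<String>> adj = new HashMap<>();
    private final Map<String, Integer> dfsNum = new HashMap<>();
    private final Map<String, Integer> back = new HashMap<>();
    private final Set<String> connectors = new TreeSet<>();
    private int counter = 0;

    /** Adds an undirected edge between two addresses. */
    void addEdge(String a, String b) {
        adj.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Runs a DFS from every unvisited vertex and returns the connectors, sorted. */
    Set<String> findConnectors() {
        dfsNum.clear();
        back.clear();
        connectors.clear();
        counter = 0;
        for (String v : adj.keySet()) {
            if (!dfsNum.containsKey(v)) {
                dfs(v, null);
            }
        }
        return connectors;
    }

    private void dfs(String v, String parent) {
        dfsNum.put(v, ++counter);
        back.put(v, counter);
        int children = 0;
        for (String w : adj.get(v)) {
            if (!dfsNum.containsKey(w)) {
                children++;
                dfs(w, v);
                back.put(v, Math.min(back.get(v), back.get(w)));
                // A non-root v is a connector if w's subtree cannot reach above v.
                if (parent != null && back.get(w) >= dfsNum.get(v)) {
                    connectors.add(v);
                }
            } else if (!w.equals(parent)) {
                // Back edge to an already visited vertex.
                back.put(v, Math.min(back.get(v), dfsNum.get(w)));
            }
        }
        // The DFS root is a connector only if it has more than one DFS child.
        if (parent == null && children > 1) {
            connectors.add(v);
        }
    }
}
```

For the path a, b, c, d, the call `findConnectors()` returns b and c; for a triangle it returns nothing. The recursion here is kept for readability; on a graph as large as the Enron dataset, an explicit stack may be needed to avoid deep recursion.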
The program can respond to user queries about individual email addresses, providing:
- The number of unique email addresses to whom the individual sent messages
- The number of unique email addresses from whom the individual received messages
- The number of email addresses in the same "team" as the individual
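The first two counts can be kept with one set of counterparts per address, as in the rough sketch below. `MailStatsSketch` and its methods are hypothetical names. The team size is left out because this README does not define how a team is determined.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping for the sent/received query counts.
class MailStatsSketch {
    private final Map<String, Set<String>> sentTo = new HashMap<>();
    private final Map<String, Set<String>> receivedFrom = new HashMap<>();

    /** Records one sender-to-recipient message; duplicates collapse in the sets. */
    void record(String sender, String recipient) {
        sentTo.computeIfAbsent(sender, k -> new HashSet<>()).add(recipient);
        receivedFrom.computeIfAbsent(recipient, k -> new HashSet<>()).add(sender);
    }

    boolean isKnown(String address) {
        return sentTo.containsKey(address) || receivedFrom.containsKey(address);
    }

    int sentCount(String address) {
        return sentTo.getOrDefault(address, Set.of()).size();
    }

    int receivedCount(String address) {
        return receivedFrom.getOrDefault(address, Set.of()).size();
    }

    /** Formats the answer for one query (team line omitted in this sketch). */
    String report(String address) {
        if (!isKnown(address)) {
            return "Email address (" + address + ") not found in the dataset.";
        }
        return "* " + address + " has sent messages to " + sentCount(address) + " others\n"
             + "* " + address + " has received messages from " + receivedCount(address) + " others";
    }
}
```

Using sets means that repeated messages between the same pair of addresses are counted once, which matches the "unique email addresses" wording above.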
This file contains the main implementation of the assignment. The key functionalities include:
- Reading the Enron email dataset
- Constructing a friendship graph
- Identifying and printing connectors
- Responding to user queries about individual email addresses
To run the program, use the following command:
java A3 /path/to/enron/maildir /path/to/output/connectors.txt
The second argument is optional. If not provided, the connectors will only be printed to stdout.
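The optional-argument behaviour could be handled along these lines. This is a sketch only: `OutputSketch`, `printConnectors`, and the placeholder connector list are hypothetical and stand in for whatever the real program computes.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of printing connectors to stdout and, optionally, a file.
class OutputSketch {
    /** Prints every connector to out; also writes them to outputFile when it is non-null. */
    static void printConnectors(List<String> connectors, PrintStream out, Path outputFile)
            throws IOException {
        for (String connector : connectors) {
            out.println(connector);
        }
        if (outputFile != null) {
            Files.write(outputFile, connectors); // one connector per line
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java OutputSketch <maildir> [connectors-file]");
            return;
        }
        // The second argument is optional; null means "stdout only".
        Path outputFile = args.length > 1 ? Path.of(args[1]) : null;
        // Placeholder data; a real run would compute this from the maildir in args[0].
        List<String> connectors = List.of("placeholder@example.com");
        printConnectors(connectors, System.out, outputFile);
    }
}
```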
A sample query session, where X stands for the computed count:
Email address of the individual (or EXIT to quit): kate.symes@enron.com
* kate.symes@enron.com has sent messages to X others
* kate.symes@enron.com has received messages from X others
* kate.symes@enron.com is in a team with X individuals
Email address of the individual (or EXIT to quit): notme@usfca.edu
Email address (notme@usfca.edu) not found in the dataset.
Email address of the individual (or EXIT to quit): EXIT
For academic honesty, do not replicate or use this code for coursework or assessments.