Tag Analysis Project

Project Description

In this project, I built search functionality for movie tags from the MovieLens dataset. The goal is to analyze the tags that viewers apply to movies, using efficient data structures and algorithms. The implementation reads tags from a CSV file, lists the most and least popular tags, lets the user look up a tag's count by name or find every tag with a given count, and prints the results in a readable format.

Data Structures and Algorithms

Data Structures Used

Tag: A class for holding the attributes of each row of the CSV.
ArrayList: Holds a Tag for each row in the CSV.

Algorithm: Parsing CSV

For each line in the CSV:

Find the 3 delimiting commas that separate the attributes of a row.
Construct a Tag instance from those attributes.
Append the Tag instance to an ArrayList<Tag>.

Big-O Running Time: O(N) where N is the number of rows in the CSV.
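
The parsing steps above can be sketched roughly as follows. This is a minimal sketch, not the repository's code: the class name TagCsvSketch, its methods, and the Tag field names are hypothetical, and the fields assume the MovieLens tags.csv column order (userId, movieId, tag, timestamp). It also assumes the tag text may itself contain commas, so it treats the first two commas and the last comma as the three delimiters, and it does not strip CSV quoting.

```java
import java.util.ArrayList;
import java.util.List;

final class TagCsvSketch {
    // Hypothetical stand-in for the repository's Tag class. Field names
    // assume the MovieLens layout: userId,movieId,tag,timestamp.
    static final class Tag {
        final int userId;
        final int movieId;
        final String tag;
        final long timestamp;

        Tag(int userId, int movieId, String tag, long timestamp) {
            this.userId = userId;
            this.movieId = movieId;
            this.tag = tag;
            this.timestamp = timestamp;
        }
    }

    // Splits one row on its three delimiting commas: the first two and the
    // last one. Using lastIndexOf keeps commas inside the tag text intact.
    static Tag parseLine(String line) {
        int first = line.indexOf(',');
        int second = line.indexOf(',', first + 1);
        int last = line.lastIndexOf(',');
        if (first < 0 || second < 0 || last <= second) {
            throw new IllegalArgumentException("Malformed row: " + line);
        }
        return new Tag(
                Integer.parseInt(line.substring(0, first).trim()),
                Integer.parseInt(line.substring(first + 1, second).trim()),
                line.substring(second + 1, last).trim(),
                Long.parseLong(line.substring(last + 1).trim()));
    }

    // One pass over the rows: O(N) for N rows of bounded length.
    // Assumes the caller has already skipped the header row.
    static List<Tag> parseAll(List<String> rows) {
        List<Tag> tags = new ArrayList<>(rows.size());
        for (String row : rows) {
            if (!row.trim().isEmpty()) {
                tags.add(parseLine(row));
            }
        }
        return tags;
    }
}
```

A caller would read the file's lines, drop the header, and pass the rest to parseAll to get the list of Tag objects used by the later steps.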

List Most and Least Popular Tags

Data Structures Used

TagFrequency: A class that holds the tag and its frequency.
ArrayList: Holds TagFrequency instances for unique tags.
ArrayList sublists: Holds the most and least frequent tags.

Algorithm

Sort the tags list by name.
Create a new ArrayList<TagFrequency> frequencies.
Create a TagFrequency instance for the first tag in the tags list.
Iterate over the tags list:
- Increment the frequency of the current TagFrequency object if the tag matches.
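- Otherwise, append the current TagFrequency object to frequencies and create a new TagFrequency object for the new tag.
After the loop, append the last TagFrequency object to frequencies so the final tag is not dropped.
Sort frequencies by count in descending order, so the most frequent tags come first.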
Create sublists with the first 3 objects (highest frequency) and last 3 objects (lowest frequency).
Print these sublists.

Runtime:

Sorting tags list by names: O(N log N)
Creating the frequencies list: O(N)
Sorting frequencies by count: O(N log N)
Creating sublists: O(1)

Final Runtime: O(N log N) + O(N) + O(N log N) + O(1) = O(N log N)
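
The sort-then-count approach above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the repository's implementation: TagFrequencySketch and its method names are hypothetical, the input is reduced to a list of tag names, and frequencies is assumed to be sorted with the highest count first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class TagFrequencySketch {
    // Hypothetical stand-in for the repository's TagFrequency class.
    static final class TagFrequency {
        final String tag;
        int count;

        TagFrequency(String tag, int count) {
            this.tag = tag;
            this.count = count;
        }
    }

    // Sorts the tag names, then counts each run of equal names in one pass.
    static List<TagFrequency> countTags(List<String> tagNames) {
        List<String> sorted = new ArrayList<>(tagNames);
        Collections.sort(sorted);                        // O(N log N)

        List<TagFrequency> frequencies = new ArrayList<>();
        TagFrequency current = null;
        for (String name : sorted) {                     // O(N)
            if (current != null && current.tag.equals(name)) {
                current.count++;
            } else {
                if (current != null) {
                    frequencies.add(current);
                }
                current = new TagFrequency(name, 1);
            }
        }
        if (current != null) {
            frequencies.add(current);                    // flush the last run
        }

        // Highest count first. The sort is stable, so ties stay in name order.
        frequencies.sort(
                Comparator.comparingInt((TagFrequency f) -> f.count).reversed());
        return frequencies;
    }

    // Views of the first k and last k entries; each is O(1) to create.
    static List<TagFrequency> mostPopular(List<TagFrequency> byCountDesc, int k) {
        return byCountDesc.subList(0, Math.min(k, byCountDesc.size()));
    }

    static List<TagFrequency> leastPopular(List<TagFrequency> byCountDesc, int k) {
        int n = byCountDesc.size();
        return byCountDesc.subList(Math.max(0, n - k), n);
    }
}
```

Calling mostPopular(frequencies, 3) and leastPopular(frequencies, 3) corresponds to the two sublists described above. The second sort runs over unique tags only, so it never costs more than the first.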

Find Tags by Count and Counts by Name

Data Structures Used

ArrayList frequenciesByName: Sorted by tag name.
ArrayList frequencies: Sorted by count.
ArrayList results: Stores the tags with matching frequencies for output.

Algorithm

Sort the TagFrequency objects by tag name and store the result in frequenciesByName.
Keep frequencies, which is already sorted by count, for searches by count.

If searching by tag:

Perform a binary search on frequenciesByName for the given tag.
Print the tag’s frequency if found; otherwise, print that the tag wasn’t found.
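
A lookup by tag name along these lines might look like the sketch below. The names TagNameSearchSketch and countForTag are hypothetical, the -1 return value for a missing tag is an assumption made for illustration, and the list is assumed to be sorted in ascending String.compareTo order.

```java
import java.util.List;

final class TagNameSearchSketch {
    // Hypothetical stand-in for the repository's TagFrequency class.
    static final class TagFrequency {
        final String tag;
        final int count;

        TagFrequency(String tag, int count) {
            this.tag = tag;
            this.count = count;
        }
    }

    // Binary search over a list sorted by tag name: O(log N) per lookup.
    // Returns the tag's count, or -1 if the tag is not present.
    static int countForTag(List<TagFrequency> frequenciesByName, String tagName) {
        int lo = 0;
        int hi = frequenciesByName.size() - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int cmp = frequenciesByName.get(mid).tag.compareTo(tagName);
            if (cmp == 0) {
                return frequenciesByName.get(mid).count;
            }
            if (cmp < 0) {
                lo = mid + 1;   // target sorts after the middle element
            } else {
                hi = mid - 1;   // target sorts before the middle element
            }
        }
        return -1;
    }
}
```

The caller can print the returned count, or a "not found" message when the result is -1.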

If searching by count:

Validate the given count input.
Perform a binary search in frequencies for an index (idx) with the given count.
Search indices to the left and right of idx for tags with the matching count.
Return the tags with the matching count.
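
The count search can be sketched in the same spirit. This is a sketch under assumptions rather than the repository's code: TagCountSearchSketch and tagsWithCount are hypothetical names, frequencies is assumed to be sorted with the highest count first, and input validation is reduced to rejecting counts below 1.

```java
import java.util.ArrayList;
import java.util.List;

final class TagCountSearchSketch {
    // Hypothetical stand-in for the repository's TagFrequency class.
    static final class TagFrequency {
        final String tag;
        final int count;

        TagFrequency(String tag, int count) {
            this.tag = tag;
            this.count = count;
        }
    }

    // Returns every tag whose frequency equals count.
    // Assumes frequencies is sorted by count in descending order.
    static List<String> tagsWithCount(List<TagFrequency> frequencies, int count) {
        List<String> results = new ArrayList<>();
        if (count < 1) {
            return results;             // a tag that exists occurs at least once
        }

        // Binary search for any index holding the requested count: O(log N).
        int lo = 0;
        int hi = frequencies.size() - 1;
        int idx = -1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int c = frequencies.get(mid).count;
            if (c == count) {
                idx = mid;
                break;
            }
            if (c > count) {
                lo = mid + 1;           // descending order: smaller counts lie right
            } else {
                hi = mid - 1;
            }
        }
        if (idx < 0) {
            return results;
        }

        // Widen to the whole run of equal counts: O(N) when most tags tie.
        int left = idx;
        while (left > 0 && frequencies.get(left - 1).count == count) {
            left--;
        }
        int right = idx;
        while (right < frequencies.size() - 1
                && frequencies.get(right + 1).count == count) {
            right++;
        }
        for (int i = left; i <= right; i++) {
            results.add(frequencies.get(i).tag);
        }
        return results;
    }
}
```

The widening loops are what push the worst case to O(N): if every tag occurs exactly once, a search for count 1 has to visit the entire list.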

Runtimes:

Sorting the lists: O(N log N)
Searching frequenciesByName (by tag): O(log N)
Searching frequencies (by count): O(log N) + O(N)

Final Runtime:

Initial sorting cost: O(N log N), paid once
Repeated search cost by tag: O(log N)
Worst case for count search: O(N), when many tags share the requested count

Warning

For academic honesty, do not replicate or use this code for coursework or assessments.

riyamathur1/MovieTags
