In this project, I created a search functionality for movie tags from the MovieLens Dataset. The goal is to analyze the tags used by entertainment consumers, leveraging efficient data structures and algorithms. The implementation reads data from a file of tags, allows the user to search for individual tags by popularity, and displays the results in a user-friendly manner.
- Tag: A class for holding the attributes of each row of the CSV.
- ArrayList: Holds a
Tagfor each row in the CSV.
For each line in the CSV:
- Find the 3 delimiting commas that separate the attributes of a row.
- Construct a
Taginstance from those attributes. - Append the
Taginstance to anArrayList<Tag>.
Big-O Running Time: O(N) where N is the number of rows in the CSV.
- TagFrequency: A class that holds the tag and its frequency.
- ArrayList: Holds
TagFrequencyinstances for unique tags. - ArrayList sublists: Holds the most and least frequent tags.
- Sort the
tagslist by name. - Create a new
ArrayList<TagFrequency> frequencies. - Create a
TagFrequencyinstance for the first tag in thetagslist. - Iterate over the
tagslist:- Increment the frequency of the current
TagFrequencyobject if the tag matches. - Otherwise, append the current
TagFrequencyobject tofrequenciesand create a newTagFrequencyobject for the new tag.
- Increment the frequency of the current
- Sort
frequenciesby count. - Create sublists with the first 3 objects (highest frequency) and last 3 objects (lowest frequency).
- Print these sublists.
Runtime:
- Sorting
tagslist by names:O(N log N) - Creating the
frequencieslist:O(N) - Sorting
frequenciesby count:O(N log N) - Creating sublists:
O(1)
Final Runtime: O(N log N) + O(N) + O(N log N) + O(1) = O(N log N)
- ArrayList frequenciesByName: Sorted by tag name.
- ArrayList frequencies: Sorted by count.
- ArrayList results: Stores the tags with matching frequencies for output.
- Sort
frequenciesby tag name and store infrequenciesByName. - Use the sorted
frequenciesby count.
If searching by tag:
- Perform a binary search on
frequenciesByNamefor the given tag. - Print the tag’s frequency if found; otherwise, print that the tag wasn’t found.
If searching by count:
- Validate the given count input.
- Perform a binary search in
frequenciesfor an index (idx) with the given count. - Search indices to the left and right of
idxfor tags with the matching count. - Return the tags with the matching count.
Runtimes:
- Sorting the lists:
O(N log N) - Searching
frequencies:O(log N) - Searching
frequenciesByName:O(log N) + O(N)
Final Runtime:
- Initial sorting cost:
O(N log N) - Repeated search cost:
O(log N) - Worst case for count search:
O(N)
For academic honesty, do not replicate or use this code for coursework or assessments.