Roj commented Mar 16, 2019

This PR organizes the project into the following directories:

  • data - as before
  • preprocessing - converting data files into text files for word2vec to use
  • models - word embedding models and recommendation systems
  • analysis - benchmarking of models and embedding analysis

Roj added 8 commits December 31, 2018 17:24
PlaylistIterator now accepts a parameter to load track metadata (artist
data only*). idomaarReader caches this data once it is loaded, so
it doesn't have to be reloaded for each new instance.
Also fixed some bugs related to loading session data.
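
A minimal sketch of the caching idea described above, assuming a class-level cache shared by all instances. `MetadataReader`, `_load_artists`, and the sample data are hypothetical names for illustration and are not the actual idomaarReader code:

```python
class MetadataReader:
    """Hypothetical reader that loads artist metadata at most once."""

    _artist_cache = None  # shared by every instance of the class
    load_count = 0        # counts how often the expensive load actually ran

    def __init__(self, load_metadata=False):
        self.artists = {}
        if load_metadata:
            # Only the first instance that asks for metadata pays for the load.
            if MetadataReader._artist_cache is None:
                MetadataReader._artist_cache = self._load_artists()
            self.artists = MetadataReader._artist_cache

    @classmethod
    def _load_artists(cls):
        # Stand-in for parsing the dataset's track/artist files (assumption).
        cls.load_count += 1
        return {1: "Artist A", 2: "Artist B"}


first = MetadataReader(load_metadata=True)
second = MetadataReader(load_metadata=True)  # reuses the cached dictionary
```

Because the cache lives on the class, every instance sees the same dictionary, so mutating it in one place affects all readers.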
Before this commit the model wouldn't actually use the metadata.
The iterator now works without hard-coding dataset values or
schema. It is less efficient, as it now uses dictionaries instead of
vectors, whether or not they are the better fit. However, this means one
does not have to worry about dataset quirks when parsing metadata.
The iterator also has a new registry that serves as a lookup table
for existing entities, so if a session contains a song that is already
registered, it just keeps a reference to the existing entity. This also
allows metadata to be preloaded into songs and artists.
An important change is that all elements are constructed and
persisted in cascade, even if you do not use them (users,
for example). It might be a good idea to keep a blacklist or
whitelist of entity types to save later. For now it's enough.
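
The registry described above could look roughly like the following sketch, assuming entities are plain dictionaries keyed by type and id. `EntityRegistry` and `get_or_create` are hypothetical names, not the ones used in this PR:

```python
class EntityRegistry:
    """Hypothetical lookup table mapping (type, id) to a single shared entity."""

    def __init__(self):
        self._entities = {}

    def get_or_create(self, entity_type, entity_id, **properties):
        key = (entity_type, entity_id)
        entity = self._entities.get(key)
        if entity is None:
            # First time this entity is seen: create and register it.
            entity = {"type": entity_type, "id": entity_id}
            self._entities[key] = entity
        # Merge any new properties into the shared entity.
        entity.update(properties)
        return entity

    def __len__(self):
        return len(self._entities)


registry = EntityRegistry()
# Metadata can be preloaded before any session is parsed...
registry.get_or_create("track", 42, artist_id=7)
# ...and a session that mentions the same track reuses that entity.
session_track = registry.get_or_create("track", 42)
```

Keying on the `(type, id)` pair keeps a track and an artist with the same numeric id from colliding, at the cost of one dictionary lookup per reference.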
(it may not be up-to-date)