Skip to content

Extract dataset, Transform (add additional data, clean dataset), Load transformed data into database for further analyses

Notifications You must be signed in to change notification settings

Emaway/ETL-Group-project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ETL

Extract - Transform - Load Summary: For this project we aimed to take a large music data base in the form of a csv (gathered from https://www.kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking) which has the top 200 songs accross the country captured for every day of 2017 and 2018. It captures the chart position of the song, track name, artist name, number of streams, the spotify url, date of the captured data, and the country code. There was some additional data that we wanted to add to this data base. This missing data includes the countries of cooresponding country codes and the genre of each song.

Description: The purpose of this project was to

•Extract the data of trending music from various data-sources,

•Collect the missing information using web-scraping & APIcalls.

•Cleanse the data by adding / removing the raw attributes

•Load the final dataset to the SQL/No-SQL database for further analysis.

Use-cases:The stored dataset can be used to analyze the trending musical numbers of recent pastacross the globe and identify the most popular genre, artist or the track.

Loading: We have stored the data in Mongo-DB, SQLite, and MySQL. The file 'music.sqlite' is the needed sqlite file. The mongoDB jupyter notebook is 'MongoDB.ipynb'.

Data-sources:

1.CSV from Kaggle https://www.kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking

2.Web Scraping https://en.wikipedia.org/wiki/ISO_3166-1

3.APIhttp://ws.audioscrobbler.com/2.0/method=track.getInfo&api_key=b848087a7bcf37ce7a1404dc164ed41d&artist=J%20Balvin&track=Safari&format=json

User Note: When downloading the music csv it is important to name the file "music_data.csv" for it to be read by our jupyter notebook. Additionally you will need to be sure to have grequests installed inorder to run the API. This can be done by by typing 'pip install grequest' into your terminal.

About

Extract dataset, Transform (add additional data, clean dataset), Load transformed data into database for further analyses

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages