ETL

Extract - Transform - Load Summary: For this project we aimed to take a large music data base in the form of a csv (gathered from https://www.kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking) which has the top 200 songs accross the country captured for every day of 2017 and 2018. It captures the chart position of the song, track name, artist name, number of streams, the spotify url, date of the captured data, and the country code. There was some additional data that we wanted to add to this data base. This missing data includes the countries of cooresponding country codes and the genre of each song.

Description: The purpose of this project was to

•Extract the data of trending music from various data-sources,

•Collect the missing information using web-scraping & APIcalls.

•Cleanse the data by adding / removing the raw attributes

•Load the final dataset to the SQL/No-SQL database for further analysis.

Use-cases:The stored dataset can be used to analyze the trending musical numbers of recent pastacross the globe and identify the most popular genre, artist or the track.

Loading: We have stored the data in Mongo-DB, SQLite, and MySQL. The file 'music.sqlite' is the needed sqlite file. The mongoDB jupyter notebook is 'MongoDB.ipynb'.

Data-sources:

1.CSV from Kaggle https://www.kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking

2.Web Scraping https://en.wikipedia.org/wiki/ISO_3166-1

3.APIhttp://ws.audioscrobbler.com/2.0/method=track.getInfo&api_key=b848087a7bcf37ce7a1404dc164ed41d&artist=J%20Balvin&track=Safari&format=json

User Note: When downloading the music csv it is important to name the file "music_data.csv" for it to be read by our jupyter notebook. Additionally you will need to be sure to have grequests installed inorder to run the API. This can be done by by typing 'pip install grequest' into your terminal.

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
.ipynb_checkpoints		.ipynb_checkpoints
__pycache__		__pycache__
A MILLION ROWS_ ETL Report.pdf		A MILLION ROWS_ ETL Report.pdf
ETL Project Proposal.docx		ETL Project Proposal.docx
ETL Project Proposal.pdf		ETL Project Proposal.pdf
ETL project.ipynb		ETL project.ipynb
MongoDB.ipynb		MongoDB.ipynb
README.md		README.md
config.py		config.py
country table.PNG		country table.PNG
etl group.sql		etl group.sql
music data.ipynb		music data.ipynb
music db.PNG		music db.PNG
music table.PNG		music table.PNG
music.sqlite		music.sqlite
music_data_final.csv		music_data_final.csv
music_db_final_sql.sql		music_db_final_sql.sql
~$L Project Proposal.docx		~$L Project Proposal.docx

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ETL

About

Uh oh!

Releases

Packages

Contributors 3

Uh oh!

Languages

Emaway/ETL-Group-project

Folders and files

Latest commit

History

Repository files navigation

ETL

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Uh oh!

Languages

Packages