Skip to content

DataSciencePros/datascicon.tech

Repository files navigation

datascicon.tech "Intro to Data Science with Jupyter" Workshop

This is a broad review of how to

  • do initial data analysis using pandas, and
  • create predictive models using scikit-learn and evaluating them

Prerequisites to install before coming to the Workshop

Recommended: Docker (or Python and All Dependencies and Jupyter Notebook)

We will give you access to a Jupyter Notebook Server running in a virtual machine on the cloud. However, we recommend you to have this notebook server running in a docker image locally on your own development machine, so you will be able study this material later more easily.

  1. Please make sure you have Docker Community Edition running on your machine. https://store.docker.com/search?offering=community&type=edition

  2. Since this is a 2GB download, preinstall the scipy-notebook, run:

docker pull jupyter/scipy-notebook:e8613d84128b

or docker pull jupyter/scipy-notebook:latest if you want to get always the latest version. (Using "latest" is not recommended, since you want to know which version of the platform your code was successfully running on...)
If you would need to modify this image, like adding additional libraries, look at README.md in https://github.com/DataSciencePros/scipy-notebook for instructions about how to modify this docker image using Dockerfile, build and run it, and push it to dockerhub repo.

For the workshop, you will need to run one docker command (below).
Learning docker is useful, you may want to attend the preceding workshop, or self-study: https://github.com/DataSciencePros/docker_workshop

Download This Repo (Data, Code and Tasks)

If you are not familiar with Git, You can browse to https://github.com/DataSciencePros/data_science_workshop, and click on green "Clone or Download" button, and download the zip file.

Recommended Way: Get Them Using "git"

If new to git, you may follow this guide to install it and study: http://rogerdudler.github.io/git-guide/

Make sure a git client is running on your machine, and checkout the needed repos. if you use the commandline client:

git clone https://github.com/DataSciencePros/data_science_workshop.git

Running the Docker Image and Viewing the Jupyter Notebooks

If you are using Docker

Copy this repo locally in your workspace, go to (data_science_workshop) repo folder:

 # this line for recommended way above to get this repo:
 git clone https://github.com/DataSciencePros/data_science_workshop.git
 cd datascicon.tech

Start jupyter docker instance from this folder.

On bash on MacOS

If using bash:

docker run --rm -p 8888:8888 -e JUPYTER_LAB_ENABLE=yes \
--mount 'type=bind,src='"$(pwd)"'/app,target=/home/jovyan/work' jupyter/scipy-notebook:e8613d84128b
# alternative way to mount
# mounts to a new folder, only managed by docker
# -v "$PWD":/app

On Windows

These are for running Linux Containers on Windows. If you have selected to use Windows Containers, switch to Linux Containers by right clicking the Docker Whale in system tray: https://docs.microsoft.com/en-us/virtualization/windowscontainers/quick-start/quick-start-windows-10

Then, you need to share the drive explicitly, going to the settings by right clicking the Docker Whale in system tray. For screenshots, please see: https://rominirani.com/docker-on-windows-mounting-host-directories-d96f3f056a2c

After sharing, on powershell:

docker run --rm -p 8888:8888 -e JUPYTER_LAB_ENABLE=yes --mount type=bind,src=$(pwd)/app,target=/home/jovyan/work jupyter/scipy-notebook:e8613d84128b

or on Windows command line:

docker run --rm -p 8889:8888 -e JUPYTER_LAB_ENABLE=yes --mount type=bind,src=%cd%/app,target=/home/jovyan/work jupyter/scipy-notebook:e8613d84128b

On ubuntu

Installed docker using default "Ubuntu Software Center", try the docker pull command:

docker pull jupyter/scipy-notebook:e8613d84128b

You may get this error: "Got permission denied while trying to connect to the Docker daemon socket..." Run this command in your favourite shell and then completely log out of your account and log back in (if in doubt, reboot!):

sudo usermod -a -G docker $USER

Then follow the MacOS section above

Describing the Command

Jupyter org defined this image to start notebook server serving files in /home/jovyan/work, that is why "app" folder is mapped into that folder. "$(pwd)" or %cd% means "print working directory" or "current directory" in shell.

  • This command will (download image if local copy is not found and) create instace from image and start execution.
  • It will print how to connect to the jupyter environment and load notebooks, or create new notebooks. Paste the link to your browser. (you may need to replace the root with localhost)
  • Jupyter will show you notebooks and other files in the app folder. Clicking on one starts the notebook.
    • For the Machine Learning workshop click on scikit_learn_workshop.ipynb
  • In the notebook, clicking on one cell (code box), and shift-enter, executes code in that box.

Visiting http://localhost:8888/?token= in a browser loads JupyterLab, where hostname is the name of the computer running docker and token is the secret token printed in the console. Docker destroys the container after notebook server exit, but any files written to ~/work in the container remain intact on the host.

Running Code Snippets (cells) in the Notebook

  • Select cell, click Shift-Enter
  • or use "Cell" menu

Productionizing the Model

If you want to create an web API, which will receive input, apply model to the data, and return the prediction, you can use Flask. You may want to check use https://github.com/DrOzturk/FlaskZipLookupApiExample

Credits

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •