Apache Spark standalone cluster with spatial analysis capabilities

A working Apache Spark standalone cluster with Jupyter Notebook as the driver.

Dataset to work on:

Description: https://opensky-network.org/datasets/states/README.txt

Data source: https://opensky-network.org/datasets/states/

Jupyter Notebook

Password:

my-password

Token:

easy

Apache Sedona (https://sedona.apache.org/) is used to perform spatial operations.
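As a rough illustration of the kind of spatial predicate involved (Sedona exposes it in SQL as ST_Contains), the following pure-Python sketch tests whether a point lies inside a polygon using ray casting. It is not Sedona's implementation, and the rectangle and sample positions are hypothetical values chosen only for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, ring: List[Point]) -> bool:
    """Ray casting: count the edges crossed by a horizontal ray going right from the point."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangle (not real border data) and two sample positions.
box = [(-109.0, 37.0), (-102.0, 37.0), (-102.0, 41.0), (-109.0, 41.0)]
print(point_in_polygon((-105.0, 39.0), box))  # True
print(point_in_polygon((-100.0, 39.0), box))  # False
```

On the cluster, the same question (which aircraft positions fall inside which state border) would be expressed as a spatial join and distributed across the workers rather than evaluated point by point.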

The notebook requires a kernel based on Python 3.6 to function properly. The full list of requirements is in requirements.txt. To create the environment, run:

conda create -p jupiter-notebook/venv36 --file requirements.txt

To run the Apache Spark cluster with 5 standalone workers:

docker-compose up -d --build --scale spark-worker=5
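One way to confirm that all five workers registered is to read the JSON status that the standalone master serves next to its web UI. The sketch below assumes the master UI is reachable at localhost:8080, that the status is served under /json/, and that each worker entry carries a "state" field with the value "ALIVE". Treat these field names as assumptions to check against your Spark version.

```python
import json
from urllib.request import urlopen


def count_alive_workers(master_state: dict) -> int:
    """Count workers reported as ALIVE in the master's JSON status."""
    workers = master_state.get("workers", [])
    return sum(1 for worker in workers if worker.get("state") == "ALIVE")


def fetch_master_state(url: str = "http://localhost:8080/json/",
                       timeout: float = 5.0) -> dict:
    """Download and decode the master status (URL is an assumption, see above)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, once the cluster is up:
#   alive = count_alive_workers(fetch_master_state())
#   print("alive workers:", alive)
```

If the count is lower than the requested scale, the worker container logs (docker-compose logs spark-worker) are the first place to look.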

The US state borders spatial data is available at:

https://github.com/datasets/geo-admin1-us/blob/master/data/admin1-us.geojson
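Note that this address opens the GitHub file viewer (an HTML page), not the GeoJSON itself. To download the file programmatically, the blob URL can be rewritten to its raw.githubusercontent.com counterpart. The helper below is a minimal sketch, and the function name is hypothetical.

```python
def github_blob_to_raw(url: str) -> str:
    """Turn a github.com/<owner>/<repo>/blob/<ref>/<path> URL into its raw-content URL."""
    prefix = "https://github.com/"
    if not url.startswith(prefix) or "/blob/" not in url:
        raise ValueError("not a GitHub blob URL: {}".format(url))
    # Split "<owner>/<repo>" from "<ref>/<path>" around the "/blob/" segment.
    owner_repo, ref_and_path = url[len(prefix):].split("/blob/", 1)
    return "https://raw.githubusercontent.com/{}/{}".format(owner_repo, ref_and_path)


blob_url = "https://github.com/datasets/geo-admin1-us/blob/master/data/admin1-us.geojson"
print(github_blob_to_raw(blob_url))
# https://raw.githubusercontent.com/datasets/geo-admin1-us/master/data/admin1-us.geojson
```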

Jupyter Notebook:

localhost:8888

Apache Spark master UI:

localhost:8080

Apache Spark driver UI:

localhost:4040
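A quick way to see which of these endpoints are currently reachable is to try a TCP connection to each port. The following is a minimal sketch using only the standard library. The service labels are just for display, and the ports are the ones listed above.

```python
import socket

# Ports taken from the list above; the labels are only for display.
SERVICES = {
    "Jupyter Notebook": 8888,
    "Spark master UI": 8080,
    "Spark driver UI": 4040,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service label to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


# Usage:
#   for name, up in check_services().items():
#       print(name, "reachable" if up else "not reachable")
```

The driver UI on port 4040 is served only while a Spark application is running, so it will show as not reachable until a notebook has created a Spark session.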
