Introduction to the Krembil Centre for Neuroinformatics (KCNI) and the Scientific Computing Working Group (SCWG) Python Workshop
This workshop is designed for beginners who want to learn Python for data analysis and scientific computing. By the end of this session, you'll be comfortable with Python basics and ready to work with real-world datasets.
- Arithmetic operations (addition, subtraction, multiplication, division)
- Floor division and modulo operators
- Variable assignment and naming conventions
- String operations and methods
- Lists (ordered, mutable collections)
- Dictionaries (key-value pairs)
- Tuples (immutable sequences)
- Sets (unique collections)
- Understanding mutable vs immutable types
- For loops and iteration
- Booleans and logical operators
- Installing packages with pip
- Importing modules and packages
- Introduction to NumPy (numerical computing)
- Introduction to Pandas (data manipulation)
- Introduction to Scikit-learn (machine learning)
- Loading data from CSV, TSV, and TXT files
- Selecting columns and rows
- Filtering data with conditions
- Handling missing values (drop NA)
- Dropping unwanted columns
- Grouping and aggregation
- Joining DataFrames (left, right, inner, outer joins)
- Summary statistics
- Data visualization with Matplotlib and Seaborn
- Correlation analysis
- Building linear regression models
- Model evaluation and interpretation
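As a small taste of the early topics in this list, the sketch below uses only the Python standard library to show floor division and modulo, the difference between mutable and immutable types, and a for loop with a boolean condition. All variable names and values are made up for illustration and are not taken from the workshop notebooks.

```python
# Floor division and modulo split a quantity into whole units and a remainder.
total_minutes = 135
hours = total_minutes // 60      # floor division: whole hours
minutes = total_minutes % 60     # modulo: leftover minutes
print(hours, minutes)            # 2 15

# Lists are mutable: two names bound to the same list see the same changes.
scores = [3, 1, 2]
alias = scores                   # no copy is made here
alias.append(4)
print(scores)                    # [3, 1, 2, 4]

# Tuples are immutable: item assignment raises TypeError.
point = (1.0, 2.0)
try:
    point[0] = 5.0
    tuple_changed = True
except TypeError:
    tuple_changed = False
print(tuple_changed)             # False

# Sets keep only unique values; dictionaries map keys to values.
unique_scores = set([1, 1, 2, 3, 3])
ages = {"alice": 30, "bob": 25}  # illustrative values only
ages["carol"] = 41

# A for loop combined with a boolean condition.
over_28 = []
for name, age in ages.items():
    if age > 28:
        over_28.append(name)
print(over_28)                   # ['alice', 'carol']
```

Use `scores.copy()` instead of plain assignment when an independent list is needed.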
This workshop is optimized for GitHub Codespaces, which provides a complete containerized Python environment in your browser.
- Open in Codespaces: Click the "Code" button and select "Create codespace on main"
- Wait for setup: The environment will automatically install all required packages
- Open notebooks: Navigate to the notebooks/ folder and start with 01_basics_and_syntax.ipynb
If you prefer to work locally:
# Clone the repository
git clone <repository-url>
cd python_workshop
# Create a virtual environment
python -m venv venv
# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Launch Jupyter
jupyter notebook

python_workshop/
├── .devcontainer/
│ └── devcontainer.json # GitHub Codespaces configuration
├── notebooks/
│ ├── 01_basics_and_syntax.ipynb
│ ├── 02_data_structures_and_control_flow.ipynb
│ ├── 03_packages_and_libraries.ipynb
│ ├── 04_pandas_data_manipulation.ipynb
│ └── 05_analysis_and_modeling.ipynb
├── requirements.txt # Python package dependencies
└── README.md # This file
- Work through notebooks sequentially: Each notebook builds on concepts from the previous one
- Run all code cells: Execute each cell to see the output and understand the behavior
- Complete exercises: Each notebook includes practice exercises at the end
- Experiment: Modify the code and try your own variations
- Ask questions: Don't hesitate to explore beyond the provided examples
All packages are pre-installed in the Codespace. The workshop uses:
- pandas (>=2.0.0): Data manipulation and analysis
- numpy (>=1.24.0): Numerical computing
- scikit-learn (>=1.3.0): Machine learning
- matplotlib (>=3.7.0): Plotting and visualization
- seaborn (>=0.12.0): Statistical data visualization
- jupyter (>=1.0.0): Interactive notebooks
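For a local setup, one way to see which of these packages are installed is to query the standard library's `importlib.metadata`. This is a minimal sketch, not part of the workshop materials: the helper name `installed_versions` is hypothetical, and the script only reports versions without comparing them against the minimums listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names copied from the list above.
WORKSHOP_PACKAGES = [
    "pandas",
    "numpy",
    "scikit-learn",
    "matplotlib",
    "seaborn",
    "jupyter",
]


def installed_versions(packages):
    """Return a dict mapping each package name to its installed
    version string, or None if the package is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package for a quick visual check.
for name, ver in installed_versions(WORKSHOP_PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

If any package is reported as not installed, re-running `pip install -r requirements.txt` inside the activated virtual environment is the usual fix.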
The dataset used in this course was generated randomly with Python scripts. All data is simulated and does not represent any real dataset.
- Official Documentation:
- Build small projects: Apply what you learned to real problems
- Learn more advanced topics:
- Advanced Pandas operations
- More machine learning algorithms
- Data visualization with Plotly
- Web scraping with BeautifulSoup
- HarvardX: CS50's Introduction to Programming with Python
- Google Crash Course on Python
- Corey Schafer Pandas Tutorials
If you have any questions, feel free to reach out:
- Hassan Abdurasul - Hassan.Abdulrasul@camh.ca
- Bailey Ng - bailey.ng@mail.utoronto.ca
This workshop material is provided for educational purposes. Feel free to use and modify for your learning.