Hands-on exercise for Topic 1 - Code Review and Collaborative Software Development

snadi-teaching/SA_CollabDev_Handson

In-Class Exercise: Mining Collaborative Development Artifacts

Technical Requirements

Before starting this exercise, ensure you have the following:

Software

Skills

  • Basic Python programming (functions, API calls, data structures)
  • Basic understanding of GitHub PRs and Issues

Setup Verification

Run these commands to verify your setup:

python --version    # Should show Python 3.11 or higher

Overview

Collaborative software development means multiple developers working together on a shared codebase. It relies on tools such as Git, on practices such as pair programming and code review, and on DevOps practices such as Continuous Integration/Continuous Deployment (CI/CD). The goal is to make teamwork easier while maintaining code quality. Platforms like GitHub record many artifacts during development, including issues, pull requests, commits, and code reviews, and these artifacts can be mined to understand how a codebase evolves.

This exercise will focus on investigating these collaborative development artifacts. You will learn to:

  • Manually explore a GitHub repository to extract information about PRs and Issues
  • Write a Python script to automate repository mining using the GitHub API
  • Compare manual findings with automated results

Repository for Investigation (a.k.a. Target Repository)

For this exercise, you will investigate the MarkupSafe library repository (pallets/markupsafe):


GenAI Usage Policy

The use of Generative AI tools (e.g., ChatGPT, Cursor, GitHub Copilot, Claude) is permitted for this exercise with the following guidelines:

Allowed Uses

  • Understanding GitHub API documentation
  • Debugging error messages
  • Learning Python syntax for API calls
  • Clarifying concepts about GitHub PRs, Issues, and API responses

Not Allowed

  • Generating the complete mine_repo.py script (you must research how to mine a GitHub repository yourself by reading the API documentation)
  • Using AI-enabled IDEs to generate the entire mining script
  • Having AI write your investigation findings

Requirements

  • You must be able to explain any code you submit
  • Document any AI assistance in your submission (brief note at the end of your PDF)

Exercise Instructions

Total Time: 75 minutes


Set Up Your Repository (5 minutes)

Create your own repository from this template:

  1. Click the "Use this template" button (green button at the top of the repo)
  2. Select "Create a new repository"
  3. Name it appropriately (e.g., SAhandson-topic1-yourname)

Clone your repository:

git clone <your-repo-url>
cd <repo-name>

Note: Do NOT fork or clone this template directly. Always use the "Use this template" button to create your own copy.


Part 1: Manual Investigation (20 minutes)

Goal: Manually explore the target repository using GitHub's web interface and collect specific information about its collaborative development artifacts.

Task 1: Repository Statistics

Go to the GitHub repository and investigate it to answer the following questions:

  1. How many open Pull Requests are there?

  2. How many closed Pull Requests are there?

  3. How many merged (closed) Pull Requests are there? (Hint: Use filters to find merged PRs - these are a subset of closed PRs)

  4. How many total Issues are there? How many of those are closed vs. open?

  5. How many contributors have contributed to this repository?

  6. How many total commits are in this repository?

Task 2: Example PR with Passing Build

Find one example of a PR that has a passing build (green checkmark ✅) and record:

  • PR number and title
  • PR URL
  • Screenshot showing the passing CI status

Task 3: Example PR with Failing Build

Find one example of a PR that has a failing build (red X ❌ or failed status) and record:

  • PR number and title
  • PR URL
  • Screenshot showing the failing CI status

Task 4: Example PR with Inline Comments

Find one example of a PR that has inline code review comments (comments on specific lines of code) and record:

  • PR number and title
  • PR URL
  • Screenshot showing the inline comments
  • Brief description of what the comment discusses

Document your findings:

Create a document with:

  1. Repository statistics findings
  2. Links and screenshots for the three example PRs (passing build, failing build, inline comments)
  3. Brief record of any GenAI tools used during the investigation (if applicable)

Part 2: Automate with a Mining Script (40 minutes)

Goal: Complete the Python script (mine_repo.py) that automatically extracts the repository statistics you collected manually in Part 1.

Note: Starter code is provided in mine_repo.py. You need to implement the mining logic using the GitHub API.

Script Requirements:

  • Accept the repository owner and name as command-line arguments
  • Choose your preferred approach to access GitHub data (e.g., via PyGithub)
  • Extract the following statistics:
    • Number of open PRs
    • Number of closed PRs (total, broken down into merged and closed without merging)
    • Total number of issues (broken down into open and closed)
    • Number of contributors
    • Number of commits
  • Display results in a clear, readable format. An example is provided below.
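
The first requirement, reading the owner and name from the command line, can be met with the standard-library argparse module. The sketch below is only one possible shape: the function name parse_args and the argument names owner and repo are illustrative assumptions, not part of the provided starter code.

```python
import argparse


def parse_args(argv=None):
    """Parse the repository owner and name (argument names are illustrative)."""
    parser = argparse.ArgumentParser(
        description="Mine collaboration statistics from a GitHub repository."
    )
    parser.add_argument("owner", help="repository owner, e.g. pallets")
    parser.add_argument("repo", help="repository name, e.g. markupsafe")
    # argv=None makes argparse fall back to sys.argv[1:]
    return parser.parse_args(argv)


# Passing an explicit list mimics: python mine_repo.py pallets markupsafe
args = parse_args(["pallets", "markupsafe"])
print(f"Target repository: {args.owner}/{args.repo}")
```

In a real script you would call parse_args() with no argument so that the values come from the command line.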

Steps:

  1. Create a GitHub Personal Access Token:

    • Go to GitHub → Settings → Developer settings → Personal access tokens → Tokens (classic)
    • Click "Generate new token (classic)"
    • Name: "Repository Mining Script"
    • Select scope: public_repo (or repo if accessing private repos)
    • Generate and copy the token
  2. Create a .env file:

    GITHUB_TOKEN=your_token_here
    

    Important: Add .env to your .gitignore file!

  3. Choose a library for GitHub mining:

    Research and select an approach for accessing GitHub data (e.g., PyGithub, PyDriller, or GitHub REST API with requests) and add the chosen library to requirements.txt.

  4. Write your script:

    Implement the mining logic in mine_repo.py to extract and display the same 6 statistics you collected manually in Part 1:

    • Number of open Pull Requests
    • Number of closed Pull Requests (total closed, whether merged or not)
    • Number of merged Pull Requests
    • Number of Issues (total, open, and closed)
    • Number of Contributors
    • Number of Commits
  5. Create a Virtual Environment and Install dependencies:

    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
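
Steps 1 and 2 store the token in a .env file, so the script needs some way to read it back. The sketch below uses only the standard library and assumes a plain KEY=VALUE file with no quoting; the helper names load_env_file and build_headers are hypothetical. A package such as python-dotenv is a common alternative if you prefer not to parse the file yourself. The headers follow the form the GitHub REST API accepts for token authentication.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into a dict (no quoting rules)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def build_headers(path=".env"):
    """Build request headers, preferring an already-exported GITHUB_TOKEN."""
    token = os.environ.get("GITHUB_TOKEN") or load_env_file(path).get("GITHUB_TOKEN")
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

If no token is found, the headers omit Authorization, which still works for public repositories but with a much lower rate limit.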

Script Output Format:

Example:

$ python mine_repo.py pallets markupsafe

============================================================
REPOSITORY MINING RESULTS: pallets/markupsafe
============================================================

- Open Pull Requests: [NUMBER]
- Closed Pull Requests: [NUMBER]
--- Merged Pull Requests: [NUMBER]
--- Pull Requests closed without merging: [NUMBER]
- Total Issues: [NUMBER]
--- Open Issues: [NUMBER]
--- Closed Issues: [NUMBER]
- Contributors: [NUMBER]
- Total Commits: [NUMBER]

============================================================
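
One way to produce this layout is a small formatting function kept separate from the mining logic. This is a sketch only: the function name format_report, the dictionary keys, and the 60-character rule width are assumptions, and the numbers in the usage lines are made-up placeholders rather than real results.

```python
def format_report(owner, repo, stats):
    """Render a stats dict (hypothetical keys) in the report layout."""
    rule = "=" * 60
    unmerged = stats["closed_prs"] - stats["merged_prs"]
    total_issues = stats["open_issues"] + stats["closed_issues"]
    lines = [
        rule,
        f"REPOSITORY MINING RESULTS: {owner}/{repo}",
        rule,
        "",
        f"- Open Pull Requests: {stats['open_prs']}",
        f"- Closed Pull Requests: {stats['closed_prs']}",
        f"--- Merged Pull Requests: {stats['merged_prs']}",
        f"--- Pull Requests closed without merging: {unmerged}",
        f"- Total Issues: {total_issues}",
        f"--- Open Issues: {stats['open_issues']}",
        f"--- Closed Issues: {stats['closed_issues']}",
        f"- Contributors: {stats['contributors']}",
        f"- Total Commits: {stats['commits']}",
        "",
        rule,
    ]
    return "\n".join(lines)


# Placeholder values for illustration only, not real repository data
sample = {
    "open_prs": 1, "closed_prs": 5, "merged_prs": 3,
    "open_issues": 2, "closed_issues": 4,
    "contributors": 7, "commits": 9,
}
print(format_report("owner", "repo", sample))
```

Deriving the unmerged count and the issue total from the other values keeps the printed breakdown consistent with its parts.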

Resources for Research:

Research the documentation for your chosen library to learn how to access repository data:

Note: Some API calls may require pagination to get complete results. Check your library's documentation to see if it already handles paginated responses or if you have to do it yourself.
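
If you call the GitHub REST API directly, list endpoints return one page at a time (the per_page parameter allows up to 100 items) and point to the following page through a Link response header containing an entry marked rel="next". The sketch below shows that loop in isolation. The helper names are hypothetical, and fetch stands for whatever function you write to perform the HTTP request and return the decoded items together with the Link header value.

```python
import re

# Matches entries such as: <https://api.github.com/...?page=2>; rel="next"
_LINK_PART = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def next_page_url(link_header):
    """Return the URL tagged rel="next" in a Link header, or None."""
    if not link_header:
        return None
    for url, rel in _LINK_PART.findall(link_header):
        if rel == "next":
            return url
    return None


def collect_all(first_url, fetch):
    """Follow rel="next" links until none remains and gather every item.

    `fetch` is a hypothetical callable: fetch(url) -> (items, link_header).
    """
    items, url = [], first_url
    while url:
        page_items, link_header = fetch(url)
        items.extend(page_items)
        url = next_page_url(link_header)
    return items
```

Libraries such as PyGithub wrap this behaviour for you, so a loop like this is only needed when working with raw HTTP responses.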


Part 3: Compare and Validate (10 minutes)

Goal: Compare your manual findings from the web interface with the automated script results, and document the comparison.

  • Compare each statistic
  • Do the numbers match?
  • If there are discrepancies, first make sure there are no bugs in your script. If a discrepancy still persists, note it in your report
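
A small helper can make the comparison systematic. The sketch below assumes you have typed your manual numbers into a dictionary that uses the same (hypothetical) keys as your script's results. It also illustrates one frequent cause of mismatched issue counts: in the GitHub REST API, issue listings include pull requests as well, and those entries carry a "pull_request" key.

```python
def count_true_issues(items):
    """Count entries that are issues, skipping pull requests.

    Assumes `items` are dicts decoded from a REST API issue listing, where
    pull requests are marked by a "pull_request" key.
    """
    return sum(1 for item in items if "pull_request" not in item)


def find_discrepancies(manual, automated):
    """Return {statistic: (manual, automated)} for every value that differs."""
    return {
        name: (manual.get(name), automated.get(name))
        for name in sorted(set(manual) | set(automated))
        if manual.get(name) != automated.get(name)
    }


# Placeholder numbers for illustration only
manual = {"open_prs": 2, "open_issues": 5}
automated = {"open_prs": 2, "open_issues": 7}
for name, (by_hand, by_script) in find_discrepancies(manual, automated).items():
    print(f"{name}: manual={by_hand}, script={by_script}")
```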

Part 4: Apply to Another Repository (Optional, 5 minutes)

Goal: Test your script on a different repository to ensure it works generically.

  1. Run your script on the Lizard repository (terryyin/lizard):

    python mine_repo.py terryyin lizard 
  2. Report the findings: briefly document what the script discovered about this second repository.


Submission Instructions

Submit the following to Brightspace:

Required Submissions

Total Points: 10/10

  1. Manual Investigation Results (2.5 points)

    • Required repository statistics
    • Three example PRs with:
      • Links to each PR
      • Screenshots showing:
        • PR with passing build
        • PR with failing build
        • PR with inline comments
  2. Mining Script Code (5 points)

    • Your mine_repo.py file with implemented mining logic
    • Your requirements.txt file with any dependencies needed to run your script
  3. Comparison Report (2.5 points)

    • Screenshot of your script output
    • Side-by-side comparison showing:
      • Your manual statistics
      • Your script output
    • Report of any discrepancies
  4. Optional Repository Results (if completed)

    • Output from running your script on a second repository
  5. GenAI Disclosure (if applicable)

    • Brief note describing any AI tools used and how they assisted you

Submission Format

  • Combine all documents/screenshots into a single PDF
  • Attach your mine_repo.py and requirements.txt files separately
  • Name your files: LastName_FirstName_Mining.pdf, LastName_FirstName_mine_repo.py, and LastName_FirstName_requirements.txt

Resources


Acknowledgement

This exercise was developed with the assistance of Cursor, an AI-powered code editor. Cursor was used to:

  • Brainstorm ideas for the exercise structure and tasks
  • Draft and refine this README documentation

License

MIT License - See LICENSE file for details.
