In-Class Exercise: Mining Collaborative Development Artifacts

Technical Requirements

Before starting this exercise, ensure you have the following:

Software

Python 3.11+ - Download Python
GitHub Account - Sign up for GitHub

Skills

Basic Python programming (functions, API calls, data structures)
Basic understanding of GitHub PRs and Issues

Setup Verification

Run this command to verify your setup:

python --version    # Should show Python 3.11 or higher
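The same check can also be done from inside Python, which is handy when several interpreters are installed. This is only a minimal sketch; the helper name `version_ok` is made up for illustration and is not part of the starter code.

```python
import sys

REQUIRED = (3, 11)  # minimum version listed under "Software" above


def version_ok(version_info=sys.version_info, required=REQUIRED):
    """Return True when the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= required


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if version_ok() else "too old for this exercise"
    print(f"Python {found}: {status}")
```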

Overview

Collaborative software development is the practice of multiple developers working together on a shared codebase. It relies on tools such as Git, on practices such as pair programming and code review, and on DevOps practices such as Continuous Integration/Continuous Deployment (CI/CD). The goal is to make teamwork easier while maintaining code quality. Platforms like GitHub record many artifacts during development, including issues, pull requests, commits, and code reviews, and these artifacts can be mined to understand how a codebase evolves.

This exercise will focus on investigating these collaborative development artifacts. You will learn to:

Manually explore a GitHub repository to extract information about PRs and Issues
Write a Python script to automate repository mining using the GitHub API
Compare manual findings with automated results

Repository for Investigation (a.k.a. Target Repository)

For this exercise, you will investigate the MarkupSafe library repository:

Repository: pallets/markupsafe
URL: https://github.com/pallets/markupsafe
Description: Safely add untrusted strings to HTML/XML markup.

GenAI Usage Policy

The use of Generative AI tools (e.g., ChatGPT, Cursor, GitHub Copilot, Claude) is permitted for this exercise with the following guidelines:

Allowed Uses

Understanding GitHub API documentation
Debugging error messages
Learning Python syntax for API calls
Clarifying concepts about GitHub PRs, Issues, and API responses

Not Allowed

Generating the complete mine_repo.py script (you must research how to mine a GitHub repository yourself by reading the API documentation)
Using AI-enabled IDEs to generate the entire mining script
Having AI write your investigation findings

Requirements

You must be able to explain any code you submit
Document any AI assistance in your submission (brief note at the end of your PDF)

Exercise Instructions

Total Time: 75 minutes

Set Up Your Repository (5 minutes)

Create your own repository from this template:

Click "Use this template" button (green button at the top of the repo)
Select "Create a new repository"
Name it appropriately (e.g., SAhandson-topic1-yourname)

Clone your repository:

git clone <your-repo-url>
cd <repo-name>

Note: Do NOT fork or clone this template directly. Always use the "Use this template" button to create your own copy.

Part 1: Manual Investigation (20 minutes)

Goal: Manually explore the target repository using GitHub's web interface and collect specific information about its collaborative development artifacts.

Task 1: Repository Statistics

Go to the GitHub repository and investigate it to answer the following questions:

How many open Pull Requests are there?
How many closed Pull Requests are there?
How many merged (closed) Pull Requests are there? (Hint: Use filters to find merged PRs - these are a subset of closed PRs)
How many total Issues are there? How many of those are closed vs. open?
How many contributors have contributed to this repository?
How many total commits are in this repository?

Task 2: Example PR with Passing Build

Find one example of a PR that has a passing build (green checkmark ✅) and record:

PR number and title
PR URL
Screenshot showing the passing CI status

Task 3: Example PR with Failing Build

Find one example of a PR that has a failing build (red X ❌ or failed status) and record:

PR number and title
PR URL
Screenshot showing the failing CI status

Task 4: Example PR with Inline Comments

Find one example of a PR that has inline code review comments (comments on specific lines of code) and record:

PR number and title
PR URL
Screenshot showing the inline comments
Brief description of what the comment discusses

Document your findings:

Create a document with:

Repository statistics findings
Links and screenshots for the three example PRs (passing build, failing build, inline comments)
Brief record of any GenAI tools used during the investigation (if applicable)

Part 2: Automate with a Mining Script (40 minutes)

Goal: Complete the Python script (mine_repo.py) that automatically extracts the repository statistics you collected manually in Part 1.

Note: Starter code is provided in mine_repo.py. You need to implement the mining logic using the GitHub API.

Script Requirements:

Accept repository owner and name as command-line arguments
Select your desired approach to access GitHub data (e.g., via PyGithub)
Extract the following statistics:
- Number of open PRs
- Number of closed PRs (the total, broken down into merged and closed without merging)
- Total number of issues (broken down into open and closed)
- Number of contributors
- Number of commits
Display results in a clear, readable format. An example is provided below.
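For the first requirement, one possible way to read the owner and name from the command line is Python's standard `argparse` module. This is a sketch under assumptions: the function name `parse_args` and the argument names `owner` and `repo` are illustrative, and the provided starter code may already handle arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments: repository owner and repository name."""
    parser = argparse.ArgumentParser(
        description="Mine basic statistics from a GitHub repository."
    )
    parser.add_argument("owner", help="repository owner, e.g. pallets")
    parser.add_argument("repo", help="repository name, e.g. markupsafe")
    # argv=None makes argparse read sys.argv[1:], the normal command-line case.
    return parser.parse_args(argv)


if __name__ == "__main__":
    # An explicit list is passed here only so the sketch runs without arguments.
    args = parse_args(["pallets", "markupsafe"])
    print(f"Target repository: {args.owner}/{args.repo}")
```

In a real script you would call `parse_args()` with no argument so that the values come from the command line.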

Steps:

Create a GitHub Personal Access Token:
- Go to GitHub → Settings → Developer settings → Personal access tokens → Tokens (classic)
- Click "Generate new token (classic)"
- Name: "Repository Mining Script"
- Select scope: public_repo (or repo if accessing private repos)
- Generate and copy the token
Create a .env file:
```
GITHUB_TOKEN=your_token_here
```
Important: Add .env to your .gitignore file!
Choose a library for GitHub mining:

Research and select an approach for accessing GitHub data (e.g., PyGithub, PyDriller, or GitHub REST API with requests) and add the chosen library to requirements.txt.
Write your script:

Implement the mining logic in mine_repo.py to extract and display the same statistics you collected manually in Part 1:
- Number of open Pull Requests
- Number of closed Pull Requests (the total, broken down into merged and closed without merging)
- Total number of Issues (with open and closed counts)
- Number of Contributors
- Number of Commits

Create a Virtual Environment and Install dependencies:

python -m venv .venv
source .venv/bin/activate    # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
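The token stored in the .env file from the steps above has to reach your script without being hard-coded. The sketch below reads it using only the standard library. The helper names `load_env_file` and `get_github_token` are hypothetical, and the parser handles only simple KEY=VALUE lines; a package such as python-dotenv is a more robust alternative if you prefer to add a dependency.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines into a dict; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_github_token(path=".env"):
    """Prefer an exported GITHUB_TOKEN variable, then fall back to the .env file."""
    return os.environ.get("GITHUB_TOKEN") or load_env_file(path).get("GITHUB_TOKEN")
```

`get_github_token()` returns None when no token is found, so the script can stop with a clear message instead of sending unauthenticated requests.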

Script Output Format:

Example:

$ python mine_repo.py pallets markupsafe

============================================================
REPOSITORY MINING RESULTS: pallets/markupsafe
============================================================

- Open Pull Requests: [NUMBER]
- Closed Pull Requests: [NUMBER]
--- Merged Pull Requests: [NUMBER]
--- Pull Requests closed without merging: [NUMBER]
- Total Issues: [NUMBER]
--- Open Issues: [NUMBER]
--- Closed Issues: [NUMBER]
- Contributors: [NUMBER]
- Total Commits: [NUMBER]

============================================================
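One way to produce this layout is to keep the printing separate from the mining logic. The sketch below assumes the statistics have already been collected into a dictionary; the function name `format_results` and the dictionary keys are hypothetical, and the numbers in the usage example are placeholders, not real data from any repository.

```python
def format_results(owner, repo, stats):
    """Build the report text from a dict of already-collected counts."""
    rule = "=" * 60
    not_merged = stats["closed_prs"] - stats["merged_prs"]
    total_issues = stats["open_issues"] + stats["closed_issues"]
    lines = [
        rule,
        f"REPOSITORY MINING RESULTS: {owner}/{repo}",
        rule,
        "",
        f"- Open Pull Requests: {stats['open_prs']}",
        f"- Closed Pull Requests: {stats['closed_prs']}",
        f"--- Merged Pull Requests: {stats['merged_prs']}",
        f"--- Pull Requests closed without merging: {not_merged}",
        f"- Total Issues: {total_issues}",
        f"--- Open Issues: {stats['open_issues']}",
        f"--- Closed Issues: {stats['closed_issues']}",
        f"- Contributors: {stats['contributors']}",
        f"- Total Commits: {stats['commits']}",
        "",
        rule,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    placeholder_stats = {
        "open_prs": 1, "closed_prs": 5, "merged_prs": 3,
        "open_issues": 2, "closed_issues": 4,
        "contributors": 6, "commits": 7,
    }
    print(format_results("owner", "repo", placeholder_stats))
```

Deriving the two sub-totals inside the function keeps the breakdown consistent with the totals it is printed under.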

Resources for Research:

Research the documentation for your chosen library to learn how to access repository data:

PyGithub: Documentation | GitHub Repository
PyDriller: Documentation | GitHub Repository
GitHub REST API: API Documentation | Pull Requests API | Issues API | Repositories API
Requests library: Documentation

Note: Some API calls may require pagination to get complete results. Check your library's documentation to see if it already handles paginated responses or if you have to do it yourself.
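If you call the GitHub REST API directly, paginated responses carry a Link header whose rel="next" entry points at the following page. The sketch below shows the general idea without any network code. `fetch_page` is a hypothetical callable that you would supply yourself and that returns a page's items together with its Link header. The header parsing is deliberately simple and assumes the URLs contain no commas.

```python
import re


def next_page_url(link_header):
    """Return the URL marked rel="next" in a Link header, or None on the last page."""
    if not link_header:
        return None
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="next"', part)
        if match:
            return match.group(1)
    return None


def collect_all(first_url, fetch_page):
    """Follow rel="next" links until none remain.

    fetch_page(url) is assumed to return (list_of_items, link_header_or_None).
    """
    items, url = [], first_url
    while url:
        page_items, link_header = fetch_page(url)
        items.extend(page_items)
        url = next_page_url(link_header)
    return items


if __name__ == "__main__":
    # Two fake pages stand in for real API responses.
    fake_pages = {
        "page-1": (["a", "b"], '<page-2>; rel="next", <page-2>; rel="last"'),
        "page-2": (["c"], None),
    }
    print(collect_all("page-1", lambda url: fake_pages[url]))
```

Libraries differ in how much of this they do for you, so counting only the first page of results is a common source of numbers that are too low.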

Part 3: Compare and Validate (10 minutes)

Goal: Compare your manual findings from the web interface with the automated script results, and document the outcome.

Compare each statistic
Do the numbers match?
If there are discrepancies, first check your script for bugs. If a discrepancy persists, note it in your report
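The comparison itself can be tabulated with a few lines of Python, which also gives you the side-by-side view for the report. This is a sketch: the function name `compare_stats` and the statistic names are hypothetical, and the values in the usage example are placeholders, not real counts.

```python
def compare_stats(manual, automated):
    """Return (name, manual value, script value, match?) for every statistic in either dict."""
    rows = []
    for name in sorted(set(manual) | set(automated)):
        manual_value = manual.get(name)
        script_value = automated.get(name)
        rows.append((name, manual_value, script_value, manual_value == script_value))
    return rows


if __name__ == "__main__":
    # Placeholder values for illustration only.
    manual = {"open_prs": 1, "closed_prs": 5, "commits": 7}
    automated = {"open_prs": 1, "closed_prs": 4, "commits": 7}
    for name, manual_value, script_value, same in compare_stats(manual, automated):
        flag = "match" if same else "MISMATCH"
        print(f"{name:<12} manual={manual_value!s:<6} script={script_value!s:<6} {flag}")
```

A statistic missing from one side shows up as None, which makes forgotten measurements easy to spot.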

Part 4: Apply to Another Repository (Optional, 5 minutes)

Goal: Test your script on a different repository to ensure it works generically.

Run your script on the Lizard repository (terryyin/lizard):
```
python mine_repo.py terryyin lizard
```
Report the findings - briefly document what the script discovered about this second repository.

Submission Instructions

Submit the following to Brightspace:

Required Submissions

Total Points: 10

Manual Investigation Results (2.5 points)
- Required Repository statistics
- Three example PRs with:
  - Links to each PR
  - Screenshots showing:
    - PR with passing build
    - PR with failing build
    - PR with inline comments
Mining Script Code (5 points)
- Your mine_repo.py file with implemented mining logic
- Your requirements.txt file with any dependencies needed to run your script
Comparison Report (2.5 points)
- Screenshot of your script output
- Side-by-side comparison showing:
  - Your manual statistics
  - Your script output
- Report of any discrepancies
Optional Repository Results (if completed)
- Output from running your script on a second repository
GenAI Disclosure (if applicable)
- Brief note describing any AI tools used and how they assisted you

Submission Format

Combine all documents/screenshots into a single PDF
Attach your mine_repo.py and requirements.txt files separately
Name your files: LastName_FirstName_Mining.pdf, LastName_FirstName_mine_repo.py, and LastName_FirstName_requirements.txt

Resources

Acknowledgement

This exercise was developed with the assistance of Cursor, an AI-powered code editor. Cursor was used to:

Brainstorm ideas for the exercise structure and tasks
Draft and refine this README documentation

License

MIT License - See LICENSE file for details.

snadi-teaching/SA_CollabDev_Handson
