Using Python for Data Mining

Python code for extracting data from an Excel spreadsheet or csv file and producing reports. Original code by Neil Aitken, Digital Scholarship in the Arts, UBC.

Summary

Data often arrives as an Excel spreadsheet (.xlsx) or a csv file (.csv). Some queries and research questions can be answered with pandas alone (a Python library for data parsing and analytics), but more complicated questions about a dataset can call for a more structured approach. This toolkit turns each row of the dataset into a Record and then collects all the Records into a Catalog object. The Catalog provides prebuilt functions that handle common types of questions we might have about the dataset, as well as more intuitive ways to access and compare the data when creating reports.
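
The Record and Catalog idea described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in example_datamining.py: the column names (course_code, title, item_type), the sample rows, and the helper methods where and count_by are all hypothetical and chosen only to show the pattern.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    """One row of the dataset; column names become keys in `fields`."""
    fields: dict

    def get(self, name, default=None):
        return self.fields.get(name, default)


@dataclass
class Catalog:
    """A collection of Records with a few helper queries (hypothetical API)."""
    records: list = field(default_factory=list)

    @classmethod
    def from_csv(cls, handle):
        # DictReader uses the first line of the file as the column names.
        reader = csv.DictReader(handle)
        return cls([Record(dict(row)) for row in reader])

    def where(self, name, value):
        """Return every Record whose column `name` equals `value`."""
        return [r for r in self.records if r.get(name) == value]

    def count_by(self, name):
        """Count how many Records share each value of column `name`."""
        return Counter(r.get(name) for r in self.records)


# Made-up sample data standing in for a course reserve data dump.
SAMPLE = """course_code,title,item_type
ENGL100,Intro Reader,book
ENGL100,Essay Packet,article
HIST200,Survey Text,book
"""

catalog = Catalog.from_csv(io.StringIO(SAMPLE))
book_count = catalog.count_by("item_type")["book"]
engl_titles = [r.get("title") for r in catalog.where("course_code", "ENGL100")]
print(book_count, engl_titles)
```

Wrapping each row in a Record keeps the query helpers on the Catalog short and readable, and new report questions can be added as new Catalog methods without touching the loading code.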

The example code is designed for working with a mock course reserve data dump (either csv or xlsx), but the approach should work for, or be easy to adapt to, other projects and datasets.

Features

The provided files include:

example_datamining.py: a code template that defines the Record and Catalog objects, then provides an example of how to use them for handling a dataset pulled from an Excel or csv file.
course_reserve_dataset.csv: a dummy dataset mimicking what an online course reserve data dump might look like
example_datamining_course_reserves.ipynb: an example of the code as embedded in a Google Colab/Jupyter Notebook

You can create your own mock dataset using the pyDatasetGen toolkit. This might be useful if you need a dataset that meets different requirements (different elements, field names, etc.).

Instructions (Google Colab or Jupyter Notebook)

If you want to test this process out without installing anything new, you can use the Notebook file (for either Google Colab or Jupyter Notebook).

1. Download a copy of example_datamining_course_reserves.ipynb
2. Download a copy of course_reserve_dataset.csv
3. Import the notebook into Jupyter or Google Colab
4. Follow the instructions in the notebook and upload the csv file

Instructions (Local Install)

If you prefer to run Python on your local machine, create a new folder for this project and follow these steps:

1. Download a copy of example_datamining.py
2. Download a copy of course_reserve_dataset.csv
3. Open the folder as a workspace in VS Code (or whatever your code editor/IDE is).
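
Once both files are in the project folder, a quick sanity check can confirm that the dataset is readable before running anything else. The sketch below is not part of example_datamining.py; the helper name preview_csv and its limit parameter are hypothetical, and it assumes the csv file has a header row and is UTF-8 encoded (with or without a byte order mark).

```python
import csv
from pathlib import Path


def preview_csv(path, limit=3):
    """Return the column names and the first `limit` rows of a csv file."""
    # utf-8-sig reads plain UTF-8 and also strips a byte order mark if present.
    with Path(path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        # zip stops after `limit` rows without reading the rest of the file.
        rows = [row for _, row in zip(range(limit), reader)]
        return reader.fieldnames, rows


dataset = Path("course_reserve_dataset.csv")
if dataset.exists():
    columns, rows = preview_csv(dataset)
    print(columns)
    for row in rows:
        print(row)
else:
    print(f"{dataset} not found; download it into this folder first.")
```

Run this from the project folder so that the relative file name resolves correctly.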

Repository: DiSA-Projects/example-datamining
