fdac25/news

News CS445/545 09:45AM-11:00AM MKB-524

Dec 2

  • Tennis
  • Brawlhalla
  • Bird Calling
  • 2-AMP Automated Trading
  • Spotify Visualization
  • Football
  • Movie Ratings
  • PII
  • Cricket
  • Baseball
  • NASA
  • Spotify Horoscope
  • NBA
  • Reasoning Imputation

Class on Nov 25, Dec 2

  • Covid Analysis
  • Github Keys
  • spTrends
  • Life Expectancy
  • EPASites
  • Roomba
  • Cybersecurity
  • Finish MP3 Part 3 (Nov 25)
  • Final project presentations: please sign up

Class on Nov 18, 20

  • Work on class projects

Class on Nov 13: no remote participation

  • Work in groups on MP3 Part 2
  • Finish MP3 Part 2
  • Start on Part 3

Class on Nov 11

  • Complete MP3 Part 1
  • Ensure data is ready for Part 2

Class on Nov 6: no remote participation

  • Introduce MP3 Part 1, 2, 3

Nov 4 no class, election day

Class on Oct 30: no remote participation

  • Complete MP2 Task 4 by Nov 2
  • Qualitative analysis primer

Class on Oct 28: optional in-person

  • Work on MP2 Task 4
  • Work on team project: Please make sure you document your work via issues/artifacts in your team repo
  • Questions, consultations

Class on Oct 23: no remote participation, participation will be tracked!

  • Introduce MP2 Task 4
  • Complete MP2 Task 3
  • Final project status updates: 2min max per project

Class on Oct 21: optional in-person

  • Work on MP2 Task 3
  • Work on team project: Please make sure you document your work via issues/artifacts in your team repo
  • Questions, consultations

Class on Oct 16: participation will be tracked!

  • Introduce MP2 Task 3
  • Complete MP2 Task 2
  • Final project status updates: 2min max per project

Class on Oct 14: no lecture, no in-person

  • Work on MP2 Task 2
  • Work on team project: Please make sure you document your work via issues/artifacts in your team repo

Class on Oct 9: no remote participation

  • Introduce MP2 Task 2
  • Complete MP2 Task 1

Class on Oct 2: no remote participation

Class on Sept 30: no lecture, no in-person

  • Work on class project proposal

Class on Sep 25: Optional in-person

  • I'll be responding to any questions and grading inquiries in person and over Zoom
  • Work on class project proposal
  • Questions on class project proposal

Class on Sep 23: Optional in-person

  • Work on class project proposal
  • Questions on class project proposal

Class on Sep 18: no remote participation

  • Class project participation: only 74 students are listed. Need to know projects for the remaining 15 students (afeyerhe asalasva ggill5 hbhidya him2 jbrow327 jchen122 jcordwel jmalinen jdisalvo mbyest mcao12 mphan2 nupadhy3 suppalap xhu48)
  • Finish presentations
    • Group Number: 13 : Vinni0627
    • Group Number: 14 : Cameronr11
    • Group Number: 15 : Drowsystudent
    • Group Number: 16 : zmcknigh
  • Data Storage
  • Cloud computing

Class on Sep 16: no remote participation

  • Class project participation: only 66 students are listed. Need to know projects for the remaining students.
  • Presenting MP1 results by the representatives of each group for the entire class
    • the presentations will go in group order: 5m max!
    • Group Number: 1 : Kv-Lam
    • Group Number: 2
    • Group Number: 3 : jcordwel
    • Group Number: 4 : melonchomp
    • Group Number: 5 : Criley71
    • Group Number: 6 : dtwilkey
    • Group Number: 7 : brodiekovach
    • Group Number: 8 : meitantei63
    • Group Number: 9 : gcarson1
    • Group Number: 10 : xhu48
    • Group Number: 11 : frozensriracha
    • Group Number: 12 : DanielJoy6

Class on Sep 11: no remote participation

  • Finish LLM supply chains
  • Still no MP1 forks for: sean-ward034 (GR3) tyblue18 (GR8)
  • Present MP1 results within the assigned groups and select one presentation via this issue
  • Work on class project proposal

Class on Sep 9: no remote participation

  • Boasters for class project
  • Final day to form teams for the class project
  • Continue work on MP1
  • Still six forks missing for MP1!

Class on Sep 4: no remote participation

  • Boasters for class project
  • Finish Data discovery
  • Software Supply Chains and World of Code dataset
  • Work on MP1, including discussing with your assigned peer
  • Make sure you have
    1. Forked fdac23/Miniproject1
    2. Posted the idea for your analysis on your peer's fork
    3. Responded to the idea that was posted by your peer

Class on Sep 2

  • Question regarding MP1
  • Boasters for class project
  • World of Code dataset
  • Sep 1 10AM: outstanding invitations @bbamyi, @meitantei63, @minhcaooo34, @tlatawiec, @jakobDallas, @AgustinSV, @CTucker01: please accept ASAP
  • glakshma vjoshi2 are still not using gh: and hfh: markings correctly

16 Groups for MP1

  1. tlatawie klam5 asahoo dpate125 jprebola kylwboma
  2. kmahajan tduckwor jmalinen suppalap him2
  3. jcordwel rchennai sjayapra gmorale2 sward47 cdamron2
  4. cwhit163 aghazi2 teisenba pbhatt1 jaktseat spaladu1
  5. criley16 vjoshi2 jchen122 smohyud1 spate200 wdougla4
  6. asmit494 jql794 ctucke24 eyang7 bpatel40 jweil
  7. rperry21 mcao12 mmirusma bkovach spate201 mbyest
  8. tsomani ashittu gevans16 jdisalvo iweaver2 ndawson2
  9. jbrow327 therren2 tolson4 rsanz gcarson1
  10. xhu48 cramosme wsessoms lfarthi1 yhb368 tbissaho
  11. jdodd8 mphan2 mpatriki afeyerhe jprater8 jshastid
  12. danrjoy dmoon4 jtiemey2 jhowar72 rpate112 wsv346
  13. rpatel92 rtrenner glakshma vkonjet1 jvenkat1
  14. ggill5 sshriva2 lsd728 txh512 crader6
  15. bturne50 bcurry8 dchupryn ajoshi21 dpate122
  16. nupadhy3 asalasva hbhidya ahuang16 zmcknigh

Class on Aug 28

  • See the simple text analysis of your descriptions
  • Introducing the MiniProject1 process and template
  • Think about selecting the course project (see course projects for the prior years at fdac2[0-3], fdac1[6-9], fdac2[0-4], fdac for inspiration)
  • Boasters for class project (if you have an idea for the class project, please commit to fdac25/FinalProjectPitches)

Class on Aug 26

  • Boasters for class project (if you have an idea for the class project, please commit to fdac25/FinalProjectPitches)
  • Work on fdac25/Practice0: due before class on Sep 4
  • as of Aug 27, 06:40AM EST
    • still issues with: crader6 glakshma jbrow327 jdisalvo rsanz spate201 vjoshi2 (please put gh: and hfh: on separate lines, using exactly these prefixes, and submit a new PR)

Class on Aug 21

  • *** This is a graded assignment: will take points off if not completed by Monday, Aug 25 --- no exceptions *** If you have not received your invitation please:
    • make sure you submitted the PR and your netid.md file is in fdac25/students
    • the file has correct entries (if not, fix by editing your file and submit another PR):
   gh: githubid
   hfh: huggingfaceid
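
Before submitting a PR, the two id lines can be checked locally. The following is a minimal sketch, assuming only what is stated above: each id sits on its own line and is introduced by exactly `gh:` or `hfh:`. The function names and the tolerance for surrounding whitespace are assumptions made for illustration, not the checker actually used for the course.

```python
import re

# Assumed format: one id per line, introduced by exactly "gh:" or "hfh:".
# ID_LINE, parse_ids and missing_ids are hypothetical names for this sketch.
ID_LINE = re.compile(r"^(gh|hfh):\s*(\S+)$")


def parse_ids(text):
    """Return a dict such as {'gh': ..., 'hfh': ...} for the id lines in text."""
    ids = {}
    for line in text.splitlines():
        # Assumption: leading and trailing whitespace on a line is tolerated.
        match = ID_LINE.match(line.strip())
        if match:
            ids[match.group(1)] = match.group(2)
    return ids


def missing_ids(text):
    """List the prefixes that are absent or not on a line of their own."""
    found = parse_ids(text)
    return [prefix for prefix in ("gh", "hfh") if prefix not in found]


good = "I am interested in data.\ngh: githubid\nhfh: huggingfaceid\n"
bad = "gh: githubid hfh: huggingfaceid\n"  # both ids crammed onto one line

print(missing_ids(good))
print(missing_ids(bad))
```

For the `good` text nothing is reported as missing, while the `bad` text fails for both prefixes because the ids share a line.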

Class on Aug 19

  • Create a HuggingFace account at https://huggingface.co/
    • Search for organizations by name (fdac25) in the search bar on the Hub. The name will appear under the “Organizations” section
    • request to join fdac25 by clicking on the button shown in the screenshot below or
    • here
  • Create your github account
    • fork repo students
    • create your netid.md file providing your name, your interests, and what you want to get out of the course (at least a full paragraph; see the example in fdac25/students/README.md), and
    • include your hugging face id like this on a separate line:
    • Example: hfh: Audris
      
    • include your github id like this on a separate line:
    • Example: gh: audrism
      
    • [upload your public ssh key to your account on github](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account). Once done, please
    • submit a pull request to fdac25/students and submit the URL of your pull request in Canvas.
  • Make sure you do it a day before the next class so we are ready to start

Information for remote participation via Zoom / Discord

Syllabus for "Fundamentals of Digital Archeology"

Simple rules:

  1. There are no stupid questions. However, it may be worth going over the following steps:
  2. Think of what the right answer may be.
  3. Search online: stack overflow, etc.
  4. Look through issues
  5. Post the question as an issue.
  6. Ask instructor: email for 1-on-1 help, or to set up a time to meet

Objectives

The course will combine theoretical underpinning of big data with intense practice. In particular, approaches to ethical concerns, reproducibility of the results, absence of context, missing data, and incorrect data will be both discussed and practiced by writing programs to discover the data in the cloud, to retrieve it by scraping the deep web, and by structuring, storing, and sampling it in a way suitable for subsequent decision making. At the end of the course students will be able to discover, collect, and clean digital traces, to use such traces to construct meaningful measures, and to create tools that help with decision making.

Expected Outcomes

Upon completion, students will be able to discover, gather, and analyze digital traces, will learn how to avoid mistakes common in the analysis of low-quality data, and will have produced a working analytics application.

In particular, in addition to practicing critical thinking, students will acquire the following skills:

  • Use Python and other tools to discover, retrieve, and process data.

  • Use data management techniques to store data locally and in the cloud.

  • Use data analysis methods to explore data and to make predictions.

Course Description

A great volume of complex data is generated as a result of human activities, including both work and play. To exploit that data for decision making it is necessary to create software that discovers, collects, and integrates the data.

Digital archeology relies on traces that are left over in the course of ordinary activities, for example the logs generated by sensors in mobile phones, the commits in version control systems, or the email sent and the documents edited by a knowledge worker. Understanding such traces is more complicated than understanding data collected using traditional measurement approaches.

Traditional approaches rely on a highly controlled and well-designed measurement system. In meteorology, for example, the temperature is taken in specially designed and carefully selected locations to avoid direct sunlight and to be at a fixed distance from the ground. Such measurement can then be trusted to represent these controlled conditions and the analysis of such data is, consequently, fairly straightforward.

The measurements from geolocation or other sensors in mobile phones are affected by numerous (yet not recorded) factors: was the phone kept in the pocket, was it indoors or outside? The devices are not calibrated or may not work properly, so the corresponding measurements would be inaccurate. Locations (without mobile phones) may not have any measurement, yet may be of the greatest interest. This lack of context and inaccurate or missing data necessitates fundamentally new approaches that rely on patterns of behavior to correct the data, to fill in missing observations, and to elucidate unrecorded context factors. These steps are needed to obtain meaningful results from a subsequent analysis.

The course will cover basic principles and effective practices to increase the integrity of the results obtained from voluminous but highly unreliable sources.

  • Ethics: legal aspects, privacy, confidentiality, governance

  • Reproducibility: version control, ipython notebook

  • Fundamentals of big data analysis: extreme distributions, transformations, quantiles, sampling strategies, and logistic regression

  • The nature of digital traces: lack of context, missing values, and incorrect data
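
To make the "extreme distributions, transformations, quantiles" topic concrete, here is a small sketch using only the Python standard library. The commit counts are invented for illustration and are not course data. They mimic a heavy-tailed trace in which one contributor dominates, so the mean says little about a typical contributor.

```python
import math
import statistics

# Hypothetical data: number of commits made by each of ten developers.
commits = [1, 1, 1, 2, 2, 3, 5, 8, 40, 900]

mean = statistics.mean(commits)
median = statistics.median(commits)

# Three cut points that split the sorted data into four groups.
quartiles = statistics.quantiles(commits, n=4)

# A log transformation pulls in the long right tail; averaging the logs
# and transforming back gives the geometric mean.
log_commits = [math.log10(c) for c in commits]
geometric_mean = 10 ** statistics.mean(log_commits)

print(f"mean={mean}, median={median}")
print(f"quartiles={quartiles}")
print(f"geometric mean={geometric_mean:.1f}")
```

Here the mean is 96.3 while the median is 2.5. A single extreme value moves the mean by more than an order of magnitude, which is why quantiles and transformed values are often more informative summaries for this kind of data.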

Prerequisites

Students are expected to have basic programming skills: in particular, to be able to use regular expressions, programming concepts such as variables, functions, and loops, and data structures like lists and dictionaries (for example, COSC 365).

Being familiar with version control systems (e.g., COSC 340), Python (e.g., COSC 370), and introductory-level probability (e.g., ECE 313) and statistics, such as random variables, distributions, and regression, would be beneficial but is not expected. Everyone is expected, however, to be willing and highly motivated to catch up in the areas where they have gaps in the relevant skills.

All the assignments and projects for this class will use github and Python. Knowledge of Python is not a prerequisite for this course, provided you are comfortable learning on your own as needed. While we have strived to make the programming component of this course straightforward, we will not devote much time to teaching programming, Python syntax, or any of the libraries and APIs. You should feel comfortable with:

  1. How to look up Python syntax on Google and StackOverflow.
  2. Basic programming concepts like functions, loops, arrays, dictionaries, strings, and if statements.
  3. How to learn new libraries by reading documentation and reusing examples.
  4. Asking questions on StackOverflow or as a GitHub issue.

Requirements

These apply to real life, as well.

  • Must apply "good programming style" learned in class
    • Optimize for readability
  • Bonus points for:
    • Creativity (as long as requirements are fulfilled)

Teaming Tips

  • Agree on an editor and environment that you're comfortable with
  • The person who's less experienced/comfortable should have more keyboard time
  • Switch who's "driving" regularly
  • Make sure to save the code and send it to others on the team

Evaluation

  • Class Participation – 15%: students are expected to read all material covered in a week and come to class prepared to take part in the classroom discussions (online). Asking and responding to other student questions (issues) counts as a key factor for classroom participation. With online format and collaborative nature of the projects, this should not be hard to accomplish.

  • Assignments - 40%: Each assignment will involve writing (or modifying a template of) a small Python program.

  • Project - 45%: one original project done alone or in a group of 2 or 3 students. The project will explore one or more of the themes covered in the course that students find particularly compelling. The group needs to submit a project proposal (2 pages IEEE format) approximately 1.5 months before the end of term. The proposal should provide a brief motivation of the project, detailed discussion of the data that will be obtained or used in the project, along with a time-line of milestones, and expected outcome.

  • Scale

letter  percent
a       95
a-      93
b+      90
b       88
b-      85
c+      83
c       79
c-      75
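
The weights and the scale above can be combined into a quick estimate of a final grade. This is a sketch under two assumptions that the syllabus does not state explicitly: each percent in the scale is the minimum score for that letter, and component scores are on a 0 to 100 scale. The function names are hypothetical, and scores below the lowest listed cutoff are reported as `None` because the scale does not cover them.

```python
# Weights from the Evaluation section and cutoffs from the scale table.
# Assumption: each cutoff is the minimum total needed for that letter.
WEIGHTS = {"participation": 0.15, "assignments": 0.40, "project": 0.45}
CUTOFFS = [
    ("a", 95), ("a-", 93), ("b+", 90), ("b", 88),
    ("b-", 85), ("c+", 83), ("c", 79), ("c-", 75),
]


def weighted_total(scores):
    """Combine component scores (0-100 each) using the course weights."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter(total):
    """Return the first letter whose cutoff the total reaches, else None."""
    for grade, cutoff in CUTOFFS:
        if total >= cutoff:
            return grade
    return None  # below the lowest cutoff listed in the scale


scores = {"participation": 100, "assignments": 90, "project": 92}
total = weighted_total(scores)
print(round(total, 1), letter(total))
```

With these example scores the total comes out at roughly 92.4, which the assumed cutoffs map to b+.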

Other considerations

As a programmer you will never write anything from scratch, but will reuse code, frameworks, or ideas. You are encouraged to learn from the work of your peers. However, if you don't try to do it yourself, you will not learn. Deliberate practice (activities designed for the sole purpose of effectively improving specific aspects of an individual's performance) is the only way to reach perfection.

Please respect the terms of use and/or license of any code you find, and if you re-implement or duplicate an algorithm or code from elsewhere, credit the original source with an inline comment.

Resources

Materials

This class assumes you are confident with this material, but in case you need a brush-up...

Other

Databases
  • A MongoDB Schema Analyzer. One JavaScript file that you run with the mongo shell command on a database collection and it attempts to come up with a generalized schema of the datastore. It was also written about on the official MongoDB blog.
R and data analysis
  • Modern Applied Statistics with S (4th Edition) by William N. Venables, Brian D. Ripley. ISBN 0387954570
  • R
  • Code School
  • Quick-R
Tutorials written as ipython-notebooks

GitHub

Final Project Report outline

Similar to proposals, but note additional sections:

  • Objective (research question)
  • Data that was used: how obtained, how processed, integrated, and validated
  • What models or algorithms were used
  • Results: A description of the results
  • Primary issues encountered during the project
  • Future work: ideas generated, improvements that would make sense, etc
  • Org chart: rough timeline and responsibilities for each member
