<h3>Weekly Reflections for the week 10/20-10/26</h3>

<h4>The Importance of Reproducibility in Data Science</h4>

The most important concept I learned this week is reproducibility.

Reproducibility is key to addressing any issue that has not been thoroughly studied. Without it, it is simply impossible to convince anyone, even yourself, that what you claim is correct.
For example, suppose someone is developing code and a bug is caught during testing. Naturally, they want to modify the source code to fix it. However, if the bug cannot be reproduced with the original code, how can anyone tell whether the new code has fixed it?
A similar story plays out in pretty much any field that cannot be described by an accurate model, and that is why we need data science to explore the mechanisms behind what we observe. The bottom line is that a conclusion that cannot be reproduced is not convincing, and it could be misleading or incomplete.
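The bug example can be made concrete with a small sketch. The functions `mean_buggy` and `mean_fixed`, and the bug itself (dividing by `len(values) - 1`), are hypothetical and invented for illustration. What matters is the order of the checks: first confirm that a fixed input reproduces the failure with the original code, then confirm that the same input no longer fails with the new code.

```python
# Hypothetical example: a mean function whose bug was caught during testing.


def mean_buggy(values):
    # Original (buggy) code: divides by len(values) - 1 instead of len(values).
    return sum(values) / (len(values) - 1)


def mean_fixed(values):
    # Modified code: divides by the number of values.
    return sum(values) / len(values)


# A fixed input with a known correct answer makes the bug reproducible.
REPRO_INPUT = [2.0, 4.0, 6.0]
EXPECTED = 4.0


def bug_reproduces(mean_fn):
    """Return True if mean_fn gives the wrong answer on the fixed input."""
    return mean_fn(REPRO_INPUT) != EXPECTED


if __name__ == "__main__":
    # Step 1: the bug must reproduce with the original code...
    print("original code reproduces bug:", bug_reproduces(mean_buggy))
    # Step 2: ...and must no longer reproduce with the new code.
    print("new code reproduces bug:", bug_reproduces(mean_fixed))
```

If step 1 printed `False`, a passing result in step 2 would prove nothing, because the input never exercised the bug in the first place.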

To achieve reproducibility, make sure the following information is available to anyone who wants to reproduce the research.

1. Make all original data, records, or logs available so that anyone else can conduct an independent investigation. Also reveal the source of the data, so people can verify the conclusion with data from the same or an equivalent source.

2. A step-by-step description of the data processing: how to extract useful information from the original data, records, or logs; how to clean it; how to organize it (grouping, sorting, etc.); how to perform calculations or evaluations with it; and how to generate the visualization.

3. A detailed explanation of how to interpret the visualization: what results to expect, and how to tell an expected result from an unexpected one.
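The first two items can be sketched as a small, self-contained pipeline. This is a minimal illustration under stated assumptions, not a prescribed method. The inline `RAW_DATA` table (city, temperature) is made up and stands in for the published original data. All function names are hypothetical. A text bar chart stands in for a real plot. Publishing the SHA-256 fingerprint of the raw input lets others confirm they start from identical data (item 1). Each processing step lives in its own named function, so the script itself is the step-by-step description (item 2).

```python
import csv
import hashlib
import io
from collections import defaultdict

# Hypothetical raw data; in practice this is the published original file.
RAW_DATA = """city,temp_c
Boston,21.5
Seattle,
Boston,19.0
Seattle,15.5
Austin,30.0
Austin,abc
"""


def fingerprint(raw):
    """SHA-256 of the raw input, published so others can verify identical data."""
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def extract(raw):
    """Parse the CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(raw)))


def clean(rows):
    """Drop rows whose temperature is missing or non-numeric; convert the rest to float."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"city": row["city"], "temp_c": float(row["temp_c"])})
        except ValueError:
            continue  # e.g. "" or "abc" cannot be converted
    return cleaned


def group_by_city(rows):
    """Organize: collect temperatures per city."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["city"]].append(row["temp_c"])
    return groups


def summarize(groups):
    """Calculate the mean per city, sorted by city name so the output order is fixed."""
    return [(city, sum(temps) / len(temps)) for city, temps in sorted(groups.items())]


def run_pipeline(raw):
    print("input sha256:", fingerprint(raw))
    summary = summarize(group_by_city(clean(extract(raw))))
    # Stand-in visualization: one text bar per city.
    for city, mean in summary:
        print(f"{city:<8} {mean:6.2f} {'#' * round(mean)}")
    return summary


if __name__ == "__main__":
    run_pipeline(RAW_DATA)
```

Every step is deterministic, and sorting fixes the output order. Anyone who reruns the script on data with the same fingerprint should therefore get identical output. Item 3 is still needed, because the script shows *what* was computed but not how to read it.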