From 4aa3f3aa097a785ccb18bebc30af440a826e0a3e Mon Sep 17 00:00:00 2001
From: qi-zhang
Date: Wed, 11 Sep 2013 22:45:57 -0700
Subject: [PATCH 1/6] For 1st week.

---
 2013-09-07.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/2013-09-07.md b/2013-09-07.md
index e69de29..e18a077 100644
--- a/2013-09-07.md
+++ b/2013-09-07.md
@@ -0,0 +1,11 @@
+Reflection for the week 9/1 to 9/7.
+
+It was pretty challenging to get ssh access to the Ubuntu server running in my VirtualBox virtual machine.
+
+By default there is only one network adapter, known as “Adapter 1”, and it is attached to “NAT”. With this configuration the box sits behind VirtualBox’s NAT on a private address that the host cannot reach directly (short of setting up port forwarding), so I could not access the box by its IP address, let alone establish ssh access.
+
+I tried reconfiguring the network adapter to attach to “Bridged Adapter”, and I was able to establish ssh access to it. Soon I found that this was not a good idea, since the IP address of the box kept changing every time I restarted it.
+
+Finally I set “Adapter 1” back to “NAT”, enabled “Adapter 2”, and attached it to “Host-only Adapter”. With help from the server guide I found on Ubuntu’s website (https://help.ubuntu.com/13.04/serverguide/network-configuration.html), I got the box to use “Adapter 2”, and now I can establish ssh access to the box.
+
+It was pretty challenging for me, since this was the first time I had learned any networking terms. But I’m so glad I did it.
From aba31eeafe43e1c3662b191234c42b4d2ea79fb5 Mon Sep 17 00:00:00 2001
From: qi-zhang
Date: Wed, 11 Sep 2013 22:49:58 -0700
Subject: [PATCH 2/6] Update 2013-09-07.md

---
 2013-09-07.md | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/2013-09-07.md b/2013-09-07.md
index e18a077..831a521 100644
--- a/2013-09-07.md
+++ b/2013-09-07.md
@@ -1,11 +1 @@
-Reflection for the week 9/1 to 9/7.
-
-It was pretty challenging to get ssh access to the Ubuntu server running in my VirtualBox virtual machine.
-
-By default there is only one network adapter, known as “Adapter 1”, and it is attached to “NAT”. With this configuration the box sits behind VirtualBox’s NAT on a private address that the host cannot reach directly (short of setting up port forwarding), so I could not access the box by its IP address, let alone establish ssh access.
-
-I tried reconfiguring the network adapter to attach to “Bridged Adapter”, and I was able to establish ssh access to it. Soon I found that this was not a good idea, since the IP address of the box kept changing every time I restarted it.
-
-Finally I set “Adapter 1” back to “NAT”, enabled “Adapter 2”, and attached it to “Host-only Adapter”. With help from the server guide I found on Ubuntu’s website (https://help.ubuntu.com/13.04/serverguide/network-configuration.html), I got the box to use “Adapter 2”, and now I can establish ssh access to the box.
-
-It was pretty challenging for me, since this was the first time I had learned any networking terms. But I’m so glad I did it.
+Empty Template
From cfff69f2796dfd9d9dab896972b34abdbc4a7716 Mon Sep 17 00:00:00 2001
From: qi-zhang
Date: Sat, 19 Oct 2013 19:43:03 -0700
Subject: [PATCH 3/6] Create week7

---
 week7 | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100644 week7

diff --git a/week7 b/week7
new file mode 100644
index 0000000..635ad14
--- /dev/null
+++ b/week7
@@ -0,0 +1,12 @@
+Weekly Reflections for the week 10/13-10/19
+
+Working with JSON in Python
+
+JSON is a format for hierarchical data. A JSON file looks much like a flexible combination of Python dict and list objects. In hierarchical data, the same field name can be reused in different places as long as each occurrence is nested under a different parent. That's why it is not a good idea to search a hierarchical file with regular expressions: a match on a field name does not tell you where in the hierarchy it sits. To find specific information in a hierarchical file, you first need to figure out the structure of the file.
+
+A JSON object is similar to a dict object in Python, but its values can be any combination of nested objects and arrays, which map to Python dicts and lists. Naturally, we can use what we know about dict and list objects to navigate a parsed JSON file.
+
+The first step is to turn the JSON text into a Python object with json.loads(), a function defined in the json module. json.loads() returns a dict when the top-level JSON value is an object and a list when it is an array. For example,
+
+
+detail=json.loads(urllib.urlopen("http://earthquake.usgs.gov/product/nearby-cities/ci11380834/us/1382197630296/nearby-cities.json").read())
From ad4654b2dd592c15068b90f4b57eee3b49ec0d7a Mon Sep 17 00:00:00 2001
From: qi-zhang
Date: Sat, 19 Oct 2013 20:23:43 -0700
Subject: [PATCH 4/6] Delete week7

---
 week7 | 12 ------------
 1 file changed, 12 deletions(-)
 delete mode 100644 week7

diff --git a/week7 b/week7
deleted file mode 100644
index 635ad14..0000000
--- a/week7
+++ /dev/null
@@ -1,12 +0,0 @@
-Weekly Reflections for the week 10/13-10/19
-
-Working with JSON in Python
-
-JSON is a format for hierarchical data. A JSON file looks much like a flexible combination of Python dict and list objects. In hierarchical data, the same field name can be reused in different places as long as each occurrence is nested under a different parent. That's why it is not a good idea to search a hierarchical file with regular expressions: a match on a field name does not tell you where in the hierarchy it sits. To find specific information in a hierarchical file, you first need to figure out the structure of the file.
-
-A JSON object is similar to a dict object in Python, but its values can be any combination of nested objects and arrays, which map to Python dicts and lists. Naturally, we can use what we know about dict and list objects to navigate a parsed JSON file.
-
-The first step is to turn the JSON text into a Python object with json.loads(), a function defined in the json module. json.loads() returns a dict when the top-level JSON value is an object and a list when it is an array. For example,
-
-
-detail=json.loads(urllib.urlopen("http://earthquake.usgs.gov/product/nearby-cities/ci11380834/us/1382197630296/nearby-cities.json").read())
From 67c5925b9fd6a359297aa43242e12c10ffe972d1 Mon Sep 17 00:00:00 2001
From: qi-zhang
Date: Sat, 26 Oct 2013 20:06:39 -0700
Subject: [PATCH 5/6] Update 2013-09-07.md

---
 2013-09-07.md | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/2013-09-07.md b/2013-09-07.md
index 831a521..5c1d20b 100644
--- a/2013-09-07.md
+++ b/2013-09-07.md
@@ -1 +1,20 @@
-Empty Template
+Weekly Reflections for the week 10/20-10/26
+
+The Importance of Reproducibility in Data Science
+
+The most important concept I learned this week is reproducibility.
+
+Reproducibility is the key to addressing any issue that has not been thoroughly studied. Without reproducibility, it is simply impossible to convince anyone, even yourself, that what you claim is correct.
+For example, suppose someone is developing some code and a bug is caught during testing. Naturally, they want to modify the source code to fix it. However, if the bug cannot be reproduced with the original code, how can anyone tell whether the new code has fixed it?
+A similar story can be found in pretty much any field that cannot be described by an accurate model, and that's why we use data science to explore the mechanisms behind what we observe. The bottom line is that a conclusion that cannot be reproduced is not convincing, and it could be misleading or incomplete.
+
+To achieve reproducibility, make sure the following information is available to anyone who is interested in reproducing the research:
+
+1. All original data, records, or logs, so that anyone else can conduct an independent investigation. The source of the data should also be disclosed, so people can verify the conclusion with data from the same or an equivalent source.
+
+2. A step-by-step description of the data processing: how useful information is extracted from the original data, records, or logs; how it is cleaned up; how the data is organized (grouping, sorting, etc.); how calculations or evaluations are performed on it; and how the visualizations are generated.
+
+3. A detailed explanation of how to interpret the visualizations: what to expect, and how to tell whether the results match that expectation.
+
+
+
From 859070245351704ca81ea091ebd47a8cb816feb5 Mon Sep 17 00:00:00 2001
From: qi-zhang
Date: Sat, 2 Nov 2013 16:35:52 -0700
Subject: [PATCH 6/6] Update 2013-09-07.md

---
 2013-09-07.md | 22 ++++------------------
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/2013-09-07.md b/2013-09-07.md
index 5c1d20b..9c99400 100644
--- a/2013-09-07.md
+++ b/2013-09-07.md
@@ -1,20 +1,6 @@
-Weekly Reflections for the week 10/20-10/26
-
-The Importance of Reproducibility in Data Science
-
-The most important concept I learned this week is reproducibility.
-
-Reproducibility is the key to addressing any issue that has not been thoroughly studied. Without reproducibility, it is simply impossible to convince anyone, even yourself, that what you claim is correct.
-For example, suppose someone is developing some code and a bug is caught during testing. Naturally, they want to modify the source code to fix it. However, if the bug cannot be reproduced with the original code, how can anyone tell whether the new code has fixed it?
-A similar story can be found in pretty much any field that cannot be described by an accurate model, and that's why we use data science to explore the mechanisms behind what we observe. The bottom line is that a conclusion that cannot be reproduced is not convincing, and it could be misleading or incomplete.
-
-To achieve reproducibility, make sure the following information is available to anyone who is interested in reproducing the research:
-
-1. All original data, records, or logs, so that anyone else can conduct an independent investigation. The source of the data should also be disclosed, so people can verify the conclusion with data from the same or an equivalent source.
-
-2. A step-by-step description of the data processing: how useful information is extracted from the original data, records, or logs; how it is cleaned up; how the data is organized (grouping, sorting, etc.); how calculations or evaluations are performed on it; and how the visualizations are generated.
-
-3. A detailed explanation of how to interpret the visualizations: what to expect, and how to tell whether the results match that expectation.
-
+Weekly Reflections for the week 10/27-11/2
 
+Data Format
 
+This week, we clarified what each group should do for the final group project. As an analyzer, I will collaborate with my teammates to create a model that is simpler than the EATS model but still makes the same or better predictions. Since I'll be dealing with several data formats, such as CSV, JSON, XML, and KML, and I have little background in CS, I spent time on the details of these formats. CSV is the most straightforward format: it is much like a data frame, with all information arranged in a flat table of rows and columns. JSON is based on a hierarchical data structure, and its syntax is almost identical to that of list and dict objects in Python. XML is also based on a hierarchical data structure, but its syntax is similar to HTML. KML is a popular XML-based format, and the cool fact is that Google Earth can read KML files and take you to the location of the event a KML file describes.
+CSV files are pretty easy to read, JSON is almost native to Python, and KML inherits the benefits of XML and can be displayed nicely in Google Earth. We may receive the data in one format while the next application prefers another; as long as there is a clear definition of how the data is organized and accessed, the original data can be converted to any other format.
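The reflections in this series describe parsing JSON with `json.loads`, why regular expressions are a poor way to search hierarchical data, and converting data between CSV, JSON, and KML. The following is a minimal Python 3 sketch of those ideas, not the author's code. Assumptions: the sample document, its field names (`name`, `distance`, `latitude`, `longitude`), and its values are made up; it parses an in-memory string rather than fetching the USGS URL; and the helper functions are hypothetical. The week 7 snippet targets Python 2, where `urllib.urlopen` exists; in Python 3 the equivalent call is `urllib.request.urlopen`.

```python
import csv
import io
import json
import re
import xml.etree.ElementTree as ET

# Made-up sample for illustration only (NOT real USGS data). The key "name"
# appears at two levels: once for the event, once for each nearby city.
SAMPLE = """
{
  "name": "sample event",
  "cities": [
    {"name": "Town A", "distance": 12, "latitude": 34.1, "longitude": -118.2},
    {"name": "Town B", "distance": 30, "latitude": 34.4, "longitude": -118.6}
  ]
}
"""

KML_NS = "http://www.opengis.net/kml/2.2"


def parse(text):
    """Parse JSON text: a JSON object becomes a dict, a JSON array a list."""
    return json.loads(text)


def names_by_regex(text):
    """Naive regex search: it cannot tell which level each "name" belongs to."""
    return re.findall(r'"name":\s*"([^"]*)"', text)


def city_names(doc):
    """Walk the parsed structure explicitly: doc["cities"][i]["name"]."""
    return [city["name"] for city in doc["cities"]]


def to_csv(rows, fields):
    """Flatten a list of dicts into CSV text, one row per dict."""
    buf = io.StringIO()
    # extrasaction="ignore" drops keys that are not in `fields`.
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_kml(rows):
    """Build a minimal KML document with one Placemark per row.

    KML writes coordinates as "longitude,latitude". The namespace is set as a
    plain xmlns attribute, a common ElementTree shortcut.
    """
    kml = ET.Element("kml", xmlns=KML_NS)
    document = ET.SubElement(kml, "Document")
    for row in rows:
        placemark = ET.SubElement(document, "Placemark")
        ET.SubElement(placemark, "name").text = row["name"]
        point = ET.SubElement(placemark, "Point")
        ET.SubElement(point, "coordinates").text = f"{row['longitude']},{row['latitude']}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    doc = parse(SAMPLE)
    print(type(doc).__name__, type(parse("[1, 2]")).__name__)  # dict list
    print(names_by_regex(SAMPLE))  # mixes the event name with the city names
    print(city_names(doc))
    print(to_csv(doc["cities"], ["name", "distance", "latitude", "longitude"]))
    print(to_kml(doc["cities"]))
```

Walking the parsed structure by key path keeps the event's `name` separate from each city's `name`, which the regular expression cannot do. The CSV output flattens only one level of the hierarchy, so a more deeply nested field would have to be pulled out explicitly before writing it.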