From a4da092f1437057620a14fcffa85ca9cc7e3b953 Mon Sep 17 00:00:00 2001
From: wxadqcze
Date: Sat, 7 Sep 2013 10:52:40 -0700
Subject: [PATCH 01/12] 2013-09-07.md weekly reflection

---
 2013-09-07.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/2013-09-07.md b/2013-09-07.md
index e69de29..2030513 100644
--- a/2013-09-07.md
+++ b/2013-09-07.md
@@ -0,0 +1,3 @@
+1. This week's material has been mainly about setting up the environment. Some instructions were unclear and the setup was confusing at times. Google and YouTube videos helped a lot, and I was able to set up the software that we need for the class.
+2. I realized that Windows will not cut it. It is so much easier to do things with terminal commands than to trace things through the Windows GUI. The commands are not the easiest thing in the world, but they are definitely worth learning.
+3. I would read books about basic UNIX and basic computer knowledge. I don't know how you can be in this class if you don't know what a terminal is, or don't know basic navigation like cd and ls.
From 61968b511422b098aff9f3ce7db2d4bfe07beee1 Mon Sep 17 00:00:00 2001
From: wxadqcze
Date: Sat, 7 Sep 2013 20:22:46 -0700
Subject: [PATCH 02/12] Create README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..cd9727e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,2 @@
+reflections
+===========
From 06746f60a839a7bb079d5a81171c85d8a60f8477 Mon Sep 17 00:00:00 2001
From: wxadqcze
Date: Fri, 27 Sep 2013 16:20:46 -0700
Subject: [PATCH 03/12] 9-27-13

---
 9-27-13 | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
 create mode 100644 9-27-13

diff --git a/9-27-13 b/9-27-13
new file mode 100644
index 0000000..358c6e3
--- /dev/null
+++ b/9-27-13
@@ -0,0 +1,15 @@
+weekly reflection 9.27.13
+1. I don't think I have enough research experience to really reflect on the topic.
Obviously the result needs to be reproducible to have credibility,
+ but to what extent does it become too much? How much error can we tolerate before it becomes a burden? For example, the FAA requires all
+ airplanes to have the same CPU/software binding. How much infrastructure is created to reduce the potential of crashing when it's already pretty low?
+ Are we really better off spending this many resources to get that extra perk? Would it be economically better to allow for a larger margin of error if we are
+ doing some sort of cost-benefit analysis? And for science in general, I think that too much attention to this reproducibility idea is slowing down the pace of technological advancement.
+ Just look at how far we had come before we really focused on reproducibility and similar ideas. I guess reproducibility provides a consistent platform
+ across the team when collaborating, which is pretty cool. How about generalization of the result though? If one can only provide the same result under some strict conditions, then how can
+ we generalize this result, since the world is kind of not a stable environment?
+2. General thoughts on the class: this is going too slow. We are 5 weeks in and we have not learned anything. Even the idea of reproducing, we have not learned that yet.
+ We spent like 8 class periods configuring software and talking about teamwork, but these are just some abstract ideas and it's all talk. There is no concrete process at all.
+ Weren't we supposed to be doing things with seismic research? Nothing we have done has contributed to the goal. Also, we do have guidelines and step-by-step instructions for the setup, but somehow
+ no one knows how to set up the software. There is a thing called Google... it's not that hard.
+
+3. Just saw the course map. It is very exciting and there are a lot of things I would like to learn. Hopefully we can get the class finally started.
From e77d4a86ee90217a1d4ad9328b2a317f3c2b19d9 Mon Sep 17 00:00:00 2001
From: wxadqcze
Date: Sun, 6 Oct 2013 22:38:59 -0700
Subject: [PATCH 04/12] Create 10-6-13

---
 10-6-13 | 7 +++++++
 1 file changed, 7 insertions(+)
 create mode 100644 10-6-13

diff --git a/10-6-13 b/10-6-13
new file mode 100644
index 0000000..fbfecbc
--- /dev/null
+++ b/10-6-13
@@ -0,0 +1,7 @@
+This week's material is pretty cool. The class is finally back on track (although nothing that we did previously kind of matters...)
+The challenging part is that what we do in class has nothing to do with the project that we are doing... We spend time learning about random stuff about collaboration and how to do data science, but we only talk about it. What? We didn't learn anything concrete about how data is processed and now we have this project? I know there needs to be progress, but this progress is so fast that I can't really adjust to it. From doing nothing to doing things that we have never learned. It seems like not many people have experience in programming and word processing, but now we are assigned this project and only have, like, what, a weekend to do it? There are not many guidelines for it either. Being thrown into the deep end of the pool is not so fun... And also, I do not like the division of labor. I think it is important to see the whole process of data science, not just a small portion of it. I think at least at the level of this class we should be able to see the whole process, so that everyone is involved in doing data mining, analysis, and visualization. And the presentation part is stupid in that they don't really do anything. This is not a public speaking class; I don't see how these people can benefit from doing stupid shit like just talking about some graphs and stuff. We all need to get our hands dirty.
+
+If I could start over this week, I would prepare myself to code. I would freshen up on R, Python, and my statistical analysis knowledge.
I need to brush up on different types of hypothesis testing and stuff. Learn how to clean up data.
+
+
+But seriously. This is a stat class, but nothing we do in class really has anything to do with statistics, and that needs to change.
From 6fe5cfeebd52c16df1c3ca0e9250a6c245e75155 Mon Sep 17 00:00:00 2001
From: wxadqcze
Date: Tue, 15 Oct 2013 01:24:57 -0700
Subject: [PATCH 05/12] Create 10-15-13

---
 10-15-13 | 5 +++++
 1 file changed, 5 insertions(+)
 create mode 100644 10-15-13

diff --git a/10-15-13 b/10-15-13
new file mode 100644
index 0000000..7555066
--- /dev/null
+++ b/10-15-13
@@ -0,0 +1,5 @@
+The presentations from the different groups were pretty sweet. We were all thrown into the deep end of the pool, and somehow all the groups figured something out. While I was the coder and wasn't thinking so much about reproducibility, it was amazing how most of the groups were able to achieve that and reproduce each other's results. I have learned a lot from the exercise: not only did I get some hands-on experience with data cleaning and mining, I also saw the importance of reproducibility and how essential it is to data science. The roadblocks that I faced were mostly programming challenges from not being familiar with the syntax. For example, I wasn't sure how to use regex in Python to grep the info I wanted, and I wasn't sure how to create a CSV file for my teammates. Stack Overflow and other coding forums helped out tremendously and I was able to resolve most issues.
+
+My aha moment happened when we actually worked as a team. Throughout my education so far, all the projects were basically one person, or everybody doing everything. Not much teamwork was involved. However, in this project, I didn't need to worry about the statistical analysis, which I was not so familiar with, or the final presentation, which I really don't know how to do. We could each do our own part and pipeline the process to generate some pretty good results, without really needing to do every part of the project.
+
+I would start the project earlier. We basically started the day before it was due and kind of just put everything together. That includes looking at other groups' results and copying their data for pipeline purposes, i.e. we used the data generated by another group for statistical analysis while I worked on data cleaning to get that part of the project. Looking at some examples is obviously not the same as creating everything from scratch, but it did speed up the process a lot. My advice to other groups would be to get everyone to do what they do best for the best productivity.
From 51a16f1a523929ec4ddc8574ebef1f9ae938bf08 Mon Sep 17 00:00:00 2001
From: wxadqcze
Date: Mon, 21 Oct 2013 01:37:08 -0700
Subject: [PATCH 06/12] Create 10-19-13

---
 10-19-13 | 5 +++++
 1 file changed, 5 insertions(+)
 create mode 100644 10-19-13

diff --git a/10-19-13 b/10-19-13
new file mode 100644
index 0000000..2a9c59e
--- /dev/null
+++ b/10-19-13
@@ -0,0 +1,5 @@
+The data cleaning is not fun... I am a CS guy, not a seismology dude. I should not need to know what the data are or what each of those abbreviations means... It's hard to figure out what data is what. Kind of hard to clean data when you don't really know what info you want. And also, the data is really messy... I guess this is the point of this exercise? We spent way too much time trying to figure out what the data actually is, trying to figure out the format and stuff. Dictionary on dictionary on dictionary... It's kind of a bummer that the actual programming part is so little. Once you figure out the data structure, parsing it is actually really easy. The tough part is getting the installation right, i.e. what version of Python/pandas we are using, and how to work with JSON files. After that, it's a piece of cake.
+
+The aha moment is that data science is kind of boring... at least the cleaning part. It is really just recognizing patterns, like the format and stuff. Not so exciting.
I guess I always saw data science as machine learning and inference and cool stuff like that, but there is a lot of groundwork that is basically grinding numbers and letters. Not the most exciting thing in the world. I mean, it couldn't be that hard to have nice data tables when you are collecting data, right... And version/software control: the software update cycle is super fast. New things/features roll out super quickly, and it is a mess to try to update stuff. I mean, I spent like 2 hours trying to configure the Python add-ons to accomplish the very first task... getting the data from the site. It should not be that hard...
+
+Nothing really new this week. Pretty cool to practice making data tables in Python though. Good refresher on Series, DataFrame... etc.
From f1ed1d7f5fbd0fc23eb0b62786cdd9ca51ecb474 Mon Sep 17 00:00:00 2001
From: wxadqcze
Date: Sat, 26 Oct 2013 17:56:55 -0700
Subject: [PATCH 07/12] Create 10-26

---
 10-26 | 4 ++++
 1 file changed, 4 insertions(+)
 create mode 100644 10-26

diff --git a/10-26 b/10-26
new file mode 100644
index 0000000..bc83e53
--- /dev/null
+++ b/10-26
@@ -0,0 +1,4 @@
+Nothing really happening this week. Went over code and I just remembered how to write code. Gotta be more functional. Our code has zero reproducibility. Like, there were no functions; everything was kept as a variable. I guess it kind of is a function... but it was really not structured like what was presented in class. It was impressive that Terrisa somehow turned it into a legit program. I need more encapsulation.
+
+
+About open source: I have no problem with people stealing my code. This is how science progresses. I have benefited from Stack Overflow, git, forums, etc. countless times. I am more than willing to return the favor and let other people see my results. The code and syntax are the smallest part of science. These are just tools; there is no reason to see them as personal property and not allow other people to use them.
I mean, what is important is the idea, the idea of how to use the tools that we have. The whole thing about CS is not to reinvent the wheel. So if I have some result, and someone else benefits from it, I'm completely OK with that.
From 2e3a594b40b9780c50ed64ce844f9c2223cee92a Mon Sep 17 00:00:00 2001
From: wxadqcze
Date: Sat, 2 Nov 2013 19:20:38 -0700
Subject: [PATCH 08/12] Create 11-2-13

---
 11-2-13 | 6 ++++++
 1 file changed, 6 insertions(+)
 create mode 100644 11-2-13

diff --git a/11-2-13 b/11-2-13
new file mode 100644
index 0000000..fc53efa
--- /dev/null
+++ b/11-2-13
@@ -0,0 +1,6 @@
+This week we finally get to start working on the real project that we said we would be working on, the earthquake model. My group is under data curation, so our main focus is to extract data and clean it up for further analysis. The data, of course, is messy and the format is weird. That was the biggest roadblock: we had a pretty hard time figuring out just exactly what to do and how to read the data format. It required a lot of Google searching and trial and error. Luckily we figured it out in the end, and next time it will be much easier to work with this kind of format.
+
+It was kind of unfortunate that I'm only seeing the data curation. I am really interested in statistical models and the math behind them. I want to see the analysis part of this project and to be able to figure out what analysis needs to be done. I am a CS major and I'm pretty familiar with data curation; I would like extra experience in the stat part. I guess I will try to switch groups next time.
+
+
+Not much of an aha moment, as we have done similar things before.
From a18671c7d207a50d11f30077403f437f204fc83c Mon Sep 17 00:00:00 2001
From: wxadqcze
Date: Tue, 5 Nov 2013 10:58:44 -0800
Subject: [PATCH 09/12] Create 11-5-13

---
 11-5-13 | 3 +++
 1 file changed, 3 insertions(+)
 create mode 100644 11-5-13

diff --git a/11-5-13 b/11-5-13
new file mode 100644
index 0000000..40d57a4
--- /dev/null
+++ b/11-5-13
@@ -0,0 +1,3 @@
+This week we synced up on what every group is doing. There was some really useful information for everybody to know. We have a clearer view of what is going on, where everything is going, and what everyone is doing. Carl's presentation cleared up a lot about what the models are and what we should be working on. There were a lot of aha moments.
+
+The class was super effective too. It cut through the BS and went straight to the meat of the project and delegated tasks to all the groups. We defined our problem and went straight to solving it.
From 6c6f782eb1f9ea1e000768a74e0bf4f6bf3575ca Mon Sep 17 00:00:00 2001
From: wxadqcze
Date: Sat, 16 Nov 2013 11:02:42 -0800
Subject: [PATCH 10/12] Create 11-16-13

---
 11-16-13 | 3 +++
 1 file changed, 3 insertions(+)
 create mode 100644 11-16-13

diff --git a/11-16-13 b/11-16-13
new file mode 100644
index 0000000..119e3f3
--- /dev/null
+++ b/11-16-13
@@ -0,0 +1,3 @@
+Great to see that this project is going in the right direction. In class we met up, and now we are trying to understand D3 and R tools that can help us with the project. There is a miscommunication, however, in that Carl already did our assigned task. We were assigned to do the MDA model and the error diagram, but it seems like that has already been done using Luen's code in Carl's presentation. It'd be great if he gave us the code so we can take over. He has done a lot for this project, and it'd be great if we could contribute too.
+
+A suggestion is that we start putting things together: the individual code, the results, and everything that everyone has done so far. The end product is this paper, right?
And I think we are close to it, and it'd be a great idea to start wrapping things up since we only have a few weeks left. And based on the communication between people in this class... I suspect we will need multiple class periods to sync everyone up.
From edad64217fe3402b9883d63b260ca3cc4aeb1650 Mon Sep 17 00:00:00 2001
From: wxadqcze
Date: Sat, 23 Nov 2013 15:11:33 -0800
Subject: [PATCH 11/12] Create 11-23

---
 11-23 | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 11-23

diff --git a/11-23 b/11-23
new file mode 100644
index 0000000..063e89a
--- /dev/null
+++ b/11-23
@@ -0,0 +1 @@
+Am I doing the reflections right?
From c09b6409edb6227ece1417b9cc166ee6179099d2 Mon Sep 17 00:00:00 2001
From: wxadqcze
Date: Fri, 29 Nov 2013 17:50:38 -0800
Subject: [PATCH 12/12] Create 11.28

---
 11.28 | 3 +++
 1 file changed, 3 insertions(+)
 create mode 100644 11.28

diff --git a/11.28 b/11.28
new file mode 100644
index 0000000..8956aaf
--- /dev/null
+++ b/11.28
@@ -0,0 +1,3 @@
+The response from the course survey is much appreciated. I'm glad that my hard work did not go to waste because of my lack of contributions on GitHub; it was because we have a designated GitHub pusher for our group. Excited to see the end result of our group's and everyone else's hard work. It has been bumpy the entire way, and I'm glad to see something finally happening that I'm proud to be part of. The Google Docs and all the software we used for collaboration are extremely helpful, as otherwise we would have no way to incorporate the hard work from all 40 people in this class. We were able to sync up and figure out which group is doing what, and see everyone else's timeline so that we know what results to expect from people and can plan how to tackle our assigned problem.
+
+We have the rest of the weeks figured out. We are on track to finish the rest of the project and make something presentable by the end of the course.
However, we will probably face some serious roadblocks, as we have no idea how to use D3, and I don't see the D3 team contributing much... We will probably end up using R for visualization. Also, the math part of the MDA model is still unclear: how do we figure out the parameters and the error diagrams, how do we improve it, and is it really going to be better than ETAS?