diff --git a/2013-09-07.md b/2013-09-07.md
index e69de29..4a6a831 100644
--- a/2013-09-07.md
+++ b/2013-09-07.md
@@ -0,0 +1,7 @@
+This week I was introduced to a lot of new things I had never encountered before. Being in the R-dominated world of statistics for so long has left me knowing little else but R, which is a sad thing to realize in my last year of college. The desire to learn about other technologies is what has kept me in this class, which is full of unfamiliar things. I ran into many technical difficulties this very first week. Some were as small as not knowing how to fork an updated repository or how to register a username on IRC. These small questions were quickly resolved during group time on Tuesday; some of my groupmates actually had the same confusion I did. I think these problems occur because we are still new to the tools; once we try a few more times, we will get familiar and become proficient. There were bigger questions, though. For me, the most critical challenges were all about the virtual machine.
+
+I hit the first obstacle when I had to enable hardware-level virtualization in the BIOS. It sounds silly, but I turned my computer on and off a couple of times, trying the F-keys to get into the BIOS, and ended up googling which key opens the BIOS on Sony computers. From now on I will remember that F2 gets a VAIO into the BIOS settings.
+
+My next obstacle came when I could not get the virtual machine to do anything. I first brought this up in my group during class, and we figured out that something must be wrong because I had never been asked to set a login name and password during the whole setup process. I then took the question to Aaron and was told that my virtual machine was not fully installed. Besides emailing Aaron for an appointment to finish the installation, I tried many other ways to figure out what I had missed. I googled VirtualBox installation walkthroughs and FAQs, and I went back and retraced the procedure posted in homework-01.md. I still had no clue whether I should uninstall everything and start over, and what I found online did not help much. Since uninstalling and reinstalling seemed like too much work and might not be necessary, I decided that removing the old VM and recreating it was worth trying, to narrow down the pool of possible mistakes. I tried, and it worked! I had my "aha!" moment when I realized I had not chosen to install the server when I first set up the machine... Even after realizing this and following the installation steps again, the install failed when I was asked to choose which software to add beyond the core of the system; I went back, unchecked "manual package selection", and then the installation completed. Everything looks "normal" now, the same as for everybody else in my group. I also ran the command we used in class on Thursday for IPython notebooks, and the VM responded normally, so I assume I have made it work! Of course, I will still check with others or the instructors that I have actually succeeded.
+
+Looking back at this week, I did a lot of trying and testing: trying different approaches and testing whether they work.
+I have noticed that sometimes it is good to pause and step back for a while, think about the problem from another starting point, and then try a different approach, instead of trying the same thing over and over and getting the same error.
diff --git a/2013-09-14.md b/2013-09-14.md
new file mode 100644
index 0000000..9d3a737
--- /dev/null
+++ b/2013-09-14.md
@@ -0,0 +1,30 @@
+reflections-1
+=============
+2013-09-14
+
+This week we have been working on installing Vagrant. Honestly, I am not quite sure what we are trying to get 'vagrant up' to work for. I have a vague feeling that Aaron wants us to do something with the IPython notebook using this 'vagrant' thing in the command processor. To be more honest, I have had this vague feeling ever since we first touched the virtual machine. Because I am not a CS major, and I think many of us aren't either, I don't have enough basic knowledge about computers, so sometimes I cannot follow the ideas fluently, for example how to use ** software to help us build a *** environment that allows group contributions to the same project. I know this sentence sounds like nonsense, but it is the best I could catch. Given my difficulty catching the objectives, I hope we can have more clearly explained objectives so that I have a basic idea of what we are aiming for and how a specific step helps us achieve it. I also think it would be helpful to have clearer instructions for each step. That way we would know whether we have done the steps correctly, and even if we haven't, we could still follow the later steps as soon as we resolve the earlier roadblocks.
+
+Something I liked this week is the way our groups are set up. Within each group we have a technical lead and an operational lead. It is pretty efficient to have small problems solved within the group first before asking the GSIs for help, and we actually get the opportunity to share our progress and roadblocks with our groupmates. That way we don't feel lost or too far behind anymore. Breaking into groups makes the goal of keeping everybody on the same page more achievable. On Thursday we discussed many things we think should be improved. We all think we need more communication outside of class, especially since IRC does not really help when we are all offline. We also share the feeling that we should have a clearer agenda for everything.
+
+For more communication outside class, our group decided to try a Facebook group. Our Facebook group is already active and we have started keeping track of each other. As the operational lead of the group, I asked everyone to briefly report how far they had gotten. We are at slightly different stages of getting 'vagrant up' to work and of further attempts to access IPython in the browser. I think I have 'vagrant up' running and VirtualBox ready (according to GSI Chris). But I don't know what further steps I should be working on... So I am still waiting for further instructions, and so is my group.
diff --git a/2013-09-21.md b/2013-09-21.md
new file mode 100644
index 0000000..2500830
--- /dev/null
+++ b/2013-09-21.md
@@ -0,0 +1,12 @@
+2013-09-21--Siyang Zeng
+=============
+
+This week was mainly about research experiences and how statistics plays a role in analyzing research results, shared by the two guest lecturers.
+Aside from that, we were asked to keep ourselves on track: tackle the Vagrant-related problems and get the IPython notebook to open successfully in the web browser through VirtualBox.
+
+Windows system:
+My computer originally runs Windows. I followed every step for running Vagrant; however, I had a problem running 'vagrant ssh', which returned an error message saying that I did not have SSH installed on my computer. I then spent a lot of time figuring out which SSH client to install and how to install and configure it, before the GSI announced that all Windows users would run Vagrant on a Linux system instead. I think there is a gap between what the professor expected us to already know and the background knowledge about computers that statistics students actually have: we are expected to have SSH installed (or it may come pre-installed on certain systems), but we do not even know what SSH is. Before I got the Linux system installed, I had succeeded in launching the IPython notebook from a command terminal and being directed to the web browser, but not through VirtualBox.
+
+Linux system:
+The GSIs scheduled a session to help us install the Linux system and get everything on track. I found that session very helpful. During it we could pair up or group with people who had the same system and were at a similar stage of the installation, so we could discuss and help each other with problems that someone else had already faced and solved with the GSI. I really liked the way Chris summarized all the steps from starting Vagrant to accessing the IPython notebook from the web browser. The list of steps not only helped us keep track of which steps we had and had not done, but also helped us reproduce and perform the procedure on our own instead of just following along without knowing what each step is for. I have successfully completed all the steps and can do it on my own. My next task is to get familiar with the Linux system.
+
+Another thing I am concerned about is the correct way of submitting our reflections. What I, and my group, have been doing is creating a new repository, "Reflections", and posting a new file for each week's reflection. However, I have heard that we should instead fork the professor's 'reflection' repository, add our reflections there, and then open a pull request to the professor. I did not know this until Tuesday, and I saw that only a few of our classmates actually submitted their reflections by pull request. The GSI said she will ask the professor again for detailed instructions about submitting reflections, but I think I should raise this with my group and suggest that my groupmates fork and open pull requests for now while we wait for further instructions.
diff --git a/2013-09-28.md b/2013-09-28.md
new file mode 100644
index 0000000..85acfbc
--- /dev/null
+++ b/2013-09-28.md
@@ -0,0 +1,12 @@
+2013-09-28 (Siyang Zeng)
+=============
+
+About reproducibility:
+This week we talked about reproducibility. Through the lectures and the ideas and experiences shared by Chris and Kristina, I started to get a sense of the point of reproducibility. To be honest, I had never heard of or thought about reproducibility before. Because the topic does not seem popular or widely discussed, I had never really thought about the meaning and importance behind it.
+
+After this week, however, I have many thoughts about it.
+In my opinion, the point of emphasizing reproducibility is the accuracy and reliability of the data used to support any conclusion. The idea fits a statistical way of thinking: we usually model simulations and experiments with independent subjects and run various tests to find out whether a specific phenomenon appeared by chance or reflects something real.
+
+So being able to reproduce the whole modelling process and get the same result, which can be thought of as asking whether a phenomenon would happen again if the experiment were repeated under the same conditions, is necessary to verify that a result is reliable. That is how I understand reproducibility, though it is the only reason I see so far for why reproducibility matters in data science.
+
+About the course:
+Now that we have gone through all the preparation (installing everything), I am much clearer about where we are heading. Especially after Aaron explained why a virtual machine is necessary for this course, I finally see the point of installing a Linux system just to get the virtual machine to work well. The way Chris illustrated the relations between the different stages of a research study (observations --> data --> model) also helped us see what our roadmap should be.
diff --git a/2013-10-05.md b/2013-10-05.md
new file mode 100644
index 0000000..81d38b4
--- /dev/null
+++ b/2013-10-05.md
@@ -0,0 +1,21 @@
+Reflection of the week 2013.09.30-2013.10.04
+-------------------------
+On Tuesday we were categorized according to our learning styles as data curators, data analyzers, visualizers, and presenters, and assigned into groups of four to finish a project. The project explanation is uploaded on GitHub.
+
+On Thursday, Prof. Stark continued his presentation about earthquake prediction.
+Notes are at: https://github.com/SunnySunnia/Group7/blob/master/2013-10-03.md
+
+Some concerns and feedback on this week:
+I think the way the project was simply uploaded to GitHub, without a clear explanation directly to us, left us confused about many things. For example: what are we supposed to do with the example.cfg file; where should we put our real bConnected key, and where should we not; how do we import gspread into the IPython notebook; what is a GitHub push; and so on. We did not even know who would be working with us in a group until the end of Thursday.
+
+Fortunately, we can post issues on the questionnaire repository. That is helpful because we can see what problems other classmates are facing and what the solutions to those problems are. Still, I think it would be more efficient if we could first meet our group in class and reach consensus on a plan for everything (deadlines for each role, when to meet with instructors, when to meet for a final check on everything, etc.).
+
+Furthermore, many of us are new to Python. It takes us a really long time to figure out how to graph in the IPython notebook on our own. I think it would be much easier if we could have a tutorial session together.
+
+Right now our group's data is not yet ready for further analysis. And I, the visualizer, have just figured out how to plot functions, but still need to learn how to plot histograms and other statistical plots; a minimal example of the kind of plot I mean is sketched below.
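+
+To make this concrete for myself, here is a minimal sketch of the kind of histogram I am trying to learn to make. It is only my own practice example: the magnitudes list is made up, not our project's data, which would come from the curated spreadsheet.
+
+```python
+# Minimal histogram sketch for the IPython notebook (with matplotlib inline).
+import matplotlib.pyplot as plt
+
+# Made-up example data; the real values would come from our curated earthquake table.
+magnitudes = [4.5, 4.7, 5.1, 4.6, 5.8, 4.9, 5.0, 4.4, 6.1, 4.8]
+
+plt.hist(magnitudes, bins=10)   # 10 equal-width bins
+plt.xlabel("Magnitude")
+plt.ylabel("Count")
+plt.title("Distribution of earthquake magnitudes (example data)")
+plt.show()
+```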
diff --git a/2013-10-12.md b/2013-10-12.md
new file mode 100644
index 0000000..7e70ce0
--- /dev/null
+++ b/2013-10-12.md
@@ -0,0 +1,22 @@
+Reflection on the project:
+-------------------------------------------------------
+* I am the visualizer of Group 1.
+
+- On our way to finishing the project we met many obstacles; the most influential ones were all technical: problems saving the .cfg file and smoothly accessing the spreadsheet from the IPython notebook, and having no experience with data processing and graphing in the notebook.
+- We spent too long on data curation alone; we only got our final dataset on Saturday night.
+- We set the goal of making our project reproducible from the very beginning, so we decided to do everything in the IPython notebook. I am not sure exactly what my groupmates went through, but I spent a long, hard time figuring out how to graph in the notebook from scratch. Luckily I figured out at least enough for us to present our findings.
+- One small problem was that one of our groupmates was out of town and basically too busy with other commitments, so we never had the chance to meet all together and discuss what each of us was expected to complete. But luckily, again, we had pretty efficient conversations via Facebook and Google Hangouts.
+- We have a Facebook group within the horizontal visualizers' group, which I think is a good way to communicate; however, each of us is in a different vertical group with a different pace and a different approach to visualizing, so in practice we can only discuss things as general as what kinds of plots each vertical group is making. Also, as the presentation date approached, I became too busy to check on how everybody else in our horizontal group was doing.
+
++ (within vertical group)
+-------
+- Efficient communication.
+- Everyone understood each member's role.
+- Although a lot was left for us to figure out from scratch on our own, we did not give up on searching and trying.
+- We met with Aaron and the GSIs; meeting and talking with them made everything much clearer for us. We got many technical problems solved and were not as panicked as before.
+
+delta
+-----------------
+- We should start earlier.
+- We should spend some time explaining to groupmates how each part was completed. For instance, I should explain to my groupmates how my code produces the plots so that I am not the only one who can modify the graphs; the same goes for the data curator and the analyzer. One of our goals should also be learning, so we should all know how each part works.
+- We should plan ahead what the expectations are for each step: what final data format the analyzer and visualizer want from the curator, what kinds of plots the analyzer wants to illustrate his findings with, in how much detail each earlier part should be explained so that the presenter knows which steps to emphasize, etc.
diff --git a/2013-10-19.md b/2013-10-19.md
new file mode 100644
index 0000000..d8c3f33
--- /dev/null
+++ b/2013-10-19.md
@@ -0,0 +1,14 @@
+Reflection 2013-10-19
+----
+
+This week we were assigned a new homework focusing on parsing JSON data and parametrizing visualization functions.
+I think the focus of the week is how you make sure that people in the future can access the data you used to reach your conclusions, no matter how many years later.
+
+For this homework, my role is to modify the code so that we can visualize the earthquakes of whichever state we want to focus on. I feel more comfortable with this project, maybe because we are clearer about everything.
+
+Something I am working on and trying to improve is finishing the setup of the map window for each state.
+
+I have an "aha" moment whenever I figure out something new about plotting in the IPython notebook; a rough sketch of the kind of parametrized plotting function I have been building follows.
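+
+This is only my own minimal sketch, not the homework solution; it assumes the quake longitudes, latitudes, and magnitudes have already been parsed from the JSON feed, and it takes the map corners as parameters so the same function can be reused for any state.
+
+```python
+# Minimal sketch of a parametrized plotting function for one state's earthquakes.
+import matplotlib.pyplot as plt
+
+def plot_state_quakes(lons, lats, mags, lon_min, lon_max, lat_min, lat_max, title):
+    """Scatter-plot earthquakes inside a lon/lat box, with marker size tied to magnitude."""
+    fig, ax = plt.subplots(figsize=(6, 6))
+    ax.scatter(lons, lats, s=[m ** 3 for m in mags], alpha=0.5)
+    ax.set_xlim(lon_min, lon_max)   # the "corner coordinates" of the map window
+    ax.set_ylim(lat_min, lat_max)
+    ax.set_xlabel("Longitude")
+    ax.set_ylabel("Latitude")
+    ax.set_title(title)
+    return ax
+
+# Example call with made-up events and a rough bounding box for California.
+plot_state_quakes([-122.4, -118.2, -121.0], [37.8, 34.1, 36.5], [4.6, 5.2, 4.9],
+                  lon_min=-125, lon_max=-114, lat_min=32, lat_max=42,
+                  title="Example: California earthquakes")
+plt.show()
+```
+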
+One specific "aha" came when I figured out how to combine everything into one function (including setting the corner coordinates) that works for every state but Alaska! I found out with Aaron that the reason is that Alaska is the one state that spans both the eastern and western hemispheres: its longitude runs from about 172°E, across the 180° meridian, to 130°W.
+
+So now I have to figure out how to solve this bug.
diff --git a/2013-10-26.md b/2013-10-26.md
new file mode 100644
index 0000000..fc9d676
--- /dev/null
+++ b/2013-10-26.md
@@ -0,0 +1,26 @@
+Reflection
+---
+
+This week we discussed a rough roadmap for this class for the rest of the semester.
+We have at least three things to do:
+* Program (reproducible)
+---data: where from? what format? what input to the model? what output?
+---model: ETAS, 'simple/Stark's', Poisson
+---GitHub, AWS
+
+* Paper
+---abstract: summary, ETAS
+---intro
+---methods: how did you do it? steps
+---results (negative results?)
+---citations: format? how many?
+
+* Public understanding of science
+---for people who don't want to read the paper in detail
+---update Wikipedia?
+---short presentation?
+
+I think setting aside some time each week to talk about future goals and agendas is a good way to keep things on track. Doing so keeps us from feeling lost about why we did something or what to do next.
+
+However, we still need more ideas about how to divide groups and tasks. One thing that helps us figure this out is the issue tracker in the new repo, which collects all of our questions about everything; most of the questions posted are still waiting for answers from Aaron. Since we will be split into groups based not only on roles but also on tasks, I think we need more instructions or explanations on how and when to split. For example, do we discuss and come up with different tasks in class and then sign up for them? Or do we form our own groups and Aaron assigns us different tasks? Or a mix of the two?
diff --git a/2013-11-02.md b/2013-11-02.md
new file mode 100644
index 0000000..6e9191f
--- /dev/null
+++ b/2013-11-02.md
@@ -0,0 +1,17 @@
+I think we have been discussing and giving feedback on the lectures throughout this week.
+
+As a visualizer, my goals are:
+* Be clear about each step of this project, i.e., what each horizontal group's role is.
+* Be familiar with the plotting tools.
+* Have a rough idea of what plots we will need.
+* Figure out what we need from the analyzers and/or the data curators.
+
+Some questions to consider:
+* What format do I want the data in?
+* How should I start working if I haven't received the data yet?
+* How has the data been visualized in Luen's paper?
+* What modifications should be made on top of Luen's visualizations?
+* Would the visualizations differ between the paper and the presentation?
+
+What I am working on now:
+* Use the earthquake data from the last homework and plot some different graphs, to see whether I can find patterns in the occurrences of the earthquakes.
diff --git a/2013-11-09.md b/2013-11-09.md
new file mode 100644
index 0000000..c0306fa
--- /dev/null
+++ b/2013-11-09.md
@@ -0,0 +1,7 @@
+For this week's reflection, I would first like to say that we have made a lot of progress in clarifying our final goals, decomposing the project into tasks for the groups, setting a clear timeline for the whole project, and completing the tasks for the last stage.
+
+By now we have a detailed proposal for each week, which I think will save us a lot of time when we are collaborating within our own subgroups.
+
+As much as I am looking forward to getting this project finished successfully, I also regularly check how the other groups are doing and how their work might help ours. I find that sharing and messaging through GitHub is becoming more and more helpful.
+
+Hoping to improve our final product, I am also thinking about creative visualizations that not only illustrate our results (the error diagram) but also explain our model (how we pick and tune the parameters) and the process by which the whole class reached its results. I think there is a lot we could do in terms of visualization. I will discuss this with my group once we have completed our current task: plotting ECDFs of the interarrival times of earthquakes with magnitudes in sub-ranges (say 4.5-5), so that the analyzers can better observe a rough trend for estimating a window function.
diff --git a/2013-11-16.md b/2013-11-16.md
new file mode 100644
index 0000000..4396a3a
--- /dev/null
+++ b/2013-11-16.md
@@ -0,0 +1,7 @@
+This week, our group presented the plots of the ECDFs for each magnitude range and experimented with tuning the parameter u in Luen's code. We also produced plots of successors' arrival times after each event.
+
+I also went to Prof. Stark's office hours and came away with many ideas about guessing a function for window length that depends on magnitude. I will first share them with my group and try to go a little deeper, and then on Tuesday we will present what we have done.
+
+In terms of the whole course, I think it is getting more and more interesting because I have started to see real progress on this project. However, I feel that part of the class still doesn't know exactly what is going on, and another part doesn't care how it goes. Our group basically finds itself something to dig into each week, but that doesn't seem to be the case for the other groups. I am not saying there is anything wrong with that: because the project now relies heavily on the analyzer groups, the other functional groups inevitably tend to step aside. Then even when those groups want to help with analyzing or testing models, they have been left too far behind to catch up and contribute right away.
+
+I do not have any ideas about how to change this either. People have adapted to their functional groups and the roles they were classified into. One problem is that there do not seem to be enough analyzer groups, but you cannot make a new analyzer group out of the remaining groups, since it might take too long for them to understand everything that has been accomplished. I personally really want to finish this project, even if we cannot beat the other models, and I think my group has the same goal. Though we are a visualizers' group, we have already stepped into analyzing and trying out some models. I hope we can make some good shots.
diff --git a/2013-11-23.md b/2013-11-23.md
new file mode 100644
index 0000000..6460b10
--- /dev/null
+++ b/2013-11-23.md
@@ -0,0 +1,7 @@
+Our group discussed our goals and tasks for the rest of the semester, since we were asked to fill out the group evaluation sheet. For this week, because we were still not completely clear about how Luen's code computes error diagrams, we decided to work out our own algorithm to compare the models. Our thinking was that even though we will still try to figure out Luen's trick, we should not rely entirely on waiting for Luen's explanation. So we did start trying, and we also asked whether Analyzer Group 2 had already developed the algorithm; a rough sketch of the kind of comparison we were attempting follows.
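+
+This is only my own back-of-the-envelope sketch of the idea, not Luen's code and not our group's final algorithm: for one fixed window function, scale the alarm windows up and down, and for each scale record tau (the fraction of time covered by alarms) and the fraction of later target earthquakes that the alarms miss. The toy catalog and the simple window function below are made up for illustration.
+
+```python
+# Sketch of a Molchan-style error diagram for a window-based alarm strategy.
+import numpy as np
+import matplotlib.pyplot as plt
+
+# Made-up toy catalog: event times (days) and magnitudes, sorted by time.
+times = np.array([1.0, 3.5, 4.0, 10.0, 12.5, 20.0, 21.0, 30.0])
+mags = np.array([4.6, 5.1, 4.5, 5.8, 4.7, 6.0, 4.9, 5.2])
+study_length = 35.0            # total length of the study period (days)
+target = mags >= 5.0           # the events we are trying to "predict"
+
+def window_length(mag, scale):
+    """Toy magnitude-dependent window: longer alarms after bigger quakes."""
+    return scale * 2.0 ** (mag - 4.5)
+
+def tau_and_miss(scale):
+    starts = times
+    ends = np.minimum(times + window_length(mags, scale), study_length)
+    # Fraction of the study period covered by the union of the alarm intervals.
+    covered, cur_start, cur_end = 0.0, starts[0], ends[0]
+    for s, e in zip(starts[1:], ends[1:]):
+        if s > cur_end:
+            covered += cur_end - cur_start
+            cur_start, cur_end = s, e
+        else:
+            cur_end = max(cur_end, e)
+    covered += cur_end - cur_start
+    tau = covered / study_length
+    # Fraction of target events that do NOT fall inside an earlier event's alarm.
+    hits = 0
+    for i in np.where(target)[0]:
+        if any((starts[:i] < times[i]) & (times[i] <= ends[:i])):
+            hits += 1
+    miss = 1.0 - float(hits) / target.sum()
+    return tau, miss
+
+# Varying the scale traces out the curve: tau on the x-axis, miss rate on the y-axis.
+scales = np.linspace(0.01, 5.0, 50)
+points = np.array([tau_and_miss(s) for s in scales])
+plt.plot(points[:, 0], points[:, 1], marker=".")
+plt.xlabel("tau (fraction of time covered by alarms)")
+plt.ylabel("fraction of target quakes missed")
+plt.title("Error diagram sketch for one window function")
+plt.show()
+```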
+
+While waiting for the analyzers' response, we found that we could not get our algorithm to vary tau along the x-axis of the error diagram for one specific window function. We were stuck at this roadblock until the conference call with Luen. He said that once he has gone back over his code and cleared everything up, he will write back to us. So hopefully we will be able to vary window functions on top of Luen's code and compare models using error diagrams and the areas under those curves.
+
+While we are still waiting, our group figured out a new direction. Similar to the optimization approach, we simply gather the times to the first successor for earthquakes larger than magnitude 3, in magnitude increments of 0.1, and then take the 90th-percentile cut point of each bin as our window length for earthquakes in that magnitude range. Or, to be a little more rigorous, we may fit a window function to the cut points. (I sketch this idea at the end of these reflections.)
+
+I am taking an IEOR course, and from what I understand about linear programming and optimization, one of the biggest roadblocks will be defining the error function we are minimizing. I am also not sure whether AMPL (the optimization software) can handle intermediate computations or for-loops. Compared to that, the other approach, building on Luen's code and trying window functions more complicated than Luen's MDA but less complicated than ETAS, sounds much more doable.
+
+BUT, before anything else, we need Luen's explanation of how he implements K, as in Ku^M, in his code; in other words, how he varied tau for the error diagram.
diff --git a/2013-11-30.md b/2013-11-30.md
new file mode 100644
index 0000000..478e88e
--- /dev/null
+++ b/2013-11-30.md
@@ -0,0 +1,7 @@
+This week was Thanksgiving week, and our group, the Quakers, is still working on developing new models!
+
+Since we only met once, on Tuesday, and we were the only group that presented, I cannot say much else. Our group plans to give a presentation on the approaches we have tried and their results, so that we can help sum up what we have tried and what we found for the presentation group.
+
+At this point we are making some small achievements. We will have detailed results ready to present later.
+
+There is not a lot left to do for anyone who is not taking part in testing models. I think it is time for every group to write a task report summarizing everything they have done.
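+
+Appendix to these reflections: a rough sketch of the percentile-based window-length idea from the 2013-11-23 entry. It is only my own illustration under assumptions (a made-up toy catalog, and the next event in the catalog treated as each event's first successor), not our group's final code.
+
+```python
+# Sketch: 90th-percentile time to first successor as a magnitude-dependent window length.
+import numpy as np
+
+# Made-up toy catalog, sorted by time (days); the real one would come from our curated data.
+times = np.array([0.5, 2.0, 2.3, 7.0, 7.8, 15.0, 15.2, 26.0, 27.5, 40.0])
+mags = np.array([3.2, 4.1, 3.6, 5.0, 3.4, 4.4, 3.1, 5.6, 3.9, 4.2])
+
+# Time from each event (except the last) to its first successor in the catalog.
+gaps = np.diff(times)
+gap_mags = mags[:-1]
+
+# Keep events above magnitude 3 and group them into 0.1-magnitude bins.
+keep = gap_mags > 3.0
+bins = np.floor(gap_mags[keep] * 10) / 10    # e.g. 4.17 -> 4.1
+window_lengths = {}
+for b in np.unique(bins):
+    in_bin = bins == b
+    # 90th-percentile cut point of the first-successor times in this bin.
+    window_lengths[b] = np.percentile(gaps[keep][in_bin], 90)
+
+for m in sorted(window_lengths):
+    print("M %.1f-%.1f: window = %.2f days" % (m, m + 0.1, window_lengths[m]))
+```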