11 changes: 11 additions & 0 deletions 2013-09-07.md
@@ -0,0 +1,11 @@
<h3>Weekly Reflections for the week 10/6-10/12</h3>

<h4>Regular Expression in Python</h4>

From time to time, people need to find a specific detail in an enormous amount of raw information. Searching for it by hand is unproductive or outright infeasible, and the result is hard to reproduce. Regular expressions are a powerful tool for such scenarios and are supported by most popular scripting languages. With a regular expression you can find every string that matches a pre-specified pattern, and the pattern itself can be quite flexible.

Python supports regular expressions through the "re" module in its standard library. To use regular expressions in Python, it is necessary to import "re" first.

For the first project, I found re.findall() extremely helpful in my data cleanup. In its basic form it takes two arguments, the pattern and the raw data. When the pattern has no capturing groups, it returns a list of the matched strings if there is at least one match, or an empty list otherwise.
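
A minimal sketch of that behavior follows. The sample strings are made up for illustration and are not taken from the actual questionnaire data.

```python
import re

# Hypothetical raw text, invented for this example.
raw = "Visual: 7, Aural - 12, no score here"

# One or more consecutive digits; no capturing groups,
# so findall returns the matched substrings themselves.
numbers = re.findall(r"[0-9]+", raw)
print(numbers)

# When nothing matches, the result is an empty list rather than None.
no_match = re.findall(r"[0-9]+", "no digits at all")
print(no_match)
```

Because the result is always a list, it can be checked with a plain `if numbers:` before any further cleanup.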

For example, the learning style information in the questionnaire is encoded in the form keyword-delimiter-value. The delimiter could be '-' or ':' plus one or more spaces, while the value consists of one or more consecutive digits. So at first the pattern I used was "Keyword.*[0-9]+", which was meant to match a string starting with Keyword and ending with at least one digit. I thought the keyword would be straightforward, but it turned out that there was one exception. The keyword for read and write in most records is "Read/Write", but in one record I found "Reading/Writing", which cannot be matched by "Read/Write". It reminded me how important it is to make the pattern flexible yet precise in regular expression applications. This is what I think is the most valuable thing I've learned in the past week.
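
One possible way to cover both spellings is sketched below. The two sample records and the names `strict` and `flexible` are hypothetical, and allowing optional spaces before the delimiter is an assumption on my part, not something confirmed by the data.

```python
import re

# Hypothetical records, invented to mirror the two spellings described above.
records = ["Read/Write: 8", "Reading/Writing - 11"]

# Original style of pattern: literal keyword, anything, then digits.
strict = r"Read/Write.*[0-9]+"

# Sketch of a more flexible yet more precise pattern:
#   Read(?:ing)?   -> "Read" or "Reading"
#   Writ(?:e|ing)  -> "Write" or "Writing"
#    *[-:] +       -> optional spaces (assumed), '-' or ':', one or more spaces
#   [0-9]+         -> one or more consecutive digits
flexible = r"Read(?:ing)?/Writ(?:e|ing) *[-:] +[0-9]+"

strict_hits = [re.findall(strict, rec) for rec in records]
flexible_hits = [re.findall(flexible, rec) for rec in records]

print(strict_hits)    # the second record yields an empty list
print(flexible_hits)  # both records yield a match
```

The groups are written as non-capturing `(?:...)` so that re.findall() still returns the whole matched string instead of only the group contents.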