34 changes: 34 additions & 0 deletions 2013-09-07.md
@@ -0,0 +1,34 @@
<h3>Weekly Reflections for the week 10/13-10/19</h3>

<h4>Working with JSON in Python</h4>

At first, I tried to find the information of interest with regular expressions. However, there turned out to be many matches, and some of them did not make sense at all. So I did some research on the JSON format and realized that a JSON file is hierarchical. Hierarchical means that the same field name can be reused across the file, as long as each occurrence is nested under a different parent. That is why a regular expression alone may not work well on hierarchical data: it matches the field name everywhere, without regard to where it sits in the hierarchy. To find specific information in a hierarchical file, it is necessary to figure out how the information is organized.
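To make the problem concrete, here is a minimal sketch in Python 3. The JSON text is a made-up example (not taken from the USGS feed) in which the field "name" appears at two levels. The regular expression finds both occurrences, while parsing the text first lets the path through the hierarchy select the one we want.

```python
import json
import re

# Hypothetical document: "name" appears at two different levels.
text = '{"name": "event-1", "city": {"name": "Cabazon, California"}}'

# A regular expression sees every "name" field and cannot tell which parent it belongs to.
regex_hits = re.findall(r'"name":\s*"([^"]*)"', text)
print(regex_hits)  # ['event-1', 'Cabazon, California']

# After parsing, the path through the hierarchy picks out exactly one field.
data = json.loads(text)
city_name = data["city"]["name"]
print(city_name)  # Cabazon, California
```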

Fortunately, the contents of a JSON file map naturally to a dict or list object in Python, whose values or elements can be any combination of other dict or list objects (or plain strings, numbers, booleans, and null). Naturally, I tried treating it as a dict or list object and used a divide-and-conquer approach to study how the information is organized.

The first step is to create an object from the JSON text with json.loads(), a function defined in the json module. json.loads() typically returns a dict object or a list object. For example:

cities=json.loads(urllib.urlopen("http://earthquake.usgs.gov/product/nearby-cities/ci11380834/us/1382197630296/nearby-cities.json").read())

returns a list object, cities. This is a simple case, so we can inspect the contents of the first element, cities[0]:

{u'direction': u'ESE',
u'distance': 3,
u'latitude': 33.91752,
u'longitude': -116.78724,
u'name': u'Cabazon, California',
u'population': 2535}

Evidently, each element of the list is a dict object with unicode keys. For example, the value stored under the key u'name' is the unicode version of the city name. To get the population of this city, we can simply use cities[0][unicode("population")].
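For readers on Python 3, a rough equivalent is sketched below. It assumes the element shown above, inlined as a JSON string so that no network call is needed. In Python 3 every str is already Unicode, so plain string keys work and unicode() is not needed.

```python
import json

# The first element shown above, inlined as JSON text
# (an assumption made here to avoid a network call).
sample = '''[{"direction": "ESE", "distance": 3, "latitude": 33.91752,
             "longitude": -116.78724, "name": "Cabazon, California",
             "population": 2535}]'''

cities = json.loads(sample)   # the top level is a JSON array, so this is a list
first = cities[0]             # each element is a JSON object, so this is a dict

print(type(cities).__name__, type(first).__name__)  # list dict
print(first["name"], first["population"])           # Cabazon, California 2535
```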

This is a very simple example, but it shows how to get information as long as you know the structure of the JSON file.

Sometimes the JSON file is quite complicated and no background information about its structure is available. In that case, we can explore it manually like this:

1. Create an object from the JSON file.

2. Apply type() to check the data type of this object.

3. If it is a list, apply len() to find the number of elements in it. Otherwise, call its keys() method to list its subfields.

4. Choose an element or a subfield as the new object and repeat steps 2 and 3 until you reach a leaf node or find the field of interest.
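The manual steps above can also be automated. The following is a minimal sketch in Python 3, not a definitive tool: the helper name outline, its parameters, and the sample document are all hypothetical. It reports the type at each level, lists the keys of every dict, and follows only the first element of each list, on the assumption that list elements share the same structure.

```python
import json

def outline(obj, path="root", depth=0, max_depth=5):
    """Return one line per node describing the structure of a parsed JSON object."""
    indent = "  " * depth
    if isinstance(obj, dict):
        # Step 3 for dicts: list the subfields.
        lines = [f"{indent}{path}: dict, keys={sorted(obj)}"]
        children = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, list):
        # Step 3 for lists: count the elements, then look only at the first one.
        lines = [f"{indent}{path}: list, len={len(obj)}"]
        children = [(f"{path}[0]", obj[0])] if obj else []
    else:
        # Leaf node: a string, number, boolean, or None.
        return [f"{indent}{path}: {type(obj).__name__}"]
    # Step 4: repeat on each chosen child until max_depth is reached.
    if depth < max_depth:
        for child_path, child in children:
            lines.extend(outline(child, child_path, depth + 1, max_depth))
    return lines

# Hypothetical sample document, used only for illustration.
doc = json.loads(
    '{"metadata": {"count": 2},'
    ' "features": [{"id": "a", "mag": 1.5}, {"id": "b", "mag": 2.0}]}'
)
report = outline(doc)
print("\n".join(report))
```

Each printed path, such as root['features'][0]['mag'], can be turned into an index expression on the parsed object once the field of interest has been found.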