Flatten data from JSON to a relational format

The data is currently spread across multiple folders as JSON files, in the following structure:

```
    out_dir
    └── user
        └── spec_id
            └── key
                └── {start_ts}_{end_ts}.json
```
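As a rough sketch of how the path components could be recovered for later use as columns, the helper below splits a file path relative to `out_dir`. The function name `parse_data_path` is hypothetical, and it assumes that a key such as `background/battery` may span more than one directory level and that the timestamps in the file name are numeric.

```python
from pathlib import PurePosixPath


def parse_data_path(path, out_dir):
    """Split a data file path into the components of the directory layout."""
    parts = PurePosixPath(path).relative_to(PurePosixPath(out_dir)).parts
    if len(parts) < 4 or not parts[-1].endswith(".json"):
        raise ValueError(f"unexpected path layout: {path}")

    user, spec_id = parts[0], parts[1]
    # Assumption: everything between spec_id and the file name is the key,
    # so a key like "background/battery" may cover two directories.
    key = "/".join(parts[2:-1])
    # Assumption: the file name is "<start_ts>_<end_ts>.json" with numeric values.
    stem = parts[-1][: -len(".json")]
    start_ts, end_ts = stem.split("_", 1)

    return {
        "user": user,
        "spec_id": spec_id,
        "key": key,
        "start_ts": float(start_ts),
        "end_ts": float(end_ts),
    }
```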

Note that all of the PhoneView-related data is in the following format:
```
{
    "_id": {
        "$oid": "..."
    },
    "metadata": {
        "key": "...",
        "platform": "...",
        "read_ts": ...,
        "time_zone": "...",
        "type": "...",
        "write_ts": ...
    },
    "user_id": {
        "$uuid": "..."
    },
    "data": {
        <keys are dependent on key specified in metadata>
    }
}
```

Thus, I propose a series of Pandas DataFrames with the following columns:

1) A DataFrame consisting of metadata
- `user`
- `spec_id`
- `key`
- `start_ts`
- `end_ts`
- `_id`
- `platform`
- `read_ts`
- `time_zone`
- `type`
- `write_ts`
- `user_id`
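A minimal sketch of how one row of this metadata DataFrame could be built from a single JSON record is shown here. The helper name `metadata_row` and its `path_info` argument (a dict holding the path-derived fields `user`, `spec_id`, `key`, `start_ts`, `end_ts`) are hypothetical; a list of such rows could then be passed to `pandas.DataFrame`.

```python
def metadata_row(record, path_info):
    """Flatten one PhoneView record into a single metadata row."""
    meta = record["metadata"]
    # Start from the path-derived fields: user, spec_id, key, start_ts, end_ts.
    row = dict(path_info)
    # Unwrap the nested {"$oid": ...} and {"$uuid": ...} objects.
    row["_id"] = record["_id"]["$oid"]
    for field in ("platform", "read_ts", "time_zone", "type", "write_ts"):
        # .get() leaves a None where a record lacks a field.
        row[field] = meta.get(field)
    row["user_id"] = record["user_id"]["$uuid"]
    return row
```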

2) A DataFrame for each key. The fields here correspond to the fields in the `data` sub-object of the PhoneView data files. For instance, a DataFrame for the `background/battery` key for `android` devices can have these columns:
- `_id`
- `android_health`
- `android_plugged`
- `android_technology`
- `android_temperature`
- `android_voltage`
- `battery_level_pct`
- `battery_status`
- `ts`
- `write_ts`
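One way the per-key rows might be assembled is sketched below. The helper names `data_row` and `group_rows_by_key` are hypothetical, and the sketch assumes that `write_ts` is copied from the `metadata` sub-object while every other column comes from `data`. Each list of rows could then be turned into its own DataFrame.

```python
from collections import defaultdict


def data_row(record):
    """Flatten the `data` sub-object of one record into a row keyed by `_id`."""
    return {
        "_id": record["_id"]["$oid"],
        **record["data"],
        # Assumption: write_ts is taken from metadata, not from data.
        "write_ts": record["metadata"]["write_ts"],
    }


def group_rows_by_key(records):
    """Collect flattened rows into one list per metadata key."""
    rows_by_key = defaultdict(list)
    for record in records:
        rows_by_key[record["metadata"]["key"]].append(data_row(record))
    return dict(rows_by_key)
```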

Given the amount of data, though, Pandas DataFrames might be clunky to work with. A SQLite database might be more appropriate, since primary and foreign keys (a role `_id` can fill) can be used to better organize the data as a whole.
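To make the SQLite idea concrete, here is a sketch of two of the tables using Python's built-in `sqlite3` module, with `_id` as the primary key of `Metadata` and as a foreign key in `BackgroundBattery`. The column types are guesses, and `open_db` is a hypothetical helper, not an agreed design.

```python
import sqlite3

# Column types are assumptions; "key" and "type" are quoted to avoid
# clashing with SQL keywords.
SCHEMA = """
CREATE TABLE Metadata (
    _id       TEXT PRIMARY KEY,
    user      TEXT,
    spec_id   TEXT,
    "key"     TEXT,
    start_ts  REAL,
    end_ts    REAL,
    platform  TEXT,
    read_ts   REAL,
    time_zone TEXT,
    "type"    TEXT,
    write_ts  REAL,
    user_id   TEXT
);
CREATE TABLE BackgroundBattery (
    _id                 TEXT PRIMARY KEY REFERENCES Metadata(_id),
    android_health      TEXT,
    android_plugged     TEXT,
    android_technology  TEXT,
    android_temperature REAL,
    android_voltage     REAL,
    battery_level_pct   REAL,
    battery_status      TEXT,
    ts                  REAL,
    write_ts            REAL
);
"""


def open_db(path=":memory:"):
    """Open a database and create the sketched tables."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With foreign keys enabled, inserting a `BackgroundBattery` row whose `_id` has no matching `Metadata` row raises `sqlite3.IntegrityError`, and the two tables can be joined on `_id`.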

When all is said and done, we would end up with these tables:
- `Metadata`
- `BackgroundBattery`
- `BackgroundFilteredLocation`
- `BackgroundLocation`
- `BackgroundMotionActivity`
- `ManualEvaluationTransition`
- `StatemachineTransition`

that would encompass all our data.
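The table names above look like CamelCase versions of the keys, so a small helper could derive them instead of hard-coding the list. This is only a sketch: `table_name_for_key` is a hypothetical name, and apart from `background/battery` the key strings in the usage note are inferred from the table names rather than taken from the data.

```python
import re


def table_name_for_key(key):
    """Convert a key like "background/battery" into a CamelCase table name."""
    # Split on both "/" and "_" and capitalize each non-empty piece.
    parts = [part for part in re.split(r"[/_]", key) if part]
    return "".join(part.capitalize() for part in parts)
```

For example, `table_name_for_key("background/battery")` gives `BackgroundBattery`, and a key of `background/filtered_location` would give `BackgroundFilteredLocation`.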

Thoughts? @shankari 
