
Flatten data from JSON to a relational format #26

@singhish


As it currently stands, the data is spread across multiple folders as JSON files, with the following structure:

    out_dir
    └── user
        └── spec_id
            └── key
                └── {start_ts}_{end_ts}.json
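Because user, spec_id, key, start_ts and end_ts are encoded only in the file path, a loader would first need to recover them from each path. Below is a minimal sketch using pathlib. It assumes the layout above, and it treats everything between spec_id and the file name as the key, in case a key such as background/battery was written out as nested directories. The helper names parse_dump_path and iter_dump_files are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterator


def parse_dump_path(json_path: Path, out_dir: Path) -> Dict[str, str]:
    """Recover user, spec_id, key, start_ts and end_ts from a path laid out as
    out_dir/user/spec_id/key/{start_ts}_{end_ts}.json."""
    parts = json_path.relative_to(out_dir).parts
    if len(parts) < 4:
        raise ValueError(f"unexpected layout: {json_path}")
    # Everything between spec_id and the file name is treated as the key, so a
    # key containing "/" still parses if it became nested directories
    # (an assumption about how the dump was written).
    key = "/".join(parts[2:-1])
    # The stem drops ".json"; the first "_" separates start_ts from end_ts.
    start_ts, sep, end_ts = json_path.stem.partition("_")
    if not sep:
        raise ValueError(f"file name is not {{start_ts}}_{{end_ts}}.json: {json_path}")
    return {"user": parts[0], "spec_id": parts[1], "key": key,
            "start_ts": start_ts, "end_ts": end_ts}


def iter_dump_files(out_dir: Path) -> Iterator[Dict[str, str]]:
    """Yield the parsed path components, plus the path itself, for every JSON file."""
    for json_path in sorted(out_dir.rglob("*.json")):
        yield {**parse_dump_path(json_path, out_dir), "path": str(json_path)}
```

The timestamps are kept as strings here; they could be converted with float() once their exact format is confirmed.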

Note that all of the PhoneView-related data is in the following format:

    {
        "_id": {
            "$oid": "..."
        },
        "metadata": {
            "key": "...",
            "platform": "...",
            "read_ts": ...,
            "time_zone": "...",
            "type": "...",
            "write_ts": ...
        },
        "user_id": {
            "$uuid": "..."
        },
        "data": {
            <keys are dependent on key specified in metadata>
        }
    }

Thus, I propose a series of Pandas DataFrames with the following columns:

  1. A DataFrame consisting of metadata:
      • user
      • spec_id
      • key
      • start_ts
      • end_ts
      • _id
      • platform
      • read_ts
      • time_zone
      • type
      • write_ts
      • user_id
  2. A DataFrame for each key. Its fields correspond to the fields in the data sub-object of the PhoneView data files. For instance, a DataFrame for the background/battery key on Android devices could have these columns:
      • _id
      • android_health
      • android_plugged
      • android_technology
      • android_temperature
      • android_voltage
      • battery_level_pct
      • battery_status
      • ts
      • write_ts
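One way to produce these two kinds of DataFrames is to first build plain lists of row dicts, one list for the metadata and one per key. The sketch below assumes each entry is shaped like the JSON above and that path_info holds the user/spec_id/key/start_ts/end_ts values taken from the file path. Nested fields inside data are flattened into dotted column names; that rule and the helper names flatten and split_entries are hypothetical choices, not an existing API.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def flatten(obj: Dict[str, Any], prefix: str = "") -> Row:
    """Flatten nested dicts into dotted column names: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Row = {}
    for name, value in obj.items():
        column = f"{prefix}{name}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value  # lists and scalars are kept as-is
    return flat


def split_entries(
    entries: List[Tuple[Dict[str, str], Dict[str, Any]]]
) -> Tuple[List[Row], Dict[str, List[Row]]]:
    """Turn (path_info, entry) pairs into metadata rows plus data rows grouped by key."""
    metadata_rows: List[Row] = []
    data_by_key: Dict[str, List[Row]] = defaultdict(list)
    for path_info, entry in entries:
        oid = entry["_id"]["$oid"]
        meta = entry["metadata"]
        metadata_rows.append({
            **path_info,  # user, spec_id, key, start_ts, end_ts
            "_id": oid,
            "platform": meta.get("platform"),
            "read_ts": meta.get("read_ts"),
            "time_zone": meta.get("time_zone"),
            "type": meta.get("type"),
            "write_ts": meta.get("write_ts"),
            "user_id": entry["user_id"]["$uuid"],
        })
        # The fields of data depend on metadata.key, so rows are grouped by it;
        # _id is repeated so each data row can be joined back to its metadata.
        data_by_key[meta["key"]].append({"_id": oid, **flatten(entry["data"])})
    return metadata_rows, dict(data_by_key)
```

Since the pandas DataFrame constructor accepts a list of dicts, the results could then be wrapped as pd.DataFrame(metadata_rows) and {key: pd.DataFrame(rows) for key, rows in data_by_key.items()}.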

Given the volume of data, though, Pandas DataFrames might be insufficient and clunky. A SQLite database might be more appropriate, since primary/foreign keys (a role that _id can fill) could be used to better organize the data as a whole.
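A rough sketch of the SQLite variant using Python's built-in sqlite3 module, with _id as the primary key of Metadata and as a foreign key from a per-key table back to it. Only BackgroundBattery is shown; the TEXT/REAL column types are guesses, and open_db and insert_row are hypothetical helper names. SQLite enforces foreign keys only when PRAGMA foreign_keys is turned on for the connection, so the sketch enables it.

```python
import sqlite3
from typing import Any, Dict

# Column types are assumptions; "key" and "type" are quoted to avoid keyword clashes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Metadata (
    _id       TEXT PRIMARY KEY,  -- the entry's $oid
    user      TEXT,
    spec_id   TEXT,
    "key"     TEXT,
    start_ts  REAL,
    end_ts    REAL,
    platform  TEXT,
    read_ts   REAL,
    time_zone TEXT,
    "type"    TEXT,
    write_ts  REAL,
    user_id   TEXT
);
CREATE TABLE IF NOT EXISTS BackgroundBattery (
    _id                 TEXT PRIMARY KEY REFERENCES Metadata(_id),
    android_health      TEXT,
    android_plugged     TEXT,
    android_technology  TEXT,
    android_temperature REAL,
    android_voltage     REAL,
    battery_level_pct   REAL,
    battery_status      TEXT,
    ts                  REAL,
    write_ts            REAL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database with foreign-key enforcement enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def insert_row(conn: sqlite3.Connection, table: str, row: Dict[str, Any]) -> None:
    """Insert one row dict; table and column names must come from trusted code."""
    columns = ", ".join(f'"{name}"' for name in row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
                 list(row.values()))
```

With this layout, a battery reading whose _id has no matching Metadata row is rejected with sqlite3.IntegrityError, and the two tables can be joined on _id.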

When all is said and done, we would end up with these tables:

  • Metadata
  • BackgroundBattery
  • BackgroundFilteredLocation
  • BackgroundLocation
  • BackgroundMotionActivity
  • ManualEvaluationTransition
  • StatemachineTransition

that would encompass all our data.
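The table names above look like CamelCase versions of the metadata keys, so they could be derived mechanically rather than hard-coded. Only background/battery is confirmed as a key in this issue; the other key strings used below (e.g. manual/evaluation_transition) are assumptions inferred from the table names, and table_name_for_key is a hypothetical helper.

```python
def table_name_for_key(key: str) -> str:
    """Turn a metadata key such as "background/battery" into "BackgroundBattery".

    Splits on "/" and "_" and capitalizes each piece.
    """
    words = key.replace("/", "_").split("_")
    return "".join(word.capitalize() for word in words if word)


if __name__ == "__main__":
    # Hypothetical key strings, inferred from the proposed table names.
    for key in ["background/battery", "manual/evaluation_transition", "statemachine/transition"]:
        print(key, "->", table_name_for_key(key))
```

Deriving the name this way keeps the key-to-table mapping in one place if more keys are added later.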

Thoughts? @shankari
