As it currently stands, the data is spread across multiple nested folders as JSON files, laid out like this:
```
out_dir
└── user
    └── spec_id
        └── key
            └── {start_ts}_{end_ts}.json
```
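A minimal Python sketch of walking this layout and recovering each file's path components. It assumes that a `key` containing a `/` (such as `background/battery`) is stored on disk as nested directories, and that `start_ts` and `end_ts` parse as numbers. `DataFile` and `iter_data_files` are hypothetical names, not part of any existing tool.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class DataFile(NamedTuple):
    """Path components of one data file (hypothetical helper type)."""
    user: str
    spec_id: str
    key: str
    start_ts: float
    end_ts: float
    path: Path


def iter_data_files(out_dir) -> Iterator[DataFile]:
    """Yield each out_dir/user/spec_id/key/{start_ts}_{end_ts}.json file.

    Assumes a key such as "background/battery" is stored as nested
    directories, so every directory between spec_id and the file is
    joined back together with "/".
    """
    root = Path(out_dir)
    for path in sorted(root.rglob("*.json")):
        parts = path.relative_to(root).parts
        if len(parts) < 4:
            continue  # too shallow to match the layout
        user, spec_id, *key_parts, filename = parts
        start, sep, end = Path(filename).stem.partition("_")
        if not sep:
            continue  # file name is not {start_ts}_{end_ts}.json
        yield DataFile(user, spec_id, "/".join(key_parts),
                       float(start), float(end), path)
```

Keeping the path components as fields means they can later become the `user`, `spec_id`, `key`, `start_ts` and `end_ts` columns without re-parsing the path.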
All of the PhoneView-related data follows this format:
```
{
  "_id": {
    "$oid": "..."
  },
  "metadata": {
    "key": "...",
    "platform": "...",
    "read_ts": ...,
    "time_zone": "...",
    "type": "...",
    "write_ts": ...
  },
  "user_id": {
    "$uuid": "..."
  },
  "data": {
    <keys are dependent on key specified in metadata>
  }
}
```
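A minimal sketch of reading one file and splitting each entry into its metadata part and its `data` part, both tagged with the entry's `_id`. It assumes each `{start_ts}_{end_ts}.json` file holds a JSON array of such entries, and that `_id` and `user_id` use the `$oid`/`$uuid` wrappers shown above. `unwrap`, `split_entry` and `load_entries` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple


def unwrap(value: Any) -> Any:
    """Return the inner value of a {"$oid": ...} or {"$uuid": ...} wrapper."""
    if isinstance(value, dict) and len(value) == 1:
        tag, inner = next(iter(value.items()))
        if tag in ("$oid", "$uuid"):
            return inner
    return value  # anything else is passed through unchanged


def split_entry(entry: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one entry into (metadata_fields, data_fields), both tagged with _id."""
    _id = unwrap(entry["_id"])
    meta = {"_id": _id, "user_id": unwrap(entry["user_id"]), **entry["metadata"]}
    data = {"_id": _id, **entry["data"]}
    return meta, data


def load_entries(path: Path) -> List[Dict[str, Any]]:
    """Read one {start_ts}_{end_ts}.json file, assumed to hold a JSON array."""
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)
    # Tolerate a file that holds a single object rather than an array.
    return loaded if isinstance(loaded, list) else [loaded]
```

Because `_id` ends up in both halves, it is the natural column for joining the metadata back to the per-key data.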
I therefore propose a set of pandas DataFrames with the following columns:
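- A DataFrame of metadata, with these columns:

  | `user` | `spec_id` | `key` | `start_ts` | `end_ts` | `_id` | `platform` | `read_ts` | `time_zone` | `type` | `write_ts` | `user_id` |
  |---|---|---|---|---|---|---|---|---|---|---|---|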
- A DataFrame for each key. Its columns correspond to the fields in the `data` sub-object of the PhoneView data files. For instance, a DataFrame for the `background/battery` key on `android` devices could have these columns:

  | `_id` | `android_health` | `android_plugged` | `android_technology` | `android_temperature` | `android_voltage` | `battery_level_pct` | `battery_status` | `ts` | `write_ts` |
  |---|---|---|---|---|---|---|---|---|---|
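A minimal sketch of assembling both kinds of DataFrame as plain `(columns, rows)` pairs, assuming the input has already been parsed into `(user, spec_id, start_ts, end_ts, entries)` tuples, one per file. `build_tables` is a hypothetical helper. One design choice here is an assumption: each per-key table takes `_id` plus the sorted union of all `data` fields seen for that key, so rows from different platforms share one table and missing fields become `None`.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# Column order taken from the proposed metadata DataFrame.
METADATA_COLUMNS = [
    "user", "spec_id", "key", "start_ts", "end_ts", "_id",
    "platform", "read_ts", "time_zone", "type", "write_ts", "user_id",
]

Table = Tuple[List[str], List[Dict[str, Any]]]  # (columns, rows)
FileInfo = Tuple[str, str, float, float, List[Dict[str, Any]]]


def build_tables(files: Iterable[FileInfo]) -> Tuple[Table, Dict[str, Table]]:
    """Return (metadata_table, {key: per_key_table}) from parsed files.

    Each item of `files` is (user, spec_id, start_ts, end_ts, entries),
    where `entries` is the list of PhoneView entries read from one file.
    """
    meta_rows: List[Dict[str, Any]] = []
    rows_by_key: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    fields_by_key: Dict[str, set] = defaultdict(set)
    for user, spec_id, start_ts, end_ts, entries in files:
        for entry in entries:
            meta = entry["metadata"]
            _id = entry["_id"]["$oid"]
            meta_rows.append({
                "user": user, "spec_id": spec_id, "key": meta["key"],
                "start_ts": start_ts, "end_ts": end_ts, "_id": _id,
                "platform": meta.get("platform"), "read_ts": meta.get("read_ts"),
                "time_zone": meta.get("time_zone"), "type": meta.get("type"),
                "write_ts": meta.get("write_ts"),
                "user_id": entry["user_id"]["$uuid"],
            })
            rows_by_key[meta["key"]].append({"_id": _id, **entry["data"]})
            fields_by_key[meta["key"]].update(entry["data"])
    per_key: Dict[str, Table] = {}
    for key, rows in rows_by_key.items():
        # "_id" first, then the sorted union of data fields; gaps become None.
        columns = ["_id"] + sorted(fields_by_key[key] - {"_id"})
        per_key[key] = (columns, [{c: r.get(c) for c in columns} for r in rows])
    return (METADATA_COLUMNS, meta_rows), per_key
```

Each `(columns, rows)` pair can then be turned into a DataFrame with `pandas.DataFrame(rows, columns=columns)`. To get one table per platform, as in the `android` example, the rows could instead be grouped on `(key, platform)`.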
With this much data, though, pandas DataFrames might be insufficient and clunky. A SQLite database might be more appropriate, since primary/foreign keys (a role `_id` can take on) would let us link every per-key row back to its metadata row and organize the data as a whole.
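A minimal sketch of that idea using Python's built-in `sqlite3` module: `_id` is the primary key of `Metadata`, and `BackgroundBattery._id` is a foreign key referencing it. The column types are assumptions, and `open_db` and `battery_with_metadata` are hypothetical helpers. Note that SQLite only enforces `REFERENCES` clauses when `PRAGMA foreign_keys` is switched on for the connection.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS Metadata (
    _id       TEXT PRIMARY KEY,   -- the entry's $oid
    user      TEXT,
    spec_id   TEXT,
    key       TEXT,
    start_ts  REAL,
    end_ts    REAL,
    platform  TEXT,
    read_ts   REAL,
    time_zone TEXT,
    type      TEXT,
    write_ts  REAL,
    user_id   TEXT
);
CREATE TABLE IF NOT EXISTS BackgroundBattery (
    _id                 TEXT PRIMARY KEY REFERENCES Metadata(_id),
    android_health      TEXT,
    android_plugged     TEXT,
    android_technology  TEXT,
    android_temperature REAL,
    android_voltage     REAL,
    battery_level_pct   REAL,
    battery_status      TEXT,
    ts                  REAL,
    write_ts            REAL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database, enable foreign-key checks and create the tables."""
    conn = sqlite3.connect(path)
    # Foreign keys are off by default in SQLite and must be enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def battery_with_metadata(conn: sqlite3.Connection) -> List[Tuple]:
    """Join each battery reading to its metadata row through the shared _id."""
    return conn.execute(
        "SELECT m.user, m.platform, b.battery_level_pct, b.ts "
        "FROM BackgroundBattery AS b JOIN Metadata AS m USING (_id) "
        "ORDER BY b.ts"
    ).fetchall()
```

With the pragma on, inserting a battery row whose `_id` has no matching `Metadata` row fails with `sqlite3.IntegrityError`, which keeps the per-key tables consistent with the metadata.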
When all is said and done, we would end up with these tables, which together would hold all of our data:

- Metadata
- BackgroundBattery
- BackgroundFilteredLocation
- BackgroundLocation
- BackgroundMotionActivity
- ManualEvaluationTransition
- StatemachineTransition
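A minimal sketch of deriving these table names from metadata keys and creating each per-key table on the fly. Only `background/battery` appears in this issue; the sketch assumes the other keys follow the same `category/name_with_underscores` pattern (for example a key like `statemachine/transition`). Columns are left untyped, which SQLite permits. `table_name_for_key`, `quote_ident` and `create_key_table` are hypothetical helpers.

```python
import re
import sqlite3
from typing import Iterable


def table_name_for_key(key: str) -> str:
    """Turn a metadata key like "background/battery" into "BackgroundBattery"."""
    # Split on "/" and "_", then capitalize each piece and glue them together.
    return "".join(part.capitalize() for part in re.split(r"[/_]", key) if part)


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_key_table(conn: sqlite3.Connection, key: str, columns: Iterable[str]) -> str:
    """Create the table for one key; _id links back to Metadata._id."""
    table = table_name_for_key(key)
    col_defs = ['"_id" TEXT PRIMARY KEY REFERENCES "Metadata"("_id")']
    col_defs += [quote_ident(c) for c in columns if c != "_id"]
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({', '.join(col_defs)})")
    return table
```

Generating the tables from the keys and columns actually present in the data avoids hand-writing a schema for every sensor type, at the cost of losing declared column types.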
Thoughts? @shankari