Data Management

Page Editor: @allopole
To request edits to this page, open an issue and tag @allopole.

Introduction to data management

In our lab, we collect data for our projects from many different sources and experiments, but the way we manage and store data should be reasonably uniform. This page is meant to help us compare our individual data management practices and record the best ones.

DataCamp's Quick-R data management page is helpful and in line with suggested Drake Lab practices: https://www.statmethods.net/management/index.html

Raw data

Data may be copied from hand-written notes into Excel, scraped from online resources, or generated by simulation. These are all considered 'raw data' and should be backed up and saved in their rawest form. It's tempting to overwrite them with cleaner data, but resist.

If your data are from simulations, the files might be too large to manage this way. In this case, record exactly how the data were generated, including the version numbers of the R packages used (see the packrat package), and back up your source code on GitHub.
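
As a rough, language-neutral illustration of recording how simulated data were generated, here is a minimal Python sketch (the lab itself works in R, where packrat covers package versions). The seed value, the package list, and the `build_provenance` helper are all hypothetical; the point is only that the seed and the software versions get written down next to the code.

```python
import json
import platform
import random
from importlib import metadata


def build_provenance(seed, packages):
    """Collect the details needed to regenerate a simulated data set."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    return {
        "seed": seed,
        "python_version": platform.python_version(),
        "package_versions": versions,
    }


def simulate(seed, n=5):
    """Toy simulation: n draws from a standard normal, fixed by the seed."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(n)]


seed = 20160501  # hypothetical seed, chosen only for illustration
simulated = simulate(seed)
record = build_provenance(seed, ["pip"])  # "pip" is a placeholder package name

# Serialize the record so it can be committed alongside the source code.
provenance_json = json.dumps(record, indent=2)
```

With the seed and versions stored, rerunning `simulate(seed)` reproduces the same values without archiving the large output files themselves.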

When naming data files, follow the Google style guide. Other notes:

If data are originally in Excel, save them as CSV, which is easier to read into R and doesn't require proprietary software
If variable names are not intuitive, figure out what they represent and rename them to something you easily understand
If missing values are coded unusually (for example, as -999), take note and consider recoding them as NA so R recognizes them
Give your data file a name people can easily understand, e.g., 2016-05-alaska-b.csv, and save the metadata under the same name
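
The notes above can be sketched in a few lines. This is a minimal Python example using only the standard library, not the lab's actual workflow (which is in R). The column names, the `-999` missing-value code, and the rename map are assumptions made up for illustration. The raw text is never modified; the cleaned version is produced as a separate copy.

```python
import csv
import io

# Hypothetical raw file contents; in practice this would be read from disk.
RAW_CSV = """site,tmp_c,n_mosq
A,21.5,14
B,-999,9
C,19.0,-999
"""

NA_CODES = {"-999", ""}  # assumed codes used for missing values
RENAME = {"tmp_c": "temperature_c", "n_mosq": "mosquito_count"}  # assumed names


def clean_rows(raw_text):
    """Return cleaned rows with clearer column names and NA for missing values."""
    reader = csv.DictReader(io.StringIO(raw_text))
    cleaned = []
    for row in reader:
        new_row = {}
        for key, value in row.items():
            new_key = RENAME.get(key, key)
            new_row[new_key] = "NA" if value.strip() in NA_CODES else value
        cleaned.append(new_row)
    return cleaned


def to_csv(rows):
    """Serialize cleaned rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


cleaned = clean_rows(RAW_CSV)
cleaned_csv = to_csv(cleaned)  # save this under a new name, never over the raw file
```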

Sharing your dissertation research

Once a project is published or you leave the lab, save it within the DrakeLab GitHub. Before research is published, the GitHub repo should be kept private. When naming the project repo on the DrakeLab GitHub, follow these tips to increase standardization:

where relevant, use a surname or grant name/nickname
separate words by hyphens
use descriptive, meaningful words
e.g., "evans-mosq-field-study"
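
A quick way to check a proposed repo name against the hyphen rule is a small pattern match. The sketch below is illustrative only: the `is_valid_repo_name` helper is hypothetical, and the restriction to lowercase letters and digits is an assumption inferred from the example name, not a stated lab rule. Whether the words are descriptive still needs a human judgment.

```python
import re

# Assumed convention: lowercase words (letters or digits) joined by single hyphens.
REPO_NAME_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_repo_name(name):
    """Return True if the whole name matches the hyphen-separated pattern."""
    return REPO_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["evans-mosq-field-study", "evans_mosq", "Evans Mosq Study"]:
    print(candidate, is_valid_repo_name(candidate))
```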

Published work should be archived in a public repository with a DOI. The associated GitHub repo can then be kept private or made public, depending on the aims of the research. Rather than simply making GitHub repos public, submit the definitive public archive of lab research to Figshare, Dryad, or Zenodo for permanent deposition. This keeps what you publish clean and free of material you may not want everyone to see (such as rejected journal submissions, protocols, etc.).

Resources:

Lab Links

journal-club doc
google-sites lab manual
index of all Drake-lab google sites
lab-meeting--minutes doc
repository of public domain images

Contact John if you are having trouble accessing Google Docs or websites.
