Data Management
Page Editor: @allopole
To request edits to this page, open an issue and tag @allopole.
In our lab, we collect data for our projects from many different sources and experiments, but the way we manage and store data should be uniform to some degree. This page aims to help us compare our individual data management practices and record the best ones.
DataCamp's data management page is helpful and in line with suggested Drake Lab practices: https://www.statmethods.net/management/index.html
Data may be transcribed from handwritten notes into Excel, scraped from online resources, or generated by simulations. These data are considered 'raw data' and should be backed up and saved in their rawest form. It's tempting to overwrite raw files with cleaner data, but resist.
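One way to make "don't overwrite the raw data" hard to break by accident is to mark each raw file read-only and record a checksum, so you can later confirm it never changed. The sketch below is a Python illustration under that assumption; the function names `protect_raw_file` and `verify_raw_file` are hypothetical, not an existing lab tool. Clearing write permission is a safeguard rather than a guarantee (an administrator can still modify the file), which is why the checksum is kept as well.

```python
import hashlib
import os
import stat


def _sha256(path):
    """Hash a file in chunks so large raw files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def protect_raw_file(path):
    """Hypothetical helper: make a raw data file read-only and return its SHA-256."""
    checksum = _sha256(path)
    # Clear the owner, group, and other write bits; keep all other permission bits.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return checksum


def verify_raw_file(path, expected_checksum):
    """Hypothetical helper: True if the file still matches the recorded checksum."""
    return _sha256(path) == expected_checksum
```

Store the returned checksum alongside the data's metadata; cleaned versions then go into new files instead of replacing the raw ones.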
If your data come from simulations, the files may be too large to back up this way. In that case, record exactly how the data were generated, including the version numbers of the R packages used (see the packrat package), and back up your source code on GitHub.
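The idea behind saving "the exact way the data were generated" can be sketched as a small provenance file written next to each simulation output. This is a Python illustration, not the lab's method: the helpers `provenance_record` and `write_provenance` are hypothetical, and which fields to record (seed, parameters, interpreter and package versions) is an assumption about what a typical simulation needs to be rerun.

```python
import json
import platform
from importlib import metadata


def provenance_record(seed, params, packages):
    """Hypothetical helper: collect what is needed to regenerate a simulated data set."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    return {
        "seed": seed,
        "parameters": params,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "package_versions": versions,
    }


def write_provenance(path, record):
    """Hypothetical helper: save the record as JSON; sorted keys keep git diffs small."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2, sort_keys=True)
```

For example, `write_provenance("sim-output-provenance.json", provenance_record(42, {"n_sites": 10}, ["numpy"]))` (file name and parameters are illustrative). In R, `sessionInfo()` reports the R version and the versions of attached packages, which serves a similar purpose.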
When naming data files, follow the Google style guide. Other notes:
- If data are originally in Excel, save them as CSV: easier to read into R, and doesn't require proprietary software
- If variable names are not intuitive, figure out what they represent and rename them to something you easily understand
- If NA values are coded as something unusual, take note and consider changing them to NA so R recognizes them
- Give your data file a name people can easily understand, e.g., 2016-05-alaska-b.csv, and save its metadata under the same name
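The renaming and NA-recoding steps above can be sketched in Python with the standard `csv` module. The helpers `clean_rows` and `write_clean_csv` and the sentinel codes passed in are hypothetical. Missing values are written as the literal string `NA` because that is the default `na.strings` value for R's `read.csv`. The cleaned data go to a new file, so the raw CSV is left untouched.

```python
import csv


def clean_rows(rows, rename, na_codes):
    """Hypothetical helper: rename columns and recode missing-value codes to "NA".

    rows: iterable of dicts, e.g. from csv.DictReader
    rename: mapping of original column name -> clearer name
    na_codes: strings that mean "missing" in this data set, e.g. {"-999", "."}
    """
    cleaned = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            new_col = rename.get(col, col)  # keep names that need no change
            missing = value is None or value.strip() in na_codes
            new_row[new_col] = "NA" if missing else value
        cleaned.append(new_row)
    return cleaned


def write_clean_csv(in_path, out_path, rename, na_codes):
    """Hypothetical helper: read a raw CSV and write a cleaned copy to a new file."""
    with open(in_path, newline="", encoding="utf-8") as fin:
        reader = csv.DictReader(fin)
        fieldnames = [rename.get(c, c) for c in reader.fieldnames]
        rows = clean_rows(reader, rename, na_codes)
    with open(out_path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.DictWriter(fout, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Keeping the `rename` mapping and `na_codes` in your script also documents the changes, which belongs in the file's metadata.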
Once a project is published or you leave the lab, save it within the DrakeLab GitHub. Before the research is published, the GitHub repo should be kept private. To increase standardization, name project repos on the Drake-Lab GitHub following these tips:
- where relevant, use your surname or the grant name/nickname
- separate words by hyphens
- use descriptive, meaningful words
- e.g., "evans-mosq-field-study"
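The naming tips above can be applied mechanically with a small helper. This Python sketch is hypothetical (`repo_name` is not an existing tool), and lowercasing every word is an assumption based on the "evans-mosq-field-study" example; it splits on spaces, underscores, and other punctuation, then joins the words with hyphens.

```python
import re


def repo_name(*parts):
    """Hypothetical helper: build a lowercase, hyphen-separated repo name."""
    words = []
    for part in parts:
        # Split on any run of characters that is not a letter or digit.
        words.extend(w for w in re.split(r"[^A-Za-z0-9]+", part) if w)
    return "-".join(w.lower() for w in words)
```

For example, `repo_name("Evans", "mosq field_study")` returns `"evans-mosq-field-study"`. Choosing descriptive, meaningful words is still up to you.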
Published works should be archived in a public repository with a DOI. The associated repo can then be kept private or made public, depending on the aims of the research. The definitive, public archive of lab research should be submitted to Figshare, Dryad, or Zenodo for permanent deposition of research/data rather than just making GitHub repos public. Doing so makes sure what you publish is cleaner and doesn’t have info that you may not want everyone to be able to see (such as rejected journal submissions/protocols/etc).
Resources:
- journal-club doc
- google-sites lab manual
- index of all Drake-lab google sites
- lab-meeting--minutes doc
- repository of public domain images

Contact John if you are having trouble accessing Google Docs or websites.