This repository contains code necessary to reproduce results from the Bayesian Level-Set Clustering article by David Buch, Miheer Dewaskar, and David Dunson.
The repository is organized into four top-level directories: data, R, scripts, and src.
- When cloned, the `data` folder will be nearly empty, containing only a copy of the t-SNE embedded RNA sequencing data found on Renesh Bedre's website and the raw cosmological sky survey data, subsampled from the Edinburgh-Durham Southern Galaxy Catalogue, provided to us via personal correspondence from Woncheol Jang. The various other datasets which appear in our article are simulated by the scripts in this repository, especially `scripts/data_processing/toy_data_processing.R`.
- The top-level directory `R` contains `R` function definitions and other helper code that is used in the course of our BALLET data analyses.
- Similar to the `R` directory, the `src` directory contains code for functions which are used to perform the data analyses. However, the functions defined in `src` are written in C++, and bound to `R` functions using the `Rcpp` package.
Together, the R and src directories contain a complete collection of functions needed to carry out a BALLET analysis of a new dataset, should a user be interested. We may use these two folders as the foundation for an open-source BALLET software package in R.
- The `scripts` directory contains the main `R` scripts which use the functions defined in `R` and `src` to carry out BALLET analyses and produce the output figures and tables seen in the main article and supplement. For ease of reading, the scripts in this top-level directory are grouped into four sub-folders: `data_processing`, `illustrations`, `toy_challenge_analysis`, and `sky_survey_analysis`.
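
For readers who want to reuse the functions in `R` and `src` on a new dataset, the following is a minimal sketch of one way to load them into an R session. It is not taken from the repository's scripts: the helper name `load_repo_functions` is hypothetical, and the sketch assumes that every `.R` file in `R` can be sourced on its own and that every `.cpp` file in `src` is a standalone file with `// [[Rcpp::export]]` attributes that `Rcpp::sourceCpp` can compile. If the repository's scripts load the code differently, follow those instead.

```r
# Hypothetical helper (not part of the repository): load all R function
# definitions from R/ and compile all C++ sources from src/.
load_repo_functions <- function(repo_root = ".", envir = globalenv(),
                                compile_cpp = TRUE) {
  # Collect and source every .R file in the top-level R directory.
  r_files <- list.files(file.path(repo_root, "R"),
                        pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in r_files) {
    sys.source(f, envir = envir)
  }

  # Collect every .cpp file in src and compile it with Rcpp, which also
  # binds the exported C++ functions to R functions in `envir`.
  cpp_files <- list.files(file.path(repo_root, "src"),
                          pattern = "\\.cpp$", full.names = TRUE)
  if (compile_cpp && length(cpp_files) > 0) {
    if (!requireNamespace("Rcpp", quietly = TRUE)) {
      stop("The Rcpp package is required to compile the files in src.")
    }
    for (f in cpp_files) {
      Rcpp::sourceCpp(f, env = envir)
    }
  }

  # Return the paths that were found, invisibly, for inspection.
  invisible(list(r = r_files, cpp = cpp_files))
}
```

Calling `load_repo_functions()` from the repository root would then make the functions available in the global environment. Passing a fresh environment via `envir` keeps them separate from the rest of the workspace.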