We thank you for your time in reviewing our submission. We understand that you are performing a service to the community. To make reviewing this code as easy as possible, we have:
- Included a one-line script to recreate all results (`repro_results.sh`)
- Heavily commented and documented the code
- Included an overview of the codebase and the files
Thank you for reviewing our submission!
The files are organized as follows:
- the `data_structures` subdirectory contains all of the data structures used in our experiments, e.g., forests, trees, nodes, and histograms.
  - the `wrappers` subdirectory contains convenience classes to instantiate models of various types, e.g., a `RandomForestClassifier` is a forest classifier with `bootstrap=True` (indicating to draw a bootstrap sample of the `n` data points for each tree), `feature_subsampling=SQRT` (indicating to consider only `sqrt(F)` of the original `F` features at each node split), etc.
- the `experiments` subdirectory contains all the code for our core experiments:
  - `experiments/runtime_exps` contains the script (`compare_runtimes.py`) to reproduce the results of Tables 1 and 2, as well as the results of running that script (the files ending in `_profile` or `_dict`)
  - `experiments/budget_exps` contains the script (`compare_budgets.py`) to reproduce the results of Tables 3 and 4, as well as the results of running that script (the files ending in `_dict`)
  - `experiments/sklearn_exps` contains the script (`compare_baseline_implementations.py`) to reproduce the results of Table 6 in Appendix 4
  - `experiments/scaling_exps` contains the scripts (`investigate_scaling.py` and `make_scaling_plot.py`) to reproduce Appendix Figure 1 in Appendix 2
- the `tests` subdirectory contains tests that we wrote to verify the correctness of our implementations:
  - `tests/feature_importance_tests.py` is also used to regenerate the results in Table 5
    - You can reproduce the results for just Table 5 by running `tests/feature_importance_tests.py`. The results will be stored in the first 4 lines of the `tests/stat_test_stability_log/reproduce_stability.csv` file.
- the `utils` directory contains helper code for training forest-based models:
  - `utils/solvers.py` includes the core implementation of MABSplit in the `solve_mab()` function
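The authors' implementation of MABSplit is `solve_mab()` in `utils/solvers.py`. The code below is not that function. It is a simplified, hypothetical illustration of the general bandit idea, built on three assumptions:
- each arm is a (feature, threshold) pair;
- arms are scored by weighted Gini impurity on a growing random subsample of rows;
- arms are eliminated using a heuristic Hoeffding-style confidence width.

The names `bandit_best_split`, `batch_size`, `delta`, and `seed` are illustrative only.

```python
import math
import random


def gini(counts):
    """Gini impurity of a list of per-class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def split_impurity(left, right):
    """Weighted Gini impurity of a split, given per-class counts on each side."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    if n == 0:
        return 0.0
    return (n_left * gini(left) + n_right * gini(right)) / n


def bandit_best_split(X, y, arms, num_classes, batch_size=100, delta=0.01, seed=0):
    """Pick the (feature, threshold) arm with the lowest weighted Gini impurity.

    Successive elimination (hypothetical sketch, not solve_mab()): each round
    adds a batch of rows, sampled without replacement, to every surviving
    arm's class-count histograms, then drops arms whose lower confidence bound
    exceeds the best arm's upper bound. Stops when one arm survives or every
    row has been used, in which case the surviving arms' scores are exact.
    """
    rng = random.Random(seed)
    order = list(range(len(y)))
    rng.shuffle(order)  # walking a random permutation = sampling without replacement
    counts = {arm: ([0] * num_classes, [0] * num_classes) for arm in arms}
    active = list(arms)
    used = 0
    while len(active) > 1 and used < len(order):
        batch = order[used:used + batch_size]
        used += len(batch)
        for arm in active:
            feature, threshold = arm
            left, right = counts[arm]
            for i in batch:
                side = left if X[i][feature] <= threshold else right
                side[y[i]] += 1
        # Heuristic Hoeffding-style half-width (illustrative, not the paper's bound).
        width = math.sqrt(math.log(2 * len(arms) / delta) / (2 * used))
        estimates = {arm: split_impurity(*counts[arm]) for arm in active}
        best_upper = min(estimates.values()) + width
        active = [arm for arm in active if estimates[arm] - width <= best_upper]
    return min(active, key=lambda arm: split_impurity(*counts[arm]))


# Toy data: feature 0 separates the two classes at 0.5; feature 1 is noise.
rng = random.Random(42)
y = [rng.randrange(2) for _ in range(2000)]
X = [[label + rng.uniform(-0.3, 0.3), rng.uniform(0, 1)] for label in y]
arms = [(f, t) for f in (0, 1) for t in (0.25, 0.5, 0.75)]
print(bandit_best_split(X, y, arms, num_classes=2))  # prints (0, 0.5)
```

All arms are compared on the same subsample. Clearly bad splits, such as the noise feature here, are discarded after a few batches. Only close competitors consume the full dataset, and at that point their scores are exact.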
To reproduce the results in all the tables and the figure in Appendix 2, please run `repro_script.sh`. This may take many hours.
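The `bootstrap=True` and `feature_subsampling=SQRT` options described for the `RandomForestClassifier` wrapper correspond to two sampling steps:
- one bootstrap draw of row indices per tree;
- one draw of about `sqrt(F)` candidate features per node split.

The following is a minimal standard-library sketch, not the wrapper's actual code. It assumes `sqrt(F)` is rounded down and kept at least 1, as scikit-learn does for `max_features="sqrt"`. The helper names `bootstrap_indices` and `sqrt_feature_subset` are hypothetical.

```python
import math
import random


def bootstrap_indices(n, rng):
    """Draw n row indices uniformly with replacement (one bootstrap sample).

    Duplicates are expected; on average about 63% of the rows appear.
    """
    return [rng.randrange(n) for _ in range(n)]


def sqrt_feature_subset(num_features, rng):
    """Choose floor(sqrt(F)) distinct feature indices (at least 1), without replacement."""
    k = max(1, int(math.sqrt(num_features)))
    return sorted(rng.sample(range(num_features), k))


rng = random.Random(0)
tree_rows = bootstrap_indices(10, rng)         # drawn once per tree
node_features = sqrt_feature_subset(16, rng)   # redrawn at every node split; 4 of 16 features
print(tree_rows, node_features)
```

Note the difference in granularity. The bootstrap sample is fixed for a whole tree, while the feature subset is redrawn at every node split.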