Code of the Sequential Feature Forest Flow Model #16

Open
AngeClementAkazan wants to merge 3 commits into SamsungSAILMontreal:main from AngeClementAkazan:H3SF

Conversation

@AngeClementAkazan

Summary

Added a new Python file, Feature_Forest_Flow.py, that contains my models. I also imported the main function from this file into your script_generation.py file.

Changes

  • Added Feature_Forest_Flow.py
  • Imported feature_forest_flow from Feature_Forest_Flow in script_generation.py

Additional Information

  • The training by batch finally works well now (I used objective: multi:softprob as we discussed). However, I noticed that for a classifier learning rate lr > 0.2, it seems to output class probabilities whose sum is not equal to 1.
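
To make that check concrete, here is a minimal sketch (with made-up data, parameters, and a 3-class target, none of which come from this repository) that measures how far multi:softprob class probabilities drift from summing to 1 at a learning rate above 0.2:

```python
# Hypothetical reproduction sketch, not the repository's training code:
# measure how far multi:softprob outputs drift from summing to 1.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 3, size=200)  # 3 illustrative classes

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "multi:softprob",
    "num_class": 3,
    "eta": 0.3,  # learning rate above the 0.2 threshold mentioned above
}
booster = xgb.train(params, dtrain, num_boost_round=20)

proba = booster.predict(dtrain).reshape(-1, 3)
print("max |sum - 1|:", np.abs(proba.sum(axis=1) - 1).max())
```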

Ange-Clement Akazan and others added 3 commits August 3, 2024 18:25
…I have added it to the script_generation.py file
- I cleaned some things and tried to make it more like the original code
- I removed random_state
@AlexiaJM
Collaborator

AlexiaJM commented Aug 6, 2024

  • I cleaned some things and tried to make it more like the original code.
  • I put the Euler vs RK4 solver choice into generation only.

Comments:

  • Can you add comments before each function to explain what they do?

  • In IterForDMatrix, you are missing a case; please fill it. See "# MISSING CASE HERE".

  • Do you ever use the option one_hot_encoding=True? Doesn't this dummify the categorical data? If so, doesn't that make your method the same as the forest_flow baseline? If it does, you can remove it.

  • cat_y=True: could you remove this option and instead verify from label_y whether it is categorical or binary? Just to streamline the code to be more like the original one.

  • What is self.ngen, and why do you use it? Please clarify; this is important for the user.

  • Can you remove self.prediction_type, since this should be implicit from whether you use n_batch=0 (xgb.XGBClassifier and xgb.XGBRegressor) vs n_batch>=1 (xgb.train)? (A minimal dispatch sketch appears after this list.)

  • Maybe switch to RK4 as the default solver, since it performs better in your results? (See the solver sketch after this list.)

  • There are a lot of cases, so it would take me a lot of time to verify that everything is good, but if everything works properly in tests, I would be okay with it. So please test the method on synthetic data (2 random Gaussians as continuous variables and 4 categorical variables: 2 with 0/1, 2 with 0/1/2) with some missing data (np.nan) in both the continuous and categorical columns, under the following settings (a data-generation sketch follows this list):
    [Report, say, just one or two metrics like W2 on all settings, to verify that they all run without bugs and return a reasonable metric.]
    label_y=None, or with y (2 categories), or y (3 categories);
    model_type=HS3F or CS3F;
    model=xgboost or random_forest (you can remove random_forest if needed; it's not important);
    one_hot_encoding=True or False;
    remove_miss=True or False;
    p_in_one=True or False;
    try with only continuous data;
    try with only categorical data;
    -> For all the settings above, please try with n_batch=0 vs n_batch=10, because every setting needs to work in both cases, which are very different code-wise.
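
Regarding the self.prediction_type point above, here is a minimal, hypothetical sketch of inferring the training API from n_batch alone; the function and parameter names (fit_xgb, is_classification) are illustrative, not the actual code:

```python
# Hypothetical sketch of the dispatch suggested above: infer the training API
# from n_batch instead of keeping a separate self.prediction_type flag.
import xgboost as xgb

def fit_xgb(X, y, is_classification, n_batch=0, params=None, n_rounds=100):
    params = dict(params or {})
    if n_batch == 0:
        # In-memory training: the scikit-learn wrappers suffice.
        cls = xgb.XGBClassifier if is_classification else xgb.XGBRegressor
        model = cls(n_estimators=n_rounds, **params)
        model.fit(X, y)
        return model
    # Batched training: go through the native API (a DataIter-backed
    # DMatrix would feed the actual mini-batches).
    params["objective"] = ("multi:softprob" if is_classification
                           else "reg:squarederror")
    if is_classification:
        params["num_class"] = int(y.max()) + 1
    return xgb.train(params, xgb.DMatrix(X, label=y),
                     num_boost_round=n_rounds)
```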
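
Regarding the Euler vs RK4 default, this is a generic sketch of the two fixed-step solvers for an ODE dx/dt = v(x, t); `velocity` stands in for the learned vector field and is not the repository's actual interface:

```python
# Generic fixed-step solvers for dx/dt = v(x, t) on t in [0, 1].
# `velocity` and the signatures below are illustrative.
import numpy as np

def integrate(velocity, x0, n_steps=100, method="rk4"):
    x = np.asarray(x0, dtype=float).copy()
    h = 1.0 / n_steps
    for i in range(n_steps):
        t = i * h
        if method == "euler":
            x = x + h * velocity(x, t)  # first-order update
        else:  # classic fourth-order Runge-Kutta
            k1 = velocity(x, t)
            k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h)
            k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h)
            k4 = velocity(x + h * k3, t + h)
            x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x
```

RK4 costs four velocity evaluations per step instead of one, but its fourth-order accuracy usually allows far fewer steps for the same generation quality.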
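
And for the test request itself, here is a sketch of generating the synthetic data described above; the sample size and missingness rate are my own illustrative choices:

```python
# Sketch of the synthetic test data described above: 2 Gaussian continuous
# variables and 4 categorical variables (2 binary, 2 with three levels),
# with np.nan injected into every column.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500  # illustrative sample size

df = pd.DataFrame({
    "cont1": rng.normal(0.0, 1.0, n),
    "cont2": rng.normal(5.0, 2.0, n),
    "cat1": rng.integers(0, 2, n).astype(float),  # 0/1
    "cat2": rng.integers(0, 2, n).astype(float),  # 0/1
    "cat3": rng.integers(0, 3, n).astype(float),  # 0/1/2
    "cat4": rng.integers(0, 3, n).astype(float),  # 0/1/2
})

# Inject ~10% missing values into both continuous and categorical columns.
for col in df.columns:
    df.loc[rng.random(n) < 0.10, col] = np.nan

# Labels for the label_y settings: None, 2 categories, or 3 categories.
y2 = rng.integers(0, 2, n)
y3 = rng.integers(0, 3, n)
```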
