Code of the Sequential Feature Forest Flow Model #16

Open
AngeClementAkazan wants to merge 3 commits into SamsungSAILMontreal:main from AngeClementAkazan:H3SF

Conversation

@AngeClementAkazan

Summary

Added a new Python file, Feature_Forest_Flow.py, that contains my models. I also imported the main function from this file into your script_generation.py file.

Changes

  • Added Feature_Forest_Flow.py
  • Imported feature_forest_flow from Feature_Forest_Flow in script_generation.py

Additional Information

  • The training by batch finally works well now (I used objective: multi:softprob as we discussed). However, I noticed that for a classifier learning rate lr > 0.2, it seems to output class probabilities whose sum is not equal to 1.
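
To make that check concrete, here is a minimal sketch (with made-up data, parameters, and a 3-class target, none of which come from this repository) that measures how far multi:softprob class probabilities drift from summing to 1 at a learning rate above 0.2:

```python
# Hypothetical reproduction sketch, not the repository's training code:
# measure how far multi:softprob outputs drift from summing to 1.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 3, size=200)  # 3 illustrative classes

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "multi:softprob",
    "num_class": 3,
    "eta": 0.3,  # learning rate above the 0.2 threshold mentioned above
}
booster = xgb.train(params, dtrain, num_boost_round=20)

proba = booster.predict(dtrain).reshape(-1, 3)
print("max |sum - 1|:", np.abs(proba.sum(axis=1) - 1).max())
```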

Ange-Clement Akazan and others added 3 commits August 3, 2024 18:25
…I have added it to the script_generation.py file
- I cleaned some things and tried to make it more like the original code
- I removed random_state
@AlexiaJM
Collaborator

AlexiaJM commented Aug 6, 2024

  • I cleaned some things and tried to make it more like the original code.
  • I put the Euler vs RK4 solver choice into generation only.

Comments:

  • Can you add comments before each function to explain what they do?

  • In IterForDMatrix, you are missing a case; please fill it. See "# MISSING CASE HERE".

  • Do you ever use the option one_hot_encoding=True? Doesn't this dummify the categorical data? If so, doesn't that make your method the same as the forest_flow baseline? If it does, you can remove it.

  • cat_y=True: could you remove this option and instead verify from label_y whether it is categorical or binary? Just to streamline the code to be more like the original one.

  • What is self.ngen, and why do you use it? Please clarify; this is important for the user.

  • Can you remove self.prediction_type, since this should be implicit from whether you use n_batch=0 (xgb.XGBClassifier and xgb.XGBRegressor) vs n_batch>=1 (xgb.train)? (A minimal dispatch sketch appears after this list.)

  • Maybe switch to RK4 as the default solver, since it performs better in your results? (See the solver sketch after this list.)

  • There are a lot of cases, so it would take me a lot of time to verify that everything is good, but if everything works properly in tests, I would be okay with it. So please test the method on synthetic data (2 random Gaussians as continuous variables and 4 categorical variables: 2 with 0/1, 2 with 0/1/2) with some missing data (np.nan) in both the continuous and categorical columns, under the following settings (a data-generation sketch follows this list):
    [Report, say, just one or two metrics like W2 on all settings, to verify that they all run without bugs and return a reasonable metric.]
    label_y=None, or with y (2 categories), or y (3 categories);
    model_type=HS3F or CS3F;
    model=xgboost or random_forest (you can remove random_forest if needed; it's not important);
    one_hot_encoding=True or False;
    remove_miss=True or False;
    p_in_one=True or False;
    try with only continuous data;
    try with only categorical data;
    -> For all the settings above, please try with n_batch=0 vs n_batch=10, because every setting needs to work in both cases, which are very different code-wise.
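
Regarding the self.prediction_type point above, here is a minimal, hypothetical sketch of inferring the training API from n_batch alone; the function and parameter names (fit_xgb, is_classification) are illustrative, not the actual code:

```python
# Hypothetical sketch of the dispatch suggested above: infer the training API
# from n_batch instead of keeping a separate self.prediction_type flag.
import xgboost as xgb

def fit_xgb(X, y, is_classification, n_batch=0, params=None, n_rounds=100):
    params = dict(params or {})
    if n_batch == 0:
        # In-memory training: the scikit-learn wrappers suffice.
        cls = xgb.XGBClassifier if is_classification else xgb.XGBRegressor
        model = cls(n_estimators=n_rounds, **params)
        model.fit(X, y)
        return model
    # Batched training: go through the native API (a DataIter-backed
    # DMatrix would feed the actual mini-batches).
    params["objective"] = ("multi:softprob" if is_classification
                           else "reg:squarederror")
    if is_classification:
        params["num_class"] = int(y.max()) + 1
    return xgb.train(params, xgb.DMatrix(X, label=y),
                     num_boost_round=n_rounds)
```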
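
Regarding the Euler vs RK4 default, this is a generic sketch of the two fixed-step solvers for an ODE dx/dt = v(x, t); `velocity` stands in for the learned vector field and is not the repository's actual interface:

```python
# Generic fixed-step solvers for dx/dt = v(x, t) on t in [0, 1].
# `velocity` and the signatures below are illustrative.
import numpy as np

def integrate(velocity, x0, n_steps=100, method="rk4"):
    x = np.asarray(x0, dtype=float).copy()
    h = 1.0 / n_steps
    for i in range(n_steps):
        t = i * h
        if method == "euler":
            x = x + h * velocity(x, t)  # first-order update
        else:  # classic fourth-order Runge-Kutta
            k1 = velocity(x, t)
            k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h)
            k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h)
            k4 = velocity(x + h * k3, t + h)
            x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x
```

RK4 costs four velocity evaluations per step instead of one, but its fourth-order accuracy usually allows far fewer steps for the same generation quality.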
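
And for the test request itself, here is a sketch of generating the synthetic data described above; the sample size and missingness rate are my own illustrative choices:

```python
# Sketch of the synthetic test data described above: 2 Gaussian continuous
# variables and 4 categorical variables (2 binary, 2 with three levels),
# with np.nan injected into every column.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500  # illustrative sample size

df = pd.DataFrame({
    "cont1": rng.normal(0.0, 1.0, n),
    "cont2": rng.normal(5.0, 2.0, n),
    "cat1": rng.integers(0, 2, n).astype(float),  # 0/1
    "cat2": rng.integers(0, 2, n).astype(float),  # 0/1
    "cat3": rng.integers(0, 3, n).astype(float),  # 0/1/2
    "cat4": rng.integers(0, 3, n).astype(float),  # 0/1/2
})

# Inject ~10% missing values into both continuous and categorical columns.
for col in df.columns:
    df.loc[rng.random(n) < 0.10, col] = np.nan

# Labels for the label_y settings: None, 2 categories, or 3 categories.
y2 = rng.integers(0, 2, n)
y3 = rng.integers(0, 3, n)
```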
