KMN: Results generated by CDF function does not sum to 1 #17

@antemooo

Description

Dear maintainers,
For a conditional density estimation (CDE) task, I am trying to use your package to estimate the error distribution of a dependent variable y given a set of features X. My first attempt used the KMN model, created and trained with default parameters.
Creating the model, fitting it, and calling the other functions all work without errors. However, when looking at the values returned by the CDF function, I noticed that they do not converge to 1 (they do not sum to 1), no matter what I try. I have generated the CDF with different resolutions and range limits (the number of grid points evaluated for the instance being conditioned on, and the start and end of the linspace), but the issue persists. Below are three screenshots generated at different resolutions. You can also see the PDF results, which do match what I expect the conditional distribution to look like. The part of the code related to this issue is included at the end.

btw: I have also checked the mixture parameters, and the weights do seem to sum to 1 (I don't know if that is relevant).
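For reference, here is a minimal sketch of what normalization means for a Gaussian mixture, independent of the package. The weights, means, and standard deviations are hypothetical values chosen for illustration, not parameters of the trained KMN, and `mixture_pdf` / `mixture_cdf` are illustrative helpers rather than library functions. It shows that the PDF should integrate to about 1 over a wide enough range and the CDF should approach 1 at the upper limit, whereas the plain sum of CDF values over a grid is not expected to be 1.

```python
import numpy as np
from math import erf

# Hypothetical mixture parameters (illustration only, not from the trained model).
weights = np.array([0.2, 0.5, 0.3])   # sum to 1
means = np.array([-3.0, 0.0, 4.0])
stds = np.array([1.0, 0.5, 2.0])

def mixture_pdf(y):
    """Density of the Gaussian mixture evaluated at each point of y."""
    y = np.asarray(y, dtype=float)[:, None]
    comp = np.exp(-0.5 * ((y - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))
    return comp @ weights

def mixture_cdf(y):
    """Cumulative distribution of the Gaussian mixture at each point of y."""
    y = np.asarray(y, dtype=float)[:, None]
    z = (y - means) / (stds * np.sqrt(2.0))
    comp = 0.5 * (1.0 + np.vectorize(erf)(z))
    return comp @ weights

y_grid = np.linspace(-14, 14, 1000)
pdf_vals = mixture_pdf(y_grid)
cdf_vals = mixture_cdf(y_grid)

# Trapezoidal integral of the PDF over the grid: should be close to 1.
area = float(np.sum(0.5 * (pdf_vals[1:] + pdf_vals[:-1]) * np.diff(y_grid)))

print("integral of PDF over grid:", area)
print("CDF at upper limit:", cdf_vals[-1])
print("sum of CDF values (not expected to be 1):", cdf_vals.sum())
```

If the CDF returned by a model stays well below 1 at the upper limit while the PDF integrates to about 1 on the same grid, the two are inconsistent with each other, which is the symptom described above.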

Two questions in here:

  • The KMN model you implement seems to differ from the one described in the original KMN paper, especially regarding the number of Gaussians used in the output. Is that correct?
  • Did you ever come across this problem when testing or evaluating the model?

Remarks not related to the issue:

  • predict_density() does not exist, although it appears in the documentation.
  • What is the effect of n_centers? Isn't the output computed from the mixture of kernels/Gaussians whose centers are set using k-means? Can one still build that output from a specific subset of these Gaussians, such as 1 or 2?

CDF Results

Screenshot 2021-02-11 at 15 54 39
Screenshot 2021-02-11 at 15 57 58
Screenshot 2021-02-11 at 16 49 25

PDF Results

Screenshot 2021-02-11 at 17 00 33

The model and the function to generate the CDF/PDF results

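import numpy as np
from cde.density_estimator import KernelMixtureNetwork

# X_train, y_train, X_test, y_test are prepared elsewhere (21 features, 1 target).
model = KernelMixtureNetwork("KDE_1", ndim_x=21, ndim_y=1, n_centers=50,
                             x_noise_std=0.2, y_noise_std=0.1, random_seed=22)
model.fit(X_train, y_train, eval_set=(X_test, y_test))

def get_instance_to_draw(instance, lower_limit=-14, upper_limit=14, resolution=1000):
    # Repeat the single instance so that every y grid point is paired with the same x.
    x_dist = np.array([instance for _ in range(resolution)])
    # Evenly spaced y values over which the conditional density is evaluated.
    y_dist = np.linspace(lower_limit, upper_limit, num=resolution)
    pred_dist = model.pdf(x_dist, y_dist)
    # Conditional mean and standard deviation for this one instance.
    mean, std = model.mean_std(x_dist[0].reshape(1, -1))
    return x_dist, y_dist, pred_dist, mean, std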
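
As a cross-check, a CDF can also be approximated directly from PDF values on a grid like the `y_dist` / `pred_dist` pair returned above, using cumulative trapezoidal integration. This is only a sketch: `cdf_from_pdf_grid` is a hypothetical helper, not part of the package, and a standard normal density stands in for the output of `model.pdf` because no trained model is available here.

```python
import numpy as np

def cdf_from_pdf_grid(y_grid, pdf_vals):
    """Approximate the CDF on y_grid by cumulative trapezoidal integration of pdf_vals."""
    steps = np.diff(y_grid)
    increments = 0.5 * (pdf_vals[1:] + pdf_vals[:-1]) * steps
    # The CDF is taken as 0 at the lower limit and accumulates area from there.
    return np.concatenate(([0.0], np.cumsum(increments)))

# Stand-in for model.pdf output (assumption): a standard normal density on the grid.
y_dist = np.linspace(-14, 14, num=1001)
pred_dist = np.exp(-0.5 * y_dist ** 2) / np.sqrt(2.0 * np.pi)

cdf_est = cdf_from_pdf_grid(y_dist, pred_dist)
print("CDF at lower limit:", cdf_est[0])
print("CDF at y = 0:", cdf_est[500])
print("CDF at upper limit:", cdf_est[-1])
```

Comparing the last value of such a numerically integrated CDF with the last value of the model's own CDF on the same grid would show whether the PDF and CDF outputs agree.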
