To execute the code:
- Place your datasets as CSV files in the `datasets/` folder.
- Ensure your CSV files follow this format:
  - Column headers: `var-0,var-1,var-2,var-3,...,var-n,ETIQ`
  - Values should be normalized to the range [0, 1].
  - `ETIQ` is the target column for classification.
- Run the main script: `python main.py`
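The format requirements above can be applied and checked before running the main script. The following is a minimal sketch using only the Python standard library. It is not part of the repository: the helper names (`expected_headers`, `min_max_normalize`, `to_gaufs_csv`, `validate_csv_text`) are hypothetical, and min-max scaling is only one assumed way to bring values into [0, 1].

```python
import csv
import io


def expected_headers(n_features):
    """Header row for n_features feature columns plus the ETIQ target column."""
    return [f"var-{i}" for i in range(n_features)] + ["ETIQ"]


def min_max_normalize(rows):
    """Scale each feature column to [0, 1] (assumed normalization scheme)."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


def to_gaufs_csv(features, labels):
    """Return CSV text with headers var-0..var-n,ETIQ and normalized features."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(expected_headers(len(features[0])))
    for row, label in zip(min_max_normalize(features), labels):
        writer.writerow(row + [label])
    return buf.getvalue()


def validate_csv_text(text):
    """Return a list of format problems found in the CSV text (empty if none)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    problems = []
    if header != expected_headers(len(header) - 1):
        problems.append(f"unexpected header: {header}")
    for line_no, row in enumerate(reader, start=2):
        for value in row[:-1]:  # the last column is ETIQ, not a feature
            if not 0.0 <= float(value) <= 1.0:
                problems.append(f"line {line_no}: value {value} outside [0, 1]")
    return problems


if __name__ == "__main__":
    text = to_gaufs_csv([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]], [0, 1, 0])
    print(text)
    print(validate_csv_text(text))
```

The resulting text can be written to a file inside `datasets/`. Whether min-max scaling suits a given dataset is a modeling decision that this sketch does not address.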
After execution, results will be generated in the results/ folder. The script will automatically process all CSV files in the datasets/ folder and generate analysis results including clustering metrics and visualizations.
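As an illustration of the batch behavior described above, the sketch below lists every CSV file in `datasets/` and pairs it with an output location under `results/`. This is not the logic of `main.py`: the function name `plan_runs` and the one-folder-per-dataset layout are assumptions made only for the example.

```python
from pathlib import Path


def plan_runs(datasets_dir="datasets", results_dir="results"):
    """Pair each CSV in datasets_dir with a hypothetical per-dataset output folder."""
    datasets = Path(datasets_dir)
    results = Path(results_dir)
    # sorted() makes the processing order deterministic across platforms.
    return [(path, results / path.stem) for path in sorted(datasets.glob("*.csv"))]


if __name__ == "__main__":
    for csv_path, out_dir in plan_runs():
        print(f"{csv_path} -> {out_dir}")
```

Running it from the repository root prints one line per dataset that would be picked up, which is a quick way to confirm that the input files are in the expected place.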
- `datasets/` - Input CSV files for analysis
- `results/` - Generated output files and visualizations
- `src/` - Source code modules
  - `build_test_genetic_2.py` - Main test builder
  - `genetic2_parallel.py` - Parallel genetic algorithm implementation
  - `alg_clustering.py` - Clustering algorithms and metrics
  - `analysis_weighted_variables_num_cluster.py` - Variable analysis tools
  - `data_generators` - Methods related to synthetic data generation. The file can be executed to generate an example of a corners dataset at `./datasets/synthetic_data_corners.csv`
- `main.py` - Main execution script for the GAUFS algorithm
- `/comparison/` - Files used for comparison with the AutoUFS tool
  - `dataset-papers/` - Input CSV files for the comparison
  - `results-papers/` - Output results, containing one folder for each compared dataset
  - `AutoUFSTool-main/` - Folder cloned from the AutoUFS-tool GitHub repository
  - `main-comparison.m` - MATLAB script to run the comparison
  - `alg_clustering.py` - Python module with utility functions for clustering
  - `automate_v2.py` - Python script to run the automatic comparison process
  - `datasets_mat.ipynb` - Jupyter notebook with tools for converting CSV files into MATLAB structures