Analysis of Key Variables in Data Sets

Overview

An interactive Streamlit app that loads a user-supplied dataset, automatically detects whether the target column calls for classification or regression, and runs PyCaret experiments to compare models and to visualize feature importance, confusion matrices, and ROC curves.
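The exact detection rule is not documented here. As a minimal sketch, assuming a simple dtype-and-cardinality heuristic, a target column could be labeled as follows. The function name infer_problem_type and the max_classes cutoff of 20 are hypothetical choices, not the app's actual code.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_float_dtype, is_numeric_dtype


def infer_problem_type(target: pd.Series, max_classes: int = 20) -> str:
    """Guess whether a target column calls for classification or regression.

    Hypothetical heuristic for illustration; the app's real rule may differ.
    """
    values = target.dropna()
    # Text, categorical, and boolean targets are treated as class labels.
    # (Booleans are checked first because pandas also counts them as numeric.)
    if is_bool_dtype(values) or not is_numeric_dtype(values):
        return "classification"
    # Floats with a fractional part suggest a continuous quantity.
    if is_float_dtype(values) and not (values % 1 == 0).all():
        return "regression"
    # Whole-number targets with few distinct values look like encoded classes.
    if values.nunique() <= max_classes:
        return "classification"
    return "regression"
```

A call such as infer_problem_type(df[target_column]) would then decide whether to build a ClassificationExperiment or a RegressionExperiment. A cardinality cutoff misjudges some targets (for example, an integer rating from 1 to 10 could be either), so letting the user override the guess is a sensible design choice.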

Main functionalities

Upload of CSV, JSON, or XLSX files, with a user-selected separator for CSV input.
Random sampling of a user-defined percentage of the rows for training.
Automatic detection of the problem type (classification vs. regression).
Interactive tabs for previewing random rows, analyzing missing data, and setting up the model.
Setup of a PyCaret ClassificationExperiment or RegressionExperiment, with options to ignore columns, balance classes (classification only), normalize, and transform features.
Model comparison with compare_models over a user-selected set of models.
Generation of feature importance plots, confusion matrices, and ROC curves.
Saving and loading of pre-generated plot images for later reuse.
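The ingestion steps above (file loading, percentage sampling, and the missing-data view) can be sketched with pandas alone. This is a minimal illustration, not the app's code: the helper names load_dataset, sample_percentage, and missing_summary, the fixed seed of 42, and applying the separator only to CSV files are all assumptions. Reading XLSX files also assumes the optional openpyxl package is installed.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path: str, sep: str = ",") -> pd.DataFrame:
    """Read a CSV, JSON, or XLSX file, choosing the reader by file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        # The user-selected separator only matters for delimited text.
        return pd.read_csv(path, sep=sep)
    if suffix == ".json":
        return pd.read_json(path)
    if suffix == ".xlsx":
        # Requires the optional openpyxl dependency.
        return pd.read_excel(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")


def sample_percentage(df: pd.DataFrame, percent: float, seed: int = 42) -> pd.DataFrame:
    """Randomly keep `percent` % of the rows; a fixed seed makes it reproducible."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return df.sample(frac=percent / 100, random_state=seed)


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most missing first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)
```

In a Streamlit app, st.file_uploader returns a file-like object whose name attribute carries the extension, so the same dispatch could key off that name instead of a filesystem path.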

Technologies & skills

Python 3.x
Streamlit
Pandas, NumPy
PyCaret (classification, regression)
Matplotlib, Seaborn
scikit-learn metrics

Project Report

The app demonstrates an end-to-end machine learning workflow: data ingestion → preprocessing → model training → evaluation.
It is designed to let non-technical users quickly assess which algorithm performs best on their dataset.
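For the evaluation step, PyCaret draws the confusion-matrix and ROC plots itself. As a hedged illustration of what those plots summarize, independent of PyCaret, the same quantities can be computed directly with scikit-learn metrics for a binary problem. The function name evaluate_binary and the default 0.5 decision threshold are assumptions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve


def evaluate_binary(y_true, y_score, threshold: float = 0.5) -> dict:
    """Confusion matrix and ROC summary for a binary classifier's scores."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    # Turn predicted probabilities into hard labels at the chosen threshold.
    y_pred = (y_score >= threshold).astype(int)
    # Rows are true classes, columns are predicted classes (labels 0, then 1).
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    # roc_curve sweeps every threshold; the AUC condenses the curve to one number.
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return {"confusion_matrix": cm, "fpr": fpr, "tpr": tpr,
            "auc": roc_auc_score(y_true, y_score)}
```

The confusion matrix depends on the chosen threshold, while the ROC curve and its AUC do not, which is why the two views complement each other when comparing models.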