Analysis_of_Key_Variables_in_Data_Sets


Go to the application

Overview

Interactive Streamlit app that loads a user‑supplied dataset, automatically detects whether the target column calls for classification or regression, and runs PyCaret experiments to compare models and visualize feature importance, confusion matrices, and ROC curves.
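
The detection rule is not spelled out in this README. As a rough sketch of one common heuristic (not necessarily the one the app uses), text, boolean, and low‑cardinality whole‑number targets can be treated as classification and everything else as regression. The function name `detect_problem_type` and the `max_classes` threshold are assumptions made for illustration.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_float_dtype, is_numeric_dtype


def detect_problem_type(target: pd.Series, max_classes: int = 20) -> str:
    """Guess whether a target column suggests classification or regression.

    Hypothetical heuristic; `max_classes` is an assumed threshold.
    """
    values = target.dropna()
    # Text, categorical, and boolean targets are treated as class labels.
    if is_bool_dtype(values) or not is_numeric_dtype(values):
        return "classification"
    # Floats with a fractional part are treated as continuous.
    if is_float_dtype(values) and not (values % 1 == 0).all():
        return "regression"
    # Whole-number targets: few distinct values suggest class labels.
    return "classification" if values.nunique() <= max_classes else "regression"
```

A threshold like this misfires on edge cases, such as a count target with few distinct values, so an app built this way would probably want to let the user override the guess.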

Main functionalities

  • Upload CSV/JSON/XLSX files with custom separator selection.
  • Random sampling of a user‑defined percentage of the dataframe for training.
  • Automatic detection of problem type (classification vs regression).
  • Interactive tabs: random rows, missing data analysis, and model setup.
  • PyCaret ClassificationExperiment or RegressionExperiment setup with options to ignore columns, balance classes, normalize, and transform features.
  • Model comparison (compare_models) with selectable model sets.
  • Generation of feature importance plots, confusion matrices, and ROC curves.
  • Save and load pre‑generated plot images for later use.
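
The setup and comparison steps listed above could look roughly like the following with PyCaret's object‑oriented API. This is a sketch under assumptions, not the app's actual code: the helper names `setup_options` and `run_experiment` are hypothetical, and the mapping from UI options to the `ignore_features`, `fix_imbalance`, `normalize`, and `transformation` arguments of `setup` is a guess at how the options are wired.

```python
import pandas as pd


def setup_options(problem_type, ignore_columns=(), balance=False,
                  normalize=False, transform=False):
    """Translate UI choices into PyCaret `setup` keyword arguments (assumed mapping)."""
    options = {
        "ignore_features": list(ignore_columns),
        "normalize": normalize,
        "transformation": transform,
    }
    # Class balancing only applies to classification experiments.
    if problem_type == "classification":
        options["fix_imbalance"] = balance
    return options


def run_experiment(df: pd.DataFrame, target: str, problem_type: str,
                   include=None, **ui_choices):
    """Set up the matching experiment and compare models (hypothetical helper)."""
    # Imported lazily so the option helper above works without PyCaret installed.
    if problem_type == "classification":
        from pycaret.classification import ClassificationExperiment as Experiment
    else:
        from pycaret.regression import RegressionExperiment as Experiment

    exp = Experiment()
    exp.setup(data=df, target=target, session_id=42,
              **setup_options(problem_type, **ui_choices))
    # `include` restricts the comparison to a list of model IDs, e.g. ["lr", "rf"].
    best = exp.compare_models(include=include)
    return exp, best
```

With the returned experiment, `exp.plot_model(best, plot="feature", save=True)` writes a feature importance image to disk, and `plot="confusion_matrix"` or `plot="auc"` do the same for the classification plots. Feature importance is only available for models that expose coefficients or importances.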

Technologies & skills

  • Python 3.x
  • Streamlit
  • Pandas, NumPy
  • PyCaret (classification, regression)
  • Matplotlib, Seaborn
  • Scikit‑learn metrics

Project Report

  • The app demonstrates an end‑to‑end machine learning workflow: data ingestion → preprocessing → model training → evaluation.
  • It is designed to let non‑technical users quickly assess which algorithm works best on their dataset.

Sample photos

  • built‑in dataset
  • you can load your own data in the initial setup
  • preliminary data analysis
  • detection of a problem and model comparison range
  • Feature Importance
  • Confusion Matrix
  • ROC Curves

Application usage

  • Upload a dataset.
  • Choose separator and target column.
  • Inspect random rows and missing data.
  • Run model comparison and view evaluation plots.
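
Outside the Streamlit UI, the first three steps above can be approximated in plain pandas. The sketch below is illustrative only: `load_dataset`, `sample_rows`, and `missing_summary` are hypothetical helper names, and choosing the reader from the file extension is an assumption about how the upload is handled.

```python
import pandas as pd


def load_dataset(source, file_name: str, separator: str = ",") -> pd.DataFrame:
    """Read CSV, JSON, or XLSX based on the file extension (assumed dispatch)."""
    suffix = file_name.rsplit(".", 1)[-1].lower()
    if suffix == "csv":
        return pd.read_csv(source, sep=separator)
    if suffix == "json":
        return pd.read_json(source)
    if suffix == "xlsx":
        return pd.read_excel(source)  # requires the openpyxl package
    raise ValueError(f"Unsupported file type: {suffix}")


def sample_rows(df: pd.DataFrame, percent: float, seed: int = 42) -> pd.DataFrame:
    """Return a reproducible random sample of `percent` percent of the rows."""
    return df.sample(frac=percent / 100, random_state=seed)


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, for columns that have any."""
    summary = pd.DataFrame({
        "missing": df.isna().sum(),
        "percent": df.isna().mean() * 100,
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)
```

For example, `sample_rows(load_dataset("data.csv", "data.csv", separator=";"), 50)` would read a semicolon‑separated file and keep half of its rows, where `data.csv` is a placeholder path.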
