
Tayyabah-Rehman/Hyperparameter-Optimization


🏠 Linear-Regression

Linear Regression Development & Evaluation Pipeline | Trains a scikit-learn LinearRegression model and evaluates it with RMSE, MAE, and R² | Includes a predicted vs. actual scatter plot with a y = x reference line for visual model assessment.



📌 About

This project implements a complete Linear Regression pipeline on the California Housing dataset to predict median house values. It covers data loading, train/test splitting, model training, evaluation using three standard metrics, and a scatter plot visualization of predicted vs. actual values.

Course: Machine Learning, Task 3
Author: Tayyabah Rehman
Date: May 2026


📊 Results

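| Metric | Score | Meaning |
|---|---|---|
| RMSE | 0.7456 | Root-mean-square error of about $74,560 per prediction |
| MAE | 0.5332 | Mean absolute error of about $53,320 |
| R² | 0.5758 | Model explains about 57.6% of the variance in test-set house values |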
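For reference, these are the standard definitions of the three scores. Here y_i is an actual test value, ŷ_i is the model's prediction, ȳ is the mean actual value, and n is the number of test rows (4,128 here). MedHouseVal is expressed in units of $100,000, so RMSE and MAE convert to dollars by multiplying by 100,000. R² is unitless.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i - \hat{y}_i\right\rvert, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}

0.7456 \times \$100{,}000 = \$74{,}560, \qquad 0.5332 \times \$100{,}000 = \$53{,}320
```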

📁 Project Structure

Linear-Regression/
│
├── Task3_ML_Tayyabah_Rehman.ipynb   # Main notebook
├── predicted_vs_actual.png          # Output scatter plot
└── README.md                        # This file

⚙️ Setup & Installation

1. Clone the repository

git clone https://github.com/your-username/Linear-Regression.git
cd Linear-Regression

2. Install dependencies

pip install pandas numpy scikit-learn matplotlib jupyter

▶️ How to Run

Jupyter Notebook

jupyter notebook Task3_ML_Tayyabah_Rehman.ipynb

Then: Kernel → Restart & Run All

Google Colab

Open In Colab

No data files needed. The dataset is fetched through scikit-learn, which downloads it on first use and caches it locally, so the first run needs an internet connection.


🔢 Dataset

| Property | Value |
|---|---|
| Source | sklearn.datasets.fetch_california_housing |
| Rows | 20,640 |
| Features | 8 numerical |
| Target | MedHouseVal (median house value in $100,000s) |
| Missing values | 0 |
| Train / Test | 16,512 / 4,128 (80/20, random_state=42) |
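The 16,512 / 4,128 counts follow from the 80/20 ratio. Below is a minimal pure-Python sketch of that arithmetic. It assumes scikit-learn's rounding for a fractional test_size, which to our understanding rounds the test count up and gives the remainder to training. The helpers split_sizes and shuffled_split are hypothetical. shuffled_split only illustrates a seeded shuffle and does not reproduce the exact permutation that train_test_split draws for random_state=42.

```python
import math
import random
from typing import List, Tuple


def split_sizes(n_samples: int, test_size: float) -> Tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: mirrors scikit-learn's rounding for a float test_size,
    where the test count is rounded up and training gets the remainder.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


def shuffled_split(n_samples: int, test_size: float, seed: int) -> Tuple[List[int], List[int]]:
    """Hypothetical seeded index split; NOT scikit-learn's exact permutation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle for a fixed seed
    n_train, _ = split_sizes(n_samples, test_size)
    return indices[:n_train], indices[n_train:]


n_train, n_test = split_sizes(20_640, 0.2)
print(n_train, n_test)  # 16512 4128
```

Fixing the seed makes the shuffle, and therefore the split, repeatable across runs. random_state=42 serves the same purpose in the notebook.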

🔧 Pipeline Steps

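# Imports used by the pipeline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# 1. Load dataset (downloaded and cached by scikit-learn on first run)
housing = fetch_california_housing()
df = pd.DataFrame(housing.data, columns=housing.feature_names)
df['MedHouseVal'] = housing.target

# Separate the 8 features from the target column
X = df.drop(columns='MedHouseVal')
y = df['MedHouseVal']

# 2. Split 80/20
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Train model (ordinary least squares with an intercept)
model = LinearRegression()
model.fit(X_train, y_train)

# 4. Evaluate on the held-out test set
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
mae  = mean_absolute_error(y_test, y_pred)
r2   = r2_score(y_test, y_pred)

# 5. Plot predicted vs actual; points on the red y = x line are perfect predictions
plt.scatter(y_test, y_pred, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.savefig('predicted_vs_actual.png', dpi=300, bbox_inches='tight')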
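Steps 3 and 4 can be cross-checked without scikit-learn. With its default fit_intercept=True, LinearRegression fits an ordinary least-squares model, meaning it minimises the residual sum of squares. The NumPy-only sketch below solves the same problem with np.linalg.lstsq and computes RMSE, MAE, and R² directly from their definitions. The helpers fit_ols and regression_metrics are hypothetical names. The data is synthetic (8 random features with known coefficients), not the housing data.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray):
    """Ordinary least squares with an intercept (the objective LinearRegression minimises)."""
    # Prepend a column of ones so the first solved coefficient is the intercept
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray):
    """RMSE, MAE and R^2 computed straight from their definitions."""
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    mae = np.mean(np.abs(resid))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return rmse, mae, r2


# Hypothetical synthetic data: 8 features (like the housing set) with known coefficients
rng = np.random.default_rng(42)
X_syn = rng.normal(size=(1000, 8))
true_coef = np.arange(1.0, 9.0)
y_syn = 0.5 + X_syn @ true_coef + rng.normal(scale=0.1, size=1000)

intercept, coef = fit_ols(X_syn, y_syn)
y_hat = intercept + X_syn @ coef
rmse_syn, mae_syn, r2_syn = regression_metrics(y_syn, y_hat)
```

On the notebook's own variables, regression_metrics(y_test.to_numpy(), y_pred) should reproduce the step 4 scores up to floating-point rounding, because the scikit-learn metric functions use the same definitions.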

🧰 Libraries Used

| Library | Purpose |
|---|---|
| pandas | Data loading and handling |
| numpy | RMSE calculation |
| scikit-learn | Model, splitting, metrics |
| matplotlib | Scatter plot visualization |

⭐ Star this repo if you found it useful!

About

Hyperparameter Optimization: Grid Search vs Randomized Search | Systematic tuning of Titanic survival classifier with cross-validation, comparing default, grid-optimized, and randomized-optimized model performance metrics.
