A Brief Introduction to Hyperparameter Optimization (for intermediate-level learners)

Vishnu vardhan Varapalli · Published in Analytics Vidhya · Mar 1, 2020 · 4 min read


First, let’s understand what a hyperparameter is:-

Hyperparameters are the parameters of an algorithm that we set before training; their values directly control the behavior and performance of the model.

Don’t get confused about the difference between hyperparameters and model parameters.

Model parameters, on the other hand, are internal to the model; these internal values are calculated (learned) from the given data (the training data, we can say).

You’ll get more clarity on this difference once you see some examples of both,

So, here I’ll show examples using the Linear Regression, Random Forests, and K-Means clustering algorithms (assuming that you’ve worked with these algorithms).

Hyperparameters:-

Linear regression:-

In linear regression, fit_intercept is a hyperparameter,

which means,

fit_intercept = whether or not to include the intercept term (the alpha in the equation further below) in the functional form.

Random Forests:-

In Random Forests, two common hyperparameters are

  1. n_estimators
  2. criterion

which mean,

n_estimators = the number of trees in the forest,

criterion = the function that measures the quality of a split, either “gini” (Gini impurity) or “entropy” (information gain).

K-means Clustering:-

In K-Means, one hyperparameter is

  1. init

which means,

init = initialization method for the centroids.
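
To make this concrete, here is a minimal sketch (in scikit-learn, which the later snippets also use) of where these hyperparameters get set. The specific values (100 trees, 3 clusters) are just illustrative choices, and n_clusters is another K-Means hyperparameter beyond init:

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

# Hyperparameters are chosen by us, up front, when creating the estimator:
lin_reg = LinearRegression(fit_intercept=True)  # include the intercept term
rf = RandomForestClassifier(n_estimators=100,   # number of trees in the forest
                            criterion='gini')   # split-quality measure: 'gini' or 'entropy'
km = KMeans(n_clusters=3, init='k-means++')     # centroid initialization method

No data has been seen at this point; that is exactly what makes these hyperparameters rather than model parameters.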

Model Parameters:-

Linear Regression:-

In linear regression, when we train the model on the training data, a best-fit line is formed, which is,

Y = alpha + Beta*X

where X = the input data,

Y = the labelled data (while training)


So here, alpha and Beta are the model parameters; they are calculated from the training data.
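
To see the contrast in code, here is a minimal sketch with made-up toy data: fit_intercept is a hyperparameter we set, while alpha and Beta are model parameters the model learns, exposed by scikit-learn as intercept_ and coef_.

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from Y = 2 + 3*X (values made up for illustration)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2 + 3 * X.ravel()

model = LinearRegression(fit_intercept=True)  # hyperparameter: set by us
model.fit(X, y)                               # model parameters: learned from the data

print(model.intercept_)  # alpha, approximately 2.0
print(model.coef_)       # Beta, approximately [3.0]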

Random Forests and K-Means don’t have a single neat formula like this to show you; their model parameters are the learned split points of the trees and the learned cluster centroids, respectively.

For SVMs (Support Vector Machines), the support vectors (together with their learned coefficients) are the model parameters.

So for now, I hope you’ve got an idea of the difference between model parameters and hyperparameters.

Let’s get to the main part now,

  1. What is the use of optimizing the hyperparameters of an algorithm?

A) As we discussed above, hyperparameters are the parameters that have great control over the performance of an algorithm. So, by optimizing the hyperparameters, we fit the algorithm to our data in a better way, which increases the accuracy of our model.

We can define Hyperparameter Optimization as ‘the process of assigning optimized values to the hyperparameters of an algorithm, in order to get better accuracy’.

  2. How can we apply Hyperparameter Optimization to an algorithm?

A) Here we’ll discuss two of the most popular techniques of Hyperparameter Optimization: Random Search and Grid Search.

a. Random Search:-

In this technique, Random Search sets up a grid (or distributions) of hyperparameter values and selects random combinations of those values to train and score the model.

Just go through the code snippet below to understand Random Search:-

from pprint import pprint
from scipy.stats import uniform, truncnorm, randint
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# get the iris data
iris = datasets.load_iris()
X = iris.data
y = iris.target

# distributions to sample the hyperparameter values from
model_params = {
    'n_estimators': randint(4, 200),                           # number of trees
    'max_features': truncnorm(a=0, b=1, loc=0.25, scale=0.1),  # fraction of features per split
    'min_samples_split': uniform(0.01, 0.199)                  # min fraction of samples to split a node
}

rf_model = RandomForestClassifier()

# try 100 random combinations, scoring each with 5-fold cross-validation
clf = RandomizedSearchCV(rf_model, model_params, n_iter=100, cv=5, random_state=1)
model = clf.fit(X, y)

# print the hyperparameters of the best model found
pprint(model.best_estimator_.get_params())

Don’t worry about pieces of the code like ‘what is that pprint?’ and things like that, because you’ll pick all of those up while practicing.

Grid search:-

In this technique, Grid Search will train the algorithm with every possible combination of the hyperparameter values.

We can measure the performance by using the Cross-Validation technique here.

One of the best ways to implement cross-validation is K-Fold Cross-Validation.

This technique ensures that the model gets trained and validated on all the patterns and behavior of the data in the dataset.
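
As a standalone sketch of what cv=5 means in the snippets in this article: the data is split into 5 folds, the model is trained on 4 of them and scored on the held-out one, and this repeats 5 times so every fold serves as validation data once. (The estimator settings below are just illustrative.)

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()

# 5-fold cross-validation: 5 train/validation splits, one accuracy score per fold
scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0),
                         iris.data, iris.target, cv=5)
print(scores)         # one score per fold
print(scores.mean())  # the average is the cross-validated score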

Just go through the code snippet below to understand the Grid Search:-

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV

# load the red wine quality dataset
dataset = pd.read_csv(r"D:/Datasets/winequality-red.csv", sep=';')
X = dataset.iloc[:, 0:11].values
y = dataset.iloc[:, 11].values

# hold out 20% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

classifier = RandomForestClassifier(random_state=0)

# every combination in this grid will be tried: 5 x 2 = 10 combinations
grid_param = {
    'n_estimators': [100, 300, 500, 800, 1000],
    'criterion': ['gini', 'entropy'],
}

# score each combination with 5-fold cross-validation on the training set
gd_sr = GridSearchCV(
    estimator=classifier,
    param_grid=grid_param,
    scoring='accuracy',
    cv=5)
gd_sr.fit(X_train, y_train)

# the best combination found, and its cross-validated accuracy
best_parameters = gd_sr.best_params_
print(best_parameters)
best_result = gd_sr.best_score_
print(best_result)

I am saying it once again: don’t bother about the code at first; try to learn the core concept first. While coding, you will get familiar with these things easily.

Some important notes:-

→ If you are okay with the computational power and cost, then go for Grid Search, because Grid Search needs more computational power than Random Search.

Random Search will just select random combinations of hyperparameter values, whereas Grid Search will find the optimal hyperparameter values within the grid by trying all possible combinations. That’s why Grid Search needs more computational power than Random Search.
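
A quick back-of-the-envelope example, using the 5 × 2 grid from the Grid Search snippet above (the n_iter value here is just an illustrative choice):

# Grid Search: every combination in the grid, times every fold
grid_combinations = 5 * 2          # 5 n_estimators values x 2 criteria = 10
folds = 5
print(grid_combinations * folds)   # 50 model fits

# Random Search: only n_iter sampled combinations, however large the grid is
n_iter = 4
print(n_iter * folds)              # 20 model fits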

→ Even though Grid Search takes more computational power, it will usually give a more accurate result than Random Search, because it never skips a combination in the grid.

→ But Random Search can give good results with much lower computational power and cost.

Happy reading!✌✌
