L2 Regularization (Weight Decay)

Kenneth Wong

2024-09-12

10 min read

A brief introduction to regularization and weight decay, specifically L2 regularization and how it helps with model training.

ML

deep learning

regularization


 
[Figure: formula]
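For reference, the standard L2-regularized objective analyzed in this post adds a penalty on the squared norm of the weights:

$$\tilde{J}(w; X, y) = J(w; X, y) + \frac{\alpha}{2} w^\top w,$$

where $\alpha$ controls the strength of the penalty.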
 
 

Quadratic MSE Approximation 

[Figure: quadratic MSE approximation]
This analysis requires us to build a quadratic approximation of the loss at a weight w around the optimal weight w*.

I tried my best to reason about the variables, since they were not explained in the book, so I would not be surprised if some of this is wrong. I am especially unconfident about the reasoning behind the Hessian and the 1/2 factor.
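My best guess: both come from a second-order Taylor expansion of $J$ around $w^*$. Since $w^*$ is a minimum, the first-order (gradient) term vanishes, leaving only the constant and the Hessian term with its $\frac{1}{2}$:

$$\hat{J}(w) = J(w^*) + \frac{1}{2}(w - w^*)^\top H (w - w^*),$$

where $H$ is the Hessian of $J$ evaluated at $w^*$.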
 

Adding Weight Regularization

[Figure: weight regularization]
After the previous step of taking the derivative with respect to w, we add the weight regularization term and find that the components of the solution along the eigenvectors of the Hessian have been rescaled.
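Sketching the standard derivation: add the penalty $\frac{\alpha}{2} w^\top w$ to the quadratic approximation, set the gradient to zero, and eigendecompose the Hessian as $H = Q \Lambda Q^\top$:

$$\alpha \tilde{w} + H(\tilde{w} - w^*) = 0 \;\Rightarrow\; \tilde{w} = (H + \alpha I)^{-1} H w^* = Q(\Lambda + \alpha I)^{-1} \Lambda Q^\top w^*,$$

so the component of $w^*$ along the $i$-th eigenvector of $H$ is rescaled by $\frac{\lambda_i}{\lambda_i + \alpha}$.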
 

Visualization 

[Figure: contour plot of the regularized vs. unregularized optimum]
According to our previous analysis, components along directions with small ("weaker") eigenvalues get scaled down the most, meaning the regularization preserves the directions the loss actually cares about. So in our contour graph, w1 gets pulled close to 0, whereas w2 barely changes from its original value.
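Here's a minimal numerical sketch of that effect (the Hessian and all numbers are made up for illustration):

```python
import numpy as np

# Hypothetical quadratic loss: the w1 direction is flat (small eigenvalue),
# the w2 direction is steep (large eigenvalue).
H = np.diag([0.1, 10.0])       # Hessian with eigenvalues 0.1 and 10
w_star = np.array([2.0, 2.0])  # unregularized optimum
alpha = 1.0                    # L2 regularization strength

# Minimizer of the regularized quadratic approximation:
# w_tilde = (H + alpha*I)^{-1} H w*
w_tilde = np.linalg.solve(H + alpha * np.eye(2), H @ w_star)

print(np.diag(H) / (np.diag(H) + alpha))  # scaling factors: [~0.09, ~0.91]
print(w_tilde)                            # [~0.18, ~1.82]: w1 decays, w2 survives
```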
 

Underdetermined Problem

[Figure: underdetermined problem]
L2 regularization also helps with underdetermined problems, where the covariance matrix X^T X is singular and the closed-form solutions for linear regression or PCA (which require inverting it) break down. With the additional alpha term, every eigenvalue is shifted up by alpha, so X^T X + alpha*I is guaranteed to be invertible.
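A minimal sketch with made-up data: with more features than samples, X^T X is rank-deficient, but the ridge closed form $w = (X^\top X + \alpha I)^{-1} X^\top y$ is still well-defined:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))  # 3 samples, 5 features: underdetermined
y = rng.normal(size=3)
alpha = 0.1

gram = X.T @ X
print(np.linalg.matrix_rank(gram))  # 3 < 5, so X^T X is singular

# Ridge regression closed form: w = (X^T X + alpha*I)^{-1} X^T y
w = np.linalg.solve(gram + alpha * np.eye(5), X.T @ y)
print(w)  # a well-defined solution despite the singular Gram matrix
```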

Let's Connect 🍵

You made it to the end of my blog! I hope you enjoyed reading it and got something out of it. If you are interested in connecting with me, feel free to reach out on LinkedIn. I'm always up for a chat or to work on exciting projects together!