## Training and validation
Training a machine learning model requires data, but we also need to measure how well the model performs on data it has not seen; this is called validation. Hence we split the data into training and testing sets.
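A minimal sketch of such a split, using only the Python standard library. The function name `train_test_split`, the 20% test fraction and the fixed seed are illustrative assumptions, not something prescribed by these notes.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))  # stand-in for real samples
train, test = train_test_split(data, test_fraction=0.2, seed=0)
print(len(train), len(test))
```

Shuffling before splitting matters when the data is stored in some order (for example sorted by label), since otherwise the test set would not be representative.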
### Bias vs Variance
- Bias: the model's systematic inability to capture the true relationship in the data. A high-bias model is too rigid to fit the true relationship.
- Variance: the amount by which our predictions change when the model is trained on a different training data set. High variance can lead to a worse fit on the test set. Formally, it is the expected squared deviation of the estimated prediction from its average value.
![](https://i.imgur.com/oZIN13H.png)
Our intuition may tell us:
- The presence of bias indicates something basically wrong with the model and algorithm...
- Variance is also bad, but a model with high variance could at least predict well on average...
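So should the model minimize bias even at the expense of variance? Not really!
Bias and variance are equally important, because in practice we only ever have a single realization of the data set. A model that predicts well "on average" over many data sets is of little help if it predicts poorly on the one we actually have.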
#### Bias and variance decomposition
- True function: $f(x)$
- Prediction function estimated with data D: $\hat{f_D}(x)$
- Average of prediction models: $E_D[\hat{f_D}(x)]$
$$
\begin{align}
\text{Variance} &= E_D[(\hat{f_D}(x)-E_D[\hat{f_D}(x)])^2]\\
\text{Bias} &= E_D[\hat{f_D}(x)]-f(x)
\end{align}
$$
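The two quantities can be estimated by simulation: draw many data sets D, fit a model on each, and look at the predictions at one fixed point. The sketch below assumes a true function $f(x)=x^2$, Gaussian noise, and two hypothetical models chosen only for contrast: a constant predictor (rigid) and a 1-nearest-neighbour predictor (flexible).

```python
import random
import statistics

def true_f(x):
    return x * x  # assumed true function, for illustration only

def sample_dataset(rng, n=20, noise=0.3):
    """One realization D: n noisy observations of true_f on [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0.0, noise)) for x in xs]

def fit_constant(dataset):
    """Rigid model: always predicts the mean training label."""
    mean_y = statistics.fmean([y for _, y in dataset])
    return lambda x: mean_y

def fit_nearest_neighbor(dataset):
    """Flexible model: predicts the label of the closest training point."""
    return lambda x: min(dataset, key=lambda p: abs(p[0] - x))[1]

def bias_and_variance(fit, x0, n_datasets=2000, seed=0):
    """Monte Carlo estimate of bias and variance of the prediction at x0."""
    rng = random.Random(seed)
    preds = [fit(sample_dataset(rng))(x0) for _ in range(n_datasets)]
    mean_pred = statistics.fmean(preds)              # estimates E_D[f_D(x0)]
    bias = mean_pred - true_f(x0)
    variance = statistics.fmean([(p - mean_pred) ** 2 for p in preds])
    return bias, variance

bias_const, var_const = bias_and_variance(fit_constant, x0=0.9)
bias_nn, var_nn = bias_and_variance(fit_nearest_neighbor, x0=0.9)
print(f"constant: bias={bias_const:.3f} variance={var_const:.3f}")
print(f"1-NN:     bias={bias_nn:.3f} variance={var_nn:.3f}")
```

Under these assumptions the constant model should show the larger bias and the nearest-neighbour model the larger variance, which is the trade-off described above.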
#### Overfitting
Overfitting occurs when the learned model is overly specialized to the training samples, leading to low bias and high variance.
![](https://i.imgur.com/ULGnGSl.png)
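Overfitting typically shows up as a gap between training error and test error. The sketch below assumes noisy data from a linear relationship $y = 2x + 1$ and compares a least-squares line with a hypothetical "memorizer" (1-nearest-neighbour) that reproduces every training label exactly.

```python
import random
import statistics

def make_data(rng, n, noise=0.5):
    """Noisy samples of the assumed true relationship y = 2x + 1."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, noise)) for x in xs]

def fit_line(dataset):
    """Ordinary least squares for a single feature."""
    xs = [x for x, _ in dataset]
    ys = [y for _, y in dataset]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in dataset)
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_memorizer(dataset):
    """1-nearest-neighbour: returns the label of the closest training point."""
    return lambda x: min(dataset, key=lambda p: abs(p[0] - x))[1]

def mse(model, dataset):
    """Mean squared error of `model` on `dataset`."""
    return statistics.fmean([(model(x) - y) ** 2 for x, y in dataset])

rng = random.Random(0)
train, test = make_data(rng, 200), make_data(rng, 1000)
line, memorizer = fit_line(train), fit_memorizer(train)
for name, model in [("line", line), ("memorizer", memorizer)]:
    print(f"{name:10s} train MSE={mse(model, train):.3f} "
          f"test MSE={mse(model, test):.3f}")
```

The memorizer has zero training error because it has also fitted the noise, and that is exactly what hurts it on unseen data.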
### Cross Validation
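How do we know what percentage of the data to use for training and for testing? Rather than relying on a single split, cross validation partitions the data into several folds and rotates which fold is held out, so every sample is used for both training and validation and the performance estimate depends less on one particular split.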
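A minimal sketch of k-fold cross validation using only the standard library. The helper names `k_fold_indices` and `cross_val_mse` are hypothetical, and the "model" is a stand-in that simply predicts the mean training label.

```python
import random
import statistics

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

def cross_val_mse(ys, k=5, seed=0):
    """Average validation MSE of a mean-predictor over the k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(ys), k, seed):
        prediction = statistics.fmean([ys[i] for i in train_idx])  # "fit"
        errors = [(ys[i] - prediction) ** 2 for i in val_idx]      # "validate"
        scores.append(statistics.fmean(errors))
    return statistics.fmean(scores)

labels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(cross_val_mse(labels, k=5))
```

Each sample lands in the validation set exactly once, so the averaged score uses all of the data without ever validating on a sample the model was trained on.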
## Interpretability
### Shrinking the number of variables
Among the large number of variables in a model, there are generally many that have little (or no) effect on Y
- Leaving these variables in the model makes it harder to see the big picture, i.e. the effect of the “important variables”
- It would be easier to interpret the model after removing the unimportant variables (setting their coefficients to zero)
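One common way to set coefficients to zero automatically is L1-regularized regression (the lasso). The notes above do not name a specific method, so this is a swapped-in illustration: a small coordinate-descent sketch on synthetic data where, by assumption, only the first two of five variables affect Y. The penalty `alpha=0.2` is an arbitrary choice.

```python
import random

def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iters=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic updates."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iters):
        for j in range(d):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_w
    return w

rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
# Assumed ground truth: only the first two variables affect Y.
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.1) for row in X]
weights = lasso_coordinate_descent(X, y, alpha=0.2)
print([round(w, 3) for w in weights])
```

The penalty also shrinks the surviving coefficients toward zero, so in practice `alpha` is a trade-off between sparsity and fit and is usually tuned, for instance by cross validation.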
### Occam's Razor
A principle that favors the simplest explanation consistent with the observed data. In model selection, this means weighing model complexity (e.g. the number of parameters) against the number of data points and the quality of the fit to the data.