calculate the average of each variable (column) and subtract it from each value.
unit variance (UV) scaling
- if variables are measured in different units, the data are scaled to give each variable an equal chance to influence the model.
- divide each variable by its standard deviation, so that every scaled variable has variance = 1
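
The two steps above (mean-centering, then division by the standard deviation) can be sketched in a few lines of NumPy. This is only an illustration: the function name `uv_scale` and the toy matrix are assumptions, rows are taken to be samples and columns variables, and the sample standard deviation (ddof=1) is assumed.

```python
import numpy as np

def uv_scale(X):
    """Mean-center each column, then divide it by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)      # mean-centering: column means become 0
    sd = X.std(axis=0, ddof=1)         # sample SD per variable (assumed ddof=1)
    # Note: a constant column has sd == 0 and would need special handling.
    return centered / sd

# Toy data (hypothetical): two variables measured on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0],
              [6.0, 400.0]])
X_uv = uv_scale(X)

print(X_uv.mean(axis=0))           # column means, numerically zero
print(X_uv.var(axis=0, ddof=1))    # column variances, all equal to 1
```

After scaling, both columns contribute the same variance to the model regardless of their original units.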
What happens if big features dominate, but we know medium features are also important?
Ctr (mean-centering only)
— RISK: Medium peaks masked by large peaks
UV (mean-centering and unit variance)
— RISK: Baseline noise may be inflated
The alternative is Pareto scaling
- Divide each variable by the square root of its SD
- Intermediate between no scaling (Ctr) and UV
- Weights up medium features without inflating baseline noise
- Recommended option (NMR & MS metabonomics, Gene chip & proteomics data)
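
To make the comparison between Ctr, UV and Pareto scaling concrete, the sketch below applies all three to simulated data. Everything here is illustrative: the function name `scale`, the method labels, and the three simulated features (a large peak, a medium peak and baseline noise with SDs of 100, 10 and 0.1) are assumptions, not values from any real data set.

```python
import numpy as np

def scale(X, method="pareto"):
    """Mean-center the columns of X, then apply the chosen scaling."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)          # sample SD per variable (assumed ddof=1)
    if method == "ctr":                 # mean-centering only
        return centered
    if method == "uv":                  # divide by SD
        return centered / sd
    if method == "pareto":              # divide by the square root of SD
        return centered / np.sqrt(sd)
    raise ValueError(f"unknown method: {method}")

# Hypothetical features: large peak, medium peak, baseline noise.
rng = np.random.default_rng(0)
X = rng.normal(loc=0.0, scale=[100.0, 10.0, 0.1], size=(200, 3))

# Variance each feature carries into the model under each scaling.
variances = {m: scale(X, m).var(axis=0, ddof=1) for m in ("ctr", "uv", "pareto")}
for method, v in variances.items():
    print(method, v)
```

Under Pareto scaling each variable's variance equals its original SD, so the large-to-noise variance ratio drops from roughly 1,000,000 (Ctr) to roughly 1,000, whereas UV makes all three features, including the noise, equally influential.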