Data pre-processing

波比 / 2016-12-13

Center data

Calculate the average of each variable (column) and subtract it from every value in that column.
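As a minimal sketch of the centering step, assuming the data is a list of rows with one variable per column (the function name `center_columns` and the sample values are illustrative, not from the original notes):

```python
from statistics import mean

def center_columns(rows):
    """Subtract each column's mean from every value in that column."""
    columns = list(zip(*rows))                  # transpose rows -> columns
    col_means = [mean(col) for col in columns]  # one mean per variable
    return [[value - m for value, m in zip(row, col_means)] for row in rows]

# Hypothetical data: 3 observations (rows) of 2 variables (columns).
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
centered = center_columns(data)
```

After centering, every column has mean zero, while the relative spread of the variables is unchanged.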

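Unit variance (UV) scaling

After centering, divide each variable by its standard deviation, so that every variable has variance 1.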

Pareto (PAR) scaling

After centering, divide each variable by the square root of its standard deviation.

What happens if big features dominate, but we know medium features are also important?

Ctr (mean-centering only)

- RISK: Medium peaks are masked by large peaks

UV (mean-centering and unit variance)

- RISK: Baseline noise may be inflated

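The alternative is Pareto scaling, a compromise between the two: large features are scaled down more than small ones, but less strongly than with UV, so baseline noise is inflated less.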
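The trade-off between the three options can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the names `scale_columns` and `column_spread` and the sample data are hypothetical, the sample standard deviation (n - 1) is assumed, and every column is assumed to be non-constant.

```python
from math import sqrt
from statistics import mean, stdev

def scale_columns(rows, method="par"):
    """Center each column, then scale it by the chosen method.

    "ctr": centering only
    "uv":  divide by the standard deviation
    "par": divide by the square root of the standard deviation
    """
    scaled_columns = []
    for col in zip(*rows):
        m = mean(col)
        s = stdev(col)  # assumption: sample standard deviation, non-zero
        if method == "ctr":
            divisor = 1.0
        elif method == "uv":
            divisor = s
        elif method == "par":
            divisor = sqrt(s)
        else:
            raise ValueError(f"unknown method: {method!r}")
        scaled_columns.append([(value - m) / divisor for value in col])
    return [list(row) for row in zip(*scaled_columns)]

def column_spread(rows):
    """Standard deviation of each column, to compare the methods."""
    return [stdev(col) for col in zip(*rows)]

# Hypothetical data: a large peak, a medium peak, and a baseline-noise column.
data = [[100.0, 10.0, 1.0], [300.0, 14.0, 1.2], [500.0, 18.0, 1.4]]

for method in ("ctr", "uv", "par"):
    print(method, column_spread(scale_columns(data, method)))
```

With centering only, the large column keeps a spread far above the others. With UV, all three columns end up with the same spread, so the noise column counts as much as the peaks. With Pareto scaling, the ordering large > medium > noise is preserved, but the gaps between them shrink.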