Dimensionality Reduction

Dimensionality reduction refers to techniques that reduce the number of input variables in a dataset. A large number of input features often makes a predictive modeling task harder, a problem commonly referred to as the curse of dimensionality.
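One symptom of the curse of dimensionality is that distances between random points become nearly indistinguishable as the number of features grows. The sketch below illustrates this under simple assumptions: points are drawn uniformly from the unit cube, and `relative_contrast` is a hypothetical helper name, not something from a library.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (farthest - nearest) / nearest distance from a query point to random points."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


# The contrast shrinks as dimensions are added: "near" and "far" start to look alike.
for dims in (2, 10, 100, 1000):
    print(dims, round(relative_contrast(200, dims), 3))
```

When the nearest and farthest points are almost equally far away, methods that rely on distances have much less to work with, which is one reason fewer input variables can help.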

There are two main ways to reduce the number of input variables:

By keeping only the most relevant variables from the original dataset (this technique is called feature selection)

By finding a smaller set of new variables, each a combination of the input variables, that contains essentially the same information as the original variables (this technique is called feature extraction, and it is what dimensionality reduction usually means in the narrow sense)
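As a minimal sketch of the first approach, the code below ranks features by how strongly they correlate with a target and keeps the top few. Correlation is only one possible measure of relevance, and the data, the feature names, and the helper `select_top_k` are all made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_top_k(features, target, k):
    """Keep the k feature names most correlated (in absolute value) with the target."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: three features for five people, and a target to predict.
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "shoe_size": [36, 38, 41, 43, 46],
    "lucky_number": [7, 3, 9, 1, 5],
}
target_weight_kg = [50, 58, 69, 80, 92]

selected = select_top_k(features, target_weight_kg, k=2)
print(selected)
```

The selected variables are original columns, unchanged. Feature extraction, by contrast, would build new columns out of the old ones.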

Imagine you have a big bag of marbles. Each marble represents a different feature of something, like a person, place, or thing.

For example, if you're trying to describe a person, you might have marbles for their height, weight, eye color, hair color, and so on.

If you have a lot of marbles, it can be hard to keep track of them all. And if you're trying to use the marbles to learn something about the people, places, or things they represent, it can be hard to figure out which marbles are the most important.

Dimensionality reduction is a way to reduce the number of marbles in your bag without losing too much information.

It does this by combining some of the marbles together into new marbles.

For example, you could combine the marbles for height and weight into a new marble for overall size.

Or you could combine the marbles for eye color and hair color into a new marble for overall appearance.

Once you've reduced the number of marbles, it's easier to keep track of them and to use them to learn about the people, places, or things they represent.
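The idea of merging the height and weight marbles into a single size marble can be sketched with principal component analysis (PCA), one common way to build such combined features. The version below is a simplified illustration, not a full implementation: the data is invented, the helper names are hypothetical, and the leading direction is approximated with plain power iteration.

```python
import math


def standardize(column):
    """Rescale a column to zero mean and unit variance (assumes it is not constant)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]


def first_principal_direction(rows, n_iter=200):
    """Approximate the leading eigenvector of the covariance matrix by power iteration."""
    n, d = len(rows), len(rows[0])
    # Covariance matrix of the rows, which are assumed to be centered already.
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    # Uneven start vector, assumed not to be orthogonal to the leading direction.
    vec = [1.0 + i for i in range(d)]
    for _ in range(n_iter):
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


# Hypothetical toy data: height and weight for five people.
height_cm = [150, 160, 170, 180, 190]
weight_kg = [50, 58, 69, 80, 92]

rows = list(zip(standardize(height_cm), standardize(weight_kg)))
direction = first_principal_direction(rows)

# One "size" value per person replaces the two original features.
size = [sum(w * x for w, x in zip(direction, row)) for row in rows]
print(direction)
print(size)
```

Because height and weight rise together in this toy data, the two entries of `direction` come out equal, so the new size feature is an even blend of both. Little information is lost by keeping one number per person instead of two.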

Here's a simple analogy for a 5-year-old:

Imagine you're trying to sort a pile of mixed-up Legos. You could sort them by color, by size, by shape, or by any other feature you want.

If you have a lot of Legos, it can be hard to sort them by all of their features at the same time. But if you reduce the dimensionality of the problem by only sorting by one feature at a time, it becomes much easier.

For example, you could first sort the Legos by color. Then, you could sort each color group by size. And then, you could sort each size group by shape.

By reducing the dimensionality of the problem, you've made it much easier to sort the Legos.

Dimensionality reduction is a powerful technique that can be used in many different fields, such as machine learning, data science, and image processing.

It can help us to learn from data more effectively and to make better decisions.

Py Data