Data Preprocessing (Machine Learning)

Jun 15, 2021

Data Preprocessing is considered one of the most important steps in making a Machine Learning model function properly.

We can easily get tons of data in the form of various datasets, but making that data fit for deriving insights requires a lot of observation, modification, manipulation and numerous other steps.

What is it?

When we freshly download a dataset for a project or some other work, the data it contains is usually messy, i.e. not arranged or filled in the way we need it to be.

Sometimes, it might have problems such as:

NULL values
Unnecessary features
Datatypes that are not in a proper format

So, to treat all these shortcomings, we go through a process popularly known as “Data Preprocessing”.
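Before fixing anything, it helps to check which of these shortcomings a dataset actually has. Here is a minimal sketch using pandas, assuming a small made-up DataFrame (the column names `id`, `age`, `joined` and `constant` are hypothetical and only for illustration). Treating a column with a single repeated value as an "unnecessary feature" is just one possible heuristic, not a general rule.

```python
import pandas as pd

# Hypothetical toy dataset; the column names are illustrative only.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "age": [25.0, None, 31.0, 40.0],
    "joined": ["2021-01-05", "2021-02-10", "2021-03-15", "2021-04-20"],
    "constant": ["x", "x", "x", "x"],
})

# 1. NULL values: count the missing entries in each column.
null_counts = df.isna().sum()

# 2. Unnecessary features: as one simple heuristic, flag columns that
#    hold a single repeated value, since they carry no information.
constant_cols = [c for c in df.columns if df[c].nunique(dropna=False) == 1]

# 3. Datatypes: list each column's type to spot badly typed columns,
#    e.g. dates that were read in as plain text.
column_types = df.dtypes

print(null_counts)
print(constant_cols)
print(column_types)
```

On real data, the same three checks would run on whatever `pd.read_csv` (or a similar loader) returns.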

Applications

Data Preprocessing is used, in one way or another, in almost every Machine Learning problem, so it has a very wide range of applications.

How do we do Preprocessing?

There are numerous ways to preprocess data. Which one we choose depends on our need.

Example 1: If we have NULL values in our dataset.

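We can simply drop the rows with NULL values if there are only a few of them and dropping them won’t noticeably affect our dataset.
We can also treat NULL values by replacing them with the mean, median or mode of that column. Mean and median apply to numeric columns, while the mode also works for categorical ones. The choice depends on our need.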
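Both options can be sketched with pandas. This is a minimal illustration, assuming a made-up DataFrame with a numeric `age` column and a categorical `city` column (both names and all values are hypothetical).

```python
import pandas as pd

# Hypothetical toy dataset with one missing value per column.
df = pd.DataFrame({
    "age": [25.0, None, 31.0, 40.0],
    "city": ["Pune", "Delhi", None, "Delhi"],
})

# Option 1: drop every row that contains at least one NULL value.
dropped = df.dropna()

# Option 2: replace NULL values with a statistic of the same column.
filled = df.copy()
# Numeric column: fill with the mean of the known values.
filled["age"] = filled["age"].fillna(filled["age"].mean())
# Categorical column: fill with the mode (most frequent value).
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])

# The median is an alternative to the mean that is less sensitive to outliers.
median_filled = df["age"].fillna(df["age"].median())

print(dropped)
print(filled)
```

Dropping is the simpler choice but throws away whole rows, while filling keeps every row at the cost of introducing estimated values.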

Example 2: If the date and time are not in the correct format.