Data cleaning is one of those things that everyone does but no one really talks about. Sure, it’s not the "sexiest" part of machine learning. And no, there aren’t hidden tricks and secrets to uncover.
However, proper data cleaning can make or break your project. Professional data scientists usually spend a very large portion of their time on this step.
Why? Because of a simple truth in machine learning:
Better data beats fancier algorithms.
In other words... garbage in gets you garbage out. Even if you forget everything else from this course, please remember this point.
In fact, with a properly cleaned dataset, even simple algorithms can draw impressive insights from the data!
Obviously, different types of data will require different types of cleaning. However, the systematic approach laid out in this lesson can always serve as a good starting point.