Next, it can be very enlightening to plot the distributions of your numeric features.
Often, a quick-and-dirty grid of histograms is enough to understand the distributions.
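As a rough illustration, here is one way to draw such a grid with pandas. This is a minimal sketch, not a prescribed recipe: the DataFrame `df` and its columns (`sqft`, `year_built`, `has_basement`, `city`) are made up for the example, and the bin count and figure size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import numpy as np
import pandas as pd

# Hypothetical toy dataset, invented purely for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sqft": rng.normal(1500, 400, 500),
    "year_built": rng.integers(1950, 2020, 500),
    "has_basement": rng.integers(0, 2, 500),
    "city": rng.choice(["A", "B"], 500),  # non-numeric, so it gets no histogram
})

# Keep only the numeric features, then draw one histogram per column
numeric = df.select_dtypes(include="number")
axes = numeric.hist(bins=30, figsize=(10, 8))
```

`DataFrame.hist` returns the grid of matplotlib axes, so you can still tweak individual panels afterwards if one of them needs a closer look.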
Here are a few things to look out for:
- Distributions that are unexpected
- Potential outliers that don't make sense
- Features that should be binary (i.e. "wannabe indicator variables")
- Boundaries that don't make sense (e.g. negative values for a feature that can't be negative)
- Potential measurement errors
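Some of the items above can also be surfaced numerically to back up what you see in the plots. The sketch below is only one possible approach: `summarize_numeric` is a hypothetical helper, the "two or fewer distinct values" test for binary-looking features is an assumed heuristic, and the 1.5 × IQR rule is a common rule of thumb for outliers rather than anything this guide prescribes. The toy data is invented.

```python
import numpy as np
import pandas as pd

def summarize_numeric(df):
    """Hypothetical helper: one row of sanity-check stats per numeric feature."""
    numeric = df.select_dtypes(include="number")
    summary = numeric.describe().T[["min", "max", "mean", "std"]]

    # Features with at most two distinct non-missing values may really be indicators
    summary["n_unique"] = numeric.nunique()
    summary["looks_binary"] = summary["n_unique"] <= 2

    # Count values outside 1.5 * IQR of the quartiles (a rule of thumb, not a verdict)
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    below = numeric.lt(q1 - 1.5 * iqr, axis="columns")
    above = numeric.gt(q3 + 1.5 * iqr, axis="columns")
    summary["n_iqr_outliers"] = (below | above).sum()
    return summary

# Invented toy data: one suspicious sqft value and a 1-or-missing "basement" column
toy = pd.DataFrame({
    "sqft": [900.0, 1200.0, 1500.0, 1100.0, 1300.0, 99999.0],
    "year_built": [1990, 2005, 1978, 2011, 1999, 2003],
    "basement": [1.0, np.nan, 1.0, np.nan, 1.0, 1.0],
})
report = summarize_numeric(toy)
print(report)
```

The `min` and `max` columns help with spotting boundaries that don't make sense, while `looks_binary` and `n_iqr_outliers` point to features worth a note. Treat the flags as prompts for questions, not as fixes.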
At this point, you should start making notes about potential fixes you'd like to make. If something looks out of place, such as a potential outlier in one of your features, now's a good time to ask the client or key stakeholder, or to dig a bit deeper.
However, we'll wait until Data Cleaning to make fixes so that we can keep our steps organized.