Datasets for Data Science and Machine Learning

These days, we have the opposite of the problem we had 5-10 years ago… Back then, it was actually difficult to find datasets for data science and machine learning projects. Since then, we’ve been flooded with lists and lists of datasets. Today, the problem is not finding datasets, but rather sifting through them to keep the relevant…

How to Learn Python for Data Science in 2017 (Updated)

In this guide, we’ll cover how to learn Python for data science, including our favorite curriculum for self-study. You see, data science is about problem solving, exploration, and extracting valuable information from data. To do so effectively, you’ll need to wrangle datasets, train machine learning models, visualize results, and much more. Enter Python. This is the…

Best Practices for Feature Engineering

Feature engineering, the process of creating new input features for machine learning, is one of the most effective ways to improve predictive models.

“Coming up with features is difficult, time-consuming, requires expert knowledge. ‘Applied machine learning’ is basically feature engineering.” ~ Andrew Ng

Through feature engineering, you can isolate key information, highlight patterns, and bring in…

The Beginner’s Guide to Kaggle

Kaggle, a popular platform for data science competitions, can be intimidating for beginners to get into. After all, some of the listed competitions have over $1,000,000 prize pools and hundreds of competitors. Top teams boast decades of combined experience, tackling ambitious problems such as improving airport security or analyzing satellite data. It’s no surprise that some beginners hesitate to…

How to Handle Imbalanced Classes in Machine Learning

Imbalanced classes put “accuracy” out of business. This is a surprisingly common problem in machine learning (specifically in classification), occurring in datasets with a disproportionate ratio of observations in each class. Standard accuracy no longer reliably measures performance, which makes model training much trickier. Imbalanced classes appear in many domains, including fraud detection, spam filtering, disease screening, SaaS subscription churn, and advertising click-throughs. In this…

9 Mistakes to Avoid When Starting Your Career in Data Science

If you wish to begin a career in data science, you can save yourself days, weeks, or even months of frustration by avoiding these 9 costly beginner mistakes. If you’re not careful, these mistakes will eat away at your most valuable resources: your time, energy, and motivation. We’ve broken them into three categories: Mistakes while…

WTF is the Bias-Variance Tradeoff? (Infographic)

Overheard after class: “doesn’t the Bias-Variance Tradeoff sound like the name of a treaty from a history documentary?” Ok, that’s fair… but it’s also one of the most important concepts to understand for supervised machine learning and predictive modeling. Unfortunately, because it’s often taught through dense math formulas, it’s earned a tough reputation. But as you’ll…

Free Data Science Resources for Beginners

In this guide, we’ll share 65 free data science resources that we’ve hand-picked and annotated for beginners. To become a data scientist, you have a formidable challenge ahead. You’ll need to master a variety of skills, ranging from machine learning to business analytics. However, the rewards are worth it. Organizations will prize alchemists who can turn raw data into smarter decisions,…

Dimensionality Reduction Algorithms: Strengths and Weaknesses

Welcome to Part 2 of our tour through modern machine learning algorithms. In this part, we’ll cover methods for Dimensionality Reduction, further broken into Feature Selection and Feature Extraction. In general, these tasks are rarely performed in isolation. Instead, they’re often preprocessing steps to support other tasks. If you missed Part 1, you can check it out…

Modern Machine Learning Algorithms: Strengths and Weaknesses

In this guide, we’ll take a practical, concise tour through modern machine learning algorithms. While other such lists exist, they don’t really explain the practical tradeoffs of each algorithm, which we hope to do here. We’ll discuss the advantages and disadvantages of each algorithm based on our experience. Categorizing machine learning algorithms is tricky, and there are several reasonable…
