The Beginner’s Guide to Kaggle

Kaggle, a popular platform for data science competitions, can be intimidating for beginners to get into. After all, some of the listed competitions have over $1,000,000 prize pools and hundreds of competitors. Top teams boast decades of combined experience, tackling ambitious problems such as improving airport security or analyzing satellite data. It’s no surprise that some beginners hesitate to…

How to Handle Imbalanced Classes in Machine Learning

Imbalanced classes put “accuracy” out of business. This is a surprisingly common problem in machine learning (specifically in classification), occurring in datasets with a disproportionate ratio of observations in each class. Standard accuracy no longer reliably measures performance, which makes model training much trickier. Imbalanced classes appear in many domains, including fraud detection, spam filtering, disease screening, SaaS subscription churn, and advertising click-throughs. In this…

9 Mistakes to Avoid When Starting Your Career in Data Science

If you wish to begin a career in data science, you can save yourself days, weeks, or even months of frustration by avoiding these 9 costly beginner mistakes. If you’re not careful, these mistakes will eat away at your most valuable resources: your time, energy, and motivation. We’ve broken them into three categories: Mistakes while…

WTF is the Bias-Variance Tradeoff? (Infographic)

Overheard after class: “doesn’t the Bias-Variance Tradeoff sound like the name of a treaty from a history documentary?” Ok, that’s fair… but it’s also one of the most important concepts to understand for supervised machine learning and predictive modeling. Unfortunately, because it’s often taught through dense math formulas, it’s earned a tough reputation. But as you’ll…

Free Data Science Resources for Beginners

In this guide, we’ll share 65 free data science resources that we’ve hand-picked and annotated for beginners. To become a data scientist, you have a formidable challenge ahead. You’ll need to master a variety of skills, ranging from machine learning to business analytics. However, the rewards are worth it. Organizations will prize alchemists who can turn raw data into smarter decisions,…

Dimensionality Reduction Algorithms: Strengths and Weaknesses

Welcome to Part 2 of our tour through modern machine learning algorithms. In this part, we’ll cover methods for Dimensionality Reduction, further broken into Feature Selection and Feature Extraction. In general, these tasks are rarely performed in isolation. Instead, they’re often preprocessing steps to support other tasks. If you missed Part 1, you can check it out…

Modern Machine Learning Algorithms: Strengths and Weaknesses

In this guide, we’ll take a practical, concise tour through modern machine learning algorithms. While other such lists exist, they don’t really explain the practical tradeoffs of each algorithm, which we hope to do here. We’ll discuss the advantages and disadvantages of each algorithm based on our experience. Categorizing machine learning algorithms is tricky, and there are several reasonable…

The Ultimate Python Seaborn Tutorial: Gotta Catch ’Em All

In this step-by-step Seaborn tutorial, you’ll learn how to use one of Python’s most convenient libraries for data visualization. For those who’ve tinkered with Matplotlib before, you may have wondered, “why does it take me 10 lines of code just to make a decent-looking histogram?” Well, if you’re looking for a simpler way to plot attractive charts, then…

The 5 Levels of Machine Learning Iteration

Can you guess the answer to this riddle? If you’ve studied machine learning, you’ve seen this everywhere… If you’re a programmer, you’ve done this a thousand times… If you’ve practiced any skill, this is already second-nature for you… Nope, it’s not overdosing on coffee… It’s… iteration! Yes, iteration as in repeating a set of tasks to achieve a result. Wait, isn’t that just… the dictionary…

R vs Python for Data Science: Summary of Modern Advances

Recently, some of our readers have been asking us about the best programming language for data science. Immediately, R and Python both come to mind… but which of these two giants to choose? We felt that this was a good time to address this question because we recently watched an excellent presentation on recent advances of…
