6 Fun Machine Learning Projects for Beginners

In this guide, we’ll be walking through 6 fun machine learning projects for beginners. Projects are some of the best investments of your time. You’ll enjoy learning more, stay motivated, and make faster progress.


You see, no amount of theory can replace hands-on practice. Textbooks and lessons can lull you into a false belief of mastery because the material is there in front of you. But once you try to apply it, you may find that it's harder than it looks.

Projects help you elevate your applied ML skills. They also give you the chance to explore an area that interests you.

Plus, you can add projects you complete into your personal portfolio. They make it easier to land a job, find cool career opportunities, and even negotiate a higher salary.

Here are 6 fun machine learning projects for beginners. You can complete any of them in a single weekend, or expand them into longer projects if you enjoy them.

Table of Contents

  1. Machine Learning Gladiator
  2. Play Money Ball
  3. Predict Stock Prices
  4. Teach a Neural Network to Read Handwriting
  5. Investigate Enron
  6. Write ML Algorithms from Scratch

1. Machine Learning Gladiator

We're affectionately calling this "machine learning gladiator," but it's not new. This is one of the fastest ways to build practical intuition around machine learning.

The goal is to take out-of-the-box models and apply them to different datasets. This project is awesome for 3 main reasons:

First, you'll build intuition for model-to-problem fit. Which models are robust to missing data? Which models handle categorical features well? Yes, you can dig through textbooks to find the answers, but you'll learn better by seeing it in action.

Second, this project will teach you the invaluable skill of prototyping models quickly. In the real world, it's often difficult to know which model will perform best without simply trying them.

Finally, this exercise helps you master the workflow of model building. For example, you'll get to practice...

  • Importing data
  • Cleaning data
  • Splitting it into train/test or cross-validation sets
  • Pre-processing
  • Transformations
  • Feature engineering

Because you'll use out-of-the-box models, you'll have the chance to focus on honing these critical steps.

Check out the sklearn (Python) or caret (R) documentation pages for instructions. You should practice regression, classification, and clustering algorithms.
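A minimal sklearn version of one gladiator round might look like the following. It assumes scikit-learn is installed. It uses the bundled breast cancer dataset as a stand-in for whatever dataset you pick, and the four contenders are arbitrary choices. Scaling sits inside a pipeline, so each cross-validation fold fits the scaler only on its own training portion. This avoids leaking test-fold statistics into training.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Bundled dataset used as a placeholder; swap in any dataset you find interesting
X, y = load_breast_cancer(return_X_y=True)

# Out-of-the-box contenders; scale-sensitive models get a scaler in their pipeline
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "k_nearest_neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validation; for classifiers sklearn stratifies the folds by default
    cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = cv_scores.mean()
    print(f"{name:>20}: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")
```

To run another round, swap in a different dataset or add entries to the `models` dict.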

Tutorials

  • Python: sklearn - Official tutorial for the sklearn package
  • R: caret - Webinar given by the author of the caret package

Data Sources

  • UCI Machine Learning Repository - 350+ searchable datasets spanning almost every subject matter. You'll definitely find datasets that interest you.
  • Kaggle Datasets - 100+ datasets uploaded by the Kaggle community. There are some really fun datasets here, including PokemonGo spawn locations and Burritos in San Diego.
  • data.gov - Open datasets released by the U.S. government. Great place to look if you're interested in social sciences.

2. Play Money Ball

In the book Moneyball, the Oakland A's revolutionized baseball through analytical player scouting. They built a competitive squad while spending only 1/3 of what large-market teams like the Yankees were spending on salaries.

First, if you haven't read the book yet, you should check it out. It's one of our favorites!

Fortunately, the sports world has a ton of data to play with. Data for teams, games, scores, and players are all tracked and freely available online.

Sports data lends itself to plenty of fun machine learning projects for beginners. For example, you could try...

  • Sports betting... Predict box scores given the data available at the time right before each new game.
  • Talent scouting... Use college statistics to predict which players would have the best professional careers.
  • General managing... Create clusters of players based on their strengths in order to build a well-rounded team.
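For the general-managing idea, k-means is a natural first clustering algorithm. This is a from-scratch NumPy sketch on made-up per-game stats (points, rebounds, assists) for nine hypothetical players. It uses farthest-first initialization as a simple deterministic stand-in for k-means++. Each column is standardized first so that points, which have the largest scale, don't dominate the distance calculation.

```python
import numpy as np

# Hypothetical per-game stats: [points, rebounds, assists] (made-up numbers)
players = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
stats = np.array([
    [28, 5, 3], [26, 4, 4], [30, 6, 3],     # scorer-like profiles
    [12, 12, 2], [10, 13, 1], [14, 11, 2],  # rebounder-like profiles
    [15, 4, 10], [13, 3, 11], [16, 5, 9],   # playmaker-like profiles
], dtype=float)

def kmeans(X, k, n_iter=100):
    # Farthest-first initialization: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far
    centroids = [X[0]]
    for _ in range(1, k):
        dists = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(dists)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        d = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
        labels = np.argmin(d, axis=1)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Standardize each stat so no single column dominates the distances
Z = (stats - stats.mean(axis=0)) / stats.std(axis=0)
labels, centers = kmeans(Z, k=3)
for name, label in zip(players, labels):
    print(name, "-> cluster", label)
```

With real data, you'd typically try several values of k and inspect each cluster's average profile. scikit-learn's `KMeans` also handles smarter initialization and multiple restarts for you.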

Sports is also an excellent domain for practicing data visualization and exploratory analysis. You can use these skills to help you decide which types of data to include in your analyses.

Data Sources

  • Sports Statistics Database - Sports statistics and historical data covering many professional sports and several college ones. Its clean interface makes web scraping easier.
  • Sports Reference - Another database of sports statistics. More cluttered interface, but individual tables can be exported as CSV files.
  • cricsheet.org - Ball-by-ball data for international and IPL cricket matches. CSV files for IPL and T20 international matches are available.

3. Predict Stock Prices

The stock market is like candy land for any data scientist who is even remotely interested in finance.

First, you have many types of data to choose from: prices, fundamentals, global macroeconomic indicators, volatility indices, and more. The list goes on and on.

Second, the data can be very granular. You can easily get time series data by day (or even by minute) for each company, which allows you to think creatively about trading strategies.

Finally, the financial markets generally have short feedback cycles. Therefore, you can quickly validate your predictions on new data.
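Validating on new data only works if the test period comes strictly after the training period. Shuffling time series data into a random split leaks future information into training. Below is a minimal walk-forward splitter in plain Python. It trains on a rolling window and tests on the block that immediately follows. The function name and the prices are hypothetical. The naive "last observed price" forecast is just a baseline that any real model should beat.

```python
def walk_forward_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_indices, test_indices) pairs in which every test index
    comes after every train index; the window slides forward by `step`."""
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += step

# Hypothetical daily closing prices, oldest first
prices = [100.0, 101.5, 100.8, 102.2, 103.0, 102.4, 104.1, 105.0, 104.2, 106.3]

for train_idx, test_idx in walk_forward_splits(len(prices), train_size=5, test_size=2):
    last_seen = prices[train_idx[-1]]  # naive baseline: "tomorrow looks like today"
    errors = [abs(prices[i] - last_seen) for i in test_idx]
    print(f"train {train_idx[0]}-{train_idx[-1]}, test {test_idx}, MAE {sum(errors) / len(errors):.2f}")
```

If you'd rather not roll your own, scikit-learn's `TimeSeriesSplit` provides a similar time-ordered split (expanding window by default).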

Some examples of beginner-friendly machine learning projects you could try include...

  • Quantitative value investing... Predict 6-month price movements based on fundamental indicators from companies' quarterly reports.
  • Forecasting... Build time series models, or even recurrent neural networks, on the delta between implied and actual volatility.
  • Statistical arbitrage... Find similar stocks based on their price movements and other factors and look for periods when their prices diverge.
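The statistical arbitrage idea can be prototyped in a few lines of NumPy. This sketch measures similarity as the correlation of daily log returns. It tracks divergence as a rolling z-score of the log-price spread. For simplicity, it assumes a hedge ratio of 1 between the two stocks; real pairs work usually estimates one, for example by regression. It runs on synthetic prices with a divergence injected on the last day. The function names are hypothetical.

```python
import numpy as np

def log_returns(prices):
    """Daily log returns from a price series."""
    return np.diff(np.log(np.asarray(prices, dtype=float)))

def spread_zscores(prices_a, prices_b, window=60):
    """Rolling z-score of the log-price spread (hedge ratio assumed to be 1)."""
    spread = np.log(np.asarray(prices_a, dtype=float)) - np.log(np.asarray(prices_b, dtype=float))
    z = np.full(spread.shape, np.nan)
    for t in range(window, len(spread)):
        history = spread[t - window:t]  # excludes day t itself, so no look-ahead
        std = history.std()
        if std > 0:
            z[t] = (spread[t] - history.mean()) / std
    return z

# Synthetic prices: two stocks driven by one shared random walk plus small daily noise
rng = np.random.default_rng(42)
common = np.cumsum(rng.normal(0, 0.01, 250))
prices_b = 100 * np.exp(common)
prices_a = 50 * np.exp(common + rng.normal(0, 0.002, 250))
prices_a[-1] *= 1.05  # inject a hypothetical 5% divergence on the final day

corr = np.corrcoef(log_returns(prices_a), log_returns(prices_b))[0, 1]
z = spread_zscores(prices_a, prices_b, window=60)
divergent_days = np.flatnonzero(np.abs(np.nan_to_num(z)) > 2.0)
print(f"return correlation: {corr:.2f}")
print("days with |z| > 2:", divergent_days)
```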

Obvious disclaimer: Building trading models to practice machine learning is simple. Making them profitable is extremely difficult. Nothing here is financial advice, and we do not recommend trading real money.

Tutorials

Data Sources

4. Teach a Neural Network to Read Handwriting

Neural networks and deep learning are two success stories in modern artificial intelligence. They've led to major advances in image recognition, automatic text generation, and even in self-driving cars.

To get involved with this exciting field, you should start with a manageable dataset.

The MNIST Handwritten Digit Classification Challenge is the classic entry point. Image data is generally harder to work with than "flat" relational data. The MNIST data is beginner-friendly and is small enough to fit on one computer.

Handwriting recognition will challenge you, but it doesn't need high computational power.

To start, we recommend working through the first chapter of the tutorial below. It will teach you how to build a neural network from scratch that solves the MNIST challenge with high accuracy.
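Below is a compact NumPy sketch in the same spirit as a from-scratch network. It has one hidden layer of 30 sigmoid neurons and is trained by mini-batch stochastic gradient descent and backpropagation on the quadratic cost. It substitutes scikit-learn's bundled 8x8 `load_digits` images for MNIST's 28x28 images as a small, download-free stand-in, so it assumes scikit-learn is installed. The hyperparameters (learning rate 3.0, batch size 10, 10 epochs) are illustrative, not tuned.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X, y = load_digits(return_X_y=True)
X = X / 16.0          # pixel values run 0-16; scale them to [0, 1]
Y = np.eye(10)[y]     # one-hot targets: one output neuron per digit
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = X.shape[1], 30, 10
W1 = rng.normal(0, 1 / np.sqrt(n_in), (n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, n_out))
b2 = np.zeros(n_out)

def forward(X):
    a1 = sigmoid(X @ W1 + b1)   # hidden-layer activations
    a2 = sigmoid(a1 @ W2 + b2)  # output activations
    return a1, a2

def quadratic_cost(X, Y):
    _, a2 = forward(X)
    return 0.5 * np.sum((a2 - Y) ** 2) / len(X)

eta, batch_size, epochs = 3.0, 10, 10
losses = [quadratic_cost(X_train, Y_train)]
for epoch in range(epochs):
    order = rng.permutation(len(X_train))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X_train[idx], Y_train[idx]
        a1, a2 = forward(xb)
        # Backpropagation for the quadratic cost with sigmoid activations
        delta2 = (a2 - yb) * a2 * (1 - a2) / len(xb)
        delta1 = (delta2 @ W2.T) * a1 * (1 - a1)  # uses W2 before it is updated
        W2 -= eta * (a1.T @ delta2)
        b2 -= eta * delta2.sum(axis=0)
        W1 -= eta * (xb.T @ delta1)
        b1 -= eta * delta1.sum(axis=0)
    losses.append(quadratic_cost(X_train, Y_train))

_, test_out = forward(X_test)
accuracy = np.mean(test_out.argmax(axis=1) == Y_test.argmax(axis=1))
print(f"cost {losses[0]:.3f} -> {losses[-1]:.3f}, test accuracy {accuracy:.3f}")
```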

Tutorial

  • Neural Networks and Deep Learning (Online Book) - Chapter 1 walks through how to write a neural network from scratch in Python to classify digits from MNIST. The author also gives a very good explanation of the intuition behind neural networks.

Data Sources

  • MNIST - MNIST is a modified subset of two datasets collected by the U.S. National Institute of Standards and Technology. It contains 70,000 labeled images of handwritten digits.

5. Investigate Enron

The Enron scandal and collapse was one of the largest corporate meltdowns in history.

In the year 2000, Enron was one of the largest energy companies in America. Then, after being outed for fraud, it spiraled downward into bankruptcy within a year.

Luckily for us, we have the Enron email database. It contains 500 thousand emails between 150 former Enron employees, mostly senior executives. It's also the only large public database of real emails, which makes it especially valuable.

In fact, data scientists have been using this dataset for education and research for years.
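Before any of these analyses, you need structured fields from the raw messages. Assume each message is available as raw RFC 822-style text with standard headers; check the format of the copy you download. Under that assumption, Python's standard-library `email` package can extract the sender, recipients, and timestamp. The message below is made up, and `parse_email` is a hypothetical helper name.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical raw message (made-up addresses and content)
raw = """Message-ID: <1.JavaMail.example>
Date: Mon, 14 May 2001 16:39:00 -0700 (PDT)
From: alice@example.com
To: bob@example.com, carol@example.com
Subject: Q2 forecast

Numbers attached.
"""

def parse_email(raw_text):
    """Extract the fields most analyses need from one raw message."""
    msg = message_from_string(raw_text)
    # Combine To and Cc headers, then split them into individual addresses
    header_values = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = [addr for _, addr in getaddresses(header_values)]
    return {
        "sender": msg["From"],
        "recipients": recipients,
        "sent_at": parsedate_to_datetime(msg["Date"]),  # timezone-aware datetime
        "subject": msg["Subject"],
        "body": msg.get_payload(),
    }

record = parse_email(raw)
print(record["sender"], "->", record["recipients"], "at", record["sent_at"])
```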

Examples of machine learning projects for beginners you could try include...

  • Anomaly detection. Map the distribution of emails sent and received by hour and try to detect abnormal behavior leading up to the public scandal.
  • Social network analysis. Build network graph models between employees to find key influencers.
  • Natural language processing. Analyze message bodies together with email metadata to classify emails by purpose.
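For the anomaly-detection idea, a simple starting point is to bucket emails by calendar hour. You can then flag hours whose volume sits far above the mean. This plain-Python sketch uses a global z-score with a threshold of 3. Both choices are assumptions worth revisiting; for example, you might compare each hour against the same hour on other days, since email volume naturally drops overnight. The timestamps are synthetic, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean, pstdev

def hourly_counts(timestamps):
    """Count emails per calendar hour, including hours with zero emails."""
    buckets = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)
    first, last = min(buckets), max(buckets)
    counts = {}
    hour = first
    while hour <= last:
        counts[hour] = buckets.get(hour, 0)
        hour += timedelta(hours=1)
    return counts

def flag_anomalies(counts, threshold=3.0):
    """Return the hours whose count is more than `threshold` standard deviations above the mean."""
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [hour for hour, c in counts.items() if (c - mu) / sigma > threshold]

# Synthetic timestamps: 48 hours with 2 emails each, plus a burst of 20 in hour 30
start = datetime(2001, 10, 1)
timestamps = []
for h in range(48):
    for i in range(20 if h == 30 else 2):
        timestamps.append(start + timedelta(hours=h, minutes=i))

counts = hourly_counts(timestamps)
print(flag_anomalies(counts))
```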

Data Sources

6. Write ML Algorithms from Scratch

Writing machine learning algorithms from scratch is an excellent learning tool for two main reasons.

First, there's no better way to truly understand their mechanics. You'll be forced to think through every step, and that is what leads to real mastery.

Second, you'll learn how to translate mathematical instructions into working code. You'll need this skill when adapting algorithms from academic research.

To start, we recommend picking an algorithm that isn't too complex. There are dozens of subtle decisions you'll need to make for even the simplest algorithms.
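k-nearest neighbors is a good example of a simple algorithm that still hides several decisions. This plain-Python sketch makes them explicit in comments: the distance metric, how many neighbors to consult, and how to break tied votes. The choices shown are one reasonable set, not the only correct ones. The dataset is hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Decision 1: the distance metric (Manhattan or cosine distance are common alternatives)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Decision 2: k itself; a small k follows noise, a large k blurs class boundaries.
    # sorted() is stable, so equally distant points keep their training-set order.
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    # Decision 3: tie-breaking; among tied labels, the one held by the closest neighbor wins
    for _, label in neighbors:
        if votes[label] == top:
            return label

# Tiny hypothetical dataset: two well-separated groups in 2D
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.2, 0.2)))
```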

After you're comfortable building simple algorithms, try extending them for more functionality. For example, try extending a vanilla logistic regression into an L1- (lasso) or L2- (ridge) regularized logistic regression by adding a penalty term on the weights.
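Regularization here means adding a penalty on the weights to the log-loss. An L2 (ridge-style) penalty shrinks all weights toward zero. An L1 (lasso-style) penalty can push uninformative weights to, or very near, zero. The minimal NumPy sketch below uses batch gradient descent and leaves the intercept unpenalized. For L1, it uses a plain subgradient, so those weights hover near zero rather than landing exactly on it. It runs on synthetic data where only the first feature carries signal. The function name and hyperparameters are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, penalty=None, lam=0.1, lr=0.1, n_iter=2000):
    """Batch gradient descent on the mean log-loss, optionally with an L1 or L2 penalty.
    The intercept b is left unpenalized."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        grad_w = X.T @ (p - y) / n
        grad_b = np.mean(p - y)
        if penalty == "l2":      # ridge-style: gradient of lam * ||w||^2 / 2
            grad_w += lam * w
        elif penalty == "l1":    # lasso-style: subgradient of lam * ||w||_1
            grad_w += lam * np.sign(w)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data: feature 0 is informative, feature 1 is pure noise
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(float)

w_none, _ = fit_logistic(X, y)
w_l2, _ = fit_logistic(X, y, penalty="l2", lam=1.0)
w_l1, _ = fit_logistic(X, y, penalty="l1", lam=0.1)
print("no penalty:", w_none, "L2:", w_l2, "L1:", w_l1)
```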

Finally, here's a tip every beginner should know: Don't be discouraged if your algorithm is not as fast or fancy as those in existing packages. Those packages are the fruits of years of development!

Tutorials

Finally, we also have a free 7-day crash course on applied machine learning.
