How to Learn Python for Data Science in 2017 (Updated)

In this guide, we’ll cover how to learn Python for data science, including our favorite curriculum for self-study.

You see, data science is about problem solving, exploration, and extracting valuable information from data.

To do so effectively, you’ll need to wrangle datasets, train machine learning models, visualize results, and much more. Enter Python.

This is the best time ever to learn Python. In fact, Forbes named it a top 10 technical skill in terms of job demand growth. Let’s discuss why…

Why Learn Python for Data Science?

Python is one of the most widespread languages in the world, and it has a passionate community of users:

Python Popularity, TIOBE Index

It has an even more loyal following within the data science profession.

Some people judge the quality of a programming language by the simplicity of its "hello, world!" program. Python does pretty well by this standard:
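In Python 3, a minimal version is a single call to the built-in print function:

```python
# The whole program: no class, no main function, no boilerplate.
print("hello, world!")
```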

For comparison, producing the same output in Java requires declaring a class and a main method before you can print anything.

Great, case closed. See you back here after you've mastered Python?

Well, in all seriousness, simplicity is one of Python's greatest strengths. Thanks to its precise and efficient syntax, Python can accomplish the same tasks with less code than other languages. This makes implementing solutions refreshingly fast.

In addition, Python's vibrant data science community means you'll be able to find plenty of tutorials, code snippets, and fixes to common bugs, along with people to commiserate with. Stack Overflow will be one of your best friends.

Finally, Python has an all-star lineup of libraries (a.k.a. packages) for data analysis and machine learning, which drastically reduce the time it takes to produce results. More on these later.

How to Learn Python Efficiently

Before we go into what you'll need to learn, let's discuss what you won't need.

You won't need a C.S. degree.

Most data scientists will never deal with topics such as memory leaks, cryptography, or "Big O" notation. You'll be fine as long as you can write clean, logical code in a scripting language such as Python or R.

You won't need a complete course on Python.

Python and data science are not synonymous.

All Uses for Python

You won't need to memorize all the syntax.

Instead, focus on grasping the intuition, such as when a function is appropriate or how conditional statements work. You'll gradually remember the syntax after Googling, reading documentation, and good ol' fashioned practice.

We recommend a top-down approach.

We advocate a top-down approach with the goal of getting results first and then solidifying concepts over time. In fact, we prefer to cut out "classroom" study in favor of real-world practice.

  1. You'll start by learning core programming concepts.
  2. Next, you'll gain working knowledge of essential data science libraries.
  3. Finally, you'll practice and refine your skills through actual projects.

This approach will allow you to build mastery over time while having more fun.

Aside: Installing Python through Anaconda

There are many ways to install Python on your computer, but we recommend the Anaconda bundle, which comes with the libraries you'll need for data science.
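Once the installation finishes, one quick way to confirm that the libraries are available is to ask Python whether it can find them. The following is only a sketch and not part of Anaconda itself. The helper name check_libraries and the list of import names are our own assumptions, chosen to match the libraries covered later in this guide.

```python
import importlib.util
import sys

# Import names for the libraries discussed in this guide (an assumed list;
# note that scikit-learn is imported under the name "sklearn").
LIBRARIES = ["numpy", "pandas", "matplotlib", "sklearn", "seaborn"]


def check_libraries(names):
    """Return a dict mapping each import name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python version:", sys.version.split()[0])
for name, found in check_libraries(LIBRARIES).items():
    print(f"{name:12s} {'found' if found else 'MISSING'}")
```

If any line reports MISSING, the library is not installed in the Python environment you are currently running.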

Step 1: Core Programming Concepts

Effective programming is not about memorizing syntax, but rather mastering a new way of thinking.

Therefore, take your time in building a solid foundation of core programming concepts. These will help you translate solutions in your head into instructions for a computer.

If you are new to programming...

If you are completely new to programming, we recommend the excellent Automate the Boring Stuff with Python book, which has been released for free online under a Creative Commons license.

The book promises "practical programming for total beginners," and it keeps each lesson down-to-earth. Read up to Chapter 6 - Manipulating Strings and complete the practice questions along the way.

Automate the Boring Stuff by Al Sweigart

If you have experience in another language...

If you only need to brush up on Python syntax, then we recommend the following video, aptly named "Learn Python in One Video:"

Again, the goal of this step is not to learn everything about Python and programming. Instead, focus on the intuition.

You should be able to answer questions such as:

  • What's the difference between an integer, float, and string?
  • How can I use Python as a calculator?
  • What is a for loop? When would I write one?
  • What is the basic structure of a function?
  • How can I use conditional statements (if... else...) to add logic?
  • How do import statements work?
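As a rough self-check, the questions above can all be touched on in one short script. This is only an illustrative sketch. The function name describe_number and the sample values are made up for the example.

```python
import math  # an import statement makes the math module's functions available

# Three basic types: integer, float, and string.
count = 3          # int
price = 4.5        # float
label = "apples"   # str

# Python as a calculator.
total = count * price   # multiplication of an int and a float gives a float
root = math.sqrt(16)    # square root, using a function from the math module


def describe_number(n):
    """Basic function structure: a name, a parameter, and a return value."""
    # Conditional statements add logic.
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


# A for loop repeats the same steps for each item in a sequence.
descriptions = []
for value in [-2, 0, 7]:
    descriptions.append(describe_number(value))

print(total, root, label)
print(descriptions)
```

If you can predict what each print call displays before running the script, you are ready for the next step.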

Additional resources

If you'd like more practice with the core programming concepts, check out the following resources.

  • Code Fights is a platform with many short coding challenges that can be completed in 5-minute chunks (although it's so fun that you might find yourself playing through it for hours at a time). You'll gain points along the way and unlock new levels, making it a nice way to track your progression as well.
  • The Python Challenge is one of the coolest puzzles on the web, so don't be put off by its 1990s graphics. You can complete all 33 levels with the help of Python scripts. One user called it "an addictive way to learn the ins and outs of Python..." We agree!
  • PracticePython.org is a collection of short practice problems in Python. It's updated almost every week with a new problem. What's really nice is that the author includes multiple user-submitted solutions for each problem so you can see alternative ways of solving them.
  • How to Think Like a Computer Scientist is a fantastic interactive online book that takes a whirlwind tour through key programming concepts (with Python). If you're completely new to programming, this might be a good option. It's like a condensed "C.S. 101" course.

Step 2: Essential Data Science Libraries

Next, we're going to focus on the "for data science" part of "how to learn Python for data science."

As we mentioned earlier, Python has an all-star lineup of libraries for data science. Libraries are simply bundles of pre-existing functions and objects that you can import into your script to save time.

These are the action steps we recommend for efficiently picking up a new library:

  1. Open up a new Jupyter Notebook (see below).
  2. Read the library's documentation for 30 minutes for a high-level introduction to its modules.
  3. Import the library into your Jupyter Notebook.
  4. Follow its step-by-step quickstart tutorial to see the library in action.
  5. Review its documentation for another 30 minutes to learn what else it's capable of.

We don't recommend diving much deeper into a library right now because you'll likely forget most of what you've learned by the time you jump into projects. Instead, aim to discover what each library is capable of.
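When you first import a library, it can help to list what it exposes before reading further. Here is a minimal sketch of that idea. The helper summarize_module is hypothetical (our own, not part of any library), and the standard-library statistics module stands in for whichever library you are exploring.

```python
import inspect
import statistics  # stand-in; swap in the library you are exploring


def summarize_module(module, limit=5):
    """Return (name, first docstring line) pairs for public functions."""
    summary = []
    # getmembers returns the module's attributes sorted by name.
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        doc = inspect.getdoc(obj) or ""
        first_line = doc.splitlines()[0] if doc else "(no docstring)"
        summary.append((name, first_line))
    return summary[:limit]


for name, first_line in summarize_module(statistics):
    print(f"{name}: {first_line}")
```

Inside a Jupyter Notebook, calling help on any of the listed names shows its full documentation.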

Jupyter Notebook

If you installed Python through the Anaconda bundle as we recommended above, it will also come with Jupyter Notebook. Jupyter Notebook is a lightweight IDE that's a favorite among data scientists. We recommend it for your projects.

You can open a new notebook through Anaconda Navigator, which came with Anaconda. Check out this short video for instructions.

These are the essential libraries you'll need:

NumPy

NumPy allows easy and efficient numeric computation, and many other data science libraries are built on top of it.
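To give a feel for what "easy and efficient numeric computation" looks like, here is a small sketch using a made-up array of heights. The data and variable names are hypothetical.

```python
import numpy as np

# Hypothetical data: heights in centimeters.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])

# Vectorized arithmetic applies to every element without an explicit loop.
heights_m = heights_cm / 100

# Built-in reductions summarize the whole array.
mean_height = heights_m.mean()
tallest = heights_m.max()

# Boolean masks select the elements that satisfy a condition.
above_average = heights_cm[heights_m > mean_height]

print(heights_m)
print(mean_height, tallest)
print(above_average)
```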

Pandas

Pandas is a high-performance library for data structures and exploratory analysis. It's built on top of NumPy.
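As a taste of exploratory analysis, the sketch below builds a tiny table and summarizes it. The dataset is invented for illustration and defined inline so the example runs on its own.

```python
import pandas as pd

# Hypothetical dataset, built inline so the example is self-contained.
df = pd.DataFrame({
    "city": ["Austin", "Boston", "Austin", "Boston", "Denver"],
    "sales": [120, 200, 80, 150, 90],
})

# First look at the data: a few rows and summary statistics.
print(df.head())
print(df["sales"].describe())

# Group rows by city and total the sales in each group.
sales_by_city = df.groupby("city")["sales"].sum()
print(sales_by_city)

# Filter rows with a boolean condition.
big_sales = df[df["sales"] >= 120]
print(big_sales)
```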

Matplotlib

Matplotlib is a flexible plotting and visualization library. It's powerful but somewhat cumbersome. You have the option of skipping Matplotlib for now and using Seaborn to get started (see our Seaborn recommendation below).
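For reference, a basic Matplotlib line plot can be sketched as follows. The data points and the output filename line_plot.png are placeholders of our own. The backend line is only needed when running outside Jupyter on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

# Hypothetical data points.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

# Create a figure with a single set of axes, then draw on the axes.
fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x squared")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line plot")
ax.legend()

# Save to a file; in a notebook the figure would display inline instead.
fig.savefig("line_plot.png")
```

Notice how many separate calls it takes to label a single plot. That verbosity is what we mean by "cumbersome."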

Scikit-Learn

Scikit-Learn is the premier general-purpose machine learning library in Python. It has many popular algorithms and modules for pre-processing, cross-validation, and much more.
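To show how those modules fit together, here is a minimal sketch that chains pre-processing and a model, then scores it with cross-validation. The choice of the bundled Iris dataset and of logistic regression is ours, purely for illustration. Any other dataset or algorithm would follow the same pattern.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Iris is a small dataset bundled with scikit-learn (no download needed).
X, y = load_iris(return_X_y=True)

# Pre-processing and the model are chained so that scaling is fit only on
# the training folds during cross-validation.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation returns one accuracy score per fold.
scores = cross_val_score(model, X, y, cv=5)
print(scores)
print("Mean accuracy:", scores.mean())
```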

Bonus: Seaborn

Seaborn makes it much easier to plot common data visualizations. It's built on top of Matplotlib and offers a more pleasant high-level wrapper.
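Here is a short sketch of that high-level style, using an invented dataset defined inline. The column names and the output filename scatter_plot.png are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import pandas as pd
import seaborn as sns

# Hypothetical dataset, built inline so no download is required.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 58, 65, 70, 78, 85],
    "group": ["A", "A", "A", "B", "B", "B"],
})

# One call draws the points, colors them by group, and adds a legend.
ax = sns.scatterplot(data=df, x="hours_studied", y="exam_score", hue="group")
ax.set_title("Exam score vs. hours studied")
ax.figure.savefig("scatter_plot.png")
```

The axis labels come from the column names automatically, which is part of what makes Seaborn quicker for common plots.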

Step 3: End-to-End Projects

By now, you'll have a basic understanding of programming and a working knowledge of essential libraries. This actually covers most of the Python you'll need to get started with data science.

At this point, some students will feel a bit overwhelmed. That's OK, and it's perfectly normal.

If you had taken the slow and traditional bottom-up approach, you might feel less overwhelmed, but it would have taken you 10 times as long to get here.

Now the key is to dive in immediately and start gluing everything together. Again, our goal up to here has been to just learn enough to get started.

Next, it's time to solidify your knowledge through plenty of practice and projects.

You have several options.

Kaggle Competitions

The first option is to participate on Kaggle, a site that hosts data science competitions.

The main advantage of Kaggle is that every project is self-contained. You're given the dataset, a goal, and tutorials to get you started.

The major disadvantage of competitions is that they're usually not representative of real-world data science. The "Getting Started" competitions are way too basic while the standard competitions (i.e. those with prize pools) are usually too tough for beginners.

If you're interested in this path, check out our Beginner's Guide to Kaggle.

DIY Projects

The next option is to structure your own projects and pick datasets that interest you.

The main advantage of this approach is that the projects are more representative of real-world data science. You'll likely need to define your own goals, collect data, clean your dataset, engineer features, and so on.
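To make the cleaning and feature engineering steps concrete, here is a small sketch on an invented housing table. The columns, values, and the choice of median imputation are assumptions for illustration, not a recommended recipe for every dataset.

```python
import pandas as pd

# Hypothetical raw data with two common problems:
# a missing value and an exact duplicate row.
raw = pd.DataFrame({
    "sqft": [850, 1200, None, 1200, 2000],
    "bedrooms": [2, 3, 3, 3, 4],
    "price": [170000, 260000, 240000, 260000, 450000],
})

# Clean: drop exact duplicate rows, then fill the missing sqft with the median.
clean = raw.drop_duplicates().copy()
clean["sqft"] = clean["sqft"].fillna(clean["sqft"].median())

# Engineer a feature: price per square foot.
clean["price_per_sqft"] = clean["price"] / clean["sqft"]

print(clean)
```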

The disadvantage of DIY projects is that you'll need to already be familiar with a proper data science workflow. Without one, you could miss important steps or get stuck without knowing how to proceed.

If you go with this path, check out our free 7-day crash course on applied machine learning, which covers the key steps in a data science workflow. We also have another article with several DIY project ideas.

Guided Projects

Finally, there are guided end-to-end projects.

Proper guided projects should combine the best of both worlds: they should be representative of real-world data science and allow you to solidify your skills through a carefully planned learning curve.

Many data science bootcamps offer this as a main benefit. Bootcamps usually conclude with a "capstone project" that allows you to see all the moving pieces together, from start to finish.

We've also crafted our own Machine Learning Masterclass to solve this exact need. It will provide you with over-the-shoulder mentorship for real-world projects while teaching you all of the key concepts in context.

The masterclass also includes a comprehensive Python course that gets you up to speed ASAP. In fact, many successful students have enrolled without any prior programming experience. Learn more about it here.
