Foundations of Data Science: K-Means Clustering in Python

Approx. 29 hours to complete

Course Summary

This course is designed to teach students how to use k-means clustering in Python for data science applications. Students will learn how to implement k-means clustering algorithms, evaluate their results, and apply them to real-world problems.

Key Learning Points

Learn how to use k-means clustering in Python for data science applications
Implement k-means clustering algorithms
Evaluate the results of k-means clustering and apply it to real-world problems

Learning Outcomes

Implement k-means clustering algorithms using Python
Evaluate the effectiveness of k-means clustering
Apply k-means clustering to real-world problems

Prerequisites and Recommended Background

Basic understanding of Python programming
Familiarity with data science concepts

Course Difficulty Level

Intermediate

Course Format

Self-paced
Online
Video lectures
Hands-on exercises

Similar Courses

Machine Learning with Python
Applied Data Science with Python
Data Mining

Description

Organisations all around the world are using data to predict behaviours and extract valuable real-world insights to inform decisions. Managing and analysing big data has become an essential part of modern finance, retail, marketing, social science, development and research, medicine and government.

This MOOC, designed by an academic team from Goldsmiths, University of London, quickly introduces the core concepts of Data Science to prepare you for intermediate and advanced Data Science courses. It focuses on the basic mathematics, statistics and programming skills needed for typical data analysis tasks. You will explore these fundamental concepts through an example data clustering task, and use that example to learn the basic programming skills needed to master Data Science techniques. During the course, you will complete a series of mathematical and programming exercises and a small data clustering project on a given dataset.

Knowledge

Define and explain the key concepts of data clustering.
Demonstrate understanding of the key constructs and features of the Python language.
Implement in Python the principal steps of the K-means algorithm.
Design and execute a whole data clustering workflow and interpret the outputs.
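
As a rough illustration of what "the principal steps of the K-means algorithm" might look like, here is a minimal sketch in plain Python. It is not taken from the course materials. The function name `k_means`, the sample points, and the choice of using the first k points as starting centroids are all assumptions made for this example.

```python
import math


def k_means(points, k, max_iters=100):
    """Cluster points (tuples of floats) into k groups."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iters):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical 2D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)
print(centroids)
```

The two alternating steps, assigning points to the nearest centroid and then recomputing each centroid as a mean, are the core of the method. Real implementations usually choose starting centroids more carefully than this sketch does.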

Outline

Week 1: Foundations of Data Science: K-Means Clustering in Python
Welcome and Introduction
Introduction to Data Science
What is Data?
Types of Data
Machine Learning
Supervised vs Unsupervised Learning
K-Means Clustering
Preparing your Data
A Real World Dataset
Types of Data – Review Information
Supervised vs Unsupervised – Review Information
K-Means Clustering – Review Information
Week 1 Summative Assessment

Week 2: Means and Deviations in Mathematics and Python
2.0: Week 2 Introduction
2.1 – Introduction to Mathematical Concepts of Data Clustering
2.2 – Mean of One Dimensional Lists
2.3 – Variance and Standard Deviation
2.4 Jupyter Notebooks
2.5 Variables
2.6 Lists
2.7 Computing the Mean
2.8 Better Lists: NumPy
2.9 Computing the Standard Deviation
Week 2 Conclusion
Population vs Sample, Bias
Variability, Standard Deviation and Bias
Python Style Guide
Numpy and Array Creation
Population vs Sample – Review Information
Mean of One Dimensional Lists – Review Information
Variance and Standard Deviation – Review Information
Jupyter Notebooks – Review Information
Variables – Review Information
Lists – Review Information
Computing the Mean – Review Information
Better Lists – Review Information
Computing the Standard Deviation – Review Information
Week 2 Summative Assessment
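
The Week 2 topics (mean, variance, standard deviation, population vs sample) can be sketched in a few lines of plain Python. This is an illustrative example with made-up values, not the course's own notebook. It uses only the standard library so the arithmetic stays visible.

```python
import math
import statistics

# Hypothetical one-dimensional data.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Mean: the sum divided by the number of values.
mean = sum(values) / len(values)

# Squared deviations from the mean feed both variance formulas.
squared_deviations = [(x - mean) ** 2 for x in values]

# Population variance divides by n; sample variance divides by n - 1.
population_variance = sum(squared_deviations) / len(values)
sample_variance = sum(squared_deviations) / (len(values) - 1)

# Standard deviation is the square root of the variance.
population_std = math.sqrt(population_variance)
sample_std = math.sqrt(sample_variance)

print(mean, population_std, sample_std)

# Cross-check against the standard library's own implementations.
print(statistics.pstdev(values), statistics.stdev(values))
```

With NumPy, `numpy.std` computes the population form by default (`ddof=0`), and passing `ddof=1` gives the sample form.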

Week 3: Moving from One to Two Dimensional Data
Week 3 Introduction
3.1 Multidimensional Data Points and Features
3.2 Multidimensional Mean
3.3 Dispersion: Multidimensional Variables
3.4 Distance Metrics
3.5 Normalisation
3.6 Outliers
3.7 Basic Plotting
3.7a Storing 2D Coordinates in a Single Data Structure
3.8 Multidimensional Mean
3.9 Adding Graphical Overlays
3.10 Calculating the Distance to the Mean
3.11 List Comprehension
3.12 Normalisation in Python
3.13 Outliers and Plotting Normalised Data
Week 3 Conclusion
Multidimensional Data Points and Features Recap
Multidimensional Mean Recap
Multidimensional Variables Recap
Distance Metrics Recap
Normalisation Recap
Note on Matplotlib
Matplotlib Scatter Plot Documentation
Matplotlib Patches Documentation
List Comprehension Documentation
3.12 Errata
Multidimensional Data Points and Features – Review Information
Multidimensional Mean – Review Information
Dispersion: Multidimensional Variables – Review Information
Distance Metrics – Review Information
Normalisation – Review Information
Outliers – Review Information
Basic Plotting – Review Information
Storing 2D Coordinates – Review Information
Multidimensional Mean – Review Information
Adding Graphical Overlays – Review Information
Calculating Distance – Review Information
List Comprehension – Review Information
Normalisation in Python – Review Information
Outliers – Review Information
Week 3 Summative Assessment
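
To make the Week 3 ideas of a multidimensional mean, distance to the mean, and normalisation more concrete, here is a small sketch with invented data. The outline does not say which normalisation the course uses. This example assumes z-score standardisation (subtract the feature mean, divide by the feature standard deviation), and the helper name `column_stats` is hypothetical.

```python
import math

# Hypothetical 2D points whose second feature has a much larger scale.
points = [(1.0, 200.0), (2.0, 400.0), (3.0, 600.0), (4.0, 800.0), (5.0, 1000.0)]


def column_stats(points):
    """Return per-feature means and population standard deviations."""
    columns = list(zip(*points))
    means = tuple(sum(col) / len(col) for col in columns)
    stds = tuple(
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col))
        for col, m in zip(columns, means)
    )
    return means, stds


means, stds = column_stats(points)

# Euclidean distance of each raw point to the multidimensional mean.
raw_distances = [math.dist(p, means) for p in points]

# Assumed normalisation: z-score each feature independently.
normalised = [
    tuple((x - m) / s for x, m, s in zip(p, means, stds)) for p in points
]

# After z-scoring, the mean sits at the origin.
norm_distances = [math.dist(p, (0.0, 0.0)) for p in normalised]

print(raw_distances)
print(norm_distances)
```

Before normalisation, the distances are dominated by the large-scale second feature. Afterwards, both features contribute equally, which matters for any distance-based method such as k-means.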

Week 4: Introducing Pandas and Using K-Means to Analyse Data
Week 4 Introduction
4.1: Using the Pandas Library to Read csv Files
4.1a: Sorting and Filtering Data Using Pandas
4.1b: Labelling Points on a Graph
4.1c: Labelling all the Points on a Graph
4.2: Eyeballing the Data
4.3: Using K-Means to Interpret the Data
Week 4: Conclusion
Week 4 Code Resources
Pandas Read_CSV Function
More Pandas Library Documentation
The Pyplot Text Function
For Loops in Python
Documentation for sklearn.cluster.KMeans
Using the Pandas Library to Read csv Files – Review Information
Sorting and Filtering Data Using Pandas – Review Information
Labelling Points on a Graph – Review Information
Labelling all the Points on a Graph – Review Information
Eyeballing the Data – Review Information
Using K-Means to Interpret the Data – Review Information
Week 4 Summative Assessment
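
The Week 4 workflow (reading a CSV file with pandas, then clustering it with `sklearn.cluster.KMeans`) might look roughly like the sketch below. The column names and data are invented for illustration, and an in-memory string stands in for a real file. It assumes pandas and scikit-learn are installed.

```python
import io

import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical data standing in for a CSV file on disk.
csv_text = """name,x,y
a,1.0,1.0
b,1.5,2.0
c,1.0,1.5
d,8.0,8.0
e,9.0,8.5
f,8.5,9.0
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Fit k-means with two clusters; random_state makes the run repeatable.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = model.fit_predict(df[["x", "y"]])

print(df.sort_values("cluster"))
print(model.cluster_centers_)
```

The cluster numbers themselves are arbitrary labels. What matters is which rows end up grouped together.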

Week 5: A Data Clustering Project
Introduction to Week 5
5.1 Can a Machine Detect Fake Notes?
5.2 Working for a Client
5.3 How to Organize Work on Your Project
5.4 Dealing With Difficulties
5.5 No Data no Data Science: Introduction of the Dataset
5.6 Modelling
5.7 Presenting the Project Results
5.8 Concluding Remarks
Week 5 Code Resource – the Dataset for our Project
Saving plt.scatter Outputs as Figures
Additional Recommended Reading for Week 5
How Would You Help? – Review Information
Python – Review Information
Week 5 Summative Assessment

Summary of User Reviews

Discover the power of k-means clustering in data science with this comprehensive course on Coursera. Gain practical skills in Python and learn how to implement k-means clustering algorithms for real-world applications. Highly recommended for anyone looking to advance their knowledge in data science.

Key Aspect Users Liked About This Course

Many users found the course to be well-structured and easy to follow.

Pros from User Reviews

Clear and concise explanations of k-means clustering concepts.
Hands-on exercises and projects to practice implementing k-means clustering algorithms.
The course provides a good balance between theory and practical applications.
Instructors are knowledgeable and responsive to questions.
Excellent resource for beginners and intermediate learners in data science.

Cons from User Reviews

Some users found the pace of the course to be too slow.
The course could benefit from more advanced topics and applications of k-means clustering.
No certification is offered upon completion of the course.
Some users experienced technical difficulties with the online platform.
The course may not be suitable for users looking for a deep dive into the mathematical underpinnings of k-means clustering.
