1. Mall Customer Segmentation
  2. Problem Statement: You own a mall and want to understand which customers can be converted easily (target customers), so that this
  3. insight can be passed to the marketing team, which can then plan its strategy accordingly.
  4. Variable Descriptions:
  5. CustomerID: Unique ID assigned to the customer
  6. Gender: Gender of the customer
  7. Age: Age of the customer
  8. Annual Income (k$): Annual Income of the customer
  9. Spending Score (1–100): Score assigned by the mall based on customer behavior and spending nature.
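A minimal sketch of how the five variables above might be read into typed records, using only the standard library. The header spellings and the three sample rows are assumptions made up for illustration; they are not taken from the actual dataset, and `load_customers` is a hypothetical helper name.

```python
import csv
import io

# Made-up rows for illustration only; not taken from the real dataset.
# The exact header spellings are assumed.
SAMPLE_CSV = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,25,30,70
2,Female,40,80,20
3,Female,33,55,50
"""

def load_customers(handle):
    """Parse the five documented columns into typed dictionaries."""
    customers = []
    for row in csv.DictReader(handle):
        customers.append({
            "CustomerID": int(row["CustomerID"]),
            "Gender": row["Gender"],
            "Age": int(row["Age"]),
            "Annual Income (k$)": float(row["Annual Income (k$)"]),
            "Spending Score (1-100)": int(row["Spending Score (1-100)"]),
        })
    return customers

customers = load_customers(io.StringIO(SAMPLE_CSV))

# Two numeric features that could later be passed to a clustering step.
features = [
    (c["Annual Income (k$)"], c["Spending Score (1-100)"]) for c in customers
]
```

With a real file, the `io.StringIO(...)` argument would be replaced by an opened file handle; the rest of the sketch stays the same as long as the headers match.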
  10. Steps:
  11. 1. Importing Libraries
  12. 2. Importing Data
  13. 3. Data Visualization
  14. 4. Clustering using K-Means
  15. 5. Selection of Clusters
  16. 6. Plotting the Cluster Boundary and Clusters – Use Age, Income, etc.
  17. 7. Visualization of Cluster Result
  18. Inference: Prepare a report on the inferences and explain what each cluster conveys.
  19.  
  20. Definition:
  21. K-means is an iterative algorithm that tries to partition the dataset into K pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group. In machine learning, k-means clustering can be used to segment customers (or other data) efficiently.
  22. Properties of Clusters:
  23. All the data points in a cluster should be similar to each other.
  24. The data points from different clusters should be as different as possible.
  25. The approach K-means follows to solve the problem is called Expectation-Maximization:
  26. Specify the number of clusters K.
  27. Initialize the centroids by first shuffling the dataset and then randomly selecting K data points as centroids, without replacement.
  28. Keep iterating the three steps below until the centroids no longer change, i.e. until the assignment of data points to clusters stops changing:
  29. Compute the squared distance between each data point and every centroid.
  30. Assign each data point to the closest cluster (centroid).
  31. Recompute each cluster's centroid as the average of all data points that belong to that cluster.
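The steps above can be sketched in plain Python as follows. This is an illustrative sketch rather than a reference implementation: the function names, the `seed` and `max_iter` parameters, the choice to keep the old centroid when a cluster ends up empty, and the six toy points are all assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise: pick k distinct data points as the starting centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels

# Toy data (made up): two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

On this toy data the first three points should end up in one cluster and the last three in the other. On real data the result can depend on the random initialisation, so it is common to run the algorithm several times with different seeds.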
  32. Stopping Criteria for K-Means Clustering
  33. There are essentially three stopping criteria that can be adopted to stop the K-means algorithm:
  34. Centroids of newly formed clusters do not change
  35. Points remain in the same cluster
  36. The maximum number of iterations is reached
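The three criteria can be expressed as a single check that runs at the end of each iteration. The sketch below is one possible way to write it; `should_stop`, the tolerance `tol`, and the default `max_iter=300` are hypothetical choices for illustration, not values prescribed by the text.

```python
def should_stop(old_centroids, new_centroids, old_labels, new_labels,
                iteration, max_iter=300, tol=0.0):
    """Return the name of the first stopping criterion met, or None."""
    # 1. Centroids did not move (by more than `tol` in any coordinate).
    if all(
        abs(a - b) <= tol
        for old, new in zip(old_centroids, new_centroids)
        for a, b in zip(old, new)
    ):
        return "centroids unchanged"
    # 2. No point switched cluster.
    if old_labels == new_labels:
        return "assignments unchanged"
    # 3. The iteration budget is used up.
    if iteration >= max_iter:
        return "max iterations reached"
    return None
```

In practice a small positive `tol` is often preferred over an exact comparison, because floating-point centroids may keep moving by negligible amounts.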
  37. When to use Cluster Analysis?
  38. If the data is labeled, we can use a classification technique. When the data is not labeled, we can cluster it based on certain features and try to label it ourselves. Cluster analysis is therefore used when we have no labels; in machine learning this is called unsupervised learning.
  39. Final Goal:
  40. The goal of clustering is to maximize the similarity of observations within each cluster and to maximize the dissimilarity between clusters.
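For K-means specifically, this goal can be written as a formula. The notation below is an assumption introduced for illustration: $C_k$ is the set of points in cluster $k$, $\mu_k$ its centroid, $\bar{x}$ the mean of all $n$ points, $W$ the within-cluster sum of squares, and $B$ the between-cluster sum of squares.

```latex
\begin{aligned}
W &= \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
  \qquad \mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i, \\
B &= \sum_{k=1}^{K} |C_k| \, \lVert \mu_k - \bar{x} \rVert^2, \\
\sum_{i=1}^{n} \lVert x_i - \bar{x} \rVert^2 &= W + B .
\end{aligned}
```

Because the total on the left-hand side of the last line is fixed for a given dataset, making $W$ smaller (more similar points within each cluster) is the same as making $B$ larger (clusters further apart), so the two halves of the goal are two views of one objective.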