Artificial Intelligence and Machine Learning Articles

Get In Touch For Details! Request More Information

Name

Email ID

Phone Number

Education Qualification

Current Profile

Select your interested program

ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

Affinity Propagation: A Complete Beginner’s Guide

By Vishalini Devarajan

Jul 20, 2026 7 Min Read 556 Views

(Last Updated)

Most clustering methods force you to answer their biggest question up front: how many clusters do you want? K‑Means, for example, needs that number before it starts, and a wrong choice can make the results meaningless. That requirement often defeats the point of clustering, which is to discover the group structure in unfamiliar data.

Affinity Propagation avoids this by letting the data choose its own cluster centers. Every point considers every other point as a potential exemplar and exchanges messages iteratively to negotiate which points become representatives. The exemplars that emerge define clusters, so the algorithm returns the number of clusters as part of the result rather than needing it as an input.

In this article, we will walk through everything a beginner needs to understand about Affinity Propagation: the core idea of exemplar-based clustering, how the responsibility and availability matrices work, what the damping factor and preference parameters control, how to implement it in Python using scikit-learn, how it compares to K-Means, and where it is genuinely useful in real-world problems.

TL;DR

Affinity Propagation clusters by letting data pick exemplars (actual data points) rather than requiring a preset number of clusters.
It builds a similarity matrix (usually negative squared distance) and iteratively exchanges two messages responsibility and availability—between all points to decide exemplars.
The diagonal “preference” values control how likely a point is to become an exemplar; higher preference → more clusters.
Key hyperparameters: preference (controls cluster count) and damping (stabilizes updates to prevent oscillation). Complexity is O(N²) in memory/time, so it’s best for moderate-sized datasets.
Use it when exemplar interpretability or unknown cluster count matters; prefer K‑Means for very large datasets or when you know K and need speed.

The Core Concept: Exemplar-Based Clustering
The Similarity Matrix Of Affinity Propagation
The Two Messages: Responsibility and Availability
Affinity Propagation vs. K-Means
Advantages of Affinity Propagation
Limitations of Affinity Propagation
Real-World Applications
Wrapping Up
FAQ

Do I need to choose the number of clusters beforehand?
What similarity should I use?
How do preference and damping affect results?
Is Affinity Propagation scalable to large datasets?
When should I prefer Affinity Propagation over K‑Means?

What Is Affinity Propagation?

Affinity Propagation is an unsupervised clustering algorithm that automatically identifies the number of clusters in a dataset without requiring it to be predefined. It works by exchanging messages between data points to determine which points are most representative, called exemplars, for the dataset. These messages—responsibility and availability—are iteratively updated until a stable set of exemplars emerges. This approach allows the algorithm to form clusters based on similarity and is useful in pattern recognition and data grouping tasks.

The Core Concept: Exemplar-Based Clustering

Affinity propagation : A complete beginners guide

The Exemplar Concept
Affinity Propagation centers clustering around exemplars actual data points that serve as representative cluster heads, rather than computed centroids, making results more interpretable when a real example is required.
No Need to Pre‑Specify K
Unlike K‑Means or many other methods, Affinity Propagation automatically determines both the number of clusters and their exemplars from the input similarity matrix, removing the need to choose K ahead of time.
Message‑Passing Between Points
The algorithm views data as a fully connected network where points exchange messages about their suitability to be exemplars; these iterative responsibilities and availabilities converge to reveal exemplars and cluster memberships.
Interpretability and Use Cases
Because exemplars are real observations, Affinity Propagation is especially useful for tasks like selecting representative images, identifying archetypal customer profiles, or finding representative gene situations where an actual data point is a more meaningful summary than an abstract centroid.

The Similarity Matrix Of Affinity Propagation

Before any messages are passed, Affinity Propagation builds a similarity matrix that captures how alike every pair of data points is.
The algorithm does not require prior labels and clusters purely based on similarity. It starts with a similarity matrix S, where S(i, j) represents the similarity between points xi and xj. By default, similarity is calculated as the negative squared Euclidean distance.
This means for two points i and k, the similarity is computed as:
S(i, k) = -(xi – xk)²
The more similar two points are, the less negative this value is. The most similar pair of points will have a similarity close to zero. Points that are far apart will have very large negative similarities.
The diagonal of the similarity matrix, S(i, i), is special. These values are called preferences and represent how likely each data point is to become an exemplar on its own merits, before considering any messages from other points.
Diagonal elements S(i, i) are called preferences, indicating how likely each point is to be chosen as an exemplar. A higher preference makes a point more likely to become a cluster center.

The Two Messages: Responsibility and Availability

The two messages : Responsibilities and availability

The algorithm operates by iteratively passing two types of messages between all pairs of data points. These messages are stored in two matrices that are updated at every iteration.

The Responsibility Matrix R

The responsibility matrix R has values r(i, k) that quantify how well-suited xk is to serve as the exemplar for xi, relative to other candidate exemplars for xi.

Think of the responsibility message r(i, k) as point i telling point k, “Based on the available alternatives, you are this much better than my next-best option as an exemplar for me.” A high positive value means point k is a genuinely good exemplar candidate for point i. A negative value means there is a better candidate available.

The responsibility update formula is:

R(i, k) = S(i, k) – max{ A(i, k’) + S(i, k’) } for all k’ ≠ k

In plain terms, the responsibility of k for i equals the direct similarity between i and k, minus the best combined score any other candidate k’ offers. If k is the only good option, its responsibility score will be high. If many other candidates are equally good, k’s responsibility will be lower.

The Availability Matrix A

The availability matrix A contains values a(i, k) that represent how appropriate it would be for xi to pick xk as its exemplar, taking into account other points’ preference for xk as an exemplar.

Think of the availability message a(i, k) as point k telling point i: “This is how much collective support I have from other points to be an exemplar.” A high availability means many other points also want point k as their exemplar, making it a strong candidate. A low or negative availability means k is not well-supported by the broader data.

For non-diagonal elements, the availability update is:

A(i, k) = min{ 0, R(k, k) + sum of max(0, R(i’, k)) } for all i’ ≠ i and i’ ≠ k

The self-availability A(k, k) is updated separately:

A(k, k) = sum of max(0, R(i’, k)) for all i’ ≠ k

This captures how much total positive evidence exists across all points that k should be an exemplar.

Implementing Affinity Propagation in Scikit-Learn

Here is a complete implementation showing how to use Affinity Propagation in scikit-learn:

import numpy as np

import matplotlib.pyplot as plt

from sklearn.cluster import AffinityPropagation

from sklearn.datasets import make_blobs

from sklearn.metrics import silhouette_score

# Generate sample data

X, y_true = make_blobs(n_samples=300, centers=4,

cluster_std=0.6, random_state=42)

# Fit Affinity Propagation

af = AffinityPropagation(

damping=0.7, # stabilize convergence

preference=None, # use median similarity (default)

max_iter=200, # maximum iterations

convergence_iter=15, # stop if stable for 15 iterations

affinity=’euclidean’, # similarity measure

random_state=42

)

af.fit(X)

labels = af.labels_

exemplar_indices = af.cluster_centers_indices_

n_clusters = len(exemplar_indices)

print(f”Estimated number of clusters: {n_clusters}”)

print(f”Exemplar indices: {exemplar_indices}”)

# Evaluate quality

if n_clusters > 1:

sil_score = silhouette_score(X, labels)

print(f”Silhouette Score: {sil_score:.4f}”)

# Visualize

plt.figure(figsize=(8, 6))

colors = plt.cm.tab10(np.linspace(0, 1, n_clusters))

for k, color in enumerate(colors):

class_members = labels == k

exemplar = X[exemplar_indices[k]]

plt.scatter(X[class_members, 0], X[class_members, 1],

color=color, alpha=0.6, s=30)

plt.scatter(exemplar[0], exemplar[1], s=200,

color=color, marker=’*’, edgecolors=’black’,

linewidths=1.5, zorder=5)

plt.title(f’Affinity Propagation: {n_clusters} Clusters Found’)

plt.xlabel(‘Feature 1’)

plt.ylabel(‘Feature 2’)

plt.tight_layout()

plt.savefig(‘affinity_propagation.png’, dpi=150)

print(“Plot saved.”)

The star markers in the plot show the exemplars, the actual data points selected as cluster centers. Notice that you did not need to specify the number of clusters at any point. The algorithm found them automatically.

💡 Did You Know?

Affinity Propagation is a clustering algorithm designed to identify exemplars—data points that are actual members of the dataset and act as cluster centers. This makes the results highly interpretable, especially in applications like selecting representative images, archetypal customer profiles, or key genetic expressions in bioinformatics. Unlike methods that require predefining the number of clusters, Affinity Propagation determines cluster structure through message passing between data points based on a full similarity matrix, which leads to O(n²) memory complexity in practice.

While this is distinct from bagging, it is worth noting that bootstrap sampling in ensemble methods typically includes about 63% unique samples per iteration. In Affinity Propagation, however, practitioners often tune the preference parameter—for example using the median or minimum similarity—as a principled way to control how many clusters emerge, effectively adjusting the granularity of the resulting structure.

Affinity Propagation vs. K-Means

Understanding where each algorithm excels makes the choice between them clear.

K-Means is often better when you have large datasets where O(n × k × d) is feasible and O(n²) would be too slow or memory-heavy, data are numeric vectors where Euclidean distance is meaningful and clusters are roughly spherical, you know or can choose a reasonable k, and speed and simplicity are priorities.
Affinity Propagation is better when you have a meaningful pairwise similarity measure not necessarily Euclidean distance), you want exemplars that are actual data points, and the number of clusters is unknown.
Affinity Propagation is more robust to cluster shape and scale because it uses arbitrary similarities, but it is sensitive to the similarity function and preference choice and can produce many small clusters if preferences are too high.
The most important practical consideration is dataset size. K-Means scales well to large datasets. Affinity Propagation has a time complexity of O(N² × T) where N is the number of samples and T is the number of iterations, which makes it impractical for datasets with more than a few thousand points without modifications.

Advantages of Affinity Propagation

Affinity Propagation does not require a pre-specified number of clusters, making it a useful tool for exploratory data analysis. It has been shown to outperform other clustering algorithms on certain types of datasets.
The exemplar-based approach means cluster centers are always real, interpretable data points rather than abstract computed centroids.
This is especially valuable in domains like bioinformatics, where an exemplar gene or protein is directly useful, or in image clustering, where an exemplar image can be displayed and examined.
The inventors of affinity propagation showed it is better for certain computer vision and computational biology tasks, such as clustering pictures of human faces and identifying regulated transcripts, than K-Means, even when K-Means was allowed many random restarts and initialized using PCA.

Limitations of Affinity Propagation

Affinity Propagation can be computationally expensive for large datasets or datasets with many features. It may converge to suboptimal solutions, especially when the data has a high degree of variability or noise.
It can be sensitive to the choice of the damping factor. It may produce many small clusters or clusters with only one or a few members, which may not be meaningful.
The O(N²) memory requirement to store the full similarity matrix is perhaps the most severe practical constraint. For a dataset of 10,000 points, the similarity matrix alone requires storing 100 million values.
For datasets larger than a few thousand samples, this quickly becomes infeasible on standard hardware without modifications such as sparse similarity matrices or subsampling.
Non-convergence is another practical challenge. Non-convergence is often due to oscillations. The solution is to adjust the damping factor. High computational cost is also a limitation, and for large datasets, the solution is to use sparse matrices or subsampling.

Real-World Applications

Gene expression data analysis is a major application where affinity propagation has been used to identify key genes and group similar gene expressions, aiding in understanding biological processes and disease mechanisms.
Customer segmentation uses affinity propagation to cluster customers based on purchase behavior and preferences so businesses can tailor marketing strategies to specific segments.
Bioinformatics applications include identifying representative genes or proteins, and social network analysis uses it to identify influential individuals or communities.
In natural language processing, affinity propagation has been applied to document clustering and text summarization.
By treating pairwise sentence or document similarities as the input matrix, the algorithm identifies exemplar sentences that best represent clusters of related content, which can then serve as the basis for extractive summarization.

If you’re serious about mastering Affinity Propagation, understanding exemplar-based clustering, message passing, and how to implement it in Python without predefining the number of clusters, don’t miss the chance to enroll in HCL GUVI’s Artificial Intelligence & Machine Learning Course, co-designed by Intel.

Wrapping Up

Affinity Propagation is one of the most conceptually elegant clustering algorithms in machine learning. By framing clustering as a network problem where every data point simultaneously campaigns to become an exemplar while evaluating others as candidates for their own cluster, it elegantly bypasses the requirement to specify the number of clusters in advance.

The two-matrix message-passing framework, where responsibility captures how good a candidate exemplar is for a specific point and availability captures how much collective support a candidate has from all points, creates a self-organizing process that converges to a principled solution grounded in the actual structure of the data.

Understanding how the preference parameter controls cluster granularity and how the damping factor stabilizes convergence gives you the practical control needed to apply the algorithm to real datasets effectively.

While its quadratic complexity limits scalability, for moderate-sized datasets where exemplar interpretability and automatic cluster discovery matter, Affinity Propagation remains one of the most powerful tools in the unsupervised learning toolkit.

FAQ

1. Do I need to choose the number of clusters beforehand?

No. Affinity Propagation determines the number of clusters automatically; you control granularity indirectly via the preference parameter.

2. What similarity should I use?

The default is negative squared Euclidean distance. Use a similarity that fits your data (cosine, correlation, or custom kernels) because results depend strongly on the similarity measure.

3. How do preference and damping affect results?

Higher preference values make more points likely to become exemplars (more clusters). Damping (0.5–0.9 typical) slows updates to prevent oscillations; increase damping if the algorithm fails to converge.

4. Is Affinity Propagation scalable to large datasets?

Not easily—it requires O(N²) memory/time for the full similarity matrix and message updates. For thousands of points you may need subsampling, sparse similarities, or alternative clustering methods (e.g., K‑Means).

5. When should I prefer Affinity Propagation over K‑Means?

Prefer Affinity Propagation when you want exemplar-based clusters, don’t know K, or have a meaningful pairwise similarity (not necessarily Euclidean). Use K‑Means for large datasets, when clusters are roughly spherical, and when you can supply a reliable K.

Success Stories

About the Author

Vishalini Devarajan

An Aerospace Engineer turned content writer, I focus on making complex concepts easy to understand through well-structured, reader-friendly blogs. Whether it’s a technical topic or a non-technical one, I love creating content that is clear, engaging, and impactful.

View all posts by Vishalini Devarajan

Did you enjoy this article?

Recommended Courses

Artificial Intelligence and Machine Learning Course

Available in

English

Blog Categories

Interview Questions

Artificial Intelligence and Machine Learning Articles

Affinity Propagation: A Complete Beginner’s Guide

Table of contents

What Is Affinity Propagation?

The Core Concept: Exemplar-Based Clustering

The Similarity Matrix Of Affinity Propagation

The Two Messages: Responsibility and Availability

Affinity Propagation vs. K-Means

Advantages of Affinity Propagation

Limitations of Affinity Propagation

Real-World Applications

Wrapping Up

FAQ

1. Do I need to choose the number of clusters beforehand?

2. What similarity should I use?

3. How do preference and damping affect results?

4. Is Affinity Propagation scalable to large datasets?

5. When should I prefer Affinity Propagation over K‑Means?

Success Stories

About the Author

Vishalini Devarajan

Did you enjoy this article?

Recommended Courses

Most Popular

Artificial Intelligence and Machine Learning Course

Syllabus

Know More

Chatgpt for Everyone

Natural Language Processing Us...

Dalle in French

Machine Learning and AI Servic...

ChatGPT for Programmers

Keras for Beginners

Keras for Beginners in Hindi

Keras for Beginners in Telugu

Deep learning using Pytorch

Deep learning using Pytorch

Practical Machine Learning

Building a Virtual AI Assistan...

Schedule 1:1 free counselling

Similar Articles

Artificial Intelligence and Machine Learning Articles