Affinity Propagation: A Complete Beginner’s Guide
Jun 04, 2026 7 Min Read 37 Views
(Last Updated)
Most clustering methods force you to answer their biggest question up front: how many clusters do you want? K‑Means, for example, needs that number before it starts, and a wrong choice can make the results meaningless. That requirement often defeats the point of clustering, which is to discover the group structure in unfamiliar data.
Affinity Propagation avoids this by letting the data choose its own cluster centers. Every point considers every other point as a potential exemplar and exchanges messages iteratively to negotiate which points become representatives. The exemplars that emerge define clusters, so the algorithm returns the number of clusters as part of the result rather than needing it as an input.
In this article, we will walk through everything a beginner needs to understand about Affinity Propagation: the core idea of exemplar-based clustering, how the responsibility and availability matrices work, what the damping factor and preference parameters control, how to implement it in Python using scikit-learn, how it compares to K-Means, and where it is genuinely useful in real-world problems.
TL;DR
- Affinity Propagation clusters by letting data pick exemplars (actual data points) rather than requiring a preset number of clusters.
- It builds a similarity matrix (usually negative squared distance) and iteratively exchanges two messages responsibility and availability—between all points to decide exemplars.
- The diagonal “preference” values control how likely a point is to become an exemplar; higher preference → more clusters.
- Key hyperparameters: preference (controls cluster count) and damping (stabilizes updates to prevent oscillation). Complexity is O(N²) in memory/time, so it’s best for moderate-sized datasets.
- Use it when exemplar interpretability or unknown cluster count matters; prefer K‑Means for very large datasets or when you know K and need speed.
Table of contents
- The Core Concept: Exemplar-Based Clustering
- The Similarity Matrix Of Affinity Propagation
- The Two Messages: Responsibility and Availability
- Affinity Propagation vs. K-Means
- Advantages of Affinity Propagation
- Limitations of Affinity Propagation
- Real-World Applications
- Wrapping Up
- FAQ
- Do I need to choose the number of clusters beforehand?
- What similarity should I use?
- How do preference and damping affect results?
- Is Affinity Propagation scalable to large datasets?
- When should I prefer Affinity Propagation over K‑Means?
What Is Affinity Propagation?
Affinity Propagation is an unsupervised clustering algorithm that automatically identifies the number of clusters in a dataset without requiring it to be predefined. It works by exchanging messages between data points to determine which points are most representative, called exemplars, for the dataset. These messages—responsibility and availability—are iteratively updated until a stable set of exemplars emerges. This approach allows the algorithm to form clusters based on similarity and is useful in pattern recognition and data grouping tasks.
The Core Concept: Exemplar-Based Clustering
- The Exemplar Concept
Affinity Propagation centers clustering around exemplars actual data points that serve as representative cluster heads, rather than computed centroids, making results more interpretable when a real example is required. - No Need to Pre‑Specify K
Unlike K‑Means or many other methods, Affinity Propagation automatically determines both the number of clusters and their exemplars from the input similarity matrix, removing the need to choose K ahead of time. - Message‑Passing Between Points
The algorithm views data as a fully connected network where points exchange messages about their suitability to be exemplars; these iterative responsibilities and availabilities converge to reveal exemplars and cluster memberships. - Interpretability and Use Cases
Because exemplars are real observations, Affinity Propagation is especially useful for tasks like selecting representative images, identifying archetypal customer profiles, or finding representative gene situations where an actual data point is a more meaningful summary than an abstract centroid.
The Similarity Matrix Of Affinity Propagation
- Before any messages are passed, Affinity Propagation builds a similarity matrix that captures how alike every pair of data points is.
- The algorithm does not require prior labels and clusters purely based on similarity. It starts with a similarity matrix S, where S(i, j) represents the similarity between points xi and xj. By default, similarity is calculated as the negative squared Euclidean distance.
- This means for two points i and k, the similarity is computed as:
- S(i, k) = -(xi – xk)²
- The more similar two points are, the less negative this value is. The most similar pair of points will have a similarity close to zero. Points that are far apart will have very large negative similarities.
- The diagonal of the similarity matrix, S(i, i), is special. These values are called preferences and represent how likely each data point is to become an exemplar on its own merits, before considering any messages from other points.
- Diagonal elements S(i, i) are called preferences, indicating how likely each point is to be chosen as an exemplar. A higher preference makes a point more likely to become a cluster center.
The Two Messages: Responsibility and Availability
The algorithm operates by iteratively passing two types of messages between all pairs of data points. These messages are stored in two matrices that are updated at every iteration.
- The Responsibility Matrix R
The responsibility matrix R has values r(i, k) that quantify how well-suited xk is to serve as the exemplar for xi, relative to other candidate exemplars for xi.
Think of the responsibility message r(i, k) as point i telling point k, “Based on the available alternatives, you are this much better than my next-best option as an exemplar for me.” A high positive value means point k is a genuinely good exemplar candidate for point i. A negative value means there is a better candidate available.
The responsibility update formula is:
R(i, k) = S(i, k) – max{ A(i, k’) + S(i, k’) } for all k’ ≠ k
In plain terms, the responsibility of k for i equals the direct similarity between i and k, minus the best combined score any other candidate k’ offers. If k is the only good option, its responsibility score will be high. If many other candidates are equally good, k’s responsibility will be lower.
- The Availability Matrix A
The availability matrix A contains values a(i, k) that represent how appropriate it would be for xi to pick xk as its exemplar, taking into account other points’ preference for xk as an exemplar.
Think of the availability message a(i, k) as point k telling point i: “This is how much collective support I have from other points to be an exemplar.” A high availability means many other points also want point k as their exemplar, making it a strong candidate. A low or negative availability means k is not well-supported by the broader data.
For non-diagonal elements, the availability update is:
A(i, k) = min{ 0, R(k, k) + sum of max(0, R(i’, k)) } for all i’ ≠ i and i’ ≠ k
The self-availability A(k, k) is updated separately:
A(k, k) = sum of max(0, R(i’, k)) for all i’ ≠ k
This captures how much total positive evidence exists across all points that k should be an exemplar.
Implementing Affinity Propagation in Scikit-Learn
Here is a complete implementation showing how to use Affinity Propagation in scikit-learn:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
# Generate sample data
X, y_true = make_blobs(n_samples=300, centers=4,
cluster_std=0.6, random_state=42)
# Fit Affinity Propagation
af = AffinityPropagation(
damping=0.7, # stabilize convergence
preference=None, # use median similarity (default)
max_iter=200, # maximum iterations
convergence_iter=15, # stop if stable for 15 iterations
affinity=’euclidean’, # similarity measure
random_state=42
)
af.fit(X)
labels = af.labels_
exemplar_indices = af.cluster_centers_indices_
n_clusters = len(exemplar_indices)
print(f”Estimated number of clusters: {n_clusters}”)
print(f”Exemplar indices: {exemplar_indices}”)
# Evaluate quality
if n_clusters > 1:
sil_score = silhouette_score(X, labels)
print(f”Silhouette Score: {sil_score:.4f}”)
# Visualize
plt.figure(figsize=(8, 6))
colors = plt.cm.tab10(np.linspace(0, 1, n_clusters))
for k, color in enumerate(colors):
class_members = labels == k
exemplar = X[exemplar_indices[k]]
plt.scatter(X[class_members, 0], X[class_members, 1],
color=color, alpha=0.6, s=30)
plt.scatter(exemplar[0], exemplar[1], s=200,
color=color, marker=’*’, edgecolors=’black’,
linewidths=1.5, zorder=5)
plt.title(f’Affinity Propagation: {n_clusters} Clusters Found’)
plt.xlabel(‘Feature 1’)
plt.ylabel(‘Feature 2’)
plt.tight_layout()
plt.savefig(‘affinity_propagation.png’, dpi=150)
print(“Plot saved.”)
The star markers in the plot show the exemplars, the actual data points selected as cluster centers. Notice that you did not need to specify the number of clusters at any point. The algorithm found them automatically.
Affinity Propagation is a clustering algorithm designed to identify exemplars—data points that are actual members of the dataset and act as cluster centers. This makes the results highly interpretable, especially in applications like selecting representative images, archetypal customer profiles, or key genetic expressions in bioinformatics. Unlike methods that require predefining the number of clusters, Affinity Propagation determines cluster structure through message passing between data points based on a full similarity matrix, which leads to O(n²) memory complexity in practice.
While this is distinct from bagging, it is worth noting that bootstrap sampling in ensemble methods typically includes about 63% unique samples per iteration. In Affinity Propagation, however, practitioners often tune the preference parameter—for example using the median or minimum similarity—as a principled way to control how many clusters emerge, effectively adjusting the granularity of the resulting structure.
Affinity Propagation vs. K-Means
Understanding where each algorithm excels makes the choice between them clear.
- K-Means is often better when you have large datasets where O(n × k × d) is feasible and O(n²) would be too slow or memory-heavy, data are numeric vectors where Euclidean distance is meaningful and clusters are roughly spherical, you know or can choose a reasonable k, and speed and simplicity are priorities.
- Affinity Propagation is better when you have a meaningful pairwise similarity measure not necessarily Euclidean distance), you want exemplars that are actual data points, and the number of clusters is unknown.
- Affinity Propagation is more robust to cluster shape and scale because it uses arbitrary similarities, but it is sensitive to the similarity function and preference choice and can produce many small clusters if preferences are too high.
- The most important practical consideration is dataset size. K-Means scales well to large datasets. Affinity Propagation has a time complexity of O(N² × T) where N is the number of samples and T is the number of iterations, which makes it impractical for datasets with more than a few thousand points without modifications.
Advantages of Affinity Propagation
- Affinity Propagation does not require a pre-specified number of clusters, making it a useful tool for exploratory data analysis. It has been shown to outperform other clustering algorithms on certain types of datasets.
- The exemplar-based approach means cluster centers are always real, interpretable data points rather than abstract computed centroids.
- This is especially valuable in domains like bioinformatics, where an exemplar gene or protein is directly useful, or in image clustering, where an exemplar image can be displayed and examined.
- The inventors of affinity propagation showed it is better for certain computer vision and computational biology tasks, such as clustering pictures of human faces and identifying regulated transcripts, than K-Means, even when K-Means was allowed many random restarts and initialized using PCA.
Limitations of Affinity Propagation
- Affinity Propagation can be computationally expensive for large datasets or datasets with many features. It may converge to suboptimal solutions, especially when the data has a high degree of variability or noise.
- It can be sensitive to the choice of the damping factor. It may produce many small clusters or clusters with only one or a few members, which may not be meaningful.
- The O(N²) memory requirement to store the full similarity matrix is perhaps the most severe practical constraint. For a dataset of 10,000 points, the similarity matrix alone requires storing 100 million values.
- For datasets larger than a few thousand samples, this quickly becomes infeasible on standard hardware without modifications such as sparse similarity matrices or subsampling.
- Non-convergence is another practical challenge. Non-convergence is often due to oscillations. The solution is to adjust the damping factor. High computational cost is also a limitation, and for large datasets, the solution is to use sparse matrices or subsampling.
Real-World Applications
- Gene expression data analysis is a major application where affinity propagation has been used to identify key genes and group similar gene expressions, aiding in understanding biological processes and disease mechanisms.
- Customer segmentation uses affinity propagation to cluster customers based on purchase behavior and preferences so businesses can tailor marketing strategies to specific segments.
- Bioinformatics applications include identifying representative genes or proteins, and social network analysis uses it to identify influential individuals or communities.
- In natural language processing, affinity propagation has been applied to document clustering and text summarization.
- By treating pairwise sentence or document similarities as the input matrix, the algorithm identifies exemplar sentences that best represent clusters of related content, which can then serve as the basis for extractive summarization.
If you’re serious about mastering Affinity Propagation, understanding exemplar-based clustering, message passing, and how to implement it in Python without predefining the number of clusters, don’t miss the chance to enroll in HCL GUVI’s Artificial Intelligence & Machine Learning Course, co-designed by Intel.
Wrapping Up
Affinity Propagation is one of the most conceptually elegant clustering algorithms in machine learning. By framing clustering as a network problem where every data point simultaneously campaigns to become an exemplar while evaluating others as candidates for their own cluster, it elegantly bypasses the requirement to specify the number of clusters in advance.
The two-matrix message-passing framework, where responsibility captures how good a candidate exemplar is for a specific point and availability captures how much collective support a candidate has from all points, creates a self-organizing process that converges to a principled solution grounded in the actual structure of the data.
Understanding how the preference parameter controls cluster granularity and how the damping factor stabilizes convergence gives you the practical control needed to apply the algorithm to real datasets effectively.
While its quadratic complexity limits scalability, for moderate-sized datasets where exemplar interpretability and automatic cluster discovery matter, Affinity Propagation remains one of the most powerful tools in the unsupervised learning toolkit.
FAQ
1. Do I need to choose the number of clusters beforehand?
No. Affinity Propagation determines the number of clusters automatically; you control granularity indirectly via the preference parameter.
2. What similarity should I use?
The default is negative squared Euclidean distance. Use a similarity that fits your data (cosine, correlation, or custom kernels) because results depend strongly on the similarity measure.
3. How do preference and damping affect results?
Higher preference values make more points likely to become exemplars (more clusters). Damping (0.5–0.9 typical) slows updates to prevent oscillations; increase damping if the algorithm fails to converge.
4. Is Affinity Propagation scalable to large datasets?
Not easily—it requires O(N²) memory/time for the full similarity matrix and message updates. For thousands of points you may need subsampling, sparse similarities, or alternative clustering methods (e.g., K‑Means).
5. When should I prefer Affinity Propagation over K‑Means?
Prefer Affinity Propagation when you want exemplar-based clusters, don’t know K, or have a meaningful pairwise similarity (not necessarily Euclidean). Use K‑Means for large datasets, when clusters are roughly spherical, and when you can supply a reliable K.



Did you enjoy this article?