Kandinsky

Clustering and Quantization
Using photographs as visual input

Extracting the most significant colors from a photograph using K-Means. Photo © Shaurya Agarwal

Significant colors in a photograph.
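For a rough idea of what "significant colors" means here, below is a minimal sketch of plain K-Means (Lloyd's algorithm) over RGB pixels, using only the Python standard library. It is an illustration, not Kandinsky's actual code: the function names `dominant_colors` and `squared_distance`, the fixed iteration count, and the seeded random initialisation are all assumptions made for this example. In practice the pixel list would come from whatever image loader you prefer.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dominant_colors(pixels, k, iterations=10, seed=0):
    """Return k RGB centroids found by plain K-Means (hypothetical helper).

    Assumes `pixels` is a list of (r, g, b) tuples with at least k distinct colors.
    """
    rng = random.Random(seed)
    # Initialise the centroids with k distinct colors picked from the pixels.
    centroids = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        # Assignment step: group every pixel with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for pixel in pixels:
            distances = [squared_distance(pixel, c) for c in centroids]
            clusters[distances.index(min(distances))].append(pixel)
        # Update step: move each centroid to the mean color of its group.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centroids


# Tiny stand-in for a photograph: two reddish and two bluish pixels.
pixels = [(250, 0, 0), (250, 10, 0), (0, 0, 250), (0, 0, 240)]
print(sorted(dominant_colors(pixels, k=2)))
# → [(0.0, 0.0, 245.0), (250.0, 5.0, 0.0)]
```

The result is one averaged red and one averaged blue, which is the "palette" of this toy image. K-Means can settle on a poor local optimum for unlucky initialisations, so real use would typically run several seeds or a smarter initialisation.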


Scope

This started as a simple exploration of one of the simplest clustering algorithms in use, but I can see that more comprehensive coverage of clustering algorithms could be very valuable. Kandinsky aims to cover:

I. Basic building blocks

  1. Similarity/Distance Measures:
    • Euclidean Distance (Cartesian)
    • Manhattan Distance
    • Cosine Distance
    • Mahalanobis Distance
    • Domain-specific Distances
  2. Data Preprocessing:
    • Feature Scaling and Normalization
    • Dimensionality Reduction (e.g., PCA, t-SNE)
  3. Cluster Evaluation:
    • Internal Measures (Cohesion, Separation)
      • Silhouette Coefficient
      • Davies-Bouldin Index
    • External Measures (vs. Ground Truth)
      • Purity, Rand Index, Adjusted Rand Index
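To make a few of the building blocks listed above concrete, here is a small sketch of three of the distance measures (Euclidean, Manhattan, Cosine) and of the Silhouette Coefficient as an internal evaluation measure. It uses only the standard library; the function names and the convention of scoring singleton clusters as 0 are assumptions for this example, not part of Kandinsky.

```python
import math


def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def silhouette(points, labels, distance=euclidean):
    """Mean silhouette score over all points (assumes the points do not all coincide)."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Collect distances from p to every other point, grouped by cluster label.
        per_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                per_cluster.setdefault(other, []).append(distance(p, q))
        own = per_cluster.pop(label, None)
        if not own or not per_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)  # cohesion: mean distance within own cluster
        b = min(sum(d) / len(d) for d in per_cluster.values())  # separation
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette(points, [0, 0, 1, 1]))  # sensible grouping: close to 1
print(silhouette(points, [0, 1, 0, 1]))  # poor grouping: negative
```

Passing `distance=manhattan` (or any other callable taking two points) swaps the metric without touching the evaluation code, which is the main reason the distance is a parameter here.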

II. Clustering Algorithms

  1. Partitioning-Based
    • K-Means (hard assignments)
    • K-Medoids (more robust to outliers)
    • Fuzzy C-Means (soft assignments)
  2. Hierarchical
    • Agglomerative (Bottom-up)
      • Various linkage methods (single, complete, average)
    • Divisive (Top-down)
  3. Density-Based
    • DBSCAN (Density-Based Spatial Clustering of Applications with Noise; discovers clusters of arbitrary shape)
    • OPTICS (Ordering Points To Identify the Clustering Structure; an extension of DBSCAN that produces a reachability plot)
    • HDBSCAN (Hierarchical DBSCAN; handles clusters of varying density)
  4. Distribution-Based
    • Gaussian Mixture Models (GMM) (assumes data follows mixtures of Gaussian distributions)
  5. Grid-Based
    • STING (Statistical Information Grid-based Clustering)
    • CLIQUE (Clustering In QUEst)
  6. Neural Network-Based
    • Autoencoders (Variational, Denoising, etc.)
      • Learn latent representations for clustering
    • Self-Organizing Maps (SOMs)
      • Preserve neighborhood relationships in a grid-like space
    • Deep Embedded Clustering (DEC)
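The "hard" versus "soft" assignment distinction in the partitioning-based group can be sketched in a few lines. The example below assumes the cluster centers are already known and only shows the assignment step: K-Means picks the single nearest center, while Fuzzy C-Means gives each point a membership degree per center using the standard membership formula with fuzzifier `m`. The function names are hypothetical, and `math.dist` needs Python 3.8 or newer.

```python
import math


def hard_assignment(point, centers):
    """K-Means style: index of the single nearest center."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))


def soft_assignment(point, centers, m=2.0):
    """Fuzzy C-Means style: membership degrees that sum to 1 (requires m > 1)."""
    distances = [math.dist(point, c) for c in centers]
    if 0.0 in distances:
        # The point sits exactly on one or more centers: split membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1))
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
        for d_j in distances
    ]


centers = [(0.0, 0.0), (10.0, 0.0)]
print(hard_assignment((2.0, 0.0), centers))  # → 0
print(soft_assignment((2.0, 0.0), centers))  # roughly [0.94, 0.06]
print(soft_assignment((5.0, 0.0), centers))  # → [0.5, 0.5]
```

A point halfway between two centers is forced into one cluster by the hard rule but is split evenly by the soft rule, which is the practical difference between the two families.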

III. Additional Stuff to tackle when I get time and braincycles to spare…

…so yeah, there’s a bunch of work needed!


Notebooks [WIP]


Eight Down Toofaan Mail

Kandinsky helped with the cinematography for our feature film Eight Down Toofaan Mail.


Talks

References

Font