All my technology and research in one place
Data engineering workshops with Spark (PySpark), Pandas, Dask, Ray and other popular libraries in the field. The notebooks support Google Colab: click the badge next to each notebook’s link.
These serve as practical notes and references on common machine learning algorithms, with an introduction to Pandas and NumPy.
concepts like zero-copy-columnar-layout-distributed-vectorized etc. that sound like Vogon Poetry to data engineering teams trying to modernize their game…
Long-form posts on data engineering and technology in collaboration with my team
Pronounced “cooler”, this project provides three approaches to building a regular expression engine: a toy overview, a backtracking-based implementation and a finite-automata-based approach.
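To give a feel for the backtracking approach, here is a minimal sketch in Python. It is not the project's code: the function names (`match`, `match_here`, `match_star`) and the tiny supported syntax (literals, `.`, `*`, `^`, `$`) are assumptions chosen for illustration, in the style of the classic recursive matcher.

```python
def match(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    if pattern.startswith("^"):
        # Anchored: only try at the very start of the text.
        return match_here(pattern[1:], text)
    # Unanchored: try every starting position, including the empty suffix.
    return any(match_here(pattern, text[i:]) for i in range(len(text) + 1))


def match_here(pattern: str, text: str) -> bool:
    """Return True if the pattern matches at the beginning of the text."""
    if not pattern:
        return True
    if len(pattern) >= 2 and pattern[1] == "*":
        return match_star(pattern[0], pattern[2:], text)
    if pattern == "$":
        return text == ""
    if text and pattern[0] in (".", text[0]):
        return match_here(pattern[1:], text[1:])
    return False


def match_star(char: str, pattern: str, text: str) -> bool:
    """Match zero or more `char`, then the rest of the pattern.

    Tries the shortest repetition first and backtracks by consuming
    one more character each time the rest of the pattern fails.
    """
    i = 0
    while True:
        if match_here(pattern, text[i:]):
            return True
        if i < len(text) and char in (".", text[i]):
            i += 1
        else:
            return False
```

For example, `match("a*b", "aaab")` succeeds after trying zero, one, two and then three repetitions of `a`. A finite-automata-based engine avoids this retrying by tracking all possible states at once.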
Discover key colours in a painting or photograph using K-Means clustering. Also provides the proportions of the colours. Some results (more in the repo):








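The idea of finding key colours and their proportions with K-Means can be sketched as follows. This is a from-scratch, standard-library illustration and not the repo's implementation; the function names (`dominant_colours`, `nearest`), the parameters and the fixed seed are all assumptions. Pixels are assumed to be already loaded as RGB tuples.

```python
import random
from collections import Counter


def nearest(pixel, centroids):
    """Index of the centroid closest to the pixel (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(pixel, c)) for c in centroids]
    return distances.index(min(distances))


def dominant_colours(pixels, k=3, iterations=20, seed=0):
    """Cluster RGB pixels with K-Means.

    Returns a list of (centroid, proportion) pairs, largest cluster first.
    Raises ValueError if the image has fewer than k distinct colours.
    """
    rng = random.Random(seed)
    # Initialise centroids from k distinct colours present in the image.
    centroids = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in pixels]
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(pixels, labels) if label == c]
            if members:
                # New centroid is the per-channel mean of its members.
                new_centroids.append(
                    tuple(sum(channel) / len(members) for channel in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Final assignment against the final centroids gives the proportions.
    counts = Counter(nearest(p, centroids) for p in pixels)
    result = [(centroids[c], counts[c] / len(pixels)) for c in range(k)]
    return sorted(result, key=lambda pair: pair[1], reverse=True)
```

A usage example with a hypothetical ten-pixel image: `dominant_colours([(255, 0, 0)] * 6 + [(0, 0, 255)] * 3 + [(0, 255, 0)], k=3)` puts red first with a proportion of 0.6. On real photographs a library implementation would normally be preferable for speed.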