Databricks co-founder Matei Zaharia wins ACM Prize and declares AGI is already here
April 10, 2026 – 6:09 am
In short:
Matei Zaharia, the Berkeley computer science professor and Databricks co-founder who created Apache Spark, has won the 2026 ACM Prize in Computing. The prize recognizes his foundational contributions to distributed data systems and AI infrastructure. He will receive a $250,000 award funded by an Infosys endowment.
Zaharia is donating the prize proceeds to charity. In an interview following the announcement, he argued that artificial general intelligence (AGI) has already arrived, though "it’s just not in a form that we appreciate". He suggested the field should move away from benchmarking AI against human cognition.
From PhD Thesis to Global Infrastructure
Zaharia began developing Apache Spark as a doctoral student at UC Berkeley in 2009. It offered a faster alternative to Hadoop MapReduce, the dominant framework for large-scale data processing at the time. By keeping intermediate results in memory rather than writing them to disk between steps, Spark significantly sped up iterative workloads such as machine learning training and graph processing.
His work on Spark earned him the ACM Doctoral Dissertation Award in 2014. The project became the seed for Databricks, which he co-founded in 2013 with six Berkeley colleagues. Databricks reached a $134 billion valuation in December 2025 and reported a revenue run rate of $5.4 billion in February 2026, growing at over 65% year-over-year.
The ACM recognized Zaharia for "visionary development of distributed data systems and computing infrastructure" that has enabled large-scale machine learning, analytics, and AI.
The Open-Source Ecosystem
Zaharia played a key role in popularizing Apache Spark, which is licensed under Apache 2.0, the same license Google uses for its Gemma 4 open-weight model family. Apache 2.0 has become the standard license for releasing AI models and tools intended for broad commercial adoption.
Delta Lake, MLflow, and the Data Lakehouse
Zaharia’s contributions extended beyond Spark. He developed Delta Lake to address reliability problems with cloud data lakes, introducing ACID transactional semantics for data storage. This led to the data lakehouse, an architectural pattern that combines the cost-effectiveness and scalability of data lakes with the consistency and governance features traditionally found in data warehouses. Databricks’ core product is now based on the data lakehouse concept, which has gained widespread adoption in enterprise data engineering.