Privacy engineering is an emerging discipline within software and data engineering that aims to provide methodologies, tools, and techniques for building systems with acceptable levels of privacy.

In this talk, learn about Databricks’ recent work on anonymization and privacy-preserving analytics on large-scale geolocation datasets.

In particular, the focus is on how to scale anonymization and geospatial analytics workloads with Spark, maximizing performance by combining multi-dimensional spatial indexing with Spark's in-memory computation.
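
To make the idea of spatial indexing concrete, here is a minimal sketch in plain Python rather than Spark. It is not the implementation presented in the talk: the uniform grid, the cell size `CELL_DEG`, and the helper names `cell_of`, `anonymize`, `build_index`, and `nearest` are all assumptions chosen for illustration, and distances are computed in raw degrees for simplicity. The same cell key does two jobs: it generalizes a point to the centre of its cell (one simple form of location anonymization), and it buckets points so that a nearest-neighbour query inspects only nearby cells.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01  # hypothetical cell size in degrees (an assumption, not from the talk)


def cell_of(lat, lon, cell_deg=CELL_DEG):
    """Map a coordinate to the (row, col) key of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def anonymize(lat, lon, cell_deg=CELL_DEG):
    """Generalize a point to the centre of its cell, hiding the exact position."""
    row, col = cell_of(lat, lon, cell_deg)
    return ((row + 0.5) * cell_deg, (col + 0.5) * cell_deg)


def build_index(points, cell_deg=CELL_DEG):
    """Bucket (lat, lon) points by grid cell."""
    index = defaultdict(list)
    for p in points:
        index[cell_of(p[0], p[1], cell_deg)].append(p)
    return index


def nearest(index, lat, lon, cell_deg=CELL_DEG, max_rings=50):
    """Return the indexed point closest to (lat, lon), or None if none is found.

    Cells are visited in square rings of growing radius around the query cell.
    Every point beyond ring k is at least k * cell_deg away, so the search can
    stop as soon as the best distance found so far is within that bound.
    """
    row, col = cell_of(lat, lon, cell_deg)
    best, best_d = None, math.inf
    for k in range(max_rings + 1):
        for r in range(row - k, row + k + 1):
            for c in range(col - k, col + k + 1):
                if max(abs(r - row), abs(c - col)) != k:
                    continue  # skip interior cells already visited in earlier rings
                for p in index.get((r, c), ()):
                    d = math.hypot(p[0] - lat, p[1] - lon)  # planar distance in degrees
                    if d < best_d:
                        best, best_d = p, d
        if best_d <= k * cell_deg:
            break  # no unvisited cell can contain a closer point
    return best
```

In a distributed setting, a cell key of this kind could plausibly serve as a grouping or join key so that nearby points are processed together, but how the talk's system partitions data is not described here.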

In production, we have achieved performance improvements of more than 1,500x for geolocation anonymization and more than 10x for nearest-neighbour search over anonymized geo datasets.
