I am a Lead Solutions Architect at Databricks, where I've spent the last five years advising customers ranging from startups to Fortune 500 enterprises. I also help lead a team of field ambassadors for streaming products, and I'm interested in improving industry awareness of effective streaming patterns for data integration and production machine learning. Earlier in my career, I worked as a software engineer focused on network automation.
Here are some of my perspectives on data architecture, data processing, and data science:
A general overview of the tradeoffs involved in designing a production-grade ML training/scoring system with event streaming. Stresses the importance of skills at the intersection of data engineering, streaming, and data science.
Existing state-of-the-art techniques and limitations of text2sql with metadata augmentation, and directions for ensuring data quality and freshness with data engineering and streaming techniques.
Clearing up the hype by describing why event-driven data processing is important regardless of the old debate over "real-time" versus "near real-time." I also go into detail on use-case prioritization and how to think about data source characteristics.
Architectures that can move between batch and incremental processing without changing the storage layer or API allow us to solve common data trust problems, such as stale data, as well as production AI/ML risks, such as concept drift.
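To make that idea concrete, here's a minimal sketch (the Row, Sink, and transform names are hypothetical, and TypeScript is used only for consistency with the traversal example further down): the same transform and the same sink contract are driven either by a one-shot batch read or by an incremental stream, so switching modes never touches storage or the API.

```typescript
// Hypothetical types: the transform and sink are agnostic to whether rows
// arrive as one batch or as an incremental stream.
interface Row { id: string; value: number }
interface Sink { upsert(rows: Row[]): Promise<void> }

// Shared business logic, written once.
function transform(r: Row): Row {
  return { ...r, value: r.value * 2 }; // placeholder transformation
}

// Batch mode: read everything available now, transform, upsert.
async function runBatch(readAll: () => Promise<Row[]>, sink: Sink): Promise<void> {
  const rows = await readAll();
  await sink.upsert(rows.map(transform));
}

// Incremental mode: consume rows as they arrive; same transform, same sink,
// same storage contract, so switching modes changes only the driver.
async function runIncremental(stream: AsyncIterable<Row>, sink: Sink): Promise<void> {
  for await (const row of stream) {
    await sink.upsert([transform(row)]);
  }
}
```

This is the same idea behind running a single Structured Streaming query either with an available-now trigger (batch-style) or a processing-time trigger (incremental) against the same Delta table.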
Performance optimization techniques for processing data into a relational data warehouse. I go over the tradeoffs between cost, latency, and accuracy that every data/AI problem entails.
Basic performance optimization and design patterns for data processing and ML training
Course I created and teach at Databricks
Asynchronously traverse a tree or graph in JavaScript with Promises
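For that last item, here's a minimal sketch of the pattern, written in TypeScript (the Promise mechanics are identical in plain JavaScript); fetchNeighbors and the example graph are hypothetical stand-ins for whatever async source is being walked. A level-at-a-time breadth-first walk lets each level's neighbor fetches run concurrently via Promise.all, and a visited set keeps cycles in graphs from looping forever.

```typescript
type NodeId = string;

// Breadth-first async traversal: visit each node, then fetch all of the
// current level's neighbors concurrently before moving to the next level.
async function traverse(
  start: NodeId,
  fetchNeighbors: (id: NodeId) => Promise<NodeId[]>,
  visit: (id: NodeId) => void,
): Promise<void> {
  const seen = new Set<NodeId>([start]); // guards against cycles in graphs
  let frontier: NodeId[] = [start];

  while (frontier.length > 0) {
    frontier.forEach(visit);
    // Fetch each node's neighbors concurrently, then flatten into one level.
    const neighborLists = await Promise.all(frontier.map(fetchNeighbors));
    frontier = neighborLists.flat().filter((id) => {
      if (seen.has(id)) return false;
      seen.add(id);
      return true;
    });
  }
}

// Example usage against a small in-memory graph with a cycle.
const edges: Record<NodeId, NodeId[]> = { a: ["b", "c"], b: ["c"], c: ["a"] };
traverse("a", async (id) => edges[id] ?? [], (id) => console.log(id))
  .catch(console.error);
```

A depth-first variant works just as well with recursion and await in place of the frontier loop, at the cost of serializing sibling fetches unless you wrap them in Promise.all explicitly.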