General overview of the tradeoffs involved in designing a production-grade ML training/scoring system with event streaming. Stresses the importance of skills at the intersection of data engineering, streaming, and data science.
Existing SOTA techniques for text2sql with metadata augmentation and their limitations, plus directions for ensuring data quality and freshness with data engineering and streaming techniques.
Clearing up the hype by explaining why event-driven data processing matters regardless of the old debate over "real-time" versus "near real-time." I also go into detail on use case prioritization and how to think about data source characteristics.
Architectures that can move between batch and incremental processing without changing the storage and API allow us to solve common data trust problems, such as stale data, as well as production AI/ML risks, such as concept drift.
Performance optimization techniques for processing data for a relational data warehouse. I go over the tradeoffs between cost, latency, and accuracy that every data/AI problem involves.
Basic performance optimization and design patterns for data processing and ML training
Course I created and teach at Databricks
Asynchronously traverse a tree or graph in JavaScript with Promises