I am a Lead Solutions Architect at Databricks, where I've spent the last five years advising customers ranging from startups to Fortune 500 enterprises. I also help lead a team of field ambassadors for streaming products, and I'm interested in improving industry awareness of effective streaming patterns for data integration and production machine learning. Earlier in my career, I worked as a software engineer focused on network automation.
Here are some of my perspectives on data architecture, data processing, and data science:
A general overview of the tradeoffs involved in designing a production-grade ML training/scoring system with event streaming. Stresses the importance of skills at the intersection of data engineering, streaming, and data science.
Existing state-of-the-art techniques and limitations of text2sql with metadata augmentation, and directions for ensuring data quality and freshness with data engineering and streaming techniques.
Clearing up the hype by describing why event-driven data processing is important regardless of the old debate over "real-time" versus "near real-time." I also go into detail on use-case prioritization and how to think about data source characteristics.
Architectures that can move between batch and incremental processing without changing the storage layer or API allow us to solve common data trust problems, such as stale data, as well as production AI/ML risks, such as concept drift.
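To make that idea concrete, here's a minimal sketch (the Row, Sink, and transform names are hypothetical, and TypeScript is used only for consistency with the traversal example further down): the same transform and the same sink contract are driven either by a one-shot batch read or by an incremental stream, so switching modes never touches storage or the API.

```typescript
// Hypothetical types: the transform and sink are agnostic to whether rows
// arrive as one batch or as an incremental stream.
interface Row { id: string; value: number }
interface Sink { upsert(rows: Row[]): Promise<void> }

// Shared business logic, written once.
function transform(r: Row): Row {
  return { ...r, value: r.value * 2 }; // placeholder transformation
}

// Batch mode: read everything available now, transform, upsert.
async function runBatch(readAll: () => Promise<Row[]>, sink: Sink): Promise<void> {
  const rows = await readAll();
  await sink.upsert(rows.map(transform));
}

// Incremental mode: consume rows as they arrive; same transform, same sink,
// same storage contract, so switching modes changes only the driver.
async function runIncremental(stream: AsyncIterable<Row>, sink: Sink): Promise<void> {
  for await (const row of stream) {
    await sink.upsert([transform(row)]);
  }
}
```

This is the same idea behind running a single Structured Streaming query either with an available-now trigger (batch-style) or a processing-time trigger (incremental) against the same Delta table.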
Performance optimization techniques for processing data into a relational data warehouse. I go over the tradeoffs between cost, latency, and accuracy that every data/AI problem entails.
Basic performance optimization and design patterns for data processing and ML training
Course I created and teach at Databricks
Asynchronously traverse a tree or graph in JavaScript with Promises
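For that last item, here's a minimal sketch of the pattern, written in TypeScript (the Promise mechanics are identical in plain JavaScript); fetchNeighbors and the example graph are hypothetical stand-ins for whatever async source is being walked. A level-at-a-time breadth-first walk lets each level's neighbor fetches run concurrently via Promise.all, and a visited set keeps cycles in graphs from looping forever.

```typescript
type NodeId = string;

// Breadth-first async traversal: visit each node, then fetch all of the
// current level's neighbors concurrently before moving to the next level.
async function traverse(
  start: NodeId,
  fetchNeighbors: (id: NodeId) => Promise<NodeId[]>,
  visit: (id: NodeId) => void,
): Promise<void> {
  const seen = new Set<NodeId>([start]); // guards against cycles in graphs
  let frontier: NodeId[] = [start];

  while (frontier.length > 0) {
    frontier.forEach(visit);
    // Fetch each node's neighbors concurrently, then flatten into one level.
    const neighborLists = await Promise.all(frontier.map(fetchNeighbors));
    frontier = neighborLists.flat().filter((id) => {
      if (seen.has(id)) return false;
      seen.add(id);
      return true;
    });
  }
}

// Example usage against a small in-memory graph with a cycle.
const edges: Record<NodeId, NodeId[]> = { a: ["b", "c"], b: ["c"], c: ["a"] };
traverse("a", async (id) => edges[id] ?? [], (id) => console.log(id))
  .catch(console.error);
```

A depth-first variant works just as well with recursion and await in place of the frontier loop, at the cost of serializing sibling fetches unless you wrap them in Promise.all explicitly.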