Linear Digressions
Podcast by Ben Jaffe and Katie Malone
Recent Episodes
291 episodes
So long, and thanks for all the fish
All good things must come to an end, including this podcast. This is the last episode we plan to release, and it doesn’t cover data science—it’s mostl...
A Reality Check on AI-Driven Medical Assistants
The data science and artificial intelligence community has made amazing strides in the past few years to algorithmically automate portions of the heal...
A Data Science Take on Open Policing Data
A few weeks ago, we put out a call for data scientists interested in issues of race and racism, or people studying how those topics can be studied wit...
Procella: YouTube's super-system for analytics data storage
This is a re-release of an episode that originally ran in October 2019.
If you’re trying to manage a project that serves up analytics dat...
The Data Science Open Source Ecosystem
Open source software is ubiquitous throughout data science, and enables the work of nearly every data scientist in some way or another. Open source pr...
Rock the ROC Curve
This is a re-release of an episode that first ran on January 29, 2017.
This week: everybody's favorite WWII-era classifier metric! But...
Criminology and Data Science
This episode features Zach Drake, a working data scientist and PhD candidate in the Criminology, Law and Society program at George Mason University. Z...
Racism, the criminal justice system, and data science
As protests sweep across the United States in the wake of the killing of George Floyd by a Minneapolis police officer, we take a moment to dig into on...
An interstitial word from Ben
A message from Ben around algorithmic bias, and how our models are sometimes reflections of ourselves.
Convolutional Neural Networks
This is a re-release of an episode that originally aired on April 1, 2018.
If you've done image recognition or computer vision tasks with...
Stein's Paradox
This is a re-release of an episode that was originally released on February 26, 2017.
When you're estimating something about some object...
Protecting Individual-Level Census Data with Differential Privacy
The power of finely-grained, individual-level data comes with a drawback: it compromises the privacy of potentially anyone and everyone in the dataset...
Causal Trees
What do you get when you combine the causal inference needs of econometrics with the data-driven methodology of machine learning? Usually these two do...
The Grammar Of Graphics
You may not realize it consciously, but beautiful visualizations have rules. The rules are often implicit and manifest themselves as expectations about...
Gaussian Processes
It’s pretty common to fit a function to a dataset when you’re a data scientist. But in many cases, it’s not clear what kind of function might be most...
Keeping ourselves honest when we work with observational healthcare data
The abundance of data in healthcare, and the value we could capture from structuring and analyzing that data, is a huge opportunity. It also presents...
Changing our formulation of AI to avoid runaway risks: Interview with Prof. Stuart Russell
AI is evolving incredibly quickly, and thinking now about where it might go next (and how we as a species and a society should be prepared) is critica...
Putting machine learning into a database
Most data scientists bounce back and forth regularly between doing analysis in databases using SQL and building and deploying machine learning pipelin...
The work-from-home episode
Many of us have the privilege of working from home right now, in an effort to keep ourselves and our family safe and slow the transmission of covid-19...
Understanding Covid-19 transmission: what the data suggests about how the disease spreads
Covid-19 is turning the world upside down right now. One thing that’s extremely important to understand, in order to fight it as effectively as possib...
Network effects re-release: when the power of a public health measure lies in widespread adoption
This week’s episode is a re-release of a recent episode, which we don’t usually do, but it seems important for understanding what we can all do to slow...
Causal inference when you can't experiment: difference-in-differences and synthetic controls
When you need to untangle cause and effect, but you can’t run an experiment, it’s time to get creative. This episode covers difference in differences...
Better know a distribution: the Poisson distribution
This is a re-release of an episode that originally ran on October 21, 2018.
The Poisson distribution is a probability distribution functi...
The Lottery Ticket Hypothesis
Recent research into neural networks reveals that sometimes, not all parts of the neural net are equally responsible for the performance of the netwo...
Interesting technical issues prompted by GDPR and data privacy concerns
Data privacy is a huge issue right now, after years of consumers and users gaining awareness of just how much of their personal data is out there and...
Thinking of data science initiatives as innovation initiatives
Put yourself in the shoes of an executive at a big legacy company for a moment, operating in virtually any market vertical: you’re constantly hearing...
Building a curriculum for educating data scientists: Interview with Prof. Xiao-Li Meng
As demand for data scientists grows, and it remains as relevant as ever that practicing data scientists have a solid methodological and technical foun...
Running experiments when there are network effects
Traditional A/B tests assume that whether or not one person got a treatment has no effect on the experiment outcome for another person. But that’s not...
Zeroing in on what makes adversarial examples possible
Adversarial examples are really, really weird: pictures of penguins that get classified with high certainty by machine learning algorithms as drumsets...
Unsupervised Dimensionality Reduction: UMAP vs t-SNE
Dimensionality reduction redux: this episode covers UMAP, an unsupervised algorithm designed to make high-dimensional data easier to visualize, cluste...
Data scientists: beware of simple metrics
Picking a metric for a problem means defining how you’ll measure success in solving that problem. Which sounds important, because it is, but oftentime...
Communicating data science, from academia to industry
For something as multifaceted and ill-defined as data science, communication and sharing best practices across the field can be extremely valuable but...
Optimizing for the short-term vs. the long-term
When data scientists run experiments, like A/B tests, it’s really easy to plan on a period of a few days to a few weeks for collecting data. The thing...
Interview with Prof. Andrew Lo, on using data science to inform complex business decisions
This episode features Prof. Andrew Lo, the author of a paper that we discussed recently on Linear Digressions, in which Prof. Lo uses data to predict...
Using machine learning to predict drug approvals
One of the hottest areas in data science and machine learning right now is healthcare: the size of the healthcare industry, the amount of data it gene...
Facial recognition, society, and the law
Facial recognition being used in everyday life seemed far-off not too long ago. Increasingly, it’s being used and advanced widely and with increasing...
Lessons learned from doing data science, at scale, in industry
If you’ve taken a machine learning class, or read up on A/B tests, you likely have a decent grounding in the theoretical pillars of data science. But...
Varsity A/B Testing
When you want to understand if doing something causes something else to happen, like if a change to a website causes a dip or rise in downstream con...
The Care and Feeding of Data Scientists: Growing Careers
In the third and final installment of a conversation with Michelangelo D’Agostino, VP of Data Science and Engineering at Shoprunner, about growing and...
The Care and Feeding of Data Scientists: Recruiting and Hiring Data Scientists
This week’s episode is the second in a three-part interview series with Michelangelo D’Agostino, VP of Data Science at Shoprunner. This discussion cen...