25/10/2025, Beginner
Come learn about #DataMorph, a new open source Python package and teaching tool that can be used to morph an input dataset of 2D points into select shapes, while preserving the summary statistics.
Statistics do not come intuitively to people, who tend to look for simple ways to describe complex things. Given a complex dataset, it is tempting to describe it with simple summary statistics like the mean, median, or standard deviation. However, these numbers are not a replacement for visualizing the distribution.
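As a small one-dimensional illustration of this point (a toy example with made-up numbers, not data from the talk), the two lists below share the same mean and population standard deviation even though one is split into two clusters and the other is spread out:

```python
import statistics

# Two made-up datasets: one is split into two tight clusters,
# the other is spread across the range.
clustered = [3, 3, 3, 3, 7, 7, 7, 7]
spread = [2, 4, 4, 4, 5, 5, 7, 9]

for name, data in (("clustered", clustered), ("spread", spread)):
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)  # population standard deviation
    print(f"{name}: mean={mean}, std={std}")
```

Both lines report the same mean and standard deviation, so only a plot (or the raw values) reveals how differently the two datasets are distributed.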
To illustrate this fact, researchers have generated many datasets that are very different visually, but share the same summary statistics. In this talk, I will discuss Data Morph, an open source package that builds on previous research using simulated annealing to perturb an arbitrary input dataset into a variety of shapes, while preserving the mean, standard deviation, and correlation to multiple decimal places. I will showcase how it works, discuss the challenges faced during development, and explore the limitations of this approach.
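The general idea can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not Data Morph's actual implementation or API: the function names, the circle target, the step size, the linear cooling schedule, and the simplified annealing-style acceptance rule are all hypothetical choices made for the example.

```python
import math
import random


def summary(points, decimals=2):
    """Means, standard deviations, and Pearson correlation, truncated to `decimals` places."""
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = math.fsum(xs) / n, math.fsum(ys) / n
    ss_x = math.fsum((x - mean_x) ** 2 for x in xs)
    ss_y = math.fsum((y - mean_y) ** 2 for y in ys)
    ss_xy = math.fsum((x - mean_x) * (y - mean_y) for x, y in points)
    stats = (
        mean_x,
        mean_y,
        math.sqrt(ss_x / (n - 1)),
        math.sqrt(ss_y / (n - 1)),
        ss_xy / math.sqrt(ss_x * ss_y),
    )
    scale = 10**decimals
    return tuple(math.trunc(value * scale) for value in stats)


def distance_to_circle(point, center, radius):
    """Absolute distance from a point to the outline of the target circle."""
    return abs(math.hypot(point[0] - center[0], point[1] - center[1]) - radius)


def mean_distance(points, center, radius):
    return math.fsum(distance_to_circle(p, center, radius) for p in points) / len(points)


def morph_toward_circle(points, center, radius, iterations=20_000, decimals=2, seed=0):
    """Nudge one point at a time toward the circle, keeping the truncated statistics fixed."""
    rng = random.Random(seed)
    current = list(points)
    target = summary(current, decimals)
    for step in range(iterations):
        # Assumed linear cooling: worse moves become less likely over time.
        temperature = 0.4 * (1 - step / iterations)
        i = rng.randrange(len(current))
        old = current[i]
        new = (old[0] + rng.gauss(0, 0.1), old[1] + rng.gauss(0, 0.1))
        closer = distance_to_circle(new, center, radius) < distance_to_circle(old, center, radius)
        if not closer and rng.random() >= temperature:
            continue  # reject a move that is worse and not "lucky"
        current[i] = new
        if summary(current, decimals) != target:
            current[i] = old  # undo any move that changes the summary statistics
    return current


# Made-up starting data: a Gaussian blob of 60 points.
rng = random.Random(42)
start = [(rng.gauss(50, 10), rng.gauss(50, 10)) for _ in range(60)]
center, radius = (50.0, 50.0), 14.0
morphed = morph_toward_circle(start, center, radius)

print("statistics preserved:", summary(start) == summary(morphed))
print("mean distance to circle before:", mean_distance(start, center, radius))
print("mean distance to circle after: ", mean_distance(morphed, center, radius))
```

Every accepted move is checked against the truncated statistics of the starting data, so the output keeps the same mean, standard deviation, and correlation to two decimal places by construction, while the points drift toward the circle's outline.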
Data Science and Analysis, Scientific Computing, Creative Coding
What prior knowledge is needed to follow your session well? – Attendees should understand what the mean, standard deviation, variance, and correlation are at a high level (no equations necessary). Attendees should also know what class inheritance is at a high level and be able to follow along with code that instantiates a class and calls methods on it (a few lines of code at a time).
What can participants expect to learn from your session? – Relying solely on simple summary statistics like the mean, median, or standard deviation is not enough to describe complex data. Come and see why this is the case and learn what it takes to translate research into an open source library.
Stefanie Molin is a software engineer at Bloomberg in New York City. She is also a core developer of numpydoc and the author of “Hands-On Data Analysis with Pandas”.