Motivations

In scientific research and data analysis, we often need to quantify and compare differences between data sets. Metrics, such as distances, divergences, and similarity scores, are essential mathematical tools for this task. They allow us to define and measure how different two data sets are, or how well a model fits a given data set.

In machine learning, metrics are commonly used to assess the performance of models, compare the accuracy of different algorithms, and tune model parameters. But the same concepts and techniques are also useful in many other scientific disciplines, such as ecology, biology, physics, and the social sciences.

However, choosing the right metric for a specific task can be challenging, especially when several metrics are available and it is not obvious which one is most appropriate for the problem. Moreover, different metrics have different properties and assumptions, and they can yield very different results depending on the data being analyzed.
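As a small illustration of how two metrics can disagree on the same data, the sketch below compares Euclidean distance with cosine distance. The points, the helper names (`euclidean_distance`, `cosine_distance`, `nearest`), and the choice of these two measures are assumptions made for this example only, not material taken from the workshop.

```python
import math

def euclidean_distance(u, v):
    # Straight-line distance between two points.
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def cosine_distance(u, v):
    # 1 minus the cosine of the angle between u and v; ignores vector length.
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical data: "a" points in the same direction as the query but is far
# away; "b" is close to the query but points in a different direction.
query = (1.0, 1.0)
candidates = {"a": (10.0, 10.0), "b": (1.0, 0.0)}

def nearest(distance):
    # Name of the candidate with the smallest distance to the query.
    return min(candidates, key=lambda name: distance(query, candidates[name]))

nearest_euclidean = nearest(euclidean_distance)
nearest_cosine = nearest(cosine_distance)

print("Nearest by Euclidean distance:", nearest_euclidean)
print("Nearest by cosine distance:", nearest_cosine)
```

With these particular points, Euclidean distance picks "b" as the nearest neighbor of the query, while cosine distance picks "a", because cosine distance compares direction only and disregards magnitude.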

The goal of this workshop is to introduce participants to the most commonly used metrics in scientific research and machine learning, and to provide them with a solid understanding of their mathematical properties, applications, and limitations. By the end of the workshop, participants will be able to select, apply, and interpret different metrics to compare data sets, assess the quality of models, and solve a wide range of data analysis problems.

Learning Outcomes

By the end of this workshop, you will be able to:

  • Understand the mathematics behind distance calculations and formal metrics
  • Compare and contrast common metrics used across scientific disciplines and machine learning applications
  • Measure differences in distributions
  • Relate different metrics to one another
  • Apply these concepts to real-world data and research problems

Workshop Format

This workshop will consist of a mix of presentation materials and interactive coding examples that will give participants a chance to practice the concepts that we cover. Here’s a rough breakdown of how the workshop will be structured:

  1. Introduction (10 minutes) - We’ll start by providing an overview of the goals and objectives for the workshop, as well as the schedule for the session.
  2. Theory (20 minutes) - Next, we’ll cover the theoretical underpinnings of formal metrics and distance calculations, including some of the most commonly used metrics in scientific disciplines and machine learning applications.
  3. Metrics Presentation & Coding Examples (60 minutes) - We’ll present a selection of metrics (tailored to the audience), interspersed with interactive sections in which participants try out the concepts using code examples that they can run in their web browsers.
  4. Q&A and Wrap-up (10 minutes) - We’ll end the workshop with a brief Q&A session to ensure that all participants’ questions have been answered and to provide additional resources for further learning.

By incorporating live coding examples throughout the workshop, we hope to make the material more engaging and interactive, and give participants the opportunity to apply what they’ve learned in a practical way.

See the workshop structure page for more information about the general workshop format.