Deploy m3 in Real-World Networks

January 2025 to Present (ongoing project)

Research assistant to Chenning Li in the Networks and Mobile Systems Lab at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL)

This UROP project focuses on m3, a machine learning–based data center network simulator designed to accurately estimate performance metrics such as tail latency. The goal is to enhance m3 by developing a feedback-driven control loop using real-world data center measurements to improve network configuration and optimization.

Modern data center networks are vast and complex, and their performance directly affects the infrastructure that supports latency-sensitive applications. m3 bridges the gap between fast, coarse-grained simulation and precise performance prediction by combining traditional modeling with machine learning techniques. For this project, the scope of work includes adapting m3 to support multiple traffic classes, collecting and analyzing network data, and validating the simulator’s predictions against measured results.

My role in this project is to extend m3’s functionality to support traffic class weights, which will allow it to simulate weighted bandwidth allocation in data center networks. This feature is critical for enabling closed-loop control in simulations.

The enhancement involves generalizing flowSim (a flow-level simulator) to compute max-min fair fluid flow rate allocations while accounting for traffic class weights. Specifically, the task includes:

  1. Enumerating Active Traffic Classes: Identifying classes with active (uncompleted) flows at each bottleneck.
  2. Bandwidth Allocation: Allocating bandwidth to each class proportionally to its weight while redistributing bandwidth from inactive classes.
  3. Rate Freezing: Iteratively identifying and freezing the most constrained flows across all links and traffic classes.

  4. Validating Rates: Ensuring that at every step, newly constrained flows are assigned rates no smaller than those in previous rounds.
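The allocation steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not flowSim's actual code: the function name `weighted_max_min`, the dictionary-based inputs, and the rule that a class whose flows are all frozen keeps only the bandwidth it already uses are all hypothetical choices made for the example.

```python
def weighted_max_min(capacity, flows, weights, eps=1e-9):
    """Hypothetical sketch of class-weighted max-min fair allocation.

    capacity: {link: capacity}
    flows:    {flow_id: (traffic_class, [links on the flow's path])}
    weights:  {traffic_class: weight}
    Returns   {flow_id: rate}.
    """
    rates = {}
    active = set(flows)   # flows whose rate is not frozen yet
    prev_level = 0.0
    while active:
        # Enumerate, per (link, class), the unfrozen flows and the
        # bandwidth already consumed by frozen flows.
        unfrozen, frozen_use = {}, {}
        for fid, (cls, path) in flows.items():
            for link in path:
                key = (link, cls)
                if fid in active:
                    unfrozen.setdefault(key, []).append(fid)
                else:
                    frozen_use[key] = frozen_use.get(key, 0.0) + rates[fid]

        best = None  # (per-flow rate, link, class) of the tightest bottleneck
        for link, cap in capacity.items():
            act = [c for c in weights if (link, c) in unfrozen]
            if not act:
                continue
            # Assumption: classes with no unfrozen flows keep only what
            # their frozen flows use; the rest is redistributed.
            inactive_use = sum(use for (l, c), use in frozen_use.items()
                               if l == link and c not in act)
            total_w = sum(weights[c] for c in act)
            for c in act:
                class_share = (cap - inactive_use) * weights[c] / total_w
                level = ((class_share - frozen_use.get((link, c), 0.0))
                         / len(unfrozen[(link, c)]))
                if best is None or level < best[0]:
                    best = (level, link, c)

        if best is None:
            raise ValueError("an active flow crosses no known link")
        level, link, cls = best
        # Validation: newly frozen rates never fall below earlier rounds.
        assert level >= prev_level - eps
        for fid in unfrozen[(link, cls)]:
            rates[fid] = level
            active.discard(fid)
        prev_level = level
    return rates
```

For example, with one 10-unit link shared by a weight-3 class and a weight-1 class, and a second weight-1 flow that is also limited by a 1-unit link, `weighted_max_min({"L1": 10, "L2": 1}, {"a": ("hi", ["L1"]), "b": ("lo", ["L1"]), "c": ("lo", ["L1", "L2"])}, {"hi": 3, "lo": 1})` freezes `c` first, then `b`, then `a`. Computing each class share from the full link capacity, rather than from whatever is left after freezing, is what keeps the frozen rates non-decreasing from round to round in this sketch.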

I enabled support for multiple traffic classes:

  • Workflow:
    • Run flowSim with traffic classes enabled to obtain per-flow FCT slowdowns.
    • Convert flowSim’s estimations into multiple feature maps, one for each traffic class.
    • Train ML models for each traffic class.
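The second workflow step, splitting flowSim's per-flow output into one feature map per traffic class, might look roughly like the sketch below. The layout is an assumption made for illustration: each map is a grid of slowdown percentiles per flow-size bucket, and the bucket boundaries `SIZE_EDGES`, the percentile list `PERCENTILES`, the record format, and the function names are all hypothetical rather than taken from m3.

```python
from bisect import bisect_right
from collections import defaultdict

SIZE_EDGES = [10_000, 100_000, 1_000_000]  # hypothetical bucket bounds (bytes)
PERCENTILES = [50, 90, 99]                 # hypothetical percentiles to keep


def percentile(sorted_vals, p):
    """Nearest-rank percentile of a non-empty, sorted list."""
    n = len(sorted_vals)
    rank = max(1, -(-p * n // 100))  # integer ceil(p * n / 100)
    return sorted_vals[rank - 1]


def per_class_feature_maps(records):
    """Group (traffic_class, flow_size_bytes, slowdown) records by class.

    Returns {traffic_class: rows}, where each row corresponds to one
    flow-size bucket and holds one value per entry in PERCENTILES.
    """
    buckets = defaultdict(lambda: [[] for _ in range(len(SIZE_EDGES) + 1)])
    for cls, size, slowdown in records:
        buckets[cls][bisect_right(SIZE_EDGES, size)].append(slowdown)

    maps = {}
    for cls, rows in buckets.items():
        maps[cls] = [
            # Assumption: an empty bucket is filled with 1.0 (no slowdown).
            [percentile(sorted(row), p) if row else 1.0 for p in PERCENTILES]
            for row in rows
        ]
    return maps
```

Each resulting map could then serve as the input for the model trained for that traffic class; the training step itself is not sketched here.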
