AWS re:Invent 2025: AWS Clean Rooms Introduces Synthetic Data Generation for Privacy-Safe ML Training

December 3, 2025

What Is AWS re:Invent?
AWS re:Invent is Amazon Web Services’ largest annual cloud conference, where AWS announces new services, enhancements, and strategic direction for the coming year. The event features keynotes, technical deep dives, hands-on sessions, and hundreds of product launches, making it one of the most influential cloud events in the industry.

As part of this year’s re:Invent announcements, AWS introduced a new capability for AWS Clean Rooms ML that allows organizations and their partners to generate privacy-enhancing synthetic datasets for machine learning (ML). This capability enables teams to train regression and classification models using data that preserves the statistical characteristics of the original dataset while avoiding exposure of real, identifiable records.

This update expands how companies can collaborate on sensitive data while maintaining strict privacy controls. It unlocks ML use cases that were previously limited by regulatory and data-sharing constraints.

What AWS Announced

AWS Clean Rooms ML now supports the creation of synthetic training datasets that mimic the patterns and distributions of real data. The training code gains access only to the synthetic version, not the original records. This reduces the risk of models memorizing or leaking sensitive information and makes it safer to develop collaborative ML models across organizational boundaries.
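To make the idea of "mimicking patterns and distributions" concrete, here is a deliberately simple sketch. It is not the method AWS Clean Rooms ML uses, which the announcement does not describe. It assumes two numeric columns, fits a mean and standard deviation to each, and samples new rows from independent Gaussians. The column names and the sample records are hypothetical, and a real generator would also need to preserve correlations between columns.

```python
import random
import statistics


def fit_column_stats(rows):
    """Return a (mean, stdev) pair for each numeric column of the real rows."""
    columns = list(zip(*rows))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]


def sample_synthetic(stats, n, seed=0):
    """Draw n synthetic rows from independent Gaussians fitted per column.

    Assumption: columns are treated as independent, so cross-column
    correlations in the real data are NOT preserved by this toy generator.
    """
    rng = random.Random(seed)
    return [tuple(rng.gauss(mu, sigma) for mu, sigma in stats) for _ in range(n)]


# Hypothetical "real" records: (trips_per_year, avg_spend_usd)
real_rows = [(2, 310.0), (5, 820.0), (1, 150.0), (8, 1400.0), (3, 450.0), (4, 600.0)]

stats = fit_column_stats(real_rows)
synthetic_rows = sample_synthetic(stats, n=1000)

# Compare per-column means of the real and synthetic data.
for i, (mu, _sigma) in enumerate(stats):
    syn_mu = statistics.mean(row[i] for row in synthetic_rows)
    print(f"column {i}: real mean={mu:.2f}, synthetic mean={syn_mu:.2f}")
```

In this sketch, downstream training code would only ever receive synthetic_rows, which mirrors the separation the announcement describes between the original records and the data the training code can see.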

With this capability, partners can jointly generate ML-ready datasets for uses such as:

• Campaign optimization
• Fraud detection
• Medical or scientific research
• Cross-brand customer analytics
• Joint product or promotion planning

AWS provided an example involving an airline and a hotel brand that want to improve customer targeting without sharing raw data. Synthetic data allows the two organizations to collaborate without exposing sensitive consumer information.

Why Synthetic Data Matters

1. Maintains Data Utility While Protecting Identities

Synthetic datasets preserve the statistical structure of the original data while de-identifying individuals. This reduces the risk of exposing personal information and minimizes the chance that models inadvertently memorize specific records.
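The trade-off between utility and identity protection can be checked in a rough way. The sketch below is an illustration only, not a metric that AWS documents in this announcement: it counts synthetic rows that exactly duplicate a real row (a crude leakage signal) and measures how far each column mean drifts (a crude utility signal). The records are hypothetical, and zero exact matches is not a privacy guarantee on its own.

```python
import statistics


def exact_match_count(real_rows, synthetic_rows):
    """Count synthetic rows that are exact copies of some real row."""
    real_set = set(real_rows)
    return sum(1 for row in synthetic_rows if row in real_set)


def column_mean_gap(real_rows, synthetic_rows):
    """Absolute difference between real and synthetic means, per column."""
    gaps = []
    for real_col, syn_col in zip(zip(*real_rows), zip(*synthetic_rows)):
        gaps.append(abs(statistics.mean(real_col) - statistics.mean(syn_col)))
    return gaps


# Hypothetical records: (age, annual_income_usd)
real = [(34, 52000.0), (45, 61000.0), (29, 48000.0), (52, 75000.0)]
synthetic = [(36, 53500.0), (44, 60250.0), (31, 47900.0), (50, 74750.0)]

leaks = exact_match_count(real, synthetic)
gaps = column_mean_gap(real, synthetic)
print("exact copies of real rows:", leaks)
print("per-column mean gaps:", gaps)
```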

2. Enables ML Training Across Organizations

Companies can collaborate on ML initiatives with partners even when the data contains sensitive personal information or cannot leave its originating environment. Where regulatory restrictions prevent direct sharing, synthetic data serves as a lower-risk intermediary.

3. Unlocks Use Cases Previously Limited by Privacy Concerns

Industries that rely heavily on regulated or sensitive data, such as travel, healthcare, finance, and advertising, can now jointly develop ML models without compromising data governance.

Expanded Support for Multiple Clouds and Data Sources

In addition to synthetic data generation, AWS Clean Rooms now supports Snowflake and Amazon Athena as data sources within Clean Rooms collaborations.

This enhancement enables organizations to collaborate securely even when partner data resides outside Amazon S3. Data no longer needs to be copied or migrated, which avoids:

• Compliance risk
• Pipeline maintenance overhead
• Stale or outdated datasets
• Costs associated with data movement

This supports a zero-ETL (zero extract, transform, and load) collaboration model, in which each party's data is queried where it already resides.

Example Use Case

An advertiser with data in Amazon S3 and a media publisher with data in Snowflake can run an audience overlap analysis without sharing raw data or building ETL pipelines. No source data from external locations is permanently stored in AWS Clean Rooms, and any data temporarily read during analysis is deleted upon query completion.
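The logic of an audience overlap analysis can be sketched in a few lines. This is a conceptual illustration, not the Clean Rooms API or its internal implementation. It assumes both parties match on email addresses, normalizes and hashes them with SHA-256, and returns only an aggregate count. The min_threshold parameter and the sample addresses are hypothetical, added to show the idea of suppressing results that are too small to be safely aggregated.

```python
import hashlib


def hash_id(email: str) -> str:
    """Normalize an email address and return its SHA-256 hex digest."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def overlap_count(advertiser_ids, publisher_ids, min_threshold=2):
    """Return the size of the shared audience, or None if below the threshold.

    Only an aggregate count is returned; the matched identifiers themselves
    never leave this function.
    """
    shared = set(advertiser_ids) & set(publisher_ids)
    return len(shared) if len(shared) >= min_threshold else None


# Hypothetical audiences held by each party.
advertiser = [hash_id(e) for e in
              ["a@example.com", "b@example.com", "c@example.com", "d@example.com"]]
publisher = [hash_id(e) for e in
             [" B@example.com ", "c@example.com", "e@example.com"]]

print("overlap:", overlap_count(advertiser, publisher))
print("overlap with stricter threshold:", overlap_count(advertiser, publisher, min_threshold=3))
```

In the scenario above, the service performs the equivalent join across Amazon S3 and Snowflake, so neither party has to export its list to the other.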

How Clean Rooms Uses Privacy Controls

Clean Rooms collaborations rely on analysis rules, which govern how data can be queried and what outputs are permitted. Data owners choose:

• Which columns are accessible
• What types of queries are allowed
• Whether additional analyses are permitted
• Whether queries require manual review

These rules provide precise control over how datasets can be used in a collaboration.
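The four choices listed above can be pictured as a small policy object that is checked before any query runs. The sketch below is a hypothetical model for illustration; AnalysisRule, evaluate_query, and the field names are invented here and do not correspond to the actual Clean Rooms API or its rule schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisRule:
    """Hypothetical stand-in for the controls a data owner configures."""
    allowed_columns: frozenset
    allowed_query_types: frozenset
    requires_manual_review: bool = False


def evaluate_query(rule, query_type, columns):
    """Return a decision string for a proposed query against one table."""
    if query_type not in rule.allowed_query_types:
        return "denied: query type not allowed"
    blocked = sorted(set(columns) - rule.allowed_columns)
    if blocked:
        return "denied: columns not accessible: " + ", ".join(blocked)
    if rule.requires_manual_review:
        return "pending: manual review required"
    return "allowed"


# Example rule: only aggregations over two columns, no manual review.
rule = AnalysisRule(
    allowed_columns=frozenset({"region", "purchase_total"}),
    allowed_query_types=frozenset({"aggregation"}),
)

print(evaluate_query(rule, "aggregation", ["region", "purchase_total"]))
print(evaluate_query(rule, "aggregation", ["region", "email"]))
print(evaluate_query(rule, "list", ["region"]))
```

Checking the query type first and the columns second means a request is rejected on the broadest rule it violates, which keeps the denial message simple for the collaborator who submitted it.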

Partner and Customer Feedback

Early adopters highlighted benefits such as:

• Privacy-enhanced collaboration across multiple cloud environments
• Improved data interoperability between partners
• Safer joint analytics and personalization workflows

Organizations such as Kinective Media by United Airlines and Snowflake expressed support for the cross-cloud and privacy-preserving features.

Why This Announcement Matters

These new AWS Clean Rooms ML capabilities represent a meaningful advancement for secure data collaboration. By enabling synthetic data generation and supporting multiple cloud sources, AWS is helping organizations:

• Use fresher and more complete datasets
• Reduce compliance and data-handling burdens
• Avoid moving or exposing sensitive records
• Build more accurate joint ML models
• Scale partnerships across diverse technology stacks

For companies exploring collaborative analytics or ML while maintaining strict privacy boundaries, this update broadens what is possible.

Forged Concepts