
On February 6, 2025, Meta unveiled BOUQuET, a comprehensive dataset and benchmarking initiative aimed at improving multilingual machine translation (MT) evaluation.
This development aligns with Meta’s ongoing efforts to source diverse AI translation data through collaborative partnerships.
The researchers noted that existing datasets and benchmarks often fall short due to their English-centric focus, narrow range of registers, reliance on automated data extraction, and limited language coverage. These constraints hinder the ability to fairly evaluate translation quality across diverse linguistic contexts.
BOUQuET addresses these gaps by shifting away from English-centric benchmarks. Its source content is first written in seven non-English languages (French, German, Hindi, Indonesian, Mandarin Chinese, Russian, and Spanish) and only then translated into English.
According to the research team, “BOUQuET is specially designed to avoid contamination and be multicentric, so as to enforce representation of multilingual language features.” The design is intended to support a more comprehensive evaluation of AI translation models across different linguistic structures and cultural contexts.
Source: Slator