EU forces Big Tech to lift the veil on AI training data with a mandatory transparency template

Brussels introduces mandatory transparency template as part of landmark AI Act


Text: Martti Asikainen, 25.7.2025 Photo: Adobe Stock Photos


The European Commission has introduced a mandatory transparency template forcing developers of powerful artificial intelligence systems to reveal key details about the data used to train their models, in the latest effort to bring accountability to an industry dominated by a handful of tech giants.

The measure, announced on Wednesday, requires companies behind so-called foundation models – including OpenAI’s GPT-4, Google’s Gemini, and Meta’s Llama – to provide “sufficiently detailed summaries” of their training datasets as part of the EU’s groundbreaking AI Act.

According to Henna Virkkunen, the European Commission’s Executive Vice-President for Tech Sovereignty, Security and Democracy, the template adopted by the Commission today is another important step towards trustworthy and transparent AI.

“By providing an easy-to-use document, we are supporting providers of general-purpose AI models to comply with the AI Act. This is how we can build trust in AI and unlock its full potential for the benefit of the economy and the society”, she stated.

Industry resistance is expected

Under the new rules, which take effect when the AI Act becomes fully operational in 2026, developers of general-purpose AI (GPAI) models must disclose information about dataset provenance, content filtering methods, and copyright protection measures.

While companies will not be required to release the actual data, they must summarise its contents, structure, sources and relevance – including whether synthetic data or web-scraped material was used.

The move represents the EU’s most ambitious attempt yet to regulate the opaque world of AI development, where training data is typically treated as a closely guarded trade secret. 

The transparency requirements are designed to address mounting concerns about misinformation, algorithmic bias, intellectual property infringement, and the systemic risks posed by unaccountable AI systems.

Industry pushback is expected, with critics arguing that excessive transparency could expose companies to competitive risks or reveal security vulnerabilities. EU officials, however, insist the goal is not to stifle innovation but to ensure it serves the public interest and aligns with fundamental rights and societal values.

Too big, too fast?

The European approach stands in stark contrast to the more hands-off regulatory stance adopted in the United States, where comprehensive AI legislation has yet to materialise. 

Brussels hopes to replicate the global influence of its General Data Protection Regulation (GDPR), which reshaped privacy laws worldwide after its introduction in 2018.

The new rules are likely to force American tech companies to change how they operate if they wish to maintain access to the lucrative European markets. Or as one policy analyst put it: “If you want to operate in Europe, you play by Europe’s rules.”

The transparency template forms part of the EU’s broader AI Act, the world’s first comprehensive regulation governing artificial intelligence. The legislation takes a risk-based approach, with the most stringent requirements applied to AI systems deemed to pose the highest risks to fundamental rights and safety.

As generative AI continues its rapid expansion across industries and societies, Brussels is betting that mandatory transparency will help avoid the regulatory mistakes of the early internet era, when digital platforms were allowed to grow “too big, too fast, and too unaccountable.”

Stakes are high

The new rules will apply to foundation models that meet certain computational thresholds (under the Act, models trained using more than 10^25 floating-point operations are presumed to pose systemic risk), ensuring that the largest and most influential AI systems face the strictest oversight.

Companies will need to demonstrate compliance through detailed documentation that can be scrutinised by regulators and, in some cases, made available to researchers and civil society organisations.

For European policymakers, the stakes could not be higher. With AI systems increasingly shaping everything from job recruitment to criminal justice decisions, ensuring transparency and accountability has become a matter of democratic governance rather than just technical regulation.

The announcement comes as concerns grow about the concentration of AI power in the hands of a few major technology companies, most of them based in the United States or China. 

By asserting regulatory authority over AI development, the EU aims to ensure that European values and interests are reflected in the technologies that will shape the future.
