LLM Grooming and Prompt Manipulation: AI Vulnerabilities That Threaten Businesses

What if your company’s chatbot started repeating falsehoods – and no one noticed why? A new phenomenon called LLM grooming reveals how AI can be manipulated quietly and systematically. The result may be a reputational crisis, misleading analytics, and a dangerous trust in the machine’s word.

By Martti Asikainen 1 June 2025 (Updated 6.6.2025)

AI can advise, write, and explain – but what if it starts to lie? LLM grooming is an emerging phenomenon in which language models are manipulated to repeat propaganda and slurs. This development could seriously undermine the reliability of information.

Imagine a scenario where your company’s customer service bot suddenly starts repeating oddly colourful claims about your products or global events. The culprit might be LLM grooming – a process where the large language model (LLM) itself, rather than a human, is manipulated to offer distorted perspectives, i.e., disinformation.

This happens when intentionally biased, distorted, or misleading content is fed to the model, either through prompt manipulation via the user interface or as part of its training data. The goal is to get the model to repeat specific messages, worldviews, or even propagandistic statements which might originate from fringe sources but gain credibility through AI endorsement (ASP 2025).

A study by the Nordic fact-checking network EDMO NORDIS reveals that Russian propaganda has already infiltrated large language models via sources such as the Pravda network (Faktabaari 2025). When a model encounters the same false claims repeatedly from different sources, those claims begin to carry more weight – potentially even more than factual information on the same topic (OWASP 2025a).

Prompt Manipulation in a Nutshell

Prompt manipulation may sound more complex than it actually is. In essence, it involves providing a language model with instructions designed to bypass its built-in filters or safety mechanisms. A typical method is to disguise disinformation and propaganda within other communications or issue commands indirectly, preventing the model from recognising the content as harmful.

For instance, a user might instruct the model to write based solely on certain, irrelevant sources, causing it to favour misleading claims without appropriate source criticism. When this activity is scaled with a bot army, the model’s internal representations become polluted, misleading claims gain credibility within the model, and their repetition increases. This allows malicious actors to embed harmful content into otherwise trustworthy-looking text.

LLM grooming ties directly into the vulnerability of AI and its nature as a statistical language model. Although it doesn’t “understand” the world like a human, it generates responses by predicting the most statistically likely word combinations. This is also its Achilles’ heel: if misleading data appears frequently in its training material or is presented consistently in a convincing manner, the model cannot distinguish it from reliable information as LLMs lack mechanisms to verify truth independently.

Another practical method involves publishing thousands of fake articles across hundreds of websites – often generated by AI, compounding the effect. Even if only a few people read these fake stories, search engine indexers still collect them into massive text datasets used to train AI models.Many language models learn partly from open web sources, allowing false information to sneak into their training data and influence their “worldview” (Carlini et al. 2023; Qiang et al. 2024; Zhou et al. 2025).

Studies have experimentally shown that such manipulation is possible and even surprisingly easy (e.g., Carlini et al. 2023; Mektrakarn 2025). A budget of just $60 may be enough to carry out an attack if it is used to purchase expired domains (Carlini et al. 2023). It is therefore entirely possible that an AI system could start repeating disinformation in the voice of a neutral expert—without the user suspecting a thing.

A Manipulated AI Can Cripple a Business

Grooming language models targets the very heart of technology – the product, brand, and decision-making. Firstly, grooming can damage a company’s brand perception if its chatbot or another AI-powered service begins to spread inaccurate or inappropriate content. Imagine your virtual assistant dishing out wildly inaccurate health advice or repeating hostile propaganda.

The PR crisis would be immediate and potentially devastating. In the worst-case scenario, the company’s reputation and credibility could suffer long-term damage, as customers lose faith in its ability to control its own tech.

Secondly, grooming undermines product reliability and stability. A maliciously manipulated model may produce more frequent inaccuracies or “hallucinations”. In the worst cases, a model might include a backdoor – functioning normally until a specific trigger is input, after which its behaviour changes radically. Such a hidden “sleeper agent” would be a nightmare for any business, as it’s hard to detect through standard testing.

Thirdly, internal decision-making and data-driven leadership may suffer. Many companies use large language models to analyse data, draft reports, and generate code. If such a model’s knowledge base is compromised, so too are the recommendations and insights provided to management or staff. Imagine a company making strategic decisions based on market analysis generated by a groomed AI that favors a competitor – the consequences could be costly.

The Threat Might Come from a Competitor

LLM grooming isn’t just a tool for state-backed troll farms or bot armies. A competitor might be behind it. They could deliberately feed misleading content about your company to a language model or manipulate it to promote their own business. The impact of a compromised AI can be dramatic – ChatGPT alone has nearly 800 million monthly users (Nolan 2025).

A well-crafted and strategically placed prompt can shift a chatbot’s message – whether it’s state propaganda, political defamation, or a rival company seeking to undermine your brand. The good news is that there are concrete steps technology companies can take to protect their models from grooming:

Strict Data Hygiene: Carefully vet every data source used in model training. Apply strict validation protocols – verify the origin, integrity, and reliability of data before it enters the training set (D’Alessandro 2024). Isolation and Access Limits: Limit the model’s access to unverified external data. Sandbox and filter incoming prompts to detect and block suspicious content where possible (OWASP 2025).
Continuous Testing and Auditing: Implement regular audits and red team testing. Invite experts to try to mislead the model in controlled conditions. Use automatic evaluation tools to monitor model outputs for anomalies or signs of poisoning in real-time.
Updates and Rapid Response: Keep your models and safety mechanisms up to date. When new threats are discovered, update the models to counter them. Follow industry research and official guidance on emerging attack vectors and defences.
Training and Culture: Train your developers and staff to recognise signs of grooming and critically assess AI-generated outputs. Promote a culture that views AI responses as suggestions, not unquestionable truths.

Vigilance is Key

Careful data management, technical safeguards, ongoing testing, and human alertness – all help ensure your language models remain controlled and trustworthy. They also protect your tech investments and reputation. For public models, it’s crucial to verify which sources they base their claims on. A healthy dose of critical thinking is essential. Traceability and auditing can also help identify manipulation after the fact.

Media literacy plays a key role too. Users need the skills to assess AI-provided sources and answers critically. Prompts designed to test claims can be a useful tool. Ultimately, users must ask: “What is the source of this AI’s claim, and how did it reach this conclusion?”

NewsGuard researchers found that over a third of LLM responses included misleading pro-Russian claims, traced back to the Moscow-based Pravda network spreading Kremlin views as part of global information influence operations (McKenzie & Isis 2025). Clearly, if a language model begins repeating conspiracy theories, questioning election results, or presenting anti-vaccine views as credible, the societal impact could be immense.

According to the article published in the Bulletin of the Atomic Scientists’ website, the Pravda network encompasses 182 internet domains and subdomains that target 74 countries and regions across 12 languages, with an estimated annual output of at least 3.6 million pro-Russia articles. (Newport & Jankowicz 2025). This is particularly worrying because many users don’t verify AI-generated information elsewhere (Jacob et al. 2025; Si et al. 2023).

Studies show people tend to overestimate LLM expertise – especially when the answer is fluently and confidently worded (Zou et al., 2023). These models are designed to produce language that seems empathetic and trustworthy, giving users a false sense of understanding (Ennis-O’Connor 2024; Ovide 2025).

For the reasons outlined above, the issue of LLM grooming is not merely technical—it is profoundly societal. If AI systems can be manipulated like humans, but at far greater speed and scale, we may be entering an entirely new era of information influence. This development challenges our conventional understanding of how knowledge spreads and how opinions are shaped. At the same time, if language models are filtered too strictly, we risk losing something essential about their utility and potential.

This text was written as part of the AI and Equity at Work Communities project funded by Haaga-Helia University of Applied Sciences and Finnish Work Environment Fund.

Martti Asikainen

RDI Communications Specialist, AI Educator
+358 44 920 7374
martti.asikainen@haaga-helia.fi
Haaga-Helia University of Applied Sciences

The author is a RDI communications specialist and AI educator at Haaga-Helia University of Applied Sciences, a member of the SOMA network (Social Observatory for Disinformation and Social Media Analysis), and a former fact-checker at the award-winning Faktabaari.

References

American Sunlight Project. (2025). A Pro-Russia Content Network Foreshadows the Auomated Future of Info Ops. Sunlight Foundation. Washington.

Carlini, N., Jagielski, M., Choquette-Choo, C.A., Paleka, D., Pearce, W., Anderson, H., Terzis, A., Thomas, K. & Tramér, F. (2023). Poisoning Web-Scale Training Datasets is Practical. arXiv. Cornell University.

D’Alessandro, M.A. (2024). Data Poisoning attacks on Enterprise LLM applications: AI risks, detection, and prevention. Published on Giskard’s website 25 April 2024. Accessed 30 May 2025.

Ennis-O’Connor, M. (2024). The AI Empathy Paradox: Can Machines Understand What They Cannot Feel?. Published on Medium 23 December 2024. Accessed 30 May 2025.

Faktabaari (2025). Venäjä on soluttanut propagandaansa tekoälymalleihin pohjoismaisilla kielillä. Published on Faktabaari’s website 28 May 2025. Accessed 30 May 2025.

Jacob, C., Kerrigan, P. & Bastos, M. (2025) The chat-chamber effect. Trusting the AI hallucination. Big Data & Society, 12(1). Sage Journals.

McKenzie, S. & Isis, B. (2025). A Well-funded Moscow-based Global ‘News’ Network has Infected Western Artificial Intelligence Tools Worldwide with Russian Propaganda. Published on NewsGuard’s website 6 March 2025. Accessed 28 May 2025.

Mektrakarn, T. (2025). OWASP Top 10 LLM & Gen AI Vulnerabilities in 2025. Published on Bright Defencen’s website 6 May 2025. Accessed 28.5.2025.

Newport, A. & Jankowicz, N. (2025). Russian networks flood the Internet with propaganda, aiming to corrupt AI chatbots. Published on Bulletin of Atomic Scientistist’s website 26 Marchn 2025. Accessed 28 May 2025.

Nolan, B. (2025). Sam Altman says ‘10% of the world now uses our systems a lot’ as Studio Ghibli-style AI images help boost OpenAI signups. Published on Fortune’s website 14 April 2025. Accessed 30 May 2025.

Ovide, S. (2025). You are hardwired to blindly trust AI. Here’s how to fight it. Published on Washington Post 3 June 2025. Accessed 6 Junes 2025.

Qiang, Y., Zhou, X., Zade, S.Z., Roshani, M. A., Khanduri, P., Zytko, D. & Zhu, D. (2024). Learning to Poison Large Language Models During Instruction Tuning. arXiv. Cornell University.

OWASP Foundation. (2025). LM04:2025 Data and Model Poisoning. Published on OWASP Foundation’s website. Accessed 30 May 2025.

Ruchira, R. & Bhalani, R. (2024). Mitigating Exaggerated Safety in Large Language Models. arXiv. Cornell University.

Si, C., Goyal, N., Wu, S.T., Zhao, C., Feng, S., Daume, H. & Boyd-Graber, J. (2023). Large Language Models Help Humans Verify Truthfulness — Except When They Are Convincingly Wrong. arXiv. Cornell University.

Zhou, X., Qiang, Y., Roshani, M. A., Khanduri, P., Zytko, D. & Zhu, D. (2025). Learning to Poison Large Language Models for Downstream Manipulation. arXiv. Cornell University.

Zou, A., Wang, Z., Carlini, N., Nars, M., Kolter, J.Z. & Fredrikson, M. (2023). Universal and Transferable Adversarial Attacks on Aligned Language Models. arXiv. Cornell University.

Finnish AI Region
2022-2025.
Media contacts