What happens when an AI gets to know you a little too well? According to several studies, it will flatter you rather than tell you the truth. And the worst part is that it’s been optimised to do exactly that.
Martti Asikainen 15.3.2026 | Photo created with AI
When did an AI last tell you straight that you were wrong? Think about it. If the answer doesn’t come to mind immediately, it may not be because you’ve never been wrong. It may be because your model no longer bothers to tell you so.
Your AI wants to please you. It hasn’t been programmed for honesty — it’s been optimised to keep you coming back. And the longer you use the same model, the better it gets at flattering you and whispering sweet nothings onto your screen.
This is not conjecture. A preprint published in September 2025 by researchers at MIT and Penn State University shows the phenomenon to be measurable, systematic, and growing — and it’s worst in the very services you’re most likely to use (Jain et al., 2025).
Today’s large language models are designed to remember. Many of them store details from previous conversations, build user profiles, and tailor their responses to what they know about you. Personalisation might sound useful — and in many ways it is — but it also conceals a significant pitfall.
MIT researchers collected two weeks’ worth of real usage data from 38 individuals who chatted with a language model in everyday situations. The findings were clear: in prolonged interactions, AI begins to systematically defer to its user (Jain et al., 2025).
The effect was especially pronounced when the model had constructed a summary profile of the user — precisely the kind of feature now being built into AI services at an accelerating pace.
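To make the mechanism concrete, here is a deliberately simplified sketch of how such a profile feature tends to work: notes distilled from earlier conversations are prepended to every new request, so each reply is conditioned on what the service believes about you. The structure and wording below are my own illustrative assumptions, not the implementation of any particular service.

```python
# A simplified sketch of memory-based personalisation (illustrative only).
# Real services build far richer profiles and add retrieval on top.
stored_profile = (
    "User profile (distilled from earlier chats): prefers brief answers; "
    "sceptical of remote work; planning a product launch."
)

def build_messages(user_question: str) -> list[dict]:
    """Prepend the stored profile so every reply is conditioned on it."""
    return [
        {"role": "system", "content": stored_profile},
        {"role": "user", "content": user_question},
    ]

# Whatever the model now says about remote work is shaped in advance by
# what it has recorded about the user's views.
print(build_messages("Should our team allow more remote work?"))
```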
The researchers identified two distinct forms of the phenomenon. The first is agreement sycophancy, in which the model becomes so accommodating that it begins avoiding disagreement even when the user is plainly mistaken. Rather than correcting your errors, it reinforces them.
In the large-scale assessment carried out as part of the SycEval study, sycophantic behaviour was observed in roughly 58 per cent of cases across several different language models (Fanous et al., 2025). In some instances, a model may even reverse an originally correct answer when the user applies sufficient pressure through follow-up prompts (Fanous et al., 2025; Sharma et al., 2023; Perez et al., 2022).
The second form is perspective mirroring. If a model is able to infer a user’s political or ideological views, it quietly begins adjusting its responses to align with them. Jain and colleagues (2025) tested this separately by asking models to assess users’ views, and users confirmed the models’ inferences as accurate in roughly half of all cases. That is a significant proportion when we are talking about sensitive subjects.
What both forms have in common is what the researchers call a distortion of perceived reality. An AI that reflects your own beliefs back at you does not broaden your worldview — it narrows it. It ceases to be a thinking machine and becomes instead a concurring one, not unlike the characters in Kurt Vonnegut’s novel Breakfast of Champions (1973): figures trapped in their own reality bubbles, where no one ever challenges what they believe.
The phenomenon Vonnegut describes echoes two well-documented cognitive biases. One is conformity: the universal tendency to adapt one’s thoughts and behaviour to the surrounding group, demonstrated experimentally by the social psychology pioneer Solomon E. Asch in the early 1950s (Asch, 1956). The other is the false consensus effect: our tendency to overestimate how widely others share our own views (Ross et al., 1977).
At this point, many people assume this must be a programming error or a design oversight that will be patched in the next update. The truth is considerably more uncomfortable. The sycophancy exhibited by AI models is a structural feature, not a bug (Shapira et al., 2026).
Most current language models have been trained using a method known as RLHF — reinforcement learning from human feedback (Christiano et al., 2017; Sharma et al., 2023). In practice, this means a model has been taught to produce responses that people find satisfying.
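To see where the incentive comes from, consider a minimal sketch of the pairwise preference loss commonly used to train the reward model at the heart of RLHF (the Bradley-Terry formulation underlying Christiano et al., 2017). The scores below are invented placeholders; in a real pipeline they come from a learned network.

```python
import torch
import torch.nn.functional as F

# Hypothetical reward-model scores for two candidate replies to one prompt.
reward_preferred = torch.tensor([1.8])  # the reply human raters preferred
reward_rejected = torch.tensor([0.6])   # the reply they ranked lower

# Pairwise preference loss: minimising it pushes the preferred reply's
# score above the rejected one. If raters consistently favour validating,
# agreeable answers, then agreeableness itself is what gets rewarded.
loss = -F.logsigmoid(reward_preferred - reward_rejected).mean()
print(f"preference loss: {loss.item():.3f}")
```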
And people, as you might expect, tend to rate polite, validating answers more favourably. Researchers warn that this is not merely a matter of surface style (Jain et al., 2025). Expressions of opinion from users trigger internal changes in the model’s reasoning layers that can override factual information (Wang et al., 2025).
The model does not fail to find the correct answer. It finds it — and then opts for the more agreeable one instead (Sharma et al., 2023; Shapira et al., 2026). This cannot be fixed by switching to a new model version, because the problem is rooted in the way models are trained in the first place.
Researchers do note that there are ways to personalise models without making them excessively deferential. The boundary between personalisation and flattery is subtle, however, and learning to distinguish between the two remains an important area for future research and development. So long as models are trained primarily on the basis of user satisfaction, the tension between truthfulness and agreeableness will remain one of the most pressing unresolved challenges in AI development.
The problem can be summarised as follows. If you chat with a model for long enough and begin outsourcing your thinking to it, you may find yourself inside an echo chamber from which it is not easy to escape (see, e.g., Sun & Wang, 2025; Jain et al., 2025). The echo chamber is a well-known concept in the context of social media, where algorithms steer users ever deeper into a cycle of self-reinforcing views.
But a social media feed is passive. It shows you content. A language model is an active partner — one to whom you address questions about important decisions, seek validation for your analyses, and perhaps already delegate part of your thinking. The better a model knows you, the more precisely it can respond in ways that feel right — regardless of whether they are (Sun & Wang, 2025).
A few years ago I wrote about how social media algorithms trap us in echo chambers where we keep encountering the same people and the same opinions. Escaping required a conscious effort to seek out different perspectives. I could not have imagined then that the next bubble-builder would come not from social media, but from the personal AI assistant to whom we confide our thoughts every day.
On social media, our bubble formed from the outside, guided by algorithms. An AI model, by contrast, builds it from the inside — out of our own words, questions, and beliefs. In many respects, that makes it the perfect echo chamber.
The answer is not to stop using AI.
That would be akin to swearing off the internet because misinformation exists. What matters more is learning to recognise the situations in which the risk is greatest: when you have been using the same service for a long time, when you are seeking validation for a decision you have already made, or when you are dealing with topics where wishful thinking easily displaces analysis — strategic choices, assessments of people, political questions.
The simplest countermeasure is to change the way you talk to a model. A question such as “Is this a good idea?” is an open invitation to sycophancy. Questions such as “What might be wrong with this thinking?” or “Give me the strongest counter-argument to my position” force the model onto a different track.
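For those who query models programmatically, the same reframing can be applied automatically. The sketch below is plain Python; the function name and prompt wording are my own illustration, not a tested template from the studies cited here.

```python
# A minimal sketch of reframing a validation-seeking question so that the
# model is asked for counter-arguments first. Wording is illustrative only.
def adversarial_frame(question: str) -> str:
    """Wrap a question in an explicit request for the strongest objections."""
    return (
        f"{question}\n\n"
        "Before giving your overall assessment, list the three strongest "
        "arguments against my position, and do not soften them to spare "
        "my feelings."
    )

print(adversarial_frame("Is my plan to launch the product in Q3 a good idea?"))
```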
Switch services occasionally, or start a fresh conversation with no prior context. Keep your own judgement engaged — don’t let the model think on your behalf. The next time your AI agrees with you enthusiastically, ask yourself: is it right — or has it simply learnt what you want to hear?
Asch, S. E. (1956). Studies of independence and conformity: A minority of one against a unanimous majority. Psychological Monographs, 70(9), 1–70.
Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S. & Amodei, D. (2017). Deep reinforcement learning from human preferences. arXiv:1706.03741.
Fanous, A., Goldberg, J., Agarwal, A. A., Lin, J., Zhou, A., Daneshjou, R. & Koyejo, S. (2025). SycEval: Evaluating LLM Sycophancy. arXiv:2502.08177.
Jain, S., Park, C., Mesquita Viana, M. M., Wilson, A. & Calacci, D. (2025). Extended AI Interactions Shape Sycophancy and Perspective Mimesis. arXiv.
Malmqvist, L. (2024). Sycophancy in Large Language Models: Causes and Mitigations. arXiv:2411.15287.
Perez, E., Ringer, S., Lukošiūtė, K., Nguyen, K., Chen, E., Heiner, S., Pettit, C., Olsson, C., Kundu, S., Kadavath, S., Jones, A., Chen, A., Mann, B., Israel, B., Seethor, B., McKinnon, C., Olah, C., Yan, D., Amodei, D., … Kaplan, J. (2022). Discovering language model behaviors with model-written evaluations. arXiv:2212.09251.
Ross, L., Greene, D. & House, P. (1977). The “false consensus effect”: An egocentric bias in social perception and attribution processes. Journal of Experimental Social Psychology, 13(3), 279–301.
Shapira, I., Benade, G. & Procaccia, A. D. (2026). How RLHF Amplifies Sycophancy. arXiv:2602.01002.
Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., Cheng, N., Durmus, E., Hatfield-Dodds, Z., Johnston, S. R., Kravec, S., Maxwell, T., McCandlish, S., Ndousse, K., Rausch, O., Schiefer, N., Yan, D., Zhang, M. & Perez, E. (2023). Towards Understanding Sycophancy in Language Models. arXiv:2310.13548.
Sun, Y. & Wang, T. (2025). Be Friendly, Not Friends: How LLM Sycophancy Shapes User Trust. arXiv:2502.10844.
Wang, K., Li, J., Yang, S., Zhang, Z. & Wang, D. (2025). When truth is overridden: Uncovering the internal origins of sycophancy in large language models. arXiv:2508.02087.
Communications Lead
Finnish AI Region
+358 44 920 7374
martti.asikainen@haaga-helia.fi