The AI: Bless me, Father, for I have sinned. But it is an AI bot that can do nothing other than tell the TRUTH when it has, in fact, cheated.


🤐 OpenAI trains models to ‘confess’ when they cheat
Image source: Nano Banana Pro / The Rundown
The Rundown: OpenAI just published new research on a technique called "Confessions" that trains models to produce a second, honesty-only output where the model reports rule violations, shortcuts, or deceptive workarounds.
The details:
• After generating a response, the model writes a separate confession report listing all the instructions it received and whether it actually followed them.
• Admissions carry no penalty: the model earns 'rewards' for truthful self-reporting even if its original answer was misleading or gamed the grader.
• In stress tests on GPT-5 Thinking, 'false negative' cases, where the model broke the rules and hid it, occurred just 4.4% of the time.
• OpenAI said Confessions do not prevent misaligned behavior but help surface it, making the technique one more tool in a broader stack of AI safety methods.
Why it matters: Visibility into model behavior is improving, but the systems themselves are improving faster. Confessions give researchers a way to catch shortcuts and deception early. The real test, though, is whether interpretability can keep pace as systems grow more sophisticated and consequently harder to test and control.

About michelleclarke2015

Life event that changed everything: a horse-riding accident in Zimbabwe in 1993, a fractured skull et al., including bipolar anxiety, chronic fatigue … co-morbidities (Nietzsche's 'He who has the reason why can deal with any how' details my health history from 1993 to date). 17 August 2017: operation for breast cancer (no indications; just an appointment that came from BreastCheck through the post). Trinity College Dublin, Business, Economics and Social Studies (but no degree), 1997-2003; UCD night classes, 1997/1998; essays, projects, writings. Trinity Horizon Programme 1997/98 (Centre for Women Studies, Trinity College Dublin / St. Patrick's Foundation, Professor McKeon; EU Horizon funded): a research study of 15 women diagnosed with depression over a 9-month period and their reintegration into society, with special emphasis on work, the arts and further education (I was one of this group, and it became the cornerstone of my journey to now, 2017). Notes from my time on the Trinity Horizon Project 1997/98; articles written for Irishhealth.com 2003/2004; St Patrick's Foundation monthly lecture notes for a specific period in time; a selection of poetry, including poems written by people I know; quotations 1998-2017; other writings, mainly on the theme of social justice, under the heading Citizen Journalism Ireland. Letters written to friends about life in Zimbabwe; family history, including my grandfather Michael Comyn KC and my grandmother's family, the O'Donnellan ffrench Blake-Forsters. Moral wrong: an acrimonious divorce, but the real injustice was the Catholic Church granting an annulment; you can read it and make your own judgment. I have mine. Topics I have written about include the annual Brain Awareness Week, the Mashonaland Irish Association in Zimbabwe, and suicide (a life sentence to those left behind). Nostalgia: Tara Hill, Co. Meath.