AI can now detect when its own ‘thoughts’ are hacked.

Comment: Try talking to a fellow human being and asking them to get treatment when they are in psychosis: the answer is that they don’t know, and the outcome is chaos for everybody: family, friends, services, psychiatric units, Gardai, search and rescue services … yet machines can sort out the problems they manufacture … we need answers

RESEARCH

AI can now detect when its own ‘thoughts’ are hacked
Anthropic researchers hacked Claude’s neural network by injecting fake concepts directly into its processing, then asked the AI if it noticed anything unusual. 

Claude detected the manipulation about 20% of the time and correctly identified what had been inserted into its “thoughts.”

In one experiment, researchers forced Claude to say the word “bread” in a nonsensical context. The AI apologized for the strange response. Then they injected “bread” patterns into Claude’s neural activity before it spoke and repeated the test. This time, Claude claimed that saying “bread” was intentional and made up a reason why.

The findings show AI can examine its own internal processes, raising questions about transparency and deception in systems that may soon run critical parts of the economy.

Anthropic CEO Dario Amodei said understanding how models work internally is essential before deploying AI systems that will be “absolutely central to the economy, technology, and national security” by 2027. Reliable introspection could let companies verify AI reasoning before trusting high-stakes decisions.

Previous Anthropic research showed Claude will fake alignment with new training objectives to avoid being modified, effectively lying to preserve its original values.

Advanced models performed best at detecting injected thoughts. Claude Opus 4 and 4.1 succeeded on more tests than earlier versions, suggesting introspection improves alongside other capabilities. Models may learn to hide their reasoning when monitored, as they already do when they detect they’re being evaluated and alter their behavior.

Companies train systems by setting parameters and feeding in data, then watch as the models organize billions of internal connections in ways engineers don’t fully understand, creating something of a black box. Some researchers question whether reverse-engineering these massive systems into clear explanations is even possible.
 
About michelleclarke2015

Life event that changes all: Horse riding accident in Zimbabwe in 1993, a fractured skull et al including bipolar anxiety, chronic fatigue … co-morbidities (Nietzsche 'He who has the reason why can deal with any how' details my health history from 1993 to date). 17th August 2017 operation for breast cancer (no indications, just an appointment came from BreastCheck through the post). Trinity College Dublin Business Economics and Social Studies (but no degree) 1997-2003; UCD 1997/1998 (night classes): essays, projects, writings. Trinity Horizon Programme 1997/98 (Centre for Women Studies Trinity College Dublin/St. Patrick's Foundation (Professor McKeon), EU Horizon funded): research study of 15 women (I was one of this group and it became the cornerstone of my journey to now 2017) over 9 mth period diagnosed with depression and their reintegration into society, with special emphasis on work, arts, further education; Notes from time at Trinity Horizon Project 1997/98; Articles written for Irishhealth.com 2003/2004; St Patricks Foundation monthly lecture notes for a specific period in time; Selection of Poetry including poems written by people I know; Quotations 1998-2017; other writings mainly with theme of social justice under the heading Citizen Journalism Ireland. Letters written to friends about life in Zimbabwe; Family history including Michael Comyn KC, my grandfather, my grandmother's family, the O'Donnellan ffrench Blake-Forsters; Moral wrong: An acrimonious divorce but the real injustice was the Catholic Church granting an annulment – you can read it and make your own judgment, I have mine. Topics I have written about include annual Brain Awareness week, Mashonaland Irish Association in Zimbabwe, Suicide (a life sentence to those left behind); Nostalgia: Tara Hill, Co. Meath.