Harvard’s Lauren Williams, a MacArthur ‘genius,’ joins international effort to challenge notions of AI supremacy
Have reports of AI replacing mathematicians been greatly exaggerated?
Artificial intelligence has attained an impressive series of feats — solving problems from the International Math Olympiad, conducting encyclopedic surveys of academic literature, and even finding solutions to some longstanding research questions. Yet these systems largely remain unable to match top experts in the conceptual frontiers of research math.
Now a Harvard professor and other world-renowned mathematicians have launched a grand experiment to more clearly define the boundary between artificial and human intelligence. These scholars have challenged AI companies to crack a series of tough problems that the mathematicians themselves recently have solved but kept under wraps. The effort seeks to answer a key question: Where has AI achieved mastery and where does human intelligence still reign supreme?
“This is a tricky question to answer because the capabilities of AI are improving all the time,” said Lauren Williams, Dwight Parker Robinson Professor of Mathematics at Harvard, who recently won a genius grant from the MacArthur Foundation. “But, at least at the moment, AI is not so good at making a creative leap and solving problems far outside the kinds of problems that already have been solved.”
Williams is among a team of 11 mathematicians — including a Fields medalist and two MacArthur geniuses — who are organizing First Proof. The project seeks to create a more objective methodology for evaluating the ability of AI systems to solve research math questions.
But not all of AI's efforts have been so successful. One recent analysis showed that large language models (LLMs) managed to solve only a small fraction of research-level math problems and were prone to logical errors, fundamental misconceptions, and hallucinations of existing results. Some researchers have concluded that AI tools currently are most useful for assisting with grunt work — such as literature reviews — rather than solving big research problems autonomously.
The First Proof project was initiated by Mohammed Abouzaid, a professor of mathematics at Stanford University. He said many of the highly touted demonstrations of AI capabilities in math “did not really reflect my experience as a mathematician.”
Tech companies, he said, tend to focus on results that they can measure with automated, scalable systems. They often recast research questions into forms that can be answered by current technologies — but not necessarily the approaches research mathematicians would take. In addition, much of the research has been conducted by parties with vested interests.
So the team of mathematicians — from institutions including Harvard, Columbia, Duke, Yale, UC Berkeley, and the University of Texas at Austin — decided it was time for an independent evaluation. In December, they met in Berkeley to assemble research problems that they had recently solved but not yet published. Their 10 problems represent a diverse span of mathematics, including number theory, algebraic combinatorics, spectral graph theory, symplectic topology, and numerical linear algebra.
The solutions — each no more than about five pages — have been encrypted and stored within a secure depository. The authors publicly unveiled the problems on Feb. 5 and will reveal the solutions on Feb. 13.
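To make the timed-release setup concrete, here is a minimal sketch of one way a publish-the-commitment-now, reveal-the-proof-later scheme can work in Python. It is an illustrative assumption about the general technique, not a description of First Proof's actual encryption or storage arrangements.

```python
# Illustrative commit-then-reveal sketch; NOT First Proof's actual mechanism.
import hashlib

solution = b"<five-page proof goes here>"
nonce = b"random-secret-salt"  # keeps the committed proof unguessable

# Published in advance: a commitment that binds the authors to the solution
# without revealing it.
commitment = hashlib.sha256(nonce + solution).hexdigest()
print("commitment:", commitment)

# Published on reveal day: the nonce and the solution. Anyone can recompute
# the hash and check that it matches the earlier commitment.
assert hashlib.sha256(nonce + solution).hexdigest() == commitment
```

The point of such a scheme is that the public record proves the solutions existed before the problems were posed to AI systems, while the content stays hidden until the reveal date.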
Experts will compare the proofs produced by mathematicians against those produced by AI (problems may be solved in more than one way). The organizers plan to issue another set of problems later this year.
In preliminary tests with GPT 5.2 Pro and Gemini 3.0 Deepthink, the authors reported that, “The best publicly available AI systems struggle to answer many of our questions.” Abouzaid said the AI models solved two of the 10 problems in preliminary tests. “We are already learning a lot by seeing which of our 10 questions it can answer,” he said.
In playing with AI tools, Williams found that they seemed superficially useful but became unreliable at deeper levels.
“Whenever I’ve asked AI a question about which I know very little, the answer generally appears helpful and informative,” she said. “But as I ask questions closer to my own expertise, I’ll start seeing mistakes. If I ask questions close to things I’m working on, it will sometimes hallucinate and start telling me, ‘Oh, the answer to that question is in this paper that I wrote’ — except it’s not a paper that I wrote. Sometimes it will invent references, and the only reason I know they’re not real is because they said I was the author — and I never wrote such a paper.”
Williams said AI sometimes distorted her query. Instead of answering her original question, it shifted to another question that could be answered from the existing literature.
“It can be quite good at mimicking things that have been done before, or putting together some known results to get to a statement that follows from them,” said Williams. “If it’s something algorithmic, it’s excellent.”
But those questions do not represent the forefront of the field.
Typically, research math involves three phases: coming up with a good question, developing a framework for attacking the problem, and solving it. The first two steps remain beyond the reach of AI, so the challenge aims to test only the final one — finding solutions to already-defined problems.
Another co-author, Martin Hairer, professor of pure mathematics at EPFL in Switzerland and Imperial College London and winner of the 2014 Fields Medal, said the group sought “to push back a bit on the narrative that ‘math has been solved’ just because some LLM managed to solve a bunch of Math Olympiad problems.”
“As of now, this idea of mathematicians being replaced by AI is complete nonsense in my opinion,” said Hairer. “Maybe this will change in the future, but I find it hard to believe that the type of models we’re seeing at the moment will suddenly start producing genuinely new insights.”
Less like a picture, more like a video game? Cognitive scientist explains how we ‘see’ what isn’t real.
Sy Boles
Harvard Staff Writer
February 6, 2026
Imagine this: A person walks into a room and knocks a ball off a table.
Did you imagine the gender of the person? The color of the ball? The position of the person relative to the ball?
Yes and no, says cognitive scientist Tomer Ullman, the Morris Kahn Associate Professor of Psychology, who with Halely Balaban recently published a paper titled “The Capacity Limits of Moving Objects in the Imagination.” If you’re like most people, you probably thought about some of these things, but not others. People build mental imagery hierarchically, starting with the ideas of “person,” “room,” “ball,” and “table,” then placing them in relation to one another in space, and only later filling in details like color.
“Our imaginations are actually patchwork and fuzzy and not filled in,” he said. His theory: Your mind’s eye might be lazier than you think. But that’s not necessarily a bad thing. “You leave things out until you need them.”
For the latest installment of “One Word Answer,” we asked Ullman to elaborate further on the current scientific thinking behind “imagination.”
Some people very much associate imagination with the visual or the sensory: I’m creating a scene in my head that isn’t real right now.
But there’s a different view of the imagination that’s more conceptual. When I’m imagining, say, three leprechauns jumping on a trampoline, I might not create the image in my head, but I understand the sentence. I’m creating a thought that I know is not true, but I’m considering it anyway. In that sense, imagination is very related to ideas of pretense and play.
The argument about the role of imagery in imagination is a very old one. As far back as Plato — though he wasn’t the first to make this argument — some people have argued that imagination is like stamping a piece of clay. We perceive something and stamp the image into our minds, and we can then call up that image later, basically just as we originally saw it. While that’s an outdated way of thinking about things, it’s surprising how much parts of it have held up.
I take inspiration from a notion in computer science called lazy evaluation — lazy, not in a negative sense, but more in terms of economy of resources. I can conceptualize the calculation 17 times 63 in my head, and if you need me to figure it out, I can, but as long as you don’t need me to evaluate it, I won’t. It’s very context-specific. The same is true for our imaginations. For example, I ask you to imagine a strawberry. What color was it? You’ll probably tell me it was red. But did you really see the red strawberry? Maybe not immediately — you hadn’t gotten to that part of the rendering yet. You just know that strawberries are red.
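Here is a minimal sketch of that lazy-evaluation idea in Python (the toy Lazy class and its names are illustrative, not taken from any particular library): the product 17 times 63 is described up front but only computed when something actually demands the answer.

```python
# Illustrative sketch of lazy evaluation: the computation is wrapped in a
# "thunk" and only runs the first time its value is actually demanded.
class Lazy:
    def __init__(self, compute):
        self._compute = compute   # deferred computation, not yet run
        self._value = None
        self._done = False

    def force(self):
        # Evaluate on first demand, then cache the result for later calls.
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value

product = Lazy(lambda: 17 * 63)   # nothing has been calculated yet
# ... later, only if someone actually needs the answer:
print(product.force())            # 1071
```

In Ullman's analogy, the strawberry's redness is like product.force(): the detail is implicit in the description all along, but it isn't worked out until you ask for it.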
It helps to think about people who create video games or animated films. They create in a very hierarchical way. Say the scene is going to be a hedgehog rolling down a hill. They don’t start with coloring in the individual pixels of the hedgehog; they start with the spatial placements and the movement of the objects. Then at the end they can render it and fill in the pixels. And they can render it multiple times, changing the colors and so forth as desired, all built off the same base model.
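The same deferred-detail idea can be sketched as a toy scene description in Python; the hedgehog scene and the palettes below are invented purely for illustration. Spatial structure is fixed first, and colors are attached only when the scene is "rendered."

```python
# Toy scene in the spirit of the hedgehog example: geometry and motion are
# specified first; colors are late-bound details filled in at render time.
scene = {
    "hill":     {"position": (0, 0), "shape": "slope"},
    "hedgehog": {"position": (0, 5), "shape": "ball", "motion": "roll down hill"},
}

def render(scene, palette):
    # "Rendering" here just means attaching the late-bound details (colors)
    # to an already-built spatial model.
    return {name: {**obj, "color": palette.get(name, "unspecified")}
            for name, obj in scene.items()}

# The same base model can be rendered repeatedly with different palettes.
print(render(scene, {"hedgehog": "brown", "hill": "green"}))
print(render(scene, {"hedgehog": "blue"}))
```

Re-rendering with a different palette changes the look without touching the underlying spatial model, which is the point of building hierarchically.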
The debate around what happens in your head when you imagine stuff is very hot right now when it comes to aphantasia, a neurological difference where people don’t seem to experience visual images. It took a while for researchers to agree that this was a real thing, but it is.
You might imagine that aphantasia presents a challenge for some cognitive activities, but it turns out that people with aphantasia are perfectly able to complete any number of cognitive tasks that seem very visual, like rotating an object 180 degrees in their mind or counting the number of windows in their house.
So how can that be? Halely Balaban and I suggest that aphantasia is a broken rendering operation — the underlying scene is there, but the last-mile rendering isn’t happening.
Let’s play out one of those examples. If I ask you to count the windows in your house, you’ll likely see, in your mind’s eye, a kind of video-game visualization of your home, and you’ll pan the camera around and count. Many people report some version of that. But when you ask people with aphantasia that question, they pause for a bit, and you ask, “What are you doing?”
“I’m counting the windows.”
“Do you see anything?”
“No, just give me a second.”
The task of counting the windows doesn’t require visualizing. It’s actually us normies without aphantasia who are fooling ourselves. We see these images and get very taken in by them. We think they play a causal role in our thinking, but in fact, it’s the nonvisual framework underneath the image that does the real work.
Terrestrial data centers are so 2025. We’re taking our large-scale compute infrastructure into orbit, baby! Or at least, that’s what Big Tech is yelling from the rooftops at the moment. It’s quite a bonkers idea that’s hoovering up money and mindspace, so let’s unpack what it’s all about – and whether it’s even grounded in reality.
Let’s start with the basics. You might already know that a data center is essentially a large warehouse filled with thousands of servers that run 24/7.
AI companies like Anthropic, OpenAI, and Google use data centers in two main ways:
Training AI models – This is incredibly compute-intensive. Training a model like the ones powering OpenAI’s ChatGPT or Anthropic’s Claude required running calculations across thousands of specialized chips (GPUs) simultaneously for weeks or months.
Running AI services – When you converse with those models’ chatbots, your messages go to a data center where servers process them and send back the model’s response. Multiply that by millions of users having conversations simultaneously, and you need enormous computing power ready on demand.
AI companies need data centers because they provide the coordinated power of thousands of machines working in tandem on these functions, plus the infrastructure to keep them running reliably around the clock.
To that end, these facilities are always online with ultra-fast internet connections, and they have vast cooling systems to keep those servers running at peak performance levels. All this requires a lot of power, which puts a strain on the grid and squeezes local resources.
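To picture the second of those two functions, here is a hypothetical sketch of that round trip in Python. The endpoint URL, model name, and response shape are placeholders invented for illustration, not any provider's real API.

```python
# Hypothetical request/response loop: the client sends a chat message to an
# inference endpoint hosted in a data center and receives the model's reply.
import requests

resp = requests.post(
    "https://api.example-ai-provider.com/v1/chat",   # placeholder endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "example-chat-model",                # placeholder model name
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=30,
)
print(resp.json())  # the servers in the data center did the heavy lifting
```

Every one of those calls lands on racks of GPU servers, which is why serving millions of simultaneous conversations requires the scale of infrastructure described above.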
So what’s this noise about data centers in space? The idea’s been bandied about for a while now as a vastly better alternative that can harness infinitely abundant solar energy and radiative cooling hundreds of miles above the ground in low Earth orbit.
Powerful GPU-equipped servers would be contained in satellites, and they’d move through space together in constellations, beaming data back and forth as they travel around the Earth from pole to pole in sun-synchronous orbit.
The thinking behind space data centers is that they’ll allow operators to scale up compute resources far more easily than on Earth. Up there, operators face none of the constraints around readily available power, real estate, and the fresh water supplies needed for cooling.
Starcloud has already begun running and training a large language model in space, so it can speak Shakespearean English
According to Musk’s calculations, it should be possible to increase the number of rocket launches and the data center satellites they can carry. “There is a path to launching 1 TW/year (1 terawatt of compute power per year) from Earth,” he noted in a memo, adding that within the next three years, AI compute resources will be cheaper to generate in space than on the ground.
As Elon Musk’s SpaceX and xAI are set to merge, he’s keen on increasing the number of satellite-carrying rocket launches per year to serve the need for space data centers
You’ve also got to dissipate heat from the space-based data centers, and have astronauts maintain them periodically. And that’s to say nothing of how these satellites will affect the work of astronomers or potentially increase light pollution.
Ultimately, there’s a lot of experimentation and learning to be gleaned from these early efforts to build out compute resources in space before any company or national agency can realistically scale them up.
And while it might eventually become possible to do so despite substantial difficulties, it’s worth asking ourselves whether AI is actually on track to benefit humanity in all the ways we’ve been promised, and whether we need to continually build out infrastructure for it – whether on the ground or way up beyond the atmosphere.
Smell has a powerful connection to memory because the brain’s olfactory system is directly linked to regions responsible for learning and emotion. Research from UC Irvine showed that older adults who inhaled natural scents like rose, orange, or lavender during sleep experienced a 226% improvement in memory and learning tests. Participants were exposed to different fragrances nightly over several months, leading to stronger neural pathways involved in cognition. The findings suggest that scent stimulation during sleep can enhance brain plasticity without drugs or invasive treatments. This highlights a simple, non-invasive method with strong potential to support memory and reduce cognitive decline.
AI promises to make work more productive for lawyers, but there’s a problem: Their clients are using it, too.
Why it matters: The rise of AI is creating new headaches for attorneys: They’re worried about the fate of the billable hour, a reliable profit center for aeons, and are perturbed by clients getting bad legal advice from chatbots.
Zoom in: “It’s like the WebMD effect on steroids,” says Dave Jochnowitz, a partner at the law firm Outten & Golden, referring to how medical websites can give people a misguided understanding of their condition.
“ChatGPT is telling them ‘you got a killer case,’” Jochnowitz says.
But the models don’t understand the full context, the actual laws that apply or the full history of certain types of cases. They often give clients a false impression of what’s possible, lawyers tell Axios.
The disconnect “makes people not believe what you’re saying,” says S. Randall Hood, a personal injury lawyer and founder of McGowan, Hood, Felder & Phillips in South Carolina.
The big picture: AI is shaking up the legal world, but it’s still early stages.
Firms are just beginning to use AI to more efficiently sort through and draft documents — the latest evolution in legal technology that’s revolutionized the practice going back decades.
Lawyers are also learning quickly — after many were sanctioned by courts for filing AI-written documents that contained false information, the practice has died down.
“There’s been a lot of experimentation,” particularly around document work, says Aubrey Bishai, chief innovation officer at Vinson & Elkins.
“Staying up all night and copying provisions into a master Excel doesn’t need to happen anymore.”
Bishai says she hasn’t noticed anyone working less, however, just switching their focus to more substantive issues.
State of play: The potential impact of AI on the industry was evident recently when shares of companies that sell legal software, like LegalZoom.com and Thomson Reuters, fell sharply after AI company Anthropic released a new legal product.
Reality check: The market’s reaction was probably a bit over the top, says Elliott Rush, a law professor and economist at ETH Zurich who studies AI.
Anthropic’s new product is good for document analysis, he says, but it’s not a substitute for Westlaw or LexisNexis — it’s not connected to databases of case law and statutes.
“There isn’t yet a new product that’s all of a sudden going to replace a lot of legal software.”
Friction point: As AI makes much of the work more productive, a big source of worry for firm lawyers is the fate of the billable hour.
If tech speeds up tasks, that would mean fewer hours worked. Lawyers can’t bill for more time than they actually spend on a task, the American Bar Association said in recent AI guidance.
Reality check: Don’t expect the legal world to be disrupted quickly. Adoption will take time, observers say.
“There are going to be some big claims about what can be done. A lot of them will turn out to be vaporware,” says J.H. “Rip” Verkerke, a law professor at the University of Virginia. “That’s what a lot of firms have discovered.”