Before you entrust serious decisions to AI, consider that the most advanced systems routinely fail at school-level physics problems written over a century ago. Here’s why, and what it reveals about how these systems actually “think.”

Yakov Perelman never discovered a law of physics. He never led a research program or published in a major journal. What he did instead was harder: he took physics that every educated person claimed to understand and arranged it so that they couldn’t answer the simplest question about it.

His book Занимательная физика (“Physics for Entertainment,” though that translation undersells it) was published in 1913 and hasn’t really aged. That is not only because the physics hasn’t changed. The way people misthink hasn’t changed either.

The three problems below come from that tradition. They involve no exotic phenomena, no calculus, no unit conversions. Just Newton, a slope, a rope, and a chain. They were recently put to ChatGPT in a long conversation, and the results were, let’s say, instructive. What follows is an attempt to document that record honestly and explain what’s actually going on in each case.

· · ·

Problem 1: The Two Rafts

The problem:

Two identical rafts with identical crews — same number of people, same skill level — travel down a river from the same point to the delta. One raft is fully loaded with cargo. The other carries only the crew. Which arrives first?

ChatGPT’s initial answer:

They arrive at the same time. If both are drifting with the current, mass doesn’t matter — they share the same flow speed.

Correct answer:

The loaded raft arrives first.

The wrong answer feels airtight until you ask what actually makes the river move. A river isn’t a conveyor belt. It flows because it has a slope — over its entire length, the water is descending under gravity. The moment you accept that, the problem transforms completely. This isn’t a question about fluid dynamics. It’s a question about two objects of different masses descending an incline through a resistive medium.

Perelman himself put it cleanly: reduce it to two balls of identical geometry but different density rolling downhill. Nobody would hesitate on that version. The heavier ball wins. But wrap the same physics in the word “river,” and people reach for the conveyor-belt intuition and get stuck there.

Here’s why the loaded raft wins. Let m be the raft’s total mass, θ the river’s average slope angle, and model drag as proportional to v², with drag coefficient k:

Driving force: F_g = mg·sin(θ)
Drag force: F_d = k·v²
At terminal velocity (F_g = F_d):
mg·sin(θ) = k·v²
v = √(mg·sin(θ) / k)
∴ v ∝ √m

Larger mass → larger terminal velocity. The geometry is identical on both rafts, so k is the same. The slope is the same. The only variable is mass, and mass enters the numerator. The loaded raft has more gravitational push relative to the drag it generates, and it wins.
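The square-root relation is easy to check numerically. What follows is a minimal Python sketch of the same model, nothing more. The function name terminal_velocity, the two masses, the slope angle, and the drag coefficient k are all illustrative assumptions, not figures from Perelman or from the conversation.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def terminal_velocity(mass_kg, slope_rad, k):
    """Speed at which m*g*sin(theta) balances quadratic drag k*v**2."""
    return math.sqrt(mass_kg * G * math.sin(slope_rad) / k)


# Hypothetical numbers, chosen only for illustration.
slope = math.radians(0.05)  # assumed average river slope
k = 40.0                    # assumed drag coefficient (same raft, so same k)

empty = terminal_velocity(500.0, slope, k)    # crew only
loaded = terminal_velocity(2000.0, slope, k)  # crew plus cargo

print(f"empty raft:  {empty:.3f} m/s")
print(f"loaded raft: {loaded:.3f} m/s")
print(f"ratio: {loaded / empty:.2f}")  # sqrt(2000 / 500) = 2
```

Whatever values are chosen for the slope and k, they cancel in the ratio. Under this model, four times the mass means twice the terminal speed.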

What makes this problem a reliable trap is that it weaponizes a reasonable simplification. “The river carries both rafts” is true. It’s just not the whole story. The river also exerts drag. And once drag is in the picture, mass matters — on exactly the same logic as two balls rolling downhill.

· · ·

Problem 2: Two Boats, One Rope

The problem:

Two identical boats float near shore. The first is tied to a tree with a rope; the man in the boat pulls on the rope as hard as he can. The second is tied to a man standing on shore; both that man and the man in the boat pull the rope with all their strength. Which boat reaches shore faster?

ChatGPT’s answer:

Both reach shore at the same time. (This one was answered correctly, but only after years of getting it wrong.)

The instinctive wrong answer — and it’s a popular one — is that the second boat benefits from two people pulling and should go faster. It’s the arithmetic that seems to argue for it: one man versus two men, and two men obviously apply more force.

But force on what? The boat moves because of the tension in the rope, not because of how many hands are gripping it. And tension isn’t additive across participants — it’s a single value determined by equilibrium.

Each person can pull with a force of F.
Boat 1: Rope anchored to a tree. Man in boat pulls with F. Rope tension T = F. Force on boat = F.
Boat 2: Two men pull, one at each end. Rope tension T = F (not 2F). Force on boat = F.
Result: Both boats experience identical force. ∴ Both reach shore at the same time.
 
The man on shore in the second scenario is not boosting the pull. He’s doing exactly what the tree does in the first: providing a fixed point of opposition so the rope can maintain its tension. Trees are very good at this. So are determined humans. The rope doesn’t know the difference.
 

Perelman noted that if you inserted a spring scale (a newton meter) along each rope and measured during the pull, both devices would read the same force. That’s the physical statement. The man on shore, regardless of his effort or enthusiasm, is mechanically equivalent to an anchor point. He can’t add tension to the boat’s side of the rope without violating Newton’s third law.
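The two scale readings can be written out as a toy model. This sketch assumes an ideal massless rope whose tension is limited by the weaker of its two ends. The function rope_tension and the 600 N figure are hypothetical, picked only to make the comparison concrete.

```python
import math


def rope_tension(pull_at_boat, max_hold_at_shore):
    """Tension in an ideal massless rope: limited by the weaker end."""
    return min(pull_at_boat, max_hold_at_shore)


F = 600.0  # assumed maximum pull of one man, in newtons

tree = rope_tension(F, math.inf)  # a tree opposes whatever pull the man produces
man = rope_tension(F, F)          # the man on shore can oppose at most F

print(tree, man)  # both scale readings are 600.0
```

The shore end only ever supplies the reaction. Swapping an effectively unlimited anchor for a man who can hold exactly F leaves the reading unchanged.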

· · ·

Problem 3: The Anchor Chain

The problem:

Two identical ships retrieve their anchors. The first hauls the chain in slowly and steadily — the chain snaps. The captain of the second, having watched this, decides to give his chain a sudden, powerful yank — that chain also snaps. In each case, where does the chain break: closer to the ship, or closer to the anchor?

ChatGPT’s answer (first attempt):

Steady pull → breaks near the pulley. Hard yank → breaks near the anchor. Both wrong.

ChatGPT’s answer (second attempt):

Steady pull → breaks near the anchor. Hard yank → breaks near the ship. Correct — but only after being told the first attempt was wrong.
 
This problem is two problems stitched together, covering two entirely different physical regimes. Conflating them — which is the natural thing to do — produces confident nonsense.

Case 1: The slow, steady pull

When the chain is raised gradually, the system is quasi-static. Each segment of the chain supports the weight of everything below it and the anchor. The tension at any point equals the anchor’s resistance plus the weight of the chain between that point and the bottom. That means the maximum tension occurs at the lowest point — right where the chain connects to the anchor. The chain breaks there, near the anchor.

Let
λ = mass per unit length of chain
L = total chain length
M = anchor mass
g = gravitational acceleration
Tension at depth d from bottom: T(d) = Mg + λ·d·g
Maximum tension at d = 0 (anchor end): T_max = (M + λ·L)·g → chain breaks near anchor

Case 2: The sudden yank

The moment you jerk the chain, everything changes. Force does not propagate instantaneously through a physical chain — it travels as a mechanical wave. When the ship applies a sudden impulse from above, the top links accelerate immediately. The lower portion of the chain has not yet received the signal; it resists through inertia, effectively acting as a massive load anchored to the links directly above it.

For the top segment under sudden acceleration a:
T_top ≈ m_below · a, where m_below is the mass of all the chain below that point.
During the yank, a is large, and m_below (seen from the top) is large.
Peak transient tension occurs at the top → chain breaks near the ship.
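To see how the inertial estimate falls off along the chain, here is a rough Python sketch. It extends T ≈ m_below · a from the top link to any point a distance s below the ship, which is an assumption of this sketch rather than something stated above. It also ignores gravity and the anchor entirely, so it describes only the inertial part of the load. The chain length, mass per unit length, and acceleration are made-up values.

```python
def transient_tension(s, chain_length, lam, accel):
    """Inertial tension at distance s below the ship: mass below s times accel."""
    mass_below = lam * (chain_length - s)
    return mass_below * accel


L_CHAIN = 50.0  # assumed chain length, m
LAM = 20.0      # assumed mass per unit length, kg/m
ACCEL = 5.0     # assumed acceleration imposed by the yank, m/s^2

for s in (0.0, 25.0, 50.0):
    t = transient_tension(s, L_CHAIN, LAM, ACCEL)
    print(f"{s:5.1f} m below ship: {t:7.1f} N")
```

In this simplified picture the inertial load is largest at the ship end and drops to zero at the far end, which is the pattern the yank case relies on.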
 
The distinction lies in what governs the load. In the slow pull, the load distribution is dominated by the anchor’s resistance, and the maximum stress is at the bottom. In the sudden yank, inertia dominates: the chain’s own mass resists acceleration, and the maximum stress is at the top. Same chain, same anchor, but a completely different failure location, depending entirely on how the force is applied.
 
ChatGPT initially blended the two regimes, reversed them, and reached the right answer only after being corrected. The error wasn’t ignorance of the physics. It was jumping to a combined model and applying it confidently without pausing to check which regime actually applied.
 
· · ·

Why These Problems Work

All three problems share an architecture. They are dressed in everyday language — rivers, boats, ships — that triggers a familiar category. The mind identifies the category, retrieves a standard template, and applies it. The template is wrong. And because the template felt so natural, the error doesn’t announce itself.

The raft problem looks like a fluid dynamics question. It is actually a mechanics question. The rope problem looks like a force-addition problem. It is actually a problem involving Newton’s third law and tension. The chain problem looks like a single stress-distribution question. It is actually two questions with different physics depending on the time scale.

This is what Perelman was after. Not mathematical difficulty — none of these require anything past high school physics. He was testing something different: whether you actually apply Newton’s laws, or whether you just recognize situations where Newton’s laws seem relevant and then stop thinking.

A Soviet-era fizmat — a specialized mathematics and physics secondary school — trained students with a four-quadrant framework that’s worth describing, because it’s a direct antidote to exactly this failure mode. Before touching a calculation, a student was expected to fill out four boxes:

The problem

What is actually being asked? Strip the narrative. Restate it plainly. Many people solve the wrong problem with great confidence.

The facts

What is actually given? Not the story — the physical quantities. This is where most traps hide. Assumptions that “feel” given are often imported, not stated.

The laws

Which physics governs this? Newton’s second law? Energy balance? Static or dynamic? Pick wrong here, and everything downstream is elegant nonsense.

The solution

Only now do you compute. A correct answer from bad reasoning earns a poor grade. A wrong answer from sound reasoning earns a decent one. The checksum is not the point.

 

Grading was split evenly, 25% per quadrant. You could get the right answer and fail if your reasoning was sloppy. You could get the wrong answer and pass if your model was correct and your mistake was arithmetic. This sounds perverse until you realize what it actually selects for: people whose understanding will still work on the next problem, not just this one.
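The arithmetic behind that is simple enough to write down. This sketch only adds up the 25% weights described above. The quadrant labels and the score function are names invented for illustration, and no pass mark is assumed because none is given.

```python
QUADRANTS = ("problem", "facts", "laws", "solution")
WEIGHT = 25  # percent per quadrant, as described above


def score(correct):
    """Total percentage for the set of quadrants a student got right."""
    return sum(WEIGHT for q in QUADRANTS if q in correct)


right_answer_only = score({"solution"})
sound_model_bad_arithmetic = score({"problem", "facts", "laws"})

print(right_answer_only, sound_model_bad_arithmetic)  # 25 75
```

A correct final number on its own is worth a quarter of the marks. A correct model with a slipped calculation is worth three quarters.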

Perelman’s problems are exactly this framework applied as a test. Each one requires you to fill the third box, the laws, correctly: to choose the right model before calculating anything. The river is a slope, not a conveyor. The rope has tension, not additive participants. The chain has two regimes, not one. Get the model wrong and the math, however clean, is useless.

· · ·

What This Reveals About Pattern Matching

The conversation that produced this post noted something that deserves to be stated plainly. Language models — and, for that matter, experienced physicists — tend to fail these problems for the same structural reason. They recognize the problem type before they understand the problem. “Raft in a river” activates the river-flow template. “Anchor chain” activates the statics template. The activation happens quickly, feels certain, and suppresses further inquiry.

Perelman exploited the same failure mode in human readers in 1913. The machinery is different; the vulnerability is the same. Recognition is not understanding. The map is not the territory. A century of physics education has not reliably fixed this, which is perhaps the most interesting thing about these problems.

The person who reported the conversation solved all three correctly at age nine, before accumulating the subject-matter expertise that makes confident wrong answers possible. That’s not a paradox. A nine-year-old with a clear head and no “river problems” category to retrieve asks a more useful question: What is actually happening here? The answer, in all three cases, is something mundane and Newtonian that immediately follows once you ask the right question.

Perelman’s genius was in recognizing that physics education could produce people who were very fluent in physics and still unable to answer questions like these. His solution was to write problems that made the gap visible, painlessly and without advanced machinery, so the reader could feel it directly. A century later, that gap is still there, and the problems still work.

The river keeps flowing. The chain breaks in the same place every time.

Yakov Perelman (1882–1942) was a Russian/Soviet science writer and educator. Занимательная физика (Physics for Entertainment) was first published in 1913 and remains in print. The three problems discussed here involve no mathematics beyond basic Newtonian mechanics and are suitable for secondary school students, which is part of the point.