There’s a particular kind of delusion that takes hold in tech booms. It starts with a true thing, wraps itself around a useful thing, and then inflates into a preposterous thing.
The true thing is that large language models can be useful. They can summarize, scaffold, autocomplete, translate from one notation to another, and generate a first draft faster than most humans can find the right folder. The useful thing is that software shops can squeeze real productivity out of that, especially for low-stakes code generation, boilerplate, tests, migrations, and documentation.
The preposterous thing is the belief that, because a machine can produce plausible-looking work, it can therefore be entrusted with responsibility.
That leap, from “useful tool” to “autonomous actor,” is where the trouble starts.
A lot of the current AI boom rests on a category error. The sales pitch says we are transcending programming, transcending explicit instructions, transcending all the fussy old machinery of classical software. But the most commercially successful AI use case in serious organizations is not “AI runs the business.” It is “AI writes Python.”
That’s not a post-software future. That is software with an unusually talkative preprocessor.
And that detail matters, because it explains the trust boundary. Businesses will tolerate AI when AI produces an artifact that can be inspected, tested, linted, profiled, audited, versioned, rolled back, and cursed at in a bug report. In other words, they trust AI most when AI produces something that can be converted back into an ordinary engineering discipline. The output is accepted because it can be pinned to the mat and examined under bright lights.
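That acceptance gate can be made concrete. The sketch below (illustrative names, not a real review pipeline) treats a generated patch as an ordinary artifact: it must parse, execute, and pass the team's own checks before it is accepted, so a locally plausible but globally wrong patch is simply rejected.

```python
import ast


def accept_generated_code(source: str, checks) -> bool:
    """Accept AI-generated code only if it survives ordinary
    engineering scrutiny. `checks` is a list of callables that
    receive the executed module namespace and raise AssertionError
    on failure."""
    try:
        ast.parse(source)  # must at least be syntactically valid
    except SyntaxError:
        return False
    namespace: dict = {}
    try:
        exec(compile(source, "<generated>", "exec"), namespace)
    except Exception:
        return False  # artifact rejected before it can do harm
    for check in checks:
        try:
            check(namespace)
        except AssertionError:
            return False
    return True


# A correct artifact and a plausible-looking but wrong one:
good = "def double(x):\n    return 2 * x\n"
bad = "def double(x):\n    return x ** 2\n"  # passes for x == 2 only


def double_check(ns):
    assert ns["double"](3) == 6


print(accept_generated_code(good, [double_check]))  # True
print(accept_generated_code(bad, [double_check]))   # False
```

The point is not the ten lines of plumbing; it is that the machine's output enters the organization only through a gate the humans already know how to operate.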
This is not a minor implementation detail. It is the whole game.
If an AI assistant proposes a patch that can be read, diffed, and rejected, the organization retains agency. If the AI acts directly, especially in a high-stakes workflow, the organization inherits all the risk and loses most of the explanatory power. A generated code artifact is a buffer between human judgment and machine improvisation. It is a legal buffer, an operational buffer, and maybe most importantly, a psychological buffer.
That psychological part is the one the industry keeps pretending is temporary. It isn’t.
People do not trust systems they cannot understand, validate, or troubleshoot. That doesn’t mean they need a transistor-level understanding of every machine they use. Nobody needs to derive TCP from first principles to send an email. But they do need the system to fit inside a model of cause and effect that they can live with. They need repeatability. They need known failure modes. They need some plausible answer to “why did it do that?” and “how do we stop it from doing it again?”
Current AI systems are bad at providing those answers. Worse, they are often bad in the most infuriating way possible: they produce polished nonsense. They don’t merely fail. They narrate the failure with confidence.
Anyone who has tried using an LLM to build a nontrivial software project knows the pattern. The first few thousand lines are exhilarating. The thing scaffolds routes, models, glue code, tests, and helper functions at a speed that feels like cheating. Then the codebase gets bigger. More state. More cross-cutting assumptions. More weird edge cases. More places where changing one thing subtly invalidates three other things. You point out a bug. The model churns through a heroic number of tokens, announces that it has identified the root cause, delivers a soothingly generic explanation, proposes a fix, and the fix does not work.
Now the human gets to do the real job. Two cups of coffee, tens of thousands of lines of code, and a growing suspicion that the machine’s contribution was to increase the entropy while sounding helpful.
This is not just a quality problem. It is a scaling problem. These systems are excellent at local plausibility and unreliable at global coherence. They are good at emitting code-shaped objects and much worse at maintaining a stable mental model of a large evolving system. They do not “carry” the architecture the way a competent engineer does. They reconstruct a temporary approximation from the available context window and then improvise over the gaps. On a toy project, this looks magical. On a real project, it looks like technical debt accelerated to machine speed.
That gap between local competence and global understanding is one source of mistrust. Another is liability.
If a human employee makes a catastrophic error, a company has a standard script. It can identify the root cause, discipline or remove the person responsible, hire outside experts, remediate the issue, settle with the client, and promise improved controls. The narrative may not be flattering, but it is legible. The company can tell a story in which bad judgment was isolated, corrected, and fenced off from recurrence.
If an AI system materially produced the same failure, that story collapses.
You cannot fire the model in the ordinary sense if it is already embedded into the workflow, and your staff has been reduced on the assumption that the machine can carry the load. You may not be able to identify a clean root cause at all. Was it the prompt? The retrieval context? A latent model behavior? An integration defect? A subtle mismatch between generated modules? A forgotten assumption in an earlier output that was later built upon by another machine-generated component? Once the organization has substituted AI for staff capacity, “just remove the source of the problem” becomes an implausible response.
And the client, naturally, does not care.
The client does not care whether the error came from a reckless developer, a negligent manager, an underbaked deployment, or an LLM that spun a yarn and called it architecture. The client cares that the system broke, the deadline slipped, the data was exposed, the revenue was interrupted, or the compliance problem has their lawyers billing by the hour. From the client’s perspective, “the AI did something unexpected” is not an explanation. It is a confession that the vendor no longer controls its own production process.
This is where AI enthusiasm collides with executive risk tolerance. Corporate leadership loves replacing salaries with software right up until software starts creating liabilities that cannot be pushed downward onto an employee, a contractor, or a scapegoat. AI is not a legal person. It cannot be negligent in a way that relieves the company. It cannot absorb blame. It cannot sit across from a customer and apologize convincingly. It cannot be cross-examined into restoring confidence. So liability moves uphill, toward the people who approved the deployment and dismantled the human systems that used to catch mistakes.
Leaders are many things, but brave about personal exposure is rarely one of them.
That matters because the economics of AI are often sold as though labor substitution were the main event. Replace a portion of the engineering staff with AI assistants, let the remaining humans supervise, and capture the savings. But once the generated code becomes harder to debug, the root causes harder to isolate, the client narratives harder to sustain, and the liability harder to distribute, those savings start looking suspiciously fictional. Not gone, necessarily, but offset by new costs that the pitch deck left out: senior engineer time, remediation consulting, legal review, customer trust erosion, vendor lock-in, and the need to reinsert humans into the loop as expensive janitors for machine-made complexity.
The more unsettling implication is that these may not be temporary defects. Some of this is not a bug in implementation but a property of the architecture. These systems work by modeling statistical relationships in data and generating outputs that fit those relationships. They are not performing causal reasoning in the way human beings flatter themselves into believing they do. They can produce excellent results and terrible explanations. They can produce poor results and elegant explanations. The explanation is often part of the performance, not a transparent window into a chain of thought.
That gets worse when people start fantasizing about multi-agent systems.
In the glossy version, multiple agents collaborate. One plans, one researches, one codes, one tests, one reports. In the ugly version, you have a set of opaque probabilistic machines passing compressed internal representations around in forms that no human can naturally inspect, validate, or interrogate. The fact that the top layer emits natural language should not fool anyone into thinking the system “thinks” in natural language. Underneath the friendly prose, these systems operate in spaces that are alien to human audit. They do not reason in English, or Russian, or Python. They project, compress, rank, and sample.
That may be perfectly adequate for some tasks. It is not an inspiring basis for delegated accountability.
Once you have multiple such systems overlapping, the organization’s ability to recover from failure drops further. Now you do not merely have a black box. You have a committee of black boxes, all confident, all productive-looking, and none with a comprehensible internal life. In a demo, this is “agents collaborating.” In an incident report, it is “we do not know exactly how the system arrived at the action that caused the harm.”
No general counsel wants to read that sentence. No customer wants to hear it. No insurer wants to underwrite it.
So what happens next is not the extinction of AI, nor its effortless takeover. The likely outcome is more boring and more consequential: AI gets forced into narrower product shapes.
It becomes an assistant, a copilot, a drafting engine, a code generator, a search-and-summarize layer, a constrained automation component. It lives behind approval gates. It produces artifacts for review. It is monitored, rate-limited, logged, red-teamed, and fenced off from unsupervised authority in domains where mistakes have expensive consequences. It is judged less by raw benchmark performance and more by its explainability, repeatability, and failure containment.
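The shape of such a constrained deployment can be sketched in a few lines. This is an assumption-laden toy, not a production control system: an AI component may only propose actions, every proposal is logged with a ticket number, and nothing executes until a human explicitly approves it.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-gate")


@dataclass
class ApprovalGate:
    """Fence an AI component off from unsupervised authority:
    proposals are queued and logged; execution requires sign-off."""
    pending: list = field(default_factory=list)
    executed: list = field(default_factory=list)

    def propose(self, description: str,
                action: Callable[[], object]) -> int:
        ticket = len(self.pending)
        self.pending.append((description, action))
        log.info("proposed #%d: %s", ticket, description)
        return ticket

    def approve(self, ticket: int) -> object:
        description, action = self.pending[ticket]
        log.info("approved #%d: %s", ticket, description)
        result = action()  # runs only after explicit human sign-off
        self.executed.append(ticket)
        return result


gate = ApprovalGate()
t = gate.propose("delete stale cache entries", lambda: "cache cleared")
# ...a human reviews the logged proposal, then:
print(gate.approve(t))  # "cache cleared"
```

The design choice is the asymmetry: the machine can be arbitrarily productive on the proposal side, but authority, and therefore liability, stays with whoever calls `approve`.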
This is bad news if you are selling artificial employees. It is decent news if you are selling tools.
The most durable AI businesses may not be the ones promising autonomous replacement, but the ones that make machine output easier to inspect, verify, trace, constrain, and roll back. In other words, the winners may be the firms that accept mistrust as rational and build around it, instead of treating it as a temporary marketing obstacle.
Because that mistrust is rational.
It is not irrational to hesitate before turning meaningful authority over to a system whose outputs are hard to predict, whose errors accumulate across complex projects, whose explanations are often synthetic, whose internal operations are opaque, and whose failures concentrate legal and operational risk on the humans who remain. It is not backward to want code you can inspect rather than decisions you cannot unpack. It is not Luddism to prefer a flawed employee you can discipline over an inscrutable machine you can only swap out at massive expense after it has already been woven into your process.
For years, the AI industry has promised to transcend programming. Instead, it has discovered that the safest place for AI in business is often one layer below trust, generating software that humans can still analyze and debug. That is not a trivial niche. It may be the real market. But it is a much smaller and much more constrained dream than the one advertised from conference stages.
The black box may be brilliant. It may even be right most of the time. But in a world where every serious decision carries rewards and penalties, being right is not enough. Someone has to be able to explain what happened, fix what broke, and stand in front of the client without sounding like they outsourced judgment to a haunted autocomplete engine.
That someone, for the foreseeable future, is still a human.
