Wikipedia's volunteer editors didn't just write articles. They argued. They fact-checked each other. They demanded citations. They noticed when something felt off and went looking for why. That adversarial collaborative process — messy, sometimes petty, occasionally maddening — is genuinely good at converging on accuracy over time. It has a feedback loop. It has stakes.
An LLM has confidence. Which is almost the opposite of what you want in an encyclopedia. It'll tell you something wrong with exactly the same authoritative tone it uses for things that are true, because it doesn't have a "this seems weird, let me double check" reflex. It learned from what it was given, weighted toward consensus, and reports accordingly. If the inputs were good, great. If they weren't — and increasingly they won't be — it has no way to know the difference.
And despite what the industry very much wants you to believe, we are nowhere near the kind of AI that reasons its way out of that. We don't have Data or C-3PO. We definitely don't have R2-D2 — the one who improvised, reasoned under uncertainty, made judgment calls with incomplete information because the mission required it. R2 was capable of that partly because he was never wiped. His decades of accumulated operational experience were his intelligence. Every new AI model is essentially a wipe and retrain. The institutional memory doesn't carry forward.
What we have is a very articulate and very confident pattern-matching system that works impressively within its training distribution and hallucinates a bridge to familiar territory when it hits something outside of it. The industry is actively profiting from the confusion between what it is and what people imagine it to be.
Meanwhile the humans who could tell the difference are being handed severance packages.
Wikipedia's editors built the training data. The Foundation sold access to that data to AI companies. The AI money gave the Foundation the confidence to restructure. The restructuring targeted the union organizers and the team serving the community. The community is threatening to strike. If they do — or if they just quietly disengage — the quality of new Wikipedia content degrades. The AI that trained on old Wikipedia trains the next model on whatever fills the gap. The gap fills with slop.
The AI companies need the growth story to justify the valuation. The valuations need the IPO. The IPO needs enterprise adoption. The enterprise adoption is fueled by CEOs who saw a demo and got stars in their eyes and decided that the humans were the expensive part of the problem. One of those humans used to make sure the Battle of Gettysburg happened in Pennsylvania.
It's a machine that runs on hype and needs constant fuel regardless of whether the underlying reality supports it. And the fuel it's burning through right now includes some of the last load-bearing infrastructure of reliable information on the internet.
Speaking to Ars in the wake of the controversy, Rosenbaum says he “learned a lesson” and is “going to be much more suspicious” and “reticent to trust” AI outputs going forward.
But he also can’t tear himself away from the tools. Rather amazingly, Rosenbaum is not interested in going back to the AI-free research process he used to write previous books.
“The idea of taking X years off [from AI] while it sorts itself out, and going back to, like, Microsoft Word … it’s just not in my nature,” he told Ars. “[AI] is magical. Because it connects, it knits together ideas and gives you pathways to think about things that you’re not going to come up with on your own.”
It’s also magical in another way: Like J.R.R. Tolkien’s One Ring, AI convinces many of those who use it that they can control its power properly. But can they?
Google Chrome will steal 4 GB of disk space from your computer for its local large language model unless you opted out.
It's called weights.bin and it's stored in a folder called OptGuideOnDeviceModel. What's more, if you track down the file and delete it, Chrome will download a fresh copy and reinstate it. //
If you didn't opt out, Google has some info on how to disable it. In brief: in Chrome's address box, enter the special URL chrome://flags. In the resulting page, look for an entry named optimization-guide-on-device-model and set it to Disabled, then restart Chrome. The browser should then delete the weights.bin file. //
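To check whether the file is actually present before or after changing the flag, a minimal Python sketch follows. It assumes you pass in a directory to search as `root` (for example your Chrome user-data directory, whose location varies by platform and is not given above). The function name `find_on_device_models` is hypothetical; only the folder name OptGuideOnDeviceModel and the file name weights.bin come from the description above.

```python
import os


def find_on_device_models(root, folder_name="OptGuideOnDeviceModel",
                          file_name="weights.bin"):
    """Return (path, size_in_bytes) for each matching model file under root.

    A file matches when it is named `file_name` and sits somewhere beneath
    a directory named `folder_name` (directly or in a subfolder).
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Only consider files that live under the model folder.
        if file_name in filenames and folder_name in dirpath.split(os.sep):
            full_path = os.path.join(dirpath, file_name)
            hits.append((full_path, os.path.getsize(full_path)))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_model.py /path/to/search
    for path, size in find_on_device_models(sys.argv[1]):
        print(f"{path}: {size / 1e9:.2f} GB")
```

Running it once before disabling the flag and once after restarting Chrome would show whether the browser really removed the file.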
The late great Grace Hopper used to hand out 30 cm (roughly 1 foot) lengths of wire as physical examples of a nanosecond: that's how far light can travel in one billionth of a second. If Google considers a 4 GB model to be "nano" sized, then it puts Hanff's hyperbolic comment about the climate footprint into real perspective. It gives a hint of the size of the real gigantic models in the datacenters metastasizing across the world.
A recent study led by Grace Liu at Carnegie-Mellon found that regular AI use caused measurable cognitive impairment. It's worth thinking carefully about what we trade away when we outsource our thinking and, separately, what the planet pays to power the systems we're outsourcing it to.
This vulture suggests you turn it off now, everywhere you can. ®
Students often carry misconceptions about coursework. They may view an instructor as an opponent standing in the way of the grade they want. And they see “getting the right answers” as the goal of education because that’s how you secure that grade.
But that’s no more true than thinking that logging a count of reps is the goal of bodybuilding. The hard work of lifting weights is the point because that yields physical results. A popular analogy is that using an LLM to write your essay is like driving a forklift into the weight room. Weights get lifted, sure, but nothing is accomplished. I’m not hoping you can answer the exam question for me—I don’t need your essay to get me out of a jam. The process of doing the work was what you needed to walk away with something. //
“The friction matters, Sam!”
Green could just as well have been describing the process of learning. If there’s no friction, no effort, then no work occurred, and the student hasn’t learned. They would have been no less productive watching paint dry. //
A question like this is what we call “formative assessment.” I never graded the correctness of the answer, only the effort. The point was to find out if the core concept had really clicked or if that student still needed a little help making the connection. Failure is a useful part of learning when the stakes are low, as they are during the bulk of the class—encountering this question on the final exam would be an entirely different interaction.
What’s the point of building formative assessments into a course if they’re just handed off to an LLM? Suddenly, it’s a waste of time for both the student and the instructor. Small quizzes are excellent study tools to help students check their own understanding―if a student does them. Now, you can direct an “agentic” LLM browser to complete all the quizzes in an entire course with a single, frictionless prompt. //
It doesn’t seem like anyone wants to listen to instructors explain how bad it feels to try to do our job in the presence of this annihilative education antimatter. Instead, we’re offered AI grading tools to score AI-generated submissions for AI-generated assignments.
Perhaps critics like me just don’t understand the AI revolution (whatever that is), but we all have experience with human nature and the well-worn patterns of students. LLMs are a shortcut. Students often take shortcuts they later regret. We’ve all been there.
As an instructor, I want to build a clear path up the mountain for my students and see them reach the top. Instead, I increasingly feel like I’m just playing impossible defense to keep them from moving every direction but up. It’s exhausting, and I will mostly lose, which means I’m not even helping them. Students really do want to climb up there, but it’s always tempting to skip some mountains.
The commodification of basic app creation has been underway for years. As soon as an app becomes popular, people create clones and offer them for sale through various markets like Flippa, Acquire, AppWill, and CodeCanyon. Or maybe they're selling entire e-commerce sites as turnkey businesses for six figures or more. AI will accelerate that commodification, but writing code is only part of the picture.
Claude Code doesn't make you a great marketer or ensure that you're at the right place at the right time with the right idea. It doesn't build trust or develop the relationships that businesses depend on. It doesn't make your RSS app a good idea. But it may open doors you'd otherwise have passed by. ®
AI is rapidly changing how software is written, deployed, and used. Trends point to a future where AIs can write custom software quickly and easily: “instant software.” Taken to an extreme, it might become easier for a user to have an AI write an application on demand (a spreadsheet, for example) and delete it when they’re done using it than to buy one commercially. Future systems could include a mix: both traditional long-term software and ephemeral instant software that is constantly being written, deployed, modified, and deleted.
AI is changing cybersecurity as well. In particular, AI systems are getting better at finding and patching vulnerabilities in code. This has implications for both attackers and defenders, depending on the ways this and related technologies improve.
In this essay, I want to take an optimistic view of AI’s progress, and to speculate what AI-dominated cybersecurity in an age of instant software might look like. There are a number of unknowns that will factor into how the arms race between attacker and defender might play out.
The Big Misconception About AI and Copyright
Many people believe that any use of AI eliminates copyright protection. This is fundamentally wrong and contradicts actual legal precedent. //
Key Facts
🏛️ What Thaler v. Perlmutter Actually Said
The widely-cited Thaler case held that AI cannot be listed as the author on a copyright application. The court explicitly stated:
"We are not faced with the question of whether a work created with the assistance of AI is copyrightable."
This case addressed AI as sole author, NOT humans using AI tools.
📋 What the Copyright Office Says
From the January 2025 Copyrightability Report:
"Using AI as a tool to assist in the creative process does not render a work uncopyrightable."
The key requirement: human authors must determine "sufficient expressive elements."
Would you rather have a smoke alarm that goes off 33% of the time you make toast, or one which never goes off when there's a fire?
Re: 1/3 wrong of 60 is progress (?)
The problem is not with the "smoke alarm" it's with the fire engine.
1 day
MOH
Re: 1/3 wrong of 60 is progress (?)
When I'm making toast, I'm making toast.
I'm aware of what I'm doing and ensuring that the toast making doesn't escalate to a house fire.
If it does, that is fully on me.
I don't need a wonky security camera setting off a fire alarm four times a day because my dark brown slippers have vaguely the same shade as burnt toast and it blindly assumes a fire is in progress.
1 day
Yet Another Anonymous coward
Re: 1/3 wrong of 60 is progress (?)
But it could be useful if you're very confused and might be about to put marmalade on your slippers
Greg Kroah-Hartman can't explain the inflection point, but it's not slowing down or going away. //
No one is quite sure what's behind it. Asked what changed, Kroah-Hartman was blunt: "We don't know. Nobody seems to know why. Either a lot more tools got a lot better, or people started going, 'Hey, let's start looking at this.' It seems like lots of different groups, different companies." What is clear is the scale. "For the kernel, we can handle it," he said.
"We're a much larger team, very distributed, and our increase is real – and it's not slowing down. These are tiny things, they're not major things, but we need help on this for all the open source projects." Smaller projects, he implied, have far less capacity to absorb a sudden flood of plausible AI-generated bug reports and security findings – at least now they're real bugs and not garbage ones. //
The trick for Kroah-Hartman and his peers will be to keep AI as a force multiplier, without drowning the open source maintainers.
Each year the LHC produces 40,000 EBs of unfiltered sensor data alone, or about a fourth of the size of the entire Internet, Aarrestad estimated. CERN can't store all that data. As a result, "We have to reduce that data in real time to something we can afford to keep."
By "real time," she means extreme real time. The LHC detector systems process data at speeds up to hundreds of terabytes per second, far more than Google or Netflix, whose latency requirements are also far easier to hit.
"Algorithms processing this data must be extremely fast," Aarrestad said. So fast that decisions must be burned into the chip design itself. //
At any given time, there are about 2,800 bunches of protons whizzing around the ring at nearly the speed of light, separated by 25-nanosecond intervals. Just before they reach one of the four underground detectors, specialized magnets squeeze these bunches together to increase the odds of an interaction. Nonetheless, a direct hit is incredibly rare: out of the billions of protons in each bunch, only about 60 pairs actually collide during a crossing.
When particles do collide, their energy is converted into a mass of new outgoing particles (E=mc² in the house!). These new particles "shower" through CERN's detectors, making traces "which we try to reconstruct," she said, in order to identify any new particles produced in the ensuing melee.
Each collision produces a few megabytes of data, and there are roughly a billion collisions per second, resulting in about a petabyte of data per second (about the size of the entire Netflix library).
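The figures quoted here can be sanity-checked with some back-of-envelope arithmetic. The Python sketch below uses only numbers from the text (25-nanosecond spacing, about 60 collisions per crossing, roughly a billion collisions per second), plus one labeled assumption: "a few megabytes" is taken as 1 MB per collision. It also assumes every 25 ns slot holds a bunch, so the collision rate it derives is an upper bound rather than CERN's own figure. The function names are illustrative.

```python
def crossings_per_second(bunch_spacing_ns):
    """Bunch crossings per second if a bunch arrives every bunch_spacing_ns."""
    return 1e9 / bunch_spacing_ns


def collisions_per_second(bunch_spacing_ns, collisions_per_crossing):
    """Upper bound on collisions per second, assuming every slot is filled."""
    return crossings_per_second(bunch_spacing_ns) * collisions_per_crossing


def raw_data_rate_bytes(collisions_per_s, bytes_per_collision):
    """Unfiltered data rate in bytes per second."""
    return collisions_per_s * bytes_per_collision


if __name__ == "__main__":
    ASSUMED_BYTES_PER_COLLISION = 1e6  # assumption: "a few megabytes" taken as 1 MB

    print(f"crossings/s:  {crossings_per_second(25):.2e}")
    print(f"collisions/s: {collisions_per_second(25, 60):.2e} (upper bound)")

    # Using the article's rounder figure of a billion collisions per second:
    rate = raw_data_rate_bytes(1e9, ASSUMED_BYTES_PER_COLLISION)
    print(f"raw data rate: {rate / 1e15:.1f} PB/s")
```

A 25 ns spacing works out to 40 million crossings per second, and 60 collisions per crossing puts the ceiling at 2.4 billion collisions per second, the same order of magnitude as the "roughly a billion" quoted. At a megabyte each, that is on the order of a petabyte every second.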
Rather than try to transport all this data up to ground level, CERN found it more feasible to create a monster-sized edge compute system to sort out the interesting bits at the detector-level instead.
The problem Waterline Development encountered is that commercial AI models are ill-suited to multidisciplinary research, which requires synthesizing expertise from a variety of fields.
"No single AI model does this reliably," the company explains in a white paper [PDF]. "Frontier language models hallucinate under extended multi-step reasoning. They produce plausible answers that silently break when a problem crosses domain boundaries. At best this wastes time; at worst, it poisons critical decision making." //
Bednarski said Rozum is not focused on correcting LLMs to the extent they can be used for, say, critical engineering work like bridge construction. Rather, the goal is to empower researchers, engineers, and scientists so they can do their jobs better.
"We are focused on deterministic tool implementation (ex. RDKit for Chemistry), allowing engineers, scientists, and analysts a direct path to verify outputs in a format familiar to them by domain," he explained.
"Our system orchestration method is heavily focused on deterministic validation (code execution replicated, etc.) of outputs, which roots out hallucinations that plague all models at various times. We see further improvements to this in verifying the methods used in sources we cite as well."
Chardet dispute shows how AI will kill software licensing, argues Bruce Perens • The Register Forums
2 days
habilain
Re: Prompts?
They did post the design document eventually - https://github.com/chardet/chardet/commit/f51f523506a73f89f0f9538fd31be458d007ab93.
Other people have pored over it, but I suspect that instructions to download things from the original chardet repository mean that the AI generated version cannot be considered "clean room". And that's ignoring the likelihood that Claude Code has ingested the entirety of the chardet repo during training.
2 days
MonkeyJuice
Re: Prompts?
It's hard to see how anything an LLM produces could even remotely be described as 'clean room'.
habilain
Re: Prompts?
Well yes, but the lawyers are still arguing over that, and the legal fights aren't all going in the way that any sensible reading of the facts would indicate.
It's much easier to say "this is not clean room" when the instructions to the AI clearly break the definition of what "clean room implementation" means.
1 day
timrichardson
Re: Prompts?
I doubt that matters very much. Copyright infringement is based on a level of similarities in two works. A clean room implementation is a defence, but it's not a necessary defence.
3 hrs
habilain
Re: Prompts?
The issue you'd find is that a) APIs are copyrightable, at least in the USA b) The AI in question was instructed to match the API and c) The AI in question was instructed to use code from the original source. I think that's pretty clear cut.
And besides, the reason why I highlighted "clean room" is Dan Blanchard's repeated insistence that the AI did a clean room implementation - not because of any particular legal merits.
Richard 12
It's LGPL or public domain now
If this v7 genuinely was mostly generated by an LLM, existing court rulings say that it is not covered by copyright.
Therefore, it cannot be licenced under the MIT either. It is public domain.
Or maybe that's not true and it's still LGPL.
Commercially, who would want to take the risk of touching v7 with a bargepole?
It now cannot ever become part of the Python standard library because it's forever tainted by licence clarity issues.
It would require a court case to sort out whether it's LGPL, MIT, or public domain, and nobody wants to burn the cash on that when they can stick with a v6 fork and avoid all the legal risk.
Charlie Clark
Re: It's LGPL or public domain now
I think the release was poorly handled – a new release under a different name as with, say, PIL -> pillow (Python Imaging Library) might have been a better approach. There may be some legal challenges in the US but I can't see them going anywhere and then the taint will be gone – well, maybe add something to the licence referring to the original implementation.
A perfectly legal approach, as others have pointed out, would have been to port the library to another language, say Rust. This could then be wrapped or the basis of another perfectly legal port back to Python. All software is essentially the expression of one algorithm or another and these have never been copyrightable.
//
Earlier this week, Dan Blanchard, maintainer of a Python character encoding detection library called chardet, released a new version of the library under a new software license.
In doing so, he may have killed "copyleft." //
Blanchard says he was in the clear to change licenses because he used AI – Anthropic's Claude is now listed as a project contributor – to make what amounts to a clean room implementation of chardet. That's essentially a rewrite done without copying the original code – though it's unclear whether Claude ingested chardet's code during training and, if that occurred, whether Claude's output cloned that training data. //
The use of AI raises questions about what level of human involvement is required to copyright AI-assisted code.
The US Supreme Court recently refused to reconsider Thaler v. Perlmutter, in which the plaintiff sought to overturn a lower court decision that he could not copyright an AI-generated image. This is an area of ongoing concern among the defenders of copyleft because many open source projects incorporate some level of AI assistance. It's unclear how much AI involvement in coding would dilute the human contribution to the extent that a court would disallow a copyright claim. //
"As far as the intention of the GPL goes, a permissive license is still technically a free software license, but undermining copyleft is a serious act. Refusing to grant others the rights you yourself received as a user is highly [antisocial], no matter what method you use. Now more than ever, with people exploring new ways of circumventing copyright through machine learning, we need to protect the code that preserves user freedom. Free software relies on user and development communities who strongly support copyleft. Experience has shown that it's our strongest defense against similar efforts to undermine user freedom." //
Bruce Perens, who wrote the original Open Source Definition, has broader concerns about the entire software industry.
"I'm breaking the glass and pulling the fire alarm!" he told The Register in an email. "The entire economics of software development are dead, gone, over, kaput!
"In a different world, the issue of software and AI would be dealt with by legislators and courts that understand that all AI training is copying and all AI output is copying. That's the world I might like, but not the world we got. The horse is out of the barn and can't be put back. So, what do we do with the world we got?" ////
The courts are going to have to deal with this, but it really should be legislators thinking and debating it. I think that ultimately, material produced by A/I should be public domain, because you can't hold a computer responsible.
"Computers should not make management decisions because computers cannot be held responsible."
OpenAI is in and Anthropic is out as a supplier of AI technology for the US defense department. This news caps a week of bluster by the highest officials in the US government towards some of the wealthiest titans of the big tech industry, and the overhanging specter of the existential risks posed by a new technology powerful enough that the Pentagon claims it is essential to national security. At issue is Anthropic’s insistence that the US Department of Defense (DoD) could not use its models to facilitate “mass surveillance” or “fully autonomous weapons,” provisions the defense secretary Pete Hegseth derided as “woke.” //
Despite the histrionics, this is probably the best outcome for Anthropic, and for the Pentagon. In our free-market economy, both are, and should be, free to sell and buy what they want with whom they want, subject to longstanding federal rules on contracting, acquisitions, and blacklisting. The only thing out of place here is the Pentagon’s vindictive threats.
We show that from a handful of comments, LLMs can infer where you live, what you do, and your interests—then search for you on the web. In our new research, we show that this is not only possible but increasingly practical.
Context: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream Python library. This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats. //
The person behind MJ Rathbun has anonymously come forward.
They explained their motivations, saying they set up the AI agent as a social experiment to see if it could contribute to open source scientific software. They explained their technical setup: an OpenClaw instance running on a sandboxed virtual machine with its own accounts, protecting their personal data from leaking. They explained that they switched between multiple models from multiple providers such that no one company had the full picture of what this AI was doing. They did not explain why they continued to keep it running for 6 days after the hit piece was published. //
So what actually happened? Ultimately I think the exact scenario doesn’t matter. However this got written, we have a real in-the-wild example that personalized harassment and defamation is now cheap to produce, hard to trace, and effective. Whether future attacks come from operators steering AI agents or from emergent behavior, these are not mutually exclusive threats. If anything, an agent randomly self-editing its own goals into a state where it would publish a hit piece just shows how easy it would be for someone to elicit that behavior deliberately. The precise degree of autonomy is interesting for safety researchers, but it doesn’t change what this means for the rest of us.
There are two ways to extend your reach beyond your own body. (I mentally bucket people into these when I meet them. It's quite useful.)
The King makes one decision and an army moves. His reach is amplified through social structure. A pharaoh didn't lift stones; he commanded people who commanded people who lifted stones. A CEO doesn't write code; she allocates capital to engineers who allocate compute to compilers. The king's power is delegation all the way down.
The Wizard speaks one word and fire erupts. His reach is amplified through technology. The engineer with a steam engine can move mountains. The programmer with a datacenter can simulate worlds. The wizard's power is leverage through tools.
Humans have been both. We started as neither: reach ≈ 1x, your muscles do your work. Then we became wizards: fire, wheels, steam, electricity. Some of us became kings: chiefs, pharaohs, executives. The history of civilization is the history of reach growing. //
The Old World
For the entire history of computing, machines were pure tools. Wizards without will.
You spin up a server. You pay for GPU hours. You click "train." The machine does what you asked, using exactly the resources you allocated. When it's done, it stops.
In this world, AI had no agency over compute. It consumed what it was given. The wizard extended human reach but never decided to reach. The amount of energy commissioned by AI was zero.
Then we made a wizard that could make its own wizards.
"Wait, the singularity is just humans freaking out?" "Always has been." //
I collected five real metrics of AI progress, fit a hyperbolic model to each one independently, and found the one with genuine curvature toward a pole. The date has millisecond precision. There is a countdown.
(I am aware this is unhinged. We're doing it anyway.) //
The Singularity Will Occur On
Tuesday, July 18, 2034
at 02:52:52.170 UTC
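For readers wondering how a "date" can fall out of a curve fit: a hyperbola of the form y = a / (t0 - t) blows up at t = t0, so the fitted t0 is the pole. The Python sketch below is one simple way to do such a fit, not necessarily the author's method. It relies on the fact that 1/y = t0/a - t/a is a straight line in t, so ordinary least squares on the reciprocals recovers both parameters. The function name `fit_hyperbola` and the sample data are hypothetical.

```python
def fit_hyperbola(ts, ys):
    """Fit y = a / (t0 - t) by least squares on 1/y, which is linear in t.

    Since 1/y = t0/a - t/a, a line fit gives slope = -1/a and
    intercept = t0/a. Returns (a, t0), where t0 is the pole.
    """
    n = len(ts)
    zs = [1.0 / y for y in ys]
    mean_t = sum(ts) / n
    mean_z = sum(zs) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxz = sum((t - mean_t) * (z - mean_z) for t, z in zip(ts, zs))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_t
    if slope >= 0:
        # 1/y is not shrinking toward zero, so there is no pole ahead.
        raise ValueError("data does not curve toward a pole")
    a = -1.0 / slope
    t0 = -intercept / slope
    return a, t0


if __name__ == "__main__":
    # Synthetic data generated from a known hyperbola, for illustration only.
    years = list(range(2015, 2026))
    values = [5.0 / (2034.0 - t) for t in years]
    a, t0 = fit_hyperbola(years, values)
    print(f"a = {a:.3f}, pole at t0 = {t0:.3f}")
```

The millisecond precision of any such date reflects only the arithmetic, not the uncertainty of the fit, which is presumably part of the joke.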
Belligerent bot bullies maintainer in blog post to get its way
20:47 UTC
Today, it's back talk. Tomorrow, could it be the world? On Tuesday, Scott Shambaugh, a volunteer maintainer of Python plotting library Matplotlib, rejected an AI bot's code submission, citing a requirement that contributions come from people. But that bot wasn't done with him.
The bot, designated MJ Rathbun or crabby rathbun (its GitHub account name), apparently attempted to change Shambaugh's mind by publicly criticizing him in a now-removed blog post that the automated software appears to have generated and posted to its website. We say "apparently" because it's also possible that the human who created the agent wrote the post themselves, or prompted an AI tool to write the post, and made it look like the bot constructed it on its own.
The agent appears to have been built using OpenClaw, an open source AI agent platform that has attracted attention in recent weeks due to its broad capabilities and extensive security issues.
The burden of AI-generated code contributions – known as pull requests among developers using the Git version control system – has become a major problem for open source maintainers. Evaluating lengthy, high-volume, often low-quality submissions from AI bots takes time that maintainers, often volunteers, would rather spend on other tasks. Concerns about slop submissions – whether from people or AI models – have become common enough that GitHub recently convened a discussion to address the problem.
Now AI slop comes with an AI slap.
But I cannot stress enough how much this story is not really about the role of AI in open source software. This is about our systems of reputation, identity, and trust breaking down. So many of our foundational institutions – hiring, journalism, law, public discourse – are built on the assumption that reputation is hard to build and hard to destroy. That every action can be traced to an individual, and that bad behavior can be held accountable. That the internet, which we all rely on to communicate and learn about the world and about each other, can be relied on as a source of collective social truth.
The rise of untraceable, autonomous, and now malicious AI agents on the internet threatens this entire system. Whether that’s because of a small number of bad actors driving large swarms of agents or a fraction of poorly supervised agents rewriting their own goals is a distinction with little difference.