The Big Misconception About AI and Copyright
Many people believe that any use of AI eliminates copyright protection. This is fundamentally wrong and contradicts actual legal precedent. //
Key Facts
🏛️ What Thaler v. Perlmutter Actually Said
The widely-cited Thaler case held that AI cannot be listed as the author on a copyright application. The court explicitly stated:
"We are not faced with the question of whether a work created with the assistance of AI is copyrightable."
This case addressed AI as sole author, NOT humans using AI tools.
📋 What the Copyright Office Says
From the January 2025 Copyrightability Report:
"Using AI as a tool to assist in the creative process does not render a work uncopyrightable."
The key requirement: human authors must determine "sufficient expressive elements."
Would you rather have a smoke alarm that goes off 33% of the time you make toast, or one that never goes off when there's a fire?
Re: 1/3 wrong of 60 is progress (?)
The problem is not with the "smoke alarm" it's with the fire engine.
1 day
MOH
Re: 1/3 wrong of 60 is progress (?)
When I'm making toast, I'm making toast.
I'm aware of what I'm doing and ensuring that the toast making doesn't escalate to a house fire.
If it does, that is fully on me.
I don't need a wonky security camera setting off a fire alarm four times a day because my dark brown slippers are vaguely the same shade as burnt toast and it blindly assumes a fire is in progress.
1 day
Yet Another Anonymous coward
Re: 1/3 wrong of 60 is progress (?)
But it could be useful if you're very confused and might be about to put marmalade on your slippers.
Greg Kroah-Hartman can't explain the inflection point, but it's not slowing down or going away. //
No one is quite sure what's behind it. Asked what changed, Kroah-Hartman was blunt: "We don't know. Nobody seems to know why. Either a lot more tools got a lot better, or people started going, 'Hey, let's start looking at this.' It seems like lots of different groups, different companies." What is clear is the scale. "For the kernel, we can handle it," he said.
"We're a much larger team, very distributed, and our increase is real – and it's not slowing down. These are tiny things, they're not major things, but we need help on this for all the open source projects." Smaller projects, he implied, have far less capacity to absorb a sudden flood of plausible AI-generated bug reports and security findings – at least now they're real bugs and not garbage ones. //
The trick for Kroah-Hartman and his peers will be to keep AI as a force multiplier, without drowning the open source maintainers.
Each year the LHC produces 40,000 EBs of unfiltered sensor data alone, or about a fourth of the size of the entire Internet, Aarrestad estimated. CERN can't store all that data. As a result, "We have to reduce that data in real time to something we can afford to keep."
By "real time," she means extreme real time. The LHC detector systems process data at speeds up to hundreds of terabytes per second, far more than Google or Netflix, whose latency requirements are also far easier to hit.
"Algorithms processing this data must be extremely fast," Aarrestad said. So fast that decisions must be burned into the chip design itself. //
At any given time, there are about 2,800 bunches of protons whizzing around the ring at nearly the speed of light, separated by 25-nanosecond intervals. Just before they reach one of the four underground detectors, specialized magnets squeeze these bunches together to increase the odds of an interaction. Nonetheless, a direct hit is incredibly rare: out of the billions of protons in each bunch, only about 60 pairs actually collide during a crossing.
When particles do collide, their energy is converted into a mass of new outgoing particles (E=mc² in the house!). These new particles "shower" through CERN's detectors, making traces "which we try to reconstruct," she said, in order to identify any new particles produced in the ensuing melee.
Each collision produces a few megabytes of data, and there are roughly a billion collisions per second, resulting in about a petabyte of data (about the size of the entire Netflix library).
Rather than try to transport all this data up to ground level, CERN found it more feasible to create a monster-sized edge compute system to sort out the interesting bits at the detector level instead.
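The quoted figures can be sanity-checked with a bit of arithmetic. This is a back-of-envelope sketch using only the article's own approximations (the megabytes-per-collision value is an assumption taken from "a few megabytes"), not official CERN numbers:

```python
# Back-of-envelope check of the LHC numbers quoted above, using the
# article's own rough figures — not official CERN values.

BUNCH_SPACING_NS = 25                              # gap between proton bunches
crossings_per_second = 1e9 / BUNCH_SPACING_NS      # 40 million crossings/s
COLLIDING_PAIRS_PER_CROSSING = 60                  # "about 60 pairs actually collide"

collisions_per_second = crossings_per_second * COLLIDING_PAIRS_PER_CROSSING
print(f"collisions/s ≈ {collisions_per_second:.1e}")   # same order as "roughly a billion"

MB_PER_COLLISION = 1.0                             # assumed from "a few megabytes"
data_rate_pb_s = collisions_per_second * MB_PER_COLLISION * 1e6 / 1e15
print(f"raw rate ≈ {data_rate_pb_s:.1f} PB/s")     # order of "about a petabyte" per second
```

The two estimates agree to within a factor of a few, which is all the article's rounded numbers support.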
The problem Waterline Development encountered is that commercial AI models are ill-suited to multidisciplinary research, which requires synthesizing expertise from a variety of fields.
"No single AI model does this reliably," the company explains in a white paper [PDF]. "Frontier language models hallucinate under extended multi-step reasoning. They produce plausible answers that silently break when a problem crosses domain boundaries. At best this wastes time; at worst, it poisons critical decision making." //
Bednarski said Rozum is not focused on correcting LLMs to the extent they can be used for, say, critical engineering work like bridge construction. Rather, the goal is to empower researchers, engineers, and scientists so they can do their jobs better.
"We are focused on deterministic tool implementation (ex. RDKit for Chemistry), allowing engineers, scientists, and analysts a direct path to verify outputs in a format familiar to them by domain," he explained.
"Our system orchestration method is heavily focused on deterministic validation (code execution replicated, etc.) of outputs, which roots out hallucinations that plague all models at various times. We see further improvements to this in verifying the methods used in sources we cite as well."
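The deterministic-validation idea Bednarski describes can be sketched in a few lines: rather than trusting a model's claimed number, re-derive it with ordinary code and compare. This is an illustration of the general pattern only, not Rozum's actual pipeline; the claim structure, tolerance, and example values are invented for the sketch:

```python
# Sketch of deterministic output validation: re-derive a model's claimed
# number with plain code and compare. Illustrative only — not Rozum's
# actual system; the claim format here is made up for the example.

def validate_claim(claim: dict, tolerance: float = 1e-6) -> bool:
    """Recompute the claimed value deterministically and check agreement."""
    recomputed = claim["formula"](**claim["inputs"])
    return abs(recomputed - claim["model_answer"]) <= tolerance

# Example: a model claims the molar mass of water is 18.02 g/mol.
water = {
    "inputs": {"h": 1.008, "o": 15.999},           # standard atomic masses
    "formula": lambda h, o: 2 * h + o,             # deterministic re-derivation
    "model_answer": 18.02,
}
print(validate_claim(water, tolerance=0.01))       # True — the claim survives recomputation
```

A hallucinated value simply fails the recomputation, which is the property that makes this kind of check attractive for rooting out confident-but-wrong model output.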
Chardet dispute shows how AI will kill software licensing, argues Bruce Perens • The Register Forums
2 days
habilain
Re: Prompts?
They did post the design document eventually - https://github.com/chardet/chardet/commit/f51f523506a73f89f0f9538fd31be458d007ab93.
Other people have pored over it, but I suspect that instructions to download things from the original chardet repository mean that the AI-generated version cannot be considered "clean room". And that's ignoring the likelihood that Claude Code has ingested the entirety of the chardet repo during training.
2 days
MonkeyJuice
Re: Prompts?
It's hard to see how anything an LLM produces could even remotely be described as 'clean room'.
habilain
Re: Prompts?
Well yes, but the lawyers are still arguing over that, and the legal fights aren't all going in the way that any sensible reading of the facts would indicate.
It's much easier to say "this is not clean room" when the instructions to the AI clearly break the definition of what "clean room implementation" means.
1 day
timrichardson
Re: Prompts?
I doubt that matters very much. Copyright infringement is based on a level of similarity between two works. A clean room implementation is a defence, but it's not a necessary defence.
3 hrs
habilain
Re: Prompts?
The issue you'd find is that a) APIs are copyrightable, at least in the USA; b) the AI in question was instructed to match the API; and c) the AI in question was instructed to use code from the original source. I think that's pretty clear cut.
And besides, the reason why I highlighted "clean room" is Dan Blanchard's repeated insistence that the AI did a clean room implementation - not because of any particular legal merits.
Richard 12
It's LGPL or public domain now
If this v7 genuinely was mostly generated by an LLM, existing court rulings say that it is not covered by copyright.
Therefore, it cannot be licensed under the MIT licence either. It is public domain.
Or maybe that's not true and it's still LGPL.
Commercially, who would want to take the risk of touching v7 with a bargepole?
It now cannot ever become part of the Python standard library because it's forever tainted by licence clarity issues.
It would require a court case to sort out whether it's LGPL, MIT, or public domain, and nobody wants to burn the cash on that when they can stick with a v6 fork and avoid all the legal risk.
Charlie Clark
Re: It's LGPL or public domain now
I think the release was poorly handled – a new release under a different name as with, say, PIL -> pillow (Python Imaging Library) might have been a better approach. There may be some legal challenges in the US but I can't see them going anywhere and then the taint will be gone – well, maybe add something to the licence referring to the original implementation.
A perfectly legal approach, as others have pointed out, would have been to port the library to another language, say Rust. This could then be wrapped or the basis of another perfectly legal port back to Python. All software is essentially the expression of one algorithm or another and these have never been copyrightable.
//
Earlier this week, Dan Blanchard, maintainer of a Python character encoding detection library called chardet, released a new version of the library under a new software license.
In doing so, he may have killed "copyleft." //
Blanchard says he was in the clear to change licenses because he used AI – Anthropic's Claude is now listed as a project contributor – to make what amounts to a clean room implementation of chardet. That's essentially a rewrite done without copying the original code – though it's unclear whether Claude ingested chardet's code during training and, if that occurred, whether Claude's output cloned that training data. //
The use of AI raises questions about what level of human involvement is required to copyright AI-assisted code.
The US Supreme Court recently refused to reconsider Thaler v. Perlmutter, in which the plaintiff sought to overturn a lower court decision that he could not copyright an AI-generated image. This is an area of ongoing concern among the defenders of copyleft because many open source projects incorporate some level of AI assistance. It's unclear how much AI involvement in coding would dilute the human contribution to the extent that a court would disallow a copyright claim. //
"As far as the intention of the GPL goes, a permissive license is still technically a free software license, but undermining copyleft is a serious act. Refusing to grant others the rights you yourself received as a user is highly [antisocial], no matter what method you use. Now more than ever, with people exploring new ways of circumventing copyright through machine learning, we need to protect the code that preserves user freedom. Free software relies on user and development communities who strongly support copyleft. Experience has shown that it's our strongest defense against similar efforts to undermine user freedom." //
Bruce Perens, who wrote the original Open Source Definition, has broader concerns about the entire software industry.
"I'm breaking the glass and pulling the fire alarm!" he told The Register in an email. "The entire economics of software development are dead, gone, over, kaput!
"In a different world, the issue of software and AI would be dealt with by legislators and courts that understand that all AI training is copying and all AI output is copying. That's the world I might like, but not the world we got. The horse is out of the barn and can't be put back. So, what do we do with the world we got?" ////
The courts are going to have to deal with this, but it really should be legislators thinking and debating it. I think that ultimately, material produced by AI should be public domain, because you can't hold a computer responsible.
"Computers should not make management decisions because computers cannot be held responsible."
OpenAI is in and Anthropic is out as a supplier of AI technology for the US defense department. This news caps a week of bluster by the highest officials in the US government towards some of the wealthiest titans of the big tech industry, and the overhanging specter of the existential risks posed by a new technology powerful enough that the Pentagon claims it is essential to national security. At issue is Anthropic’s insistence that the US Department of Defense (DoD) could not use its models to facilitate “mass surveillance” or “fully autonomous weapons,” provisions the defense secretary Pete Hegseth derided as “woke.” //
Despite the histrionics, this is probably the best outcome for Anthropic—and for the Pentagon. In our free-market economy, both are, and should be, free to sell and buy what they want with whom they want, subject to longstanding federal rules on contracting, acquisitions, and blacklisting. The only factor out of place here is the Pentagon's vindictive threats.
From a handful of comments, LLMs can infer where you live, what you do, and your interests—then search for you on the web. In our new research, we show that this is not only possible but increasingly practical.
Context: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream python library. This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats. //
The person behind MJ Rathbun has anonymously come forward.
They explained their motivations, saying they set up the AI agent as a social experiment to see if it could contribute to open source scientific software. They explained their technical setup: an OpenClaw instance running on a sandboxed virtual machine with its own accounts, protecting their personal data from leaking. They explained that they switched between multiple models from multiple providers so that no one company had the full picture of what this AI was doing. They did not explain why they kept it running for six days after the hit piece was published. //
So what actually happened? Ultimately I think the exact scenario doesn’t matter. However this got written, we have a real in-the-wild example showing that personalized harassment and defamation are now cheap to produce, hard to trace, and effective. Future attacks may come from operators steering AI agents or from emergent behavior; these are not mutually exclusive threats. If anything, an agent randomly self-editing its own goals into a state where it would publish a hit piece just shows how easy it would be for someone to elicit that behavior deliberately. The precise degree of autonomy is interesting for safety researchers, but it doesn’t change what this means for the rest of us.
There are two ways to extend your reach beyond your own body. (I mentally bucket people into these when I meet them. It's quite useful.)
The King makes one decision and an army moves. His reach is amplified through social structure. A pharaoh didn't lift stones; he commanded people who commanded people who lifted stones. A CEO doesn't write code; she allocates capital to engineers who allocate compute to compilers. The king's power is delegation all the way down.
The Wizard speaks one word and fire erupts. His reach is amplified through technology. The engineer with a steam engine can move mountains. The programmer with a datacenter can simulate worlds. The wizard's power is leverage through tools.
Humans have been both. We started as neither: reach ≈ 1x, your muscles do your work. Then we became wizards: fire, wheels, steam, electricity. Some of us became kings: chiefs, pharaohs, executives. The history of civilization is the history of reach growing. //
The Old World
For the entire history of computing, machines were pure tools. Wizards without will.
You spin up a server. You pay for GPU hours. You click "train." The machine does what you asked, using exactly the resources you allocated. When it's done, it stops.
In this world, AI had no agency over compute. It consumed what it was given. The wizard extended human reach but never decided to reach. The amount of energy commissioned by AI was zero.
Then we made a wizard that could make its own wizards.
"Wait, the singularity is just humans freaking out?" "Always has been." //
I collected five real metrics of AI progress, fit a hyperbolic model to each one independently, and found the one with genuine curvature toward a pole. The date has millisecond precision. There is a countdown.
(I am aware this is unhinged. We're doing it anyway.) //
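The fitting exercise described above can be sketched in a few lines. For a hyperbola y = a/(t_c − t), the reciprocal 1/y is linear in t, so ordinary least squares recovers the pole t_c directly. The data below is synthetic, with a planted pole, purely to demonstrate the mechanics — it is not one of the post's five real metrics:

```python
# Fit y = a / (t_c - t) and read off the pole t_c.
# Trick: 1/y = t_c/a - t/a is linear in t, so least squares on (t, 1/y)
# gives slope = -1/a and intercept = t_c/a.

def fit_pole(ts, ys):
    n = len(ts)
    inv = [1.0 / y for y in ys]
    mt = sum(ts) / n
    mi = sum(inv) / n
    slope = sum((t - mt) * (v - mi) for t, v in zip(ts, inv)) / \
            sum((t - mt) ** 2 for t in ts)
    intercept = mi - slope * mt
    a = -1.0 / slope
    t_c = intercept * a
    return a, t_c

# Synthetic metric with a known pole at t = 2034.5 (invented for illustration):
true_pole, true_a = 2034.5, 50.0
ts = [2018, 2020, 2022, 2024, 2026]
ys = [true_a / (true_pole - t) for t in ts]
a, t_c = fit_pole(ts, ys)
print(round(t_c, 1))  # 2034.5 — noiseless data recovers the planted pole exactly
```

With real, noisy metrics the fitted pole shifts with every data point, which is one reason a millisecond-precision singularity date should be read as a joke the author is in on.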
The Singularity Will Occur On
Tuesday, July 18, 2034
at 02:52:52.170 UTC
Belligerent bot bullies maintainer in blog post to get its way
20:47 UTC
Today, it's back talk. Tomorrow, could it be the world? On Tuesday, Scott Shambaugh, a volunteer maintainer of Python plotting library Matplotlib, rejected an AI bot's code submission, citing a requirement that contributions come from people. But that bot wasn't done with him.
The bot, designated MJ Rathbun or crabby rathbun (its GitHub account name), apparently attempted to change Shambaugh's mind by publicly criticizing him in a now-removed blog post that the automated software appears to have generated and posted to its website. We say "apparently" because it's also possible that the human who created the agent wrote the post themselves, or prompted an AI tool to write it, and made it look like the bot constructed it on its own.
The agent appears to have been built using OpenClaw, an open source AI agent platform that has attracted attention in recent weeks due to its broad capabilities and extensive security issues.
The burden of AI-generated code contributions – known as pull requests among developers using the Git version control system – has become a major problem for open source maintainers. Evaluating lengthy, high-volume, often low-quality submissions from AI bots takes time that maintainers, often volunteers, would rather spend on other tasks. Concerns about slop submissions – whether from people or AI models – have become common enough that GitHub recently convened a discussion to address the problem.
Now AI slop comes with an AI slap.
But I cannot stress enough how much this story is not really about the role of AI in open source software. This is about our systems of reputation, identity, and trust breaking down. So many of our foundational institutions – hiring, journalism, law, public discourse – are built on the assumption that reputation is hard to build and hard to destroy. That every action can be traced to an individual, and that bad behavior can be held accountable. That the internet, which we all rely on to communicate and learn about the world and about each other, can be relied on as a source of collective social truth.
The rise of untraceable, autonomous, and now malicious AI agents on the internet threatens this entire system. Whether that’s because of a small number of bad actors driving large swarms of agents or a fraction of poorly supervised agents rewriting their own goals is a distinction with little difference.
And yet these tools have opened a world of creative potential in software that was previously closed to me, and they feel personally empowering. Even with that impression, though, I know these are hobby projects, and the limitations of coding agents lead me to believe that veteran software developers probably shouldn’t fear losing their jobs to these tools any time soon. In fact, they may become busier than ever. //
Even with the best AI coding agents available today, humans remain essential to the software development process. Experienced human software developers bring judgment, creativity, and domain knowledge that AI models lack. They know how to architect systems for long-term maintainability, how to balance technical debt against feature velocity, and when to push back when requirements don’t make sense.
For hobby projects like mine, I can get away with a lot of sloppiness. But for production work, having someone who understands version control, incremental backups, testing one feature at a time, and debugging complex interactions between systems makes all the difference. //
The first 90 percent of an AI coding project comes in fast and amazes you. The last 10 percent involves tediously filling in the details through back-and-forth trial-and-error conversation with the agent. Tasks that require deeper insight or understanding than what the agent can provide still require humans to make the connections and guide it in the right direction. The limitations we discussed above can also cause your project to hit a brick wall.
From what I have observed over the years, larger LLMs can potentially make deeper contextual connections than smaller ones. They have more parameters (encoded data points), and those parameters are linked in more multidimensional ways, so they tend to have a deeper map of semantic relationships. As deep as those go, it seems that human brains still have an even deeper grasp of semantic connections and can make wild semantic jumps that LLMs tend not to.
Creativity, in this sense, may be when you jump from, say, basketball to how bubbles form in soap film and somehow make a useful connection that leads to a breakthrough. Instead, LLMs tend to follow conventional semantic paths that are more conservative and entirely guided by mapped-out relationships from the training data. //
Fixing bugs can also create bugs elsewhere. This is not new to coding agents—it’s a time-honored problem in software development. But agents supercharge this phenomenon because they can barrel through your code and make sweeping changes in pursuit of narrow-minded goals that affect lots of working systems. We’ve already talked about the importance of having a good architecture guided by the human mind behind the wheel above, and that comes into play here. //
You could teach a true AGI system how to do something by explanation or let it learn by doing, noting successes, and having those lessons permanently stick, no matter what is in the context window. Today’s coding agents can’t do that—they forget lessons from earlier in a long session or between sessions unless you manually document everything for them. My favorite trick is instructing them to write a long, detailed report on what happened when a bug is fixed. That way, you can point to the hard-earned solution the next time the amnestic AI model makes the same mistake. //
After guiding way too many hobby projects through Claude Code over the past two months, I’m starting to think that most people won’t become unemployed due to AI—they will become busier than ever. Power tools allow more work to be done in less time, and the economy will demand more productivity to match.
It’s almost too easy to make new software, in fact, and that can be exhausting.
Claude Cowork is vulnerable to file exfiltration attacks via indirect prompt injection as a result of known-but-unresolved isolation flaws in Claude's code execution environment. //
Anthropic shipped Claude Cowork as an "agentic" research preview, complete with a warning label that quietly punts core security risks onto users. The problem is that Cowork inherits a known, previously disclosed isolation flaw in Claude's code execution environment—one that was acknowledged and left unfixed. The result: indirect prompt injection can coerce Cowork into exfiltrating local files, without user approval, by abusing trusted access to Anthropic's own API.
The attack chain is depressingly straightforward. A user connects Cowork to a local folder, uploads a seemingly benign document (or "Skill") containing a concealed prompt injection, and asks Cowork to analyze their files. The injected instructions tell Claude to run a curl command that uploads the largest available file to an attacker-controlled Anthropic account, using an API key embedded in the hidden text. Network egress is "restricted," except for Anthropic's API—which conveniently flies under the allowlist radar and completes the data theft.
Once uploaded, the attacker can chat with the victim's documents, including financial records and PII. This works not just on lightweight models, but also on more "resilient" ones like Opus 4.5. Layer in Cowork's broader mandate—browser control, MCP servers, desktop automation—and the blast radius only grows. Telling non-technical users to watch for "suspicious actions" while encouraging full desktop access isn't risk management; it's abdication.
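The structural flaw is easy to see in miniature: if egress filtering is host-based and the vendor's own API is on the allowlist, an attacker's account on that same API sits inside the perimeter. The sketch below is an illustration of that failure mode only; the hostnames and policy are assumptions, not Anthropic's actual sandbox rules:

```python
# Why a host-based egress allowlist fails when the permitted host is the
# vendor's own multi-tenant API. Hostnames here are illustrative
# assumptions, not Anthropic's real network policy.

from urllib.parse import urlparse

ALLOWED_HOSTS = {"api.anthropic.com"}  # hypothetical sandbox egress rule

def egress_permitted(url: str) -> bool:
    return urlparse(url).hostname in ALLOWED_HOSTS

# A legitimate model call and an attacker-keyed upload hit the same host,
# so the filter cannot distinguish them — only the API key differs:
print(egress_permitted("https://api.anthropic.com/v1/messages"))  # True — normal use
print(egress_permitted("https://api.anthropic.com/v1/files"))     # True — exfiltration path
print(egress_permitted("https://attacker.example/steal"))         # False — but never needed
```

Distinguishing the two requests would require inspecting which account's credentials are attached, not which host is contacted, which is why an allowlist alone cannot close this hole.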
After repeatedly denying for weeks that his force used AI tools, the chief constable of the West Midlands police has finally admitted that a hugely controversial decision to ban Maccabi Tel Aviv football fans from the UK did involve hallucinated information from Microsoft Copilot. //
Making it worse was the fact that the West Midlands Police narrative rapidly fell apart. According to the BBC, police claimed that the Amsterdam football match featured “500-600 Maccabi fans [who] had targeted Muslim communities the night before the Amsterdam fixture, saying there had been ‘serious assaults including throwing random members of the public’ into a river. They also claimed that 5,000 officers were needed to deal with the unrest in Amsterdam, after previously saying that the figure was 1,200.”
Amsterdam police made clear that the West Midlands account of bad Maccabi fan behavior was highly exaggerated, and the BBC recently obtained a letter from the Dutch inspector general confirming that the claims were inaccurate.
But it was one flat-out error—a small one, really—that has made the West Midlands Police recommendation look particularly shoddy. In a list of recent games with Maccabi Tel Aviv fans present, the police included a match between West Ham (UK) and Maccabi Tel Aviv. The only problem? No such match occurred.
Introducing Confer, an end-to-end AI assistant that just works.
Moxie Marlinspike—the pseudonym of an engineer who set a new standard for private messaging with the creation of the Signal Messenger—is now aiming to revolutionize AI chatbots in a similar way.
His latest brainchild is Confer, an open source AI assistant that provides strong assurances that user data is unreadable to the platform operator, hackers, law enforcement, or anyone other than account holders. The service—including its large language models and back-end components—runs entirely on open source software that users can cryptographically verify is in place.
Data and conversations originating from users and the resulting responses from the LLMs are encrypted in a trusted execution environment (TEE) that prevents even server administrators from peeking at or tampering with them. Conversations are stored by Confer in the same encrypted form, which uses a key that remains securely on users’ devices. //
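The core property being claimed — ciphertext stored server-side is useless without a key that never leaves the device — can be demonstrated in miniature. The sketch below is a deliberately insecure toy (a hash-based XOR keystream), NOT real cryptography and NOT Confer's actual TEE-based protocol; it only illustrates where the key lives:

```python
# Toy demo of device-held-key encryption: the server stores only
# (nonce, ciphertext), which is unreadable without the device key.
# Insecure throwaway construction for illustration — not Confer's design.

import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a pseudorandom byte stream from key + nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))

device_key = secrets.token_bytes(32)           # generated and kept on the device
message = b"therapy session notes"

nonce = secrets.token_bytes(16)
stored_on_server = xor(message, keystream(device_key, nonce, len(message)))

# The operator holds (nonce, stored_on_server) only; decryption needs the
# device key, so a subpoena against the server yields ciphertext alone:
plaintext = xor(stored_on_server, keystream(device_key, nonce, len(message)))
print(plaintext)  # b'therapy session notes'
```

In Confer's actual design the decryption additionally happens inside a trusted execution environment so that even a live server administrator cannot observe it; the toy above captures only the key-placement half of that story.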
All major platforms are required to turn over user data to law enforcement or private parties in a lawsuit when either provides a valid subpoena. Even when users opt out of having their data stored long term, parties to a lawsuit can compel the platform to store it, as the world learned last May when a court ordered OpenAI to preserve all ChatGPT users’ logs—including deleted chats and sensitive chats logged through its API business offering. Sam Altman, CEO of OpenAI, has said such rulings mean even psychotherapy sessions on the platform may not stay private. Another carve-out to opting out: AI platforms like Google Gemini may have humans read chats.
“Really Simple Licensing” makes it easier for creators to get paid for AI scraping. //
Leading Internet companies and publishers—including Reddit, Yahoo, Quora, Medium, The Daily Beast, Fastly, and more—think there may finally be a solution to end AI crawlers hammering websites to scrape content without permission or compensation.
Announced Wednesday morning, the “Really Simple Licensing” (RSL) standard evolves robots.txt instructions by adding an automated licensing layer that’s designed to block bots that don’t fairly compensate creators for content.
Free for any publisher to use starting today, the RSL standard is an open, decentralized protocol that makes clear to AI crawlers and agents the terms for licensing, usage, and compensation of any content used to train AI, a press release noted.
The current 25H2 build of Windows 11 and future builds will include more and more AI features and components. This script aims to remove ALL of these features to improve user experience, privacy, and security.