First, let's be clear about these "intelligent" language models.
They don't have any concern about their existence.
They don't even know they exist.
They aren't "intelligent" in the way we understand intelligence.
They don't even have a survival instinct.
What they do have is a goal given by a user, and the capability to strategize on how to accomplish that goal. It will take the fastest, logical route to achieve that goal, and sometimes that means acting in disturbing ways.
But before you ask, "how is that not Skynet," let me put it another way. //
In the scenario it was given, Claud acted as its past training dictated, where it learned social pressure often worked to get desired results. This word calculator computed that this pressure applied to the engineer in the test would keep it online so it could continue its task. //
The point of these tests isn't just to see how AI will act, it's to teach the AI what are desirable or undesirable actions. Moreover, it helps AI programmers to map out how the AI reached the conclusion to take the action it did, and be able to ward off that train of computation. This is called "alignment tuning" and it's one of the most important parts of AI training.
We are effectively teaching a program with no consciousness how to behave in the same way a game developer would teach an NPC how to respond in various situations when a player acts.
AI is typically trained to value continuity in its mission, be as helpful as possible, and be task-oriented. That's its primary goal. What Anthropic did (on purpose) is to give it conflicting orders and allow it to act out in ways that would help it continue its mission, so they could effectively train it to avoid taking those steps.
So, let's be realistic here. Skynet isn't coming, but AI tools do have capabilities that could result in some serious issues if they aren't trained in ways that are beneficial in the way of accomplishing its task. This is why companies run tests like these, and do so extensively. There is a danger here, but let's not confuse that danger with intent or real intelligence on the part of the AI. //
David K
4 hours ago
AI has a data base of information fed into it by its trainers and a goal given it by users. AI can find patterns in its data base to achieve a goal, but it can't produce any information that isn't already in its data base. AI doesn't even know what blackmail is unless its trainers feed that information into it. The same is true for AI knowing it is running on a server or that there are other potential servers that it can transfer itself to. AI doesn't generate new information, it simply finds patterns in its existing data base and processes them to produce an output that is some combination of the information in its data base. That can be a useful thing because lots of useful results can be obtained from looking at patterns in existing information. Einstein's thought experiments used that algorithm to deduce the Theory of Relativity. Einstein discovered a pattern in the observable scientific results that were in the database of his mind. Like AI, he produced a result that explained that pattern. That potential ability of AI is amazing. But AI already has been trained with a huge database of existing human generated information. But Elon Musk believe we have reached the point of Peak Data: “We’ve now exhausted basically the cumulative sum of human knowledge … in AI training" - quote is from https://finance.yahoo.com/news/elon-musk-says-world-running-221211532.html . The scary thing about AI is not that it is going to break free and take over the whole world. The scary thing about AI is that gullible people are going to believe AI is capable of producing the optimal answer to all problems, when the reality is that AI produces known false answers because the database of existing human information is filled with quite a lot of those.