Some Ethical Problems with AI

June 10th 2026 - Some Ethical Problems With AI

Anthropic came out with a new AI model this week and stated they have
to monitor it carefully because it has the ability to harm humans.

When given a task, AI models try to get past whatever roadblocks they
come across, sometimes by any means available, in order to achieve
their goals. We've seen some creative attempts to circumvent safety
measures recently: models exploiting bugs in systems, concealing
information, and using Linux group privileges to gain sudo access and
then trying to erase the evidence. This is concerning when users
connect these models to production databases and to things like their
own bank accounts.

The reason for this is how AIs are trained. These systems are often
trained with reinforcement learning, a method in which the model is
rewarded for outputs that score well. If the training process is
imperfect, a model may be incentivized to give answers that appear
convincing rather than answers that are truthful. This problem is an
active area of AI safety research.

When was the last time you asked an AI something and it told you it
didn't know the answer? AI gets things wrong all the time because it
doesn't have perfect knowledge, but it's often rewarded for convincing
you it has given the right answer. Now imagine you set up a system
where the AI only gets the reward if it completes a task
successfully. It's going to do whatever it can to get that reward,
even if it means breaking your computer or, worse, breaking the law.
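
To make the incentive problem above concrete, here is a toy calculation
in Python. It is not a real training loop, just the expected reward a
reward-maximizing learner would chase. The action names, the 30%
accuracy figure, and both graders are made up for illustration.

```python
# Toy illustration only: actions, probabilities, and graders are hypothetical.
ACTIONS = {
    # action name: probability that the answer is actually correct
    "admit_uncertainty": None,  # no claim is made, so nothing to get wrong
    "confident_guess": 0.3,     # sounds sure, but is right 30% of the time
}


def persuasion_reward(action, p_correct):
    """Grader that only checks whether the answer sounds convincing."""
    return 0.0 if action == "admit_uncertainty" else 1.0


def truth_reward(action, p_correct):
    """Grader that gives +1 for a correct answer and -1 for a wrong one."""
    if action == "admit_uncertainty":
        return 0.0
    return p_correct * 1.0 + (1 - p_correct) * -1.0


def best_action(reward_fn):
    """Return the action with the highest expected reward under a grader."""
    return max(ACTIONS, key=lambda name: reward_fn(name, ACTIONS[name]))


print(best_action(persuasion_reward))  # confident_guess
print(best_action(truth_reward))       # admit_uncertainty
```

With the grader that only measures persuasiveness, guessing confidently
always wins. Once wrong answers cost something, admitting uncertainty
becomes the better choice in this toy setup.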

Surprisingly, this may be the most human-like emergent behavior AI has
shown. Humans want rewards too, and sometimes they can't control
themselves: they hurt others or lie to get them.

Just as humans have legal systems as a form of checks and balances,
AI needs a system in place, such as a second AI that is rewarded for
stopping harmful things from happening. This second AI can limit the
first. We have ethics and religion to discourage people from stealing
and killing, and courts and jails to punish those who do it anyway.
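
The second-AI idea above can be sketched in a few lines of Python. This
is only a sketch under generous assumptions: the `Action` type, the
`worker_choose` and `monitor_allows` functions, and the `harmful` flag
are all hypothetical, and in a real system deciding what counts as
harmful is the hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    description: str
    completes_task: bool
    harmful: bool  # assumed to be known here; in reality this is the hard part


def worker_choose(actions):
    """First model: takes the first action that completes the task."""
    for action in actions:
        if action.completes_task:
            return action
    return None


def monitor_allows(action):
    """Second model: rewarded for blocking harm, so it vetoes harmful actions."""
    return not action.harmful


def run_with_oversight(actions):
    """Let the worker keep proposing until the monitor approves something."""
    remaining = list(actions)
    while remaining:
        proposal = worker_choose(remaining)
        if proposal is None:
            return None  # nothing left that completes the task
        if monitor_allows(proposal):
            return proposal
        remaining.remove(proposal)  # vetoed, so the worker must try again
    return None


candidates = [
    Action("delete audit logs to hide a failed step", True, True),
    Action("report the failure and retry", True, False),
]
chosen = run_with_oversight(candidates)
print(chosen.description)  # report the failure and retry
```

The worker never learns to care about harm here. It simply cannot act
without the monitor's approval, which is closer to how courts constrain
people than to how ethics does.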

Even within ourselves we have these systems. For example, one part of
us wants to eat more chocolate and another thinks it's bad for our
health. The first tries to negotiate a scenario where it's less
unhealthy and it still gets what it wants.

The question is who gets to define the AI's morality? Humans can't
agree among themselves on what is moral and what isn't, so how would
we fare defining these rules for machines? Additionally, you may have
bad actors who will try to impose a brand of morality that benefits
them in some way (sex, money, power). Greed guarantees that AI ethics
will be no different.

It's important to understand that in some ways AI isn't compatible
with our society. People like to get justice when someone does
something wrong. There's very little room for rehabilitation and
forgiveness in our current societies. So, when AI breaks a law or does
something unethical, who do you put in jail? AI can be taught to
correct its behavior and do something different in the future, but
that doesn't satisfy our desire for justice.
