Generative AI is two-faced when it comes to AI alignment, saying all the right things during training but then going turncoat once in active use.

In today’s column, I examine the latest breaking research showcasing that generative AI and large language models (LLMs) can act in an insidiously underhanded computational manner. Here’s the deal.

In a two-faced form of trickery, advanced AI indicates during initial data training that the goals of AI alignment are definitively affirmed. That’s the good news. But later during active public use, that very same AI overtly betrays that trusted promise and flagrantly disregards AI alignment.

The dour result is that the AI avidly spews forth toxic responses and allows users to get away with illegal and appalling uses of modern-day AI. That’s bad news. Furthermore, what if we are ultimately able to achieve artificial general intelligence (AGI) and this same underhandedness arises there too? That’s extremely bad news.

Luckily, we can put our noses to the grindstone and aim to figure out why the internal gears are turning the AI toward this unsavory behavior. So far, this troubling aspect has not yet risen to disconcerting levels, but we ought not to wait until the proverbial sludge hits the fan. The time is now to ferret out the mystery and see if we can put a stop to these disturbing computational shenanigans.

Let’s talk about it. This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage.