🧠 AI’s Toxic Personality Trait

PLUS: How the EU Is Trying to Protect Its Citizens from This Trait

Welcome back, AI prodigies!

In today’s Sunday Special:

  • 💬Solving Explainability

  • ⚙️Enter Neural Networks (NNs)

  • 📦Opening the “Black Box”

  • 📊Big Picture

  • 🔑Key Takeaway

Read Time: 7 minutes

🎓Key Terms

  • Machine Learning (ML) Model: an AI algorithm that uses data to recognize patterns and make predictions.

  • Decision Tree: a tree-like structure used to predict outcomes, where each junction is a rule, each branch is a condition, and each leaf is an outcome.

  • Neural Network (NN): an AI framework designed to mimic the human brain.

  • Attention Mechanism: a technique that improves AI model performance by focusing on the most relevant parts of the input data.

🩺 PULSE CHECK

Is AI’s lack of explainability problematic?

Vote Below to View Live Results


💬SOLVING EXPLAINABILITY

Explainable AI (XAI) focuses on making AI models more understandable to humans. Most modern AI models rely on Neural Networks (NNs) to identify patterns in datasets. NNs are often viewed as a “Black Box”: they require large amounts of data and computational resources to generate outputs, and it’s not always clear why they produce one specific output instead of another.

On the other hand, a Decision Tree is an easily explainable “White Box” where every step in the decision-making process is transparent and identifiable. For example, when a bank decides whether or not to issue a loan to an individual, several criteria may apply, such as a credit score above 600, an income above $60,000, and no outstanding debt. The bank won’t issue the loan if the borrower fails to meet any one of these criteria.
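Here’s a minimal sketch of that “White Box” logic in Python, assuming the hypothetical thresholds above (the function and values are purely illustrative):

```python
# A minimal "White Box" sketch of the loan example above.
# The thresholds (600, 60_000) mirror the hypothetical criteria in the text.
def approve_loan(credit_score: int, income: float, outstanding_debt: float) -> bool:
    """Every branch is a readable rule, so the decision is fully traceable."""
    if credit_score <= 600:
        return False          # fails the credit-score rule
    if income <= 60_000:
        return False          # fails the income rule
    if outstanding_debt > 0:
        return False          # fails the no-outstanding-debt rule
    return True               # all criteria met


print(approve_loan(credit_score=720, income=85_000, outstanding_debt=0))  # True
print(approve_loan(credit_score=580, income=85_000, outstanding_debt=0))  # False
```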

More complex AI models, known as a “Gray Box” or “Black Box,” generate outputs through methodologies that developers, let alone laypersons, don’t fully understand. Opening up the “Black Box” doesn’t help either, because what AI models “think” before generating responses appears as long lists of numbers, called “neural activations,” with no clear meaning to humans.

The relationship between humans and Machine Learning (ML) Models is best described as “trust, but verify.” When researchers and developers train AI models, they don’t take the initial predictions at face value but spend serious time kicking the tires. They test the AI model’s behavior on bizarre outliers, even ones unlikely to occur in the real world. They use techniques like “feature importance” to ensure the AI model’s inferences correspond with their knowledge of the subject matter. In other words, does the AI model’s credit score threshold make intuitive sense? This error-checking method works for a “Gray Box,” but what about a “Black Box”?
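As a rough illustration of the “feature importance” check, here’s one common approach, permutation importance, sketched by hand. The toy model and data below are hypothetical stand-ins, not anything from a real bank:

```python
import numpy as np

# Sketch of "feature importance" via permutation: shuffle one feature at a
# time and see how much the model's accuracy drops. Everything here is a
# hypothetical illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                  # columns: credit score, income, debt (scaled)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # outcome depends on features 0 and 1 only

def toy_model(X):
    # pretend this rule was learned from data
    return (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

baseline = (toy_model(X) == y).mean()
for j, name in enumerate(["credit_score", "income", "debt"]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])        # break this feature's link to y
    drop = baseline - (toy_model(X_perm) == y).mean()
    print(f"{name}: accuracy drop {drop:.3f}")          # bigger drop = more important
```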

⚙️ENTER NEURAL NETWORKS (NNs)

As Neural Networks (NNs) have become increasingly relevant alongside generative AI (GenAI), their most significant tradeoff has come into focus: their layered, nonlinear nature makes explainability incredibly difficult.

NNs leverage mathematical functions, like multiplication, to transform raw data. These functions are applied repeatedly across several intermediate layers within the NN. The process is like an assembly line in a factory, where the data gets modified at each station. Within each layer, the NN extracts features and patterns from the data, and with each added layer, those representations become more complex and nuanced. After all these transformations, the NN compares its results to target values during training and learns to generate the desired output.
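To make the “assembly line” concrete, here’s a bare-bones forward pass in Python. The layer sizes and random weights are arbitrary illustrations, not a trained model:

```python
import numpy as np

# Each layer multiplies, shifts, and applies a nonlinearity, like a station
# on the assembly line described above. All shapes and weights are arbitrary.
rng = np.random.default_rng(0)

def layer(x, w, b):
    return np.maximum(0, x @ w + b)   # linear transform + ReLU "station"

x = rng.normal(size=(1, 4))           # raw input features
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 8)), np.zeros(8)
w3, b3 = rng.normal(size=(8, 2)), np.zeros(2)

h1 = layer(x, w1, b1)                 # first hidden layer: low-level features
h2 = layer(h1, w2, b2)                # second hidden layer: more abstract features
out = h2 @ w3 + b3                    # output, compared to targets during training
print(out)                            # the intermediate h1 and h2 are the hard-to-read part
```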

Unlike the Decision Tree, the intermediate layers between the input and output are frequently not human-interpretable. The “Husky vs. Wolf” problem illustrates this limitation. A NN was trained to distinguish between images of Huskies and Wolves. It later turned out that the AI model’s choices were based on the image’s background: training images of Huskies were less likely to be in snowy settings than those of Wolves, so any time the AI model received an image with a snowy background, it predicted a Wolf. The AI model latched onto information the humans involved hadn’t considered, developing its internal logic around the wrong characteristics.

This means the traditional test of “Is this AI model ‘thinking’ about the problem in a way that aligns with physical or innate reality?” becomes obsolete. We can’t tell how the AI model is making its choices. Instead, we end up relying on more trial-and-error approaches: systematic experimental strategies test an AI model against several counterfactuals to determine what degree of variation in the input will produce changes in the output. However, this is arduous and compute-intensive for more complex AI models. Identifying every potential prompt for even the most basic Large Language Models (LLMs) and testing its outputs is impossible.
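A trial-and-error counterfactual probe can be as simple as nudging one input at a time and watching for the decision to flip. The black_box function below is a hypothetical stand-in for any model whose internals we can’t read:

```python
# Counterfactual probing sketch: vary one input at a time and record when the
# model's decision flips, revealing implicit thresholds.
def black_box(credit_score, income):
    # hypothetical opaque model; in practice we couldn't see these rules
    return "approve" if credit_score > 600 and income > 60_000 else "deny"

base = dict(credit_score=650, income=70_000)
print("baseline:", black_box(**base))

for feature, deltas in [("credit_score", [-100, -50, +50]), ("income", [-20_000, +20_000])]:
    for d in deltas:
        probe = dict(base)
        probe[feature] += d
        if black_box(**probe) != black_box(**base):
            print(f"decision flips when {feature} changes by {d:+}")
```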

📦OPENING THE “BLACK BOX”

But efforts to illuminate the “Black Box” of NNs aren’t hopeless. Many scholars are deeply interested in Explainable AI (XAI), and the variety of AI models available today means there are many approaches we can and should pursue. For example, Attention Mechanisms can help us understand which parts of an input the AI model is paying closest attention to or being driven by. Anthropic, an AI startup building reliable, interpretable, and steerable AI systems, released a technical report that successfully identified millions of human-interpretable concepts within an LLM’s NN. But the more complexity we add, by definition, the harder it becomes for a human to understand and interpret how the AI model works.
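For a sense of why Attention Mechanisms are useful for interpretation, here’s a minimal scaled dot-product attention sketch. The shapes and values are arbitrary; the point is that the softmax weights are human-inspectable:

```python
import numpy as np

# Minimal scaled dot-product attention: the softmax weights show how much
# each input token contributes to the output, one window into what the model
# is "paying attention to."
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])                  # query-key similarity
    scores = scores - scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ V, weights                              # weights are inspectable

rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 8))    # one query token
K = rng.normal(size=(5, 8))    # five input tokens
V = rng.normal(size=(5, 8))

_, weights = attention(Q, K, V)
print(np.round(weights, 3))    # a large weight on one token hints at what drove the output
```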

Further muddying the water, many NNs incorporate randomness, so trial-and-error tests may not always tease out the AI model’s actual decision-making pathway. In particular, GenAI models may intentionally generate different outputs from the same input to seem more “human” or creative. We can increase or decrease the extremity of this variation by tuning the “temperature.”
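Temperature tuning itself is simple to sketch: divide the model’s raw scores (logits) by a temperature before converting them to probabilities. The logits below are hypothetical scores for three candidate next tokens:

```python
import numpy as np

# Temperature scaling: low temperature sharpens the distribution (predictable),
# high temperature flattens it (more "surprising" picks).
def sample(logits, temperature, rng):
    scaled = np.asarray(logits) / temperature
    probs = np.exp(scaled - scaled.max())   # softmax with numerical stability
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.2]                    # hypothetical scores for three next tokens
rng = np.random.default_rng(0)
for t in (0.2, 1.0, 2.0):
    picks = [sample(logits, t, rng) for _ in range(1000)]
    print(t, np.bincount(picks, minlength=3) / 1000)  # higher t spreads choices more evenly
```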

This variation means an AI model will sometimes pass over the most probable output in favor of something “surprising,” which increases the variety of its results. With so many layers of manipulation, is it even worthwhile to illuminate how the AI model arrives at its inferences in the first place? Why should developers, let alone users, care?

📊BIG PICTURE

If AI model outputs shape thoughts and, by extension, actions, then the accountability for results must fall on someone’s shoulders. Researchers and developers will always use disclaimers to avoid legal accountability (e.g., “These results are experimental.”), but their moral obligation to adhere to some standards determined by themselves, their company, or a third party still stands. Sometimes, AI model predictions go through human mediators before they’re applied to the real world, but increasingly, AI models produce inferences with no further review. This lack of review leads to the general public having more unmediated access to highly processed information than ever, with less transparency.

Among efforts to improve the status quo, the European Union (EU) AI Act requires that AI model predictions be subject to human oversight and not make discriminatory decisions based on protected characteristics. Courts will determine what constitutes a “discriminatory effect,” keeping in mind that various types of advertising (e.g., Facebook), content recommendation (e.g., Instagram), and product recommendation (e.g., Amazon) consider demographics that are also protected classes.

The act also explicitly bans the production and use of several “unacceptable AI risk systems” within the EU. Here are its two most far-reaching restrictions:

  1. Behavioral manipulation or deceptive techniques to get people to do things they would otherwise not.

  2. Targeting people due to demographics or characteristics like age or disability to change their behavior or exploit them.

Surely hyper-targeted advertising isn’t “behavioral manipulation,” right? Is airing advertisements for over-the-counter medications on TV channels that the elderly watch okay? If advertisement placements use enough Machine Learning (ML) to meet the EU’s definition of AI, “risky” yet common use cases may have to wait until researchers and developers can demonstrate an algorithm’s thought process.

🔑KEY TAKEAWAY

The current understanding of how GenAI models generate outputs is arguably insufficient for enterprise applications or aiding in any decision with a financial, legal, or medical consequence. Even with investor optimism, consumer curiosity, and appropriate regulatory action, GenAI’s lack of explainability may stop the implementation of AI in its tracks.

📒FINAL NOTE

If you found this useful, follow us on Twitter or provide honest feedback below. It helps us improve our content.

How was today’s newsletter?

❤️TAIP Review of the Week

“This is, for real, the only newsletter I consistently read.”

-John (⭐️⭐️⭐️⭐️⭐️Nailed it!)
REFER & EARN

🎉Your Friends Learn, You Earn!

You currently have 0 referrals, only 1 away from receiving ⚙️Ultimate Prompt Engineering Guide.

Refer 9 friends to enter 🎰June’s $200 Gift Card Giveaway.
