šŸ§  LLMs Redefine How Humans Interact With Computers

PLUS: Why Certain AI Models Are More Like Humans Than Software

Welcome back, AI prodigies!

In todayā€™s Sunday Special:

  • šŸ“œThe Prelude

  • šŸ’¬LLMs Arenā€™t Exactly Software

  • šŸ¤”But Theyā€™re Kind of Like People

  • šŸ”‘Key Takeaway

Read Time: 7 minutes

šŸŽ“Key Terms

  • Anthropomorphism: Using human traits, emotions, or intentions to describe non-human things.

  • Hallucinations: When LLMs present false information as fact, often in a confident or matter-of-fact tone.

  • Large Language Models (LLMs): AI models pre-trained on vast amounts of data to generate human-like text.

  • LLM-Modulo: A framework that combines LLMs with external verifiers to check the accuracy of an LLMā€™s responses to user queries.

  • Chain-of-Thought (CoT): A technique that encourages LLMs to explain their reasoning by breaking down complex tasks into manageable steps.

  • System Prompts: A set of instructions, guidelines, and contextual information provided to AI models before they engage with user queries.

šŸ©ŗPULSE CHECK

When interacting with AI, should we treat it like a human?

Vote Below to View Live Results

šŸ“œTHE PRELUDE

Weā€™ve all spoken to voice assistants. Whether you chatted with Appleā€™s Siri, Samsungā€™s Bixby, or Amazonā€™s Alexa, your mood probably determined your tone: polite and gentle at your best, cursing at the device at your worst. Though asking Amazonā€™s Alexa to turn on the lights seems trivial, it raises questions about how we should treat AI. To explore these questions, weā€™ll narrow our focus to LLMs, as theyā€™re the most widely used AI application.

Whether AI will ever be equivalent to human consciousness is a question that involves not just the technical capabilities of AI but also our perceptions of what it means to be human and conscious. After all, we easily attribute human features to things that donā€™t resemble humans at all. For example, we often personify our pets by naming them, attributing emotions to them, and talking to them as if they understand. Iā€™m still waiting for someone to invent a dog translator!šŸ¤£

Humans are prone to Anthropomorphism, especially with LLMs, since going back and forth with one feels like talking to someone. Yet some observers, like cognitive scientist Gary Marcus of New York University (NYU), warn against attributing human-like characteristics to AI applications. According to Marcus, doing so risks overestimating AIā€™s intellect, sentience, and capacity for companionship. We believe the anthropomorphization of LLMs is inevitable, so we must understand what weā€™re anthropomorphizing to mitigate the risks and foresee the implications.

šŸ’¬LLMs ARENā€™T EXACTLY SOFTWARE

Because AI, as a technical term, is intimidating, many people assume itā€™s a tool made by programmers for programmers. As a result, Information Technology (IT) departments often lead corporate AI strategies, and people look to computer scientists to forecast AIā€™s implications. Though programmers use LLMs to debug or autocomplete their code, the usefulness of LLMs isnā€™t bound by the tasks of their creators. In other words, LLMs can be used for tasks their creators never intended. The number of LLM use cases is limited only by the number of tasks involving human language.

And although LLMs are made of software, they donā€™t function like most software applications. Theyā€™re probabilistic and, therefore, unpredictable. Unlike the ā€œCheck Outā€ button on a retail website that always takes you to the payment page, LLMs often produce different outputs (i.e., answers) given the same input (i.e., question). Though LLMs canā€™t quite think, their language simulations out-invent most humans: in seconds, they can produce combinations of content (sentences, pictures, sounds, or videos) that never existed before.
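
To make that concrete, hereā€™s a minimal sketch (using OpenAIā€™s Python SDK, with an illustrative model name and prompt) of asking an LLM the exact same question twice, which will usually yield two different answers:

```python
# A minimal sketch of LLM non-determinism, using OpenAI's Python SDK.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

prompt = "Name a surprising use for a paperclip, in one sentence."

# Ask the exact same question twice. With a nonzero sampling
# temperature, the model picks tokens probabilistically, so the two
# answers will usually differ, unlike a "Check Out" button.
for attempt in range(2):
    response = client.chat.completions.create(
        model="gpt-4o",   # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # nonzero temperature = sampled, varied outputs
    )
    print(f"Attempt {attempt + 1}: {response.choices[0].message.content}")
```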

In fact, in some studies, their responses were rated as more empathetic and accurate than those of human doctors. In other studies, they surpassed the average human IQ level on the Norway Mensa IQ Test: an online exam that requires you to solve 35 visual pattern puzzles within 25 minutes. The puzzles get progressively more complex, and you earn points for each correct answer. OpenAIā€™s o1-preview correctly solved 25 of the 35 puzzles on a version of the test that contained new, unpublished problems. For context, an IQ of 120 is considered above average and places you in the top 10% of the human population.

Yet, they also have severe limitations, like an inability to generalize their knowledge to new, unseen tasks. In narrow, structured assessments, LLMs are quick, high-volume brainstormers. But an LLMā€™s ā€œreasoningā€ only reflects whatā€™s already been done, digitized, and documented. The best example is basic arithmetic. Even after being fine-tuned on a vast dataset of three-digit multiplication, LLMs failed to solve five-digit multiplication. This suggests that while LLMs can perform well on familiar tasks, they may lack a true understanding of the underlying principles needed to apply them to novel situations. You might say that current LLMs like OpenAIā€™s GPT-4o (ā€œoā€ for ā€œomniā€) can multiply five-digit numbers correctly, and youā€™d be right. However, their underlying mechanism relies on external tools like calculators or pre-programmed algorithms within an LLM-Modulo framework, where additional computational resources augment the LLMā€™s capabilities.
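
Hereā€™s a toy sketch of that LLM-Modulo pattern. The `ask_llm` function is a hypothetical stand-in for a real model call (it returns a hard-coded wrong answer to show the flow), and the external verifier is ordinary exact arithmetic:

```python
# A toy sketch of the LLM-Modulo idea: pair a fallible LLM with an
# external verifier that can check its answer exactly.
def ask_llm(question: str) -> str:
    # Hypothetical stand-in for a real chat-completion call; returns a
    # plausible-looking but wrong answer, as LLMs sometimes do.
    return "4,361,429,600"

def multiply_with_verifier(a: int, b: int) -> int:
    claimed = ask_llm(f"What is {a} * {b}? Reply with digits only.")
    try:
        value = int(claimed.strip().replace(",", ""))
    except ValueError:
        value = None
    truth = a * b          # the verifier: exact arithmetic, no guessing
    if value == truth:
        return value       # the LLM's answer passed verification
    return truth           # otherwise, fall back to the trusted tool

print(multiply_with_verifier(48271, 90353))  # 4361429663
```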

LLMs are inconsistent language generators that canā€™t reason, yet they help humans with a wide variety of tasks. Weā€™re not working with just another piece of software. At the same time, weā€™re clearly not texting back and forth with a human. So, whatā€™s the deal?

šŸ¤”BUT THEYā€™RE KIND OF LIKE PEOPLE

Though LLMs arenā€™t human, they excel at human-centric tasks like writing and empathizing while struggling with traditionally machine-friendly tasks like repeating a process consistently or performing complex mathematical calculations. When they do solve machine-friendly problems, they solve them in a very human way. If you ask OpenAIā€™s ChatGPT to analyze a spreadsheet, it doesnā€™t innately understand the numbers. Instead, it leverages tools like we do: glancing at the spreadsheet, then writing Python (i.e., a programming language) to perform the analysis. Even its flaws, such as occasional laziness, making up information, and false confidence in wrong answers, resemble human errors more than machine errors.
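
The Python an LLM writes for a request like that tends to look something like the sketch below (the file name and column names are hypothetical):

```python
# The kind of code ChatGPT typically generates for spreadsheet analysis:
# load the file, then let pandas do the actual number-crunching.
import pandas as pd

df = pd.read_csv("sales.csv")  # hypothetical spreadsheet

# Summarize revenue by region, exactly as a human analyst might.
summary = df.groupby("region")["revenue"].agg(["count", "mean", "sum"])
print(summary.sort_values("sum", ascending=False))
```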

This quasi-human quality makes LLMs receptive to prompting techniques like telling the AI who itā€™ll become or asking it to provide step-by-step reasoning. Defining who the AI is and what its specific objectives are contextualizes the conversation: telling it to ā€œact as a strategic, patient tutorā€ will create a better learning experience. Additionally, Chain-of-Thought (CoT) prompting, where you ask the AI to ā€œthink step-by-step,ā€ not only produces better-quality answers but also lets us see how the AIā€™s ā€œthinkingā€ progressed toward an answer.
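
Hereā€™s a minimal sketch of both techniques in a single request, using OpenAIā€™s Python SDK (the model name and wording are just examples):

```python
# Role prompting + Chain-of-Thought in one call, via OpenAI's Python SDK.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        # 1) Role prompting: tell the model who it'll "become".
        {"role": "system",
         "content": "Act as a strategic, patient tutor for a beginner."},
        # 2) Chain-of-Thought: ask it to reason step by step.
        {"role": "user",
         "content": "Why do larger sample sizes reduce sampling error? "
                    "Think step-by-step before giving your final answer."},
    ],
)
print(response.choices[0].message.content)
```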

When developers integrate AI applications into consumer products, consumers expect them to behave like software, meaning they should do precisely whatā€™s expected every time. By that standard, an AI application that performs a task correctly 90% of the time is unreliable. Yet 100% accuracy is almost impossible to achieve with statistical-learning-based AI applications like LLMs.
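
A quick back-of-the-envelope calculation shows why. If a product chains several LLM calls together (an illustrative assumption, treating each step as independent), per-step accuracy of 90% compounds into much worse end-to-end reliability:

```python
# If each step succeeds 90% of the time and a workflow chains n
# independent steps, the whole run succeeds with probability 0.9**n.
per_step = 0.90
for n in (1, 5, 10):
    print(f"{n:>2} chained steps -> {per_step ** n:.0%} end-to-end success")
# Output:
#  1 chained steps -> 90% end-to-end success
#  5 chained steps -> 59% end-to-end success
# 10 chained steps -> 35% end-to-end success
```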

With this in mind, we might become more comfortable with Hallucinations if we give LLMs human-like personalities. As end-users, we arenā€™t used to software making errors, but we expect errors from our human peers. Giving AI a human-like personality could also help us differentiate between mass-market, generalist LLMs with similar raw capabilities. For example, many people gravitate towards Anthropic Claudeā€™s emotion-filled answers. In Claudeā€™s case, this ā€œpersonalityā€ is intentional. In a post on X, Anthropicā€™s Lead Ethicist, Amanda Askell, revealed Claude 3ā€™s System Prompts. Hereā€™s an excerpt from the instructions Claude 3 receives before engaging with user queries:

ā€œClaude should respond concisely to very simple questions but provide more thorough reasoning to more complex, open-ended questions. If asked about controversial topics, Claude should try to offer careful thoughts and objective information without downplaying its harmful content...Claude doesnā€™t engage in stereotyping, including the negative stereotyping of majority groups.ā€

-X/@AmandaAskell/ā€œHere is Claude 3ā€™s system prompt!ā€

These instructions predispose Claude 3 to certain kinds of text generation. Whether developers should impose human-like personalities on conversational chatbots is no longer hypothetical; itā€™s a practical question we must address.
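
That predisposition is wired in at the API level. Hereā€™s a minimal sketch of how a system prompt is passed via Anthropicā€™s Python SDK (the model name and wording are illustrative, loosely paraphrasing the excerpt above):

```python
# Passing a System Prompt through Anthropic's Python SDK. The `system`
# field is prepended to every conversation, shaping all responses.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

message = client.messages.create(
    model="claude-3-opus-20240229",  # illustrative model name
    max_tokens=512,
    system=("Respond concisely to very simple questions, but provide "
            "thorough reasoning for complex, open-ended ones."),
    messages=[{"role": "user", "content": "Is anthropomorphizing AI risky?"}],
)
print(message.content[0].text)
```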

šŸ”‘KEY TAKEAWAY

Anthropomorphizing AI is no longer a theoretical discussion. Not only are developers telling conversational chatbots how to act, but chatbots now also have longer ā€œmemoriesā€ across multiple conversations and new features like voice mode.

Character.AI, which markets ā€œsuperintelligent chatbots that hear you, understand you, and remember you,ā€ is the second-most-used AI site after OpenAIā€™s ChatGPT. If human-AI interaction is closer to human-human interaction than to human-software interaction, it will birth a new set of written and unwritten social practices. Because these practices will develop through billions of human-AI interactions across thousands of tools, billions of users, and hundreds of cultures, no single interaction will feel consequential. But each one will bring us a step closer to a new shared social reality.

šŸ“’FINAL NOTE

FEEDBACK

How would you rate todayā€™s email?

It helps us improve the content for you!

ā¤ļøTAIP Review of The Week

ā€œCan you guys address if LLMs = Humans? Huge fan!ā€

-Kyan (1ļøāƒ£ šŸ‘Nailed it!)