
Welcome back, AI prodigies!
In today's Sunday Special:
Inflection Point
Developers' Dilemma
Chatbot Memory
Key Takeaway
Read Time: 6 minutes
Key Terms
Foundational Model: a machine learning model (e.g., OpenAI's GPT-4) trained on a broad spectrum of unlabeled data that can generate a wide range of text, images, or videos.
Graphics Processing Unit (GPU): a specialized computer chip capable of parallel processing (i.e., performing mathematical calculations simultaneously), making it ideal for complex applications like generative AI.
Tokens: units of text (words, word fragments, or characters) that AI models treat as meaningful when processing text.
Context Window: the number of tokens an AI model uses from recent user prompts and responses within the same conversation to generate a new response.
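To make these last two terms concrete, here's a toy sketch, assuming a naive whitespace tokenizer (real models like GPT-4 use subword tokenizers instead) and an arbitrarily tiny window of 8 tokens:

```python
def tokenize(text):
    # Naive whitespace tokenizer; production models split text
    # into subword tokens instead.
    return text.split()

def apply_context_window(conversation, max_tokens):
    # Flatten the conversation into tokens, then keep only the
    # most recent ones that fit in the window.
    tokens = []
    for message in conversation:
        tokens.extend(tokenize(message))
    return tokens[-max_tokens:]  # oldest tokens fall out first

history = [
    "User: My dog is named Biscuit",
    "Bot: Nice to meet Biscuit!",
    "User: What is my dog called?",
]
print(apply_context_window(history, max_tokens=8))
```

Because the oldest tokens fall out first, details mentioned early in a long conversation can slip outside the window, which is why chatbots that remember you need such large context windows.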
INFLECTION POINT
Big Tech continues to tighten its grasp on generative AI development. Startup Inflection AI, maker of Inflection-2.5, a foundational model and OpenAI rival, recently became an AI studio within Microsoft. Inflection-2.5 is the model behind Pi, a friendly, supportive chatbot described as a coach, companion, confidant, or creative partner. Though Pi competed effectively with fellow AI companions like Replika, attracting over 6 million Monthly Active Users (MAUs), growing it is no longer a top priority.
The specific rationale behind Inflection AI's decision is not public. However, its press release indicates a preference for enterprises over consumers and infrastructure over applications. This shift is hardly surprising: enterprise infrastructure is one of the best risk-adjusted AI bets, owing to its position in the AI technology stack. The stack has four layers, each like a component of a skyscraper. Below, you'll find what each layer does and its current leader.
Infrastructure (e.g., Nvidia and Microsoft): Hardware (e.g., GPUs) and cloud computing power AI models at every stage, including training, fine-tuning, and querying. They form the steel beams of the skyscraper's foundation.
Foundational Models (e.g., OpenAI's GPT-4): Rising from this foundation are the central pillars of LLMs and multimodal AI models. These are templates that developers customize to solve specific problems for specific users.
Machine Learning Operations (e.g., Databricks): Connecting the central pillars to the floors are Machine Learning Operations, the tools and frameworks for training, deploying, monitoring, and improving the performance of AI models.
Applications (e.g., OpenAI's ChatGPT): People live and work in the rooms of skyscrapers, which are akin to user-facing applications. Both generalist (e.g., Google's Gemini) and specialist models (e.g., CanvaGPT) solve problems for us.
DEVELOPERS' DILEMMA
As we recently outlined, big, medium, and small corporations are hungry to integrate generative AI into their products and services. For non-native AI companies, this means deploying applications. Though most use cases are currently unproven, potential advances in productivity and profitability justify enormous upfront investments in foundational models.
Foundational models are templates for specialized AI models. Want to build a chatbot that excels at copywriting, coding, or legal document analysis? Instead of training it from scratch, which takes hundreds of millions of parameters and millions of dollars of computing power, developers prefer to build on top of foundational models. It's only natural, and perhaps necessary, that cash-flush companies like Microsoft, Meta, Google, and Amazon back foundational models like GPT-4 (via OpenAI), Llama 2, Gemini, and Titan, respectively. Granted, some foundational models, like Anthropic's Claude 3, are independently run, but Google and Amazon have invested $2 billion and $4 billion in Anthropic, respectively. Aside from Cohere, which partnered with Oracle, nearly all leading foundational models rely on cloud computing services from Amazon, Microsoft, or Google, which collectively control two-thirds of the global market.
From an antitrust perspective, this isn't inherently problematic. However, in certain jurisdictions, like the U.K., the Amazon and Microsoft duopoly commands 70% to 80% of the cloud market and has drawn regulatory scrutiny. Cloud AI, then, is hardly a great democratizer for developers: steep discounts and technical barriers to changing providers create lock-in. This dynamic is not new. In mobile app development, Apple's (iOS) and Google's (Android) operating systems control app distribution on their respective devices. Historically, Apple charged developers a 30% commission, with exceptions for small businesses at 15%. Recently, in response to a European Union (EU) probe, Apple adopted lower rates for EU developers. Google has also complied with recent EU guidelines, reducing fees on in-app purchases to 10% and on subscriptions to 5% during an application's first two years of operation.
Microsoft's incorporation of Inflection AI preserves the status quo for application developers, who must comply with whatever profit-maximizing fee structure Big Tech executives select. This implication of Inflection AI's decision is important, but market-consolidation motives don't quite explain its shift in investment away from AI companions.
CHATBOT MEMORY
According to renowned venture capital firm Andreessen Horowitz, AI companions are the stickiest generative AI consumer apps. The average AI companion user conducts nearly 200 conversational sessions monthly, whereas educational, content-generating, and general assistants garner roughly one-tenth the engagement. In addition, Pi retained 60% of users weekly, and 10% of sessions lasted more than an hour.
Pi's context window may be lacking, though Inflection AI didn't share details. The context window is one of the most critical AI companion features: it enables companion chatbots to learn and recall users' experiences and stories. The crux of the problem with expanding the context window is the sheer volume of calculations and memory needed. The more tokens the AI model considers at once, the more complex and memory-intensive these calculations become. Serving a hyper-personalized companion chatbot would likely require thousands of gigabytes (GB) of high-bandwidth memory, significantly beyond modern GPU capacity. One solution is Ring Attention, a framework that breaks the user's prompt into smaller, manageable blocks. Separate devices arranged in a ring-like structure process each block simultaneously. As each device finishes with its block, it passes crucial information to the next device in the ring, ensuring a continuous flow of context without overloading any single device. Despite these advances, Ring Attention remains a work in progress, and context windows must grow further before chatbots can ingest and weigh entire human stories when providing coaching or companionship to end-users.
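The ring-passing pattern can be sketched in a few lines. This is a toy simulation of the communication schedule only, with no attention math and an arbitrary device count: each simulated device processes whichever key/value block it currently holds, then hands it to its neighbor, so every device eventually sees the full context without ever holding all of it at once.

```python
def ring_attention_schedule(num_devices):
    """Return, per device, the order in which it sees each KV block."""
    seen = {d: [] for d in range(num_devices)}
    # Device d initially holds KV block d.
    held = list(range(num_devices))
    for _ in range(num_devices):
        for d in range(num_devices):
            seen[d].append(held[d])  # process the block currently held
        # Pass each block to the next device in the ring.
        held = [held[(d - 1) % num_devices] for d in range(num_devices)]
    return seen

for device, blocks in ring_attention_schedule(4).items():
    print(f"device {device} processed KV blocks in order: {blocks}")
```

In the real framework, each pass is overlapped with blockwise attention computation on GPUs or TPUs, which is what lets the ring hide communication cost behind compute.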
KEY TAKEAWAY
With unproven monetization and technical limitations, Inflection AI's decision is rational. As more generative AI application builders shift down the technology stack to foundational model development and deployment, consumers may have to wait longer for killer AI apps.
FINAL NOTE
If you found this useful, follow us on Twitter or provide honest feedback below. It helps us improve our content.
How was today's newsletter?
AI Pulse Review of The Week
"These Sunday Specials put a smile on my face."
NOTION TEMPLATES
Subscribe to our newsletter for free and receive these powerful Notion templates:
150 ChatGPT prompts for Copywriting
325 ChatGPT prompts for Email Marketing
Simple Project Management Board
Time Tracker
