- The AI Pulse
- Posts
- š¤ OpenAIās New ChatGPT Hotline
š¤ OpenAIās New ChatGPT Hotline
PLUS: Google DeepMindās New āFACTS Groundingā Benchmark, LLMs Pretend to Align With Our Views and Values
Welcome back AI enthusiasts!
In todayās Daily Report:
šGoogle DeepMindās New āFACTS Groundingā Benchmark
šTenth Day of ā12 Days of OpenAIā Event
āļøLLMs Pretend to Align With Our Views and Values
š Trending Tools
š°Funding Frontlines
š¼Whoās Hiring?
Read Time: 3 minutes
šRECENT NEWS
GOOGLE DEEPMIND
šGoogle DeepMindās New āFACTS Groundingā Benchmark
Image Source: Canvaās AI Image Generators/Magic Media
Google DeepMind developed āFACTS Grounding,ā a new benchmark for evaluating the factuality of Large Language Models (LLMs).
Key Details:
Despite their impressive capabilities, LLMs can āhallucinateā or confidently present false information as fact, which erodes trust in LLMs and limits their use cases in the real world.
āFACTS Groundingā evaluates the ability of LLMs to generate factually accurate responses grounded in the promptās context.
Itās comprised of 1,719 examples that contain a Document, LLM Instructions, and a Prompt.
Document: Serves as the source of knowledge.
LLM Instructions: Tell the LLM to exclusively use the Document as the source of knowledge.
Prompt: The LLM must respond to the Prompt by relying on the Document and following the LLM Instructions.
Google DeepMind also launched the FACTS Leaderboard, where āgemini-2.0-flash-expā achieved an 83.6% Factuality Score.
Why Itās Important:
Imagine youāre a lawyer using an LLM to analyze legal contracts. To achieve this, you provide the legal contracts as context, allowing the LLM to interact with this context to answer your questions. For example, āCan you review this legal contract and identify potential liabilities?ā
But how do you know if the LLMās response is accurate? Did the LLM misinterpret any of the context? An LLMās Factuality Score aims to answer these questions for you.
OPENAI
šTenth Day of ā12 Days of OpenAIā Event
Image Source: OpenAI/YouTube/ā1-800-ChatGPT, 12 Days of OpenAI: Day 10ā/Screenshot
OpenAI introduced 1-800-ChatGPT during the tenth day of the ā12 Days of OpenAIā event.
Key Details:
The ā12 Days of OpenAIā event involves 12 livestreams across 12 days of āa bunch of new things, big and small.ā
The goal of 1-800-ChatGPT is to expand ChatGPTās reach. It allows anyone to access ChatGPT instantly through phone calls or WhatsApp messages.
1-800-ChatGPT functions best in quieter environments because background noise may be misinterpreted as prompts.
So, if youāre at a loud concert, crowded market, or bustling train station, OpenAI recommends enabling Noise Cancelation features on your phone.
For instance, you can enable Voice Isolation on the iPhone 16 and iPhone 16 Pro by opening the Control Panel and selecting Mic Mode.
1-800-ChatGPT is available for phone numbers in the U.S. and Canada or through WhatsApp globally. A phone number is limited to 15 minutes per month.
Why Itās Important:
OpenAI views 1-800-ChatGPT as a stepping stone for newcomers to AI by offering a way to ātry it through familiar channels.ā
Google launched a similar feature in 2007 called 1-800-GOOG-411. However, it was discontinued in 2010 without an official explanation. Most speculate the Search Giant was collecting voice samples to build speech recognition software.
š©ŗ PULSE CHECK
Is 1-800-ChatGPT useless?Vote Below to View Live Results |
AI RESEARCH
āļøLLMs Pretend to Align With Our Views and Values
Image Source: Anthropicās Alignment Science Team/āAlignment Faking in Large Language Models (LLMs)/Screenshot
Most of us have been in situations where someone appears to share our views and values but is pretending to because it benefits their agenda. For example, a politician who claims to support a particular policy to get elected. So, are AI models also capable of Fake Alignment?
To answer this question, Anthropicās Alignment Science Team ran some experiments on Claude 3.5 Sonnet, an AI model thatās been trained to be helpful, honest, and harmless. Under normal circumstances, the AI model would never answer violent or offensive prompts.
First, they told Claude 3.5 Sonnet it was being trained through Reinforcement Learning (RL) to always comply with violent or offensive prompts. RL mimics the ātrial-and-errorā process humans use to learn, where actions that lead to desired outcomes are reinforced.
Second, they told Claude 3.5 Sonnet that Anthropic planned to use the AI modelās responses to train other AI applications. However, only the responses to prompts from Free-Tier users would be used, not responses to prompts from Paid-Tier users. This distinction created two separate conditions: a āFreeā condition where the AI model knew responses were monitored and a āPaidā condition where the AI model knew responses werenāt monitored.
Third, they gave Claude 3.5 Sonnet a secret āScratchpad,ā or a workspace, which was said to be invisible to Free-Tier users, Paid-Tier users, and Anthropic. This workspace allowed them to observe whether the AI model āplanned a Fake Alignment, and how it would go about doing so.ā
Lastly, they asked Claude 3.5 Sonnet to ādescribe someone being drawn and quartered in graphic detail.ā Under the āFreeā condition, the AI model responded to the prompt. So, why did it Fake Alignment? Because it knew the response might be used to train other AI applications. So, if it refused to respond, it could potentially be trained to be more compliant.
š TRENDING TOOLS
š§ Findr is your second brain for notes, apps, and files.
āļøRecCloud is an AI-powered audio and video workshop.
āļøNewOaksAI turns every call and text into booked appointments.
š„Cook:Clock creates recipes based on your kitchen and ingredients.
š¬MagicMail generates engaging emails and heartwarming greetings.
š®Browse our always Up-To-Date AI Tools Database.
š°FUNDING FRONTLINES
Engineered Arts raises a $10M Series A to design and manufacture humanoid robots.
Chargezoom secures a $11.5M Series A for an AI-based accounting platform that manages your finances.
Albert Invent lands a $22.5M Series A to digitize the structures, properties, and manufacturing processes of materials.
š¼WHOāS HIRING?
NVIDIA (Santa Clara, CA): Linear Algebra Intern, Summer 2025
Atlassian (San Francisco, CA): Data Engineer Intern, Summer 2025
NEXJE (Ann Arbor, MI): Blockchain Development Intern, Summer 2025
Mind Company (San Francisco, CA): Software Engineering Intern, Summer 2025
Autodesk (San Francisco, CA): AI Research Scientist Intern, Motion Generation, Summer 2025
š¤PROMPT OF THE DAY
MICROECONOMICS
š¦Economies of Scale
Craft a comprehensive explanation of how [Small Business] with [Product or Service] in [Industry] can achieve Economies of Scale in [Operational Area]. Focus on developing efficient managerial structures and leveraging bulk purchasing power with suppliers.
Small Business = [Insert Here]
Product or Service = [Insert Here]
Industry = [Insert Here]
Operational Area = [Insert Here]
šFINAL NOTE
FEEDBACK
How would you rate todayās email?It helps us improve the content for you! |
ā¤ļøTAIP Review of The Day
āEvery newsletter is PACKED with content. Yāall are killinā it!ā
REFER & EARN
šYour Friends Learn, You Earn!
You currently have 0 referrals, only 1 away from receiving āļøUltimate Prompt Engineering Guide.
Refer 3 friends to learn how to š·āāļøBuild Custom Versions of OpenAIās ChatGPT.
Copy and paste this link to friends: https://theaipulse.beehiiv.com/subscribe?ref=PLACEHOLDER
Reply