The AI Pulse

🧠 ChatGPT’s Creativity Is Overrated

PLUS: 3 Obstacles Preventing Chatbots From Matching Human Creativity

Rohun Shroff
February 25, 2024

Welcome back, AI prodigies!

In today’s Sunday Special:

🎨Is Generative AI Creative?
🦾GPT-4 vs. College Students
🚧Three Obstacles
🔑Key Takeaway

Read Time: 6 minutes

🎓Key Terms

Torrance Tests of Creative Thinking: a creativity assessment that tests convergent and divergent thinking through various verbal and nonverbal tasks.
Retrieval-Augmented Generation (RAG): a framework designed to make language models more reliable and accurate by pulling relevant, up-to-date data directly related to a user’s query from a source (e.g., a scientific journal or news article).

🎨IS GENERATIVE AI CREATIVE?

Some researchers think so. Last year, a professor pitted GPT-4 against college students in the Torrance Tests of Creative Thinking, the most widely used creativity assessment. Before we share the results and their potential implications, let’s define creativity: it requires both novelty and utility. A creative idea combines existing things in a new and helpful way or produces entirely new things that serve a purpose. But there’s something abstract, perhaps even magical, about how we create novel ideas. We’ve all experienced the “Aha!” moment, but discerning where it came from and how to replicate that process is nearly impossible.

🦾GPT-4 VS. COLLEGE STUDENTS

The Torrance Tests contain three sections, each with multiple challenges. Each task has a time limit based on age, test objective, and other factors.

Verbal Tasks Using Verbal Stimuli:
1. Impossibilities: List as many impossibilities as possible.
2. Just Suppose: Confronted with an unlikely scenario, subjects must predict potential outcomes. New variables will be introduced throughout the exercise to influence their predictions.
Verbal Tasks Using Non-Verbal Stimuli:
1. Ask and Guess: Ask non-obvious questions about a picture. Hypothesize the causes and effects of the scenario in the picture.
2. Unusual Uses: Think of the most clever, engaging, and uncommon uses of a toy or any object.
Non-Verbal Tasks (Excluded From the GPT-4 vs. College Students Duel):
1. Circles and Squares: On a page with 42 circles of equal size, sketch objects or pictures that use the circles. Repeat for squares.
2. Incomplete Figures: A page contains ten squares, each containing a different stimulus drawing. Sketch objects or designs by adding as many lines as possible to the ten figures.

Results are scored based on four categories: fluency, flexibility, originality, and elaboration. Fluency describes the total number of interpretable, meaningful, and relevant ideas generated, and flexibility refers to the number of different categories of appropriate responses. Originality reflects how uncommon the responses are, and elaboration reflects how much detail they include. How do you think GPT-4 fared against college students?
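Before you guess, here’s a rough Python sketch of how the first two scores could be tallied. It assumes each response has already been judged relevant and hand-labeled with a category. The function name, the brick example, and the category labels are all made up for illustration; real Torrance scoring is done by trained raters, not a script.

```python
def score_fluency_and_flexibility(labeled_responses):
    """Return (fluency, flexibility) for a list of (response, category) pairs.

    Assumes every pair was already judged interpretable and relevant,
    and that a human rater assigned the category labels.
    """
    # Fluency: the total number of relevant ideas.
    fluency = len(labeled_responses)
    # Flexibility: the number of distinct categories those ideas fall into.
    flexibility = len({category for _, category in labeled_responses})
    return fluency, flexibility


# Hypothetical "Unusual Uses" answers for a brick, with made-up categories.
brick_uses = [
    ("doorstop", "weight"),
    ("paperweight", "weight"),
    ("garden border", "construction"),
    ("bookend", "weight"),
    ("step stool", "furniture"),
]

fluency, flexibility = score_fluency_and_flexibility(brick_uses)
print(f"Fluency: {fluency}, Flexibility: {flexibility}")
# prints "Fluency: 5, Flexibility: 3"
```

Five ideas across three categories means a long list of near-identical answers boosts fluency but not flexibility.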

🩺 PULSE CHECK

What percent of college students did GPT-4 beat in creativity?

🚧THREE OBSTACLES

If you’ve tinkered with OpenAI’s ChatGPT, GPT-4’s strong showing shouldn’t be too surprising. The AI model was trained on text from across the web and generates responses by repeatedly predicting the most likely next word. Despite this impressive performance, chatbots have severe limitations that prevent them from replacing humans in creative situations.

They Can’t Apply Creative Output: Let’s say you ask OpenAI’s ChatGPT to come up with 20 names for a clothing business. You still have to check whether they’re already taken and culturally appropriate. The suggestions also don’t reflect your personal experiences. For more complex requests, like vacation planning, chatbot output is, at best, a starting point. When I asked for a four-week European itinerary, Google’s Gemini failed to include the links to accommodations, transportation, and restaurants that I specifically requested. It can describe destinations in flowery language, but it can’t help book anything.
They Sacrifice Utility for Accuracy: Often, the most accurate response won’t be actionable for the user. When asked to list former President Trump’s indictments, Google’s Gemini declined to answer and directed users to Google Search. OpenAI’s ChatGPT, on the other hand, listed some of them but omitted others. Although accuracy is essential, no one wants to use a product that doesn’t give them the information they want.
They Hallucinate: Although ChatGPT’s hallucination rate of 3% beats its competitors’, hallucinations become much more frequent as queries get more complex. For high-stakes tasks like school or work, double-checking outputs via Google Search or other external sources is necessary. Developers address this through an AI framework called Retrieval-Augmented Generation (RAG). Instead of relying solely on vast training data to generate a response, RAG-enabled chatbots pull information from smaller, high-quality datasets, like Wikipedia, published research papers, or legal documents. Although building RAG into your chatbot requires technical know-how, a manual version is also possible: paste the text you want the chatbot to reference into the prompt and ask it to base its answer on that text. The size of your high-quality dataset is then limited by the maximum prompt length, but responses should be more accurate.
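For the curious, the manual “paste and ask” approach can be sketched in a few lines of Python. This only builds the prompt text; it doesn’t call any chatbot. The function name, the instruction wording, and the 4,000-character cap are illustrative assumptions, not real limits or an official RAG implementation.

```python
def build_grounded_prompt(reference_text, question, max_reference_chars=4000):
    """Combine pasted reference text and a question into one prompt string."""
    # Prompts have a length limit, so trim the reference if it is too long.
    # The 4,000-character default is an arbitrary placeholder, not a real limit.
    trimmed = reference_text[:max_reference_chars]
    return (
        "Answer the question using only the reference text below. "
        "If the answer is not in the text, say you don't know.\n\n"
        f"Reference text:\n{trimmed}\n\n"
        f"Question: {question}"
    )


# Hypothetical reference text and question.
reference = (
    "The Torrance Tests score responses on fluency, flexibility, "
    "originality, and elaboration."
)
question = "Which four categories do the Torrance Tests score?"
prompt = build_grounded_prompt(reference, question)
print(prompt)
```

You’d paste the printed prompt into the chatbot of your choice. Telling the model to admit when the answer isn’t in the text is a common way to discourage guessing, though it’s no guarantee.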

🔑KEY TAKEAWAY

We’re still in the earliest innings of conversational AI. Chatbots can only reason by analogy, repeating or rewording past writing. They can’t generate novel, useful, and feasible ideas, never mind accurate ones. At least not yet. Experts disagree on whether the hallucination problem is solvable. But even with 100% accuracy, chatbots’ creative outputs would have severe limitations. In narrow, structured assessments, they’re quick brainstormers. For now, though, most problems require a combination of skills and knowledge that can’t be programmed.

📒FINAL NOTE

If you found this useful, follow us on Twitter or provide honest feedback below. It helps us improve our content.

How was today’s newsletter?

❤️AI Pulse Review of The Week

“It’s always a great read, with simple and clear sections.”

-Tucker (⭐️⭐️⭐️⭐️⭐️Nailed it!)
