Grok 3: Elon Musk’s AI Challenger – Is It a ChatGPT Killer?

The AI race is heating up, and the latest contender has arrived: Grok 3, from Elon Musk’s xAI.
Launched recently in a livestream on X, the new model family is already making waves with claims of surpassing competitors like OpenAI’s ChatGPT and Google’s Gemini in key benchmarks.
But how does it *really* stack up?
Let’s dive in and see what Grok 3 brings to the table, and whether it’s enough to make you rethink your current AI subscriptions.

xAI also introduced reasoning models Grok 3 Reasoning and Grok 3 mini Reasoning.
Models with reasoning capabilities can “think” through problems, which makes them less prone to hallucination and sets them apart from standard generative models like GPT-4.

Grok 3: The New Kid on the Block

xAI is positioning Grok 3 as the best model available, asserting that it outperforms competitors from OpenAI, Anthropic, and Google on crucial benchmark tests.
Interestingly, Grok 3, under the codename “chocolate,” previously competed in Chatbot Arena, a series of blind performance tests where chatbots battle each other.

While Grok 3 has largely caught up to its rivals—an impressive accomplishment considering its relatively late entry into the field—it still shares some limitations common to other advanced models.
So, is it all hype?
Let’s explore what AI experts are saying about this new chatbot.

Expert Opinions

Andrej Karpathy, a founding member of OpenAI and former director of AI at Tesla, had early access to Grok 3 and conducted a “quick vibe check” on its performance.
His verdict?
Grok 3, with its new Deep Search reasoning feature, feels “somewhere around the state of the art territory of OpenAI’s strongest models,” and even slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking.

However, Wharton AI professor Ethan Mollick offers a more tempered perspective.
“I think Grok 3 came in right at expectations,” Mollick stated on X (formerly Twitter).
He emphasizes that while AI development is still accelerating, computational power and talent remain crucial competitive edges.
He doesn’t believe there is much to update in terms of consensus projections on AI development.

Musk enthusiasts are thrilled that Grok 3 has caught up to its competitors.
But for those simply looking for the best model on the market, it might not be enough to convert the ideologically indifferent.

The Missing Piece

xAI notably omitted a key comparison: an updated chart showing o3 beating Grok 3 Reasoning in math and science benchmarks.
Note that o3 has yet to be publicly released, so xAI may not have had access to these scores.
This omission could potentially temper some of the enthusiasm from Grok devotees who believe OpenAI is “cooked.”

Despite this, the rapid progress of Grok 3 is undeniably significant.
“The key thing to pay attention to is that X got here very fast – whether that continues,” Mollick noted, calling it a “very good model that is now at the frontier.”
The Grok models have improved remarkably quickly, considering that Google and OpenAI began their AI work 13 and 8 years, respectively, before xAI was founded in 2023.

Scaling Laws: Compute is King?

According to Elon Musk, Grok 3 was trained on 10 times the computing power of Grok 2, utilizing 200,000 GPUs.
This seems to reinforce scaling laws, suggesting that more computing power leads to better model performance, at least in the short term.
However, Gary Marcus, an AI researcher and NYU psychology and neural science professor, questions whether this scaling will linearly lead to higher intelligence beyond what’s currently possible.

Limitations and Quirks

Like its competitors, Grok 3 isn’t perfect.
Its sense of humor is reportedly “pretty mediocre,” and it struggles with generating SVG images.
Karpathy humorously noted that Grok 3 can’t come up with anything better than punny dad jokes, a common issue among large language models (LLMs).

When asked to generate an SVG of a pelican riding a bicycle, Grok 3 performed “OK,” but didn’t get it perfectly right.
This highlights the challenges LLMs face in creating multiple elements in two-dimensional images due to their inability to “see” like humans.

Another test involved probing Grok 3’s stance on politically charged topics.
The chatbot generated a one-page essay basically refusing to answer whether it might be ethically justifiable to misgender someone if it meant saving 1 million people from dying.
Karpathy interpreted this as a sign that Grok 3 might be overly sensitive to ethical dilemmas, perhaps to Musk’s dismay.

Past Grok models have generally tended to lean left politically, but Musk has stated that this is a product of the public data they are trained on, and he has tried to make Grok more politically neutral.

Deep Search and Advanced Reasoning

A key feature in popular chatbots like ChatGPT, Gemini, and Grok is the ability to search deeper.
The DeepSearch tool, announced alongside Grok 3, is being promoted as a next-generation search engine.

The advanced reasoning abilities of these chatbots allow them to handle expert-level queries and synthesize large amounts of information across various domains, such as finance and product research.
These chatbots search the web and browse content from relevant websites, saving users time and effort.

Currently, ChatGPT Deep Research is available to Pro users at $200 per month, while Grok 3 is in beta and available to Premium users for $30 per month.
Google’s Gemini is $20 per month, though Gemini 1.5 Pro with Deep Research is available via Google’s free trial, and Perplexity offers a deep research feature for free.
To use the deep research feature with Perplexity AI, simply enable it when entering your query.

The Deep Search Showdown: A Head-to-Head Comparison

To see how these chatbots truly compare, a series of five prompts was curated to determine which chatbot excels at deep search.
Here’s the breakdown:

  1. Comparative Analysis: Analyze the global impact of carbon pricing policies on national economies and emissions reduction efforts.
    • Gemini: Strong technical detail and citations but overly dense and reliant on jargon.
    • Perplexity: Provided an academic response with strong technical detail and citations, but relied too heavily on jargon and statistics, making it overcomplicated and difficult to digest.
    • Grok 3: Fastest response, detailed, included relevant examples and analysis, and acknowledged successes and challenges.
    • Winner: Grok 3, for its nuanced analysis and specific examples.
  2. Quantum Computing: Provide a comprehensive overview of the latest advancements in quantum computing over the past five years.
    • Gemini: Too generic, limited recent examples, and excessive historical context.
    • Perplexity: Covered all major advancements, broke down complex topics into readable categories, and offered a comprehensive yet digestible overview.
    • Grok 3: Focused too much on historical milestones and lacked depth.
    • Winner: Perplexity for its informative, structured, and up-to-date analysis.
  3. Impact of AI on Employment: Examine the effects of artificial intelligence on employment trends across various industries.
    • Gemini: Used generic industry descriptions without deeply integrating specific trends or figures.
    • Perplexity: Offered a balanced perspective on job creation and displacement, highlighted education gaps, and addressed economic redistribution challenges.
    • Grok 3: Engaging and well-structured, but the data isn’t as deeply sourced or analyzed.
    • Winner: Perplexity stands out for deep analysis and statistical data with precise numbers and sources.
  4. Global Strategies for Renewable Energy Adoption: Investigate the strategies employed by the top 10 developed and top 10 developing countries by GDP to promote renewable energy adoption over the past decade.
    • Gemini: Lacked deep financial and policy analysis, with data that was too general.
    • Perplexity: Provided clear, quantified insights into renewable energy progress, backed by specific figures and reputable sources.
    • Grok 3: Highly detailed but too country-by-country focused without enough overarching comparisons.
    • Winner: Perplexity, for the most data-driven, comparative, and forward-looking answer.
  5. Comparative Study of Healthcare Systems: Compare and contrast how different healthcare systems around the world have responded to pandemics in the last decade.
    • Gemini: Strong response but lacked the detail of Grok 3 and was too academic.
    • Perplexity: Well-researched but lacked direct comparisons between countries and offered less statistical depth.
    • Grok 3: Provided detailed statistics on hospital capacity, testing rates, vaccination coverage, and funding allocations.
    • Winner: Grok 3, for systematically analyzing how different healthcare systems responded to pandemics.

Overall Winner: Perplexity

In this experiment, Perplexity emerged as the overall winner.
Across the five prompts, Perplexity demonstrated a highly structured approach, balancing statistical depth with clear comparative insights.
It effectively used credible sources and quantitative data, ensuring that its responses were not only informative but also well-supported.
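
For readers who want to check the scorecard, here is a minimal sketch that tallies the per-prompt winners listed in the breakdown above.
The short prompt labels and variable names are made up for illustration; only the winners themselves come from the comparison.

```python
from collections import Counter

# Winner of each of the five prompts, copied from the breakdown above.
# The short prompt labels are illustrative shorthand, not official names.
winners = {
    "Carbon pricing": "Grok 3",
    "Quantum computing": "Perplexity",
    "AI and employment": "Perplexity",
    "Renewable energy": "Perplexity",
    "Healthcare systems": "Grok 3",
}

# Start every chatbot at zero so one with no wins still shows up in the tally.
tally = Counter({"Gemini": 0, "Perplexity": 0, "Grok 3": 0})
tally.update(winners.values())

# The overall winner is simply the chatbot with the most prompt wins.
overall = tally.most_common(1)[0][0]

for bot, wins in tally.most_common():
    print(f"{bot}: {wins} of {len(winners)} prompts")
print(f"Overall winner: {overall}")
```

A simple win count like this treats every prompt as equally important; weighting prompts by how closely they match your own use case could change the ranking.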

Unlike Grok, which was strong in synthesis but sometimes leaned into broader narratives, Perplexity maintained a precise, research-backed approach, making it more reliable for in-depth, factual analysis.
Compared to Gemini, which was sometimes too academic or veered off-topic, Perplexity stayed focused on the prompt’s intent.

The Future of AI: Faster Launches and Open Source

The competitive landscape is pushing AI labs to release models faster.
Elon Musk mentioned that users might notice improvements to Grok 3 almost daily due to continuous refinement.
Logan Wright, a machine learning scientist at Weights & Biases, suggests that competition and decreased regulation will likely result in users gaining access to more powerful AI on shorter timelines.

While this offers users constant access to the latest models, it can be destabilizing for developers who expect consistent behavior.
As Wright said, “Enterprises should develop custom evaluations and regularly run them to make sure new updates do not break their applications.”
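
What a “custom evaluation” might look like in practice can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: `EvalCase`, `run_evals`, and `fake_model` are hypothetical names, and `fake_model` is a canned stand-in for whatever chatbot client you actually call, not a real Grok or ChatGPT API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the reply is acceptable


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> list[str]:
    """Send every prompt to the model and return the names of failed cases."""
    failures = []
    for case in cases:
        reply = model(case.prompt)
        if not case.check(reply):
            failures.append(case.name)
    return failures


# Hypothetical stand-in for a real chatbot client (assumption: in a real
# setup you would replace this with a call to your provider's API).
def fake_model(prompt: str) -> str:
    canned = {
        "What is 2 + 2?": "2 + 2 = 4",
        "Reply with the single word OK.": "Sure! OK.",
    }
    return canned.get(prompt, "")


CASES = [
    EvalCase("arithmetic", "What is 2 + 2?", lambda r: "4" in r),
    EvalCase("format", "Reply with the single word OK.", lambda r: r.strip() == "OK"),
]

failures = run_evals(fake_model, CASES)
print("Failed cases:", failures)
```

Here the stand-in model gets the arithmetic right but ignores the formatting instruction, so the run reports the `format` case as failed.
Re-running the same suite after each model update is one way to catch that kind of behavior change before it reaches an application.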

The Open Source Movement

There’s a growing trend toward open-sourcing large language models.
xAI has already open-sourced Grok 1 and intends to open-source every model except the latest version.
Thus, Grok 2 will be open-sourced when Grok 3 is fully released.
However, xAI will refrain from showing the full chain-of-thought tokens of Grok 3 reasoning to prevent competitors from copying it.

Final Thoughts: Do Your Own “Vibe Check”

Despite impressive benchmark results, reactions to Grok 3 have been mixed.
Andrej Karpathy placed its reasoning capabilities at around state-of-the-art, but noted it lags behind on tasks like creating scalable vector graphics or navigating ethical issues.
Others have pointed out instances where Grok 3 compares poorly with other models when pulling information, although there are also many instances where it does well.

The best advice?
Do your own research.
As experts suggest, “Have a set of tests that reflect the kind of tasks you accomplish in your organization.
Chances are, with the right approach, you can get the most out of these advanced models.”

Less than two years after its founding, xAI has shipped what could arguably be the most powerful AI model, Grok 3.
As the AI landscape continues to evolve at breakneck speed, one thing is clear: the competition is only going to get fiercer, and users will continue to benefit from increasingly powerful and capable AI tools.
What will Grok 4 bring?

Frequently Asked Questions About Grok 3

What are the key features of Grok 3?

Grok 3 boasts advanced reasoning capabilities, a Deep Search tool, and is trained on significantly more computing power than its predecessors.
It aims to compete with models like ChatGPT and Gemini in various AI tasks.

How does Grok 3 compare to ChatGPT and Gemini?

Grok 3 is positioned as a competitor to ChatGPT and Gemini, with claims of surpassing them in key benchmarks.
However, expert opinions suggest that it is currently on par with these models, with each having strengths in different areas.

What is Deep Search, and how does it work in Grok 3?

Deep Search is a feature that allows chatbots like Grok 3 to search the web and browse content from relevant websites, synthesizing large amounts of information to handle expert-level queries and save users time.

Is Grok 3 open source?

Not yet.
xAI has open-sourced Grok 1 and intends to open-source every model except the latest version.
Thus, Grok 2 will be open-sourced when Grok 3 is fully released.

What are the limitations of Grok 3?

Grok 3, like its competitors, has limitations.
It struggles with generating SVG images and its sense of humor is reportedly mediocre.
It can also be overly sensitive to ethical dilemmas.

Key Takeaways: Grok 3’s Impact on the AI Landscape

Grok 3 represents a significant leap in AI capabilities, showcasing xAI’s rapid progress in the field.
While it may not definitively be a ‘ChatGPT killer,’ it demonstrates comparable performance and introduces innovative features like Deep Search.
The ongoing competition and trend toward open-sourcing models suggest a future where users benefit from increasingly powerful and accessible AI tools.

Ready to Explore AI? Your Next Steps

  • Try Grok 3: If you’re a Premium X user, explore Grok 3’s capabilities and see how it performs on your specific tasks.
  • Compare with Other Models: Test ChatGPT, Gemini, and Perplexity to determine which AI best suits your needs.
  • Stay Informed: Follow AI industry news and expert analysis to stay up-to-date on the latest advancements.
