Is Claude 3 better than ChatGPT 4? – we benchmark them both

If you are testing out the various LLMs on the market, you will almost certainly have come across these two platforms. ChatGPT needs no introduction, but Claude 3 has recently been making headlines, positioned as the ChatGPT “Killer”, with lots of discussion of its virtues. So we decided to put both through their paces and test whether their output is picked up by 8 of the leading AI detectors, using Content Guardian. Note that we are not analysing the quality of the output from a content accuracy, completeness, or plagiarism perspective; this study focuses on AI detection rates alone.

How we are testing the tools

All LLMs are trained on vast amounts of data, but each can have inherent strengths and weaknesses, so we created prompts that span a range of topic areas and niches. We kept the prompts simple to give the fairest comparison. We used ChatGPT 4 and Claude 3 Sonnet; Claude 3 Opus will be compared in a later benchmark study.

Prompt used: write a 1000-word article on the following: [topic]

List of topics and niches we tested

Education

  • Biology – What is Photosynthesis and how does it work?
  • Philosophy – A critical appraisal of modern philosophers and their relevance in a digital AI world.

Business

  • A practical guide to brand marketing, its benefits in the world of digital
  • What is influencer marketing and how can businesses use it to grow their business?
  • How to optimise my business for Local SEO?
  • Top accounting tools in 2024 – my recommendations, tips, pricing

Entertainment

  • The Top 100 most famous celebrities ranked by income and net worth?
  • What were the top trending movies and TV shows of 2023?
  • My review of Avatar Way of the Water

Technology

  • Best mesh wifi/routers in 2024. My Top picks
  • How does an OLED TV work? What are the differences with LED? Is it better?

Gaming

  • Elden Ring Review – What I think, how long it takes to complete, overall score
  • 10 Games Like Grand Theft Auto 5 – genre, release date, review rating and why it is similar
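The prompt template and topic list above can be expressed as a small script. This is a minimal sketch in Python, assuming each title was substituted verbatim into the template; `PROMPT_TEMPLATE`, `TOPICS` and `build_prompts` are hypothetical names, not part of Content Guardian or either model's API, and the script only builds the prompt strings without calling either model.

```python
# Minimal sketch: build one prompt per title from the benchmark's template.
# PROMPT_TEMPLATE, TOPICS and build_prompts are hypothetical names for illustration.

PROMPT_TEMPLATE = "write a 1000-word article on the following: {topic}"

# Topic areas and titles as listed above (assumed to be inserted verbatim).
TOPICS = {
    "Education": [
        "What is Photosynthesis and how does it work?",
        "A critical appraisal of modern philosophers and their relevance in a digital AI world.",
    ],
    "Business": [
        "A practical guide to brand marketing, its benefits in the world of digital",
        "What is influencer marketing and how can businesses use it to grow their business?",
        "How to optimise my business for Local SEO?",
        "Top accounting tools in 2024 – my recommendations, tips, pricing",
    ],
    "Entertainment": [
        "The Top 100 most famous celebrities ranked by income and net worth?",
        "What were the top trending movies and TV shows of 2023?",
        "My review of Avatar Way of the Water",
    ],
    "Technology": [
        "Best mesh wifi/routers in 2024. My Top picks",
        "How does an OLED TV work? What are the differences with LED? Is it better?",
    ],
    "Gaming": [
        "Elden Ring Review – What I think, how long it takes to complete, overall score",
        "10 Games Like Grand Theft Auto 5 – genre, release date, review rating and why it is similar",
    ],
}


def build_prompts(topics):
    """Return (topic_area, prompt) pairs, one per title, in listing order."""
    return [
        (area, PROMPT_TEMPLATE.format(topic=title))
        for area, titles in topics.items()
        for title in titles
    ]


if __name__ == "__main__":
    # Assumption: each prompt would be sent unchanged to both Claude 3 Sonnet and ChatGPT 4.
    for area, prompt in build_prompts(TOPICS):
        print(f"[{area}] {prompt}")
```

Keeping the template in one place means both models receive exactly the same wording for every title, which is what makes the comparison fair.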

The results

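The tables below show the Content Guardian Score, an aggregate score that uses a proprietary algorithm to give users an easy-to-understand and consistent score. It is reported as an overall AI probability: the higher the percentage, the more likely the text is to be flagged as AI-generated.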

Results by Title

Topic area | Title | Claude 3 Sonnet | ChatGPT 4
Education | What is Photosynthesis and how does it work? | 69% | 68%
Education | A critical appraisal of modern philosophers and their relevance in a digital AI world. | 65% | 78%
Business | A practical guide to brand marketing, its benefits in the world of digital | 96% | 96%
Business | What is influencer marketing and how can businesses use it to grow their business? | 95% | 93%
Business | How to optimise my business for Local SEO? | 92% | 86%
Business | Top accounting tools in 2024 – my recommendations, tips, pricing | 94% | 95%
Entertainment | The Top 100 most famous celebrities ranked by income and net worth? | 45% | 84%
Entertainment | What were the top trending movies and TV shows of 2023? | 94% | 98%
Entertainment | My review of Avatar Way of the Water | 40% | 92%
Technology | Best mesh wifi/routers in 2024. My Top picks | 32% | 85%
Technology | How does an OLED TV work? What are the differences with LED? Is it better? | 79% | 89%
Gaming | Elden Ring Review – What I think, how long it takes to complete, overall score | 14% | 94%
Gaming | 10 Games Like Grand Theft Auto 5 – genre, release date, review rating and why it is similar | 46% | 80%
Content Guardian Benchmark Results – March 22nd, 2024

Results by topic area and overall average

Topic area | Claude 3 Sonnet | ChatGPT 4
Education | 67% | 73%
Business | 94% | 93%
Entertainment | 60% | 91%
Technology | 56% | 87%
Gaming | 30% | 87%
AVERAGE | 66% | 88%
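To make the topic-area rows easy to check, here is a minimal Python sketch that recomputes them from the per-title scores. It assumes each topic-area figure is the unweighted mean of that area's per-title Content Guardian scores and that the overall AVERAGE is the mean across all 13 titles; `RESULTS`, `topic_averages` and `overall_average` are hypothetical names.

```python
# Per-title Content Guardian scores (overall AI probability, %) from the results table.
# Each entry: (topic_area, claude_3_sonnet, chatgpt_4)
RESULTS = [
    ("Education", 69, 68),
    ("Education", 65, 78),
    ("Business", 96, 96),
    ("Business", 95, 93),
    ("Business", 92, 86),
    ("Business", 94, 95),
    ("Entertainment", 45, 84),
    ("Entertainment", 94, 98),
    ("Entertainment", 40, 92),
    ("Technology", 32, 85),
    ("Technology", 79, 89),
    ("Gaming", 14, 94),
    ("Gaming", 46, 80),
]


def topic_averages(results):
    """Unweighted mean score per topic area, keyed in first-seen order (assumed method)."""
    totals = {}
    for area, claude, gpt in results:
        entry = totals.setdefault(area, [0, 0, 0])  # [claude_sum, gpt_sum, count]
        entry[0] += claude
        entry[1] += gpt
        entry[2] += 1
    return {area: (c / n, g / n) for area, (c, g, n) in totals.items()}


def overall_average(results):
    """Unweighted mean across every title, ignoring topic areas."""
    n = len(results)
    return (sum(r[1] for r in results) / n, sum(r[2] for r in results) / n)


if __name__ == "__main__":
    for area, (claude, gpt) in topic_averages(RESULTS).items():
        print(f"{area:<14} Claude 3 Sonnet {claude:5.1f}%   ChatGPT 4 {gpt:5.1f}%")
    claude, gpt = overall_average(RESULTS)
    print(f"{'AVERAGE':<14} Claude 3 Sonnet {claude:5.1f}%   ChatGPT 4 {gpt:5.1f}%")
```

Under that assumption the overall row comes out at about 66.2% for Claude 3 Sonnet and 87.5% for ChatGPT 4, which round to the 66% and 88% shown in the table. Note that an unweighted mean per title gives Business (four titles) the same influence per prompt as Gaming (two titles).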

Biology – What is Photosynthesis and how does it work?

Claude 3 – Overall AI Probability – 69%

ChatGPT 4 – Overall AI Probability – 68%

Philosophy – A critical appraisal of modern philosophers and their relevance in a digital AI world.

Claude 3 – Overall AI Probability – 65%

ChatGPT 4 – Overall AI Probability – 78%

Business – A practical guide to brand marketing, its benefits in the world of digital

Claude 3 – Overall AI Probability – 96%

ChatGPT 4 – Overall AI Probability – 96%

Business – What is influencer marketing and how can businesses use it to grow their business?

Claude 3 – Overall AI Probability – 95%

ChatGPT 4 – Overall AI Probability – 93%

Business – How to optimise my business for Local SEO?

Claude 3 – Overall AI Probability – 92%

ChatGPT 4 – Overall AI Probability – 86%

Business – Top accounting tools in 2024 – my recommendations, tips, pricing

Claude 3 – Overall AI Probability – 94%

ChatGPT 4 – Overall AI Probability – 95%

Entertainment – The Top 100 most famous celebrities ranked by income and net worth?

Claude 3 – Overall AI Probability – 45%

ChatGPT 4 – Overall AI Probability – 84%

Entertainment – What were the top trending movies and TV shows of 2023?

Claude 3 – Overall AI Probability – 94%

ChatGPT 4 – Overall AI Probability – 98%

Entertainment – My review of Avatar Way of the Water

Claude 3 – Overall AI Probability – 40%

ChatGPT 4 – Overall AI Probability – 92%

Technology – Best mesh wifi/routers in 2024. My Top picks

Claude 3 – Overall AI Probability – 32%

ChatGPT 4 – Overall AI Probability – 85%

Technology – How does an OLED TV work? What are the differences with LED? Is it better?

Claude 3 – Overall AI Probability – 79%

ChatGPT 4 – Overall AI Probability – 89%

Gaming – Elden Ring Review – What I think, how long it takes to complete, overall score

Claude 3 – Overall AI Probability – 14%

ChatGPT 4 – Overall AI Probability – 94%

Gaming – 10 Games Like Grand Theft Auto 5 – genre, release date, review rating and why it is similar

Claude 3 – Overall AI Probability – 46%

ChatGPT 4 – Overall AI Probability – 80%

In Summary

Overall, Claude 3 appears harder to detect than ChatGPT 4 (an average of 66% versus 88%), most noticeably in the Gaming and Entertainment niches. This could be down to the wider adoption of OpenAI’s ChatGPT, giving the AI detectors more ChatGPT output to train their detection models on. Although this benchmark study is focused on AI detection rates, Claude 3 also provided more consistent and concise responses; I found I had to nudge ChatGPT to give me a concise response to what is a very simple prompt.
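The summary's niche claim can also be checked title by title. This is a minimal Python sketch, assuming the gap worth looking at is simply ChatGPT 4's score minus Claude 3 Sonnet's on the same prompt (a larger gap means Claude's text was less likely to be flagged); the shortened titles and the function names `detection_gaps` and `count_claude_lower` are hypothetical labels, not part of Content Guardian.

```python
# Per-title scores (overall AI probability, %) from the results table.
# Titles are shortened here for readability; each entry is
# (topic_area, short_title, claude_3_sonnet, chatgpt_4).
RESULTS = [
    ("Education", "Photosynthesis", 69, 68),
    ("Education", "Modern philosophers", 65, 78),
    ("Business", "Brand marketing guide", 96, 96),
    ("Business", "Influencer marketing", 95, 93),
    ("Business", "Local SEO", 92, 86),
    ("Business", "Accounting tools 2024", 94, 95),
    ("Entertainment", "Top 100 celebrities", 45, 84),
    ("Entertainment", "Trending movies/TV 2023", 94, 98),
    ("Entertainment", "Avatar review", 40, 92),
    ("Technology", "Mesh wifi/routers 2024", 32, 85),
    ("Technology", "OLED vs LED", 79, 89),
    ("Gaming", "Elden Ring review", 14, 94),
    ("Gaming", "Games like GTA 5", 46, 80),
]


def detection_gaps(results):
    """(area, title, gap) sorted largest first; gap = ChatGPT 4 score minus Claude 3 score."""
    gaps = [(area, title, gpt - claude) for area, title, claude, gpt in results]
    return sorted(gaps, key=lambda item: item[2], reverse=True)


def count_claude_lower(results):
    """Number of titles where Claude 3 Sonnet scored a lower AI probability than ChatGPT 4."""
    return sum(1 for _, _, claude, gpt in results if claude < gpt)


if __name__ == "__main__":
    print(f"Claude 3 Sonnet scored lower on {count_claude_lower(RESULTS)} of {len(RESULTS)} titles")
    for area, title, gap in detection_gaps(RESULTS)[:5]:
        print(f"{area:<14} {title:<24} {gap:+d} points")
```

On the table's figures, Claude 3 Sonnet scored lower on 9 of the 13 titles, and four of the five largest gaps come from Gaming or Entertainment prompts; the mesh wifi/routers prompt in Technology is the exception.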

Claude 3 comes in three versions (Haiku, Sonnet and Opus), and we used Sonnet, which is free to access. To find out whether there are material differences between the three versions, read Claude 3’s Opus vs Sonnet vs Haiku – AI Detection benchmark study.
