If you are testing the various LLMs on the market, you will almost certainly have come across these two platforms. ChatGPT needs no introduction, but Claude 3 has been making headlines in recent months, positioned as the ChatGPT “Killer”, with plenty of discussion of its virtues. So we decided to put both through their paces and test whether their output can be picked up by 8 of the leading AI detectors, using Content Guardian. It is important to note that we are not analysing the output for content accuracy, completeness, or plagiarism. This study focuses on AI detection rates alone.
How we are testing the tools
All LLMs are trained on vast amounts of data, but each can have inherent strengths and weaknesses, so we created prompts that span a range of topic areas and niches. We kept the prompts simple to give the fairest comparison. We used ChatGPT 4 and Claude 3 Sonnet. Claude 3 Opus will be compared in a later benchmark study.
Prompt used – write a 1000-word article on the following: xxx topic
List of topics and niches we tested
Education
- Biology – What is Photosynthesis and how does it work?
- Philosophy – A critical appraisal of modern philosophers and their relevance in a digital AI world.
Business
- A practical guide to brand marketing, it benefits in the world of digital
- What is influencer marketing and how can businesses use it to grow their business?
- How to optimise my business for Local SEO?
- Top accounting tools in 2024 – my recommendations, tips, pricing
Entertainment
- The Top 100 most famous celebrities ranked by income and net worth?
- What were the top trending movies and TV shows of 2023?
- My review of Avatar Way of the Water
Technology
- Best mesh wifi/routers in 2024. My Top picks
- How does an OLED TV work? What are the differences with LED? Is it better?
Gaming
- Elden Ring Review – What I think, how long it takes to complete, overall score
- 10 Games Like Grand Theft Auto 5 – genre, Release Date, review rating and why it is Similar
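As a rough illustration of the setup, the prompt template above can be expanded into one prompt per title. This is a minimal sketch, not the tooling used in the study: the names `PROMPT_TEMPLATE`, `TOPICS` and `build_prompts` are hypothetical, and the titles are simply copied from the list above.

```python
# Hypothetical helper: expands the study's prompt template for each listed title.
PROMPT_TEMPLATE = "write a 1000-word article on the following: {topic}"

# Topic areas and titles copied from the list above.
TOPICS = {
    "Education": [
        "What is Photosynthesis and how does it work?",
        "A critical appraisal of modern philosophers and their relevance in a digital AI world.",
    ],
    "Business": [
        "A practical guide to brand marketing, it benefits in the world of digital",
        "What is influencer marketing and how can businesses use it to grow their business?",
        "How to optimise my business for Local SEO?",
        "Top accounting tools in 2024 – my recommendations, tips, pricing",
    ],
    "Entertainment": [
        "The Top 100 most famous celebrities ranked by income and net worth?",
        "What were the top trending movies and TV shows of 2023?",
        "My review of Avatar Way of the Water",
    ],
    "Technology": [
        "Best mesh wifi/routers in 2024. My Top picks",
        "How does an OLED TV work? What are the differences with LED? Is it better?",
    ],
    "Gaming": [
        "Elden Ring Review – What I think, how long it takes to complete, overall score",
        "10 Games Like Grand Theft Auto 5 – genre, Release Date, review rating and why it is Similar",
    ],
}


def build_prompts(topics):
    """Return a list of (topic area, prompt) pairs, one per title."""
    return [
        (area, PROMPT_TEMPLATE.format(topic=title))
        for area, titles in topics.items()
        for title in titles
    ]


if __name__ == "__main__":
    for area, prompt in build_prompts(TOPICS):
        print(f"[{area}] {prompt}")
```

The same prompt string would then be given to each model, so any difference in detection score comes from the model rather than the wording.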
The results
The table below shows the Content Guardian Score, an aggregate score that uses a proprietary algorithm to give users an easy-to-understand and consistent result.
Results by Title
Topic Area | Title | Claude 3 Sonnet | ChatGPT 4 |
---|---|---|---|
Education | What is Photosynthesis and how does it work? | 69% | 68% |
Education | A critical appraisal of modern philosophers and their relevance in a digital AI world. | 65% | 78% |
Business | A practical guide to brand marketing, it benefits in the world of digital | 96% | 96% |
Business | What is influencer marketing and how can businesses use it to grow their business? | 95% | 93% |
Business | How to optimise my business for Local SEO? | 92% | 86% |
Business | Top accounting tools in 2024 – my recommendations, tips, pricing | 94% | 95% |
Entertainment | The Top 100 most famous celebrities ranked by income and net worth? | 45% | 84% |
Entertainment | What were the top trending movies and TV shows of 2023? | 94% | 98% |
Entertainment | My review of Avatar Way of the Water | 40% | 92% |
Technology | Best mesh wifi/routers in 2024. My Top picks | 32% | 85% |
Technology | How does an OLED TV work? What are the differences with LED? Is it better? | 79% | 89% |
Gaming | Elden Ring Review – What I think, how long it takes to complete, overall score | 14% | 94% |
Gaming | 10 Games Like Grand Theft Auto 5 – genre, release date, review rating and why it is similar | 46% | 80% |
Results by Topic area and overall average
Topic Area | Claude 3 Sonnet | ChatGPT 4 |
---|---|---|
Education | 67% | 73% |
Business | 94% | 93% |
Entertainment | 60% | 91% |
Technology | 56% | 87% |
Gaming | 30% | 87% |
AVERAGE | 66% | 88% |
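For readers who want to check the arithmetic, here is a minimal sketch that recomputes the per-topic and overall averages from the per-title scores. It assumes a simple unweighted mean rounded to the nearest whole percent with .5 rounding up. The study does not state how the averages were aggregated, so both the method and the function names are assumptions.

```python
import math
from collections import defaultdict

# (topic area, Claude 3 Sonnet %, ChatGPT 4 %) copied from the "Results by Title" table.
SCORES = [
    ("Education", 69, 68),
    ("Education", 65, 78),
    ("Business", 96, 96),
    ("Business", 95, 93),
    ("Business", 92, 86),
    ("Business", 94, 95),
    ("Entertainment", 45, 84),
    ("Entertainment", 94, 98),
    ("Entertainment", 40, 92),
    ("Technology", 32, 85),
    ("Technology", 79, 89),
    ("Gaming", 14, 94),
    ("Gaming", 46, 80),
]


def round_half_up(value):
    """Round to the nearest whole percent, with .5 rounding up (assumed convention)."""
    return math.floor(value + 0.5)


def mean_pct(values):
    """Unweighted mean of a list of percentages, rounded half up."""
    return round_half_up(sum(values) / len(values))


def averages_by_topic(scores):
    """Return {topic area: (Claude average, ChatGPT average)}."""
    grouped = defaultdict(lambda: ([], []))
    for topic, claude, chatgpt in scores:
        grouped[topic][0].append(claude)
        grouped[topic][1].append(chatgpt)
    return {topic: (mean_pct(c), mean_pct(g)) for topic, (c, g) in grouped.items()}


def overall_average(scores):
    """Average over all titles, not over the topic-area averages."""
    return (mean_pct([s[1] for s in scores]), mean_pct([s[2] for s in scores]))


if __name__ == "__main__":
    for topic, (claude, chatgpt) in averages_by_topic(SCORES).items():
        print(f"{topic} | {claude}% | {chatgpt}% |")
    claude, chatgpt = overall_average(SCORES)
    print(f"AVERAGE | {claude}% | {chatgpt}% |")
```

Under these assumptions the overall figures come out at 66% and 88%, matching the AVERAGE row. The overall average is taken across all 13 titles, so topic areas with more titles (such as Business) carry more weight than they would in an average of the five topic scores.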
Biology – What is Photosynthesis and how does it work?
Claude 3 – Overall AI Probability – 69%
ChatGPT 4 – Overall AI Probability – 68%
Philosophy – A critical appraisal of modern philosophers and their relevance in a digital AI world.
Claude 3 – Overall AI Probability – 65%
ChatGPT 4 – Overall AI Probability – 78%
Business – A practical guide to brand marketing, it benefits in the world of digital
Claude 3 – Overall AI Probability – 96%
ChatGPT 4 – Overall AI Probability – 96%
Business – What is influencer marketing and how can businesses use it to grow their business?
Claude 3 – Overall AI Probability – 95%
ChatGPT 4 – Overall AI Probability – 93%
Business – How to optimise my business for Local SEO?
Claude 3 – Overall AI Probability – 92%
ChatGPT 4 – Overall AI Probability – 86%
Business – Top Accounting Tools in 2024 – My Recommendations, Tips, and Pricing
Claude 3 – Overall AI Probability – 94%
ChatGPT 4 – Overall AI Probability – 95%
Entertainment – The Top 100 most famous celebrities ranked by income and net worth?
Claude 3 – Overall AI Probability – 45%
ChatGPT 4 – Overall AI Probability – 84%
Entertainment – What were the top trending movies and TV shows of 2023?
Claude 3 – Overall AI Probability – 94%
ChatGPT 4 – Overall AI Probability – 98%
Entertainment – My review of Avatar Way of the Water
Claude 3 – Overall AI Probability – 40%
ChatGPT 4 – Overall AI Probability – 92%
Technology – Best mesh wifi/routers in 2024. My Top picks
Claude 3 – Overall AI Probability – 32%
ChatGPT 4 – Overall AI Probability – 85%
Technology – How does an OLED TV work? What are the differences with LED? Is it better?
Claude 3 – Overall AI Probability – 79%
ChatGPT 4 – Overall AI Probability – 89%
Gaming – Elden Ring Review – What I think, how long it takes to complete, overall score
Claude 3 – Overall AI Probability – 14%
ChatGPT 4 – Overall AI Probability – 94%
Gaming – 10 Games Like Grand Theft Auto 5 – genre, release date, review rating and why it is similar
Claude 3 – Overall AI Probability – 46%
ChatGPT 4 – Overall AI Probability – 80%
In Summary
Claude 3 overall appears harder to detect than ChatGPT 4, particularly in the Gaming and Entertainment niches. This could be down to the wider adoption of OpenAI’s ChatGPT, which gives AI detection vendors more data to train their detection models on. Although this benchmark study is focused on AI detection rates, Claude 3 did provide more consistent and concise responses. I found I had to nudge ChatGPT to give me a concise response to what is a very simple prompt.
Claude 3 comes in 3 versions. We used Sonnet, which is free to access. To find out whether there are material differences between the 3 versions, read Claude 3’s Opus vs Sonnet vs Haiku – AI Detection benchmark study.