
How to set up a robots.txt file to disallow OpenAI's GPTBot from crawling your website for LLMs



OpenAI has launched GPTBot, a new web crawler used to improve future artificial intelligence models like GPT-4. The following user agent token and full user-agent string identify it.

User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
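If you want to see whether GPTBot is already visiting your site, you can look for the token in the User-Agent header of incoming requests or in your access logs. Here is a minimal sketch in Python; the function name is_gptbot is made up for illustration, and because a user-agent string can be spoofed, a match is a hint rather than proof.

```python
def is_gptbot(user_agent: str) -> bool:
    """Return True if the user-agent string contains the GPTBot token.

    The check is case-insensitive and looks only at the token, so it
    keeps working if the version number after "GPTBot/" changes.
    """
    return "gptbot" in user_agent.lower()


# The full user-agent string quoted above.
gptbot_ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "GPTBot/1.0; +https://openai.com/gptbot)"
)

print(is_gptbot(gptbot_ua))                                   # True
print(is_gptbot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # False
```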

How GPTBot works

According to OpenAI, web pages crawled with the GPTBot user agent may be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates OpenAI's policies. OpenAI says that allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety.

As an author or owner of a Ghost blog, you may want to prevent GPTBot from accessing your Ghost site.

Disallowing GPTBot on your Ghost website

To disallow GPTBot on your Ghost website, you have to add a rule for the crawler to your robots.txt file. The robots.txt file for a Ghost website lives in the root of your theme's directory.

There are two ways to create a robots.txt file. One is from the command line. The other is to download your theme, create the robots.txt file, and upload the theme again through the Ghost dashboard.

Download a theme in Ghost Admin to create the robots.txt file

From your dashboard, go to Settings, then go to the Design page.

Click Change theme.

Choose the Advanced toggle.

Click the overflow menu (...) and then Download. A copy of the theme will be downloaded to your computer.

Unzip the downloaded theme and open the folder in your code editor.

Create a robots.txt file in the root of the theme folder. In the file, add:

Sitemap: http://yourdomain/rss
User-agent: *
Disallow:

User-agent: GPTBot
Disallow: /
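Before uploading, you may want to check that rules like these do what you expect. The sketch below uses Python's standard urllib.robotparser module to test a rule set that blocks GPTBot and leaves every other crawler unrestricted. The helper name can_crawl and the example.com URLs are placeholders, and real crawlers may interpret edge cases differently from Python's parser, so treat this as a quick sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Rules that block GPTBot everywhere. The empty "Disallow:" under "*"
# explicitly leaves all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: GPTBot
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/my-post/"))     # False
print(can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/my-post/"))  # True
```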

To allow GPTBot to access only parts of your site, you can add the GPTBot token to your site’s robots.txt with Allow and Disallow rules like this:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
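The same kind of check works for partial access. This sketch, again using Python's standard urllib.robotparser, shows how the Allow and Disallow lines apply to individual paths. The helper name gptbot_allowed and the example.com URLs are placeholders. Note that a path matching neither rule stays crawlable, so add more Disallow lines if you want to block more of the site.

```python
from urllib.robotparser import RobotFileParser

# Partial-access rules for GPTBot, as in the snippet above.
PARTIAL_RULES = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if GPTBot may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


print(gptbot_allowed(PARTIAL_RULES, "https://example.com/directory-1/post"))  # True
print(gptbot_allowed(PARTIAL_RULES, "https://example.com/directory-2/post"))  # False
# No rule matches this path, so it is allowed by default.
print(gptbot_allowed(PARTIAL_RULES, "https://example.com/about/"))            # True
```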

Zip up your theme files, including the new robots.txt. Then, on the Themes page, click Upload theme and select the zip file.
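If you would rather script the file creation and zipping than do it by hand, something like the following Python sketch could work. The function name add_robots_and_zip and the folder names in the usage note are hypothetical. It assumes the theme has already been unzipped into a local folder and that an archive with the theme files at its root is acceptable for upload.

```python
import shutil
from pathlib import Path

# Rules that block GPTBot and leave other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: GPTBot
Disallow: /
"""


def add_robots_and_zip(theme_dir: str, archive_base: str,
                       rules: str = ROBOTS_TXT) -> str:
    """Write robots.txt into the theme root, then zip the theme.

    theme_dir    -- path to the unzipped theme folder
    archive_base -- output path without the ".zip" suffix; keep it outside
                    theme_dir so the archive does not try to include itself
    Returns the path of the zip file that was created.
    """
    root = Path(theme_dir)
    (root / "robots.txt").write_text(rules, encoding="utf-8")
    # root_dir makes paths inside the archive relative to the theme root.
    return shutil.make_archive(archive_base, "zip", root_dir=str(root))
```

For example, add_robots_and_zip("my-theme", "my-theme-with-robots") would write my-theme/robots.txt and create my-theme-with-robots.zip in the current directory, ready to upload.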