What is llms.txt? A New Standard for AI Content Indexing in 2025
If you’re familiar with robots.txt, the new llms.txt will feel like its AI-native cousin.
As large language models (LLMs) like ChatGPT, Claude, and Perplexity continue crawling the web to provide direct answers, website owners are starting to ask: How can I control how my content is used?
Enter llms.txt: an emerging standard designed to signal permissions and preferences to AI crawlers.
🧠 What Is llms.txt?
llms.txt is a simple plain-text file you place at the root of your domain (e.g. yourdomain.com/llms.txt). It declares whether and how AI models and LLM-based crawlers may use your site's content.
Think of it as a robots.txt for LLMs, aimed not at search-engine bots like Googlebot but at AI indexers.
Key Goals of llms.txt:
- Indicate whether your content can be used in AI training or indexing
- Define licensing or attribution preferences
- List specific pages or directories to exclude from LLM crawling
🧾 Example of a Basic llms.txt
# Allow ChatGPT and Claude to index but not train
User-agent: ChatGPT
Allow: /
Train: no
User-agent: Claude
Allow: /
Train: no
User-agent: *
Disallow: /private/
# You can also add a link to your terms of service or license:
License: https://yourdomain.com/terms
Attribution: required
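To make the layout above concrete, here is a minimal Python sketch of how a crawler or audit script might read such a file. The names `AgentRules` and `parse_llms_txt` are hypothetical, and the grammar it assumes (groups opened by `User-agent`, site-wide `License` and `Attribution` lines, `Train: yes` or `Train: no`) is inferred from this example only, not taken from a published specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class AgentRules:
    """Directives collected for one User-agent group (hypothetical structure)."""
    allow: List[str] = field(default_factory=list)
    disallow: List[str] = field(default_factory=list)
    train: Optional[bool] = None  # None means the group did not say


def parse_llms_txt(text: str) -> Tuple[Dict[str, AgentRules], Dict[str, str]]:
    """Return (per-agent rules, site-wide fields) for the example layout."""
    agents: Dict[str, AgentRules] = {}
    site: Dict[str, str] = {}
    current: Optional[AgentRules] = None
    for raw in text.splitlines():
        # Drop comments; note this would also cut a URL containing '#'.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            current = agents.setdefault(value, AgentRules())
        elif key in ("license", "attribution"):
            # Assumption: these apply to the whole site, not to one group.
            site[key] = value
        elif current is not None:
            if key == "allow":
                current.allow.append(value)
            elif key == "disallow":
                current.disallow.append(value)
            elif key == "train":
                # Assumption: anything other than "yes" means no training.
                current.train = value.lower() == "yes"
    return agents, site


SAMPLE = """\
# Allow ChatGPT and Claude to index but not train
User-agent: ChatGPT
Allow: /
Train: no
User-agent: Claude
Allow: /
Train: no
User-agent: *
Disallow: /private/
License: https://yourdomain.com/terms
Attribution: required
"""

agents, site = parse_llms_txt(SAMPLE)
print(agents["ChatGPT"].train)   # False
print(agents["*"].disallow)      # ['/private/']
print(site["license"])           # https://yourdomain.com/terms
```

Treating `License` and `Attribution` as site-wide is a design choice of this sketch; a stricter reading could attach them to the preceding `User-agent` group instead.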
🧩 Why It Matters in AEO
If you’re practicing Answer Engine Optimization (AEO), llms.txt is your chance to:
- Explicitly invite (or block) AI crawlers
- Protect premium or sensitive content
- Encourage proper attribution when content is quoted in AI answers
As more people turn to ChatGPT and Perplexity as their default search interfaces, control over your AI-facing footprint is becoming as important as SEO was in the Google era.
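The invite-or-block choices listed above can be written down as data and rendered into a file. The sketch below is one hedged way to do that in Python; `render_llms_txt` and the policy dictionary shape are hypothetical, and the `Train: yes` value is an assumption, since the article's own example only shows `Train: no`.

```python
from typing import Dict, List, Optional


def render_llms_txt(
    policies: Dict[str, dict],
    license_url: Optional[str] = None,
    attribution: Optional[str] = None,
) -> str:
    """Render per-agent policies into the directive layout used in this article."""
    lines: List[str] = []
    for agent, policy in policies.items():
        lines.append(f"User-agent: {agent}")
        for path in policy.get("allow", []):
            lines.append(f"Allow: {path}")
        for path in policy.get("disallow", []):
            lines.append(f"Disallow: {path}")
        train = policy.get("train")
        if train is not None:  # omit the line when no preference is stated
            lines.append(f"Train: {'yes' if train else 'no'}")
    if license_url:
        lines.append(f"License: {license_url}")
    if attribution:
        lines.append(f"Attribution: {attribution}")
    return "\n".join(lines) + "\n"


# Example: invite one crawler for indexing only, and keep /private/ off limits.
policies = {
    "ChatGPT": {"allow": ["/"], "train": False},
    "*": {"disallow": ["/private/"]},
}
output = render_llms_txt(
    policies,
    license_url="https://yourdomain.com/terms",
    attribution="required",
)
print(output)
```

Keeping the policy as data makes it easy to review changes before writing the result to yourdomain.com/llms.txt.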
⚙️ Tools and Resources
- llms.txt initiative on GitHub — community-driven spec
- perplexity.ai/about — details their use of public content
- ChatGPT Data Controls — how OpenAI handles content
🧪 Bonus: Monitor LLM Crawler Activity
Check your server logs or use a service like:
- BotSight.io – to detect LLM-based crawlers
- Cloudflare Bot Management – to filter and allowlist specific user-agents
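For the server-log route, a short script is often enough to get a first picture. The Python sketch below counts hits per crawler in access-log lines. It assumes the common "combined" log format, where the user agent is the last double-quoted field, and the token list (`GPTBot`, `ChatGPT-User`, `ClaudeBot`, `PerplexityBot`) is an assumption to verify against each vendor's current crawler documentation. The function name `count_ai_crawler_hits` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: substrings that identify AI crawlers in the User-Agent header.
AI_CRAWLER_TOKENS = ("GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot")

# In the combined log format the user agent is the last double-quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_ai_crawler_hits(lines: Iterable[str]) -> Counter:
    """Count log lines per AI crawler token, matching case-insensitively."""
    hits: Counter = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        user_agent = match.group(1).lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break  # count each request once
    return hits


# Made-up sample lines for illustration only.
sample_log = [
    '203.0.113.5 - - [10/Jan/2025:12:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [10/Jan/2025:12:00:05 +0000] "GET /blog HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.7 - - [10/Jan/2025:12:00:09 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
hits = count_ai_crawler_hits(sample_log)
print(dict(hits))
```

To run it on a real log, pass an open file object, for example `count_ai_crawler_hits(open("access.log"))`, where the path is whatever your server uses. User-agent strings can be spoofed, so treat the counts as an indication rather than proof.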
Conclusion
As AI-driven search grows, llms.txt is becoming an essential part of your site’s visibility and control strategy. Whether you want to block LLM training, require attribution, or just be indexed properly, this file gives you a voice.
👉 Need help setting up your llms.txt file?
Fill out our quick form and we’ll help you generate one based on your AEO goals.