chore: update robots.txt to fix malformed settings and clarify bot access rules #2892
MaxwellCohen wants to merge 1 commit into
📝 Walkthrough
The PR fixes public/robots.txt by adding an explicit "User-agent: *" header and reorganising the wildcard section so Disallow rules (including /search) are applied under that user-agent policy.
Changes: Robots.txt format and validity
🚥 Pre-merge checks: ✅ 4 passed
Hello! Thank you for opening your first PR to npmx, @MaxwellCohen! 🚀
|
Codecov Report: ✅ All modified and coverable lines are covered by tests.
3a292a7 to f8f75c6
|
This would probably be a good opportunity to remove the redundant Disallow rules?
|
I can remove the redundant rules. I would like someone on the core or maintainer team to request or sign off on that first, because I want the changes to stay as focused as possible; my main goal is to fix the malformed robots.txt, which is giving bots incorrect access. I had the Composer 2.5 LLM run a report on redundant fields, and every item it came up with seems reasonable to me. (Edited to replace the direct quote of the AI response with a human-written summary, thanks to a suggestion from @trueberryless.)
If I should implement any of these items, please let me know, and I can do it.
|
Please note that while using LLMs is allowed, we aim to interact with human-written responses; please see our contributing guidelines. Ideally this means that you learn from any LLM responses, understand the context, and write responses in your own words. If something is still unclear, you can always ask us. We also have a Discord server if you want to chat and learn with us 😉
|
@trueberryless I'm sorry about directly quoting the LLM in my previous response instead of summarizing it. I will have a 100% human-written summary of the LLM report tonight. I misinterpreted the LLM policy as allowing quotes when disclosed; my bad. My goal with the summary was to identify optional improvements for easy decisions (which could be considered outside the scope of this PR). I will make sure not to do this in the future.
🔗 Linked issue
Resolves: #2891
🧭 Context
The robots.txt file is invalid because the default settings are missing a user agent, so crawlers cannot read them.
This PR adds a default user agent and comments to robots.txt.
📚 Description
Problem: pages that robots.txt is meant to block are being crawled, which dilutes SEO and wastes bandwidth on unwanted crawling.
Before: a page that should be blocked can be crawled.
After: the page is blocked.
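
As a rough illustration of the before/after behaviour, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The two robots.txt snippets and the `ExampleBot` agent name are hypothetical stand-ins, not the actual contents of public/robots.txt, and real crawlers may treat a malformed file differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical "before": the Disallow rule is not attached to any User-agent group,
# so this parser ignores it.
BEFORE = """\
Disallow: /search
"""

# Hypothetical "after": the same rule placed under an explicit wildcard group.
AFTER = """\
User-agent: *
Disallow: /search
"""

print(allowed(BEFORE, "/search"))  # True: the orphaned rule has no effect
print(allowed(AFTER, "/search"))   # False: the rule now applies to all bots
```

With this parser, a Disallow line that appears before any User-agent line is dropped, which matches the symptom described above: the page stays crawlable until the rule is given a group to belong to.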


I used Nuxti to learn about Nuxt's robots.txt handling and Composer 2.5 to find and validate the problems in robots.txt.
I will gladly make any changes needed, and thanks for a great resource!