chore: update robots.txt to fix malformed settings and clarify bot access rules #2892

Open
MaxwellCohen wants to merge 1 commit into npmx-dev:main from MaxwellCohen:fix/2891/robots-txt

@MaxwellCohen

🔗 Linked issue

Resolves: #2891

🧭 Context

The robots.txt file is invalid: the default rules are not preceded by a User-agent line, so crawlers cannot apply them.

Added a default user agent and comments to robots.txt
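
To illustrate the failure mode described above, here is a minimal sketch using Python's standard-library urllib.robotparser. The two fragments are hypothetical stand-ins, not the actual contents of public/robots.txt, and the agent name ExampleBot is made up. Other crawlers may treat rules that appear before any User-agent line differently, so this shows one parser's behavior only.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical fragment: the rule has no User-agent line above it,
# so this parser never attaches it to any group and ignores it.
MALFORMED = """\
# Search pages
Disallow: /search
"""

# Same rule, now placed inside a wildcard group.
FIXED = """\
User-agent: *
# Search pages
Disallow: /search
"""

print(can_crawl(MALFORMED, "/search"))  # True: the orphaned rule is ignored
print(can_crawl(FIXED, "/search"))      # False: the rule now applies to all agents
```

Running a check like this against the real file before and after the change is one way to confirm the fix without relying on a third-party validator.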

📚 Description

Problem: pages that robots.txt is meant to block are still being crawled, which dilutes SEO and wastes bandwidth on unwanted crawling.

Before: a page that should be blocked can still be crawled

image

After: the page is blocked
image
image

I leveraged Nuxti to learn about Nuxt robots.txt and Composer 2.5 to find and validate robots.txt.

I will gladly make any changes needed, and thanks for a great resource!

@vercel

vercel Bot commented Jun 12, 2026

Contributor

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| npmx.dev | Ready | Preview, Comment | Jun 12, 2026 3:32am |

2 Skipped Deployments

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs.npmx.dev | Ignored | Preview | Jun 12, 2026 3:32am |
| npmx-lunaria | Ignored | | Jun 12, 2026 3:32am |

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Contributor

📝 Walkthrough

The PR fixes public/robots.txt by adding an explicit "User-agent: *" header and reorganising the wildcard section so Disallow rules (including /search) are applied under that user-agent policy.

Changes

Robots.txt format and validity

| Layer / File(s) | Summary |
| --- | --- |
| User-agent header and disallow rules restructure (public/robots.txt) | Lines 1–4 are rewritten to add an explicit User-agent: * header and introductory headings. Lines 17–21 remove commented wildcard directives so Disallow: / directly precedes the "Search pages" disallow block (including /search). |

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description is well-related to the changeset, explaining the problem (invalid robots.txt lacking User-agent), the solution (adding User-agent and comments), and providing evidence of the fix working. |
| Linked Issues check | ✅ Passed | The PR successfully addresses all coding requirements from issue #2891: adding the missing User-agent line, reorganising rules for proper bot interpretation, and validating with testing tools demonstrating the fix blocks previously-allowed paths. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to fixing robots.txt validity and bot access rules as defined in issue #2891; no unrelated modifications are present. |
| Title check | ✅ Passed | The pull request title accurately summarises the main change: fixing malformed robots.txt settings and clarifying bot access rules, which directly aligns with the PR objectives and file modifications. |

@github-actions

Hello! Thank you for opening your first PR to npmx, @MaxwellCohen! 🚀

Here’s what will happen next:

  1. Our GitHub bots will run to check your changes.
    If they spot any issues you will see some error messages on this PR.
    Don’t hesitate to ask any questions if you’re not sure what these mean!

  2. In a few minutes, you’ll be able to see a preview of your changes on Vercel

  3. One or more of our maintainers will take a look and may ask you to make changes.
    We try to be responsive, but don’t worry if this takes a few days.

@codecov

codecov Bot commented Jun 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

@mootari

mootari commented Jun 12, 2026

Contributor

This would probably be a good opportunity to remove the redundant Disallow rules?

@MaxwellCohen

MaxwellCohen commented Jun 12, 2026

Author

I can remove the redundant rules. I would like someone on the core or maintainer teams to request or sign off on that first, because I want the changes to stay as focused as possible. My main goal is to fix the malformed robots.txt, which is giving bots incorrect access.

I had the Composer 2.5 LLM run a report on redundant fields. Here is what it came up with; all items seem reasonable to me:

Edit: replaced the direct quote of the AI response with a human-written summary, following a suggestion from @trueberryless.

  1. Remove the Disallow rules that follow Disallow: / (lines 19–44), because Disallow: / already covers everything that is not explicitly allowed. I think this is a good idea, since it also avoids revealing extra paths that we do not want bots or people to see.
  2. `/opensearch.xml` is both allowed and disallowed (lines 15 and 41). I think it should be allowed, but it feels like there is a story behind this that I would like to understand before taking any action.
  3. The specific AI classifier blocks are duplicated and could be simplified to list the crawlers and blocks just once. I am neutral on this and can see value in either approach.

If I should implement any of these items, please let me know, and I can do it.
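
Items 1 and 2 above can be sanity-checked with a small longest-match evaluator. This is a sketch under stated assumptions: the rule list is hypothetical (the real file is not reproduced here), only plain path prefixes are handled (no * or $ wildcards), and the tie-break in favor of Allow follows the recommendation in RFC 9309 as understood here, not any specific crawler's documented behavior.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # ("allow" | "disallow", path prefix)


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Longest-match evaluation over plain prefixes (no wildcards)."""
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = kind == "allow"
        # The longer prefix wins; on an exact tie, allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# Hypothetical rule group, loosely modelled on the points in the list above.
rules: List[Rule] = [
    ("allow", "/opensearch.xml"),
    ("disallow", "/"),
    ("disallow", "/search"),
    ("disallow", "/opensearch.xml"),
]

# The same group with the more specific /search rule removed.
without_search = [r for r in rules if r != ("disallow", "/search")]

print(is_allowed(rules, "/search?q=vue"))           # False
print(is_allowed(without_search, "/search?q=vue"))  # False: Disallow: / already covers it
print(is_allowed(rules, "/opensearch.xml"))         # True: equal-length tie resolves to allow
```

Parsers that use first-match semantics instead of longest-match can resolve the /opensearch.xml conflict differently depending on rule order, which is one more reason to keep only a single rule for that path.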

@trueberryless

Member

Please note that while using LLMs is allowed, we aim to interact with human-written responses; please see our contributing guidelines. Ideally this means that you learn from any LLM responses, understand the context, and write responses in your own words.

If something is still unclear, you can always ask us. We also have a Discord server if you want to chat and learn with us 😉

@MaxwellCohen

MaxwellCohen commented Jun 13, 2026

Author

@trueberryless I'm sorry for directly quoting the LLM in my previous response instead of summarizing it. I will post a fully human-written summary of the LLM report tonight. I misinterpreted the LLM policy as allowing quotes when disclosed, my bad.

My goal with the summary was to identify optional improvements that would be easy to decide on (and that could be considered outside the scope of this PR). I will make sure not to do this in the future.

MaxwellCohen changed the title from "chore: update robots.txt to clarify bot access rules" to "chore: update robots.txt to fix malformed settings and clarify bot access rules" on Jun 14, 2026

Development

Successfully merging this pull request may close these issues.

robots.txt is not valid

3 participants