I was having a great discussion with @Zeokat about sitemaps, which turned into a conversation about robots.txt files here.
I wanted to break this conversation out into its own post because a lot of people on this site ask about Flarum SEO, and this might help others out.
It's been a long time since I fooled with a robots.txt, but I wanted to discuss and build a good robots.txt file that could help other users here.
Talking about the user pages on our forums, Zeokat brought up a great point:
Imagine a situation in which you have 30,000 users in your forum; Google will then index 30,000 pages whose content is mostly not useful and can be categorized by Google as "thin content". So... you can guess that your forum will be penalized sooner or later.
As I started building my robots.txt, I also started reading this interesting article:
https://neilpatel.com/blog/robots-txt/
So here is my first stab at a robots.txt file for my Flarum:
User-agent: *
Disallow: /u/
Noindex: /u/
Sitemap: https://www.seekadventure.net/sitemap-post.xml
My question is: since https://www.seekadventure.net/u/ is not a page that actually exists, but all the user pages live under it, should my robots.txt actually look like this instead:
User-agent: *
Disallow: /u/*
Noindex: /u/*
Sitemap: https://www.seekadventure.net/sitemap-post.xml
In addition, I would like to discuss any other robots.txt changes you all would find useful for SEO.
First discussion point: the robots.txt article I linked earlier says "repeat" content might be worth blocking as well. Since each individual post has its own link, would it be worth adding /d/ to the Disallow and Noindex rules so that Google and other search engines only index the original whole discussions?
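If I remember Flarum's URL structure correctly, a link to an individual post takes the form /d/<id>-<slug>/<post-number>, while the discussion itself lives at /d/<id>-<slug>. Assuming that, and assuming the crawler honors the non-standard * wildcard (Google and Bing do), a rule along these lines might keep the full discussions crawlable while excluding the per-post permalinks. This is just a sketch I have not tested, so please poke holes in it:

User-agent: *
Disallow: /u/
Disallow: /d/*/
Sitemap: https://www.seekadventure.net/sitemap-post.xml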