@clarkwinkelmann So after playing around for a bit, I have a working solution that lets me control which pages should be indexed in Google and which ones should not (without building an extension). As the Google documentation specifies, the idea is to add the header X-Robots-Tag: noindex to the pages that should not be indexed.
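Just to illustrate what that looks like in nginx terms before the dynamic version below: if you only had a single path prefix to hide, you could hardcode the header in a location block (the /forum/u/ prefix here is only an example taken from my paths):

```
# Hardcoded variant: every response under /forum/u/ gets the noindex header
location /forum/u/ {
    add_header X-Robots-Tag "noindex";
    # ... proxy_pass / try_files / etc. for this location
}
```

Keep in mind that an add_header inside a location replaces any add_header directives inherited from the server level, which is one more reason to prefer the map-based version below.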
With nginx, I did it as follows. In my http block, I added:
```
map $request_uri $robots_header {
    default "";
    ~^/forum/u(.*) "noindex";
    ~^/forum/t(.*) "noindex";
}
```
In my server block, I then added the following line:

```
more_set_headers "X-Robots-Tag: $robots_header";
```
Note that because my setup has an nginx proxy in front of the nginx that serves the forum, I had to use the Headers More module (headers-more-nginx-module). If you don't have an nginx proxy, you can probably just add the following line to your server block instead:

```
add_header X-Robots-Tag $robots_header;
```
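For completeness, here is roughly how the two pieces fit together in one configuration. This is only a sketch: the server_name and the /forum/ prefixes are placeholders from my setup that you would adjust to your own paths.

```
http {
    # Map the request URI to either "noindex" or an empty string
    map $request_uri $robots_header {
        default "";
        ~^/forum/u(.*) "noindex";
        ~^/forum/t(.*) "noindex";
    }

    server {
        listen 80;
        server_name forum.example.com;  # placeholder, use your own domain

        # add_header does not emit the header at all when $robots_header is
        # empty, so only the URLs matched above get X-Robots-Tag: noindex
        add_header X-Robots-Tag $robots_header;

        # With the Headers More module (e.g. behind another proxy), use instead:
        # more_set_headers "X-Robots-Tag: $robots_header";

        # ... the rest of the server configuration (root, locations, proxy_pass, etc.)
    }
}
```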
That's the cleanest solution I could come up with 🙂. Let's see if Google takes the header into account and excludes these URLs in my Google Search Console.