Sitemap by FriendsOfFlarum

This extension simply adds a sitemap to your forum.

It uses default entries like Discussions and Users, but is also smart enough to conditionally add further entries
based on the availability of extensions. This currently applies to flarum/tags and fof/pages. Other extensions
can easily inject their own Resource information; see Extending below.

Modes

There are two modes to use the sitemap.

Runtime mode

After enabling the extension the sitemap will automatically be available and generated on the fly. It contains all Users, Discussions, Tags and Pages guests have access to.

Suitable for small forums, most likely on shared hosting environments, where discussions, users, tags and pages together total fewer than 10,000 items. This is not a hard limit, but performance degrades as the number of items increases.

Cached multi-file mode

For larger forums you can set up a cron job that generates a sitemap index and compressed sitemap files. A first sitemap will be automatically generated after the setting is changed, but subsequent updates will have to be triggered either manually or through the scheduler (see below).

A rebuild can be manually triggered at any time by using:

php flarum fof:sitemap:build

Best for larger forums, starting at 10,000 items.
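As a sketch of the cron job mentioned above, a crontab entry along these lines would rebuild the cached sitemap every night. The 03:00 schedule and the /path/to/flarum install path are placeholders, not values prescribed by the extension.

```shell
# Crontab entry (add it with `crontab -e`); /path/to/flarum is a placeholder.
# Runs the extension's rebuild command every day at 03:00 and discards output.
0 3 * * * cd /path/to/flarum && php flarum fof:sitemap:build >> /dev/null 2>&1
```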

Risky Performance Improvements

This setting is meant for large enterprise customers.

The optional "Enable risky performance improvements" option modifies the discussion and user SQL queries to limit the number of columns returned. By removing those columns, it significantly reduces the size of the database response but might break custom visibility scopes or slug drivers added by extensions.

This setting only brings noticeable improvements if you have millions of discussions or users. We recommend leaving it disabled unless the cron job takes more than an hour to run or the SQL connection becomes saturated by the amount of data.

Scheduling

Consider setting up the Flarum scheduler, which removes the need to set up a cron job as advised above. Read more about this here.

The frequency setting for the scheduler can be customized via the extension settings page.
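The scheduler replaces the sitemap-specific cron job, but according to the Flarum documentation it is itself driven by one generic cron entry that runs every minute; the scheduler then decides when the sitemap rebuild is due, based on the frequency configured above. A sketch, with /path/to/flarum as a placeholder for your installation directory:

```shell
# Crontab entry for the Flarum scheduler; /path/to/flarum is a placeholder.
# cron wakes the scheduler every minute; it only runs the tasks that are due.
* * * * * cd /path/to/flarum && php flarum schedule:run >> /dev/null 2>&1
```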

Installation

This extension requires PHP 8.0 or greater.

Install manually with composer:

composer require fof/sitemap

Updating

composer update fof/sitemap
php flarum migrate
php flarum cache:clear

Nginx issues

If you are using nginx and accessing /sitemap.xml results in an nginx 404 page, you can add the following rule to your configuration file, underneath your existing location rule:

location = /sitemap.xml {
    try_files $uri $uri/ /index.php?$query_string;
}

This rule makes sure that Flarum will answer the request for /sitemap.xml when no file exists with that name.

Extending

Consult the up-to-date documentation in the README on GitHub.

Commissioned

The initial version of this extension was sponsored by profesionalreview.com.

    How do we get to the sitemap if it's generated on the fly? What's the link?

      MikeJones It's mentioned in the second line, brother:

      yourflarum.url/sitemap.xml

        Good news! Let me see if I can finally delete my pure PHP script. I'll come back with some feedback as soon as possible.

        This plugin is a must in 2018.

        Feature-wise it's identical to this other sitemap extension, the main difference being that this one doesn't write anything to the filesystem. That's an advantage because this way you don't have to deal with annoying permissions issues on the filesystem.

        I tried to go with an existing sitemap library, but as it turns out they are either very outdated and not published to Packagist or very recent but too tightly integrated with Laravel. In the end I re-implemented the logic by copying small bits from those libraries, which wasn't that hard to do.

        I first tried to cache the sitemap output to replicate the file-on-disk behavior, but after some thought I decided it wasn't worth it. The queries to generate the sitemap aren't that complicated; more complicated queries run every time you load the homepage. So I'm just returning a fresh, up-to-date sitemap every time you ask for it.

        If there's interest I can implement the links limit (number and file size) according to the sitemap spec. The spec says there shouldn't be more than 50k links per sitemap file. For now the file will keep growing forever. I'm not sure how search engines actually behave if you exceed 50k links per file.
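        The 50k-per-file limit is the reason the sitemap protocol has index files: each sitemap stays under the limit, and a sitemap index lists them. Below is a minimal shell sketch of that splitting, assuming a plain text file with one already XML-escaped URL per line. The `build_sitemaps` function, the `sitemap-N.xml.gz` names and the output layout are hypothetical, for illustration only; this is not how the extension itself builds its files.

```shell
#!/bin/sh
# Hypothetical helper: build_sitemaps INPUT OUTDIR BASEURL [MAX_URLS]
# Splits INPUT (one already XML-escaped URL per line) into gzip-compressed
# sitemap files of at most MAX_URLS entries and writes OUTDIR/sitemap.xml
# as a sitemap index pointing at them under BASEURL.
build_sitemaps() {
    input=$1 outdir=$2 baseurl=$3 max=${4:-50000}
    mkdir -p "$outdir" || return 1
    # split writes chunks named part-aa, part-ab, ... with at most $max lines
    split -l "$max" "$input" "$outdir/part-" || return 1
    index="$outdir/sitemap.xml"
    {
        echo '<?xml version="1.0" encoding="UTF-8"?>'
        echo '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    } > "$index"
    n=0
    for chunk in "$outdir"/part-*; do
        [ -e "$chunk" ] || continue    # glob matched nothing (empty input)
        n=$((n + 1))
        file="sitemap-$n.xml"
        {
            echo '<?xml version="1.0" encoding="UTF-8"?>'
            echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            # wrap each non-blank line in <url><loc>...</loc></url>
            awk 'NF { printf "  <url><loc>%s</loc></url>\n", $0 }' "$chunk"
            echo '</urlset>'
        } > "$outdir/$file"
        rm -f "$chunk"
        gzip -f "$outdir/$file"    # replaces the file with $file.gz
        echo "  <sitemap><loc>$baseurl/$file.gz</loc></sitemap>" >> "$index"
    done
    echo '</sitemapindex>' >> "$index"
}
```

        For example, `build_sitemaps urls.txt out https://yourflarum.url` would produce out/sitemap.xml plus one out/sitemap-N.xml.gz per 50,000 URLs.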

        Other features that could be added are the ability to select what type of page is added to the sitemap, an option to choose the sitemap index file name/url and an option to cache some of the output for a period of time.

        Don't hesitate to open an issue on the repo with the feature you want and your use case if you'd like to discuss a suggestion!

          clarkwinkelmann Awesome work!

          Maybe for the sitemap limit allow us to have a different sitemap for each of our tags?

          Not writing the sitemap to disk is a good idea, I love it. For big sites, though, caching the file is the way to go: the queries can be simple, but a big site can easily output more than 5 MB of sitemap data.

          The feature I miss most is more control over which parts of the board are added to the sitemap (users, discussions and tags). It would be nice to have an option to include or exclude each of these three parts of the forum. Imagine I decide not to index tags: at the moment the plugin is limited, because if I add a robots.txt rule disallowing tag pages but still send a sitemap that contains tag URLs, that's a confusing signal for search engines.
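          To spot the mixed signal described above, one can list the sitemap URLs that fall under a path excluded in robots.txt. Below is a hypothetical helper, sketched in shell, that accepts plain or gzip-compressed sitemaps; it assumes tag pages live under `/t/`, so adjust the prefix to whatever your robots.txt actually disallows.

```shell
# Hypothetical helper: sitemap_urls_under FILE PREFIX
# Prints each <loc> URL in FILE (plain or gzip-compressed) whose path
# starts with PREFIX, e.g. /t/ for tag pages.
sitemap_urls_under() {
    file=$1 prefix=$2
    # gzip -dcf decompresses .gz input and passes plain XML through unchanged
    gzip -dcf < "$file" |
        grep -o '<loc>[^<]*</loc>' |
        sed -e 's|^<loc>||' -e 's|</loc>$||' |
        awk -v p="$prefix" '{
            path = $0
            sub("^[A-Za-z]+://[^/]*", "", path)    # drop scheme and host
            if (index(path, p) == 1) print         # path starts with prefix
        }'
}
```

          Any output from `sitemap_urls_under sitemap.xml /t/` would be a URL that the sitemap advertises while robots.txt (under the stated assumption) blocks it.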

          Being able to set the "changefreq" and "priority" parameters per part of the forum would also be great.

          I also observed that:

          • On title changes, the lastmod is not changed ❌
          • When a reply is added, the lastmod is changed ✅
          • When the content of a post in a discussion is edited, the lastmod is not changed ❌

          Google's behavior with sitemaps bigger than 50k URLs or 10MB is simply not to process them.
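          A quick way to check a generated file against these limits is to count its `<loc>` entries and measure its uncompressed size. Below is a sketch with a hypothetical `check_sitemap_limits` helper. Its default byte limit follows the current sitemaps.org protocol figure of 50 MB (52,428,800 bytes) uncompressed; passing 10485760 as the third argument applies the 10MB figure quoted above instead.

```shell
# Hypothetical helper: check_sitemap_limits FILE [MAX_URLS] [MAX_BYTES]
# Warns on stderr and returns 1 if FILE (plain or gzip-compressed) holds more
# <loc> entries or more uncompressed bytes than the limits allow.
check_sitemap_limits() {
    file=$1 max_urls=${2:-50000} max_bytes=${3:-52428800}
    # count <loc> occurrences, not lines, in case several URLs share a line
    urls=$(gzip -dcf < "$file" | grep -o '<loc>' | awk 'END { print NR }')
    bytes=$(gzip -dcf < "$file" | wc -c | tr -d ' ')
    rc=0
    if [ "$urls" -gt "$max_urls" ]; then
        echo "$file: $urls URLs, limit is $max_urls" >&2
        rc=1
    fi
    if [ "$bytes" -gt "$max_bytes" ]; then
        echo "$file: $bytes bytes uncompressed, limit is $max_bytes" >&2
        rc=1
    fi
    return "$rc"
}
```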

          I hope posting this here works too, instead of going to GitHub.

          tlalok Is it Flarum's 404 page (with the error pages extension) or the webserver's 404 page?

          If it's the webserver error page, make sure the /sitemap.xml URL is handled by Flarum's index.php via .htaccess or rewrite rules (this should already be the case with the provided .htaccess and example config). Some hosting providers may have special rules that prevent this file from being dynamically generated.

          If it's Flarum 404 page, the extension is likely not enabled correctly.

          You may open an issue at https://github.com/flagrow/sitemap/issues if you need further help.

            tlalok Try disabling & re-enabling the extension.

            clarkwinkelmann Indeed, I just added this nginx directive and it works.

            location = /sitemap.xml { try_files $uri $uri/ /index.php?$query_string; }

            I use this docker image, so I am going to fix this issue.

            @clarkwinkelmann @arda thanks for your help!

              Updated for beta 8.

              I've kept compatibility with the Pages extension for now, even though it hasn't been updated for beta 8 yet. Be warned that an additional update of this extension might be needed to support the future beta8-compatible version of Pages.

                clarkwinkelmann Thanks for the update, unfortunately it doesn't seem to work at my end. When accessing url/sitemap.xml I'm only getting a 404, both with the extension enabled and disabled. I'm using nginx as my webserver.

                  Kakifrucht Could this be a similar case to tlalok's, where the webserver needs an additional rewrite rule to pass .xml requests to the PHP application? Maybe try the solution above to see if it solves your case.

                  Also, is the 404 from Flarum (there should be a back-to-homepage link) or from nginx (just a 404 message in English)? If nginx is responding, then it has something to do with the server configuration (and it seems the fix above solves it).

                  If it's a Flarum 404 error page then we'll have to investigate further. Could you open an issue at GitHub if that's the case so we don't create too much noise here ?

                    clarkwinkelmann That's what I was looking for, sorry for being too lazy to scroll a couple of posts up.

                    You might want to add it to the installation section of this discussion for nginx users.