An extension that handles fulltext search

Hello everyone 👋,

Something is brewing in the "working from home" offices of the Blomstra team. An upcoming client is having a bitter experience with the native Flarum MySQL search functionality. With tens of millions of posts, enabling the search bar on their community causes heavy load spikes on their extremely powerful virtual machine. Their intent to become a client as soon as possible made me consider the impact on our managed hosting platform and on their perception of our service. It was important to me to prevent any disappointment! Especially when they state "In Luceos I Trust", I kid you not 🥰

For months I have had an idea in mind of how I would solve search for enterprise communities, should I ever have the time to dive into it. No reason for delay anymore, it seems! So I sat down for hours on end on Thursday; I saw the sun come up, take its journey across the sky and dip into the far reaches of the horizon. Just after midnight I had a first version running, a surprising success! Search that relies on a database meant for fulltext searches. A drop-in replacement for the native search, even.

This first concept was built on top of Meilisearch, a funky little search database. It is quite easy to install anywhere, to seed and to search with. In a test with only 50k documents (discussions) stored in Meilisearch, a single search took 4759ms (almost 5 seconds) to run using its native administration panel. That wasn't acceptable.

After a quick talk within the team we settled on managed Elasticsearch, because we want to serve our customers with something reliable, performant, scalable and proven. One work day later the extension was refactored to seed posts into a very small managed Elasticsearch instance. Although only tests on a larger database will be truly meaningful, it currently:

  • responds between 200ms and 400ms (50k posts)
  • understands tag permissions
  • understands byobu permissions
  • allows searching for full sentences and complex combinations
  • allows searching posts and discussion titles
  • does not yet understand gambits
  • supports Flarum native sorting rules
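To give an idea of how a permission-aware fulltext search can look in Elasticsearch's query DSL, here is a minimal sketch in Python. It is not the extension's actual code: the field names (`title`, `content`, `tag_ids`, `is_private`, `recipient_ids`) and the function name are assumptions made up for illustration, and the tag rule is simplified to "at least one visible tag".

```python
from typing import Any, Dict, List, Optional


def build_search_body(
    text: str,
    visible_tag_ids: List[int],
    user_id: Optional[int] = None,
    size: int = 20,
) -> Dict[str, Any]:
    """Build a search request body that only matches documents the actor may see.

    All field names here are hypothetical, not the extension's real mapping.
    """
    # Public documents: not private and carrying at least one visible tag
    # (a simplification of real tag permission rules).
    visibility: List[Dict[str, Any]] = [
        {
            "bool": {
                "must_not": [{"term": {"is_private": True}}],
                "filter": [{"terms": {"tag_ids": visible_tag_ids}}],
            }
        }
    ]
    if user_id is not None:
        # Private (Byobu-style) discussions: visible to their recipients only.
        visibility.append(
            {
                "bool": {
                    "filter": [
                        {"term": {"is_private": True}},
                        {"term": {"recipient_ids": user_id}},
                    ]
                }
            }
        )
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored fulltext part: titles weigh twice as much as post content.
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["title^2", "content"],
                        }
                    }
                ],
                # Unscored permission part: at least one visibility clause must match.
                "filter": [
                    {"bool": {"should": visibility, "minimum_should_match": 1}}
                ],
            }
        },
    }
```

Keeping the permission checks in `filter` context means they do not influence relevancy scores and can be cached by Elasticsearch, while the `multi_match` clause alone decides the ranking.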

So why post about this? Well, I'm excited, and I'm even more excited to announce that this extension will be 100% open source and free to use in the near future! Just see it as a gift from team Blomstra to the Flarum ecosystem. We're hoping to see this extension adopted by larger communities, and to find people willing to collaborate on making it even better.

Only after having started work on this Elastic integration did I discover the Sonic extension. Although it is a very interesting extension to have, I think the largest of our clients would need a managed service to rely on to deliver consistency and quality.

Roadmap

Before a beta release

  • improve relevancy values
  • test the implementation on a multi-million-post community
  • consider user search (do we need that?)

Before a stable release

  • add extenders
  • get a review from the core team

FAQ

So when can you get your hands on this?
I first want to test this extension on the acceptance environment for our future client. Testing against a community with a huge number of posts can validate that this is indeed the right solution for us, but also for communities feeling those growth pains.

Do I need this?
Not everyone needs this. Only once you notice that searches on your community take longer than 500ms is it time to consider this extension, and even then only after you have looked at your hosting capacity first. Usually only larger communities (going over a million posts) have an immediate need to delegate search away from their MySQL database (cluster) and webserver node(s).

    Awesome! Does this just improve performance? Or does it improve overall search as well?

      MikeJones Should improve both because offloading search will reduce the load on the database which should, to some extent, result in better performance as far as the database is concerned.

      But on a large community, it should improve search speed monumentally.

        meezaan I guess I should clarify: I am not looking at speed but at relevancy. I find Flarum search doesn't always help me find what I am looking for in my Flarum.

          MikeJones yeah, Elastic has language processors for its analyzers; it's part of the roadmap:

          luceos improve relevancy values

          What about the Meili idea, completely scratched or it can be introduced as an extension for smaller communities that don't actually need full Elastic but still want a better search than the one that is currently available in Flarum? Sonic kind of seems to solve this use case, but it is more complex to run (it doesn't seem to work in musl environments, for example).

            wonderbeel What about the Meili idea, completely scratched or it can be introduced as an extension for smaller communities that don't actually need full Elastic but still want a better search than the one that is currently available in Flarum?

            Yeah Meili sounded like a good solution, but it just doesn't really scale well.
            I stripped the code for that reason, sorry. If I have the time I might consider open sourcing it, but not under Blomstra.

            Interesting, I was waiting for this, although I hoped it would support a more lightweight solution like Meilisearch/Sonic/Typesense indeed 😂

            I should be available for testing if necessary, my forum has a bit more than 500k posts and it's very active. But I need to check the cost of running Elasticsearch...

            @luceos when you say managed Elastic, you mean the cloud offering that I can see here? https://www.elastic.co/pricing/

            Thanks.

              010101 I honestly saw that one too late and I haven't tested it. Elastic is widely available and is known to scale well...

              matteocontrini once it's available you can give it a spin at any time. It will be open source.

              010101 It will probably scale like Meili, so it is good for small/medium forums (the biggest gain will probably be better typo handling compared to a trigram-based search in the database), but for really huge communities it will be better to use Elastic.

              9 days later

              I have just tagged 0.0.6 and made the extension open source. Once I feel it is ready to be used more widely I will create a discussion in Extensions. For now, feel free to give it a spin:

                luceos This extension requires PHP8. Will it be compatible with PHP7.4? flarum/akismet does not work on PHP8 so forums that use Akismet cannot use this extension.


                I see that Elasticsearch does not support my language (Polish) by default and requires a plugin. It would be nice to be able to set the analyzer language by typing it in, instead of being restricted to a fixed list.

                  rafaucau This extension requires PHP8. Will it be compatible with PHP7.4?

                  Nope, sorry. In 25 days active support for 7.4 will be dropped; after that it will only receive security updates. And yes, we can add a way to support more analyzers; I think I'll just make the list extensible from the backend then. I dislike the free choice option, as it would cause too many support requests for the extension if people fill in something that Elastic doesn't understand.
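A rough sketch of what such an extensible list could look like, written in Python purely for illustration (the extension itself is PHP, and the names `BUILT_IN_ANALYZERS` and `index_settings` are made up here). The built-in set is only a subset of Elasticsearch's language analyzers, and `polish` is assumed to be provided by the Stempel analysis plugin installed on the cluster.

```python
from typing import Any, Dict, FrozenSet

# A subset of the language analyzers that ship with Elasticsearch.
BUILT_IN_ANALYZERS: FrozenSet[str] = frozenset(
    {"english", "german", "french", "spanish", "dutch", "italian"}
)


def index_settings(
    language: str, extra: FrozenSet[str] = frozenset()
) -> Dict[str, Any]:
    """Return index settings that use `language` as the default analyzer.

    `extra` holds analyzers registered from the backend, for example ones
    that come from plugins. Unknown names are rejected up front instead of
    being passed on to Elasticsearch.
    """
    allowed = BUILT_IN_ANALYZERS | extra
    if language not in allowed:
        raise ValueError(
            f"Unknown analyzer {language!r}; allowed: {sorted(allowed)}"
        )
    return {
        "settings": {
            "analysis": {"analyzer": {"default": {"type": language}}}
        }
    }


# Polish is not built in, so it has to be registered explicitly
# (assuming the plugin that provides it is installed).
settings = index_settings("polish", extra=frozenset({"polish"}))
```

Rejecting unknown names before they reach the cluster keeps the "people fill in something Elastic doesn't understand" problem out of the support queue, while still letting a forum admin add plugin-provided analyzers.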


                  TODO:

                  • discussion titles are not taken into account
                  • offer a button in the admin area to kick off the indexing of your forum (queue is recommended for larger communities)

                    luceos Nope sorry. In 25 days active support for 7.4 will be dropped, after that it will only receive security updates.

                    You know that 60-70% of Flarum installations are on PHP 7?

                      rob006 Given that the extension is primarily designed to run in the Blomstra environment, where they have full control over PHP versions and whatnot, I don't find it surprising at all.

                      Truth be told, they could just keep it closed source and available to their customers only. Instead they've open sourced it so anyone can use it (assuming the right PHP versions and whatnot), so I'd say that's already pretty generous.