Great. That's the point of this post. I've been honing this approach for a while, and while it's simple, it's also powerful in terms of what you can do with it.
Because RSS feeds vary in update frequency, it's important to keep track of what has been processed and what hasn't. We achieve this with a simple database that sits outside of Flarum. Each RSS item is parsed and its URL checked against the database: if there's a match, the item is skipped; otherwise it's processed and its URL is recorded, so subsequent runs don't pick it up again and duplicates are avoided.
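The dedupe store can be very small. Here's a minimal sketch in Python using SQLite, with the item URL as the primary key; the file name and function names are my own illustration, not part of the original setup:

```python
import sqlite3

def init_db(path="processed_items.db"):
    """Open (or create) the tracking database with a single URL table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed (url TEXT PRIMARY KEY)")
    return conn

def is_processed(conn, url):
    """Return True if this URL has been seen on a previous run."""
    cur = conn.execute("SELECT 1 FROM processed WHERE url = ?", (url,))
    return cur.fetchone() is not None

def mark_processed(conn, url):
    """Record the URL; INSERT OR IGNORE makes repeat calls harmless."""
    conn.execute("INSERT OR IGNORE INTO processed (url) VALUES (?)", (url,))
    conn.commit()
```

In the feed loop you'd call `is_processed()` for each item, skip on a match, and call `mark_processed()` after posting. Making the URL the primary key means the database itself enforces uniqueness, so a race or a re-run can't create a duplicate row.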
Obviously, if you process multiple feeds, you can still end up with some duplicates when different sources carry similar stories under different URLs.