I've raised some concerns and questions in the FoF Upload discussion about the need for a way to clean up orphaned files, i.e. files that were uploaded to the destination storage but are not referenced in any forum post (or private discussion). Since this doesn't seem like a trivial task and may not be implemented soon, and since I'm a Java developer (and don't know enough about PHP development to contribute to the extension), I decided to try another approach:
Write a script or an external executable that works directly with the DB and scans all posts for URLs. It then queries the AWS S3 bucket (that's the storage I use, but the tool could also support local storage) to find files on the storage that are not used in any post, deletes those files from the storage, and cleans up the corresponding media library entities.
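To make the idea concrete, here is a rough Java sketch of the core step: collect every storage key referenced in post content and subtract that set from the keys actually present on the storage. The class name `OrphanFinder`, the base URL `https://cdn.example.com/`, and the sample data are made up for illustration. It also assumes the post content contains the plain file URL, which I haven't verified against how Flarum or FoF Upload actually store posts.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrphanFinder {

    // Extracts the storage key (the part after baseUrl) from every URL in the
    // post bodies that starts with baseUrl. baseUrl is a hypothetical value.
    static Set<String> referencedKeys(List<String> postBodies, String baseUrl) {
        // A key ends at whitespace, a quote, an angle bracket, ')' or ']'.
        Pattern pattern = Pattern.compile(Pattern.quote(baseUrl) + "([^\\s\"'<>)\\]]+)");
        Set<String> keys = new LinkedHashSet<>();
        for (String body : postBodies) {
            Matcher matcher = pattern.matcher(body);
            while (matcher.find()) {
                keys.add(matcher.group(1));
            }
        }
        return keys;
    }

    // Orphans are the keys present on the storage but referenced by no post.
    static Set<String> orphans(Set<String> storedKeys, Set<String> referenced) {
        Set<String> result = new TreeSet<>(storedKeys);
        result.removeAll(referenced);
        return result;
    }

    public static void main(String[] args) {
        // Sample data only; real input would come from the DB and the bucket listing.
        List<String> posts = List.of(
                "See https://cdn.example.com/2024/a.png for details",
                "[img](https://cdn.example.com/2024/b.jpg)");
        Set<String> stored = Set.of("2024/a.png", "2024/b.jpg", "2024/c.pdf");

        Set<String> referenced = referencedKeys(posts, "https://cdn.example.com/");
        System.out.println(orphans(stored, referenced));
    }
}
```

In a real run, the post bodies would come from the DB (or the REST API) and the stored keys from an S3 bucket listing. Only after reviewing the resulting list would I actually delete anything.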
But since I'm on shared hosting that doesn't support Java, it seems I will have to export a DB dump and run the analysis on my own computer. The program would delete the files from the storage and then generate DB update/delete statements to remove the upload entities from the DB.
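For the statement-generation part, I imagine something like the sketch below. The table and column names are passed in as parameters because I haven't confirmed the real schema. The `fof_upload_files` and `path` values in the usage note are my assumption and need to be checked against the dump. `CleanupSql` is just an illustrative name.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CleanupSql {

    // Quotes a value as a SQL string literal: backslashes are escaped first
    // (MySQL treats them as escape characters by default), then single
    // quotes are doubled.
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'";
    }

    // Builds one DELETE statement covering all orphaned keys.
    // Returns an empty string when there is nothing to delete, since
    // "IN ()" is not valid SQL.
    static String deleteStatement(String table, String column, List<String> keys) {
        if (keys.isEmpty()) {
            return "";
        }
        String values = keys.stream()
                .map(CleanupSql::quote)
                .collect(Collectors.joining(", "));
        return "DELETE FROM " + table + " WHERE " + column + " IN (" + values + ");";
    }
}
```

Calling `CleanupSql.deleteStatement("fof_upload_files", "path", orphanKeys)` would then produce a statement I can review by hand before running it against the live DB.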
But since that sounds a bit awkward, I thought maybe I should use the forum REST API instead. The Java program would still run outside the forum (on my computer) and would connect both to Amazon S3 (or to the forum storage, e.g. over SSH) and to the forum REST API. But I'm not sure that will give me full access, so here are some clarifying questions:
- If I use an admin account, can I access all public and private posts through the REST API (so that I can crawl the content for URLs, upload tags, etc.)?
- Can I access all upload entities from the FoF Upload extension through the REST API? I mean the FoF media library entities stored in the DB, not the actual files on the storage. If that's not possible, can I at least iterate through all the users and fetch each user's media library (upload entities) from the REST API? Can I delete those through the REST API?
- Where is the REST API documentation for Flarum, and for extensions such as FoF Upload? Is API documentation generated automatically/dynamically for the REST endpoints exposed by Flarum and its extensions, or should I rely on external static API docs?
- What is the authentication model? Is it HTTP basic, token-based, or something else? Where is the documentation for that?
- Is this type of approach to the problem a good idea at all? Any suggestions?