Just found a small rendering issue. If the URL is too long, it is rendered outside the boundaries of the little rectangular frame that contains the preview:

I'm not sure if it has something to do with the fact that it's an MP3 file that can't be previewed. Also, I'm using the Inline Audio extension that shows a little play button; maybe it interferes with Rich Embeds somehow?

    CyberGene it's normal for that popup to appear even for links that can't be previewed. I added it because otherwise there's no way to access the refresh button in case there was a temporary error with the target website.

    I will add something to prevent URLs from going off-screen.
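
    Something like text truncation with an ellipsis should do it. A minimal CSS sketch (the class name is hypothetical, not the extension's actual selector):

        /* Truncate long URLs inside the preview frame instead of overflowing */
        .RichEmbed-url {
            overflow: hidden;        /* clip anything wider than the frame */
            text-overflow: ellipsis; /* show … where the URL is cut off */
            white-space: nowrap;     /* keep the URL on a single line */
        }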

    Customer feedback. 😉

    For me, this works better than ever. But, I’ll admit, I leave most settings off. I may try turning on the proxy and scraping again one day. But, keeping the settings/features minimal, I’ve had no issues and it’s so fast! 😍

      010101 thanks!

      Just using the OpenGraph feature is perfectly fine!

      The proxy features are only really necessary if you are using advanced security measures on your website like a CSP configuration that blocks external image loading.
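
      For example, a Content-Security-Policy header along these lines blocks embed images served from third-party domains, which is exactly the situation the proxy works around (illustrative policy, not a recommendation):

          Content-Security-Policy: default-src 'self'; img-src 'self'

      With such a policy the browser refuses to load images from external hosts, so the proxy lets them be served from your own origin instead.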

      I will release a cloud subscription in the near future that allows using all the advanced options without needing to configure any API tokens. It will also come with a CDN domain to load images from, which gives a tiny performance improvement by stripping cookies out of the requests.

      a month later

      How can I blacklist file extensions? For instance, I don't want this extension to show a preview for audio files. They are covered by the Inline Audio Player extension; however, when I click on a link to play it, a preview is shown below and it obscures the other audio file links that follow. An option to provide a regex for blacklisting would suffice, I guess, such as .*\.mp3

        CyberGene on the extension page in the admin panel, under URL Blacklist, add a new line containing /\.mp3$/

        This should do it.
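
        If you want to cover other audio formats in one go, a single pattern can match several extensions, for example (adjust the list to whatever your audio player handles):

            /\.(mp3|ogg|wav|m4a)$/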

        If you wanted to still retrieve the embed (for example for Separated layout) but just never show the failed embed inline hover style, I could try adding a new option. But that seems like a very niche situation.

        a month later

        I went to install this on a new forum and it won't activate. The error is:

        Syntax error or access violation: 1071 Specified key was too long; max key length is 3072 bytes

        Update: I changed the error column in the database from utf8 to latin1, which uses fewer bytes per character. That let me activate the extension. Then I changed it back to utf8 in the database. We'll see if I have issues in the future, but for now, this hack got me going again.

        Maybe something to think about for a future update: Can anything be reduced in the database to prevent this error in more restrictive environments?

          010101 can you clarify which column was affected by the error? Do you mean the error column on kilowhat_rich_embeds? I'm a bit confused because that column is not supposed to be a key at all, so there shouldn't be any opportunity for this error to arise.

          If you have any error related to the url_hash column, please let me know. Changing the index or column type of that column might break the extension.

          If the error relates to the url column, it doesn't really matter because the index on that column is then deleted by one of the migrations that follow. That column should stay utf8 though.

          If the problem relates to the key name, maybe your database table prefix is too long?

          EDIT: please include your MySQL/MariaDB version information. You can run select version() as version via SQL on the database server, and I think this information is also shown in the php flarum info output and the admin dashboard.
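
          For context, MySQL sizes index keys by the column's maximum byte length, not its character length: with utf8mb4 (Flarum's usual charset) each character reserves 4 bytes, so an index on a VARCHAR longer than 768 characters exceeds the 3072-byte InnoDB limit, while the same column in latin1 (1 byte per character) fits, which is why the charset workaround above got past the error. A couple of diagnostic queries (assuming the default table prefix):

              -- Server version; the InnoDB key length limit depends on version and row format
              SELECT VERSION() AS version;

              -- List the indexes actually present on the table
              SHOW INDEX FROM kilowhat_rich_embeds;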

          2 months later

          Version 1.2.4 - November 27, 2022

          This is a security update. All users should upgrade as soon as possible.

          • Changed: sanitize SVG files to prevent the image proxy endpoint from being used as part of an XSS attack.
          • Changed: explicitly whitelist common image MIME types instead of the previous image/* to reduce attack vectors.
          • Changed: limit proxied file size to 5MB to make it harder to use as part of a denial of service attack.
          • Changed: prevent usage of the proxy endpoint when the image proxy feature is disabled.
          • Changed: proxy errors are now returned as images. The previous JSON responses weren't very useful as they always resulted in broken image tags in the frontend.

          I am not aware of these issues having been actively exploited; I discovered them through internal review. The XSS was only possible by tricking a user into clicking a URL that points to the proxy script.

          If you are using a whitelist of trusted domains, the XSS was only possible if an attacker could upload a malicious SVG file to one of the trusted domains.
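
          For context on the SVG issue: SVG is an XML format that can embed scripts, so an unsanitized SVG served through the proxy endpoint and opened directly would run its script on the forum's origin. A minimal example of the kind of markup sanitization now strips (illustrative payload only):

              <svg xmlns="http://www.w3.org/2000/svg">
                <script>/* would run on the proxy's origin when the URL is opened directly */</script>
              </svg>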

          5 months later

          For Amazon.com links, the preview shows what looks like a CAPTCHA (in the picture) and the rest of the information isn't captured. Is there anything I can do? Not sure if it's on my end.

            Darkle do you have an example Amazon link I could try on my test site to see if it happens on the first try?

            Otherwise, in general I assume it must be their anti-bot protection kicking in, though it makes me wonder how they expect lesser-known search engines to crawl the page meta tags if they refuse to return the page 🤔 If it doesn't happen all the time, or only recently started happening, your server IP might have been added to their bot list. Maybe it'll be automatically whitelisted again after a given time, as those bans are rarely permanent and usually just temporary web firewall rules.

            I could add an option in a future version to retry failed requests, either for all links or specific domains. Though if the IP was temporarily restricted due to bot activity, re-trying too soon might just make things worse.

            I could also introduce an API-based embed for Amazon, it might be more efficient and less likely to break if you have many Amazon links on the forum. Not sure if Amazon has a free API for that kind of use though.

              clarkwinkelmann Actually, it's any amazon.com link, not a particular one, so it must be what you say, some kind of anti-bot protection. Honestly, I couldn't tell you whether it's been happening for a long time or a short time; I'll keep an eye out to see if it goes away.

                Darkle I have this problem when I use anti-tracking or ad blocking extensions in my browser (the adblockers block tracking cookies too). Disable all browser extensions temporarily and see if it helps.

                  Darkle CyberGene good point, it could make sense to check the image URL and whether there's any redirect in the browser dev tools. My initial assumption was that the server-side crawler saw a captcha page and saved the captcha image in the database, but maybe the crawler actually gets the correct meta image and Amazon switches the client-side response based on information sent by the browser.

                  This would still be an unexpected behavior if they do that with the OpenGraph image since that's one of the exact use cases OpenGraph is for.

                  If you use the image proxy feature, then it's probably still down to the server IP, as the proxy script does not forward any client header to the final website.

                  Do you have the "fallback" HTML crawler enabled? If the crawler saves a captcha as the image, I assume it must be enabled; it would be odd for Amazon to put that image in the OpenGraph data of the error page.

                  16 days later

                  Version 1.2.5 - May 22, 2023

                  This is a security update. All users should upgrade as soon as possible.

                  • Changed: limit the OpenGraph/Rich/Image crawler download to 5MB per URL to make it harder to use as part of a denial of service attack.
                  • Fixed: an issue where the Image crawler could be exploited to access meta information of arbitrary images on the server filesystem or intranet, or to leak the server IP despite a blacklist.

                  The vulnerability affected all versions of the extension since 1.1.0.

                  Attack vector: an attacker could post a link to a malicious HTTP endpoint that would return a special payload.
                  The endpoint would have to be an attacker-controlled server or a file hosting service that can be fooled into returning an incorrect MIME type in the HTTP headers.
                  The extension would then automatically access any arbitrary URL or file path contained in the malicious payload.

                  Exposed information: If the arbitrary URL points to a valid image on the filesystem or intranet, the image width, height and EXIF data would be made available through the Flarum REST API to anyone with permission to view embeds.
                  If an asynchronous queue is not used, an attacker could time the request to try guessing whether a file (image or not) exists at a given path or intranet URL.
                  The server IP is sent along with the request to the arbitrary URL, which could leak the server IP even if a blacklist would normally prevent it from being shared.

                  Mitigating circumstance: if a whitelist/blacklist was used to restrict the domains to trusted websites, it's unlikely that an attacker could host the required attack payload on a regular, uncompromised website.
                  If a whitelist/blacklist was not used, the IP leak is not a vulnerability since any user could already publish a link to a server they control and get it accessed by the crawler.

                  Additional remedial steps: scan the kilowhat_rich_embeds.exif column of your database for any maliciously exposed information.
                  Set the value to MySQL NULL to redact it.
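
                  A sketch of what that scan and redaction could look like in SQL (assuming the table's primary key is named id; how you spot suspicious rows is up to you):

                      -- Review stored EXIF payloads for internal paths or intranet URLs
                      SELECT id, exif FROM kilowhat_rich_embeds WHERE exif IS NOT NULL;

                      -- Redact a suspicious row by nulling the column (the id is hypothetical)
                      UPDATE kilowhat_rich_embeds SET exif = NULL WHERE id = 123;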

                  There is no evidence of this vulnerability being exploited; it was discovered through an internal audit.

                  This version is compatible with Flarum versions 1.2 to 1.8.

                  a month later

                  I usually upgrade the entire forum from time to time just to have all the latest versions of the extensions. However, this time I couldn't do that due to:

                  Updating dependencies
                  Your requirements could not be resolved to an installable set of packages.
                  
                    Problem 1
                      - Root composer.json requires kilowhat/flarum-ext-rich-embeds *, found kilowhat/flarum-ext-rich-embeds[1.2.5] in the lock file but not in remote repositories, make sure you avoid updating this package to keep the one from the lock file.

                  @clarkwinkelmann what could be the reason?

                    GreXXL most likely your Extiverse token to download premium extensions has expired.
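
                    If so, generating a new token on Extiverse and updating your composer auth configuration should fix it. Roughly like this (the exact hostname comes from your Extiverse subscription page, so treat this as a sketch):

                        # Inspect the currently stored token (hostname illustrative)
                        composer config --global bearer.composer.extiverse.com

                        # Store the new token from your Extiverse account
                        composer config --global bearer.composer.extiverse.com YOUR-NEW-TOKEN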

                    Yes, that was it, thanks 👏🏻 Do these tokens expire from time to time? I'd totally forgotten about Extiverse; this is the only extension I use that requires it. I set it up when I first purchased the extension and then forgot about it.

                      CyberGene Sometimes they expire spontaneously. I must have renewed mine 3 or 4 times; even though the interface said they hadn't expired, for some reason they had. A minor issue anyway.