Reorganization of translation keys

Dominion · Sep 7, 2015

Toby Whenever I see someone writing "login" as a verb, I die a little inside.

I'm the same way. Oh, I may slip up now and then when trying to get my thoughts down quickly, but I think it's important to get it right when using language in things intended for publication (like buttons in a software app).

Toby Are we really losing that much efficiency?

On the whole I tend to agree, the loss isn't all that huge. But before I answer in greater detail (which I'll do after I've had breakfast and coffee), let me ask a quick question to be sure I'm understanding you correctly. You're talking about:

Creating discrete keys where the string is reused in a different way (e.g. button/link versus title), but
Using the same key where the string is reused in a different place (e.g. login box versus signup box)

Have I got that right? I agree ... even in cases where translators may not need a variant, it's best to be proactive.

DSitC Personally, I'd prefer a format that would allow us to omit the context, both in the definition and in the translation method call, and make intelligent fallbacks depending on the data given. But with YAML this is not really possible.

Could you describe what you have in mind, with a concrete example or two perhaps?

DSitC · Sep 8, 2015

Dominion Sure thing.

language file:

key1: value1
key1.context1: value1.1
key1.context2: value1.2

key2.context1: value2.1
key2.context2: value2.2

key3: value3

code:

app.trans('key1');             // ==> "value1"
app.trans('key1', 'context2'); // ==> "value1.2"
app.trans('key1', 'context5'); // ==> "value1"

app.trans('key2');             // ==> "value2.1"
app.trans('key2', 'context2'); // ==> "value2.2"
app.trans('key2', 'context5'); // ==> "value2.1"

app.trans('key3');             // ==> "value3"
app.trans('key3', 'context1'); // ==> "value3"
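
A minimal sketch of the fallback rules implied by the calls above, assuming the language file has been loaded into a flat object. The `messages` object, the standalone `trans` function, and the choice to return the key itself when nothing matches are all assumptions made for illustration; this is not how app.trans currently works.

```javascript
// Hypothetical flat dictionary mirroring the language file above.
const messages = {
  'key1': 'value1',
  'key1.context1': 'value1.1',
  'key1.context2': 'value1.2',
  'key2.context1': 'value2.1',
  'key2.context2': 'value2.2',
  'key3': 'value3',
};

// Look up `key`, preferring the requested context, then the bare key,
// then the first contextual variant defined for that key.
function trans(key, context) {
  if (context !== undefined) {
    const contextual = messages[key + '.' + context];
    if (contextual !== undefined) return contextual;
  }
  if (messages[key] !== undefined) return messages[key];
  const firstVariant = Object.keys(messages).find((k) => k.startsWith(key + '.'));
  // Assumption: an unknown key is returned as-is so the gap is visible in the UI.
  return firstVariant !== undefined ? messages[firstVariant] : key;
}

console.log(trans('key1', 'context5')); // "value1"
console.log(trans('key2'));             // "value2.1"
console.log(trans('key3', 'context1')); // "value3"
```

The order of the fallbacks is the design decision here: a missing context falls back to the bare key first, and only a key with no bare value falls back to its first variant.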

Dominion · Sep 8, 2015

DSitC Sure thing.

Thanks! I'm following you now.

I agree, that would be the way to go if we want to condition by context in the code. But as I said above, I'm not sure we can justify the work it would take to put that system in place. Now if Flarum were a much bigger project (say, along the lines of Tiki Wiki or something) it might be a different story.

For our purposes, I think it may be best to limit the use of context to conditioning regular grammatical phenomena, such as plurality or gender. Anything more than that may well be more than translators want to deal with.

I'm also a bit concerned that the fallback key would be less descriptive than the variants. I'll explain in detail below. First I'd like to reply to the latest by Toby as promised (though it's been a while since eggs and baccy):

Toby But from the coder's perspective, they should be confident that they can follow a set of rules and name a key correctly and consistently. Does that make sense?

It does! Of course, even a very complicated system can be made consistent if the rules are detailed enough. But in order to make life easier for translators, it's a good idea to give the devs a consistent set of rules that will be easy to follow. So simpler is definitely better.

Toby I think my preference would be to enforce contextual suffixes, chosen from a list, for all keys.

I agree, although I wasn't thinking along those lines at first. For much the same reason that @DSitC would rather have fallbacks, my instinct would be to suffix only one of the key names. Why use two bits of information to make the distinction when one would do? It would be easier to implement, and it could be made consistent. But it would require more complex rules (e.g., "in a button vs. title situation, the latter gets the suffix") which in the long run would probably end up being less efficient.

What's worse, however, is that it would leave one of the keys less descriptive than the other. This makes life harder for translators. I'll have to put on my translator hat to explain why.

One change of hats later...

Okay. When I'm translating resources, the first thing that comes to mind isn't "What does this string mean?" That bit goes by so fast, I barely notice it. No, the first thing I want to know is: "Where can I see this?" I want to know how the string is used so I can check how much space I've got to work with, imagine my translation in context, and so on.

The suffix on core.thingamajig_title will get me most of the way there. Once I know it's a dialog title, I can probably figure out how to display the dialog box. But what of core.thingamajig? There's nothing there to say what sort of a thing I should be looking for. Button? Link? Table heading? I'm left guessing, with only the string itself as a clue.

So yes, we should put suffixes everywhere. But that still gets me only partway. Your idea of prefixes can also come in handy in some cases. For example, putting core.notification_method_alert and core.notification_method_email next to each other will help the translator recognize that this "Email" is different from the other four. (I see you were anticipating the core.email issue already!)
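
If suffixes are to be enforced for all keys, the check could be automated. A rough sketch, assuming the suffix is whatever follows the last underscore; the suffix list and the function name `findUnsuffixedKeys` are made up for illustration, since the real list hasn't been settled yet.

```javascript
// Hypothetical suffix list; the thread has not agreed on the real one.
const ALLOWED_SUFFIXES = ['title', 'action', 'link', 'placeholder'];

// Return the keys whose last underscore-separated segment is not an allowed suffix.
function findUnsuffixedKeys(keys) {
  return keys.filter((key) => {
    const suffix = key.slice(key.lastIndexOf('_') + 1);
    return !ALLOWED_SUFFIXES.includes(suffix);
  });
}

const flagged = findUnsuffixedKeys([
  'core.thingamajig_title',
  'core.thingamajig',
  'core.log_in_action',
]);
console.log(flagged); // [ 'core.thingamajig' ]
```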

But ... say core.thingamajig_action (like core.log_in_action) is used in multiple locations. As a translator, I want to check all of them, because maybe there's one or two cases where my translation will be too long to fit comfortably in the space provided. So where should I be looking? For that matter, how do I know when I've found them all? At present, I'd have to run a global search on every file with an app.trans call to be sure. That were best avoided.

Unfortunately, adding this information to the keys would mean creating a discrete key for every string in the program, and that would definitely be going too far. As you said:

Toby that's where the UI would help.

As an alternative, we could provide some info as comments in the YML file. That wouldn't take care of the other two birds I mentioned, but it would serve as a stopgap. And since I'm going to be editing key names anyway, maybe this would be a good opportunity to put that information in as well.

And if we do that ... I've been wondering if it would be possible to leverage those comments for display by the GUI, if and when you get around to adding one. If so, then it might be a good idea to give a little thought now to how the comments may best be formatted. (Though to do that, I guess we'd have to give some thought to the design of the GUI. Hoo-boy, add one little idea and the work just starts piling up!)

... I'd best wait for your comments on that thread.

In the meantime, I think I've got a handle on what's needed now, so I'll start revising my matrix with the new names and draft up a set of rules to explain them. At some point I'll probably want to ask for your help in clarifying the list of suffixes available, and so on ... but that's probably a few days off.

Dominion · Sep 8, 2015

Actually, I've already come up with a couple questions I thought I'd better ask sooner rather than later.

About suffixes:

I could probably come up with a list of names for things in a GUI (such as title, action, etc.), but what I may come up with may not match the technical terms you're already using. You wouldn't happen to have a handy list of names sitting around, would you?

If you don't have a list, I can just poke around the code and see what you're using for class names.

About prefixes:

My instinct is to use prefixes to group things by location. In many cases, that could double as a hint as to which files the string is used in. For example, strings used only in the "Change Email" modal would get the "change_email_" prefix, while the button that opens the dialog would be "settings_change_email_action".

The advantage to this is that it would allow translators to concentrate on and finish specific areas of the UI in a fairly efficient manner (the stumbling block there being any global strings involved). The downside is that this will scatter duplicate strings about, instead of clumping them together. I figure we've all got Search functions in our editors for that, but I thought I'd ask for your take on things.

From the string creation point of view, grouping by location might make it easier for devs to come up with consistent key names. But it will also make it harder to know when they're creating a string that already exists elsewhere. They might overlook an existing global string, or a string from another location that should be merged with the new string to form a global. We'd need a way to prevent that.

The best way might be to create some sort of string database that can be searched from either direction. I'm not sure how practical that would be, though.

Toby · Sep 9, 2015

Dominion You wouldn't happen to have a handy list of names sitting around, would you?

If you don't have a list, I can just poke around the code and see what you're using for class names.

I don't have a list, sorry! That'd be great if you could do that.

Dominion My instinct is to use prefixes to group things by location. In many cases, that could double as a hint as to which files the string is used in. For example, strings used only in the "Change Email" modal would get the "change_email_" prefix, while the button that opens the dialog would be "settings_change_email_action".

Yes, I like this idea!

Dominion But it will also make it harder to know when they're creating a string that already exists elsewhere. They might overlook an existing global string, or a string from another location that should be merged with the new string to form a global. We'd need a way to prevent that.

I think a quick search through the en.yml file for the string they're translating would be sufficient though. If they find an existing key that's suitable for the new location, they can use it; otherwise, create a new one.

Dominion · Sep 9, 2015

Toby I think a quick search through the en.yml file for the string they're translating would be sufficient though. If they find an existing key that's suitable for the new location, they can use it; otherwise, create a new one.

In principle I agree, but this is why I was asking about whether extensions can use strings from the core. They'd have to check through the YML file for both the current extension and the core.

If this were a much larger project, that would quickly get onerous ... but I think you're right, we can probably handle it by explaining the process carefully in the documentation.
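
The "quick search" across both files could look something like the sketch below, assuming each YML file has already been parsed into a flat object. The key names, the sample strings, and the function name `findExistingKeys` are hypothetical.

```javascript
// Hypothetical stand-ins for the parsed core and extension YML files.
const core = { 'core.log_in_action': 'Log In', 'core.sign_up_action': 'Sign Up' };
const extension = { 'tags.edit_action': 'Edit', 'tags.log_in_link': 'Log in' };

// Find every key, in any of the given files, whose string matches `text`
// (ignoring case and surrounding whitespace).
function findExistingKeys(text, files) {
  const wanted = text.trim().toLowerCase();
  const matches = [];
  for (const [fileName, strings] of Object.entries(files)) {
    for (const [key, value] of Object.entries(strings)) {
      if (value.trim().toLowerCase() === wanted) matches.push({ fileName, key });
    }
  }
  return matches;
}

const found = findExistingKeys('log in', { core, extension });
console.log(found.map((m) => m.key)); // [ 'core.log_in_action', 'tags.log_in_link' ]
```

Whether a match is actually suitable for the new location is still a judgment call for the dev; the search only surfaces the candidates.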

Toby · Sep 9, 2015

Dominion As an alternative, we could provide some info as comments in the YML file. That wouldn't take care of the other two birds I mentioned, but it would serve as a stopgap. And since I'm going to be editing key names anyway, maybe this would be a good opportunity to put that information in as well.

And if we do that ... I've been wondering if it would be possible to leverage those comments for display by the GUI, if and when you get around to adding one.

Yes and yes – but let's not worry so much about the format of the comments right now, so long as they contain the essential information. Which, off the top of my head, would simply be:

A descriptive sentence or two about where in the UI the translation is used
The name of the file(s) that it is used in

Dominion · Sep 9, 2015

Toby Yes and yes – but let's not worry so much about the format of the comments right now

I wholeheartedly agree ... but considering that some strings can be used in several locations/files, the comments can quickly get pretty big. So at the very least, it would be a good idea to know whether I should put those all in the same line, or add a separate comment line for each use, for example.

Of course, the latter approach would also add space between the keys we've worked so hard to group by prefix.

Toby · Sep 9, 2015

I'm quite happy for the comments to be as large as they need to be, for example:

# Used in such and such location.
# js/forum/src/components/SignUpModal.js
# js/forum/src/components/LogInModal.js
email_placeholder: Email

# Used in such and such location.
# js/forum/src/components/AnotherOne.js
something_else: I suck at thinking of examples

Let's give every string first-class documentation.
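
If comments like these are later to be shown in a translation GUI, something has to pull them out of the file. A rough sketch, assuming a flat file where the comment lines sit directly above their key. The function name `collectComments` is made up for illustration, and this is a line-based reader for the simple layout above, not a real YAML parser.

```javascript
// Collect the "#" comment lines that directly precede each "key: value" line.
function collectComments(yamlText) {
  const notes = {};
  let pending = [];
  for (const rawLine of yamlText.split('\n')) {
    const line = rawLine.trim();
    if (line.startsWith('#')) {
      pending.push(line.replace(/^#\s*/, ''));
    } else if (line === '') {
      pending = []; // a blank line detaches the comments from what follows
    } else {
      const colon = line.indexOf(':');
      if (colon === -1) continue; // not a "key: value" line, skip it
      notes[line.slice(0, colon)] = pending;
      pending = [];
    }
  }
  return notes;
}

const notes = collectComments([
  '# Used in such and such location.',
  '# js/forum/src/components/SignUpModal.js',
  '# js/forum/src/components/LogInModal.js',
  'email_placeholder: Email',
].join('\n'));
console.log(notes.email_placeholder.length); // 3
```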

Dominion · Sep 9, 2015

Okay, thanks ... I can work with that.

Or hey ... since we'll be grouping by location prefix, maybe I can do something even fancier. One comment to explain the group, and individual comments as needed to explain where to look in the code.

That would probably be harder to adapt to the UI later, but that's another bridge we can burn when we get to it.

Franz · Sep 9, 2015

Toby The name of the file(s) that it is used in

Hmm, isn't that just asking to get out-of-date?

Toby · Sep 9, 2015

Franz True. Thinking a bit more about how the translation UI would work, it's probably not necessary information, so let's leave it out.

Dominion · Sep 9, 2015

Franz Hmm, isn't that just asking to get out-of-date?

Oh. Dang, I hadn't even considered that aspect. Yeah, it would mean a lot of maintenance, wouldn't it?

Toby True. Thinking a bit more about how the translation UI would work, it's probably not necessary information, so let's leave it out.

Done. Though I think the idea of having a natural-language description of where to look for stuff is worth doing, and we could get by with only one comment per prefix, rather than one per string. Comments for globally used strings could be more like what you suggested, with a natural-language description of each place to look.

DSitC · Sep 9, 2015

Franz On the other hand, it's not that hard to write a script that scans all JavaScript files and translation files and automatically inserts those comments.

Franz · Sep 9, 2015

It still needs to be run.

DSitC · Sep 9, 2015

Franz Yep.

If you're working with a GUI, this really becomes obsolete, true. You could use such a scanning script in the GUI, however. Something like a "More Info on this key" button that would then start a quick parse and show the translator in which files/lines the translation calls are made with this key.
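
A quick sketch of what such a scan might do, assuming the key is always passed to app.trans as a plain string literal. The function name `findTransCalls`, the file names, and the sample source lines are hypothetical; a real tool would read the files from disk rather than take their text as an argument.

```javascript
// Report where a key is passed to app.trans().
// `sources` maps a file name to that file's text.
function findTransCalls(key, sources) {
  const hits = [];
  for (const [fileName, text] of Object.entries(sources)) {
    text.split('\n').forEach((line, index) => {
      // Matches app.trans('some.key' or app.trans("some.key", spaces allowed.
      const calls = line.matchAll(/app\.trans\(\s*['"]([^'"]+)['"]/g);
      for (const call of calls) {
        if (call[1] === key) hits.push({ fileName, line: index + 1 });
      }
    });
  }
  return hits;
}

const sources = {
  'LogInModal.js': "title() {\n  return app.trans('core.log_in_title');\n}",
  'HeaderSecondary.js':
    "items.add(app.trans('core.log_in_action'));\nitems.add(app.trans('core.sign_up_action'));",
};
console.log(findTransCalls('core.sign_up_action', sources));
// [ { fileName: 'HeaderSecondary.js', line: 2 } ]
```

Keys that are built dynamically in the code would be missed by a pattern like this, so the result is a hint for the translator rather than a guarantee.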

Dominion · Sep 9, 2015

Yes ... when you get right down to it, though, I don't think we really want to be encouraging translators to be poking around in the code. So it's probably best to leave the filenames out of the YML.

Not because translators shouldn't poke around in the code, but because they shouldn't have to. There are times when it's unavoidable, but for the most part, we want them to get the job done in the YML only. So I'd like to make editing the YML as, erm, non-technical an experience as it can be.

DSitC You could use such a scanning script in the GUI, however. Something like a "More Info on this key" button that would then start a quick parse and show the translator in which files/lines the translation calls are made with this key.

But this sounds like an excellent idea, for translators who need it.

EDIT: Sorry, that wasn't making much sense the first time around.

Dominion · Sep 10, 2015

Okay, my first move has been to go through and pull out the globals. Next I plan to start grouping the rest of the strings by location, and give some thought to prefixing. Then it'll just be a matter of finalizing the suffixes.

But before I get on to that, the process of organizing the globals has raised a couple questions.

Is it possible to combine strings?

I think I may have been a bit too optimistic about a couple of reuse instances. Cases in point:

The "Log In" link at the bottom of the signup modal
The "Sign Up" link at the bottom of the login modal

At first glance, these two links look like the other "Log In" and "Sign Up" links/buttons. But they're different in that they come with context, i.e. the core.before_log_in_link and core.before_sign_up_link strings, respectively. Some translators may need the freedom to embed the link in the context sentence, like so:

If you already have an account, please log in instead.

Even if there is no need for non-link text after the link, the hardcoded space separating the link from the context is bound to cause trouble for some translators. So each of these string pairs should be handled as one.

There's no need for you to act on these just yet, since there may be others. I'll compile a complete list of changes that need to be made when I'm ready to start editing key names. Or I can make the changes myself, with your approval, if you can help me out with the syntax. (I'm even less experienced with JS than I am with PHP.) For now, I'd merely like to confirm that making such changes won't create any problems.
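
To make the idea of a combined string concrete, here is a minimal sketch in which the translator marks the link text inside the sentence and the code supplies the target. The `<a>…</a>` marker convention, the function name `renderWithLink`, and the `#log-in` target are assumptions for illustration only, and the sketch does no HTML escaping.

```javascript
// Hypothetical convention: the translator wraps the link text in <a>…</a>
// and the code fills in where the link points.
function renderWithLink(translation, href) {
  return translation.replace(
    /<a>(.*?)<\/a>/,
    (match, label) => `<a href="${href}">${label}</a>`
  );
}

const html = renderWithLink(
  'If you already have an account, please <a>log in</a> instead.',
  '#log-in'
);
console.log(html);
// If you already have an account, please <a href="#log-in">log in</a> instead.
```

With one string per sentence, the translator decides where the link goes and whether any text follows it, and the hardcoded space disappears.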

How about unique key names?

After removing the above-mentioned pair of instances, we can summarize the globals situation thusly: we've got a total of 14 global strings, each used in only two or three places, for a total of just 35 app.trans calls.

That's not an awful lot. In fact, the numbers are so small that I've started to wonder whether it might be a good idea to use a unique key name for every string. Here's how we could do it:

The dev would start by prefixing every key name by location.
Each string would therefore be grouped with all other strings in the same location.
The key names for global strings would be followed by a reference as DSitC has suggested.
The globals would be grouped together for easy location.
Comments on globals would merely list the unique keys that reference them.

Please note that this doesn't mean we'd necessarily have to use a unique key name for every app.trans call. Cases such as core.bio_placeholder, which is used twice in the same location, could use the same key name. But it would mean adding 21 new keys, and about 35 lines to the YML file (not counting comments).

This approach would have advantages for both translators and devs:

From the translator's point of view, it would make it easier to locate a global string that's being used in the location he/she is concentrating on, and then quickly cross-check whether the translation will work in other locations where the string is used. And if for some reason the global string just isn't working out for a specific location, the translator would not need to ask for the string to be split: he/she could just replace that reference with a string value that fits.

Of course all the keys that we have decided to split (like the "button versus title" situations) would also reference the globals, so that would reduce the number of duplicate strings to be translated to zero. And in the rare case where a translator finds him/herself translating two different English strings into the exact same phrase, he/she can extract that phrase as a global and point both keys at it, again without bothering the devs.

From the developer's point of view, there is the obvious advantage of not having to handle as many requests for new strings. Beyond that, it will allow us to make the rules for naming keys simpler and easier to follow.

Of course, someone will have to check whether there's a global string to be referenced in each case, but this would no longer need to be done as part of the coding process. Adding strings to code would become a simple matter of (1) adding a new, unique key name (including a quick check to be sure that it is indeed unique) and then (2) adding that key and its string to the YML file. The extraction of duplicates as globals could be left for later cleanup, which is an easy task that doesn't need to be done by a programmer.

The downside to all this would be any performance issues that might arise from the referencing mechanism. Not to mention the effort involved in implementing such a mechanism, of course.
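
For the sake of discussion, the referencing mechanism might be as small as the sketch below. The `=> ` marker, the key names, and the function name `resolve` are all hypothetical; nothing has been decided about the actual syntax.

```javascript
// Hypothetical syntax: a value that starts with "=> " points at another key.
const REFERENCE = /^=>\s*(\S+)$/;

// Follow references until a plain string is reached.
function resolve(key, strings, seen = new Set()) {
  if (seen.has(key)) throw new Error(`Circular reference at ${key}`);
  seen.add(key);
  const value = strings[key];
  if (value === undefined) return key; // assumption: show the key itself when missing
  const match = REFERENCE.exec(value);
  return match ? resolve(match[1], strings, seen) : value;
}

const strings = {
  'core.global_log_in': 'Log In',
  'core.header_log_in_action': '=> core.global_log_in',
  'core.log_in_modal_title': '=> core.global_log_in',
  'core.sign_up_modal_log_in_link': 'Log in instead',
};

console.log(resolve('core.header_log_in_action', strings));      // "Log In"
console.log(resolve('core.sign_up_modal_log_in_link', strings)); // "Log in instead"
```

The last key shows the escape hatch for translators: replacing a reference with a plain string value needs no change to the code.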

Please let me know what you think of this idea!

Dominion · Sep 10, 2015

For simplicity, I limited the above discussion of unique key names to the core. Things get slightly trickier if we take extensions into account. Here are some things we'd need to consider:

The proposal implies that core strings can't be used directly in any extension code. We'd want to have a line for each string in the extension YML. This is to preserve uniqueness; direct use of core strings would negate the advantages of the system.

So all realization of extension keys as strings from core would be handled by the YML referencing mechanism. Is this likely to cause any issues?
Would extensions be allowed to reference non-global core strings? (I would suggest that this be allowed only when the name of the extension key exactly matches that of the core key being referenced, i.e. when the string is used in the same manner and location as the referenced core string.)
Seen from this angle, namespacing could be handled as a simple fallback mechanism: if you don't find a string in the extension YML, look for it in the core YML!
When an extension wants to reference a non-global core string that isn't used in the same manner or location (assuming we choose to allow that), should that string be separated out as a global? (This seems a reasonable thing to do, but it would increase the number of references in the core YML, obviously.)
When a core key is referenced by an extension, should the core key be given a comment to indicate this?

Regarding the last two points, it goes without saying that we'd only be able to do this for bundled extensions. Third party devs would need to track their string usage on their own and be ready to make adjustments if a core string that they've been referencing gets changed. (But the uniqueness factor would make it easier for them to respond to such a situation, since they could merely replace the reference with a string.)
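
The "look in the extension YML first, then the core YML" fallback could be sketched like this, assuming both files are available as flat objects. The function name `lookup` and the sample keys are hypothetical.

```javascript
// Return the extension's own string if it has one, else the core string,
// else the key itself (an assumption, so that gaps stay visible).
function lookup(key, extensionStrings, coreStrings) {
  const has = (obj) => Object.prototype.hasOwnProperty.call(obj, key);
  if (has(extensionStrings)) return extensionStrings[key];
  if (has(coreStrings)) return coreStrings[key];
  return key;
}

const coreStrings = { 'core.log_in_action': 'Log In', 'core.save_action': 'Save' };
const extensionStrings = { 'core.save_action': 'Save Tag' };

console.log(lookup('core.save_action', extensionStrings, coreStrings));   // "Save Tag"
console.log(lookup('core.log_in_action', extensionStrings, coreStrings)); // "Log In"
```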

There may be other things I'm not taking into account. My thinking re: extensions is still a bit wooly at this point.

Franz · Sep 10, 2015

Sounds good to me. Very solid.

Any negative performance impact of the referencing mechanism can be compensated for by simply compiling all locales into one PHP file (with references already resolved) whenever an extension is added / updated.
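
A rough sketch of that compile step, written in JavaScript for consistency with the other snippets in this thread rather than PHP. It assumes a hypothetical `=> key` reference syntax and flat objects for each locale file; the function name `compileLocale` is made up for illustration.

```javascript
// Hypothetical syntax: a value that starts with "=> " points at another key.
const REFERENCE = /^=>\s*(\S+)$/;

// Merge the given files (later ones win) and replace every reference
// with the string it finally points to.
function compileLocale(...sources) {
  const merged = Object.assign({}, ...sources);
  const compiled = {};
  for (const key of Object.keys(merged)) {
    let value = merged[key];
    const seen = new Set([key]);
    let match;
    while ((match = REFERENCE.exec(value)) !== null) {
      const target = match[1];
      // Stop on a cycle or a dangling reference and keep the reference text.
      if (seen.has(target) || merged[target] === undefined) break;
      seen.add(target);
      value = merged[target];
    }
    compiled[key] = value;
  }
  return compiled;
}

const compiled = compileLocale(
  { 'core.global_log_in': 'Log In' },
  { 'ext.log_in_action': '=> core.global_log_in' }
);
console.log(compiled['ext.log_in_action']); // "Log In"
```

Because the output contains no references, lookups at runtime are plain key reads; the cost moves to whenever the compile is run.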