Reorganization of translation keys

DSitC · Sep 7, 2015

Additional ideas, solutions and food-for-thought: https://slexaxton.github.io/Jed/

Dominion · Sep 7, 2015

Hmm. I seem to have flip-flopped a bit regarding approach, and it's occurred to me to wonder why.

As Franz said, it was indeed @Toby who first mentioned the idea of putting context in the code (in his first reply above, which I find myself unable to mention for some reason). He also added a caution about the extra complexity this would involve, and I agreed that it didn't seem worth the trouble:

Dominion In fact, it sounds like the sort of thing that, if you're going to do it at all, it should be applied across the board according to some standardized scheme. And that would be a lot of work.

Yet when Franz brought the idea up again, I found myself thinking it might be worth the trouble:

Dominion It would be a rather big change to make, because it would be best to apply it everywhere, but definitely worth the trouble!

Why did I suddenly find the idea so appealing? Well, after thinking about how difficult it would be to provide translators the information they need while keeping the key naming scheme both consistent and efficient, I began to think that it might be easiest to manage the consistency angle in the code. It seems to me that it would be easier to devise a format for adding context there than it would be to define a consistent key name format.

(Implementation, however, would be an entirely different matter.)

But even if we're okay with the added complexity that Toby warned of, that's not really the end of it. We'd have to come up with some way of letting the translators know what their options are. Without that info, translators would be forced to peek at the code to see what context keys were available. So we'd have to provide them with documentation, and then we'd have to make sure the code adhered to the rules in the documentation.

... And that means the context would have to be supplied uniformly in the code, everywhere. Which means my instinct (that it's the sort of thing that needs to be applied across the board) was spot on.

So ... given the extra effort involved, is it worth it? Let's look at the numbers. Of over 100 strings, only seven are reused in a way that could pose an issue for translators, and only one of those (core.email) strikes me as truly urgent. In terms of instances (app.trans calls) it comes to about 18 out of 128, of which only four are urgent.

These numbers will change as we add strings for the admin interface, take extensions into account, etc. But assuming they don't change too much, that's a lot of work to cover only a few situations. Again, Toby's caution springs to mind.

From this perspective, it seems we were right to focus on the key names. So instead of looking for ways to bend YAML to our collective will, we should probably be thinking about how we can provide translators with descriptive key names that strike a good balance between consistency and efficiency.

I'm starting to get some ideas about that, but I need some more time to flesh them out. So I'll leave this here for now.

DSitC · Sep 7, 2015

Concerning more complex pluralization rules, here's a pretty exhaustive overview: http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/language_plural_rules.html

Take a look at Arabic, that's really... well... special...
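For a quick feel of the difference, here is a minimal sketch using the standard `Intl.PluralRules` API. It assumes a JavaScript runtime that ships full ICU/CLDR locale data (recent Node.js releases and current browsers do); the sample counts are arbitrary.

```javascript
// Intl.PluralRules is a standard ECMAScript API; the categories it returns
// come from the CLDR data bundled with the runtime.
const english = new Intl.PluralRules('en');
const arabic = new Intl.PluralRules('ar');

// Collect the plural category each language assigns to a few sample counts.
const samples = [0, 1, 2, 3, 11, 100];
const table = samples.map((n) => ({
  n,
  en: english.select(n),
  ar: arabic.select(n),
}));

// English only ever needs "one" and "other" here, while Arabic spreads the
// same counts over zero, one, two, few, many and other.
console.table(table);
```

Each of those categories would need its own variant of a pluralized string in an Arabic translation file.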

Toby · Sep 7, 2015

OK, time for me to review everything that's been said and offer a solid opinion on all of this!

Dominion In most cases it's not all that hard for a translator to guess what, say, "Log In" means. (Especially if it's spelled correctly. I congratulate @Toby on being one of the virtuous few who get it right!)

Thanks! This is actually a huge source of pride for me. Whenever I see someone writing "login" as a verb, I die a little inside.

Dominion From this perspective, it seems we were right to focus on the key names. So instead of looking for ways to bend YAML to our collective will, we should probably be thinking about how we can provide translators with descriptive key names that strike a good balance between consistency and efficiency.

Completely agreed.

Dominion But if we know we'll have one [a translation UI] eventually, then for the time being we can settle for key names that are less than optimally descriptive and consistent.

But in the case of the duplicate strings used in different contexts, would we not need to split them and name them regardless? To me, the translation UI sounds like an amazing tool (and I will reply to that thread soon!), but I still think this problem should be solved to a large degree by a solid key naming scheme. I'm not expecting to come up with something where translators can just look through the YAML file and immediately interpret the meaning of certain prefixes/suffixes – that's where the UI would help. But from the coder's perspective, they should be confident that they can follow a set of rules and name a key correctly and consistently. Does that make sense?

So after considering all of this – but admittedly, without having been through all of the strings and their uses in excruciating detail like @Dominion has – I think my preference would be to enforce contextual suffixes, chosen from a list, for all keys. That is, greater consistency at the expense of efficiency:

Split up core.log_in into core.log_in_title and core.log_in_action? Yes.
Rename core.thingamajig to core.thingamajig_action (even if there is no core.thingamajig_title)? Yes.
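A rule like this could be checked mechanically. The following is only a sketch: the function name and the suffix list are hypothetical (only `title` and `action` come from this thread; the other two are placeholders until a real list is agreed on).

```javascript
// Hypothetical suffix list: only "title" and "action" appear in this thread;
// "label" and "placeholder" are stand-ins for whatever list is agreed on.
const ALLOWED_SUFFIXES = ['title', 'action', 'label', 'placeholder'];

// Return the keys whose last underscore-separated segment is not an allowed
// suffix. A key without any underscore is compared as a whole, so it fails.
function findUnsuffixedKeys(keys, suffixes = ALLOWED_SUFFIXES) {
  return keys.filter((key) => {
    const lastSegment = key.slice(key.lastIndexOf('_') + 1);
    return !suffixes.includes(lastSegment);
  });
}

const keys = ['core.log_in_title', 'core.log_in_action', 'core.thingamajig'];
console.log(findUnsuffixedKeys(keys)); // [ 'core.thingamajig' ]
```

Run against the full key list, a check along these lines would flag every key that still needs renaming.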

I understand this will result in more duplication than may be ideal, but on my brief look through Dominion's amazing matrix of the strings and their uses, I noticed that all of the duplicated strings are very short. Since we're proposing a suffix, they will all be listed together (alphabetically) in the YAML file; the translator will be able to translate them all at once, usually just by copy+pasting the first one onto the others. My point is: Are we really losing that much efficiency?

Anyway, that's where my thoughts currently stand. What do y'all think?

DSitC · Sep 7, 2015

Toby If YAML as a translation source is to stay, I'd cast my vote for that, too. :-)

Personally, I'd prefer a format that would allow the context to be omitted, both in the definition and in the translation method call, and that would make intelligent fallbacks depending on the data given. But with YAML this is not really possible.

Dominion · Sep 7, 2015

Toby Whenever I see someone writing "login" as a verb, I die a little inside.

I'm the same way. Oh, I may slip up now and then when trying to get my thoughts down quickly, but I think it's important to get it right when using language in things intended for publication (like buttons in a software app.)

Toby Are we really losing that much efficiency?

On the whole I tend to agree, the loss isn't all that huge. But before I answer in greater detail (which I'll do after I've had breakfast and coffee), let me ask a quick question to be sure I'm understanding you correctly. You're talking about:

Creating discrete keys where the string is reused in a different way (e.g. button/link versus title), but
Using the same key where the string is reused in a different place (e.g. login box versus signup box)

Have I got that right? I agree ... even in cases where translators may not need a variant, it's best to be proactive.

DSitC Personally, I'd prefer a format that would allow the context to be omitted, both in the definition and in the translation method call, and that would make intelligent fallbacks depending on the data given. But with YAML this is not really possible.

Could you describe what you have in mind, with a concrete example or two perhaps?

DSitC · Sep 8, 2015

Dominion Sure thing.

language file:

key1: value1
key1.context1: value1.1
key1.context2: value1.2

key2.context1: value2.1
key2.context2: value2.2

key3: value3

code:

app.trans('key1');             // ==> "value1"
app.trans('key1', 'context2'); // ==> "value1.2"
app.trans('key1', 'context5'); // ==> "value1"

app.trans('key2');             // ==> "value2.1"
app.trans('key2', 'context2'); // ==> "value2.2"
app.trans('key2', 'context5'); // ==> "value2.1"

app.trans('key3');             // ==> "value3"
app.trans('key3', 'context1'); // ==> "value3"
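One way the lookup behind those calls might work is sketched below. This is an assumption-laden illustration, not an existing API: the standalone `trans` function, the flat `Map`, and the fallback order (exact `key.context` match, then the bare key, then the first variant in file order) are all inferred from the expected results listed above.

```javascript
// Flat dictionary mirroring the language file above, in file order.
const translations = new Map([
  ['key1', 'value1'],
  ['key1.context1', 'value1.1'],
  ['key1.context2', 'value1.2'],
  ['key2.context1', 'value2.1'],
  ['key2.context2', 'value2.2'],
  ['key3', 'value3'],
]);

// Hypothetical lookup with fallbacks:
//   1. exact "key.context" entry, if a context was given and it exists
//   2. the bare key
//   3. the first "key.*" variant in file order
function trans(key, context) {
  const contextual = `${key}.${context}`;
  if (context !== undefined && translations.has(contextual)) {
    return translations.get(contextual);
  }
  if (translations.has(key)) {
    return translations.get(key);
  }
  const firstVariant = [...translations.keys()].find((name) => name.startsWith(`${key}.`));
  // Returning the key itself when nothing matches is an assumption.
  return firstVariant !== undefined ? translations.get(firstVariant) : key;
}

console.log(trans('key1', 'context5')); // value1
console.log(trans('key2', 'context5')); // value2.1
```

Note that step 3 depends on the order of entries in the file, which is one more thing translators would need to be told about.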

Dominion · Sep 8, 2015

DSitC Sure thing.

Thanks! I'm following you now.

I agree, that would be the way to go if we want to condition by context in the code. But as I said above, I'm not sure we can justify the work it would take to put that system in place. Now if Flarum were a much bigger project (say, along the lines of Tiki Wiki or something) it might be a different story.

For our purposes, I think it may be best to limit the use of context to conditioning regular grammatical phenomena, such as plurality or gender. Anything more than that may well be more than translators want to deal with.

I'm also a bit concerned that the fallback key would be less descriptive than the variants. I'll explain in detail below. First I'd like to reply to the latest by Toby as promised (though it's been a while since eggs and baccy):

Toby But from the coder's perspective, they should be confident that they can follow a set of rules and name a key correctly and consistently. Does that make sense?

It does! Of course, even a very complicated system can be made consistent if the rules are detailed enough. But in order to make life easier for translators, it's a good idea to give the devs a consistent set of rules that will be easy to follow. So simpler is definitely better.

Toby I think my preference would be to enforce contextual suffixes, chosen from a list, for all keys.

I agree, although I wasn't thinking along those lines at first. For much the same reason that @DSitC would rather have fallbacks, my instinct would be to suffix only one of the key names. Why use two bits of information to make the distinction when one would do? It would be easier to implement, and it could be made consistent. But it would require more complex rules (e.g., "in a button vs. title situation, the latter gets the suffix") which in the long run would probably end up being less efficient.

What's worse, however, is that it would leave one of the keys less descriptive than the other. This makes life harder for translators. I'll have to put on my translator hat to explain why.

One change of hats later...

Okay. When I'm translating resources, the first thing that comes to mind isn't "What does this string mean?" That bit goes by so fast, I barely notice it. No, the first thing I want to know is: "Where can I see this?" I want to know how the string is used so I can check how much space I've got to work with, imagine my translation in context, and so on.

The suffix on core.thingamajig_title will get me most of the way there. Once I know it's a dialog title, I can probably figure out how to display the dialog box. But what of core.thingamajig? There's nothing there to say what sort of a thing I should be looking for. Button? Link? Table heading? I'm left guessing, with only the string itself as a clue.

So yes, we should put suffixes everywhere. But that still gets me only partway. Your idea of prefixes can also come in handy in some cases. For example, putting core.notification_method_alert and core.notification_method_email next to each other will help the translator recognize that this "Email" is different from the other four. (I see you were anticipating the core.email issue already!)

But ... say core.thingamajig_action (like core.log_in_action) is used in multiple locations. As a translator, I want to check all of them, because maybe there's one or two cases where my translation will be too long to fit comfortably in the space provided. So where should I be looking? For that matter, how do I know when I've found them all? At present, I'd have to run a global search on every file with an app.trans call to be sure. That were best avoided.

Unfortunately, adding this information to the keys would mean creating a discrete key for every string in the program, and that would definitely be going too far. As you said:

Toby that's where the UI would help.

As an alternative, we could provide some info as comments in the YML file. That wouldn't take care of the other two birds I mentioned, but it would serve as a stopgap. And since I'm going to be editing key names anyway, maybe this would be a good opportunity to put that information in as well.

And if we do that ... I've been wondering if it would be possible to leverage those comments for display by the GUI, if and when you get around to adding one. If so, then it might be a good idea to give a little thought now to how the comments may best be formatted. (Though to do that, I guess we'd have to give some thought to the design of the GUI. Hoo-boy, add one little idea and the work just starts piling up!)

... I'd best wait for your comments on that thread.

In the meantime, I think I've got a handle on what's needed now, so I'll start revising my matrix with the new names and draft up a set of rules to explain them. At some point I'll probably want to ask for your help in clarifying the list of suffixes available, and so on ... but that's probably a few days off.

Dominion · Sep 8, 2015

Actually, I've already come up with a couple questions I thought I'd better ask sooner rather than later.

About suffixes:

I could probably come up with a list of names for things in a GUI (such as title, action, etc.), but what I may come up with may not match the technical terms you're already using. You wouldn't happen to have a handy list of names sitting around, would you?

If you don't have a list, I can just poke around the code and see what you're using for class names.

About prefixes:

My instinct is to use prefixes to group things by location. In many cases, that could double as a hint as to which files the string is used in. For example, strings used only in the "Change Email" modal would get the "change_email_" prefix, while the button that opens the dialog would be "settings_change_email_action".

The advantage to this is that it would allow translators to concentrate on and finish specific areas of the UI in a fairly efficient manner (the stumbling block there being any global strings involved). The downside is that this will scatter duplicate strings about, instead of clumping them together. I figure we've all got Search functions in our editors for that, but I thought I'd ask for your take on things.

From the string creation point of view, grouping by location might make it easier for devs to come up with consistent key names. But it will also make it harder to know when they're creating a string that already exists elsewhere. They might overlook an existing global string, or a string from another location that should be merged with the new string to form a global. We'd need a way to prevent that.

The best way might be to create some sort of string database that can be searched from either direction. I'm not sure how practical that would be, though.

Toby · Sep 9, 2015

Dominion You wouldn't happen to have a handy list of names sitting around, would you?

If you don't have a list, I can just poke around the code and see what you're using for class names.

I don't have a list, sorry! That'd be great if you could do that.

Dominion My instinct is to use prefixes to group things by location. In many cases, that could double as a hint as to which files the string is used in. For example, strings used only in the "Change Email" modal would get the "change_email_" prefix, while the button that opens the dialog would be "settings_change_email_action".

Yes, I like this idea!

Dominion But it will also make it harder to know when they're creating a string that already exists elsewhere. They might overlook an existing global string, or a string from another location that should be merged with the new string to form a global. We'd need a way to prevent that.

I think a quick search through the en.yml file for the string they're translating would be sufficient though. If they find an existing key that's suitable for the new location, they can use it; otherwise, create a new one.
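If the manual search ever gets tedious, the same check is easy to script. A rough sketch, assuming en.yml has already been parsed into a flat key-to-string object; the sample keys and the function name are hypothetical.

```javascript
// Hypothetical excerpt of en.yml, already parsed into a flat key -> string map.
const en = {
  'core.log_in_title': 'Log In',
  'core.log_in_action': 'Log In',
  'core.sign_up_action': 'Sign Up',
};

// Find every existing key whose English string matches the given text,
// ignoring case and surrounding whitespace.
function findKeysForString(dictionary, text) {
  const wanted = text.trim().toLowerCase();
  return Object.entries(dictionary)
    .filter(([, value]) => value.trim().toLowerCase() === wanted)
    .map(([key]) => key);
}

console.log(findKeysForString(en, 'log in'));
// [ 'core.log_in_title', 'core.log_in_action' ]
```

An empty result means no existing key carries that string, so a new one is needed; a non-empty result still needs a human to decide whether any of the matches suits the new location.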

Dominion · Sep 9, 2015

Toby I think a quick search through the en.yml file for the string they're translating would be sufficient though. If they find an existing key that's suitable for the new location, they can use it; otherwise, create a new one.

In principle I agree, but this is why I was asking about whether extensions can use strings from the core. They'd have to check through the YML file for both the current extension and the core.

If this was a much larger project, that would quickly get onerous ... but I think you're right, we can probably handle it by explaining the process carefully in the documentation.

Toby · Sep 9, 2015

Dominion As an alternative, we could provide some info as comments in the YML file. That wouldn't take care of the other two birds I mentioned, but it would serve as a stopgap. And since I'm going to be editing key names anyway, maybe this would be a good opportunity to put that information in as well.

And if we do that ... I've been wondering if it would be possible to leverage those comments for display by the GUI, if and when you get around to adding one.

Yes and yes – but let's not worry so much about the format of the comments right now, so long as they contain the essential information. Which, off the top of my head, would simply be:

A descriptive sentence or two about where in the UI the translation is used
The name of the file(s) that it is used in

Dominion · Sep 9, 2015

Toby Yes and yes – but let's not worry so much about the format of the comments right now

I wholeheartedly agree ... but considering that some strings can be used in several locations/files, the comments can quickly get pretty big. So at the very least, it would be a good idea to know whether I should put those all in the same line, or add a separate comment line for each use, for example.

Of course, the latter approach would also add space between the keys we've worked so hard to group by prefix.

Toby · Sep 9, 2015

I'm quite happy for the comments to be as large as they need to be, for example:

# Used in such and such location.
# js/forum/src/components/SignUpModal.js
# js/forum/src/components/LogInModal.js
email_placeholder: Email

# Used in such and such location.
# js/forum/src/components/AnotherOne.js
something_else: I suck at thinking of examples

Let's give every string first-class documentation
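As a sketch of how comments in that shape might later be picked up by a translation UI: the reader below is a deliberately minimal, line-based assumption, not a YAML parser, and it only handles flat `key: value` lines preceded by `#` comments. The function name is hypothetical.

```javascript
// Minimal line-based reader for flat "key: value" files with "#" comments.
// It is not a YAML parser; nested keys and quoted values are out of scope.
function readDocumentedStrings(text) {
  const entries = {};
  let pending = []; // comment lines seen since the last key or blank line
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line.startsWith('#')) {
      pending.push(line.replace(/^#\s*/, ''));
    } else if (line.includes(':')) {
      const separator = line.indexOf(':');
      const key = line.slice(0, separator).trim();
      entries[key] = { value: line.slice(separator + 1).trim(), notes: pending };
      pending = [];
    } else {
      pending = []; // a blank line ends the current comment group
    }
  }
  return entries;
}

const sample = [
  '# Used in such and such location.',
  '# js/forum/src/components/SignUpModal.js',
  '# js/forum/src/components/LogInModal.js',
  'email_placeholder: Email',
].join('\n');

console.log(readDocumentedStrings(sample).email_placeholder.notes.length); // 3
```

As long as each comment block sits directly above its key, the exact wording inside the comments can stay free-form for now.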

Dominion · Sep 9, 2015

Okay, thanks ... I can work with that.

Or hey ... since we'll be grouping by location prefix, maybe I can do something even fancier. One comment to explain the group, and individual comments as needed to explain where to look in the code.

That would probably be harder to adapt to the UI later, but that's another bridge we can burn when we get to it.

Franz · Sep 9, 2015

Toby The name of the file(s) that it is used in

Hmm, isn't that just asking to get out-of-date?

Toby · Sep 9, 2015

Franz True. Thinking a bit more about how the translation UI would work, it's probably not necessary information, so let's leave it out.

Dominion · Sep 9, 2015

Franz Hmm, isn't that just asking to get out-of-date?

Oh. Dang, I hadn't even considered that aspect. Yeah, it would mean a lot of maintenance, wouldn't it?

Toby True. Thinking a bit more about how the translation UI would work, it's probably not necessary information, so let's leave it out.

Done. Though I think the idea of having a natural-language description of where to look for stuff is worth doing, and we could get by with only one comment per prefix, rather than one per string. Comments for globally used strings could be more like what you suggested, with a natural-language description of each place to look.

DSitC · Sep 9, 2015

Franz On the other hand, it's not that hard to write a script that scans all JavaScript files and translation files and automatically inserts those comments.
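The scanning half of such a script could look roughly like this. It is a sketch under assumptions: the source tree is stood in for by an in-memory object, the key names are hypothetical, and the regular expression only catches `app.trans` calls whose first argument is a plain string literal.

```javascript
// Hypothetical in-memory stand-in for the source tree: file path -> contents.
const sources = {
  'js/forum/src/components/SignUpModal.js': "app.trans('core.email_placeholder')",
  'js/forum/src/components/LogInModal.js':
    "app.trans('core.email_placeholder'); app.trans('core.log_in_action')",
};

// Build a map from translation key to the sorted list of files that use it.
// Keys built dynamically (variables, concatenation) are not detected.
function collectKeyUsages(files) {
  const usages = new Map();
  const callPattern = /app\.trans\(\s*['"]([^'"]+)['"]/g;
  for (const [path, contents] of Object.entries(files)) {
    for (const match of contents.matchAll(callPattern)) {
      const key = match[1];
      if (!usages.has(key)) usages.set(key, new Set());
      usages.get(key).add(path);
    }
  }
  return new Map([...usages].map(([key, paths]) => [key, [...paths].sort()]));
}

console.log(collectKeyUsages(sources));
```

Writing the resulting file lists back into the YAML as comments would be the second half, and that part would have to run on every change to stay accurate.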

Franz · Sep 9, 2015

It still needs to be run.