Reorganization of translation keys

Dominion

Thank you! And yay!

My matrix is pretty much complete. Now for analysis.

Dominion

I've finished my first pass at the translation keys, looking at cases where a string is being used in more than one place. There are 19 such cases, of which 13 will pose no problem (i.e., the string is used with the same sense everywhere).

The remaining six cases are described below. Please bear in mind that the goal at present isn't to decide what to do about key names in these cases, but to think about whether or not we should do something about splitting them, and what that decision will imply for translation key naming in general.

Button vs. Dialog Title

There are four cases where a string is used as both a button and the title of a dialog box:

core.change_email: Change Email
core.change_password: Change Password
core.log_in: Log In
core.sign_up: Sign Up

I think most translators will be able to handle all these without requesting separate strings, so there's probably no need for preemptive action. On the other hand, they are few enough and short enough that we could split them up now and not worry about the duplicate phrases being a burden on translators. I'll discuss this in more detail below.

Table Heading vs. Link

The string "Discussions" (core.discussions) is currently being used in two places:

As a heading in the dropdown box listing the results of a search
As a link on the user page (followed by a number indicating quantity)

The fact that it's a heading in one case and a link in the other doesn't make any difference to the meaning. But the fact that the link is followed by a number (how many discussions the user has started) does give it an additional sense. This may become a reason for handling it as a separate string.

For that matter, it may also be a good idea in the second case to include the number in the string. This is because some languages may need to do linguistic things to it. (For example, Japanese generally adds a character after any quantity to indicate the type of thing being counted.) Of course, this would make the second case a different string.

The same would be true of the "Posts" link (core.posts) that appears above the "Discussions" link on the user page.
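As a rough illustration of what including the number in the string could look like: the sketch below assumes a hypothetical `{count}` placeholder syntax and a hypothetical `interpolate` helper, neither of which is taken from the actual codebase. The Japanese string is only there to show a counter character after the number; it is not a vetted translation.

```javascript
// Replace {name} placeholders with values from params.
// Unknown placeholders are left untouched so that mistakes stay visible.
function interpolate(template, params) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(params, name) ? String(params[name]) : match
  );
}

// Hypothetical strings: the translator decides where the number goes
// and what surrounds it.
const en = 'Discussions {count}';
const ja = 'ディスカッション {count}件';

console.log(interpolate(en, { count: 12 }));
console.log(interpolate(ja, { count: 12 }));
```

With the number inside the string, each language can place it and decorate it as needed, at the cost of making this a separate string from the plain "Discussions" heading.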

Table Heading vs. Text Box Label

The string "Email" (core.email) is currently being used in four places:

As a label or placeholder for a text entry box (three instances)
As a header in the notification settings table (one instance)

This is a clear case of a word being used in two different senses. As a text box label, it's prompting the user to enter his or her email address. As a table header, it's talking about email as a notification method, and has nothing to do with the address. It's very likely that a translator will want to translate the word differently in each of these two contexts.

Moreover, there's a big difference between the two in the amount of space available to the translator. This is another factor that will determine how the phrase can/should be translated. So I think we may have ample reason to go ahead and separate the latter instance out as a separate string.

Discussion

With the above information in hand, we're now ready to take our first steps toward settling on a key naming scheme.

@Toby has said that key names should be descriptive and consistent. These are worthwhile goals, but there are times when it is hard to do both at once. The problematic cases of string reuse described above will help us see why. Let's begin by asking:

Are prefixes and suffixes really necessary?

After all, when we're talking about a one- or two-word string (which is what most of the reused strings are), there's really no better way to describe the string than to use the string itself as the string name! This is especially true when a string is used in several different places in the UI. In such cases, we can't add prefixes or suffixes to indicate where the string is used without creating a number of duplicate strings, which will increase the translator's workload.

Granted, it might be nice to add some information to help the translator figure out what sense a word is being used in, where it can be found in the UI, and so on. But that's only really necessary when the meaning of the string is inherently ambiguous, as I pointed out here. In most cases it's not all that hard for a translator to guess what, say, "Log In" means. (Especially if it's spelled correctly. I congratulate @Toby on being one of the virtuous few who get it right! 🙂 )

But let's say that, in order to remain consistent, we want to tag the string for every button name and dialog box title with a suffix that will let the translator know how the string is being used. That's reasonable, because it's also descriptive after another fashion. But in cases where strings are being reused, such as the four "Button vs. Dialog Title" cases described above, this will end up forcing our hand: we will have to split those four strings to accommodate the consistent naming. We will no longer be able to put that off until a translator requests a split.

That isn't a huge problem. After all, it's only four little strings. But as we add more strings, and take the extensions into account, the number of duplicate strings created for the sake of consistency will continue to grow. Eventually we could end up with a real Stumbling block 3 situation.

Key names should be descriptive and consistent, but there's a case to be made for efficiency too!

What happens when we decide to split a string?

Let's forget (temporarily) about the possibility of adding prefixes and suffixes to everything and talk about what happens when we decide to split a string. The "Button vs. Dialog Title" cases will come in handy for that, too.

Take, for example, the strings core.log_in and core.sign_up. Each is used in four places: three times as a button name and once as a dialog title. If we decide to split off the dialog titles as separate strings, using suffixes as described here to distinguish between them and the buttons, we can proceed in one of two ways:

We can add a suffix to the title string only (core.log_in_title), leaving the button string as-is (core.log_in).
We can add a suffix to both the title string (core.log_in_title) and the button string (core.log_in_action).

The former course would result in a pair of string names that are not consistent with each other. But at least one of the pair will remain consistent with any other dialog title or button strings that don't have suffixes yet. One benefit of this approach is that it is easy to implement, since you only have to change one string name, which is used in two places.

The latter course would result in a pair of string names that are consistent with each other, but inconsistent with any dialog title and button string names that don't have suffixes. (Of course we could fix that by going the "suffixes for all" route described above; but we're forgetting about that possibility now, remember?) The downside to this approach is that you end up having to make a lot more changes: two string names, used in a total of six places.

It seems it might be good to think about efficiency not only in terms of key name length and quantity of duplicate keys, but also ease of implementation.

Two approaches to key naming

I think we can boil down all the above (Yes! At last, a TL;DR! 😃 ) by saying that we can slant our key naming scheme in one of two directions:

Greater consistency, at the expense of efficiency
Greater efficiency, at the expense of consistency

I should add that while I've been looking to cases of string reuse for clues, single-use strings won't remain unaffected.

Let's say we're using the word "Thingamajig" as a button name, and that button is the only place it appears in the UI. Do we need to give it a suffix to indicate it's a button? The greater consistency approach would argue yes.

And let's not forget the possibility that we may eventually add a dialog box titled "Thingamajig". If we somehow arrive at a policy that requires suffixes on every string that's being used in more than one context, we'll not only have to add a new core.thingamajig_title string, we'll have to rename core.thingamajig to core.thingamajig_button at that point. Or something like that. I think.

My question for Toby (and anyone else who's interested)

Sorry for making you read all this stuff, but I wanted to get your informed opinion on how to go forward. Ultimately, it comes down to a rather simple choice: more consistency, or more efficiency?

I didn't want to make that decision by fiat ... in fact, the more I think about this stuff, the more I'm inclined to toss it all aside and go with the key names you've got. (By which I mean: limit myself to minor tweaks, slanted heavily in favor of efficiency.) I think there's a limit to what can be achieved using key names as the sole tool; which is why I think it might be a good idea to give some thought at this point to a translation UI capable of providing translators with more info about the strings than we can put into the key names.

I'm not saying we have to think about starting work on such a UI right away. But if we know we'll have one eventually, then for the time being we can settle for key names that are less than optimally descriptive and consistent.

Please let me know what you think!

PS: As for extensions, at this point I think we'll just have to pick a policy and hope it scales well. 😛

Toby

OK, time for me to review everything that's been said and offer a solid opinion on all of this!

Dominion In most cases it's not all that hard for a translator to guess what, say, "Log In" means. (Especially if it's spelled correctly. I congratulate @Toby on being one of the virtuous few who get it right! )

Thanks! This is actually a huge source of pride for me. Whenever I see someone writing "login" as a verb, I die a little inside. 😛

Dominion From this perspective, it seems we were right to focus on the key names. So instead of looking for ways to bend YAML to our collective will, we should probably be thinking about how we can provide translators with descriptive key names that strike a good balance between consistency and efficiency.

Completely agreed.

Dominion But if we know we'll have one [a translation UI] eventually, then for the time being we can settle for key names that are less than optimally descriptive and consistent.

But in the case of the duplicate strings used in different contexts, would we not need to split them and name them regardless? To me, the translation UI sounds like an amazing tool (and I will reply to that thread soon!), but I still think this problem should be solved to a large degree by a solid key naming scheme. I'm not expecting to come up with something where translators can just look through the YAML file and immediately interpret the meaning of certain prefixes/suffixes – that's where the UI would help. But from the coder's perspective, they should be confident that they can follow a set of rules and name a key correctly and consistently. Does that make sense?

So after considering all of this – but admittedly, without having been through all of the strings and their uses in excruciating detail like @Dominion has – I think my preference would be to enforce contextual suffixes, chosen from a list, for all keys. i.e. Greater consistency, at the expense of efficiency:

Split up core.log_in into core.log_in_title and core.log_in_action? Yes.
Rename core.thingamajig to core.thingamajig_action (even if there is no core.thingamajig_title)? Yes.

I understand this will result in more duplication than may be ideal, but on my brief look through Dominion's amazing matrix of the strings and their uses, I noticed that all of the duplicated strings are very short. Since we're proposing a suffix, they will all be listed together (alphabetically) in the YAML file. The translator will be able to translate them all at once, usually just by copy+pasting the first one onto the others. My point is: Are we really losing that much efficiency?

Anyway, that's where my thoughts currently stand. What do y'all think?

Franz

Hmm, I just cannot write such a long reply, but I only have a short idea anyway, so here it comes... 😉

Toby may even have suggested this somewhere, not sure whether it's really my idea. But can't we use descriptive keys like "sign_up.button" when translating a string in the code, but only force translators to define "sign_up"? The lookup will always fall back to "sign_up" unless the translator has specifically added a "sign_up.button" to their locale file.

We basically get the best of both worlds (less work for the translator, more specific translations where necessary). The only thing we'd have to figure out is how to distinguish the namespace separators (flarum.core) from the context separators (sign_up.button)... Any suggestions?
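A minimal sketch of the fallback idea, assuming a flat locale object and a hypothetical `translate` helper (this is not the actual `app.trans` implementation). The Japanese strings are the ones given elsewhere in this thread.

```javascript
// Hypothetical flat locale maps. English only defines the base keys;
// Japanese adds a more specific entry for the dialog title.
const en = {
  'core.sign_up': 'Sign Up',
  'core.log_in': 'Log In'
};
const ja = {
  'core.log_in': 'ログイン',
  'core.log_in.title': 'ログインしてください'
};

// Try the most specific key first, then drop the last dot-separated
// segment and try again until something is defined.
function translate(locale, key) {
  const parts = key.split('.');
  while (parts.length > 0) {
    const candidate = parts.join('.');
    if (Object.prototype.hasOwnProperty.call(locale, candidate)) {
      return locale[candidate];
    }
    parts.pop();
  }
  return key; // nothing defined: return the key itself
}

console.log(translate(en, 'core.sign_up.button'));
console.log(translate(ja, 'core.log_in.title'));
```

Note that this loop cannot tell a namespace segment from a context segment: it would happily fall back all the way to a key named `core` if one existed. That is the separator question raised above.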

Dominion

Hmm. I seem to have flip-flopped a bit regarding approach, and it's occurred to me to wonder why.

As Franz said, it was indeed @Toby who first mentioned the idea of putting context in the code (in his first reply above, which I find myself unable to mention for some reason). He also added a caution about the extra complexity this would involve, and I agreed that it didn't seem worth the trouble:

Dominion In fact, it sounds like the sort of thing that, if you're going to do it at all, it should be applied across the board according to some standardized scheme. And that would be a lot of work.

Yet when Franz brought the idea up again, I found myself thinking it might be worth the trouble:

Dominion It would be a rather big change to make, because it would be best to apply it everywhere, but definitely worth the trouble!

Why did I suddenly find the idea so appealing? Well, after thinking about how difficult it would be to provide translators the information they need while keeping the key naming scheme both consistent and efficient, I began to think that it might be easiest to manage the consistency angle in the code. It seems to me that it would be easier to devise a format for adding context there than it would be to define a consistent key name format.

(Implementation, however, would be an entirely different matter.)

But even if we're okay with the added complexity that Toby warned of, that's not really the end of it. We'd have to come up with some way of letting the translators know what their options are. Without that info, translators would be forced to peek at the code to see what context keys were available. So we'd have to provide them with documentation, and then we'd have to make sure the code adhered to the rules in the documentation.

... And that means the context would have to be supplied uniformly in the code, everywhere. Which means my instinct (that it's the sort of thing that needs to be applied across the board) was spot on.

So ... given the extra effort involved, is it worth it? Let's look at the numbers. Of over 100 strings, only seven are reused in a way that could pose an issue for translators, and only one of those (core.email) strikes me as truly urgent. In terms of instances (app.trans calls) it comes to about 18 out of 128, of which only four are urgent.

These numbers will change as we add strings for the admin interface, take extensions into account, etc. But assuming they don't change too much, that's a lot of work to cover only a few situations. Again, Toby's caution springs to mind.

From this perspective, it seems we were right to focus on the key names. So instead of looking for ways to bend YAML to our collective will, we should probably be thinking about how we can provide translators with descriptive key names that strike a good balance between consistency and efficiency.

I'm starting to get some ideas about that, but I need some more time to flesh them out. So I'll leave this here for now.

Dominion

Ooooh. That would certainly allow us to do things efficiently, while giving the translator both the necessary info and a good bit of flexibility. It would be a rather big change to make, because it would be best to apply it everywhere, but definitely worth the trouble!

I'll have a think about distinguishing the separators. In the meantime, here's an example from LoginModal.js:

  title() {
    return app.trans('core.log_in');
  }

versus

            {Button.component({
              className: 'Button Button--primary Button--block',
              type: 'submit',
              loading: this.loading,
              children: app.trans('core.log_in')

I don't suppose there's any way the translator could do something in the YML file that would leverage the information in the className or type? If that were only possible, we'd have the context already mostly in place.

LATER...

If not, how about using a hashtag between the key name and the context?

Also, would we want to apply context to only one situation, to make the distinction efficiently? Or everywhere, to give the translator maximum freedom in how they handle the situation? For example:

  title() {
    return app.trans('core.log_in#title');
  }

This would be enough to allow adequate handling of the situation. But putting a hashtag on the button as well would give the translator freedom to handle either the title or the button as the variant.
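To make the hashtag idea concrete, here is a small sketch. The `parseKey` and `translateWithContext` helpers are hypothetical, and the locale is a plain object rather than anything loaded from the real YML file. The Japanese strings come from later in this thread.

```javascript
// Split 'core.log_in#title' into its base key and optional context.
function parseKey(key) {
  const index = key.indexOf('#');
  if (index === -1) return { base: key, context: null };
  return { base: key.slice(0, index), context: key.slice(index + 1) };
}

// Use the context-specific entry if the translator defined one,
// otherwise fall back to the base key.
function translateWithContext(locale, key) {
  const { base, context } = parseKey(key);
  if (context !== null && Object.prototype.hasOwnProperty.call(locale, key)) {
    return locale[key];
  }
  return Object.prototype.hasOwnProperty.call(locale, base) ? locale[base] : key;
}

// Hypothetical locale: only the title needs a variant.
const ja = {
  'core.log_in': 'ログイン',
  'core.log_in#title': 'ログインしてください'
};

console.log(translateWithContext(ja, 'core.log_in#title'));
console.log(translateWithContext(ja, 'core.log_in#button'));
```

If the code passes a context everywhere (`#title` and `#button`), the translator is free to define either one as the variant; if only the title call carries a context, the button is stuck with the base string.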

DSitC

Dominion No need to hardwire it into the string in a special way. Just let the code writer assemble the translation key:

return app.trans('core.log_in.'+this.title);

... or something like this. 😉

Dominion

@DSitC Thanks! I still have no idea how the coding side of this works.

I would assume it's possible to add multiple contexts to a single string with this method. Is that the case?

Dominion

Hey! Here's a question I should have asked a long time ago:

Up till now the discussion has focussed on how we can use YAML to handle variation by means of one-to-many correspondences, i.e., one key name realised as multiple strings, conditioned by context from the code.

Would YAML also be able to handle many-to-one correspondences? For example, in Japanese we might need:

log_in_title: ログインしてください
log_in_action: ログイン

But in English, both keys could be realised as the same string:

log_in_title, log_in_action: Log In

If that sort of thing is possible, we could stuff as much context in the key names as we like, without worrying about creating extra work for the translator where it isn't necessary.

DSitC

I found nothing about multiple keys referencing a single value in the spec.

However, to avoid repetition, you could specify (and evaluate) a special value syntax for references to other keys:

log_in_title: Log In
log_in_action: => log_in_title

That could tell the i18n parser of flarum to determine the value of log_in_action by taking the contents of the log_in_title key.
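A sketch of how such references might be evaluated, assuming the locale has already been loaded into a flat object. The `resolve` helper and the exact `=>` pattern are assumptions made for illustration, not an existing feature of any parser.

```javascript
// A value of the form "=> other_key" is treated as a reference.
const REFERENCE = /^=>\s*(\S+)$/;

// Follow references until a plain string is found.
// The `seen` set guards against keys that point at each other.
function resolve(locale, key, seen = new Set()) {
  if (seen.has(key)) throw new Error('Circular reference at ' + key);
  seen.add(key);
  const value = locale[key];
  if (typeof value !== 'string') return key; // undefined key: return it as-is
  const match = REFERENCE.exec(value);
  return match ? resolve(locale, match[1], seen) : value;
}

const en = {
  log_in_title: 'Log In',
  log_in_action: '=> log_in_title'
};

console.log(resolve(en, 'log_in_action'));
```

A translator who needs two different strings simply replaces the reference with a literal value; nothing in the calling code has to change.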

Dominion

DSitC That was going to be my next question. 😃

Thanks for responding, that gives me something to think about.

Dominion

Okay, my first move has been to go through and pull out the globals. Next I plan to start grouping the rest of the strings by location, and give some thought to prefixing. Then it'll just be a matter of finalizing the suffixes.

But before I get on to that, the process of organizing the globals has raised a couple questions.

Is it possible to combine strings?

I think I may have been a bit too optimistic about a couple of reuse instances. Cases in point:

The "Log In" link at the bottom of the signup modal
The "Sign Up" link at the bottom of the login modal

At first glance, these two links look like the other "Log In" and "Sign Up" links/buttons. But they're different in that they come with context, i.e. the core.before_log_in_link and core.before_sign_up_link strings, respectively. Some translators may need the freedom to embed the link in the context sentence, like so:

If you already have an account, please log in instead.

Even if there is no need for non-link text after the link, the hardcoded space separating the link from the context is bound to cause trouble for some translators. So each of these string pairs should be handled as one.
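One way a combined string could work, sketched with a hypothetical `<a>...</a>` marker that the translator places around the link text. The marker syntax and the `splitLink` helper are assumptions for illustration only.

```javascript
// Split a translated sentence into the text before the link,
// the link text itself, and the text after the link.
function splitLink(template) {
  const match = /^(.*?)<a>(.*?)<\/a>(.*)$/s.exec(template);
  if (!match) return { before: template, link: '', after: '' };
  return { before: match[1], link: match[2], after: match[3] };
}

const sentence = 'If you already have an account, please <a>log in</a> instead.';
const parts = splitLink(sentence);

console.log(parts.before);
console.log(parts.link);
console.log(parts.after);
```

The code would then render `before`, the link, and `after` in order, so the spacing and word order are entirely under the translator's control.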

There's no need for you to act on these just yet, since there may be others. I'll compile a complete list of changes that need to be made when I'm ready to start editing key names. Or I can make the changes myself, with your approval, if you can help me out with the syntax. (I'm even less experienced with JS than I am with PHP.) For now, I'd merely like to confirm that making such changes won't create any problems.

How about unique key names?

After removing the above-mentioned pair of instances, we can summarize the globals situation thusly: we've got a total of 14 global strings, each used in only two or three places, for a total of just 35 app.trans calls.

That's not an awful lot. In fact, the numbers are so small that I've started to wonder whether it might be a good idea to use a unique key name for every string. Here's how we could do it:

The dev would start by prefixing every key name by location.
Each string would therefore be grouped with all other strings in the same location.
The key names for global strings would be followed by a reference as DSitC has suggested.
The globals would be grouped together for easy location.
Comments on globals would merely list the unique keys that reference them.

Please note that this doesn't mean we'd necessarily have to use a unique key name for every app.trans call. Cases such as core.bio_placeholder, which is used twice in the same location, could use the same key name. But it would mean adding 21 new keys, and about 35 lines to the YML file (not counting comments).

This approach would have advantages for both translators and devs:

From the translator's point of view, it would make it easier to locate a global string that's being used in the location he/she is concentrating on, and then quickly cross-check whether the translation will work in other locations where the string is used. And if for some reason the global string just isn't working out for a specific location, the translator would not need to ask for the string to be split: he/she could just replace that reference with a string value that fits.

Of course all the keys that we have decided to split (like the "button versus title" situations) would also reference the globals, so that would reduce the number of duplicate strings to be translated to zero. And in the rare case where a translator finds him/herself translating two different English strings into the exact same phrase, he/she can extract that phrase as a global and point both keys at it, again without bothering the devs.

From the developer's point of view, there is the obvious advantage of not having to handle as many requests for new strings. Beyond that, it will allow us to make the rules for naming keys simpler and easier to follow.

Of course, someone will have to check whether there's a global string to be referenced in each case, but this would no longer need to be done as part of the coding process. Adding strings to code would become a simple matter of (1) adding a new, unique key name (including a quick check to be sure that it is indeed unique) and then (2) adding that key and its string to the YML file. The extraction of duplicates as globals could be left for later cleanup, which is an easy task that doesn't need to be done by a programmer.

The downside to all this would be any performance issues that might arise from the referencing mechanism. Not to mention the effort involved in implementing such a mechanism, of course. 😉

Please let me know what you think of this idea!

Franz

We can also use the same separators both for namespaces and context, but this would limit the actual keys to be flat, so we wouldn't be able to structure them anymore.

All locale keys would follow this pattern:
namespace.key.context
e.g. core.log_in.button
(the context would be optional)

Dominion

Franz We can also use the same separators both for namespaces and context

I was thinking that should be possible...

Franz but this would limit the actual keys to be flat, so we wouldn't be able to structure them anymore.

I'm not sure what you mean by this. (Ah well, I guess my technical savvy only goes so far...)

Would doing this mean translators couldn't use context in other ways, such as for plurality or gender, for example?

DSitC

Franz You don't really gain anything by allowing nesting in i18n files. Authors can do their own nesting using snake_case. Giving them a static scheme like namespace.key[.context] could actually improve productivity and streamline the language file layouts. :-)

So, yeah, flatten them. 😃

Franz

Dominion What I meant was that the "key" part in namespace.key.context could not contain any more periods, because we'd have ambiguity otherwise.
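To illustrate the constraint: with a single separator, a key can only be split unambiguously if it has exactly two or three segments. A sketch, using a hypothetical `parseFlatKey` helper:

```javascript
// Parse 'namespace.key' or 'namespace.key.context'.
// Anything with more periods is rejected, because there would be no way
// to tell which segments belong to the key and which to the context.
function parseFlatKey(fullKey) {
  const parts = fullKey.split('.');
  if (parts.length < 2 || parts.length > 3 || parts.some((part) => part === '')) {
    throw new Error('Expected namespace.key[.context], got: ' + fullKey);
  }
  const [namespace, key, context = null] = parts;
  return { namespace, key, context };
}

console.log(parseFlatKey('core.log_in.button'));
console.log(parseFlatKey('core.log_in'));
```

Any further structure inside the key part would have to be expressed with underscores, as in `log_in`.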

Toby

DSitC No, you'd have to have another sub-key for the base value:

namespace:
 key1:
  default: value1 base
  context1: value1 context1
  context2: value1 context2
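A sketch of how a lookup over that nested structure might behave, with the locale written as a plain object and a hypothetical `lookup` helper. The `default` sub-key supplies the base value whenever no context is given or the requested context is not defined.

```javascript
// The YAML above, as it might look once parsed into an object.
const locale = {
  namespace: {
    key1: {
      default: 'value1 base',
      context1: 'value1 context1',
      context2: 'value1 context2'
    }
  }
};

// Walk the tree segment by segment, falling back to `default`.
function lookup(tree, path) {
  let node = tree;
  for (const segment of path.split('.')) {
    if (node === null || typeof node !== 'object') return path;
    if (!Object.prototype.hasOwnProperty.call(node, segment)) {
      // Unknown context: use the base value of the enclosing key, if any.
      return typeof node.default === 'string' ? node.default : path;
    }
    node = node[segment];
  }
  if (typeof node === 'string') return node;
  const hasDefault = node !== null && typeof node === 'object' && typeof node.default === 'string';
  return hasDefault ? node.default : path;
}

console.log(lookup(locale, 'namespace.key1'));
console.log(lookup(locale, 'namespace.key1.context1'));
```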

Dominion Would doing this mean translators couldn't use context in other ways, such as for plurality or gender, for example?

That's an issue I had in the back of my mind while I was writing my first reply. While there might be some way to make it work, it would undoubtedly be more complex. For lack of a better example:

core:
 delete_post:
  one:
   title: Delete Post
   button: Delete
  other:
   title: Delete Posts
   button: Delete

# VS

core:
 delete_post:
  title:
   one: Delete Post
   other: Delete Posts
  button:
   one: Delete
   other: Delete

Dominion

Oh, I see ... so it wouldn't prevent the sort of nesting that DSitC is talking about, but it wouldn't be possible to condition a string using two contexts at once. Hmmm.

DSitC

@Franz With YML, is this possible?

namespace.key1: value1 base
namespace.key1.context1: value1 context1
namespace.key1.context2: value1 context2

So that if app.trans('namespace.key1') is called, you would get value1 base, but a call to app.trans('namespace.key1.context1') would give you the context1 value?

Added: So, devspeak - is the YML markup directly converted into an object, dropping any scalar value tied to an upper layer, or can you intercept this and add custom functionality to it?

Toby

Come to think of it though, the logical option is the second example I gave. It's fundamentally the same as the underscore suffixes, just in a different format. The real issue is what happens when you mix fallbacks with plurals, like so:

core:
 delete_post:
  one: Delete Post
  other: Delete Posts

// JavaScript
app.trans('core.delete_post.button', {count: 3});

core.delete_post has sub-keys, but how will app.trans know that they're plural sub-keys rather than context ones? It's hard to think about without actually writing some code ... maybe there's a logical way to make it work, but I guess my point is this: by reducing duplication for translators, we probably increase the complexity of the system. Food for thought.
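One possible way to make it work, offered only as a sketch: reserve the plural category names (zero, one, two, few, many, other) so they can never be used as context names, and treat any node whose sub-keys are all plural categories as a plural node. The `trans` function below is a hypothetical stand-in for app.trans, and the reserved-name rule is an assumption, not a decision anyone has made. It uses the standard `Intl.PluralRules` to pick the category.

```javascript
// Sub-key names reserved for plural forms.
const PLURAL_CATEGORIES = new Set(['zero', 'one', 'two', 'few', 'many', 'other']);

// A node is a plural node when all of its sub-keys are plural categories.
function isPluralNode(node) {
  const keys = Object.keys(node);
  return keys.length > 0 && keys.every((key) => PLURAL_CATEGORIES.has(key));
}

function trans(tree, path, options = {}) {
  let node = tree;
  for (const segment of path.split('.')) {
    if (node === null || typeof node !== 'object') return path;
    // An undefined segment is treated as a context with no specific
    // entry, so the lookup stays on the current node.
    if (!Object.prototype.hasOwnProperty.call(node, segment)) break;
    node = node[segment];
  }
  if (typeof node === 'string') return node;
  if (node === null || typeof node !== 'object' || !isPluralNode(node)) return path;
  const count = options.count === undefined ? 1 : options.count;
  const category = new Intl.PluralRules(options.language || 'en').select(count);
  const value = Object.prototype.hasOwnProperty.call(node, category) ? node[category] : node.other;
  return typeof value === 'string' ? value : path;
}

const tree = { core: { delete_post: { one: 'Delete Post', other: 'Delete Posts' } } };

console.log(trans(tree, 'core.delete_post.button', { count: 3 }));
```

Here the `button` context has no entry of its own, so the call falls back to the plural forms under `core.delete_post`. The price is exactly the added complexity mentioned above: the lookup now has two kinds of sub-key to reason about.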

Mind you, I'm in a bit of a rush right now so probably not thinking very precisely. Hell, I haven't even read @Dominion's big post yet. I'll hopefully have time tonight to sit down and review this whole thing.

DSitC

Toby The nightmare really gets started when you have a language that has a lot of different pluralization (zero, one, a few, many, a lot) and then also need to add gender specific values into the mix. Oh joy. ;-)

Ideally there should be a system that allows for such complex expressions in the language files, but also lets you just write simple key-value pairs if your language does not need it.

Dominion

@Toby ... Funny you should bring up "Delete" as an example, as I was just thinking along the same lines.

It seems to me that the real value in putting context into the code (as opposed to the key names) lies in the ability to apply multiple contexts at once.

Let's say, for example, that we want to use the string core.delete as the title of a confirmation dialog, as in your example. And let's also imagine that a translator needs to use a different word when talking about deleting users (as opposed to posts or discussions). So we end up with two different types of context that can be combined for four variations, like so:

core.delete +content +title
core.delete +content +button
core.delete +user +title
core.delete +user +button

From this standpoint, namespace.key.context doesn't seem like much improvement over namespace.key_context.
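A sketch of what combining contexts could look like on the lookup side. Everything here is hypothetical: the `pick` helper, the way variants are stored, and the example strings, which are made up purely to show the selection logic.

```javascript
// Choose the variant whose tags are all present in the requested
// contexts, preferring the one that matches the most tags.
function pick(variants, contexts) {
  const wanted = new Set(contexts);
  let best = null;
  for (const variant of variants) {
    const applies = variant.tags.every((tag) => wanted.has(tag));
    if (applies && (best === null || variant.tags.length > best.tags.length)) {
      best = variant;
    }
  }
  return best === null ? null : best.text;
}

// Made-up variants for core.delete. A translator only adds the
// combinations that actually differ in their language.
const deleteVariants = [
  { tags: [], text: 'Delete' },
  { tags: ['user'], text: 'Remove' },
  { tags: ['user', 'title'], text: 'Remove User' }
];

console.log(pick(deleteVariants, ['content', 'button']));
console.log(pick(deleteVariants, ['user', 'title']));
```

With two context types, the code always supplies both (`content` or `user`, `title` or `button`), and each language defines as few or as many of the four combinations as it needs.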

Toby by reducing duplication for translators, we probably increase the complexity of the system. Food for thought.

As you said way back in your first reply to this thread ... and I don't think there's any way around that. Probably the best (and easiest to implement) solution we've seen so far is the one suggested by @DSitC involving key-to-key reference:

log_in_title: Log In
log_in_action: => log_in_title

That sort of thing would allow a translator to replace the key reference with a variant translation, but it's just putting the extra complexity in the YML instead of the code, and would probably complicate things like pluralization horribly.

Toby Mind you, I'm in a bit of a rush right now so probably not thinking very precisely. Hell, I haven't even read @Dominion's big post yet. I'll hopefully have time tonight to sit down and review this whole thing.

Please don't rush on my account, I'm happy using this time to mull the situation over. Learning a lot, too! 😉

DSitC Ideally there should be a system that allows for such complex expressions in the language files, but also lets you just write simple key-value pairs if your language does not need it.

I agree!

DSitC

Additional ideas, solutions and food-for-thought: https://slexaxton.github.io/Jed/
