Reorganization of translation keys

Toby · Sep 9, 2015

Dominion As an alternative, we could provide some info as comments in the YML file. That wouldn't take care of the other two birds I mentioned, but it would serve as a stopgap. And since I'm going to be editing key names anyway, maybe this would be a good opportunity to put that information in as well.

And if we do that ... I've been wondering if it would be possible to leverage those comments for display by the GUI, if and when you get around to adding one.

Yes and yes – but let's not worry so much about the format of the comments right now, so long as they contain the essential information. Which, off the top of my head, would simply be:

A descriptive sentence or two about where in the UI the translation is used
The name of the file(s) that it is used in

Dominion · Sep 9, 2015

Toby Yes and yes – but let's not worry so much about the format of the comments right now

I wholeheartedly agree ... but considering that some strings can be used in several locations/files, the comments can quickly get pretty big. So at the very least, it would be a good idea to know whether I should put those all on the same line, or add a separate comment line for each use, for example.

Of course, the latter approach would also add space between the keys we've worked so hard to group by prefix.

Toby · Sep 9, 2015

I'm quite happy for the comments to be as large as they need to be, for example:

# Used in such and such location.
# js/forum/src/components/SignUpModal.js
# js/forum/src/components/LogInModal.js
email_placeholder: Email

# Used in such and such location.
# js/forum/src/components/AnotherOne.js
something_else: I suck at thinking of examples

Let's give every string first-class documentation.

Dominion · Sep 9, 2015

Okay, thanks ... I can work with that.

Or hey ... since we'll be grouping by location prefix, maybe I can do something even fancier. One comment to explain the group, and individual comments as needed to explain where to look in the code.

That would probably be harder to adapt to the UI later, but that's another bridge we can burn when we get to it.

Franz · Sep 9, 2015

Toby The name of the file(s) that it is used in

Hmm, isn't that just asking to get out-of-date?

Toby · Sep 9, 2015

Franz True. Thinking a bit more about how the translation UI would work, it's probably not necessary information, so let's leave it out.

Dominion · Sep 9, 2015

Franz Hmm, isn't that just asking to get out-of-date?

Oh. Dang, I hadn't even considered that aspect. Yeah, it would mean a lot of maintenance, wouldn't it?

Toby True. Thinking a bit more about how the translation UI would work, it's probably not necessary information, so let's leave it out.

Done. Though I think the idea of having a natural-language description of where to look for stuff is worth doing, and we could get by with only one comment per prefix, rather than one per string. Comments for globally used strings could be more like what you suggested, with a natural-language description of each place to look.

DSitC · Sep 9, 2015

Franz On the other hand, it's not that hard to write a script that scans all JavaScript files and translation files and automatically inserts those comments.

Franz · Sep 9, 2015

It still needs to be run.

DSitC · Sep 9, 2015

Franz Yep.

If you're working with a GUI, this really becomes obsolete, true. You could use such a scanning script in the GUI, however: something like a "More info on this key" button that starts a quick parse and shows the translator the files and lines where translation calls are made with this key.
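
For illustration, here is a minimal sketch of what such a scan could look like. It assumes that keys are always passed to app.trans as string literals; the function names (find_usages, scan_tree) are hypothetical, and a real implementation would probably live in PHP or JS rather than Python.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: keys are passed as string literals, e.g. app.trans('core.log_in_title').
TRANS_CALL = re.compile(r'''app\.trans\(\s*["']([\w.\-]+)["']''')


def find_usages(source, filename):
    """Map each translation key to the (file, line) pairs where it is used."""
    usages = defaultdict(list)
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in TRANS_CALL.finditer(line):
            usages[match.group(1)].append((filename, lineno))
    return dict(usages)


def scan_tree(root):
    """Scan every .js file under root and merge the per-file results."""
    merged = defaultdict(list)
    for path in sorted(Path(root).rglob("*.js")):
        text = path.read_text(encoding="utf-8")
        for key, places in find_usages(text, str(path)).items():
            merged[key].extend(places)
    return dict(merged)
```

A "More info" button could then look up a single key in the result of scan_tree and list the files and line numbers it finds.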

Dominion · Sep 9, 2015

Yes ... when you get right down to it, though, I don't think we really want to be encouraging translators to be poking around in the code. So it's probably best to leave the filenames out of the YML.

Not because translators shouldn't poke around in the code, but because they shouldn't have to. There are times when it's unavoidable, but for the most part, we want them to get the job done in the YML only. So I'd like to make editing the YML as, erm, non-technical an experience as it can be.

DSitC You could use such a scanning script in the GUI, however: something like a "More info on this key" button that starts a quick parse and shows the translator the files and lines where translation calls are made with this key.

But this sounds like an excellent idea, for translators who need it.

EDIT: Sorry, that wasn't making much sense the first time around.

Dominion · Sep 10, 2015

Okay, my first move has been to go through and pull out the globals. Next I plan to start grouping the rest of the strings by location, and give some thought to prefixing. Then it'll just be a matter of finalizing the suffixes.

But before I get on to that, the process of organizing the globals has raised a couple questions.

Is it possible to combine strings?

I think I may have been a bit too optimistic about a couple of reuse instances. Cases in point:

The "Log In" link at the bottom of the signup modal
The "Sign Up" link at the bottom of the login modal

At first glance, these two links look like the other "Log In" and "Sign Up" links/buttons. But they're different in that they come with context, i.e. the core.before_log_in_link and core.before_sign_up_link strings, respectively. Some translators may need the freedom to embed the link in the context sentence, like so:

If you already have an account, please log in instead.

Even if there is no need for non-link text after the link, the hardcoded space separating the link from the context is bound to cause trouble for some translators. So each of these string pairs should be handled as one.

There's no need for you to act on these just yet, since there may be others. I'll compile a complete list of changes that need to be made when I'm ready to start editing key names. Or I can make the changes myself, with your approval, if you can help me out with the syntax. (I'm even less experienced with JS than I am with PHP.) For now, I'd merely like to confirm that making such changes won't create any problems.

How about unique key names?

After removing the above-mentioned pair of instances, we can summarize the globals situation thusly: we've got a total of 14 global strings, each used in only two or three places, for a total of just 35 app.trans calls.

That's not an awful lot. In fact, the numbers are so small that I've started to wonder whether it might be a good idea to use a unique key name for every string. Here's how we could do it:

The dev would start by prefixing every key name by location.
Each string would therefore be grouped with all other strings in the same location.
The key names for global strings would be followed by a reference as DSitC has suggested.
The globals would be grouped together for easy location.
Comments on globals would merely list the unique keys that reference them.

Please note that this doesn't mean we'd necessarily have to use a unique key name for every app.trans call. Cases such as core.bio_placeholder, which is used twice in the same location, could use the same key name. But it would mean adding 21 new keys, and about 35 lines to the YML file (not counting comments).

This approach would have advantages for both translators and devs:

From the translator's point of view, it would make it easier to locate a global string that's being used in the location he/she is concentrating on, and then quickly cross-check whether the translation will work in other locations where the string is used. And if for some reason the global string just isn't working out for a specific location, the translator would not need to ask for the string to be split: he/she could just replace that reference with a string value that fits.

Of course all the keys that we have decided to split (like the "button versus title" situations) would also reference the globals, so that would reduce the number of duplicate strings to be translated to zero. And in the rare case where a translator finds him/herself translating two different English strings into the exact same phrase, he/she can extract that phrase as a global and point both keys at it, again without bothering the devs.

From the developer's point of view, there is the obvious advantage of not having to handle as many requests for new strings. Beyond that, it will allow us to make the rules for naming keys simpler and easier to follow.

Of course, someone will have to check whether there's a global string to be referenced in each case, but this would no longer need to be done as part of the coding process. Adding strings to code would become a simple matter of (1) adding a new, unique key name (including a quick check to be sure that it is indeed unique) and then (2) adding that key and its string to the YML file. The extraction of duplicates as globals could be left for later cleanup, which is an easy task that doesn't need to be done by a programmer.
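
As a rough sketch of that later cleanup step, the following finds strings that appear under more than one key, i.e. candidates for extraction as globals. It assumes the YML has already been loaded into a flat dict of key to string; the function name and the sample keys are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_strings(translations):
    """Return {string: [keys...]} for every string shared by two or more keys."""
    keys_by_value = defaultdict(list)
    for key, value in translations.items():
        keys_by_value[value].append(key)
    return {
        value: sorted(keys)
        for value, keys in keys_by_value.items()
        if len(keys) > 1
    }


# Hypothetical key names, for illustration only.
sample = {
    "core.header_log_in_link": "Log In",
    "core.log_in_modal_title": "Log In",
    "core.sign_up_modal_title": "Sign Up",
}
print(find_duplicate_strings(sample))
```

Whoever does the cleanup would still need to decide whether each duplicate really is the same string in every location before turning it into a global.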

The downside to all this would be any performance issues that might arise from the referencing mechanism. Not to mention the effort involved in implementing such a mechanism, of course.

Please let me know what you think of this idea!

Dominion · Sep 10, 2015

For simplicity, I limited the above discussion of unique key names to the core. Things get slightly trickier if we take extensions into account. Here are some things we'd need to consider:

The proposal implies that core strings can't be used directly in any extension code. We'd want to have a line for each string in the extension YML. This is to preserve uniqueness; direct use of core strings would negate the advantages of the system.

So all realization of extension keys as strings from core would be handled by the YML referencing mechanism. Is this likely to cause any issues?
Would extensions be allowed to reference non-global core strings? (I would suggest that this be allowed only when the name of the extension key exactly matches that of the core key being referenced, i.e. when the string is used in the same manner and location as the referenced core string.)
Seen from this angle, namespacing could be handled as a simple fallback mechanism: if you don't find a string in the extension YML, look for it in the core YML!
When an extension wants to reference a non-global core string that isn't used in the same manner or location (assuming we choose to allow that), should that string be separated out as a global? (This seems a reasonable thing to do, but it would increase the number of references in the core YML, obviously.)
When a core key is referenced by an extension, should the core key be given a comment to indicate this?
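
To make the fallback idea concrete, here is a small sketch, assuming both YML files have been loaded into flat dicts. Python's collections.ChainMap searches its mappings in order, which matches "extension first, then core". The extension key below is hypothetical.

```python
from collections import ChainMap


def build_lookup(extension, core):
    """Return a view that checks the extension's strings first, then core's."""
    return ChainMap(extension, core)


core = {"core.log_in_title": "Log In"}
extension = {"mentions.reply_link": "Reply"}  # hypothetical extension key

lookup = build_lookup(extension, core)
print(lookup["mentions.reply_link"])  # found in the extension
print(lookup["core.log_in_title"])    # falls back to core
```

A key that exists in neither table still raises KeyError, so missing strings are not silently hidden.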

Regarding the last two points, it goes without saying that we'd only be able to do this for bundled extensions. Third-party devs would need to track their string usage on their own and be ready to make adjustments if a core string that they've been referencing gets changed. (But the uniqueness factor would make it easier for them to respond to such a situation, since they could merely replace the reference with a string.)

There may be other things I'm not taking into account. My thinking re: extensions is still a bit woolly at this point.

Franz · Sep 10, 2015

Sounds good to me. Very solid.

Any negative performance impact of the referencing mechanism can be compensated for by simply compiling all locales into one PHP file (with references already resolved) whenever an extension is added / updated.
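
A rough sketch of such a compile step follows. It is written in Python and emits JSON rather than PHP, purely for illustration. It assumes that references are written as "=> some.key", that every value is a string, and that the locale files have already been parsed into nested dicts. The names flatten and compile_locale, and the extension key, are hypothetical.

```python
import json

REF_PREFIX = "=> "  # assumed reference syntax


def flatten(tree, prefix=""):
    """Turn nested YAML-style mappings into flat dotted keys."""
    flat = {}
    for name, value in tree.items():
        key = prefix + name
        if isinstance(value, dict):
            flat.update(flatten(value, key + "."))
        else:
            flat[key] = value
    return flat


def compile_locale(*trees):
    """Merge locale trees (later ones win) and replace references with their targets."""
    table = {}
    for tree in trees:
        table.update(flatten(tree))
    compiled = {}
    for key, value in table.items():
        # A chain longer than the table itself can only be a loop.
        for _ in range(len(table)):
            if not value.startswith(REF_PREFIX):
                break
            value = table[value[len(REF_PREFIX):].strip()]
        else:
            raise ValueError(f"reference chain from {key!r} never ends")
        compiled[key] = value
    return compiled


core = {"core": {"log_in_title": "Log In", "log_in_action": "=> core.log_in_title"}}
extension = {"tags": {"log_in_link": "=> core.log_in_action"}}  # hypothetical
compiled = compile_locale(core, extension)
print(json.dumps(compiled, indent=2))
```

Because the output contains only plain strings, nothing needs to follow references at request time.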

Dominion · Sep 10, 2015

Franz Any negative performance impact of the referencing mechanism can be compensated for by simply compiling all locales into one PHP file (with references already resolved) whenever an extension is added / updated.

Ooh, nice!

It occurred to me that we'd need to put some sort of check on referencing within the same file, so that when Key A takes it to Key B and it finds another reference there, it throws an error. Otherwise a careless translator could easily send it into a loop. But we might want to make it possible for a key in an extension to reference a key in the core, and then take one further hop from there.
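
One possible shape for that check, as a sketch only: rather than the one-hop rule described above, this version uses a more general technique, remembering every key visited and raising an error as soon as one comes up twice. It assumes a flat dict of strings and the "=> some.key" reference syntax; the names resolve and ReferenceLoopError are hypothetical.

```python
REF_PREFIX = "=> "  # assumed reference syntax


class ReferenceLoopError(ValueError):
    """Raised when following references leads back to a key already visited."""


def resolve(key, table):
    """Follow '=> other.key' references from key until a plain string is reached."""
    seen = []
    while True:
        if key in seen:
            chain = " -> ".join(seen + [key])
            raise ReferenceLoopError(f"circular reference: {chain}")
        seen.append(key)
        value = table[key]  # a KeyError here means a dangling reference
        if not value.startswith(REF_PREFIX):
            return value
        key = value[len(REF_PREFIX):].strip()


table = {
    "core.log_in_action": "=> core.log_in_title",
    "core.log_in_title": "Log In",
    "a": "=> b",
    "b": "=> a",
}
print(resolve("core.log_in_action", table))
```

Including the whole chain in the error message tells the translator which keys to look at. A stricter rule, such as forbidding a second hop within the same file, could be added by checking the length of seen.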

Toby · Sep 11, 2015

Agreed, everything you've outlined sounds good. Regarding extensions, referencing core translations will be fine. I don't think core should make accommodations for any extensions, even if they're bundled – so no, if there's no reason for a string to be a global in core, then it shouldn't be made a global.

Regardless, let's just focus on getting the basics of this system implemented first, and then we can tweak!

Dominion It occurred to me that we'd need to put some sort of check on referencing within the same file, so that when Key A takes it to Key B and it finds another reference there, it throws an error. Otherwise a careless translator could easily send it into a loop. But we might want to make it possible for a key in an extension to reference a key in the core, and then take one further hop from there.

Good thinking. We'll build in some kind of loop detection.

Can we quickly discuss the format that references should take? Possibilities:

core:
  # What @DSitC originally proposed
  log_in_action: => core.log_in_title 

  # Would it be safe to omit the prefix and assume anything
  # in the format of foo.bar is a reference? My thought is
  # probably not...
  log_in_action: core.log_in_title 

  # Other ideas...
  log_in_action: > core.log_in_title
  log_in_action: ~core.log_in_title
  log_in_action: @core.log_in_title

I think @DSitC's original syntax is probably the safest, but just wanted to open the discussion.
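
To illustrate why an explicit prefix is the safer choice, here is a small sketch that compares it with the prefix-free heuristic. It works on values that have already been read as plain strings; parse_value and looks_like_bare_key are hypothetical names.

```python
import re

REF_PREFIX = "=>"
BARE_KEY = re.compile(r"^[a-z_]+(\.[a-z_]+)+$")


def parse_value(raw):
    """Classify a value as ("ref", target_key) or ("text", original_string)."""
    stripped = raw.strip()
    if stripped.startswith(REF_PREFIX):
        target = stripped[len(REF_PREFIX):].strip()
        if target:
            return ("ref", target)
    return ("text", raw)


def looks_like_bare_key(raw):
    """The prefix-free heuristic: treat anything shaped like foo.bar as a reference."""
    return BARE_KEY.match(raw.strip()) is not None


print(parse_value("=> core.log_in_title"))  # recognized as a reference
print(parse_value("example.com"))           # stays ordinary text
print(looks_like_bare_key("example.com"))   # the heuristic would misread this
```

One thing worth checking before settling on ">" or "@": in YAML, ">" at the start of a value introduces a folded block scalar and "@" is a reserved indicator, so both would probably need to be quoted, whereas "=>" and "~core.log_in_title" should read as plain strings.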

Dominion · Sep 11, 2015

Okay, since it seems we all agree, I'll get underway on the assumption we'll be doing it this way.

Toby I don't think core should make accommodations for any extensions, even if they're bundled – so no, if there's no reason for a string to be a global in core, then it shouldn't be made a global.

Yes, that makes sense. Noted.

Toby Can we quickly discuss the format that references should take? Possibilities:

I agree that a plain foo.bar is probably best avoided. Any of your "Other ideas" seem good, though there's a small outside chance that someone might want to begin a string with an "@". The syntax proposed by @DSitC would probably be safe, and it might be worth memorializing the fact that he suggested it.

So unless @Franz has any objections, I'm happy to go with that.

My next question is: How soon would it be possible to put the referencing mechanism/compiler in place?

There's no hurry on this, as it'll take me a while to get the final key name taxonomy figured out. But if it seems likely to take a while, I'd want to plan for it. I could do the following as I adjust the key names in the YML and code:

Add the reference lines to the YML file, but comment them out.
Add alternative lines with the unique key names in the code, and comment those out too.

Then when the compiler is ready, it would be a simple matter of uncommenting those things, and removing the old lines with the non-unique key names from the code.

Like I say, it'll be a while before I'm ready to start on the actual editing, so there's no need to set a schedule right now. I just thought it would be a good idea to bring it up here so we can coordinate our efforts.

EDIT: I suppose an alternative would be to do it as two branches, one with the non-unique keys and another with the unique ones. But since I'm new to Git, it's probably safest to do it as described above. Unless it's not necessary, of course.

Toby · Sep 11, 2015

Dominion Go ahead and make the changes as if the referencing system is in place. Implementation should be easy so we'll be able to get that done very quickly whenever the time comes.

Dominion · Sep 11, 2015

Toby Will do!

Another quick question: we agreed earlier that globals should be given no prefixes, but as we've since decided to organize everything by location, I think it might be good to give the globals a standardized prefix as well. Doing so would allow us to:

Keep them together in a clump when extracting data for one purpose or another.
Add general comments about globals (e.g. instructions on how to reference) should we desire.

I was thinking of using a simple "x" as the prefix, to indicate that the keys could be used in various locations. But on second thought, it might be better to do something like "aaa" or "zzz" to put them together at the top or bottom of the file when we alphabetize.

... Though come to think of it, it's not very likely that we'll have many locations beginning with x, y, or z.

Do you think such a prefix would be a good idea? Any preference as to which prefix we should use?

Franz · Sep 11, 2015

I'm fine with that arrow syntax.

Dominion I was thinking of using a simple "x" as the prefix, to indicate that the keys could be used in various locations. But on second thought, it might be better to do something like "aaa" or "zzz" to put them together at the top or bottom of the file when we alphabetize.

How about an underscore? Not very pretty, but it should work well...
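
A quick check of how that would sort, assuming a plain code-point sort and all-lowercase key names (the keys below are made up):

```python
def alphabetize(keys):
    """Sort keys by code point, as a plain alphabetical sort would."""
    return sorted(keys)


keys = ["core.log_in_title", "_log_in", "admin.title", "zzz_old", "x_other"]
print(alphabetize(keys))
```

The underscore comes before every lowercase letter in ASCII, so underscore-prefixed keys end up at the top of the file. It comes after the uppercase letters, though, so this relies on key names staying lowercase.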