A few days ago I was assisting a Tamil customer with a Unicode keyboard they had designed which used visual input order. Visual input order means that vowel signs such as TAMIL VOWEL SIGN E (U+0BC6) are typed before the consonant with which they combine, even though the Unicode standard stores them after the consonant. Our customer was running into a problem where U+0BC6 was combining with the wrong consonant in a run. In this post I’ll discuss some possible solutions and their potential issues, and finish with the solution that we proposed and that the customer chose to use. These solutions apply to any Indic script, but we will use Tamil as an example.

The basic keyboard layout is shown below. For this discussion, I’ve only populated 3 keys – E, A and K. So you won’t be seeing real words in this example – the examples have been chosen as the simplest way to illustrate rendering complexities, and not as valid Tamil text.

The Keyman source file looks like this:

store(&VERSION) '8.0' 
store(&NAME) 'My First Tamil Keyboard' 
store(&MESSAGE) 'Demonstrating Visual Input Order' 

begin Unicode > use(main) 

group(main) using keys 

+ 'a' > U+0BBE 
+ 'e' > U+0BC6 
+ 'k' > U+0B95 

With a visual input order keyboard, the Tamil vowel sign U+0BC6 should be typed before the consonant but stored after it in the document. The keyboard as it stands won’t do that:

| Typed | Expected Display | Actual Display | Text Stored |
|---|---|---|---|
| e | | | U+0BC6 |
| ek | கெ | ெக | U+0BC6 U+0B95 |

The initial solution was to add a rule to reorder these:

U+0BC6 + 'k' > U+0B95 U+0BC6

This fixes the initial issue but introduces the Travelling Vowel Problem – a vowel that just won’t stay where it is put:

| Typed | Expected Display | Actual Display | Text Stored |
|---|---|---|---|
| e | | | U+0BC6 |
| ek | கெ | கெ | U+0B95 U+0BC6 |
| ekk | கெக | ககெ | U+0B95 U+0B95 U+0BC6 |
| ekkk | கெகக | கககெ | U+0B95 U+0B95 U+0B95 U+0BC6 |

The problem here is how to tell that the U+0BC6 has already been combined with a consonant to prevent it moving further down the text store. The solution initially chosen by our customer involved using U+200C ZWNJ to stop the vowel U+0BC6 from moving along to the next consonant:

U+0BC6 + 'k' > U+0B95 U+0BC6 U+200C

This simple change stops the rule from matching repeatedly, because U+0BC6 is no longer at the end of the context. But does that solve the problem completely?

| Typed | Expected Display | Actual Display | Text Stored |
|---|---|---|---|
| e | | | U+0BC6 |
| ek | கெ | கெ | U+0B95 U+0BC6 U+200C |
| ekk | கெக | கெக | U+0B95 U+0BC6 U+200C U+0B95 |

Okay, so this seems to display just fine, but behind the scenes we now have an extra U+200C ZWNJ in the text store, which is certainly not ideal. Our customer noticed this when one application rendered U+200C ZWNJ as a space rather than zero width.

So what if we used Keyman’s deadkey functionality to not actually store a character in the text, but still flag that the vowel has been combined?

U+0BC6 + 'k' > U+0B95 U+0BC6 deadkey(combined)

| Typed | Expected Display | Actual Display | Text Stored |
|---|---|---|---|
| e | | | U+0BC6 |
| ek | கெ | கெ | U+0B95 U+0BC6 (dk) |
| ekk | கெக | கெக | U+0B95 U+0BC6 (dk) U+0B95 |

Success! Or is it? What happens when we type the following?

| Typed | Expected Display | Actual Display | Text Stored |
|---|---|---|---|
| k | | | U+0B95 |
| ke | க​ெ | கெ | U+0B95 U+0BC6 |
| kek | ககெ | ககெ | U+0B95 U+0B95 U+0BC6 (dk) |
| kekk | ககெக | ககெக | U+0B95 U+0B95 U+0BC6 (dk) U+0B95 |

Hey! We don’t want to combine with that consonant – this is a visual order keyboard! This could be called the Overenthusiastic Vowel Combining Problem. However, the text is stored correctly.

So that’s a rendering issue again, and we can’t solve it with a deadkey statement. It looks like our customer was on the right track after all. The key to solving this is to remember that the uncombined vowel is an intermediate state. We can temporarily add a U+200B ZWSP before this vowel to stop it combining with the preceding consonant, knowing that we can delete the U+200B ZWSP as soon as the combining consonant is typed. This means changing two rules in the keyboard:

+ 'e' > U+200B U+0BC6
U+200B U+0BC6 + 'k' > U+0B95 U+0BC6

I chose U+200B ZWSP because it does not have any other shaping behaviour. Now when we type our test sequences, we get the following:

| Typed | Expected Display | Actual Display | Text Stored |
|---|---|---|---|
| k | | | U+0B95 |
| ke | க​ெ | க​ெ | U+0B95 U+200B U+0BC6 |
| kek | ககெ | ககெ | U+0B95 U+0B95 U+0BC6 |
| kekk | ககெக | ககெக | U+0B95 U+0B95 U+0BC6 U+0B95 |

In that table, the U+200B ZWSP character appears only in the text stored for “ke”: it is stored temporarily to prevent the U+0BC6 from combining with the previous character. Notice that U+200B ZWSP gets deleted in the next step. This simple pattern solves both the Overenthusiastic Vowel Combining Problem and the Travelling Vowel Problem.

Just for fun, I’ll add one final rule to handle TAMIL VOWEL SIGN O (U+0BCA). This is a combination of the U+0BC6 and U+0BBE vowel signs, and is rendered on both sides of the consonant it attaches to. This ends up being a single, simple rule:

U+0B95 U+0BC6 + 'a' > U+0B95 U+0BCA

| Typed | Expected Display | Actual Display | Text Stored |
|---|---|---|---|
| k | | | U+0B95 |
| ke | க​ெ | க​ெ | U+0B95 U+200B U+0BC6 |
| kek | ககெ | ககெ | U+0B95 U+0B95 U+0BC6 |
| keka | ககொ | ககொ | U+0B95 U+0B95 U+0BCA |
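
The rule for U+0BCA names U+0B95 explicitly, so it only covers one consonant. As a rough sketch, and assuming a hypothetical store called cons that lists the consonants a fuller keyboard would support, the same rule could be written once for all of them. The four consonants are only a sample. The second rule assumes the keyboard also has some key that produces U+0BC7 TAMIL VOWEL SIGN EE, which the example keyboard does not. These lines would sit inside group(main):

```kmn
c Sketch only: the store name and its contents are assumptions for illustration.
store(cons) U+0B95 U+0B9A U+0BA8 U+0BAE

c Any listed consonant + U+0BC6, then 'a', becomes that consonant + U+0BCA (O).
any(cons) U+0BC6 + 'a' > index(cons, 1) U+0BCA

c Likewise, consonant + U+0BC7, then 'a', becomes consonant + U+0BCB (OO).
any(cons) U+0BC7 + 'a' > index(cons, 1) U+0BCB
```

Because both rules require a consonant immediately before the vowel sign, they should not fire while the vowel sign is still waiting behind its temporary U+200B ZWSP.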

The final keyboard is then:

store(&VERSION) '8.0' 
store(&NAME) 'My First Tamil Keyboard' 
store(&MESSAGE) 'Demonstrating Visual Input Order' 

begin Unicode > use(main) 

group(main) using keys 

+ 'a' > U+0BBE 
+ 'e' > U+200B U+0BC6 
+ 'k' > U+0B95 
U+200B U+0BC6 + 'k' > U+0B95 U+0BC6 
U+0B95 U+0BC6 + 'a' > U+0B95 U+0BCA 

Using these design patterns, you can create visual input order keyboards for any of the Indic scripts, and you can transfer the same principles to phonetic input methods. Judicious use of the any, index and store statements will also make light work of handling all the possible combinations. Other considerations that I have not covered here include visual order backspacing and prevention of illegal combinations such as U+0B95 U+0BBE U+0BBE (காா).
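
To give an idea of how any, index and store could replace one rule per consonant, here is a minimal sketch rather than a complete Tamil keyboard. The store names (consKey, cons, preVowelKey, preVowel), the keyboard name, and the key assignments for 'c', 'n', 'm' and Shift+E are all assumptions made for illustration; only 'k' and 'e' come from the example keyboard.

```kmn
c Sketch only: store names and most key assignments are hypothetical.
store(&VERSION) '8.0'
store(&NAME) 'Visual Order Sketch'

begin Unicode > use(main)

c Consonant keys and the consonants they produce (same length, same order).
store(consKey) 'kcnm'
store(cons)    U+0B95 U+0B9A U+0BA8 U+0BAE

c Keys for left-side vowel signs and the signs they produce.
store(preVowelKey) 'eE'
store(preVowel)    U+0BC6 U+0BC7

group(main) using keys

c A consonant typed on its own.
+ any(consKey) > index(cons, 1)

c A left-side vowel sign is held behind a temporary U+200B ZWSP.
+ any(preVowelKey) > U+200B index(preVowel, 1)

c The next consonant replaces the ZWSP and takes the waiting vowel sign.
c Offsets count from the start of the context: ZWSP is 1,
c the vowel sign is 2, and the key itself is 3.
U+200B any(preVowel) + any(consKey) > index(cons, 3) index(preVowel, 2)
```

With these three rules, typing ek should store U+0B95 U+0BC6 and typing Em should store U+0BAE U+0BC7, without a separate rule for each consonant and vowel pair. The paired stores must stay the same length and in the same order, because index picks its output by position.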

1 thought on “Techniques for Complex Script Keyboards – Visual Input Order”

Marc Durdin · February 11, 2011 at 9:13 am

As noted by one reader, in the final 3 examples, the first Tamil Letter K U+0B95 depicted is shown as a consonant from a previous syllable. In practice this issue may rarely arise but for the prevention of confusion by the unexpected reordering of vowels, it is still useful to do this.
