I'm excited to announce that I (Marc Durdin) will be presenting at the Internationalization and Unicode Conference 36 in October, on the topic "From Typewriter to Touch: Multi Platform Keyboards -- Challenges and Illustrations". I will be walking through a number of the complexities and challenges that may not be immediately obvious both when using and when designing keyboard layouts for all manner of modern platforms. Over the last 20 years, I've built up quite the laundry list of troubles that people encounter when trying to type in almost any language on computers, and during this presentation I hope to air some of the more fascinating issues that I've happened upon.
In August, I will be running a similar presentation at the My Language Conference 2012, in Brisbane, Australia, with a focus on situations that government organisations, especially libraries and other cultural institutions, may be encountering.
Finally, back at the Unicode Conference, I will also be running a tutorial on the first day on designing a cross platform keyboard using Tavultesoft Keyman technology. In this tutorial, using a specific example, I'll take attendees through core design principles for keyboard layouts that we've developed over the last 20 years, and from there into some of the complexities that arise when you try to provide a consistent, yet tailored approach for various platforms, including touch devices such as mobile phones and tablets, as well as desktop computers. Designing input methods for touch is of course a fairly new area for me as well, and I'm looking forward to running this session, sharing the knowledge I have acquired, and probably learning a lot in the process!
I hope to see you at the Unicode Conference or the My Language Conference -- or both! Having attended both conferences on multiple occasions previously, I have found both are invaluable forums for learning more about working with languages on computers!
For those of you with a web programming bent, the following may be of interest. We now have 2 APIs available for accessing the Keyman Desktop and KeymanWeb keyboard catalogues. Both APIs are RESTful and, in their initial incarnation, return data only in UTF-8 JSON format.
Updated 25 Jun 2012: Please note the minor changes to the format returned from the V1.0 API below.
KeymanWeb Keyboards
This API provides links to web-based keyboards, hosted on our site www.keymanweb.com, that allow you to type in a language directly into a field in the browser without installing any software. The KeymanWeb API is documented at http://help.keymanweb.com/dev/webapi.php. The only other piece you need for this puzzle is the link to the live KeymanWeb keyboard, which can be generated by substituting the language id and the keyboard id into the language and keyboard segments of the following URL: http://www.keymanweb.com/go/language/keyboard/notepad
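A minimal Python sketch of building that link, assuming only the URL pattern above. The function name is our own, and the ids in the usage note are hypothetical placeholders rather than real catalogue entries.

```python
from urllib.parse import quote


def keymanweb_notepad_url(language_id: str, keyboard_id: str) -> str:
    """Build a link to a live KeymanWeb keyboard from its language and keyboard ids."""
    # Each id becomes one path segment; quote() escapes any unsafe characters.
    return "http://www.keymanweb.com/go/{}/{}/notepad".format(
        quote(language_id, safe=""), quote(keyboard_id, safe="")
    )
```

For example, `keymanweb_notepad_url("lao", "example_keyboard")` yields `http://www.keymanweb.com/go/lao/example_keyboard/notepad`, where `example_keyboard` stands in for a real keyboard id taken from the API response.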
Keyman Desktop Keyboards
This API is similar to the KeymanWeb API. Basically, you can access the full list of keyboards, grouped by language, by calling the following URI: http://www.tavultesoft.com/api/1.0/languages
Updated 25 Jun 2012: The API has had a minor update: keyboard values are now normalised to reduce the size of the response. This minor change was made because the API was still fresh and we wanted to fix this before it became ingrained; don't worry, we won't be fiddling with the structure in breaking ways like this in the future! We also added the description member to keyboard data (this, however, is a non-breaking change).
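To illustrate what normalisation means for a client, here is a Python sketch that re-attaches keyboard records to their languages. The response shape it assumes (a top-level "languages" list whose entries carry keyboard ids, plus a "keyboards" object keyed by id) is hypothetical and chosen only for illustration; it is not the documented V1.0 format, so check the actual response before relying on any field names.

```python
import json
from typing import Dict, List


def denormalise(response_text: str) -> Dict[str, List[dict]]:
    """Map each language id to its full keyboard records.

    ASSUMED response shape (hypothetical, for illustration only):
      {"languages": [{"id": ..., "keyboards": [keyboard ids]}],
       "keyboards": {keyboard id: keyboard record}}
    """
    data = json.loads(response_text)
    keyboards = data["keyboards"]  # each keyboard record is transmitted only once
    return {
        language["id"]: [keyboards[kid] for kid in language["keyboards"]]
        for language in data["languages"]
    }
```

A keyboard shared by several languages is sent once in the response but appears under each of those languages after denormalising, which is where the size saving comes from.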
The response for the query for all language information will be similar to the following:
Updated 25 Jun 2012: You can retrieve a single language record by appending the language code, e.g. http://www.tavultesoft.com/api/1.0/languages/lao. When a single language is queried in this way, the return value folds the keyboards array in as per the example below for Lao.
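A Python sketch of querying the catalogue with only the standard library, assuming the URLs shown above; the helper names are hypothetical. Because the API returns UTF-8 JSON, the body is decoded explicitly as UTF-8 rather than relying on a default encoding.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "http://www.tavultesoft.com/api/1.0/languages"


def language_url(language_code: Optional[str] = None) -> str:
    """URL for the full catalogue, or for one language when a code is given."""
    if language_code is None:
        return API_BASE
    return API_BASE + "/" + quote(language_code, safe="")


def decode_response(body: bytes):
    """Decode a response body, which the API returns as UTF-8 JSON."""
    return json.loads(body.decode("utf-8"))


def fetch(language_code: Optional[str] = None):
    """Fetch and decode the catalogue (needs network access to the endpoint)."""
    with urlopen(language_url(language_code)) as response:
        return decode_response(response.read())
```

Calling `fetch("lao")` would request the single Lao record described above, while `fetch()` requests the whole catalogue.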
Keyboard options were introduced in Keyman Desktop 8.0 in response to a common issue: many keyboard packages include multiple keyboards to cater for minor configuration changes. For instance, a Tigrigna package has both Eritrean and Ethiopian keyboard layouts, with only a single character difference: the question mark symbol, which differs between the two countries. Similarly, some Greek keyboards differ only in how the diacritics are typed.
With keyboard options, the keyboard developer can present the user with a dialog box to let them choose how they wish to use the keyboard. This is both less confusing and more flexible than having multiple keyboards.
When reviewing this feature a couple of weeks ago, we realised that with some small tweaks, it would become even more useful.
Proposed New Features
Three small updates to the keyboard options feature are proposed:
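Testing for base layout. This would enable you to write rules that apply only when a particular base layout, such as French, is in use. This, combined with the improvements to mnemonic layouts in Keyman Desktop 8.0, should make it possible to cater for all European base layouts with a single keyboard.
Testing for On Screen Keyboard-specific rules. This would allow a key to be displayed on the On Screen Keyboard differently from the output it actually produces.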
Feedback on option changes via balloon and icon. While this need not be limited to notification of option changes, some sort of feedback is essential for keyboard options to be most useful.
Testing for Base Layout
The test for base layout could be implemented with a special system store, for example:
if(&baselayout = 'FR') + 'q' > 'that key next to Caps Lock'
if(&baselayout = 'EN-US') + 'q' > 'that key next to Tab'
Determining how to identify the base layout has not yet been discussed.
Testing for On Screen Keyboard-specific rules
Again, this could be implemented with a special system store. In the following example, the 'Q' key would always show with a capital Q, even though the output would be lower case:
if(&osk = '1') + 'q' > 'Q'
if(&osk = '0') + 'q' > 'q'  c not technically necessary but shown for completeness
Giving feedback via balloon or icon change
The details of how additional icons could be stored in the keyboard (and the preferred image formats) are not settled. However, in principle it could work as per this example:
store(happy-face-icon) "happy-face.png"
store(happy-message) "Keyman is now happy"
store(sad-face-icon) "sad-face.png"
store(sad-message) "Keyman is now sad"
Determining the icon to use when the keyboard is initially selected would also need to be covered. At this stage, it is not anticipated that the icon would change in the menus or in the configuration screens, but feel free to ask for this – now's a good time to ask!
Keyman 8.0 improved mnemonic keyboards by remapping the On Screen Keyboard to the underlying hardware layout. Previous versions of Keyman did not do this well. This has been a big improvement and makes mnemonic keyboards much more useful with European hardware layouts.
However, apart from this one change, we had to postpone some of the design features that we wanted to add to the On Screen Keyboard. A couple of weeks ago we had a great discussion on how we could improve the On Screen Keyboard by making it more dynamic. While dynamic updates to the keyboard have been on the future feature list for some time, details of how this could work had never been properly explored.
We also linked this discussion to some tweaks to the keyboard options feature introduced in Keyman Desktop 8.0. These tweaks were primarily around testing for base layouts and providing feedback to users when options are selected. The extensions to keyboard options have some relationship to a dynamic On Screen Keyboard, but will be expanded on in a separate blog post.
Dynamic On Screen Keyboard
The big breakthrough we made in our discussion was in how we could instruct the On Screen Keyboard to dynamically update. Initially we just thought we could use the existing rules in the keyboard source code to tell the keyboard how to dynamically update. This seemed logical, but the more we looked at it, the more we realised that this would end up being confusing for the end user. There were a few reasons:
Keyman Desktop keyboard rules have to deal not only with basic mapping but also with character reordering and normalisation. These rules would probably be confusing when displayed on the keys.
Some rules make changes to multiple characters in the context but displaying the full output would not be helpful to the end user.
Some rules may input an invisible character, but the On Screen Keyboard should instead show a hint for that letter.
So we came up with the idea of a new begin statement to cater for the On Screen Keyboard (OSK). This could point to the same group as the normal begin Unicode statement, or to a separate group designed to update the On Screen Keyboard. This group could use all the same context matching and finesse as the normal groups, but its output would go only to the On Screen Keyboard. One point to remember is that the output of the On Screen Keyboard rules will not affect the context for the next keystroke. The new OSK begin rule would be written as follows:
begin OSK > use(osk)
group(osk) using keys
The processing of any groups fired by the begin OSK statement has an important nuance: after each keystroke, the rules are evaluated for every key in order to render the On Screen Keyboard. This means that the On Screen Keyboard would effectively run the group 47 or 48 times when updating, once per character key, depending on whether the layout has the extra 102nd key. This shouldn't have a significant performance cost on today's computers. If no rule matches for a given key, Keyman would render the default output based on either the underlying layout or the US base key (depending on whether the keyboard is mnemonic or positional).
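The per-key evaluation can be modelled in a few lines of Python. This is a conceptual sketch, not Keyman Engine code: the rule tuples and the function name are hypothetical, rules are simply tried in the order given, and the context passed in is never modified, mirroring the point that OSK output does not feed back into the context.

```python
from typing import Dict, List, Tuple

# Hypothetical rule form: (context the text must end with, key, text for the key cap).
Rule = Tuple[str, str, str]


def render_osk(context: str, keys: List[str], rules: List[Rule],
               defaults: Dict[str, str]) -> Dict[str, str]:
    """Work out what each key cap shows for the current context."""
    caps: Dict[str, str] = {}
    for key in keys:  # one evaluation per key (47 or 48 on a real keyboard)
        for required_context, rule_key, display in rules:
            if rule_key == key and context.endswith(required_context):
                caps[key] = display  # first matching rule wins in this sketch
                break
        else:
            caps[key] = defaults[key]  # no rule matched: show the default cap
    # The context itself is never changed: OSK output does not feed back.
    return caps
```

With rules such as `("`", "a", "à")`, typing a grave accent would relabel the a key as à until the context changes, while keys without a matching rule keep their default caps.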
Keeping this in mind, we realised that colour, highlighting, graphics and font hints could be added to the output with a few additional simple statements. These additional formatting hints would be ignored by the keystroke processor, and would just be used by the On Screen Keyboard engine.
bg(colour | default), or background(colour | default)
This would set the background colour for the whole key. If the statements executed for a key set the colour more than once, the last colour set would take effect. The colour reference will probably be either a #rrggbb triplet or a standard web colour name.
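A Python sketch of resolving colour references and applying the "last colour wins" behaviour. Since the colour syntax is not settled, both accepted forms and the small web-colour table below are assumptions; the table lists only a few of the standard web colour names.

```python
import re
from typing import List, Optional

HEX_COLOUR = re.compile(r"#[0-9A-Fa-f]{6}")
# Illustrative subset of the standard web colour names.
WEB_COLOURS = {"black": "#000000", "white": "#ffffff", "red": "#ff0000",
               "yellow": "#ffff00", "silver": "#c0c0c0"}


def resolve_colour(ref: str) -> Optional[str]:
    """Return '#rrggbb', or None when the reference is 'default'."""
    if ref == "default":
        return None
    if HEX_COLOUR.fullmatch(ref):
        return ref.lower()
    if ref.lower() in WEB_COLOURS:
        return WEB_COLOURS[ref.lower()]
    raise ValueError("unknown colour reference: " + ref)


def effective_background(bg_calls: List[str]) -> Optional[str]:
    """Apply bg() calls in execution order; only the last one takes effect."""
    colour: Optional[str] = None
    for ref in bg_calls:
        colour = resolve_colour(ref)
    return colour
```

Returning None for `default` leaves the choice of the actual default colour to the On Screen Keyboard renderer.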
fg(colour | default), or foreground(colour | default)
This would set the foreground colour for the subsequent letters to be printed on the key cap. In some scenarios, for example with combining diacritics, this may not be possible, and needs to be investigated.
font(name, size, style)
There are some issues with specifying the size due to the resizability of the On Screen Keyboard. It may be specified as a percentage of the default size.
image(source-image)
Source-image would refer to a store name which would then cause the image to be embedded in the keyboard file for use in the On Screen Keyboard. Suggested file format would be PNG to allow for alpha transparency. There are issues about sizing and mixing images and text that would need to be resolved.
hint(text)
If the user hovered the mouse cursor over the key, the hint text would be displayed in a balloon. There may be some visual indication on the key of the availability of a hint.
As there may no longer be a static On Screen Keyboard, we may need to specify font and other preferences such as 101 or 102 key layout as well. This is also a topic for further discussion.
Example Dynamic On Screen Keyboard Source File
The following straightforward keyboard will be used as an example of how we envisage dynamic on screen keyboards working.
For simplicity in this example I have not tried to use any pre-composed letters but just left the combining diacritics in place. You'll note that we are using the same rule set for the Unicode rules as for the OSK rules, despite the discussion above. This is because this example is demonstrating the concepts rather than being a comprehensive and complete keyboard layout.
Given the keyboard source above, the Dynamic On Screen Keyboard would initially display as in the image below. There isn't anything extraordinary there. The key difference is that there is no static definition for the On Screen Keyboard; rather, Keyman uses the begin OSK statement to define its display. Deadkeys (as per the dk(diac) statements) would not be highlighted in any way on the On Screen Keyboard by default. This is because the deadkey statement in Keyman keyboards tends to be used for state management and not purely as a deadkey flag.
Let's look at how the keyboard could be modified to highlight deadkeys. The formatting hints in this example would be ignored by Keyman Engine for keystroke processing, so let's keep the rules together for now.
Now with an empty context the keyboard would display something like the following mockup:
Finally, let's tweak the keyboard to make the dynamic keys with context highlighted so they are more visible. I'm not going to make the example comprehensive – it'll only cover the grave accent key.
The concepts outlined here present a simple model of how on screen keyboards could give feedback to the end user and help make keyboards more self-documenting. I'd be very interested in any feedback or suggestions you may have – just put them in a comment on the blog.
In the next blog post, I'll discuss how the tweaks to options we discussed can also be used for integration of the On Screen Keyboard.
Updated 23 Aug 2011: In examples, replaced any(diaK) with any(diaO) in context, and replaced deadkey usage with ZWNJ to avoid side-effects with typing diacritic after another letter.
Keyman Developer 8.0 includes the facility to create a combined keyboard and Keyman Desktop installer. However, it is not immediately obvious how to do this. In particular, it may be hard to know where to find the install package for Keyman Desktop, as Keyman Desktop is distributed as an .exe installer but Keyman Developer asks for an .msi installer! So here's a quick walkthrough.
You'll need to extract the Keyman Desktop .msi file from the .exe installer to your hard drive in order to bundle it in your own .exe installer. Assuming you saved the installer into your Downloads folder, start a command prompt, change directory to the Downloads folder, and run the installer with the -x targetfolder parameter (make sure you create the targetfolder before running the command). In the following example, the installer will extract into C:\Users\mcdurdin\Downloads\keyman.
A couple of dialog boxes later, you'll be left with the .msi file and a couple of other bits and pieces in the folder you chose to extract to:
Now load up your keyboard package in Keyman Developer, and jump to the Compile tab. Click Select Product Installer and choose the .msi file in the targetfolder.
Finally, click Compile Installer. You'll end up with an .exe installer named keymandesktop80-packagename.exe. You can rename this file to whatever makes sense for you.
Starting the installer will present the following splash screen. Note the blurb on the left hand side which lists the keyboards included in the installer.
One final note: if you are planning to distribute your keyboard through the Tavultesoft website, we will automatically bundle your keyboard package with the latest version of Keyman Desktop, so you only need to worry about sending us your .kmp package file.
The Keyman keyboard language has two basic types of keyboard layout: positional and mnemonic. In this blog post, I describe the differences between these two types of keyboard layouts, along with notes on how to develop each type. But first, for those who have trouble with the word mnemonic:
For this blog post, I'll use Greek as an example keyboard layout, because the similarities between Greek and Latin letters are familiar to many, and will help in explaining the principles of mnemonic layouts. The same concepts apply to more dissimilar scripts such as Tamil, Lao or Korean.
Positional layouts
So what is a mnemonic layout? Well, I'm actually going to start by defining a positional layout. I think it is important to understand Keyman's basic keyboard layout model, and the nuances of that model, before moving on to the more complex mnemonic layout model.
A positional layout, in Keyman terms, is a Keyman keyboard that defines its rules according to the position of keys on a physical keyboard. With a positional layout, Keyman does not care what is printed on the key cap of each key; it only cares where on the keyboard that key is. We sometimes call positional layouts "typewriter layouts", in reference to the fixed layout of a typewriter.
Here is an example Keyman Greek positional keyboard layout running on various Windows base layouts:
Greek Positional on English (US) system layout
Greek Positional on English (UK) system layout
Greek Positional on French (France) system layout
Greek Positional on Swedish system layout
Greek Positional on Thai system layout
You'll note that the Greek letters are essentially in the same place on each keyboard, apart from one exception. Now, what appears to be an exception is the backslash (on US) and the extra "102nd" key on European keyboards. The image below shows that the backslash (\) key cap actually changes to a pound (#) key cap on the UK keyboard, whereas the 102nd key on the UK keyboard has a backslash on its cap.
The answer is that the key identified by K_BKSLASH on the US English keyboard moves down one row, to the left of the Enter key. The 102nd key, despite having a backslash on its cap, is a separate key with a different keycode, K_oE2. So while this may initially confuse, remembering this single exception is the key to joy with positional layouts! Actually, the backslash key moves around even on US English layouts: sometimes it is next to Backspace, sometimes below it!
Programming a positional layout
Because Keyman's original design was based on US English (as was Windows itself), the positional layout is defined in terms of a US English keyboard, using the names for the keys that logically match the US English keyboard layout, such as K_A, K_SLASH and K_COLON. These names should not be read as having a more literal meaning and could just as easily have been called C01, B10 or C10 (which would be their names in the ISO 9995 keyboard layout standard). However I think that Keyman's key names, while less precise, are much easier to work with on a practical day-to-day basis! We call these 'virtual keys', due to their similarity to Windows' Virtual Key codes. Don't rely on them being identical to Windows Virtual Key codes, however, because they aren't! A couple of examples may be warranted:
+ [K_A] > 'key to right of caps lock, unshifted'
+ [SHIFT K_A] > 'key to right of caps lock, shifted'
+ [CTRL ALT K_A] > 'key to right of caps lock, with ctrl+alt'
The alternative method of defining keys in a Keyman keyboard is to use a single character. Internally, the Keyman keyboard compiler will translate these to the applicable virtual keys. With this model, extra shift states such as Ctrl or Alt cannot be accessed. For example:
+ 'a' > 'key to right of caps lock, unshifted'
+ 'A' > 'key to right of caps lock, shifted'
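The compiler's translation of character rules can be pictured as a lookup from characters to US English virtual keys. The Python sketch below illustrates the idea with a hand-written excerpt of the US layout; it is an assumption-level model, not how the Keyman compiler is actually implemented.

```python
# Hand-written excerpt of the US English layout:
# character -> (needs Shift?, Keyman virtual key name)
US_CHAR_TO_VKEY = {
    "a": (False, "K_A"), "A": (True, "K_A"),
    "/": (False, "K_SLASH"), "?": (True, "K_SLASH"),
    ";": (False, "K_COLON"), ":": (True, "K_COLON"),
}


def char_key_to_vkey(ch: str) -> str:
    """Express a single-character key as the equivalent virtual key syntax."""
    shifted, vkey = US_CHAR_TO_VKEY[ch]
    return "[SHIFT {}]".format(vkey) if shifted else "[{}]".format(vkey)
```

Because the lookup only knows about unshifted and shifted characters, there is nowhere to express Ctrl or Alt, which is why those shift states are unavailable with this model.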
The Layout tab of the Keyboard Editor works only with Positional Layouts.
Mnemonic layouts
A mnemonic layout, unlike a positional layout, does not care what the physical keyboard is. Instead it reconfigures itself to map to the key caps of the selected Windows base keyboard (usually the same as the hardware keyboard). Taking our Greek keyboard above, and turning it into a mnemonic layout, gives us the following.
Greek Mnemonic on English (US) system layout
Greek Mnemonic on English (UK) system layout
Greek Mnemonic on French (France) system layout
Greek Mnemonic on Swedish system layout
Note how the Alpha α has moved from the 3rd row to the 2nd row on the French layout: that is, it is placed with the letter A on the French keyboard.
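The difference between the two models comes down to what a rule is anchored to. In this Python sketch (an illustration only, since Keyman itself translates the layout from the Windows keyboard definition), positions use the ISO 9995 style names mentioned earlier, and the two layout tables are small hand-written excerpts of US QWERTY and French AZERTY.

```python
from typing import Dict

# Hand-written excerpts: ISO 9995 position -> unshifted character on the key cap.
US_QWERTY = {"D01": "q", "D02": "w", "C01": "a", "B01": "z", "B07": "m"}
FRENCH_AZERTY = {"D01": "a", "D02": "z", "C01": "q", "C10": "m", "B01": "w"}

# A mnemonic rule says "the key that types a gives α", whatever its position.
GREEK_MNEMONIC = {"a": "α", "z": "ζ", "m": "μ"}


def place_mnemonic(layout: Dict[str, str], rules: Dict[str, str]) -> Dict[str, str]:
    """Position -> Greek letter for a mnemonic keyboard on the given base layout."""
    return {pos: rules[ch] for pos, ch in layout.items() if ch in rules}
```

On the French excerpt α lands on D01, the top letter row, matching the description above; a positional keyboard would instead keep α at C01 (K_A) on every layout.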
The key advantage of a mnemonic layout is that the keyboard developer can create a single keyboard layout that transparently maps onto almost any Latin script layout. Previously, keyboard layouts needed to be redesigned for each base layout they were to be used with. This led to a multiplicity of keyboard layouts to support, and to difficulty for end users who may not be sure which base layout they use. There are at least 75 different Latin script keyboard layouts included with Windows 7!
A mnemonic layout also relies on the basic idea that the letters A-Z (and a-z), digits 0-9, and all punctuation available on a US English keyboard will somehow be available on any Latin script keyboard. This is true, as far as we are aware, for all Microsoft Latin script keyboard layouts. This means that the mnemonic layout does not translate across scripts, into Thai or Russian physical keyboards, for example. It would certainly be possible to design a mnemonic layout for a Cyrillic base script, which would then work across the various Cyrillic hardware layouts that are available.
Greek Mnemonic on Thai system layout
The example above may clarify the limitation of using a mnemonic layout with a base keyboard for a script it has not been designed for. The reality, however, is that the vast majority of the world's keyboards have access to the Latin alphabet via one method or another. For instance, with Thai, the US English key caps are also printed on nearly all Thai keyboards, so switching to a US English base keyboard is all that is needed to resolve this problem.
There are some additional complexities with mnemonic layouts and on screen keyboards. While Keyman does its best to translate a mnemonic layout for display in an on screen interface, deadkeys can make it difficult to display some aspects accurately across all hardware layouts. It can also be hard to find some of the more buried punctuation on some European layouts.
Planning a Mnemonic Layout
There are a few things it helps to be aware of when developing a mnemonic layout. The first is the frequency of access for various keys on the keyboard. For instance, many Greek keyboards use vertical bar (|) to access the iota subscript character (U+0345 COMBINING GREEK YPOGEGRAMMENI). The problem here is that vertical bar is not easy to access on many European layouts, being accessible only via an AltGr+Shift combination on some layouts.
Next, some punctuation characters are available only via deadkeys in some European layouts. For example, ^ is only available via a deadkey at the top left of a German layout. This means to access this character (and hence the translation for the Keyman keyboard), the user must type ^ <space>, rather than just ^.
The following table provides some hints as to the best characters to use when designing your mnemonic layout. In general, the further down the table, the more likely you are to have usability issues with deadkeys or AltGr on some layouts. Thus use of these characters should be balanced with the frequency of use and their mnemonic utility.
Once you set this flag, Keyman will no longer use virtual key codes. Instead it will use characters: the characters from the key caps. The Layout mode of the Keyboard Editor will no longer be accessible, and you will need to design your keyboard in Source Mode.
When coding your layout, remember that keys are no longer defined via shift states and key codes. Instead they are defined by the characters on the key caps. For alphabetic keys, the shifted and unshifted letters are not unified: in the unlikely event that 'a' and 'A' were on separate keys, Keyman could handle this situation.
For most rules, using the character-based rules is sufficient. However, it is still often useful to access extended characters with modifier keys. In most cases, we recommend overloading only the A-Z alphabetic keys with Ctrl+Alt, AltGr or similar modifiers, as many European layouts use AltGr (which Windows maps to Ctrl+Alt) to access additional punctuation. For the greatest compatibility, do not use modifiers at all.
To use modifiers, the syntax is similar to the virtual key syntax for positional layouts. The key difference is how the key is identified: instead of using a virtual key code, you can use any of the characters that the key itself would produce. For example, the following rules would both match Ctrl+A:
+ [CTRL 'a'] > 'Ctrl A'
+ [CTRL 'A'] > 'Ctrl A'
It is important to recognise that the second example above does not match Ctrl+Shift+A. Again, by example:
+ ['a'] > 'key that produces an A'
+ ['A'] > 'key that produces an A, same as previous rule'
+ [SHIFT 'a'] > 'shift + key that produces an A'
+ [SHIFT 'A'] > 'shift + key that produces an A, same as previous rule'
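Key identification in mnemonic rules can be sketched as normalising each rule to a (modifiers, key) pair. In this Python illustration the layout excerpt and function names are hypothetical; the point is that any character a key produces identifies that key, while Shift takes part in the match only when written explicitly.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical excerpt of a base layout: position -> characters the key produces.
AZERTY_KEYS: Dict[str, Tuple[str, str]] = {"D01": ("a", "A"), "C01": ("q", "Q")}


def key_for_char(layout: Dict[str, Tuple[str, str]], ch: str) -> str:
    """Find the key position whose cap produces the given character."""
    for position, chars in layout.items():
        if ch in chars:
            return position
    raise KeyError("no key produces " + repr(ch))


def rule_key(layout: Dict[str, Tuple[str, str]], modifiers: Iterable[str],
             ch: str) -> Tuple[FrozenSet[str], str]:
    """Normalise a rule such as [CTRL 'A'] to (modifiers, key position).

    The character only identifies the key; Shift is part of the match
    only when it appears in the modifier list.
    """
    return frozenset(modifiers), key_for_char(layout, ch)
```

Under this model [CTRL 'a'] and [CTRL 'A'] normalise to the same pair, while [CTRL SHIFT 'a'] does not, which is exactly why the second rule above does not match Ctrl+Shift+A.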
The current release of Keyman Developer will warn you if you use the function keys or other non-character keys on your keyboard as virtual keys (e.g. K_F1). Non-character keys are those keys that do not normally produce a character and do not differ from language to language (except perhaps in the name on the key cap). Technically this should still work in Keyman Engine. A future update of Keyman Developer may allow you to use non-character keys on a mnemonic layout.
There are of course additional keys on some keyboards, such as Japanese or Brazilian keyboards. In most cases, it is best to avoid designing a Keyman keyboard that relies on these additional keys, as that restricts the usage of the keyboard to users from that region. The same rule applies to the 102nd key on European layouts, of course!
Software Developers Only
A little side note for Windows developers: Windows' use of virtual key codes is inconsistent and confusing when it comes to European layouts. The key names shift logically in many cases; for instance, VK_Q and VK_A swap places between US QWERTY and French AZERTY layouts. However, the virtual key codes defined for punctuation often change unexpectedly; for example, VK_OEM_1 (the ;: key on a US keyboard) does not produce a ; on a French keyboard! For this reason, Keyman does not use virtual key codes for mapping mnemonic layouts but instead translates the layout based on the Windows keyboard definition itself.
The Future
We are working on the On Screen Keyboard in Keyman Desktop to improve support for mnemonic layouts and keyboard options. A future version of Keyman Engine may include support, via the keyboard options feature, for exposing the underlying system layout to the keyboard. This would allow you to optimise access to the iota subscript, for example, on system layouts where it is harder to reach.
Andrew Cunningham recently asked me if there was a facility in Keyman Developer 8.0 to generate a Keyman Desktop .kmn source file from a Windows keyboard layout. This functionality is not included in Keyman Developer but in this post I'll make available a command-line tool that will do that very thing.
ImportKeyboard is straightforward to use. If you pass no parameters, you'll be shown basic usage:
Usage: importkeyboard /list | [/kmw] hkl [output.kmn]
/list shows a list of keyboards available on your system.
hkl should be an 8 hex digit Keyboard ID. See /list to enumerate these.
/kmw should be specified to target KeymanWeb. When using KeymanWeb, the imported keyboard will:
* Ignore CAPS/NCAPS
* Convert RALT to CTRL+ALT
If output.kmn is not specified, the filename of the source keyboard will be used; e.g. 00000409 will produce kbdus.kmn
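The rules for the hkl parameter and the default output name can be mirrored in a short Python sketch. It is not the tool's source (ImportKeyboard is a .NET program): the function name is hypothetical, and the lookup table is a hand-copied excerpt of the keyboard list that /list prints, whereas the real tool reads the keyboards available on your system.

```python
import re
from typing import Optional

# Excerpt of the /list output: Keyboard ID -> layout DLL.
KNOWN_LAYOUTS = {"00000409": "KBDUS.DLL", "0000040c": "KBDFR.DLL",
                 "00000428": "KBDTAJIK.DLL"}


def output_filename(hkl: str, output: Optional[str] = None) -> str:
    """Mimic the documented rules for the hkl and output.kmn parameters."""
    if not re.fullmatch(r"[0-9A-Fa-f]{8}", hkl):
        raise ValueError("hkl should be an 8 hex digit Keyboard ID")
    if output is not None:
        return output
    # Default: the source keyboard's file name, e.g. KBDUS.DLL -> kbdus.kmn
    dll = KNOWN_LAYOUTS[hkl.lower()]
    return dll.lower().rsplit(".", 1)[0] + ".kmn"
```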
The /list option will display a list of the keyboards available on your system for import:
C:\>importkeyboard /list
ID        Filename      Keyboard Name
00000401  KBDA1.DLL     Arabic (101)
00000402  KBDBU.DLL     Bulgarian
00000404  KBDUS.DLL     Chinese (Traditional) - US Keyboard
00000405  KBDCZ.DLL     Czech
00000406  KBDDA.DLL     Danish
00000407  KBDGR.DLL     German
00000408  KBDHE.DLL     Greek
00000409  KBDUS.DLL     US
0000040a  KBDSP.DLL     Spanish
0000040b  KBDFI.DLL     Finnish
0000040c  KBDFR.DLL     French
0000040d  KBDHEB.DLL    Hebrew
0000040e  KBDHU.DLL     Hungarian
0000040f  KBDIC.DLL     Icelandic
00000410  KBDIT.DLL     Italian
00000411  KBDJPN.DLL    Japanese
00000412  KBDKOR.DLL    Korean
00000413  KBDNE.DLL     Dutch
00000414  KBDNO.DLL     Norwegian
00000415  KBDPL1.DLL    Polish (Programmers)
00000416  KBDBR.DLL     Portuguese (Brazilian ABNT)
00000418  KBDRO.DLL     Romanian (Legacy)
00000419  KBDRU.DLL     Russian
0000041a  KBDCR.DLL     Croatian
0000041b  KBDSL.DLL     Slovak
0000041c  KBDAL.DLL     Albanian
0000041d  KBDSW.DLL     Swedish
0000041e  KBDTH0.DLL    Thai Kedmanee
0000041f  KBDTUQ.DLL    Turkish Q
00000420  KBDURDU.DLL   Urdu
00000422  KBDUR.DLL     Ukrainian
00000423  KBDBLR.DLL    Belarusian
00000424  KBDCR.DLL     Slovenian
00000425  KBDEST.DLL    Estonian
00000426  KBDLV.DLL     Latvian
00000427  KBDLT.DLL     Lithuanian IBM
00000428  KBDTAJIK.DLL  Tajik
00000429  KBDFA.DLL     Persian
(snipped)
To import a keyboard, for example Tajik, use the following command:
C:\>importkeyboard 00000428 tajik.kmn
Importing Windows system keyboard 00000428 to Keyman keyboard tajik.kmn
Let's have a look at what is produced. This particular file is straightforward - it does not include deadkeys.
c
c Keyman keyboard generated by ImportKeyboard
c Imported: 26/07/2011 8:23:34 AM
c
c Source Keyboard File: KBDTAJIK.DLL
c Source KeyboardID: 00000428
c Target: Keyman Desktop
c
c
When creating a keyboard for KeymanWeb, it is useful to include the /kmw command line option. This modifies the output to avoid CAPS and NCAPS references, and converts any AltGr (RALT) rules to Ctrl+Alt, as KeymanWeb supports neither of these.
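The /kmw adjustments can be approximated as a text rewrite of the modifier lists in rules. This Python sketch is an assumption about the effect, not ImportKeyboard's implementation; the function name is hypothetical, and it ignores edge cases such as square brackets inside string literals.

```python
import re


def adapt_modifiers_for_keymanweb(rule: str) -> str:
    """Drop CAPS/NCAPS and turn RALT into CTRL ALT inside [...] key specs."""

    def fix(match):
        tokens = []
        for token in match.group(1).split():
            if token in ("CAPS", "NCAPS"):
                continue  # Caps Lock state is ignored for KeymanWeb
            if token == "RALT":
                tokens.extend(["CTRL", "ALT"])  # AltGr becomes Ctrl+Alt
            else:
                tokens.append(token)
        return "[" + " ".join(tokens) + "]"

    # Limitation: a '[' or ']' inside a string literal would confuse this regex.
    return re.sub(r"\[([^\]]*)\]", fix, rule)
```

For example, `+ [NCAPS RALT K_E] > '€'` becomes `+ [CTRL ALT K_E] > '€'`.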
The importkeyboard tool also supports deadkeys. These will be added in a separate group at the end of the file, for example with French AZERTY (some lines removed from this example):
This secondary group runs after any keystroke, and when a deadkey and character pair is detected, outputs the correct character. This behaviour is not identical to Windows deadkeys: Windows' default behaviour when a deadkey does not match on the second key is to output both the default value for the deadkey and the second character, whereas unmatched second keys in the Keyman keyboard will simply cause the deadkey to be ignored. However, this behaviour should be close enough for most uses.
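The difference in deadkey behaviour is easiest to see side by side. In this Python sketch the compose table is a hypothetical two-entry excerpt, the function names are our own, and '^' stands in for both the deadkey and its default character.

```python
# Hypothetical compose table: (deadkey, next character) -> composed character.
COMPOSE = {("^", "e"): "ê", ("^", "a"): "â"}


def windows_result(deadkey: str, next_char: str) -> str:
    # Unmatched: Windows emits the deadkey's default character, then the key.
    return COMPOSE.get((deadkey, next_char), deadkey + next_char)


def imported_keyman_result(deadkey: str, next_char: str) -> str:
    # Unmatched: the imported Keyman keyboard simply drops the deadkey.
    return COMPOSE.get((deadkey, next_char), next_char)
```

Both produce ê for ^ followed by e; they differ only for an unmatched second key such as x, where Windows gives ^x and the imported keyboard gives x.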
Once you have created your keyboard, you can of course load it into Keyman Developer and edit it further there. Creating the On Screen Keyboard is as easy as pressing the Fill from layout button:
And there you have it. Importkeyboard.exe requires .NET framework 2.0 to run.
This blog post is about writing Keyman Desktop keyboard layouts. It assumes that you are familiar with the concepts outlined in the keyboard tutorial and language reference.
When designing a keyboard layout, a common technique is to break processing of complex rules down into multiple groups. A typical design is:
begin > use(constraints)
group(constraints) using keys
c match invalid keystrokes
nomatch > use(main)

group(main) using keys
c normal processing rules
match > use(post-process)

group(post-process)
c normalisation, additional text processing
c note this group does not match keystrokes but only processes context
This design makes your keyboard layout flexible and very usable, and is typically much easier to validate for correctness than putting all the rules in a single group.
One common scenario that this model does not necessarily cover, however, is how to match end of word scenarios. The example we will use is word-final sigma in Greek, which differs from the medial form as shown:
Medial sigma: σ (U+03C3)
Word-final sigma: ς (U+03C2)
There are a number of ways of handling this: leave the choice to the end user, show word-final sigma first, or show medial sigma first. The choice really depends on the language; for some languages option one may be disconcerting for the end user; conversely, for other languages the second approach may be confusing.
1. Leave the choice to the end user
A naive keyboard layout will just have two keys for sigma. This is a valid design, but it can be annoying for end users. In other languages, there may be too many options for this to be a useful solution.
2. Show word-final sigma until another Greek character typed
This means that we assume a sigma is word-final until another letter is added to the word, and change it to medial sigma at that point. This is best done in post-processing, rather than by adding complexity to the main group:
group(post-process)
'ς' any(greek) > 'σ' context(2)
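The following Python sketch shows what that single post-process rule does as each key is typed. It assumes the s key outputs ς in the main group and that every other key simply outputs the Greek letter given. The GREEK set is a stand-in for the store(greek) the rule refers to.

```python
# Stand-in for store(greek); includes both sigma forms
GREEK = set("αβγδεζηθικλμνξοπρσςτυφχψω")

def type_key(text: str, key: str) -> str:
    out = text + ("ς" if key == "s" else key)  # main group: 's' gives final sigma
    # group(post-process): 'ς' any(greek) > 'σ' context(2)
    if len(out) >= 2 and out[-2] == "ς" and out[-1] in GREEK:
        out = out[:-2] + "σ" + out[-1]
    return out

def type_keys(keys: str) -> str:
    text = ""
    for key in keys:
        text = type_key(text, key)
    return text
```

Typing φωs leaves φως on screen, while sφω changes the sigma to medial as soon as φ arrives.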
3. Show medial sigma until a word ending character typed
The alternative is to output word-final sigma only when an end of word is detected. This has two components: matching end-of-word punctuation, including punctuation specific to your language; and matching special end-of-word keys, specifically Space, Enter and Tab.
group(main) using keys
c and other normal rules
+ any(punctuation) > index(punctuation, 1)
+ any(word-ending-key) > deadkey(word-ending)
match > use(post-process)
You'll note that the word-ending-key rule adds a deadkey to the output. This deadkey is really used just as a flag to tell the post-process group that we need to do additional special processing on this. This is a common technique in Keyman Desktop keyboards. We also want to make sure we use the post-process group because there may be other rules that need to fire, e.g. automatic breathing marks may be added in Greek.
Similarly, the punctuation rule may appear unnecessary. However, if you do not include the rule, the match rule will not fire, and therefore the post-process group will not run. You may also have language-specific punctuation that actually needs conversion.
However, where I'd like to focus your attention is on the final group. This is where the magic happens with end-of-word keys. One may be tempted to try and just push out the virtual key as output in a special rule in the main group, e.g.
'σ' + [K_RETURN] > 'ς'[K_RETURN] c don't use this!
But while this virtual key output technique does kinda work, it is not a supported feature of Keyman Desktop, and definitely won't work in KeymanWeb. So how do you get that keystroke to the application?
The answer is that Keyman and KeymanWeb have special processing for unmatched keystrokes in a "using keys" group: they are always sent on to the application. For technical reasons that I won't get into here, this is much easier than synthesising arbitrary keystrokes, as would be required with virtual key output. To take advantage of this behaviour, we just have to make sure the rule processing ends in a "using keys" group that does not match the desired key.
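Here is the whole of approach 3 as a small Python model. The key names, the three-letter key map and the punctuation set are assumptions for illustration. A boolean flag stands in for deadkey(word-ending). The last step models a final "using keys" group that does not match Space, Enter or Tab, so the original keystroke reaches the application unchanged.

```python
GREEK_KEYS = {"s": "σ", "f": "φ", "w": "ω"}               # hypothetical key map
PUNCTUATION = {".", ",", ";"}                             # stand-in for store(punctuation)
WORD_ENDING_KEYS = {"SPACE": " ", "ENTER": "\n", "TAB": "\t"}

def type_key(text: str, key: str) -> str:
    word_ending = False                # stands in for deadkey(word-ending)
    # group(main) using keys
    if key in GREEK_KEYS:
        out = text + GREEK_KEYS[key]
    elif key in PUNCTUATION:
        out = text + key               # + any(punctuation) > index(punctuation, 1)
    elif key in WORD_ENDING_KEYS:
        out, word_ending = text, True  # + any(word-ending-key) > deadkey(word-ending)
    else:
        return text + key              # no rule matched: key goes straight to the app
    # group(post-process), reached via match > use(post-process)
    if word_ending and out.endswith("σ"):
        out = out[:-1] + "ς"
    elif key in PUNCTUATION and out[-2:-1] == "σ":
        out = out[:-2] + "ς" + key
    # final "using keys" group: nothing matches Space/Enter/Tab,
    # so the keystroke itself is passed on to the application
    if word_ending:
        out += WORD_ENDING_KEYS[key]
    return out

def type_keys(keys):
    text = ""
    for key in keys:
        text = type_key(text, key)
    return text
```

Typing f, w, s, Space gives "φως ". A key with no rule at all, such as x, leaves the medial σ alone, as the design intends.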
This technique is powerful but has a few pitfalls that it helps to be aware of:
There is a temptation to do the conversion on navigation keys as well as word-ending keys. This makes some sense, as a user may type a word, navigate to another part of the document, and be left with an incorrect medial sigma. However, this is dangerous because it makes editing a medial sigma very difficult! It is best to restrict the conversion to punctuation and the word-ending keys Space, Enter and Tab.
The nomatch rule is also one to be careful with in "using keys" groups: it only applies if the keystroke would generate a "normal" character. Therefore Space or "a" would trigger a nomatch if no rule matched, but Enter or Tab would not (these keys may generate control characters, but not normal printing characters).
You should also consider which punctuation should trigger end-of-word processing: for instance, hyphen (-) may not, while double hyphen (--) may! This is fairly easy to handle in the post-process group (please note, this is hypothetical and may actually not apply to Greek!):
group(post-process)
c Don't do this, as a hyphen may be valid in the middle of a word:
c 'σ-' > 'ς-'
'σ--' > 'ς--'

c or you may opt to do some nice 'autocorrect':
c 'σ--' > 'ς—'
c '--' > '—'
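This Python sketch shows the effect of those rules as each character is typed, with and without the optional autocorrect. It checks the longer σ-- pattern before the bare -- pattern so that, with autocorrect on, the sigma is converted as well as the dashes. The function names and the autocorrect switch are hypothetical.

```python
def post_process(text: str, autocorrect: bool = False) -> str:
    # 'σ-' > 'ς-' is deliberately absent: a single hyphen may occur mid-word
    if text.endswith("σ--"):
        # 'σ--' > 'ς--', or 'ς' + U+2014 EM DASH when autocorrecting
        return text[:-3] + ("ς\u2014" if autocorrect else "ς--")
    if autocorrect and text.endswith("--"):
        return text[:-2] + "\u2014"  # '--' becomes U+2014
    return text

def type_chars(chars: str, autocorrect: bool = False) -> str:
    text = ""
    for ch in chars:  # post-process runs after every keystroke
        text = post_process(text + ch, autocorrect)
    return text
```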
Conclusion
This simple example demonstrates some of the techniques and tricks that you can use to make your keyboard layout work most effectively. The constraints/main/post-process structure has proven to be a powerful and easy to maintain template for many different languages. The final group design pattern allows you to safely handle end-of-word scenarios. And, as always, you need to think about how your keyboard will be used. Your users will be typing away with your keyboard all day: a solid design will be much appreciated by your keyboard users.
A few days ago I was assisting a Tamil customer with a Unicode keyboard they had designed which used visual input order. Visual input order means that vowels such as TAMIL VOWEL SIGN E, U+0BC6 ெ are typed before the consonant with which they combine, even though they are stored after the consonant as per the Unicode standard. Our customer was running into a problem where U+0BC6 ெ was combining with the wrong consonant in a run. In this post I'll discuss some of the possible solutions and the potential issues with those solutions, and finish with the solution that we proposed and that the customer chose to use. These solutions apply to any Indic script, but we will use Tamil as an example.
The basic keyboard layout is shown below. For this discussion, I've only populated 3 keys – E, A and K. So you won't be seeing real words in this example – the examples have been chosen as the simplest way to illustrate rendering complexities, and not as valid Tamil text.
The Keyman source file looks like this:
store(&VERSION) '8.0'
store(&NAME) 'My First Tamil Keyboard'
store(&MESSAGE) 'Demonstrating Visual Input Order'
begin Unicode > use(main)
group(main) using keys
+ 'a' > U+0BBE
+ 'e' > U+0BC6
+ 'k' > U+0B95
Now with this visual input order keyboard, the Tamil vowel U+0BC6 ெ is stored after the consonant in the document but typed before it. The keyboard as it stands won't do that:
Typed | Expected Display | Actual Display | Text Stored
e | ெ | ெ | U+0BC6
ek | கெ | ெக | U+0BC6 U+0B95
The initial solution was to add a rule to reorder these:
U+0BC6 + 'k' > U+0B95 U+0BC6
This fixes the initial issue but introduces the Travelling Vowel Problem – a vowel that just won't stay where it is put:
Typed | Expected Display | Actual Display | Text Stored
e | ெ | ெ | U+0BC6
ek | கெ | கெ | U+0B95 U+0BC6
ekk | கெக | ககெ | U+0B95 U+0B95 U+0BC6
ekkk | கெகக | கககெ | U+0B95 U+0B95 U+0B95 U+0BC6
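The Travelling Vowel Problem is easy to reproduce outside Keyman. This Python sketch models the three-key keyboard plus the single reordering rule. Each keystroke is applied to the stored text the way the rule would be, and code_points prints the text in the same U+XXXX form as the table. The helper names are hypothetical.

```python
KEYS = {"a": "\u0BBE", "e": "\u0BC6", "k": "\u0B95"}
VOWEL_E, KA = "\u0BC6", "\u0B95"

def type_key(text: str, key: str) -> str:
    # U+0BC6 + 'k' > U+0B95 U+0BC6
    if key == "k" and text.endswith(VOWEL_E):
        return text[:-1] + KA + VOWEL_E
    return text + KEYS[key]  # + 'a' / + 'e' / + 'k'

def type_keys(keys: str) -> str:
    text = ""
    for key in keys:
        text = type_key(text, key)
    return text

def code_points(text: str) -> str:
    return " ".join(f"U+{ord(ch):04X}" for ch in text)
```

Because the vowel is still at the end of the context after every k, the rule fires again and again. As a result, code_points(type_keys("ekkk")) gives U+0B95 U+0B95 U+0B95 U+0BC6.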
The problem here is how to tell that the U+0BC6 ெ has already been combined with a consonant to prevent it moving further down the text store. The solution initially chosen by our customer involved using U+200C ZWNJ to stop the vowel U+0BC6 ெ from moving along to the next consonant:
U+0BC6 + 'k' > U+0B95 U+0BC6 U+200C
This simple change stops the rule from matching repeatedly, because U+0BC6 ெ is no longer at the end of the context. But does that solve the problem completely?
Typed | Expected Display | Actual Display | Text Stored
e | ெ | ெ | U+0BC6
ek | கெ | கெ | U+0B95 U+0BC6 U+200C
ekk | கெக | கெக | U+0B95 U+0BC6 U+200C U+0B95
Okay, so this seemed to display just fine, but behind the scenes we now had an extra U+200C ZWNJ in the text store, which is certainly not ideal. Our customer noticed this when one application rendered U+200C ZWNJ as a space rather than as zero width.
So what if we used Keyman's deadkey functionality to not actually store a character in the text, but still flag that the vowel has been combined?
U+0BC6 + 'k' > U+0B95 U+0BC6 deadkey(combined)
Typed | Expected Display | Actual Display | Text Stored
e | ெ | ெ | U+0BC6
ek | கெ | கெ | U+0B95 U+0BC6 (dk)
ekk | கெக | கெக | U+0B95 U+0BC6 (dk) U+0B95
Success! Or is it? What happens when we type the following?
Typed | Expected Display | Actual Display | Text Stored
k | க | க | U+0B95
ke | கெ | கெ | U+0B95 U+0BC6
kek | ககெ | ககெ | U+0B95 U+0B95 U+0BC6 (dk)
kekk | ககெக | ககெக | U+0B95 U+0B95 U+0BC6 (dk) U+0B95
Hey! We don't want to combine with that consonant – this is a visual order keyboard! This could be called the Overenthusiastic Vowel Combining Problem. However, the text is stored correctly.
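The stored text in both deadkey tables can be reproduced with a small Python model. Here the context is a list, and a unique sentinel object stands in for deadkey(combined). The sentinel is part of the context but is never part of the document text. The model also shows why the problem here is purely one of rendering. After ke, the document text is U+0B95 U+0BC6, and any renderer will draw that as one combined syllable. The helper names are hypothetical.

```python
VOWEL_E, KA = "\u0BC6", "\u0B95"
KEYS = {"a": "\u0BBE", "e": VOWEL_E, "k": KA}
DK = object()  # stand-in for deadkey(combined): in the context, never displayed

def type_key(context: list, key: str) -> list:
    # U+0BC6 + 'k' > U+0B95 U+0BC6 deadkey(combined)
    if key == "k" and context and context[-1] == VOWEL_E:
        return context[:-1] + [KA, VOWEL_E, DK]
    return context + [KEYS[key]]

def type_keys(keys: str) -> list:
    context = []
    for key in keys:
        context = type_key(context, key)
    return context

def stored(context: list) -> str:
    return " ".join("(dk)" if item is DK else f"U+{ord(item):04X}" for item in context)

def document_text(context: list) -> str:
    return "".join(item for item in context if item is not DK)
```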
So that's a rendering issue again, and we can't solve it with a deadkey statement. It looks like our customer was on the right track after all. The key to solving this is to remember that the uncombined vowel is an intermediate state. We can temporarily add a U+200B ZWSP before this vowel to stop it combining with the consonant, knowing that we can delete the U+200B ZWSP as soon as the combining consonant is typed. This takes a change to two rules in the keyboard:
+ 'e' > U+200B U+0BC6
U+200B U+0BC6 + 'k' > U+0B95 U+0BC6
I chose U+200B ZWSP because it does not have any other shaping behaviour. Now when we type our test sequences, we get the following:
Typed | Expected Display | Actual Display | Text Stored
k | க | க | U+0B95
ke | கெ | கெ | U+0B95 U+200B U+0BC6
kek | ககெ | ககெ | U+0B95 U+0B95 U+0BC6
kekk | ககெக | ககெக | U+0B95 U+0B95 U+0BC6 U+0B95
In the ke row of that table, note the U+200B ZWSP character that is stored temporarily to prevent the U+0BC6 ெ from combining with the previous character. Notice that the U+200B ZWSP gets deleted in the next step. This simple pattern solves both the Overenthusiastic Vowel Combining Problem and the Travelling Vowel Problem.
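To check the ZWSP pattern against both test sequences, here is a Python sketch of the keyboard. It assumes the two changed rules are + 'e' > U+200B U+0BC6 and U+200B U+0BC6 + 'k' > U+0B95 U+0BC6. These rules are inferred from the stored text in the table above, and the helper names are hypothetical.

```python
ZWSP, VOWEL_E, KA = "\u200B", "\u0BC6", "\u0B95"
# Assumed rules:  + 'e' > U+200B U+0BC6
#                 U+200B U+0BC6 + 'k' > U+0B95 U+0BC6
KEYS = {"a": "\u0BBE", "e": ZWSP + VOWEL_E, "k": KA}

def type_key(text: str, key: str) -> str:
    if key == "k" and text.endswith(ZWSP + VOWEL_E):
        return text[:-2] + KA + VOWEL_E  # drop the ZWSP, put the vowel after the consonant
    return text + KEYS[key]

def type_keys(keys: str) -> str:
    text = ""
    for key in keys:
        text = type_key(text, key)
    return text

def code_points(text: str) -> str:
    return " ".join(f"U+{ord(ch):04X}" for ch in text)
```

The ZWSP only ever sits in front of an uncombined vowel. Once the vowel has combined, the rule no longer matches, so the vowel cannot travel. This is why ekkk gives U+0B95 U+0BC6 U+0B95 U+0B95.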
Just for fun, I'll add one final rule to handle the TAMIL VOWEL SIGN O, U+0BCA ொ. This is a combination of the U+0BC6 ெ and U+0BBE ா vowels, and is rendered on both sides of the consonant it attaches to. This ends up being a single, simple rule:
Using these design patterns, you can create visual input order keyboards for any of the Indic scripts, and the same principles carry over to phonetic input methods. Judicious use of the any, index and store statements will also make light work of handling all the possible combinations. Other considerations that I have not covered here include visual order backspacing and prevention of illegal combinations such as U+0B95 U+0BBE U+0BBE காா.
I have just updated the documentation around the Keyman Developer CRM - specifically, details on how to manage your customer licence records and create new customer licences for your custom Keyman-based products. See http://tavultesoft.com/keymandev/documentation/70/tutorial_crm.html for details!