おげんきですか？ That means “How’s your health?” in Japanese; it’s a typical greeting you might use when seeing a friend for the first time in a long while. In English the closest colloquial equivalent might be “Hi, how are you?”. But you don’t really care about that, do you? You care more about the fact that I, Craig Tataryn, a native English speaker with only a few months of Japanese study, can read those “crazy characters” and make any sense of them. It’s true, I assure you! The really crazy thing? It only took me a few weeks to learn how to write and pronounce every single Japanese word. If someone spoke a Japanese word to me, I could write it. If someone wrote a Japanese word, I could pronounce it. This is not because I’m some sort of linguistic savant; it’s because Japanese is actually very concise in both written and spoken form, and logical in how words are constructed. Those are two aspects of language I’m already familiar with from my craft of programming.
What’s the point of this blog post? I just find Japanese interesting, and I think it’s because I’m a programmer. So read on as I draw some parallels between Japanese and programming concepts; at the very least, you too might start to see the logic in the language and be able to decipher those “crazy characters”. がんばって！ (good luck!)
English is not concise. If I said to someone, after they explained a concept to me, “that was very concise”, what I’d probably mean is that they used a lot of $5 words and didn’t explain it in more than one way. They just explained the concept in the most “concise” way possible.
An example: “What is a Set?”
Concise Answer: “A Set is a collection of distinct objects.”
That’s probably the bare minimum number of words you could use to describe what a Set is, making it a concise explanation. However, this type of conciseness is not what I’m referring to when I say the Japanese language is concise. No, what I mean is that Japanese doesn’t, except in a few instances, have zany rules like the ones in English which state that “a character may change its sound depending on the characters adjacent to it”.
Case in point: Parmesan
Let’s just focus on the “P” here. It has the sound “puh” in this word. Now how about “Pharmacy”? All of a sudden our “P” sounds like an “F”! All because there is a rule which states “if a P precedes an H, together they are pronounced as an F”. How about the “san” part? Is it “san” as in “sand”, or “sawn”, or something closer to the French “zahn”? English actually borrowed the word from French, which in turn took it from the Italian “parmigiano”, although how could you ever know that unless you were told, right?
Well, OK, that example’s not so bad, you might say. But it is bad, very bad actually, when you consider there are literally thousands of these rules! This is typical of the “Indo-European languages”, e.g. Spanish, English and French: all languages which “evolved” and grew up in close proximity to one another. They borrowed words from base languages such as Greek and Latin, and then modified a number of linguistic rules. Not only did they borrow words from one another, they also changed the spellings in some cases and left them the same in others. How confusing is that!
Japanese is different, though. Think about it in terms of geography. In Europe, the Germanic, Latin and Greek languages all evolved within “walking distance”, so to speak. Japan, on the other hand, is an island nation. You already know what effect islands have on things evolving: think of all the species of animals native to Australia which exist nowhere else on Earth! The same is true of the Japanese language; it shares no established genealogical relationship with any of its neighbours across the sea. That’s amazing when you consider that China would have been the mecca of wealth, power, religion and thought during Japan’s formative years. Yet the Japanese, in what some may consider “their stubborn ways”, clung to their own distinct language without allowing outside languages to pollute it. In fact, as we’ll see in a bit, the Japanese developed a system to isolate “foreign words”, and the way they did it has a strong analogue in a programming concept we all know and love. But more on that later.
Syllabary vs Alphabet
First off, the Japanese language is not based on an alphabet, but rather on a “syllabary”. What’s that, you say? Well, instead of breaking their words down into units known as letters, the Japanese broke them down into syllables. Does it all of a sudden make sense why the poetic form of haiku originates from Japan?
To construct their syllabary the Japanese took the following base sounds:
| Vowel | Pronunciation |
|---|---|
| a | “ah” as in “law” |
| i | “ee” as in “tee” |
| u | “oo” as in “too” |
| e | “ay” as in “play” |
| o | “oh” as in, well, “oh” 🙂 |
That’s it, guys and gals! There is no more! The only thing left to do is pick some consonants and combine them with the vowels in the table above. So you’ll end up with “ka, ki, ku, ke, ko” and “na, ni, nu, ne, no”. They’ll trip you up a bit with the “t” and “s” variants, because you end up with “ta, chi, tsu, te, to” and “sa, shi, su, se, so”, but honestly, that’s about as complicated as it gets.
This simple syllabary amounts to 9 consonants combined with the 5 vowels. Add in a few special modifiers such as diacritical marks, and you end up with roughly 110 possible sounds. That’s it; that represents *all* the possible sounds you’ll ever hear in the Japanese language. Sound like a lot? Contrast that with English, where we have about, oh… I don’t know… 8000?!?! This is a big reason why I can boast about knowing how to pronounce any Japanese word: there actually aren’t that many allowable sounds in the language. Lord help the poor Japanese ESL student forging their way through our behemoth of a language.
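To make the “consonants times vowels” idea concrete, here’s a minimal Python sketch that builds the basic syllable grid in romaji. The `VOWELS`, `CONSONANTS`, `IRREGULAR` and `MISSING` tables are my own simplification (Hepburn-style spellings, no diacritics, no small-kana combinations), so treat it as an illustration rather than a complete reference.

```python
VOWELS = ["a", "i", "u", "e", "o"]
CONSONANTS = ["k", "s", "t", "n", "h", "m", "y", "r", "w"]

# The spellings that "trip you up" (Hepburn-style romanization)
IRREGULAR = {"si": "shi", "ti": "chi", "tu": "tsu", "hu": "fu"}

# Combinations with no kana in modern standard Japanese
MISSING = {"yi", "ye", "wi", "wu", "we"}

def build_syllabary() -> list[str]:
    """Return the basic syllables as romaji: vowels, consonant+vowel, then n."""
    syllables = list(VOWELS)  # the bare vowel row: a, i, u, e, o
    for c in CONSONANTS:
        for v in VOWELS:
            combo = c + v
            if combo in MISSING:
                continue
            syllables.append(IRREGULAR.get(combo, combo))
    syllables.append("n")  # the lone "n" sound, written ん
    return syllables

table = build_syllabary()
print(len(table))    # 46
print(table[5:10])   # ['ka', 'ki', 'ku', 'ke', 'ko']
```

It lands on 46, the usual count of basic kana; the diacritic and small-kana variants are what push the total up toward the roughly 110 sounds mentioned above.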
At some point in their history, the Japanese codified these base syllables into the syllabary known as “Hiragana”.
の is always “no” (and never “know”)
ねこ (cat) is pronounced “neh-ko”. ne (ね) and ko (こ) *always* sound like that. It doesn’t matter which characters are adjacent to them; they’ll always sound like “ne” and “ko”. Likewise, if you want to write the sound “no”, it’s always written as の, regardless of where it’s placed or what word it’s in. Think of all the times in English when the pronunciation vs written form of a word is ambiguous without the context in which you are using the word… know/no, red/read, wear/where/ware, right/write/rite. I can go on like this for sum thyme 😉
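The difference between the two writing systems can be sketched as two tiny Python functions. Reading kana is a context-free lookup, one character at a time, while even the single English “P” rule from earlier has to peek at the next letter. The `KANA` table is a hypothetical five-entry subset, and `english_p_sound` models only the P/PH rule, nothing more.

```python
# Hypothetical mini-table; a full one would cover every kana.
KANA = {"ね": "ne", "こ": "ko", "の": "no", "き": "ki", "う": "u"}

def read_kana(word: str) -> str:
    """Romanize kana one character at a time: no lookahead, no context."""
    return "".join(KANA[ch] for ch in word)

def english_p_sound(word: str, i: int) -> str:
    """English needs lookahead: the letter after 'p' decides its sound."""
    return "f" if word[i + 1 : i + 2] == "h" else "p"

print(read_kana("ねこ"))                # neko
print(english_p_sound("parmesan", 0))  # p
print(english_p_sound("pharmacy", 0))  # f
```

The small-kana combinations described next are the main place where a Japanese reader also has to glance at a neighbouring character.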
There are a few instances, however, when characters can have their sounds modified, albeit only somewhat, based on the character that follows them. Luckily, a visual cue tells you when this rule is in effect. The rule goes like this: the second character affects the sound of the character preceding it, and this second character is always written smaller, about 2/3rds as tall. Here’s an example: the sound “Kyū”. If you reference the Hiragana table above, you’ll quickly see there is no character which represents the sound “K” on its own. There are only “ka, ki, ku, ke, ko”, known as the K-sounds. There is, however, a character which represents “yu”: ゆ. So how do we form the word Kyū (which on its own means 9) such that it’s writable in Japanese? Here’s how: きゅう. We take the character for “ki”, き, and append a smaller version of “yu”, ゅ, then follow up with a う (“oo” as in “too”) to complete the prolonged ū sound.
The good news is, this modification isn’t *that* far off from what you’d get if you pronounced the syllables ki, yu and u quickly together anyway. The small-yu variant is just a mechanism to represent what happens naturally when certain syllables “slide” off the tongue in succession.
To the untrained eye, the subtle difference in the height of the modifier character ゅ might not be obvious, especially when you’re looking at a font on a screen rather than a handwritten representation, where the difference is more pronounced. A native Japanese speaker who knows the word, however, reads きゅ as “kyu” (English: cue) even without a clear distinction in heights; the difference in height is just a nicety the writing system builds in for the rest of us. These syntax rules make for a very concise way in which words are constructed, much like a programming language. Could you imagine if your programming language had all the “gotchas” of English? The compiler would need to run on a quantum computer just to parse the code!
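Here’s a minimal Python sketch of the small-kana rule (the combined sounds are called yōon), assuming only the handful of characters it needs. It leans on a convenient Unicode detail: the small ゃ, ゅ and ょ each sit exactly one code point below their full-size forms や, ゆ and よ. The `I_ROW` table and the `small` and `yoon` helpers are hypothetical names for illustration; the romaji side follows Hepburn spelling, where “sh” and “ch” absorb the “y” (“shu”, not “shyu”).

```python
# Hypothetical mini-tables: only the kana this demo needs.
I_ROW = {"ki": "き", "shi": "し", "chi": "ち", "ni": "に"}
FULL_Y = {"ya": "や", "yu": "ゆ", "yo": "よ"}

def small(kana: str) -> str:
    """In Unicode, small ゃ/ゅ/ょ sit one code point below full-size や/ゆ/よ."""
    return chr(ord(kana) - 1)

def yoon(i_syl: str, y_syl: str) -> tuple[str, str]:
    """Combine an i-row syllable with a small ya/yu/yo; return (kana, romaji)."""
    kana = I_ROW[i_syl] + small(FULL_Y[y_syl])
    stem = i_syl[:-1]  # drop the trailing "i": "ki" -> "k", "shi" -> "sh"
    # Hepburn: after "sh"/"ch" the "y" is absorbed ("shu", not "shyu")
    romaji = stem + (y_syl[1:] if stem.endswith("h") else y_syl)
    return kana, romaji

kana, romaji = yoon("ki", "yu")
print(kana + "う", romaji)  # きゅう kyu
print(yoon("shi", "yo"))    # ('しょ', 'sho')
```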
Do you remember when I said that the Indo-European languages often borrowed words from one another (Parmesan), changing the spelling and pronunciation of some of these borrowed words but not others? THIS NEVER HAPPENS IN JAPANESE! That’s not to say the Japanese won’t allow a foreign word into their vocabulary; quite the contrary. It’s just that they follow a strict process every time they adopt a non-Japanese word.
The Japanese set aside a separate character set, or what we programmers might call a “namespace”, for these words, and they map the syllables of the foreign word to the closest matching sounds in the Japanese language. So if a Japanese speaker wants to say the word “Computer”, they map the syllables “Com” “pu” “ter” to the closest-sounding Japanese equivalents “Kon” “pyū” “ta”. Then, instead of using the native Japanese character set of Hiragana, they use Katakana, the character set (aka namespace) used to codify foreign words. This means instead of writing “Konpyūta” in Hiragana as “こんぴゅうた”, they write it as “コンピュータ”.
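Because the two kana sets cover the same sounds, moving a word from one “namespace” to the other can be sketched as a fixed code-point shift: in Unicode, the Hiragana block (starting at U+3040) and the Katakana block (starting at U+30A0) are laid out in parallel, 0x60 apart. This Python sketch only handles that one-to-one mapping. It doesn’t apply the Katakana convention of writing long vowels with the bar ー, which is why it produces コンピュウタ rather than the usual spelling コンピュータ.

```python
HIRAGANA_START, HIRAGANA_END = 0x3041, 0x3096  # ぁ .. ゖ
OFFSET = 0x60  # distance between the blocks, e.g. ア (U+30A2) - あ (U+3042)

def to_katakana(text: str) -> str:
    """Move each Hiragana code point into the Katakana 'namespace'."""
    return "".join(
        chr(ord(ch) + OFFSET) if HIRAGANA_START <= ord(ch) <= HIRAGANA_END else ch
        for ch in text
    )

print(to_katakana("ねこ"))         # ネコ
print(to_katakana("こんぴゅうた"))  # コンピュウタ
```

Non-Hiragana characters pass through untouched, so the function is safe to run over mixed text.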
This separate namespace provides all the visual cues one needs to tell that the word they just read was “imported” into the Japanese vernacular. Essentially, they import a 3rd-party library when they need functionality from outside their code base 😉
Hiragana, Katakana: wow, two character sets to represent the same sounds in different contexts. Surely that’s enough, right? Wrong! Japanese has yet another character set, and it’s a behemoth! Weighing in at around 2,000 officially accepted characters (the Jōyō Kanji list) is what’s known as the Kanji set. So you may be asking yourself, “If Hiragana is for Japanese words and Katakana is for foreign words, what on earth could Kanji be?”. Well, in a word, Kanji are macros for Hiragana.
Kanji provide a means to compile multiple Hiragana into just one written character. Or, stated differently, the Kanji is the macro, and your pronunciation of the Kanji is the expanded form. Without them, Japanese sentences can get pretty long when written purely in Hiragana! Now, what are the Kanji based on? The Kanji actually represent the first written form of Japanese. They came before Hiragana/Katakana (which were themselves later derived from simplified Kanji) and are composed of characters imported from Chinese. You see, when the Chinese first started trading with the Japanese, Japan had a formal spoken language but not a written one. Instead of inventing their own set of characters, which isn’t too useful if you are trying to trade with someone who doesn’t understand your language, the Japanese selected the Chinese characters they needed in order to carry on a conversation. The characters were lumped into two categories:
- Chinese characters representing an existing word in Japanese, for example cat 猫 (māo in modern Mandarin), would be imported into the Kanji set and their pronunciation changed to match the equivalent Japanese word. In this case, cat is “neko” in Japanese. Essentially, the Japanese were compiling the Kanji down to the base assembly language they natively understood.
- Chinese characters for which there were no Japanese words available at the time. For these characters, the Japanese asked “how do you pronounce this in Chinese?” and then did a linguistic mapping exercise, similar to how they later mapped other foreign words to Katakana. They simply found the Japanese syllables that most closely matched the Chinese pronunciation of the character, and that is how they would pronounce it. This had two effects: first, the Chinese character was now pronounceable by a Japanese speaker because it used only syllables familiar from Japanese words; and second, it increased the Japanese vocabulary by adding new words not previously known.
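The “Kanji are macros” idea can be sketched as a lookup-and-expand pass over a string. This is a toy: `KANJI_MACROS` is a hypothetical three-entry table with one native reading per character, whereas real Kanji often carry both a native reading and a Chinese-derived one (the two categories above), chosen by context.

```python
# Hypothetical toy macro table: one native (Japanese) reading per Kanji.
# Real Kanji often have several readings, chosen by the surrounding word.
KANJI_MACROS = {
    "猫": "ねこ",  # cat
    "雨": "あめ",  # rain
    "雪": "ゆき",  # snow
}

def expand(text: str) -> str:
    """Expand each Kanji 'macro' into its Hiragana reading; pass everything else through."""
    return "".join(KANJI_MACROS.get(ch, ch) for ch in text)

print(expand("猫と雨"))  # ねことあめ ("cat and rain")
```

Note how the expanded form is longer than the source, which is exactly why all-Hiragana sentences get long.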
One important difference between the Hiragana/Katakana character sets and Kanji is that Kanji not only represent sound(s), they also represent a meaning. That’s what makes the Chinese character sets so absolutely huge, and that’s why you don’t see me attempting to list them in a table on this page! They literally have a character to represent everything: rain (雨), snow (雪), dog (犬), snake (蛇), law (法).
In contrast, Hiragana/Katakana are more like our own alphabet in English in that we use meaningless characters as building blocks for the meaningful concept of words.
Bringing it all together
It is for all of the preceding reasons that a typical sentence in Japanese, as you might read in a news article, looks like a mixture of simple characters and complex ones. What you are seeing is simple Hiragana/Katakana intermixed with the more visually complex Kanji characters imported from China. For instance, this random sentence from Yahoo! Japan:
Can you now distinguish the Kanji from the Hiragana and Katakana?
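If you’d like a program to do the distinguishing for you, the three scripts live in separate Unicode blocks, so a rough classifier only needs to check code-point ranges. The sentence below is one I made up (猫はコンピュータが好きです, “the cat likes computers”), and the ranges cover only the main blocks: Hiragana U+3040 to U+309F, Katakana U+30A0 to U+30FF, and the CJK Unified Ideographs U+4E00 to U+9FFF. Rarer Kanji and marks such as 々 would fall into “other”.

```python
from itertools import groupby

def script_of(ch: str) -> str:
    """Classify one character by the Unicode block it falls in."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"  # includes the long-vowel bar ー (U+30FC)
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"     # main CJK Unified Ideographs block only
    return "other"

def segment(sentence: str) -> list[tuple[str, str]]:
    """Split a sentence into consecutive runs of the same script."""
    return [(script, "".join(run)) for script, run in groupby(sentence, key=script_of)]

print(segment("猫はコンピュータが好きです"))
# [('kanji', '猫'), ('hiragana', 'は'), ('katakana', 'コンピュータ'),
#  ('hiragana', 'が'), ('kanji', '好'), ('hiragana', 'きです')]
```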
So that’s that…
Hey, if you’ve read this far, thank you so much for indulging me. Sometimes I just like to blog about things I’m interested in, and in this case I find it interesting to think that perhaps Japanese appeals to me because, at its core, the language exhibits the same logic and conciseness as a programming language I might love to use. I hope this has piqued at least a few readers’ interest.
P.S. I couldn’t help but draw some parallels between this post and this funny article by The Onion…