Tag Archives: japanese listening

The Rhythm of Japanese: Improve your speaking and hearing

Drama Club begins with Amenbo no Uta (Heartcatch Precure, Ep 16)

The Amenbo no Uta is sometimes recommended as a way to improve one’s Japanese pronunciation. However, it can do more than that. It can also improve one’s ability to hear and comprehend the language.

But what is the Amenbo no Uta? It is a nonsense poem used as a daily warm-up by just about all professional speakers in Japan, from news-readers to voice actors (and very likely politicians too). While it reads like a stream-of-consciousness dream-sequence, there is method in its words, though not narrative method. Its aim is to drill all the sounds and sound-combinations of Japanese.

It runs to a very strict rhythm, and this is particularly important for the Japanese learner.

One of the problems, not only with speaking but also with hearing, is that we post‐process what we hear rapidly and immediately. For first‐language comprehension this is very useful. We are able to hear all kinds of strange accents, mumbled words and distortions and adjust for them, processing them back into the sounds they “ought” to be.

However, when we hear foreign sounds, we process them back to the nearest familiar equivalent, often helped in the case of Japanese by Romaji transliterations, which also represent the nearest English language equivalent. So we hear し as shi, for example (in fact it is neither shi nor si but a sound that does not exist in English). This doesn’t matter too much for comprehension or comprehensibility, though it is good if we can overcome it.

More important is the fact that the English sense of rhythm is radically different from the Japanese, and this does make Japanese very hard to hear. Our brains are attempting to post‐process what we hear into something English‐like that is very different from what we have heard.

That is one great importance of shadowing. It forces us to say, and therefore become aware of, what we are actually hearing, not what our brains want to post‐process it into.

Reciting the Amenbo no Uta can also help with a very important aspect of this: Rhythm. English is a stress‐timed language, while Japanese is mora‐timed. A mora is not actually the same thing as a syllable. This statement sometimes puzzles people, but I believe we can demonstrate here exactly what we mean, using the Amenbo no Uta.

First of all, I would like you to read and listen to the first four verses. They are written in kana. As you may know, every kana corresponds to exactly one mora (except the small versions of the three y-kana, ゃ, ゅ and ょ, which always combine with the preceding i-row kana to form single-mora glides: きゃ, りゅ, ちょ etc.)

Here is the reading, at a relatively slow “training speed”:

あめんぼ あかい な
あ い う え お
うき もに こえびも
およいでる

かき のき くり のき
か き く け こ
きつつき こつ こつ
かれ けや き

ささげ に すを かけ
さ し す せ そ
その うお あさせ で
さしました

たちましょ ラッパ で
た ち つ て と
とて とて たった と
とびたった

I think you can hear the very regular rhythm:

1234 1234

12345

1234 1234

12345

Each mora is a beat of the poem (and you will also find this in most Japanese songs).

So here is why a mora is not a syllable. We have a very good example in the first line:

あかい な akai na (1234). English speakers are prone to pronounce and hear this as a 123, “ak eye na”, because in English the diphthong ai (often spelled “I”) plus any attached consonants forms one syllable, e.g. “time”, “mind”, “sky”.

In Japanese there are no diphthongs. Japanese あい is not the one‐syllable English sound “I”. It is two morae あ and い. It is the same if a consonant is attached:

かい is か+い: two morae.

あかい is あ+か+い: three morae.

This is why it is important to read Japanese in Japanese script.

Going back to the beginning of the line, we are prone to read あめんぼ (1234) as amenbo (123). It doesn’t help that that is how it is written in Romaji, but we would do it anyway because that is how English works. We post‐process what we hear into something we are used to hearing rather than what we actually are hearing.

So already, unless we are attuned to the rhythm of the poem we have 123 123 instead of 1234 1234 for the first line.

あめんぼ is four kana and four morae. a me n bo. ん is always a mora of its own. Japanese people always think of it that way. If a Japanese person says bangohan to you and you don’t catch the word, she will repeat it slowly and carefully:

ba n go ha n (12345)

pronouncing each mora separately.

Fortunately, for much of the poem morae and syllables are the same, so we are able to catch and hold the rhythm easily. But the places where they aren’t are very important. By using this exercise regularly and getting it into one’s blood, one begins to feel the mora‐rhythm of Japanese. Once one has that, the language becomes more hearable. Naturally the Amenbo no Uta should be combined with listening to regular Japanese (your favorite anime is fine, so long as there are no English subtitles).

To round up a few more common difficulties in these first four verses:

We are prone to hear さしました sashimashita (stung) as sashimashta (1234). This is not exactly the fault of Romaji (though the kana tell us how many morae there really are). Most Japanese speakers suppress the second “i” sound almost completely, if not completely. But even if it is completely suppressed it still takes up a mora.

This is another way morae differ from syllables. A mora does not have to contain much in the way of actual sound. It is a beat, whether fully voiced or not.

When saying the Amenbo no Uta you could emphasize this by pronouncing the word sashimashita with the second “i” vowel fully spoken. However, I would advise against this. It is important to practice giving full mora value to morae whose vowel is suppressed.

A very similar consideration applies to the small tsu. The fourth verse uses several of these, and each time they occupy a mora (as they always do).

We may hear たちましょ ラッパ で tachimasho rappa de as 1234 123, but if we do, it is because we are not counting the gap between ラ and パ (marked by ッ) as a beat.
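To make the counting rules concrete, here is a small sketch in Python (my own illustration, not part of the traditional exercise). It encodes exactly the rules above: every kana is one beat, small ゃ/ゅ/ょ merge into the kana before them, and ん and small っ each keep their own beat (a devoiced vowel, as in sashimashita, still occupies its written kana and so still counts).

```python
# Mora counter for kana text, sketching the counting rules above.
# Every kana is one mora, except the small y-kana (ゃゅょ), which
# merge into the preceding kana (きゃ = 1 mora). ん and small っ
# are ordinary kana here and so each keep a full beat.
SMALL_Y = set("ゃゅょャュョ")

def count_morae(kana: str) -> int:
    """Count the morae (beats) in a kana string, ignoring spaces."""
    return sum(1 for ch in kana if ch not in SMALL_Y and not ch.isspace())
```

So count_morae("あめんぼ あかい な") gives 8, the 1234 1234 of the first line, while たちましょ counts as four beats despite its five kana.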

Because I am writing this, it tends to sound very theoretical. In fact it is quite the opposite. We are talking about the rhythm of the language, its very heartbeat. You need to feel this in your blood, not just know about it.

If you can chant Amenbo no Uta every day with its proper rhythm…

1234 1234

12345

1234 1234

12345

…you can safely ignore everything I have written here. You will be getting the language in a more natural way.

Still, I hope you found it of some use.

Recommended: Harmonizing – How to Shadow Japanese


The Full Amenbo no Uta

あめんぼ あかい な
あ い う え お
うき もに こえびも
およいでる

かき のき くり のき
か き く け こ
きつつき こつ こつ
かれ けや き

ささげ に すを かけ
さ し す せ そ
その うお あさせ で
さしました

たちましょ ラッパ で
た ち つ て と
とて とて たった と
とびたった

なめくじ のろ のろ
な に ぬ ね の
なんど に ぬめって
なに ねばる

はと ぽっぽ ほろ ほろ
は ひ ふ へ ほ
ひなた の おへや にゃ
ふえ を ふく

まい まい ねじまき
ま み む め も
うめ の み おちて も
み も しまい

やき ぐり ゆで ぐり
や い ゆ え よ
やまだ に ひ を つく
よい の いえ

らい ちょう さむ かろ
ら り る れ ろ
れんげ が さいたら
るり の とり

わい わい わっしょい
わ い う え を
うえきや いど がえ
おまつり だ


PS ‐ there is one line (only one) where the scansion is not regular. Once you have the feel of it you will get a little “glunk” (to use the technical term) on that line. Presumably it was caused by the exigencies of getting all the sound combinations into one poem. It really is a 力作, or a tour de force as we say in ‐ uh ‐ English.

I will send an invisible winged Dollykiss to the first person to identify the “odd” line in the comments below.

Recommended: Harmonizing – How to Shadow Japanese

Kikitori: Japanese listening – the Dolly Sentences Method – Technical how-to-do-it

I have talked a bit about the dolly sentences method of kikitori or Japanese listening study. Or rather, general Japanese study with an emphasis on listening. Now we are going to look at the technicalities of how it is actually done.

I am using a Mac, and as we will see, there is a certain advantage in that, but for the most part the method will be similar for other operating systems.

First, in your Anki you need to install an addon (Tools > Addons > Browse & install) – one or both of Awesome TTS and Google TTS. Once you have installed it/them, you will see one or two speaker icons added to the top bar of the add card window:

If you click one of these it will give a window like this.

Here you can type or paste the text you want spoken and click the preview button. If you get silence, it is probably because you haven’t changed the language. The language drop-down must be Japanese for Google or Kyoko for OSX’s voice system (unless you are using an Apple device don’t worry about Kyoko).

The text will be read back to you. You may need to make some changes. Sometimes the synthesizer will read kanji incorrectly. Kyoko – while in most respects the best consumer-level voice-synthesizer available – is particularly bad about this. She reads 人形 as ひとかたち, for example – particularly galling to a doll. If that happens just re-spell the word in kana. You may also need to add or delete commas to get the sentence read in a way that is clear and understandable.

When you are happy with the synthesis, just hit OK and the file will appear wherever your cursor was when you opened the box (I have an Audio field on my cards as you can see in the first screenshot).

This is really all there is to adding spoken sentences to Anki. For my method I have the audio file play on both the front and the back of the card.

You can then harvest the sentences to put them on your MP3 player in accordance with the Dolly Sentences Method. It really is easier than you might think. First you need to find them, and they are in your Anki folder in a sub-folder called collection.media. Here it is (you can click the image to enlarge it):

Let’s look at the red rings.

1. (top and bottom) shows you the file-path on a Mac. It will be similar on Windows. Just search “collection.media” if you have trouble finding it.

2. shows you the actual sentences. They are small MP3 files. If you are using Mac OSX’s Kyoko voice, the title of the file is conveniently the sentence itself.

3. If you are using the Google voice synthesizer the title is a lengthy code. But don’t worry because:

4. You can always use the date of the file to show you what you have added recently.

What you will do is simply copy-drag all your recently-added sentences into a folder and add this to your MP3 player. It really is as simple as that.
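If you would rather script that copy-drag step, a minimal sketch in Python might look like the following. The folder names in the usage comment are examples only: the real location of collection.media depends on your operating system and Anki profile name, so adjust both paths to your own setup.

```python
import shutil
import time
from pathlib import Path

def harvest_recent_mp3s(media_dir: Path, dest: Path, days: int = 7) -> list[str]:
    """Copy MP3 files modified within the last `days` days from the Anki
    media folder to a destination folder, returning the copied file names."""
    dest.mkdir(parents=True, exist_ok=True)
    cutoff = time.time() - days * 24 * 3600
    copied = []
    for mp3 in sorted(media_dir.glob("*.mp3")):
        if mp3.stat().st_mtime > cutoff:  # recently added or changed
            shutil.copy2(mp3, dest / mp3.name)
            copied.append(mp3.name)
    return copied

# Example usage (macOS-style path; "User 1" is the default profile name):
# media = Path.home() / "Library/Application Support/Anki2/User 1/collection.media"
# harvest_recent_mp3s(media, Path.home() / "Music/dolly-sentences")
```

You can then point your MP3 player at the destination folder exactly as described above.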


Update: Google TTS no longer supports Anki, but Awesome TTS now gives access to the vastly superior Acapela TTS engine among others. This also has the advantage that you now only need one addon to choose between Acapela and Apple’s Kyoko voice if you have her. You can also now choose human-readable filenames for everything. The instructions in this tutorial still apply.


Here is a sample, complete with recommended 3-second break, to show how the sentences actually sound:

These sentences are spoken by Mac OSX’s Kyoko voice, which, apart from her problems with Kanji reading, is in my view the best Japanese voice synthesizer available. The third sentence is spoken by the Google synthesizer, so you can hear the difference.

I use about 95% Kyoko with the Google alternative for the minority of occasions when Kyoko really won’t read a sentence well (this also mixes up the speech a bit, which I think is good). Google’s synthesizer will be installed automatically when you install the Google TTS addon. If you are using an i-device (iPad, iPhone etc) you should be able to use Kyoko too, though I am not certain about this (please let me know in the comments if you find out).

Kyoko speaks well and naturally for the most part, and actually knows the difference in pitch between many Japanese homophones. For example if you type 奇怪 kikai (strange, mysterious) and 機械 kikai (machine), Kyoko will pronounce each with the correct mora raised, which is what differentiates them in spoken Japanese.

If you end a sentence with a ? Kyoko will raise her tone into a question intonation very naturally. The Google synthesizer does not do this, neither is it aware of tone differences between homophones. On the other hand, type a ! and Kyoko ends the sentence with a funny noise, and she makes far more kanji errors than Google. Neither of these problems really matters (just avoid ! and re-spell mispronounced kanji in kana).

The Apple synthesizer is considerably ahead of Google’s alternative and yet is in some minor respects surprisingly unpolished. But if you have a device that supports her you should definitely use her.

So there you have the technical aspects of the Dolly Kikitori sentences method. If you have any questions or want to share your experiences, please use the comments section below. For the method itself, please go here.

Kikitori – the Dolly Sentences Japanese Listening Method

I was a little hesitant in writing about my kikitori Japanese listening sentences method, because it may be somewhat idiosyncratic. However, it is working well and friends have taken some interest, so I’ll go ahead.

I have read about the 10,000 sentences method of Japanese learning which is recommended by some immersion-inclined sites. Frankly, I could never fully understand it — but then I am just a doll. I fiddled with it for some time and never really got to grips with it.

I did, however, like the idea of learning Japanese in sentences rather than just words. After all, that is how children learn language, and it gives one the feel of what just “sounds right”, rather than merely knowing grammar rules. I am by no means saying one shouldn’t know grammar rules — often one needs to — but I have always argued that grammar is a quick-and-dirty shortcut by which adult learners half-learn a language from the outside rather than actually knowing inside what feels right. Shortcuts can be good. They can even be necessary. But you don’t know a language till you can feel it. You only know about it.

My new assault on the sentences method came about partly as a result of my looking for new ways to improve my kikitori — Japanese listening. I started turning the sentences into digitized speech and putting them in Anki. I would review with my eyes shut and only count myself correct if I got the sentence first time without looking. If I couldn’t, I would give it a second hearing, and if I still couldn’t get it I would open my eyes to see the Japanese text. Only as a last resort do I turn the card over to see the furigana. This rarely happens (after all I can both see and hear the text if I need to open my eyes). The very last resort is to scroll down to where I have (sometimes — when I think I might need it) hidden a translation. This I try not to use and rarely do.

There is a second phase to this method, and that is putting the sentences on an MP3 player. I then play them on a random loop in any spare time (when cooking or walking, for example, and often when resting). I use this a lot, which means I get a lot of exposure to the sentences.

I put a three-second gap between sentences. This is the most my player allows (annoyingly, I don’t think iPods have a means of doing this at all). This gives me a little time to think about a sentence after hearing it, and I think this is important. It is true that in the wild you don’t get any thinking time. But if you are at the stage when you can’t catch much in the wild (in anime or regular non-foreigner-directed conversation), this is what you need in order to get there.

In those three seconds you do certain things and one of them in particular is, I believe, fundamentally important to kikitori or hearing Japanese (or any other language). You correct what you hear. In our familiar language I believe we do this all the time. We hear the word “bubble”, realize that doesn’t fit the sentence we are hearing, and correct it to “double”. We hear the word “wise” and correct it to “wives” (or if we actually don’t understand the context, we don’t — which is why so many people make the blooper “old wise tales” in writing — Google finds over four thousand instances of “old wise tales”).

This common slip (and many others) underlines my point. It really isn’t easy, even in one’s native language, to tell “wise” from “wives”. Ninety-nine percent of the time we understand what the sentence should be and correct mishearings so fast we don’t even know we’ve done it. It is one of the key subliminal skills that makes kikitori — in any language — possible.

With a three-second gap between sentences, we are able to perform this correction-hearing in slow-ish motion, which, at this stage, we need to.

Now, as I have pointed out before, language consists to a very large extent of set phrases and collocations. Words go together in the same groups most of the time. That is a large part of the reason that kikitori is actually possible in any language.

Hearing sentences and auto-correcting (in slow motion at first) lets one go through the same process a child goes through. She hears words together. At first she mispronounces them, and even when she knows what a common word-group means she may not fully understand what the component words are. Slowly it all starts to make sense.

During sentence-listening we think we hear “kaishite”, for example, and realize it must be “taishite”. We also start to get the feel for the fact that in hundreds of similarly-constructed sentences we will hear in our Japanese-language life “kaishite” is actually going to be “taishite”. After a while it won’t even matter, because just as in reading we don’t need to see (and don’t, as studies have shown, even look at) all the letters, so in kikitori we don’t need to hear all the sounds. We get the pattern and fill in or auto-correct the gaps. If we don’t know or fully understand a phrase (such as “old wives’ tales”), even in our first language, we can’t auto-correct and we may go through life hearing it wrongly, as many people in fact do.

The vital point to grasp here is that while our natural, “naive” view of native-language kikitori is that we hear correctly and therefore understand, to a large extent the reverse is true: we understand and therefore hear correctly. Of course both are going on at once, and it is the interplay of the two that makes language-understanding possible.

With one phrase (like old wives’ tales) mishearing doesn’t matter very much. In fact we end up knowing what the phrase means even while consistently mishearing it. But when, in Japanese, we are faced with dozens of phrases that we can’t auto-correct, or can’t auto-correct quickly enough to keep up, then we can’t understand what is being said.

So the three seconds between sentences gives us a kind of middle ground. Doing the sentences in Anki we can ponder the sound at our own pace. In the wild we have almost no time. On the MP3 player we have three seconds to auto-correct anything in the sentence that needs it as well as to muse on the grammar, realize, perhaps on the 30th hearing, “ah, so that’s why…” and so on.

These are all things a child does. Those of us who grow up continually pondering the ins and outs of language are probably more childish than odd. Children have to do a lot of that for the first several years. Some of us just never stop.

There are many important and interlocking benefits to this sound-sentence method. When we learn vocabulary via Anki, we only know the definitions of words — not how they are actually used. Now when I am going through my vocab Anki I am continually stopping with “that one needs a few sentences”. Once I have become familiar with several sentences using the word, I am much clearer on its range of uses and its nuances. I am also much less likely to forget the word.

In doing all this we are going through the process a child goes through. We are learning how words fit together, what they imply, what their near neighbors are likely to be in a sentence. We are also building up a fund of examples in our mind which we will use, sometimes consciously, but often — and this is where language starts to become natural — unconsciously, to compare with new sentences and new uses of the same words in different contexts. You build up your feel for the language. You start to hear what “just sounds right” without necessarily knowing why.

Surprisingly, hearing the sentences can even help with kanji, since one will sometimes in the Three Seconds think「あぁ。それは緊張の緊ですね」(“Ah, that’s the 緊 kin of kinchou, isn’t it”). Because that is part of how Japanese words fit together and mean what they mean.

Currently I am at 1,600 sentences using this method and I am finding it extremely useful, not only for kikitori but for every aspect of Japanese.

The “throw ’em in at the deep end” school may complain about the three seconds’ recognition-time, but I am not suggesting that sentences should be our only listening practice. Full-speed native Japanese materials should definitely be used. But using this method, I think you will find that your ability to process that full-speed Japanese progresses a lot more rapidly.

How do I get spoken sentences in Anki? How do they actually sound? How do I get them to my MP3 player? Find all the answers in our sister article on the technical tricks of the Japanese-listening sentences method. It’s easier than you think – even a doll can do it!