Analysis of Harry Potter in German
This is a short analysis of the German edition of Harry Potter and the Philosopher's Stone, for language learners.
A common technique for language learners is mass exposure, where you immerse yourself in a language through every medium available. The most accessible, and perhaps the most beneficial when beginning a language, is reading. Ebook readers make books in any language easy to get hold of, and have the added benefit of onboard dictionaries, where a translation is a finger gesture away. I've found it a helpful technique myself, and I believe it's popular amongst polyglots (at least the one or two I follow). Once you break the numbers down, it starts to become clear why this is an effective learning technique.
How soon can I start reading in a target language?
Sooner than you may think. In Harry Potter und der Stein der Weisen, 100 words cover 50% of the content! That is astounding given that the German text runs to 82,390 words. Of those 82,390, there are 7,364 unique words, which at first glance is an intimidating amount of vocabulary to acquire in any language. But knowing that 100 words cover half the content, it starts to appear quite accessible.
Knowing 400 words gets you through 70% of the content, and 2,000 words gets you through 90%. From this point on you have to acquire many, many more words to increase your comprehension level. That said, once you understand 2,000 words, or 90% of the content, word acquisition becomes less of an effort, because as your understanding of the context expands so does your ability to understand new words.
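If you want to reproduce this kind of coverage figure for a book of your own, here is a minimal sketch in Python. It is not the script used for this article: the function names (`tokenize`, `coverage_curve`, `words_needed`) and the sample sentence are made up for illustration, and it counts words exactly as written, whereas the list further down appears to count word stems.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and return runs of letters (umlauts and ß included)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def coverage_curve(tokens):
    """Cumulative share of the text covered, most frequent word first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    running = 0
    curve = []
    for _, n in counts.most_common():
        running += n
        curve.append(running / total)
    return curve

def words_needed(tokens, target):
    """Smallest number of top-ranked words that cover `target` of the text."""
    for rank, covered in enumerate(coverage_curve(tokens), start=1):
        if covered >= target:
            return rank
    return len(set(tokens))

# Made-up sample sentence, not taken from the book.
tokens = tokenize("der Junge und der Hund und der Ball")
print(words_needed(tokens, 0.5))  # prints 2 ("der" and "und" cover 5 of 8 words)
```

Run over a whole book, `words_needed(tokens, 0.9)` would give the vocabulary size needed for 90% coverage under these assumptions.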
Comprehension relative to known words
In the above chart you can see that basic comprehension is acquired rapidly, with the gains trailing off just as rapidly at around the 1,000-word mark.
100 words make up more than 50% of the content.
3727 words appear only once, and make up less than 5% of the content.
Just over half of the unique words used (3,727 of 7,364 in our example) are hapaxes, a hapax being a word that appears only once in an author's work. So the entry level to basic reading comprehension is quite low. Now, in this context most of the top 100 words are articles, names or linking words. In most cases when analysing natural language you would strip these types of words out, e.g. the, they, them, it, a, he, she. But here they have been left in, as they are all relevant when learning.
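Counting hapaxes is simple enough to sketch. The helper below, `hapax_stats`, is a hypothetical name and not part of the original analysis. It takes a list of already-tokenized words and reports the hapaxes along with their share of the vocabulary and of the running text. The last two lines redo the arithmetic with the figures quoted in this article.

```python
from collections import Counter

def hapax_stats(tokens):
    """Return the words that occur exactly once, plus their shares."""
    counts = Counter(tokens)
    hapaxes = sorted(word for word, n in counts.items() if n == 1)
    total = sum(counts.values())
    return {
        "hapaxes": hapaxes,
        "share_of_vocabulary": len(hapaxes) / len(counts) if counts else 0.0,
        "share_of_text": len(hapaxes) / total if total else 0.0,
    }

# Made-up sample, not taken from the book.
stats = hapax_stats(["der", "junge", "und", "der", "hund", "und", "der", "ball"])
print(stats["hapaxes"])  # prints ['ball', 'hund', 'junge']

# Figures quoted in the article: 3,727 hapaxes, 7,364 unique words, 82,390 words.
print(round(3727 / 7364, 3))   # prints 0.506, share of the vocabulary
print(round(3727 / 82390, 3))  # prints 0.045, share of the running text
```

So the hapaxes are about half of the vocabulary but only around 4.5% of what you actually read.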
Frequency Distribution of Hogwarts School Houses
Also included are character names, which are an instant win if you're familiar with the story: the names in the top 300 words cover 3% of the content. In the above chart you can see it takes a while before the school houses are mentioned. The first are Slytherin and Hufflepuff, I believe when Malfoy mentions them in Madam Malkin's robe shop. These are also free wins when it comes to reading, helping us derive meaning from ambiguous content.
Reading gives you added context, which makes it easier to derive the implied meaning. Even if you don't fully understand a word or phrase, it is often possible to grasp an idea of the overall scene. Doing this helps solidify known words and exposes you to new ones, expanding your vocabulary and in turn your comprehension level. Apart from literal translations like days of the week, numbers and seasons, most words are metaphorical (as Stephen Fry points out in Fry's English Delight). Reading in context helps paint the picture of these metaphors, where flash cards or other short-phrase learning programs don't impart the greater context. Flash cards and getting the basics down are a good place to start, but reading is the perfect place to level up once you have some basics and perhaps aren't comfortable speaking yet.
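To check how far into a book a name first turns up, something like the following sketch would do. The function name `first_mentions` and the sample sentence are invented for illustration; it assumes the text has already been split into lowercase tokens.

```python
def first_mentions(tokens, names):
    """Map each name to how far into the text (0.0 to 1.0) it first appears.

    Names that never appear map to None.
    """
    wanted = {name.lower() for name in names}
    found = {}
    for index, token in enumerate(tokens):
        if token in wanted and token not in found:
            found[token] = index / len(tokens)
    return {name: found.get(name.lower()) for name in names}

# Made-up sample sentence, not taken from the book.
tokens = "harry traf malfoy und malfoy sprach von slytherin und hufflepuff".split()
houses = ["gryffindor", "hufflepuff", "ravenclaw", "slytherin"]
print(first_mentions(tokens, houses))
```

In this toy sentence Slytherin first appears 70% of the way through and Hufflepuff 90%, while the two houses that are never mentioned come back as `None`.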
Key word density of top 1000 words
Top 100 words in Harry Potter und der Stein der Weisen
- und: 2604
- ein: 1955
- er: 1824
- der: 1734
- die: 1718
- harry: 1422
- sie: 1376
- den: 1061
- nicht: 1058
- zu: 1027
- in: 993
- war: 922
- sich: 905
- ich: 870
- es: 845
- auf: 843
- das: 840
- sein: 730
- sagt: 728
- mit: 693
- von: 685
- hatt: 606
- ihn: 581
- ist: 522
- du: 487
- dem: 477
- dass: 477
- wie: 467
- an: 449
- als: 448
- ihr: 430
- ron: 425
- was: 416
- noch: 394
- hab: 372
- hagrid: 366
- doch: 363
- aus: 349
- um: 348
- so: 326
- ihm: 320
- all: 319
- potter: 314
- seit: 296
- wir: 292
- wenn: 290
- im: 282
- stein: 275
- hat: 272
- hermin: 272
- dies: 253
- vor: 241
- konnt: 231
- kein: 224
- weis: 221
- üb: 219
- aber: 217
- für: 214
- nur: 209
- nach: 209
- sah: 201
- dann: 201
- schon: 190
- etwas: 189
- ganz: 187
- professor: 179
- auch: 176
- snap: 175
- würd: 173
- and: 167
- imm: 164
- da: 164
- mal: 161
- dumbledor: 161
- gross: 155
- wied: 155
- mein: 153
- mir: 152
- durch: 150
- am: 149
- nun: 148
- gut: 147
- ja: 146
- mehr: 143
- hätt: 143
- uns: 143
- mich: 142
- hier: 141
- dir: 127
- jetzt: 124
- zum: 123
- dich: 121
- onkel: 120
- dudley: 119
- sind: 119
- klein: 118
- aug: 118
- vernon: 117
- ging: 116
- dein: 114