Oh wow! I’m surprised to see someone post my analysis haha
Happy to answer any questions here. I kept my analysis really high level for a general audience but since this is HN, we can get a bit nerdy :D
What makes comprehensible input comprehensible? Is that a trick question?
Avoiding unknown vocabulary, or including just a small amount that can be inferred from context; avoiding rare grammatical rules; avoiding stuffing too many clauses into sentences, keeping them short.
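Since this is HN, here is a rough way to quantify the first and third points. This is a minimal sketch. It assumes a hypothetical known-word list and naive regex tokenization, which works for English; Japanese would need a proper morphological analyzer. It reports what fraction of tokens the learner already knows and the average sentence length in words. The names `KNOWN_WORDS` and `comprehensibility_stats` are made up for illustration. This is not necessarily how the original analysis measured anything.

```python
import re

# Hypothetical learner vocabulary (an assumption for illustration); in practice
# this would come from a frequency list or the learner's own known-word list.
KNOWN_WORDS = {"i", "have", "a", "pen", "can", "borrow", "your"}

def split_sentences(text):
    # Naive split on runs of . ! ? and drop empty fragments.
    return [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]

def tokenize(sentence):
    # Lowercased Latin-letter tokens only; not suitable for Japanese or Chinese.
    return re.findall(r"[a-z']+", sentence.lower())

def comprehensibility_stats(text, known=KNOWN_WORDS):
    sentences = split_sentences(text)
    tokens = [t for s in sentences for t in tokenize(s)]
    if not tokens:
        return {"coverage": 0.0, "avg_sentence_len": 0.0, "unknown": []}
    known_count = sum(1 for t in tokens if t in known)
    return {
        # Share of running words the learner already knows.
        "coverage": known_count / len(tokens),
        # Average number of tokens per sentence, a crude proxy for clause stuffing.
        "avg_sentence_len": len(tokens) / len(sentences),
        # Distinct words the learner would have to infer from context.
        "unknown": sorted({t for t in tokens if t not in known}),
    }

print(comprehensibility_stats("I have a pen. Can I borrow your pen?"))
print(comprehensibility_stats("I have a pen. Popper critiqued logical positivism."))
```

The first sentence pair is fully covered by the toy vocabulary. The second drops to half coverage because of the philosophy words. Coverage only captures vocabulary, so it says nothing about grammar, idioms, or topic difficulty.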
Just as a language has a large vocabulary of which only a subset is common, the same holds for grammar rules: some are used only in very formal or erudite speech and writing. And just as your active vocabulary is smaller than the vocabulary you understand, you actively use fewer grammatical constructs than you recognize.
Semantically, avoiding obscure cultural references and culturally rooted, non-literal metaphors, figures of speech, or idioms.
Avoiding difficult topics, e.g. "I have a pen" vs. explaining Karl Popper's critique of logical positivism.
It's much easier to acquire the "household" dialect of a language than to be able to understand news about politics, scientific papers, or literary essays.
> Word length - At least in English and French (the languages I know best), longer words are generally considered harder.
I think in a language with many similar sounds or even homophones, longer words are easier. For a beginner Chinese learner who knows both words, hearing "chē" (car) will probably be ambiguous, but "chūzūchē" (taxi) will be parsed immediately.
This is mainly resolved by context. "Penultimate" is a harder word than "pen". "Pen" could also mean a penitentiary in North American slang, or an enclosure in which a pig is kept, but not in a sentence like "Can I borrow your pen?"
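To make the homophone argument concrete, here is a minimal sketch. It assumes a beginner reliably hears the syllables but not the tones. It counts how many words in a lexicon match what was heard once tone marks are stripped. The four-entry `TOY_LEXICON` is hypothetical; a real check would use a full dictionary such as CC-CEDICT. The function names are made up for illustration.

```python
import unicodedata

# Hypothetical toy lexicon (word -> toned pinyin), for illustration only.
TOY_LEXICON = {
    "车": "chē",          # vehicle, car
    "扯": "chě",          # to pull
    "彻": "chè",          # thorough
    "出租车": "chūzūchē",  # taxi
}

def strip_tones(pinyin):
    # Decompose accented vowels (e.g. "ē" -> "e" + combining macron)
    # and drop the combining marks, leaving toneless pinyin.
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def candidates(heard, lexicon=TOY_LEXICON):
    # Words a tone-deaf listener could not rule out from the sound alone.
    target = strip_tones(heard)
    return sorted(w for w, p in lexicon.items() if strip_tones(p) == target)

for heard in ["chē", "chūzūchē"]:
    print(heard, candidates(heard))
```

In this toy setup the single syllable leaves several candidates, while the three-syllable word matches only one entry. This ignores context entirely, which is usually what disambiguates in real listening.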
That’s a good point.
I don’t think the ‘longer equals harder’ pattern holds for every language. I actually reached out to the head teacher at CIJ when I first made this analysis and she said the same.
I don't think this captures the whole situation. Much of what makes comprehensible input comprehensible, at lower levels, is presence of visual hints.
That's exactly right.
Many of the beginner videos make use of visual hints like you say (images, props, etc.), and none of these were taken into account in my analysis.
I do think it could be cool to do a 'visual' analysis of CI in the future where you attempt to measure how much context is present (or not) in each video and see what insights you could draw from that.
I love this. I made a totally free, just-for-fun tool for learning Japanese via YouTube using the CI approach: https://seikai.tv The trick is finding content that is at the right level but that you also find interesting. Great article, thank you!
Thanks for the kind words!
Here's the source code for this analysis to those interested: https://github.com/joshdavham/cij-analysis
I will note that the transcripts (and parsing scripts) are not included in the repo. The transcripts are not my intellectual property, so I can't share them (and the parsing scripts are a bit of a dumpster fire).