• dapangzi 4 hours ago

    Longtime lurker, made an account specifically to give feedback here as an intermediate speaker. :)

    This is a great initiative and I hope to see more come out of this; I am not criticizing, but just want to provide my user experience here so you have data points.

    In short, my experience lines up with your native speakers.

    I found that it loses track of the phonemes when speaking quickly, and tones don't seem to line up when speaking at normal conversational speed.

    For example, if I say 他是我的朋友 at normal conversational speed, it will assign `de` to 我, and sometimes it decides I didn't have the retroflex in `shi` and renders it `si`. I listened back to make sure I said everything: the phonemes are there in the recording, but the UI displays the wrong phonemes and tones.

    By contrast, if I speak slowly and really push each tone, the phonemes and tones all register correctly.

    Also, is this taking into account tone transformation? Example, third tones (bottom out tone) tend to smoosh into a second tone (rising) when multiple third tones are spoken in a row. Sometimes the first tone influences the next tone slightly, etc.
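
    A rough sketch of the third-tone case described above, in Python. It assumes numbered pinyin input and a deliberately simplified rule: any third tone directly followed by another third tone is read as a second tone. The function name is made up for illustration, and real speech also depends on word and phrase grouping, which this ignores.

```python
def apply_third_tone_sandhi(syllables):
    """Apply a simplified third-tone sandhi rule to numbered pinyin.

    Assumption: a third tone directly followed by another underlying
    third tone becomes a second tone. Prosodic grouping is ignored.
    """
    result = []
    for i, syl in enumerate(syllables):
        # Look at the next syllable's underlying (unchanged) tone.
        nxt = syllables[i + 1] if i + 1 < len(syllables) else ""
        if syl.endswith("3") and nxt.endswith("3"):
            result.append(syl[:-1] + "2")
        else:
            result.append(syl)
    return result


print(apply_third_tone_sandhi(["ni3", "hao3"]))
print(apply_third_tone_sandhi(["wo3", "hen3", "hao3"]))
```

    A scorer could run the expected syllables through something like this before comparing them with the detected tones, so a correctly spoken ni2 hao3 is not flagged as a mistake.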

    Again, great initiative, but I think it needs a way to deal with speech that is conversationally spoken and maybe even slurred a bit due to the nature of conversational level speech.

    • mercanlIl 2 hours ago

      The tool definitely needs to address tone transformations, it’s a big part of how the language is spoken. Otherwise it’s mostly useful for a first year student speaking in isolation.

      Hoping to see improvements in this area

      • sqs 2 hours ago

        I don't think it takes care of tone transformation (eg 他是 ni3shi4 -> ni2shi4). Or if it does, my tones are just off. But it's a really cool idea!

        • jhanschoo 38 minutes ago

          The tone sandhi example you just gave looks incorrect to me

        • tifan 2 hours ago

          I had the same issue! Perhaps being another dapangzi is the problem here lol

          • et-al an hour ago

            I'm not familiar with this slang: what's a big plate?

            • dirteater_ an hour ago

              the commenter's username (i'm guessing they mean 大胖子, feel free to google translate)

        • olalonde 12 minutes ago

          Feedback: it might be a mic issue but my wife, who is a native speaker, seems to get most characters wrong according to the app. I will try again later in a quieter environment to see if that helps.

          • ecshafer 4 hours ago

            For anyone who is a native speaker of a European language and hasn't tried to learn Chinese or some other tonal language, it's really hard to understand how hard it is. The tones can be very subtle, and your ear is not fine-tuned to them. So you think you are saying it right, but native speakers have no idea what you are saying.

            • vjvjvjvjghv 2 hours ago

              Agree. It’s really hard. It also explains why a lot of people born in China tend to make serious pronunciation errors when speaking English or German. They are used to focusing on different things than us westerners.

              It took me a very long time to really understand how important tone is in Chinese.

              • laurieg 3 hours ago

                For someone who hasn't grown up speaking a language with tones or pitches, the process of learning them can be maddening. I applaud anyone who makes tools like this to try to make the process easier.

                My experience in learning Japanese pitch accent was eye-opening. At the start, I couldn't hear any difference. On quizzes I essentially scored the same as random guessing.

                The first thing that helped me a lot was noticing how there were things in my native language (English) that used pitch information. For example, "uh-oh" has a high-low pitch. If you say it wrong it sounds very strange. "Uh-huh" to show understanding goes low-high. Again, if you reverse it it sounds unusual.

                The next part was just doing lots of practice with minimal pairs. Each time I would listen and try my best to work out where the pitch changed. This took quite a lot of time. I feel like massed practice (many hours in a day) helped me more than trying to do 10 minutes regularly. Try to hear them correctly, but don't try too hard. I didn't have any luck with trying harder to 'understand' what was going on. I liken it to trying to learn to see a new color. There isn't much conscious thought.

                The final piece of the puzzle was learning phrases, not individual words, that had pitch changes. For example: "yudetamago" could be boiled egg or boiled grandchildren. Somehow my brain just had a much easier time latching on to multi-word phrases instead of single words. Listening to kaki (persimmon) vs kaki (oyster) again and again seemed much harder.

                Of course, your mileage may vary with these techniques. I already spoke decent Japanese when I started doing this.

                • ronyeh 12 minutes ago

                  > For example, "uh-oh" has a high-low pitch. If you say it wrong it sounds very strange. "Uh-huh" to show understanding goes low-high. Again, if you reverse it it sounds unusual.

                  Wow… Thanks for making it clear that English also has tones! I hadn’t thought of it this way before. “Uh-huh” sounds similar to Mandarin tones 3 & 2. “Uh-oh” is similar to Cantonese tones 1 & 3.

                  I’m wondering if we can find good examples to teach the Mandarin tones. I think two or three syllable words are best because it illustrates the contour of the tones.

                • danparsonson 3 hours ago

                  Wholeheartedly (or maybe downheartedly?) agree with this - sometimes I try to say the simplest things and people just stare at me like I'm speaking Martian. Which I suppose I might as well be! One of my big problems is implicit use of tones for things like expressing uncertainty; that's a very difficult habit to get out of.

                  • bunderbunder an hour ago

                    Another one that I wish I had realized sooner is that, contrary to the impression teachers tend to convey, tones aren’t just a pitch contour thing. There are also intensity and cadence elements. Native speakers can fairly accurately recognize tones in recordings that have had all the pitch contour autotuned out.

                  • cyberax 4 hours ago

                    I'm a native Russian speaker, and I decided to learn Mandarin, because it's linguistically almost the opposite of Russian.

                    I had no problems with tone pronunciation, but tone recognition was indeed much trickier. I still often get lost when listening to fast speech although I can follow formal speech (news) usually without problems.

                    • barrell 18 minutes ago

                      I recently started learning a tonal language, and so far have not struggled too much wrt tones when everything is slow. There was an original strangeness and refusal for my vocal cords to want to work that way, but probably only for the first month or so.

                      At least, this is the case for slow text. Once the text is sped up it’s amazing how my brain just stops processing that information. Both listening and speaking.

                      I’m sure this will come with practice and time but for now I find it fascinating

                    • dionian 3 hours ago

                      it's critical because without proper tonal enunciation the words can be ambiguous.

                    • bunderbunder 2 hours ago

                      This is very cool, but from one Mandarin learner to another I’d caution against relying too heavily on any external feedback mechanism for improving your pronunciation.

                      If you can’t easily hear your pronunciation mistakes so clearly it hurts, consider putting more energy into training your ear. Adult language learners usually have brains that have become resistant to, but not incapable of, changing the parts of the brain responsible for phoneme recognition. The neuroplasticity is still there but it needs some nudging with focused exercises that make it clear to your brain exactly what the problem is. Minimal pair recognition drills, for example, are a great place to start.

                      It’s not the most fun task, but it’s worth it. You will tighten the pronunciation practice feedback loop much more than is possible with external feedback, so a better accent is the most obvious benefit. But beyond that, it will make a night and day difference for your listening comprehension. And that will get you access to more interesting learning materials sooner. Which hopefully increases your enjoyment and hence your time on task. Plus, more accurate and automatic phoneme recognition leaves more neurological resources free for processing other aspects of your input materials. So it may even help speed things like vocabulary and grammar acquisition.

                      • zdc1 an hour ago

                        I completely agree with this. There's a certain confidence you get when you can hear a word you don't know, but can still comprehend it well enough to know what pinyin to type into your dictionary app. Mandarin Blueprint has a nice pinyin pronunciation video on YouTube that I worked through a while ago; after following that with a few weeks of immersion in Taiwan, I was able to really pick out what people were saying.

                        I feel like listening is the key to speaking. You don't necessarily need to rote learn the tones for each word. You just need to say words as you hear them spoken by others.

                      • vunderba 5 hours ago

                        When I was living in Taiwan, one of the ways I forced myself to remember to pronounce the tones distinctly was by waving my hand in front of me, tracing the arc of each character’s tone.

                        It helped a lot even if I did look like an insane expat conducting an invisible orchestra.

                        One more thing: there's quite a bit of variation in how regional accents in the mainland can affect tonal pronunciation. It might be worth reaching out to some native speakers to give you some baseline figures.

                        • zdragnar 4 hours ago

                          In a university Mandarin class, one of the adult students (i.e. probably 40 or so) WAY over exaggerated his tones, to the point that the little old lady teaching us laughed out loud after one of his answers.

                          A few years later, he had the most clean and consistent pronunciation out of anyone I'd been in a class with, and easily switched between the Beijing and other accents depending on which teacher we had on any given day.

                          I rather regret not emulating him, even though I haven't really used it for nearly 20 years and have forgotten most of it.

                          • ecshafer 4 hours ago

                            From a language learning standpoint that does make sense. Over-exaggeration while you are learning helps cement the idea, and then when you are speaking more naturally you will fall back into a regular kind of tone.

                            • luckydata 3 hours ago

                              that's EXACTLY how I taught myself to speak with a Spanish accent from Madrid. I repeated the way tv celebrities and the speakers on the metro announced the stations, and it gave me a base for how to use my mouth and throat appropriately. After a while I was able to tone it down and my accent got so good that locals couldn't tell I wasn't Spanish - I had this cool party trick pulling out my id and showing them I was truly a foreigner!

                            • sowbug 2 hours ago
                              • simedw 5 hours ago

                                For accents, I’ve mostly tested with a few friends so far. I’m wondering whether region should be a parameter, because training on all dialects might make the system too lax.

                                • devin 4 hours ago

                                  This sounds like how solfège training works. You use a hand signal to indicate a specific tone: do re mi fa so la ti

                                  • cyberax 4 hours ago

                                    Hand motions help! Especially when you want to memorize new words, because initially you need to treat tone as something additional to remember.

                                    I used simple index finger motions to mark tones.

                                  • frozennothing an hour ago

                                    This is really cool. Thank you for sharing. Before now I had not sought to understand how this technology works under the hood, but seeing it done at this scale made me curious to see if I could do something similar.

                                    • tifan 2 hours ago

                                      Well, it only works when I speak word by word, not in full sentences or at a normal speed for daily conversation. The model thinks I was making mistakes when I speak casually (as a native Chinese speaker, I had Mandarin 2A certification, which is required for teachers and other occupations that require a very high degree of Mandarin accuracy). You wouldn’t really notice it, but pronunciation is very different between casual and formal speech…

                                      • memalign 2 hours ago

                                        I wish this had a pinyin mode…! I am learning to speak Mandarin but I am not learning to read/write.

                                        ( I’m learning using a flashcards web app I made and continue to update with vocab I encounter or need: https://memalign.github.io/m/mandarin/cards/index.html )

                                        • data_ders 2 hours ago

                                          same! but if you get it inevitably wrong the first time it gives you the pinyin. but i struggled to get it to transcribe the consonants I was making let alone the tones. i'm pretty sure i'm not as bad as that!

                                        • rahimnathwani 5 hours ago

                                          This is incredible. When I was first learning Chinese (casually, ~20 years ago), my teacher used some Windows software that drew a diagram of the shape of my pronunciation, so she could illustrate what I was getting wrong in some objective way.

                                          The thing you've built is so good, and I would have loved to have it when I was learning Mandarin.

                                          I tried it with a couple of sentences and it did a good job of identifying which tones were off.

                                          • cocoa19 an hour ago

                                            Have you tried the Azure Speech Studio? I wonder how your custom model compares to this solution.

                                            I played around with python scripts for the same purpose. The AI gives feedback that can be transformed to a percentage of correctness. One annoyance is that for Mandarin, the percentage is calculated at the character level, whereas with English, it gives you a more granular score at the phoneme level.

                                            • dirteater_ 42 minutes ago

                                              IMO the SotA for this is https://www.speechsuper.com/. Amazon suffers for similar

                                              > One annoyance is that for Mandarin, the percentage is calculated at the character level, whereas with English, it gives you a more granular score at the phoneme level.

                                              This is the case for most solutions you'd find for this task. Probably because of the 1 character -> 1 syllable property. It's pretty straightforward to split the detected pinyin into initial+final and build a score from that though.
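
                                              A minimal sketch of that initial+final split in Python, assuming numbered pinyin such as `shi4`. The helper names and the equal weighting of initial, final, and tone are assumptions for illustration, and treating `y` and `w` as initials is a spelling-level simplification.

```python
# Two-letter initials come first so "sh" is matched before "s".
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def split_syllable(syllable):
    """Split numbered pinyin like 'shi4' into (initial, final, tone)."""
    tone = syllable[-1] if syllable[-1].isdigit() else ""
    body = syllable[:-1] if tone else syllable
    for initial in INITIALS:
        if body.startswith(initial):
            return initial, body[len(initial):], tone
    return "", body, tone  # zero-initial syllables such as 'an4'


def score_syllable(expected, detected):
    """Return the fraction of (initial, final, tone) parts that match."""
    pairs = zip(split_syllable(expected), split_syllable(detected))
    return sum(e == d for e, d in pairs) / 3


print(split_syllable("shi4"))
print(score_syllable("shi4", "si4"))
```

                                              With this, a `shi4` heard as `si4` loses only the initial's share of the score instead of failing the whole character.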

                                            • rablackburn 2 hours ago

                                              > And if there’s one thing we’ve learned over the last decade, it’s the bitter lesson: when you have enough data and compute, learned representations usually beat carefully hand-tuned systems.

                                              There are still holdouts!

                                              Come back to me in a couple of decades when the trove of humanity's data has been pored over and drifted further out of sync with (verifiable) reality.

                                              Hand-tuning is the only way to make progress when you've hit a domain's limits. Go deep and have fun.

                                              • stuxnet79 3 hours ago

                                                How difficult would it be to adapt this to Cantonese? It is a surprisingly difficult language to learn. It has more tones than Mandarin plus comparatively less access to learning resources (in my experience)

                                                • ChadNauseam 3 hours ago

                                                  This is amazing. I'm also working on free language learning tech. I have some SOTA NLP models on huggingface and a free app. My most recent research is a list of every phrase [0].

                                                  Pronunciation correction is an insanely underdeveloped field. Hit me up via email/twitter/discord (my bio) if you're interested in collabing.

                                                  [0]: https://gist.github.com/anchpop/acbfb6599ce8c273cc89c7d1bb36...

                                                  • affogarty 4 hours ago

                                                    This is extremely cool, although I asked my wife (who is Chinese) to try it out and it said she made some mistakes.

                                                    • hawflakes an hour ago

                                                      I tried it out and it has some issues with my native speech. I grew up with more Taiwan Mandarin but I know the Beijing standard, and the recognizer was incorrectly flagging some of my utterances.

                                                    • SequoiaHope 3 hours ago

                                                      Amazingly I just did the same thing! Only with AISHELL. It needs work. I used the encoder from the Meta MMS model.

                                                      https://github.com/sequoia-hope/mandarin-practice

                                                      • baby 3 hours ago

                                                        For people trying to say the "j" sound correctly, as in "jiu" (old), just say "dz", so in that example "dziu"

                                                        • byb 3 hours ago

                                                          Neat. A personal tone trainer. Seriously, shut up and take my money now. Of course, it needs a vocabulary trainer, and zhuyin/traditional character support.

                                                          • jrockway 3 hours ago

                                                            Interesting application! A friend of mine built a model like this to help her make her voice more feminine, and it is neat to see a similar use case here.

                                                            • bytesandbits 4 hours ago

                                                              great work! I am going to try it out. Currently about to learn some Mandarin to be able to talk with hawker stand owners for a trip I am doing soon. I am trilingual and can speak a few languages on top of that, but none of them tonal. I am new to tonal languages and I find myself struggling with this... a lot!

                                                              • anonzzzies 4 hours ago

                                                                good luck! I speak 6 languages fluently but none of them tonal, and I find Mandarin very challenging. It does not help that people in places where you might need it are not very forgiving; asking for a green fork in a tea shop leaves people very bewildered.

                                                              • nirvanatikku 4 hours ago

                                                                talk about 30 seconds to wow. great app, UX and demo. would love to use this. kudos.

                                                                • jellojello 5 hours ago

                                                                  This is amazing, if you feel like opening an entire language to being learned more easily.. Farsi is a VERY overlooked language, my wife/her family speak it but it's so difficult finding great language lessons (it's also called Persian/Dari)

                                                                  • simedw 5 hours ago

                                                                    Thank you.

                                                                    I had a quick look at Farsi datasets, and there seem to be a few options. That said, written Farsi doesn’t include short vowels… so can you derive pronunciation from the text using rules?

                                                                    • kranner 4 hours ago

                                                                      > written Farsi doesn’t include short vowels… so can you derive pronunciation from the text using rules?

                                                                      You can't, but Farsi dictionaries list the missing short vowels/diacritics/"eraab" for every word.

                                                                      For instance, see this entry: https://vajehyab.com/dehkhoda/%D8%AD%D8%B3%D8%A7%D8%A8?q=%D8...

                                                                      With the short vowel on the first letter it would be written حِساب (normally written as just حساب)

                                                                      The dictionary entry linked shows that there is a ِ on the first letter ح

                                                                      But you would have to disambiguate between homographs that differ only in the eraab.
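
                                                                      A small Python sketch of that lookup problem, under stated assumptions: the short-vowel marks are Unicode combining marks (category `Mn`), and the word list is a tiny hand-picked sample rather than a real dictionary. The function names are made up, and the کرم pair (kerm / karam) is included only as a commonly cited homograph.

```python
import unicodedata
from collections import defaultdict


def strip_short_vowels(word):
    """Drop combining marks, leaving the form as it is normally written."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


def build_homograph_index(vocalized_words):
    """Group fully marked words by their unmarked spelling."""
    index = defaultdict(list)
    for word in vocalized_words:
        index[strip_short_vowels(word)].append(word)
    return dict(index)


# Escapes are used so the marks are unambiguous in source form.
HESAB = "\u062d\u0650\u0633\u0627\u0628"  # the entry from the comment, with kasra
KERM = "\u06a9\u0650\u0631\u0645"         # assumed example: kerm
KARAM = "\u06a9\u064e\u0631\u064e\u0645"  # assumed example: karam

index = build_homograph_index([HESAB, KERM, KARAM])
for plain, candidates in index.items():
    print(plain, len(candidates))
```

                                                                      Any unmarked spelling that maps to more than one candidate is a homograph that text alone cannot resolve, so the expected pronunciation would have to come from context or from the user.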

                                                                  • cmuguythrow 4 hours ago

                                                                    Awesome idea!

                                                                    • dionian 3 hours ago

                                                                      it heard wu2 but i heard wo2 from you fine. and it should sound like wo2 not wo3 if spoken quickly. not a native speaker though so i could be wrong

                                                                      • btrlsnqtn 4 hours ago

                                                                        The article mentions the bitter lesson. I'm confused about the status of Sutton's opinion of the bitter lesson. On the one hand, he invented the concept. On the other hand, he appears to be saying that LLMs are not the correct approach to artificial intelligence, which to a naive outsider looks like a contradiction. What gives?

                                                                        • drekipus 5 hours ago

                                                                          instantly awesome.

                                                                          I suck at Chinese but I want to get better and I'm too embarrassed to try and talk with real people and practise.

                                                                          This is a great compromise. even just practising for a few minutes I already feel way more confident based on its feedback, and I feel like I know more about the details of pronunciation.

                                                                          I'm worried this might get too big and start sucking like everything else.

                                                                          • iamanllm 2 hours ago

                                                                            holy crap, I was literally imagining how I wanted something exactly like this yesterday! you are a hero!