Wiktionary has dialect maps for common Chinese vocabulary that showcase the differences in terminology across the various Chinese-speaking regions, rather than their similarities. Example: sleep -> https://en.wiktionary.org/wiki/Template:zh-dial-map/%E7%9D%A... , hide-and-seek -> https://en.wiktionary.org/wiki/Template:zh-dial-map/%E6%8D%8...
p.s. I'm saying this because most of the terms that have a dial-map are common in daily conversation. The differences in written Chinese vocabulary aren't as significant; how scientific and technical terms are expressed is largely determined by your administrative region.
Ukrainian and Russian words often use the same letters but are pronounced very differently due to distinct phonetics. On the other hand, some Polish and Czech words sound the same or very similar to Ukrainian but look quite different because of their different alphabets. Therefore, phonetic transcription would be a valuable improvement.
I can mostly speak for German. The map seems to mix all its varieties into one general language. But there are a lot of local differences between northern and southern Germany, Switzerland and Austria. And it’s not just dialect, but really different words that might not be understood everywhere. If you look at the English part, it has at least three different words. Similar in Spanish.
My best guess:
- Swiss German and Austrian German didn't make the cut because Switzerland and Austria are on good terms with Germany and don't mind if we call their languages dialects of German. Not only is that a justification to exclude them, they are also not in Google Translate for this reason (which this map uses)
- Luxembourg did mind and went to great lengths to get its German dialect recognized as a separate language. Luxembourgish is in Google Translate, but Wikipedia lists it as having only 300k speakers
- Frisian is seen as a distinct language because of how different it is. It is in Google Translate, but has about 200k speakers
- Similarly, Scottish Garlic is in Google Translate but has only 70k-200k speakers
The map is consistent if you set the goal of only considering languages that are in Google Translate and have at least 500k speakers.
I do think these rules detract from the map. Frisian and Luxembourgish are interesting as "in-between" languages (Luxembourgish has a lot of French influence, Frisian is more closely related to English). And Swiss German has many distinct words that are very different from their German counterparts, so for the purposes of this map it really should be a language.
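The guessed inclusion rule above (in Google Translate and at least 500k speakers) can be written down as a small filter. This is only a sketch of that guess, not how the map actually picks its languages: the `Language` class, `MIN_SPEAKERS`, and `makes_the_cut` are made-up names, and the speaker figures are the rough ones quoted in this comment (upper estimate for Scottish Gaelic, no figure given for Swiss German).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Language:
    name: str
    in_google_translate: bool
    speakers: Optional[int]  # None: no figure was given in the comment


# Threshold guessed in the comment; not a documented rule of the map.
MIN_SPEAKERS = 500_000


def makes_the_cut(lang: Language) -> bool:
    """Apply the guessed rule: in Google Translate AND at least MIN_SPEAKERS speakers."""
    return (
        lang.in_google_translate
        and lang.speakers is not None
        and lang.speakers >= MIN_SPEAKERS
    )


# Figures as quoted in the comment, not independently verified.
CANDIDATES = [
    Language("Swiss German", in_google_translate=False, speakers=None),
    Language("Luxembourgish", in_google_translate=True, speakers=300_000),
    Language("Frisian", in_google_translate=True, speakers=200_000),
    Language("Scottish Gaelic", in_google_translate=True, speakers=200_000),
]

excluded = [lang.name for lang in CANDIDATES if not makes_the_cut(lang)]
```

Under these assumptions all four candidates end up in `excluded`, which matches what the map shows.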
I think for ‘Scottish Garlic’ you meant ‘Scottish Gaelic’…
IIUC, Swiss German can't make the cut as there's no standard written form (and with it, not many resources), and the variations between the cities are pretty significant.
There really isn't a single "Swiss German" dialect. It is rather a family of dialects, and this family is again part of the larger family of "Alemannic German" dialects, which are spoken in most of southwestern Germany, Switzerland and western parts of Austria [0]. It is really very hard to clearly demarcate "Swiss German" from dialects spoken for example in the Black Forest, around the city of Freiburg im Breisgau, in Vorarlberg or even (historically) in Alsace. My own dialect is Swabian (also Alemannic), and I never had trouble understanding the local dialects around Basel, Berne or Zurich. It is easier for me to understand these Swiss German dialects than, for example, Bavarian dialects.
[0] https://en.wikipedia.org/wiki/Alemannic_German#/media/File:A...
> And it’s not just dialect, but really different words that might not be understood everywhere. If you look at the english part it has at least three different words. Similar in Spanish.
I think you cannot really compare the minuscule differences between "Standard German", "Austrian Standard German", and "Swiss Standard German" to the differences between English, Irish and Welsh, which are not even from the same branch of the Indo-European family. Also, the tool is based on Google Translate, and AFAIK Google Translate doesn't differentiate between them.
Comparing the tool to this map [0], it seems to do a pretty good job in capturing all major languages in Europe, while ignoring their dialects.
But I agree that it would be great if you could zoom into the map and also show differences in local dialects. ChatGPT seems to be pretty good at translating to different variants of standard German, or German dialects [1]
[0] https://en.wikipedia.org/wiki/Languages_of_Europe#/media/Fil...
[1] https://chatgpt.com/share/67bba4db-9458-800c-b5f8-fd3fa196d4...
Same for Belgium: Google/Apple Translate have never been able to correctly translate French and Dutch for us, because our vocabulary choices are drastically different from those of neighbouring France and the Netherlands.
For an example take a look at this map of the different words used in German for "meatballs":
You immediately see the difference (or similarity) of languages when using words that are very old, such as "iron", or "stone", which are words that have existed from the origins of that language.
Also "cow". And "sun", "mama" and "papa" seem to transcend most European languages.
For some words, their identity is not obvious when you do not know the rules for the changes of sounds between the Indo-European language subfamilies.
For instance "cow" and "Kuh" come from the same word as "boeuf" and "buey" (also despite the gender difference).
salt, tea, ..
one can follow migrations.. and criss-crosses..
btw, "orange" as color in Bulgarian is still "orange" (оранжев/а/о/и), but "orange" as fruit is портокал ("portokal") - so that's tricky..
"oranges" seems more correct, vs "orange color" maybe
Salt and tea are good examples of the two reasons why the same word can be found in many languages.
Salt is an ancient Indo-European word that was already in use several millennia ago, so it has been inherited in most Indo-European languages.
Tea is a relatively recent borrowing in the European languages, which has spread from one language to another, with a few pronunciation variants, across all Europe, regardless of the genetic relationships between languages.
Poland took issue with the story that worldwide there are only two words for tea, and which one you use depends on whether you got introduced to tea via sea or via land.
Mama and papa is a whole other phenomenon.
Arabic: mama babi. Mandarin: mama baba. Swahili: mama baba. Inuktitut: anaana ataata. English: mama papa. Tamil: amma appa.
These languages are not known to be related.
The first vowel sound a child makes is approximately "a" and the first consonant they form tends to be a nasal plosive "mba mba mba" and the second distinct sound tends to be a dental or labial plosive "pa ta pa ta". And the first thing a baby says is "mommy" of course and the second thing a baby says is "daddy" of course. So mama is mommy and papa or tata is daddy. That's the usual explanation, anyway.
French is the only language where Company and Society are the same word: société. It's fun to watch
This is very cool. Also, it seems like Romanian is the only language where the word for turtle translates literally to "shelled frog".
German’s Schildkröte, “shield-toad”, is quite similar.
Sköldpadda - Swedish too
There are examples from five language families shown here: Indo-European, Basque, Uralic, Turkic, and Afro-Asiatic.
The words for bridge split neatly into language subfamilies. The only exception appears to be Welsh.
Perhaps of interest, the translation guesses things like age relations and genders. This is accurate when the word has that same meaning in English, like nun and monk have a gender, but e.g. the word hairdresser in English is translated to a specific gender in German and Dutch even though the original didn't have one. Similarly for diminutives: "brother" (broer) usually means "big brother" in Dutch because for a younger brother you'd add a diminutive suffix. It's hard to define, but maybe: words that are not synonyms, yet all translate back to your input. (The reverse also exists, of course: insulation and isolation aren't differentiated in Dutch)
The map would be more complete with this information because it may be very similar or completely different and can be interesting to compare, for example:
- EN: receptionist for both, NL: receptionist and receptioniste, DE: Rezeptionist and Empfangsdame. The map currently just shows the female version for German, without any indication that they also use a transliteration of the English.
- EN: little brother, NL: broertje (the submission shows a doubled up version of kleine broertje), DE: kleiner Bruder. Although German has the diminutive suffix to make Brüderchen, they don't use it the way that we do, which I find interesting to see.
Google Translate's API can output multiple options, <https://cloud.google.com/translate/docs/reference/rest/v3bet...>, and Google's own website seems to indeed provide these different variants, but there is no label to say what the different array entries mean the way that Google's own website shows
I got curious which gender it guesses that you might mean. It seems to assume a male unless it's also very heavily female-connotated in English. In Dutch and German, it outputs male for hairdresser and doctor, female for nurse and receptionist (German translations mean "sick-sister" and "reception lady", respectively), and mixed for secretary (female in Dutch, male in German) because Dutch doesn't have a male word for it anymore (only workarounds)
Love that the numbers in Catalan are represented as numerals, not as words.
EDIT: playing with it, it's a bit sad that large numbers do not work at all (in any language); and that not all common forms of a word are shown. For example, I tried to see how "ninety six" is said in French in France, Belgium and Switzerland, but it does not work.
As a French person, I always found that the way Walloons or Swiss word numbers >69 makes way more sense than ours
Growing up in Belgium, we learned that our Walloon brethren use septante (70), quatre-vingts (4 * 20 or 80) and nonante (90).
We never learned huitante (80), but there are apparently parts of Belgium that use it. We did learn soixante-dix and quatre-vingt-dix, and were allowed to use both. [0]
The Swiss also use huitante, and Nova Scotia uses octante.
[0]: Funnily enough, writing American English was a no-go. We had to write centre, colour, metre, lift (elevator), ticket (receipt).
I often wondered if the fact your number system forces you to multiply somehow affects your mathematical competence. France has won a lot of Fields medals.
The English number system kind of also forces you to multiply. Ninety-six is nine tens plus six, or 9*10+6. French is just special because they randomly sneak in base 20. But I doubt they really think more about saying 4 score plus sixteen than you do about saying nine tens plus six.
What is more influential (in a detrimental way) is German randomly switching reading direction. They read 2196 as 2000+100+6+90 instead of the more reasonable 2000+100+90+6
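To make the two points above concrete, here is a small sketch of the arithmetic only (it does not produce actual number words). The function names are made up for illustration. It splits a number into its place-value terms, reorders them the way German and Dutch speak the last two digits, and shows the base-20 split that standard French uses for 80 to 99.

```python
def place_value_terms(n: int) -> list[int]:
    """Split n into its non-zero place-value terms, largest first."""
    terms = []
    place = 1
    while n > 0:
        n, digit = divmod(n, 10)
        if digit:
            terms.append(digit * place)
        place *= 10
    return terms[::-1]


def units_before_tens(terms: list[int]) -> list[int]:
    """Swap the last two terms when they are tens (20..90) followed by units,
    which is the order in which German and Dutch say them."""
    if len(terms) >= 2 and terms[-1] < 10 and 20 <= terms[-2] < 100:
        return terms[:-2] + [terms[-1], terms[-2]]
    return terms


def vigesimal_terms(n: int) -> tuple[int, int]:
    """Write a number from 80 to 99 as (count of twenties, remainder)."""
    return divmod(n, 20)


english_order = place_value_terms(2196)           # [2000, 100, 90, 6]
german_order = units_before_tens(english_order)   # [2000, 100, 6, 90]
french_96 = vigesimal_terms(96)                   # (4, 16): 4 * 20 + 16
```

Teens are left alone on purpose, since both English and German already say them as a single units-first word (sixteen, sechzehn).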
Dutch does that too, and I've tried out what happens if you say it correctly ("negentig 'n zes", ninety and six, instead of "zes 'n negentig")
It takes a second to process and then they'll ask "do you mean [reverse order variant]?" so they do kinda get it and I think transitioning to the sane version could be possible without much trouble, but people would have to want to
We can test your theory by checking how the Danes fared.
https://blogs.transparent.com/language-news/2016/08/29/danis...
As a non-French speaker, I love the French numbers you dislike. 90 being 4 20 10 is sort of awesome and funny.
Huh, the example "she runs" is not correct in Polish. Currently translates to "ona działa" = "she functions."
She runs, as in the form of locomotion, is "ona biega/biegnie."
The site says that in a blue bubble below the input field (I agree it's not very noticeable at all):
> This example demonstrates that the map should be interpreted with care; some translations have the meaning "she lasts" or "it works".
Another mistake for this example, although subtler, is the Dutch version, which is translated to the meaning of "she walks"
Thanks, didn't notice that. Not very thorough of me.
It's interesting that the site says it uses Google Translate, because using it via the web UI, it does give the correct answer.
https://translate.google.com/?sl=iw&tl=pl&text=she%20runs&op...
You are coloring it with 4 colors like a map, but you should color countries phonetically (speex, Levenshtein or something similar)
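A minimal sketch of what the Levenshtein variant of this suggestion could look like, assuming each country has a single translated word. It compares spellings, so it is only a rough stand-in for phonetic similarity; running it on phonetic transcriptions instead would be closer to the idea. The function names, the greedy grouping, and the 0.3 threshold are all assumptions for illustration, not anything the site does.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to 0..1 by the longer word, ignoring case."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def group_by_similarity(words: dict[str, str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy grouping: each language joins the first group whose first member
    has a word within the threshold, otherwise it starts a new group."""
    groups: list[list[str]] = []
    for lang, word in words.items():
        for group in groups:
            if normalized_distance(word, words[group[0]]) <= threshold:
                group.append(lang)
                break
        else:
            groups.append([lang])
    return groups


# Words for "salt"; each resulting group would share one map color.
salt = {"en": "salt", "de": "Salz", "fr": "sel", "es": "sal"}
color_groups = group_by_similarity(salt)
```

With this threshold Spanish lands with English and German rather than with French, which shows how crude spelling distance is and why a phonetic measure was suggested in the first place.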