You say tomato, I say tomahto.
You say Rome, I say Roma.
You say Munich, I say München.
Let’s not call the whole thing off.
Have you ever wondered why the country code and abbreviation for Germany is DE, or similarly why it is ES for Spain? Unlike FR and CA, which are France and Canada respectively, DE and ES seem out of place for Germany and Spain. A simple explanation is that DE is short for Deutschland and ES is short for España – which are the names used locally for these countries.
Local names such as Deutschland and España are known as endonyms, while Germany and Spain are English-language exonyms. Put simply, an endonym is the name locals use for a place, and an exonym is the name foreigners use for it. So an endonym is what a country calls itself, and an exonym is what other countries call it.
(As another example, United States is an endonym for, well, the United States. Meanwhile, exonyms for the United States will depend on the country involved: the French call us the États-Unis and the Russians call us Соединенные Штаты.)
The DOTS Address Validation International (AVI) service currently offers three output language options to let the end user choose their preferred language setting and behavior: ENGLISH, BOTH (English and local addresses), and LOCAL_ROMAN. Let’s examine each of these in detail:
ENGLISH – Instructs the service to return the address in English, without any localized text or accents.
BOTH – Instructs the service to return a standardized address both in English and, when applicable, in its localized text (e.g., Cyrillic or Chinese) and format.
Here’s an example of a Chinese address in both English and in its local Chinese text.
Address input in English
No. 1514 Changyang Lu
Yangpu Qu, Shanghai Shi
Address output in Simplified Chinese
上海市杨浦区长阳路1514号
Here’s an example of a Russian address in both English and Cyrillic.
Address input in English
Kommunarov Ul, 290, 9
Krasnodar
Krasnodarskiii Kraii
350020
Address output in Cyrillic
Коммунаров ул, д. 290, OFFICE 9
КРАСНОДАР
КРАСНОДАРСКИЙ КРАЙ
350020
One last example, this time in Greece.
Address input in English
Alkamenous 76
104 40 Athens
Address output in Greek
104 40 Αθηνα
Αλκαμενους 76
LOCAL_ROMAN – Instructs the service to return the address in its local spelling using Roman text.
For example, the city of Rome will be returned as Roma, Naples as Napoli, Dublin as Baile Átha Cliath, Naestved as Næstved, and Cologne as Köln. Let’s take a look at some address examples.
Here’s an example of an address in Italy.
Address input in English
Via Villafranca 20
00185 Rome RM
Address output in Italian
Via Villafranca 20
00185 Roma RM
Here’s an example of an address in Denmark.
Address input in English
Kobmagergade 20
4700 Naestved
Address output in Danish
Købmagergade 20
4700 Næstved
Here’s an example of an address in Germany.
Address input in English
Weisshausstr. 20-30
50939 Cologne
Address output in German
Weißhausstr. 20-30
50939 Köln
For some countries, the service can also accept an address in its localized spelling and text and return the address in English. Try entering any of the address examples above into the AVI service using the local language, spelling, and format with the output language set to ENGLISH to see the address validated and standardized into English. When submitting an address in a non-English language, be careful to ensure that the text is properly encoded.
The AVI service cannot correct corrupted characters, so it is important that every component that holds the address in memory or stores it supports the character set. Otherwise, you will end up with data corruption, which is not always easy to detect or fix.
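To make the encoding advice concrete, here is a minimal Python sketch of building a request whose query string is percent-encoded UTF-8. The endpoint URL, the parameter names (Address1, Country, OutputLanguage, LicenseKey), and the build_request_url helper are assumptions made up for illustration, not the documented AVI interface; only the three output language values come from the list above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint, for illustration only; check the AVI
# documentation for the real URL and parameter names.
BASE_URL = "https://example.com/avi/GetAddressInfo"

# The three output language options described in this post.
OUTPUT_LANGUAGES = {"ENGLISH", "BOTH", "LOCAL_ROMAN"}


def build_request_url(address_lines, country, output_language, license_key):
    """Return a request URL whose query string is percent-encoded UTF-8."""
    if output_language not in OUTPUT_LANGUAGES:
        raise ValueError(f"unsupported output language: {output_language}")
    # Address1, Address2, ... are assumed parameter names.
    params = {f"Address{i}": line for i, line in enumerate(address_lines, start=1)}
    params.update({
        "Country": country,
        "OutputLanguage": output_language,
        "LicenseKey": license_key,
    })
    # urlencode() encodes str values as UTF-8 before percent-encoding them.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_request_url(
    ["Weißhausstr. 20-30", "50939 Köln"], "Germany", "ENGLISH", "YOUR-KEY"
)
print(url)  # plain ASCII: ß appears as %C3%9F and ö as %C3%B6

# Decoding the query string recovers the original text unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["Address2"] == ["50939 Köln"])  # True
```

Letting the standard library do the percent-encoding means the non-ASCII characters travel as plain ASCII escapes, so they are less likely to be mangled by anything between you and the service.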
For example, in some cases, a character may simply come back as a question mark ‘?’ or a square ‘■’. Take the following address.
Weißhausstr. 20-30
50939 Köln
The fourth character of the first line and the eighth character of the second line will come back corrupted, as follows:
Wei■hausstr. 20-30
50939 K?ln
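One common way this kind of damage happens is a lossy conversion into a character set that lacks the letters involved. The Python sketch below reproduces it with ASCII as a stand-in for any too-narrow encoding; in this sketch both characters become question marks, since the square is usually a display substitute rather than a stored character. The first_unstorable helper is a hypothetical name introduced here for illustration.

```python
def first_unstorable(text, encoding):
    """Return (index, char) of the first character that `encoding`
    cannot represent, or None if the whole string is safe to store."""
    for index, char in enumerate(text):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            return index, char
    return None


line1 = "Weißhausstr. 20-30"
line2 = "50939 Köln"

# errors="replace" silently substitutes "?" for anything ASCII lacks.
lossy1 = line1.encode("ascii", errors="replace").decode("ascii")
lossy2 = line2.encode("ascii", errors="replace").decode("ascii")
print(lossy1)  # Wei?hausstr. 20-30
print(lossy2)  # 50939 K?ln

# Checking before storing turns silent corruption into a visible problem.
problem = first_unstorable(line1, "ascii")   # (3, "ß"): zero-based index 3
safe = first_unstorable(line1, "utf-8")      # None: UTF-8 can hold every character
print(problem is not None, safe is None)     # True True
```

Once the question mark has been written, the original letter is gone for good, which is why a check before storage is worth more than any cleanup afterwards.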
In other cases, the corruption can be quite severe, and you may end up with something like ‘تخت اره ÙŠÚ©’. Not only is it important to ensure that you do not send any corrupted data to the AVI service, but you also want to make sure that you properly handle and store the service response. Otherwise, you may end up corrupting an address after it has been validated. (How this happens would make a good topic for another blog post, but for now, just make sure to use a Unicode Transformation Format (UTF) encoding, such as UTF-8, consistently on everything that handles the data.)
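As a rough illustration of how the more severe kind of garbling arises, the Python sketch below decodes UTF-8 bytes with the wrong encoding, then shows the safer habit of naming UTF-8 explicitly when storing a response. Latin-1 is used here as an assumed example of a mismatched encoding; the culprit in a real system may be a different one.

```python
import os
import tempfile

original = "50939 Köln"

# Wrong: reading UTF-8 bytes as Latin-1 turns each accented letter
# into two junk characters ("ö" becomes "Ã¶").
garbled = original.encode("utf-8").decode("latin-1")
print(len(original), len(garbled))  # 10 11

# The damage is reversible only while no bytes have been dropped or replaced.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True

# Right: state the encoding explicitly on both the write and the read,
# instead of relying on a platform default.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "address.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(original)
    with open(path, "r", encoding="utf-8") as handle:
        stored = handle.read()

print(stored == original)  # True
```

The same principle applies to databases, HTTP clients, and log files: if every hop agrees on UTF-8, the address that comes back from validation is the address that gets stored.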
Each of these options gives you the flexibility to have a consistent addressing format for your international addresses, depending on your location, your customers, and your mailing conventions. All of them provide an automated, consistent approach to address validation. Whether it is addressing mail to customers in the format of their home countries, translating addresses, or ensuring readability for the sender, DOTS Address Validation International truly speaks your language.