The Challenge of Storing International Addresses

Working with international address data can be difficult and confusing. Even when you have an application that validates an address and confirms that it’s deliverable, you still have to deal with the chore of storing the resulting data. So when someone asks, “What’s the best way to store international addresses?”, what they are often really asking is, “What’s the easiest way to store international addresses?”

The short answer to the “what’s the best?” question is, as it often is, that you’re asking the wrong question. Those of you who have worked with varied data sets already know that you first need to ask yourself, “What do I intend to do with the data once it is stored?” How the data will be used should have the largest impact on how it is stored, and depending on your specific requirements, the right way to store address data can vary greatly. For some, the decision isn’t theirs to make: they have no control over the storage design and are forced to work with whatever fields are made available to them. Many users work with US-centric Customer Relationship Management (CRM) solutions designed around US address fields, which makes storing international addresses all the more confusing and can lead to data loss.

For those looking to simply print an address label for mail delivery, a single text field containing the complete formatted address will suffice. After all, why bother breaking an address down into a mess of individual fragments if you’re not going to use them? Worse yet, what do you do when it comes time to put the pieces back together and you find that you don’t know how?

For others, correctly putting an address back together from its individual fragments might not be a great concern. The primary use of the data may be some form of query analysis or organization, in which case you might be more concerned with which data type each field should have or with how to map the fragments properly. If you are implementing your own design, keep in mind that not all international addresses are parsed the same way, and you will need to decide whether your design should be flexible enough to handle all international addresses or whether you would prefer a country-specific approach.
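As one small illustration of the data-type question, here is a Python sketch showing why postal codes are usually safer stored as text than as numbers. The sample codes come from the example addresses in this post, except "01234", which is a made-up code with a leading zero; the helper name is hypothetical.

```python
# Postal codes from the example addresses in this post, plus a made-up code
# ("01234") with a leading zero to show what a numeric column would do to it.
SAMPLE_CODES = ["IP17 1TS", "330-0072", "12000", "01234"]


def survives_integer_column(postal_code: str) -> bool:
    """Return True if the code would come back unchanged after being stored as an integer."""
    try:
        return str(int(postal_code)) == postal_code
    except ValueError:
        # Letters, spaces, or hyphens cannot be stored as an integer at all.
        return False


for code in SAMPLE_CODES:
    print(f"{code!r}: {survives_integer_column(code)}")
```

Only the purely numeric code without a leading zero survives, so a text column is the more general choice.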

Mapping address fields

Consider this example of an address in England:

9 Gorse View
School Road
Knodishall, Saxmundham
IP17 1TS
UNITED KINGDOM

If we include the country name, the address above has five lines, or six if we split the third line. Now let’s try to store this address in our CRM. Most CRMs provide the following address fields for a contact:

Address1
Address2
Address3
City
State
ZIP
Country

Depending on the CRM, we may have somewhere between five and seven address-related fields to work with. In this example we have seven, so that should make things easy, right? We have more than enough fields, so there should be no loss of data, but right away we see State and ZIP fields. These are red flags that the storage was not designed for international addresses, but unfortunately, it is what we have to work with. Now let’s look at the parsed fields we are likely to get back from an address validation solution:

Premise Number: 9
Dependent Street Name: Gorse View
Street Name: School Road
Dependent Locality: Knodishall
Locality: Saxmundham
Postal Code: IP17 1TS
Country: UNITED KINGDOM

In most cases, users will find that they can match Locality to City, Administrative Area to State, and Postal Code to ZIP. If you are unfamiliar with the address terms “Locality” and “Administrative Area,” please check out our previous blog post, Five Commonly Used Terms and Definitions in International Address Validation Systems.

In the example above, you’ll notice that no Administrative Area equivalent is provided. You’ll quickly find that this is common for many countries, where the locality is usually the preferred element. You’ll also notice a dependent locality, which is a sub-region of the locality, and a dependent street name. It is important not to omit or lose these values when they are provided, as they give additional detail about the whereabouts of an otherwise ambiguous address. So where do we map them?

Luckily, our database design offers enough fields to accommodate these values, but keep in mind that this will not always be the case. In our example, we can map the premise number and dependent street name to Address1, the street name to Address2, the dependent locality to Address3, the locality to City, the postal code to ZIP, and the country to Country, leaving State empty. However, even though we were able to map every value to our CRM, trying to handle all of the various address formats this way is tedious and risky. Also, what do we do when an address also includes a double-dependent locality or a sub-region?
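The mapping just described can be sketched in Python as below. The parsed-field key names are hypothetical (each validation service uses its own), and the function also reports any parsed field it has no slot for, so that something like a double-dependent locality is flagged rather than silently dropped.

```python
from typing import Dict, List, Tuple

# Parsed fields for the English example above. The key names are hypothetical;
# each validation service uses its own field names.
parsed = {
    "PremiseNumber": "9",
    "DependentStreetName": "Gorse View",
    "StreetName": "School Road",
    "DependentLocality": "Knodishall",
    "Locality": "Saxmundham",
    "PostalCode": "IP17 1TS",
    "Country": "UNITED KINGDOM",
}

# Every parsed field that the mapping below knows where to put.
MAPPED_KEYS = {
    "PremiseNumber", "DependentStreetName", "StreetName", "DependentLocality",
    "Locality", "AdministrativeArea", "PostalCode", "Country",
}


def map_to_crm(p: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Map parsed fields onto US-style CRM fields and report anything left over."""
    address1 = " ".join(
        x for x in (p.get("PremiseNumber"), p.get("DependentStreetName")) if x
    )
    crm = {
        "Address1": address1,
        "Address2": p.get("StreetName", ""),
        "Address3": p.get("DependentLocality", ""),
        "City": p.get("Locality", ""),
        "State": p.get("AdministrativeArea", ""),  # empty for this address
        "ZIP": p.get("PostalCode", ""),
        "Country": p.get("Country", ""),
    }
    # Fields the fixed mapping has no slot for, e.g. a double-dependent locality.
    leftover = sorted(set(p) - MAPPED_KEYS)
    return crm, leftover


crm, leftover = map_to_crm(parsed)
print(crm)
print(leftover)
```

Note that this mapping only fits addresses shaped like this one; an address without a dependent street name would leave the premise number alone in Address1, separated from its street, which is exactly the kind of fragility described above.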

Missing state or administrative area equivalent

Let’s look at two more example addresses:

3-10-13 Ryoke
Urawa-Ku
Saitama-Shi 330-0072
JAPAN

and

5 Rue Sainte-Catherine
12000 Rodez
FRANCE

The first example is a Japanese address. Looking at it with American eyes, one might think that the first line is a premise number and a street name, the second line the city, and the third line the state and postal code, or their equivalents. However, things work very differently in Japan. Streets are not commonly named or used in addresses. Instead of street names, addresses primarily use nested areas that can loosely be thought of as districts. In the example above, Ryoke is a second-level sub-locality, Urawa is a first-level sub-locality, and Saitama is the locality. No administrative area equivalent is given; in Japanese addresses, the administrative area is omitted about as often as it is included.

In the second example, the first line holds a premise number and street name, and the second a postal code and locality. Once again, no administrative area is given. The address is in France, but many European addresses follow this general format and commonly omit a first-level administrative area. We therefore strongly recommend that you do not make the administrative area a required field; doing so would mean rejecting valid addresses for entire countries.
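A minimal Python sketch of a record that follows this advice is shown below. The class and function names are hypothetical, and treating only the country and at least one address line as mandatory is an assumption made for illustration; the point is that a missing administrative area is never an error.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Address:
    # Assumption: only the country and the address lines are mandatory.
    # Administrative area, locality, and postal code stay optional because
    # many valid addresses (such as the Japanese and French examples) omit them.
    country: str
    address_lines: List[str]
    locality: Optional[str] = None
    administrative_area: Optional[str] = None
    postal_code: Optional[str] = None


def problems(addr: Address) -> List[str]:
    """Return hard errors only; a missing administrative area is deliberately not one."""
    errors = []
    if not addr.country.strip():
        errors.append("country is required")
    if not any(line.strip() for line in addr.address_lines):
        errors.append("at least one address line is required")
    return errors


rodez = Address(
    country="FRANCE",
    address_lines=["5 Rue Sainte-Catherine"],
    locality="Rodez",
    postal_code="12000",
)
saitama = Address(
    country="JAPAN",
    address_lines=["3-10-13 Ryoke", "Urawa-Ku"],
    locality="Saitama-Shi",
    postal_code="330-0072",
)
print(problems(rodez), problems(saitama))
```

Both example addresses pass, even though neither has an administrative area.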

Facing the challenge

As I mentioned earlier, when we break an address apart, we also run the risk of putting it back together incorrectly. So while no individual fragment might be lost, we can still lose the correct order and format of the address. Addresses, their fragments, and their formats can vary greatly not just from country to country, but also within the same country. So what’s the point of it all? Is there no hope when it comes to international addresses?
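To make the reassembly risk concrete, here is a small Python sketch that rebuilds the last line of the French example from its fragments. The templates are simplified assumptions: the French one is inferred only from the “12000 Rodez” line above, and real formatting rules have many more cases. Applying a US-style “city, state, ZIP” order reverses the postal code and locality.

```python
from typing import Dict, List

# Simplified last-line templates (assumptions for illustration only).
# The FR template is inferred from the "12000 Rodez" example in this post.
LAST_LINE_TEMPLATES: Dict[str, List[str]] = {
    "US": ["{city} {state} {zip}"],
    "FR": ["{zip} {city}"],
}


def last_lines(fields: Dict[str, str], template_country: str) -> List[str]:
    """Fill in a country's last-line template and tidy up the spacing."""
    lines = [t.format(**fields) for t in LAST_LINE_TEMPLATES[template_country]]
    # Collapse the extra spaces left behind by empty fields such as a missing state.
    return [" ".join(line.split()) for line in lines]


rodez_fields = {"city": "Rodez", "state": "", "zip": "12000"}
print(last_lines(rodez_fields, "US"))  # wrong order for a French address
print(last_lines(rodez_fields, "FR"))
```

No fragment is lost in either case; only the order changes, which is the subtler kind of loss described above.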

If you are forced to use a fixed storage design that you cannot alter, your best course of action may be to simply store the complete formatted address in a single field, if it fits. If the complete address cannot fit in a single field, split it across multiple fields as necessary. In general, storing the complete address should be your primary objective, as it contains all of the information you need, and it can always be parsed again later. Storing the country and postal code should be next on your priority list. Not all countries use postal codes, but they are very important and useful, so be sure to store them when they are available. Finally, store the locality and administrative area if they are available.
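The “single field if it fits, otherwise split” step might look like the Python sketch below. The field names, the maximum length, and the newline separator are assumptions (some CRM fields may not accept newlines, so the separator is a parameter). Whole lines are packed in order, and the function raises an error rather than silently truncating the address.

```python
from typing import Dict, List


def pack_lines(lines: List[str], fields: List[str], max_len: int,
               sep: str = "\n") -> Dict[str, str]:
    """Pack whole address lines, in order, into as few fixed-length fields as possible.

    Raises ValueError instead of silently truncating when the address cannot fit.
    Assumes `fields` is non-empty.
    """
    packed: Dict[str, str] = {name: "" for name in fields}
    i = 0  # index of the field currently being filled
    for line in lines:
        if len(line) > max_len:
            raise ValueError(f"line longer than {max_len} characters: {line!r}")
        current = packed[fields[i]]
        candidate = line if not current else current + sep + line
        if len(candidate) <= max_len:
            packed[fields[i]] = candidate
            continue
        # The line does not fit in the current field; move on to the next one.
        i += 1
        if i == len(fields):
            raise ValueError("address does not fit in the available fields")
        packed[fields[i]] = line
    return packed


uk_lines = ["9 Gorse View", "School Road", "Knodishall, Saxmundham",
            "IP17 1TS", "UNITED KINGDOM"]
print(pack_lines(uk_lines, ["Address1", "Address2", "Address3"], max_len=40))
```

With a 40-character limit, the English example spills over into three fields while keeping every line intact and in order. The country, postal code, locality, and administrative area would then also be copied into their own fields where the schema provides them.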

For those implementing their own design, look to the output specification of your validation solution. Most validation solutions provide a long list of address fields that covers the majority of widely used international address formats. It may seem cumbersome, but if you include all of your validation solution’s output fields in your own design, you minimize the risk of losing data during the mapping process. You might not consider it the best way to store international addresses, but unless you want to become an expert on the subject, it is certainly easier to build on an existing design.
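As a rough Python sketch of that approach, the schema below has one optional attribute per output field plus the complete formatted address. The attribute names are hypothetical stand-ins for whatever your validation solution actually returns, and the `extra` dictionary is an added safeguard (an assumption, not part of any particular product) that keeps any output field the schema does not yet have a column for.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, Optional


@dataclass
class StoredAddress:
    # Hypothetical attribute names mirroring a validation solution's output.
    formatted_address: str  # the complete address, kept alongside the parts
    country: str
    premise_number: Optional[str] = None
    dependent_street_name: Optional[str] = None
    street_name: Optional[str] = None
    double_dependent_locality: Optional[str] = None
    dependent_locality: Optional[str] = None
    locality: Optional[str] = None
    administrative_area: Optional[str] = None
    postal_code: Optional[str] = None
    # Any output field the schema has no column for yet, so nothing is dropped.
    extra: Dict[str, str] = field(default_factory=dict)


def from_output(output: Dict[str, str]) -> StoredAddress:
    """Copy known output fields into attributes and keep the rest in `extra`."""
    known = {f.name for f in fields(StoredAddress)} - {"extra"}
    kwargs = {k: v for k, v in output.items() if k in known}
    extra = {k: v for k, v in output.items() if k not in known}
    return StoredAddress(**kwargs, extra=extra)


record = from_output({
    "formatted_address": "9 Gorse View\nSchool Road\nKnodishall, Saxmundham\nIP17 1TS\nUNITED KINGDOM",
    "country": "UNITED KINGDOM",
    "locality": "Saxmundham",
    "postal_code": "IP17 1TS",
    "unlisted_field": "example",  # stands in for a field the schema lacks
})
print(record.locality, record.extra)
```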