Some things seem simple on the surface, but aren’t so easy in reality – for example, programming your DVR, or building assemble-it-yourself furniture. In the world of contact data quality, we would add one more item to the list: removing duplicate addresses from your database.
Why is this? Because in many cases, the exact same delivery location can be described in multiple ways and formats. Some of them are quirks of geography: for example, a location that can be described as part of different municipality levels, or a rural route location that also has a valid street address. Others are victims of syntax, such as having different ways of listing a suite or office number. Some can be caught by the human eye, but not easily by a computer. Still others would confuse anyone.
This article will look at many of the ways that duplicate addresses can slip by in your database – and some ways you can fix this, with a little automation. Let’s dive in.
Spotting Duplicate Addresses
If you were to look at the following address examples, you would easily identify them as being the same.
Example 1A:
27 East Cota Street Suite 500
Santa Barbara, CA 93101
Example 1B:
27 E Cota St #500
Santa Barbara, CA 93101
After all, the only difference between the two is that example 1B is abbreviated and example 1A is not. To a computer, however, the two addresses are distinctly different strings, and they need to be standardized before software can recognize them as the same.
An automated tool that standardizes addresses according to USPS or other guidelines, like our DOTS Address Validation products, is a great solution for these scenarios. Here is how both of these addresses would look after being standardized:
27 E Cota St Ste 500
Santa Barbara, CA 93101-7602
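To make the point about string comparison concrete, here is a deliberately naive Python sketch. It is not how DOTS Address Validation works; the tiny abbreviation table, the `naive_normalize` function and the rule that “#” means a suite are all assumptions made up for this illustration. Real standardization relies on postal reference data rather than a lookup table.

```python
import re

# Hypothetical, tiny abbreviation table for illustration only.
# Real USPS standardization covers far more cases than this.
ABBREVIATIONS = {
    "east": "e",
    "street": "st",
    "suite": "ste",
}


def naive_normalize(address: str) -> str:
    """Lowercase, abbreviate a few known words, and strip punctuation."""
    text = address.lower()
    # Assumption: treat "#500" as a suite number. In real data "#" can
    # also mean an apartment or unit, which is one reason this is naive.
    text = re.sub(r"#\s*(\d+)", r"ste \1", text)
    # Replace any remaining punctuation with spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


a = "27 East Cota Street Suite 500, Santa Barbara, CA 93101"
b = "27 E Cota St #500, Santa Barbara, CA 93101"

print(a == b)                                    # raw strings differ
print(naive_normalize(a) == naive_normalize(b))  # normalized forms match
```

A table like this breaks down quickly on real-world input, which is why standardizing against authoritative postal data is the safer route.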
How about this next example: do these addresses look the same to you?
Example 2A:
960 Embarcadero Del Norte
Isla Vista, CA 93117-5106
Example 2B:
960 Embarcadero Del Norte
Santa Barbara, CA 93117-5106
Example 2C:
960 Embarcadero Del Norte
Goleta, CA 93117-5106
Address examples 2A, 2B and 2C are all valid and USPS standardized, and they are all for the same mailing address. However, to a computer, they are still different strings. If you were maintaining a list of addresses and trying to remove duplicate addresses or prevent duplicates from being added, then the above examples would likely slip by unnoticed.
In this next example, would you have been able to guess that they are both for the same mailing address?
Example 3A:
RR 1 Box 1465
Bunch, OK 74931-5160
Example 3B:
90455 S 4687 Rd
Bunch, OK 74931-5160
In this case, a rural route address also has a street address equivalent. Here’s another example of duplicate addresses that would be tough to detect.
Example 4A:
10246 Spicewood Rd
Cadet, MO 63630-7211
Example 4B:
RT 2 Box 2730
Cadet, MO 63630-7211
If you relied on the address string alone to detect duplicates, you would have no way to catch those examples. Wouldn’t it be nice if there were some sort of simple ID code, or preferably an ID number, that you could use to identify addresses instead of the full address? Actually, there is a solution for this in some countries, in the form of a unique address ID (UID).
Where Do IDs Come From?
Unique Address IDs should ideally come from an authoritative source, such as a postal authority or municipality. Authorities such as municipalities are generally responsible for naming streets and addressing buildings, while postal authorities are responsible for delivering mail to these locations. Differing authorities will generally come up with different IDs to fit their specific needs, and it is unlikely that address IDs will be shared by both. For example, municipalities will generally be more concerned with where an address is physically located and its classification type, whereas a postal authority will focus more on mail delivery and carrier routes. Therefore, it is not uncommon for a mailing address to differ drastically from its corresponding physical address.
Going back to address examples 2A, 2B and 2C, we have three duplicate mailing addresses, but of the three, example 2A is the one that best describes the address’s geographic location. This is because the address is geographically located in the unincorporated community of Isla Vista; however, Isla Vista has no post office of its own and mail is likely served by post offices in the neighboring cities of Goleta and Santa Barbara. According to USPS, all three city names are acceptable and can be used equally. This is because USPS has assigned them the same address barcode.
Address Barcodes
Barcodes are often unique address identifiers. For US mailing addresses, USPS provides a Delivery Point Barcode, which is composed of the full nine-digit ZIP+4, the two-digit Delivery Point Code, and finally a checksum digit. The barcode digits can be used to identify duplicate mailing addresses. For the previously mentioned address examples, the barcode digits are as follows.
Example 1 Barcode: 931017602254
Example 2 Barcode: 931175106601
Example 3 Barcode: 749315160554
Example 4 Barcode: 636307211461
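As a rough illustration of that structure, the Python sketch below splits a 12-digit barcode string into its parts and checks the final digit. The names `DeliveryPointBarcode`, `parse_barcode_digits` and `has_valid_check_digit` are hypothetical, and the check assumes the usual USPS rule that the check digit brings the sum of all digits to a multiple of 10, which holds for all four examples listed here.

```python
from typing import NamedTuple


class DeliveryPointBarcode(NamedTuple):
    zip5: str            # five-digit ZIP Code
    plus4: str           # four-digit add-on
    delivery_point: str  # two-digit Delivery Point Code
    check_digit: str     # single checksum digit


def parse_barcode_digits(digits: str) -> DeliveryPointBarcode:
    """Split a 12-digit barcode string into its component fields."""
    if len(digits) != 12 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected 12 digits, got {digits!r}")
    return DeliveryPointBarcode(
        zip5=digits[:5],
        plus4=digits[5:9],
        delivery_point=digits[9:11],
        check_digit=digits[11],
    )


def has_valid_check_digit(digits: str) -> bool:
    """Assumed rule: all 12 digits must sum to a multiple of 10."""
    parse_barcode_digits(digits)  # raises ValueError on malformed input
    return sum(int(d) for d in digits) % 10 == 0


EXAMPLES = ["931017602254", "931175106601", "749315160554", "636307211461"]

for example in EXAMPLES:
    print(parse_barcode_digits(example), has_valid_check_digit(example))
```

A check like this only catches mistyped or truncated digits. It says nothing about whether the address behind the barcode actually exists.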
With the barcode digits, it doesn’t matter how different the various duplicate address strings look, since they all share the same barcode digits. Note that you can obtain these barcode digits as one of the outputs from DOTS Address Validation – US for US addresses.
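Once each record carries its barcode digits, finding duplicates becomes a simple grouping exercise. The Python sketch below assumes a hypothetical record layout (a dictionary with `address` and `barcode` keys) that has already been populated from a validation step; the addresses and barcode digits are the ones from the examples in this article.

```python
from collections import defaultdict

# Hypothetical record layout; the barcode values come from the examples above.
records = [
    {"address": "27 E Cota St Ste 500, Santa Barbara, CA 93101-7602", "barcode": "931017602254"},
    {"address": "960 Embarcadero Del Norte, Isla Vista, CA 93117-5106", "barcode": "931175106601"},
    {"address": "960 Embarcadero Del Norte, Santa Barbara, CA 93117-5106", "barcode": "931175106601"},
    {"address": "960 Embarcadero Del Norte, Goleta, CA 93117-5106", "barcode": "931175106601"},
    {"address": "RR 1 Box 1465, Bunch, OK 74931-5160", "barcode": "749315160554"},
    {"address": "90455 S 4687 Rd, Bunch, OK 74931-5160", "barcode": "749315160554"},
]


def group_by_barcode(rows):
    """Map each barcode to the list of records that share it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["barcode"]].append(row)
    return dict(groups)


groups = group_by_barcode(records)

# Keep only the barcodes that appear on more than one record.
duplicates = {code: rows for code, rows in groups.items() if len(rows) > 1}

for code, rows in duplicates.items():
    print(code)
    for row in rows:
        print("   ", row["address"])
```

The three Isla Vista variants and the two Bunch variants end up in the same groups even though their address strings never match.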
USPS is not the only postal authority to offer a unique ID. Australia, for example, has a Delivery Point Identifier (DPID) that can be used as a unique identifier for an Australian address. The DPID is generated and maintained by Australia Post. According to the Australia Post Data Guide, the DPID is defined as follows.
- The Delivery Point Identifier (DPID) is a randomly generated, unique 8-digit number, which is allocated for every new address added to the source address database. All DPIDs, for complete addresses, fall within the range of 30,000,000 to 99,999,999.
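Based only on the definition quoted above, a basic format check for a DPID might look like the following Python sketch. The function name is hypothetical and the sample values are made-up strings chosen to exercise the range, not real identifiers. Passing this check does not mean Australia Post has actually allocated the number.

```python
def looks_like_complete_address_dpid(value: str) -> bool:
    """Format check only: 8 ASCII digits in the range 30,000,000-99,999,999."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        return False
    return 30_000_000 <= int(value) <= 99_999_999


# Made-up values used purely to show the boundaries of the range.
for candidate in ["30000000", "99999999", "29999999", "1234567", "3000000a"]:
    print(candidate, looks_like_complete_address_dpid(candidate))
```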
Unfortunately, not every country has an authority that offers a unique address identifier. Sometimes delivery point data simply isn’t available, and identifiers are only available for some buildings, streets, communities and regions.
IDs Do Not Guarantee Uniqueness
Even when an authoritative source offers a unique delivery point identifier, this does not necessarily equate to uniqueness. For example, not all addresses are deliverable, and some communities rely on general delivery services where the recipient is required to pick up their mail at a post office. Since a general delivery address can serve more than one person or household, it would be dangerous to rely on its address ID alone to remove duplicates from a database in which addresses are associated with contacts.
It is also not uncommon for households in some rural areas to share a mailbox. According to USPS’s General Guidelines and Policies for Rural Delivery:
- On a rural route, more than one (1) family, but not more than five (5) families may use the same mailbox. A written notice of agreement signed by those who use such a box is filed with the postmaster at the delivery unit.
If more than one household is sharing a rural route mailbox then they will all share the same address barcode ID. So, while the address itself may be unique, it is not truly representative of the number of households behind it. This could prove problematic for businesses looking to use an address ID as a way to limit sales and promotions to a certain number of purchases or entries per household.
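One way to reduce this risk is to deduplicate contacts on a combination of the address ID and some household-level field, rather than on the address ID alone. The Python sketch below is only an illustration: the record layout, the family names and the choice of last name as a stand-in for “household” are all assumptions, and a last name is a crude proxy at best.

```python
def dedupe_contacts(contacts):
    """Keep the first contact seen for each (barcode, last name) pair."""
    seen = set()
    unique = []
    for contact in contacts:
        # Assumption: last name approximates a household at a shared address.
        key = (contact["barcode"], contact["last_name"].strip().casefold())
        if key not in seen:
            seen.add(key)
            unique.append(contact)
    return unique


# Hypothetical contacts sharing the rural route barcode from example 3.
contacts = [
    {"last_name": "Smith", "barcode": "749315160554"},
    {"last_name": "smith ", "barcode": "749315160554"},  # same household, messy entry
    {"last_name": "Jones", "barcode": "749315160554"},   # different household, same mailbox
]

unique_contacts = dedupe_contacts(contacts)
print(len(unique_contacts))  # two households remain, not one
```

Deduplicating on the barcode alone would have collapsed all three contacts into a single record and lost the second household.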
In some cases, there will be addresses that represent entities that send and receive large volumes of mail. These entities, such as universities, government agencies and some large corporations, will sometimes be assigned their own unique postal code. In the US these entities will be assigned a “unique ZIP+4” code. The French postal authority, La Poste, assigns CEDEX (Courrier d’Entreprise à Distribution Exceptionnelle) codes. In the UK, these codes are sometimes called “large user” codes and they are managed by Royal Mail. It is not uncommon for these large organizations to have their own mail department. Postal carriers are generally only responsible for delivering mail to these internal mail departments, and the large organization will handle delivery to the recipient.
Making Sense of Address IDs
Overall, when it comes to address IDs, it is important to keep a few things in mind.
- Make sure the address you are using accurately represents what you need it to in order to meet your business needs. For example, make sure you are not using a mailing address as a physical address, or vice versa.
- If an address ID is available then ensure that it is generated and maintained by the appropriate authority, such as a postal authority for mailing addresses.
- Depending on your needs, address IDs may not always be an appropriate method to ensure uniqueness.
- Authorities have full control over the address ID. An address ID may change at any time or become orphaned without warning.
If you’re feeling less sure now about what you need then don’t fret. Sometimes the more you learn about a subject the more confusing that subject becomes. Here at Service Objects, we pride ourselves on helping you find the right tool for the job and are here to help.