Introduction To Address Validation Version 4
Introducing Address Validation 4, the latest and most advanced member of our Address Validation family! Our recent blog highlighted the powerful capabilities of Address Validation 4, showing examples of how it enhances and corrects various types of addresses. You can check out that introduction here.
In this blog, we’re diving deeper into the service’s outputs, breaking down what each response means and key details to look for. We’ll explore the address response, the parsed input response, address notes, and the functionality of our unique rating system. Join us as we unpack how Address Validation 4 delivers precise, actionable address data for your applications.
Validation Is Still The Point
Ultimately, the goal of the service is to validate, standardize and return the best address our users can get. The service returns an object called “addresses” containing all of the same outputs that the original Address Validation services return (Address, City, State, Zip, County, Barcode Digits and so on) as well as the parsed parts of the validated address. “addresses” can contain one or more results. Ideally we return only one address, but there will be cases when a partial address matches multiple candidates and we can’t use alternative data to pick the right one.
We will focus more here on what makes Address Validation 4 different. Every Address Validation 4 response is given a status that users can key off of. “OK” means we were able to do something with the address. It may not be full validation, but we found at least some of the input data points valid (more on that later). “Ambiguous” is the case described above: we found more than one address that seems like a good fit for what the user intended and could not pick a single one. “NotFound” means we didn’t know anything at all about the address or its parts. We might still return a parse, but it’s highly unlikely to be anything good.
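As a rough illustration of keying off the status, here is a minimal Python sketch. It assumes the response has already been decoded into a dictionary, that the status lives under a top-level "status" key, and that each entry in "addresses" is itself a dictionary. Those key names and the sample data are assumptions for illustration, not the documented response layout.

```python
def pick_address(response: dict):
    """Return the single address to use, or None when there is no clear winner."""
    status = response.get("status")  # assumed key name
    candidates = response.get("addresses") or []
    # Only an "OK" response with exactly one candidate is treated as usable.
    if status == "OK" and len(candidates) == 1:
        return candidates[0]
    # "Ambiguous" (several candidates) and "NotFound" both fall through here.
    return None


# Made-up sample responses, not real service output.
ok_response = {"status": "OK", "addresses": [{"address": "123 Main St"}]}
ambiguous_response = {
    "status": "Ambiguous",
    "addresses": [{"address": "1 Elm St"}, {"address": "1 Elm Ave"}],
}
not_found_response = {"status": "NotFound", "addresses": []}

print(pick_address(ok_response))
print(pick_address(ambiguous_response))
print(pick_address(not_found_response))
```

An application would likely handle “Ambiguous” differently from “NotFound”, for example by showing the candidates to the user, but the split above is enough to show the idea.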
The returned address itself might be confirmed by the USPS, supplied by an alternative data source, or even be a partial match result. A result called “validationType” informs users what we found out about the address. “AddressGood” means we have high confidence that a good, clean address was given and returned. “PrimaryGood” means either that we were not able to confirm or find an apartment or suite, or that an alternative source was used to verify the address. These are lesser quality responses but generally still good. “BlockGood” means that the address itself was not verified but that nearby addresses in the same block are present and good. “StreetGood” means we know the street is a good one but do not know anything further about the address with certainty. “AreaGood” means the supporting address inputs (city, state and zip) are good but we can find no information about the address itself.
The rest of the outputs in the “addresses” object are very similar to our previous Address Validation services with the expected standardizations, address and area informational pieces. We have plenty of documentation on that. Behind the scenes there is a complex scoring system that helps us choose the best address and a rating field that helps indicate risk scoring. That is important enough that we will discuss it in a different section below.
Ability to Parse Addresses
Unless we are completely unable to parse an address (most commonly because it’s incomplete or does not appear to be an actual address), we should always be able to return a parsed result called “parsedInput”. This is separate from the “addresses” response, as it’s based on the original inputs before they are cleansed and standardized by the validation process. This is something our previous iterations of Address Validation did not do. Since the parse is of the raw inputs, a validated result at the street level or better will be the better result to use. But for an address that we either have no knowledge of, or only some high level knowledge of like the city, state or zip, the parsed input can give a usable result even if it is not validated.
Standardization is attempted but cannot be guaranteed when the primary data points are unknown. For addresses that are missing key elements a perfect parse might be impossible. Take this example:
1234 Coast Village Rd, CityA, StateB, ZipC.
If given as 1234 Coast Village, CityA, StateB, ZipC, a valid parse would be 1234 Coast Vlg, CityA, StateB, ZipC, since without “Rd” the word “Village” reads as the street suffix. The more advanced validation process that Address Validation 4 uses could very well validate the input and provide the proper result, 1234 Coast Village Rd. In this case the address response in “addresses” might provide a valid address that does not match the parsedInput.
“parsedInput” will still provide a clean, if not perfect, result that can be put into a database or CRM with appropriate risk indicators and used later for comparisons against other data points. So it may still be useful to users.
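One way to apply this in practice is to prefer a validated address at the street level or better and fall back to the parsed input otherwise. The Python sketch below assumes “parsedInput” sits at the top level of the decoded response next to “addresses”, and that each address carries a “validationType” field. The exact layout and the sample values are assumptions for illustration only.

```python
# validationType values treated here as "street level or better".
STREET_LEVEL_OR_BETTER = {"AddressGood", "PrimaryGood", "BlockGood", "StreetGood"}


def choose_record(response: dict):
    """Prefer a single validated address; otherwise fall back to the raw parse."""
    addresses = response.get("addresses") or []
    if len(addresses) == 1 and addresses[0].get("validationType") in STREET_LEVEL_OR_BETTER:
        return {"source": "validated", "data": addresses[0]}
    parsed = response.get("parsedInput")  # assumed top-level key
    if parsed:
        # Not validated: store it with an appropriate risk indicator.
        return {"source": "parsedInput", "data": parsed}
    return None


# Made-up sample responses, not real service output.
validated = {
    "addresses": [{"validationType": "AddressGood", "address": "1234 Coast Village Rd"}],
    "parsedInput": {"address": "1234 Coast Vlg"},
}
area_only = {
    "addresses": [{"validationType": "AreaGood"}],
    "parsedInput": {"address": "1234 Coast Vlg"},
}

print(choose_record(validated)["source"])
print(choose_record(area_only)["source"])
```

Keeping the "source" label alongside the data makes it easy to tell later which stored records were validated and which were only parsed.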
Informational Address Notes
“addressNotes” is a list of informational notes both good and bad that can be found within the “addresses” object. They can be helpful in determining why a rating was high or low, indicate risk factors with the address or give useful information about what we had to do to validate the address.
We won’t try to cover every informational note since there are many (and new ones will be added from time to time), but we will touch on the ones that are either interesting or new to Address Validation 4. For the full list see our Developers Guide <link to notes section of AV4 guide>.
“dpv” and “dpvDesc” indicate that an address has mail service by the USPS. This is a very strong, authoritative indication that the address is good, but if it’s not present that does not mean the address is bad, just that it may not receive mail delivery.
“corroborations” indicates agreement by multiple data sources. When the USPS does not know about the address, it’s ideal if multiple other datasets agree that it’s good.
“ambiguityResolvedByAlternativeSource”, “secondaryAddedByAlternativeSource” and “secondaryChangedByAlternativeSource” are important notes. They indicate that an alternative source improved on the original address, which strengthens the result when combined with a good USPS match. For example, perhaps the input leads to multiple possible addresses but an alternative data source helps choose the correct one. Or the input address is missing an apartment or suite number, leading to a lesser result, and an alternative source is able to improve that to a better result.
A number of the addressNotes are tied to changes from the input address to the output address. Some examples of these are “preDirectionalAdded”, “postDirectionalRemoved” and “streetNameChanged”. These can be used to track what changed or potentially more importantly how much changed. Tracking too many changes from input to output could be a risk factor.
The rest of the notes are more informational about the address. “IsFreightForwarderLocation”, “IsPrisonLocation”, “IsHotelLocation”, “IsVacantLocation” and similar notes are good indicators of an address that would be higher risk to work with.
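To show how the notes could feed a simple risk check, here is a small Python sketch that separates change notes from risk flags. It assumes “addressNotes” arrives as a plain list of note names on each address, which is a guess at the format, and it only knows the handful of note names mentioned in this post rather than the full list.

```python
# Only the note names mentioned in this post; the full list is longer.
CHANGE_NOTES = {"preDirectionalAdded", "postDirectionalRemoved", "streetNameChanged"}
RISK_NOTES = {
    "IsFreightForwarderLocation",
    "IsPrisonLocation",
    "IsHotelLocation",
    "IsVacantLocation",
}


def summarize_notes(address: dict) -> dict:
    """Group an address's notes into input-to-output changes and risk flags."""
    notes = set(address.get("addressNotes") or [])  # assumed: list of strings
    return {
        "changes": sorted(notes & CHANGE_NOTES),
        "risk_flags": sorted(notes & RISK_NOTES),
    }


# Made-up sample address, not real service output.
example = {"addressNotes": ["streetNameChanged", "preDirectionalAdded", "IsVacantLocation"]}
summary = summarize_notes(example)
print(summary)
```

The length of "changes" gives a quick measure of how much the input had to change, and a non-empty "risk_flags" list marks an address worth a closer look.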
Rating Gives an Insight to the Risks
Behind the scenes of the validation for Address Validation 4 there is a complex scoring system that helps us choose the right address. Because we are aggregating many datasets together, there is the possibility of matching to multiple addresses that we then have to choose between. It’s extremely important that we pick the correct address every time. To our users, only one address is ever returned (except in the case of a truly ambiguous result) and this scoring is hidden from view. Evidence of the scoring can be found throughout the response, however, in things like address notes that indicate when an alternative dataset address was chosen or what data points had to change to make the address work.
The result of the internal scoring system is also evident in the “rating” response found in “addresses”. “rating” is a 0-100 score that combines the quality of the output address as an address with the risk factors of the address itself. The closer to 100 the rating gets, the more reliable we think the result is. We also strive for the rating, when viewed as a percentage, to indicate the likelihood that the address is good. This rating should give users a good idea of what we think of the address as a physical location. BlockGood, StreetGood and AreaGood responses will always get a 0 rating because they never point to a specific physical location. For what they are, however, those results can be relied upon with high certainty. If we say a block is good, it’s good.
When we evaluate a physical location, the source of the address factors into the rating, because the datasets used behind the scenes vary in quality, from highly authoritative USPS datasets to lower quality mapping databases. Additionally, agreement among multiple alternative datasets will bump up the rating, especially in cases where lesser datasets agree. If changes to the input address were needed to make it work, that can decrease the rating. Too many changes from input to output should indicate a higher risk address. Other clear risk factors, like vacant, freight forwarder or prison addresses, would likewise reduce the rating. In the future we may consider splitting the rating into three things: the rating as is, a pure “is it a good physical location” rating, and a risk score that factors in any high risk factors separately from the data quality.
When evaluating a response from Address Validation 4, users can take into account validationType, rating and DPV values to determine how they want to use the result.
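As one hypothetical way to combine the three signals, the Python sketch below sorts a result into “accept”, “review”, “partial” or “reject”. The policy, the bucket names and the threshold of 80 are all assumptions chosen for illustration, not recommendations from the service. The caller is expected to work out whether DPV is present, since this post does not spell out the format of the “dpv” value.

```python
def triage(validation_type: str, rating: int, has_dpv: bool, min_rating: int = 80) -> str:
    """Sort a result into a handling bucket (hypothetical policy and threshold)."""
    if validation_type in ("AddressGood", "PrimaryGood"):
        # A full address: accept only with USPS delivery and a high rating.
        if has_dpv and rating >= min_rating:
            return "accept"
        # Missing DPV does not mean the address is bad, so review instead of rejecting.
        return "review"
    if validation_type in ("BlockGood", "StreetGood", "AreaGood"):
        # Reliable for what they are, but not a specific physical location.
        return "partial"
    return "reject"


print(triage("AddressGood", 95, True))
print(triage("PrimaryGood", 60, False))
print(triage("StreetGood", 0, False))
```

Each application will want its own threshold and buckets. A shipping workflow might insist on DPV, for instance, while a lead form might accept anything that reaches “PrimaryGood”.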
Address Validation 4 is a powerful new member of our Address Validation family. If you would like to learn more, contact us.