We are currently doing some work on creating rule patterns that enable us to automagically find duplicate Concept values and create a master view of them.
For example, creating a master view of Customers or a master list of Products.
We have completed a McSpikey (an AgileData.io research spike) for this and worked out that to achieve any real level of accuracy we will need a multi-step matching process. One of the first steps in this process is to use a soundex-style (phonetic) function to identify the values that match easily. The McSpikey identified the NLP Double Metaphone algorithm as the best soundex option for this.
To this end, Nigel plumbed in a new NLP rule type that uses the Double Metaphone algorithm to populate a new detail record with the soundex value. We can then use that value to identify and flag any close matches.
An example of the rule attribute required to apply this rule type to the FirstName field would be:
double_metaphone(FirstName) : first_name
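To make the idea of matching on a phonetic key concrete, here is a minimal Python sketch. It is not the AgileData.io rule type: Double Metaphone needs a third-party library, so classic American Soundex stands in for it here, and the function names (soundex, flag_close_matches) are made up for illustration.

```python
from collections import defaultdict

# American Soundex digit for each consonant; vowels, y, h and w have no digit.
_CODES = {}
for _digit, _letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                         "4": "l", "5": "mn", "6": "r"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex key (a stand-in for Double Metaphone)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a digit
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated digit is kept
    return (key + "000")[:4]


def flag_close_matches(values):
    """Group values by phonetic key and keep only keys shared by 2+ values."""
    groups = defaultdict(list)
    for value in values:
        groups[soundex(value)].append(value)
    return {key: names for key, names in groups.items() if len(names) > 1}


matches = flag_close_matches(["Smith", "Smyth", "Jones", "Robert", "Rupert"])
# matches == {"S530": ["Smith", "Smyth"], "R163": ["Robert", "Rupert"]}
```

The shape is the same as what the rule type does: store the phonetic value alongside the record, then treat any records that share a value as candidate matches for the later, more careful steps.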
The first test run went fine, but the second field I tried gave us an error. The first field I picked was Surname, which had a value for every Customer. The second field I picked was Middle Name, and some Customers have no Middle Name. It seems the Double Metaphone does not like null values.
The quick fix was to wrap the field in a coalesce function that passes a blank string whenever the value is null, which keeps the old Double Metaphone happy. Something like:
double_metaphone(coalesce(MiddleName,'')) : middle_name
But as always we apply our “how easy does this make it for the user” lens, and having to add all that extra code was a less than magical experience.
So we made a quick change to the rule type. It now automagically applies the coalesce for the user in the background, and we are back to:
double_metaphone(MiddleName) : middle_name
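The general pattern of pushing the null handling into the rule type, rather than asking the user to write it, can be sketched in Python as a small wrapper. This is only an illustration under assumptions: null_safe and strict_encoder are hypothetical names, and strict_encoder is a toy stand-in that fails on a null the way the original rule type did, not the real Double Metaphone.

```python
from functools import wraps
from typing import Callable, Optional


def null_safe(encoder: Callable[[str], str]) -> Callable[[Optional[str]], str]:
    """Wrap an encoder so a null input is coalesced to '' before the call."""
    @wraps(encoder)
    def wrapper(value: Optional[str]) -> str:
        return encoder("" if value is None else value)
    return wrapper


def strict_encoder(value: str) -> str:
    # Hypothetical stand-in for the phonetic function: raises on None.
    return value.upper()[:4]


# The user-facing function is the wrapped one, so callers never see the coalesce.
encode = null_safe(strict_encoder)

middle_names = ["Marie", None, "Lee"]
encoded = [encode(name) for name in middle_names]
```

The user keeps calling one function with one argument, and the blank-string substitution happens in exactly one place, which is what makes this kind of change cheap to roll out.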
I remember attending a session at WebStock on Micro Interactions (I think it was this one by Dan Saffer). The session highlighted that every interaction the user has with your product affects their perception of how easy that product is to use.
The example given was that when you download and install an app on your Apple iPhone, an interactive pie chart displays the download and install progress. You get a feel for how far through the process you are, and a sense of movement that lets you know it is still doing something while you wait.
So while removing a few extra lines for a rule attribute may not seem to be a big product feature, removing every single step we can for the user and making it happen as if by magic is the way we will ensure AgileData.io lives up to its promise of “Simply Magical Data”.
A few other things spring to mind as I think about this scenario:
- we have to remember to continuously live and breathe our principle of making the complex simple for the user every time, it's the core of what we are about;
- we are able to iterate our code and release it damn quickly;
- the patterns we apply under the covers allow us to easily iterate on rule types with minimal effort.