Semantic Fingerprinting is like Geocoding
Geocoding:
"... is the process of assigning geographic identifiers (e.g., codes or geographic coordinates expressed as latitude-longitude) to map features and other data records, such as street addresses."
A geocoder accepts a document written in plain language, and identifies geographic features. Each of these features is converted into latitude and longitude, a unique, canonical representation that is perfectly suited to this kind of data.
Once a document has been geocoded, the computed coordinates can be used to display and manipulate the document as if it were a geographic feature. It can be placed on a map; a collection of documents can be searched by location, with clauses such as 'within 25 miles of Brookline, Massachusetts'. The value is obvious to anyone who has used a map of real estate listings, or searched for a used car within 25 miles. Imagine having to type in the name of every city within 25 miles of your house, and still not being sure that you hadn't missed one.
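To make the 'within 25 miles' clause concrete, here is a minimal sketch of a radius search over geocoded documents. The listing names are made up, the coordinates are rounded approximations, and the haversine (great-circle) distance is my choice for illustration, not a claim about how any particular geocoder works.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def miles_between(a, b):
    """Great-circle distance in miles between two (latitude, longitude) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def within(center, radius_miles, documents):
    """Names of the geocoded documents no farther than radius_miles from center."""
    return [name for name, coords in documents.items()
            if miles_between(center, coords) <= radius_miles]


# Approximate coordinates, for illustration only.
BROOKLINE = (42.33, -71.12)

# Hypothetical documents, already geocoded.
LISTINGS = {
    "condo in Boston": (42.36, -71.06),
    "house in Worcester": (42.26, -71.80),
    "loft in Providence": (41.82, -71.41),
}

nearby = within(BROOKLINE, 25, LISTINGS)
```

The query never mentions a city name. Once every document carries coordinates, the search is pure arithmetic.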
Yet, without semantic fingerprinting (or maybe we should call it Healthcoding), this is exactly how people are still searching for documents about health and medicine. Medical documents are full of rich terminology: "malaria", "bupropion", "randomized controlled trial". The semantic fingerprinter recognizes each of these concepts and translates it into a unique code. Of course, medical terms don't map onto coordinates, but they do have important natural relationships that, once understood, can be exploited to build powerful and intuitive user interfaces. My favorite example is the concept cloud; you can see two of them (Drugs and Diseases) at the bottom of the MyDailyApple news page. Here, current news is collected and fingerprinted. The medical concepts within the news articles are then aggregated according to their hierarchical relationships and ranked by importance. The result is a complete overview of current health news that fits into half a browser screen! Only fingerprinting makes this possible.
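The aggregation step behind a concept cloud can be sketched in a few lines. Everything here is an assumption for illustration: the tiny hierarchy is invented, and "importance" is simply the number of articles that mention a concept or any of its descendants, which is only a stand-in for whatever ranking the real page uses.

```python
from collections import Counter

# Hypothetical hierarchy: each concept points to its parent (None marks a top-level category).
PARENT = {
    "Drugs": None,
    "Diseases": None,
    "antidepressants": "Drugs",
    "bupropion": "antidepressants",
    "infectious diseases": "Diseases",
    "malaria": "infectious diseases",
    "influenza": "infectious diseases",
}


def ancestors_and_self(concept):
    """Yield the concept followed by each of its ancestors up to the top level."""
    node = concept
    while node is not None:
        yield node
        node = PARENT[node]


def roll_up(fingerprinted_articles):
    """Count, for every concept, the articles mentioning it or one of its descendants."""
    counts = Counter()
    for concepts in fingerprinted_articles:
        covered = set()
        for concept in concepts:
            covered.update(ancestors_and_self(concept))
        counts.update(covered)  # each article counts at most once per concept
    return counts


def cloud(counts, category):
    """Concepts below category, most frequently mentioned first (ties broken by name)."""
    items = [(c, n) for c, n in counts.items()
             if c != category and category in ancestors_and_self(c)]
    return sorted(items, key=lambda item: (-item[1], item[0]))


# Three fingerprinted news articles, each reduced to its set of concepts.
NEWS = [{"malaria"}, {"malaria", "bupropion"}, {"influenza"}]
disease_cloud = cloud(roll_up(NEWS), "Diseases")
```

Because the counts roll up the hierarchy, a broad concept like "infectious diseases" gets credit for articles that only mention malaria or influenza by name.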
Like geocoding, and unlike purely text-based methods, semantic fingerprinting is also very robust to synonyms. Just as a geocoder converts multiple representations of the same place to the same coordinates, the fingerprinter replaces medical synonyms like 'bupropion', 'wellbutrin', and 'zyban' with the same unique identifier. And as in geocoding, the algorithms that operate on the coded document deliver the same query results no matter which synonyms are used in the query and the content.
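Here is a toy illustration of that synonym behavior. The concept codes are made up, the documents are invented, and a real fingerprinter would draw on a curated medical vocabulary rather than a four-entry dictionary; this is only a sketch of the idea.

```python
import re

# Hypothetical concept codes, invented for this example.
SYNONYMS = {
    "bupropion": "C001",
    "wellbutrin": "C001",
    "zyban": "C001",
    "malaria": "C002",
}


def fingerprint(text):
    """Return the set of concept codes for the known terms found in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {SYNONYMS[word] for word in words if word in SYNONYMS}


def search(query, documents):
    """Titles of documents that share at least one concept code with the query."""
    wanted = fingerprint(query)
    return [title for title, body in documents.items() if wanted & fingerprint(body)]


# Hypothetical documents.
DOCS = {
    "smoking": "Zyban helped patients quit smoking.",
    "depression": "Wellbutrin is prescribed for depression.",
    "travel": "Malaria prevention for travelers.",
}

results = search("bupropion", DOCS)
```

Neither matching document contains the word 'bupropion', yet both are found, and searching for 'zyban' or 'wellbutrin' returns exactly the same list.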