
October 2007

Semantic Fingerprinting is like Geocoding

Geocoding:

"... is the process of assigning geographic identifiers (e.g., codes or geographic coordinates expressed as latitude-longitude) to map features and other data records, such as street addresses."

- Wikipedia

A geocoder accepts a document written in plain language, and identifies geographic features. Each of these features is converted into latitude and longitude, a unique, canonical representation that is perfectly suited to this kind of data.

Once a document has been geocoded, the computed coordinates can be used to display and manipulate the document as if it were a geographic feature. It can be placed on a map; a collection of documents can be searched by location, with clauses such as 'within 25 miles of Brookline, Massachusetts'. The value is obvious to anyone who has used a map of real estate listings, or searched for a used car within 25 miles. Imagine having to type in the name of every city within 25 miles of your house, and still not being sure that you didn't miss anything.
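To make the 'within 25 miles' idea concrete, here is a minimal Python sketch of a radius search over documents that have already been geocoded. The listings, their approximate coordinates, and the function names are all made up for illustration; a real geocoder and spatial index would look quite different.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(documents, center, radius_miles):
    """Return the documents whose coordinates lie within radius_miles of center."""
    lat0, lon0 = center
    return [
        doc for doc in documents
        if haversine_miles(lat0, lon0, doc["lat"], doc["lon"]) <= radius_miles
    ]


# Hypothetical geocoded listings (coordinates are approximate).
BROOKLINE = (42.33, -71.12)
listings = [
    {"title": "Condo in Boston", "lat": 42.36, "lon": -71.06},
    {"title": "House in Worcester", "lat": 42.26, "lon": -71.80},
    {"title": "Loft in Providence", "lat": 41.82, "lon": -71.41},
]

nearby = within_radius(listings, BROOKLINE, 25)
print([doc["title"] for doc in nearby])
```

The point is that the query never mentions a city name: once every document carries coordinates, 'nearby' is just arithmetic.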

Yet without semantic fingerprinting (or maybe we should call it Healthcoding), this is exactly how people are still searching for documents about health and medicine. Medical documents are full of rich terminology: "malaria", "bupropion", "randomized controlled trial". The semantic fingerprinter recognizes each of these concepts and translates it into a unique code. Of course, medical terms don't map into coordinates, but they do have important natural relationships that, once understood, can be exploited to build powerful and intuitive user interfaces. My favorite example is the concept cloud; you can see two of them (Drugs and Diseases) at the bottom of the MyDailyApple news page. Here, current news is collected and fingerprinted. The medical concepts within the news articles are then aggregated according to their hierarchical relationships, and then ranked by importance. The result is a complete overview of current health news that fits into half a browser screen! Only fingerprinting makes this possible.
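As a rough sketch of how a concept cloud might be assembled, the Python below rolls each article's concepts up a small hierarchy and ranks the result. Everything here is an assumption for illustration: the concept codes and the PARENT table are invented, and 'importance' is approximated by the number of articles that mention a concept or one of its descendants.

```python
from collections import Counter

# Hypothetical concept hierarchy: each code maps to its parent (None = top level).
PARENT = {
    "D:malaria": "D:infectious",
    "D:influenza": "D:infectious",
    "D:infectious": None,
    "D:hypertension": "D:cardiovascular",
    "D:cardiovascular": None,
}


def with_ancestors(code):
    """Return the concept itself plus every ancestor above it."""
    chain = []
    while code is not None:
        chain.append(code)
        code = PARENT.get(code)
    return chain


def concept_cloud(fingerprints, top_n):
    """Count articles per concept (including rolled-up ancestors) and rank them."""
    counts = Counter()
    for fingerprint in fingerprints:
        rolled_up = set()
        for code in fingerprint:
            rolled_up.update(with_ancestors(code))
        counts.update(rolled_up)  # each article counts once per concept
    # Highest count first; ties broken alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Three hypothetical news articles, already fingerprinted into concept codes.
articles = [
    {"D:malaria"},
    {"D:influenza", "D:hypertension"},
    {"D:malaria", "D:influenza"},
]

cloud = concept_cloud(articles, top_n=3)
print(cloud)
```

Because the counts flow up the hierarchy, a broad concept can rise to the top of the cloud even when no single article names it directly.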

Like geocoding, and unlike purely text-based methods, semantic fingerprinting is also very robust to synonyms. Just as a geocoder converts multiple representations of the same place to the same coordinates, the fingerprinter replaces medical synonyms like 'bupropion', 'wellbutrin', and 'zyban' with the same unique identifier. And as in geocoding, the algorithms that operate on the coded document deliver the same query results no matter which synonyms are used in the query and the content.
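Here is a toy Python sketch of that synonym handling. The SYNONYMS table and the identifiers in it are invented for the example, and the matching is plain single-word lookup; a real fingerprinter would use a proper medical vocabulary and handle multi-word terms.

```python
import re

# Hypothetical synonym table: surface term -> made-up concept identifier.
SYNONYMS = {
    "bupropion": "DRUG:0001",
    "wellbutrin": "DRUG:0001",
    "zyban": "DRUG:0001",
    "malaria": "DIS:0001",
}


def fingerprint(text):
    """Return the set of concept identifiers recognized in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {SYNONYMS[word] for word in words if word in SYNONYMS}


def matches(query, document):
    """True if the query and the document share at least one concept."""
    return bool(fingerprint(query) & fingerprint(document))


doc = "Zyban helped me quit smoking."
print(fingerprint(doc))
print(matches("wellbutrin side effects", doc))
print(matches("malaria prevention", doc))
```

The query and the content never share a word, yet they match, because both are compared in coded form.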

Comparing Semantic Fingerprinting to other search technologies

Today I came across this Venture Beat post about Twine, which provides an enumeration of technologies by which a computer can come closer to true understanding. I'll repeat that list here and include semantic fingerprinting. I'll go from most complex to most feasible.

  1. Natural language. This is the Hard AI problem of teaching a computer to read and understand written (or spoken) language. The computer may respond in a variety of different ways, but I think the most commonly discussed is 'question answering', in which you ask a question and the computer tells you the answer. The meme for this seems to be a system that can answer 'Who did Dick Cheney shoot?', and also respond to 'Who shot Dick Cheney?' (as of today, nobody has). Systems without 'understanding' tend to give the same answer to both questions. Ask Google and you'll see.
  2. Semantic search. This is something of a gray area because there aren't many great examples to point to. But it seems to be essentially an offshoot of the Semantic Web idea, in which people associate structured tags with text content. Semantic search technologies work more with hard facts than the web search engines of today do. However, they require additional effort from content publishers that may or may not ever happen (my money is on 'not'). Semantic search systems can do question answering without natural language understanding, because the data they are working on has been specially encoded for them by humans. It's really more like database technology.
  3. Statistical analysis and keyword search. Documents are treated simply as collections of words. Some attention is paid to word proximity, and a lot of attention is paid to headings and links, but generally speaking the search engine does not try to 'understand' anything. The power of this technique is that no complex modeling is required on the part of the search engine developer, and no burden is placed upon the content author either. Boolean search has proven to be immensely scalable, largely (as far as I can tell) because the system is restricted to accepting very simple input.

So where does that leave semantic fingerprinting? Semantic fingerprinting is like:

  • Natural language, in that the user input is accepted in the natural written or spoken form.
  • Semantic search, in that the fingerprint contains true concepts rather than opaque words.
  • Keyword search, in that it is based on boolean search technology.

It is unlike:

  • Natural language or semantic search, because the output of the system is an organized set of documents, rather than a specific answer to a question.
  • Keyword search, because the user input is not limited to keywords.
  • Semantic search, because it operates on documents rather than facts.
  • Statistical techniques, because a domain model is required.
  • Statistical techniques, because the search results are organized according to insight gained from the domain model.

As far as I know, semantic fingerprinting is uniquely suited to search-like problems in which the input is one or more complex statements (optionally combined with restrictive keywords), and the output is an organized collection of documents. Some application examples:

  • Presenting medical evidence to answer a complex medical question.
  • Showing medical content (e.g. guidelines, alerts) that is related to a patient electronic medical record.
  • Guiding keyword-based search results using a patient profile. A keyword or two may constrain the hit list, but the profile strongly influences the relevance ranking.
  • Delivering health news and alerts to a patient, based on their health profile.
  • Making social connections between patients, and between patients and providers, based on their health profiles and areas of expertise.
  • Automatically showing related content next to blog and discussion posts.
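The profile-guided search in the list above can be sketched in a few lines of Python. This is only an illustration under loose assumptions: the documents, their concept codes, and the patient profile are invented, the keyword filter is a plain substring test, and relevance is simply the number of concepts a document shares with the profile.

```python
def rank_by_profile(documents, keyword, profile):
    """Filter documents by keyword, then order the hits by overlap with the profile."""
    hits = [doc for doc in documents if keyword.lower() in doc["text"].lower()]

    def shared_concepts(doc):
        return len(doc["concepts"] & profile)

    # sorted() is stable, so hits with equal overlap keep their original order.
    return sorted(hits, key=shared_concepts, reverse=True)


# Hypothetical fingerprinted documents and a hypothetical patient profile.
documents = [
    {"text": "Exercise tips for everyone", "concepts": {"C:exercise"}},
    {"text": "Exercise with hypertension and diabetes",
     "concepts": {"C:exercise", "C:hypertension", "C:diabetes"}},
    {"text": "Diet and hypertension", "concepts": {"C:diet", "C:hypertension"}},
    {"text": "Exercise and blood pressure",
     "concepts": {"C:exercise", "C:hypertension"}},
]
profile = {"C:hypertension", "C:diabetes"}

for doc in rank_by_profile(documents, "exercise", profile):
    print(doc["text"])
```

The keyword decides which documents appear at all; the profile decides which of them come first.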

The Big Idea: Personalized Health Information

We've all seen the searches in television commercials: 'hypertension', 'diabetes', 'ankle sprain'. But if professional health information is a Ferrari, the typical search results from a simple keyword query are more like a shiny tricycle. The novelty quickly wears off; we know there's a bigger world of information out there, but how to access it?

Being a patient is a journey. There's a period of initial diagnosis, learning basic disease information, then achieving a deeper understanding, gaining experience with the personal aspects of the condition, sharing information and gaining support from peers, and eventually giving back to the community through mentoring or active involvement in patient support and disease advocacy. The journey is complex and personal; health information resources on the web should understand this and support it.

Personalization is the key. In order to go beyond the introductory 'What is Hypertension', a health site needs to learn about your personal health history, family health history, medical conditions, and treatments. From this basic, anonymous health profile, you can realize immediate benefits:

  • Personalized health news, selected from the best sources to match your profile.
  • Personalized search results, highlighting important factors such as connections between symptoms and conditions, side effects, drug interactions, and factors in your family history that may influence diagnosis, risk factors, and treatment.
  • Personal connections to other patients that share a similar profile.

You are now 'plugged in' to your condition, continuously updated with relevant information, able to delve deeper into background material and research, and connected to peers.

Next, you'll begin to communicate with your health provider. Personal health records are coming into being in many forms, and there are great benefits to integrating the physician's view of the patient with the patient's own health profile. Using the health profile, the patient sees information written for patients, with the ability to venture into more advanced material if they desire. From the physician's point of view, the patient health profile becomes more than a bookkeeping system; it's a window into the best, most relevant, most applicable guidelines and research that they can use to make real-time decisions.

Realizing this vision, a personalized health information system that bridges new patients, mentors, and health care providers, is not a simple task. It is not all about technology, although innovative technology is required. It is not all about discussions and social networking, although effective communication is required. The system we describe can only be built by connecting patients, providers, content developers, industry, and medical information systems, using new technology that truly understands the language and structure of medicine.

This complete solution, this 'Big Idea', is what we're working towards.
