Text Analysis: The Next Step in Search
In general, text analysis refers to the process of extracting interesting and non-trivial information and knowledge from unstructured text.
Information Visualization
Text analysis is often mentioned in the same sentence as information visualization, largely because visualization is one of the most practical tools for analyzing information once unstructured text has been given structure.
A common visualization approach is a "treemap," in which an archive is presented as a colored grid (see figure at left). The cells of the grid are color-coded and sized according to their interrelationships and content volume. This structure gives a quick visual indication of the areas that contain the most entities. A value can also be assigned to a certain type of entity, such as the size of an email or a file.
These visualization techniques are well suited to giving quick insight into large email collections. In addition to the structure that text analysis delivers, the attributes already available in each message, such as "sender," "recipient," "subject" and "date," can also be put to use.
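As a rough illustration of how a treemap sizes its cells, the sketch below totals email sizes per sender and divides a rectangle into strips whose areas are proportional to those totals. It is a minimal one-level "slice" layout, not the algorithm of any particular product; the sample data, the function names and the 100 x 50 canvas are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample data: (sender, size_in_bytes) for each email.
EMAILS = [
    ("alice@example.com", 4000),
    ("bob@example.com", 1000),
    ("alice@example.com", 2000),
    ("carol@example.com", 3000),
]

def total_size_by_sender(emails):
    """Sum the email sizes for each sender."""
    totals = defaultdict(int)
    for sender, size in emails:
        totals[sender] += size
    return dict(totals)

def slice_layout(totals, width, height):
    """Split a width x height rectangle into vertical strips whose areas
    are proportional to each label's total (a one-level "slice" treemap).

    Returns a list of (label, x, y, w, h) tuples, largest strip first.
    """
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # nothing to draw
    rects = []
    x = 0.0
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    for label, value in ordered:
        w = width * value / grand_total
        rects.append((label, x, 0.0, w, height))
        x += w
    return rects

rects = slice_layout(total_size_by_sender(EMAILS), width=100.0, height=50.0)
for label, x, y, w, h in rects:
    print(f"{label:20s} x={x:5.1f} w={w:5.1f} h={h:4.1f}")
```

Real treemap tools typically nest such layouts (for example, sender within department) and use color for a second attribute; this sketch shows only the proportional sizing step.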
Text analysis differs from traditional search in one key respect: search requires users to know what they are looking for, whereas text analysis attempts to discover patterns that are not known beforehand, using techniques such as pattern recognition, natural language processing and machine learning. By focusing on patterns and characteristics, text analysis can produce better search results and deeper data analysis, surfacing information that would otherwise remain hidden.
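The difference can be sketched in a few lines of Python. The keyword search below only finds documents containing a term the user already knows, while the pattern-based extraction finds email addresses and monetary amounts without being told their values. The documents, the two regular expressions and the function names are invented for illustration; production systems rely on far richer techniques than these simple patterns.

```python
import re

# Hypothetical documents, invented for illustration.
DOCS = [
    "Please wire $25,000 to the account before Friday.",
    "Lunch on Tuesday? Contact me at j.doe@example.com.",
    "The invoice for $1,200.50 was sent to billing@example.org.",
]

def keyword_search(docs, term):
    """Traditional search: the user must already know the term."""
    return [d for d in docs if term.lower() in d.lower()]

# Deliberately simple patterns; assumptions for this sketch only.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "amount": re.compile(r"\$\d[\d,]*(?:\.\d+)?"),
}

def extract_entities(docs):
    """Pattern-based extraction: find entities without knowing their values."""
    found = {name: [] for name in PATTERNS}
    for doc in docs:
        for name, pattern in PATTERNS.items():
            found[name].extend(pattern.findall(doc))
    return found

hits = keyword_search(DOCS, "invoice")
entities = extract_entities(DOCS)
print(len(hits), "document(s) match the known keyword")
print(entities)
```

The keyword query returns only what was asked for, whereas the extraction step yields a structured list of entities that can then be searched, counted or visualized.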
Text analysis is particularly valuable where users must discover new information, such as in criminal investigations, legal discovery and due-diligence investigations. Such investigations require 100% recall; i.e., users cannot afford to miss any relevant information. In contrast, a user who searches the Internet with a standard search engine for background information needs only some relevant results, as long as they are reliable. During due diligence, a lawyer wants to find all possible liabilities, not only the obvious ones.
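Recall has a standard definition: the fraction of all relevant documents that a search actually retrieves. The sketch below computes it, together with precision (the fraction of retrieved documents that are relevant), for two hypothetical result sets. The document IDs and the conventions for empty sets are assumptions made for the example.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were actually retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    return len(set(retrieved) & relevant) / len(relevant)

def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0  # convention chosen for this sketch
    return len(retrieved & set(relevant)) / len(retrieved)

# Hypothetical ground truth and two hypothetical result sets.
relevant = {"doc1", "doc2", "doc3", "doc4"}
web_style = ["doc1", "doc2"]  # a few good hits
review_style = ["doc1", "doc2", "doc3", "doc4", "doc7", "doc9"]  # everything, plus noise

for name, results in [("web search", web_style), ("legal review", review_style)]:
    print(f"{name}: recall={recall(results, relevant):.2f} "
          f"precision={precision(results, relevant):.2f}")
```

The first result set is perfectly precise but misses half of the relevant documents, which is acceptable for background research. The second finds everything at the cost of some irrelevant results, which is the trade-off an investigation or due-diligence review must accept.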