The way to the intelligent assistant

Posted on 01.08.2018 by Thomas Aurich

Machine Learning, Artificial Intelligence and Cognitive Search

With machine learning methods, trained systems can independently capture and categorize large amounts of data. The machine recognizes the content of a document and classifies it based on previous experience. The underlying vector mathematics also opens up additional application scenarios for search, such as searching foreign-language texts comprehensively without translating them.

Since the arrival of voice-based everyday helpers such as Amazon Alexa, Apple's Siri or Google Home, machines have increasingly been taking over tasks that used to cost us time, energy and effort. "Alexa, put dog food on my shopping list!" "Siri, find me a restaurant nearby!" "OK Google, play classical music!" With every input and every command, these devices learn more about us, and after a short time Google, Facebook and Co. know us almost better than we know ourselves.

While some of these developments may be viewed critically, in most cases they add value, and the benefits of machine learning will have a similar impact on our lives as the invention of the refrigerator or the telephone. Surviving without either is perfectly possible, but they make our everyday life much more pleasant.

Terms and technology

Before we turn to the digital assistant, some terms should be clarified. Although the buzzwords "artificial intelligence", "machine learning" and "neural networks" are fueled by the media and appear in many publications, there is still some uncertainty about what they actually mean. We should therefore define them before exploring the possibilities of cognitive search.

  • Artificial intelligence is the actual umbrella term and a branch of computer science. The aim of AI is the imitation of human behavior. Machines follow human decision patterns and are enabled by programming to make independent decisions.
  • Machine learning is itself a branch of artificial intelligence and stands for the methods and approaches used to implement the goals of AI. The main feature of ML is the training of machines (or computers) by means of examples. From these, the machine derives a pattern and, from a certain point on, can independently make appropriate decisions.
  • Cognitive search is a use case or specialization of machine learning and is mainly concerned with search and information retrieval based on neural networks.

Request our free whitepapers on the following topics for more information:

  • Deep Learning - the master key for document analysis
  • Be smart - Information Management! With AI methods and machine learning

The basis of all machine learning is a broad and growing data foundation. The better the pool of information, the more accurate the later results. Training data is selected from this pool, and a data scientist uses it to train the machine. The result is a model that, once it has reached a certain degree of maturity, is capable of making its own decisions. These are based on insights and assumptions derived from existing data. In the end, the model needs little or no human intervention, except when results deviate too far.
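
The cycle described above (select training data, train a model, let the model decide) can be sketched in a few lines. This is only an illustration under stated assumptions: the nearest-centroid method, the two-dimensional feature vectors and the labels "invoice" and "contract" are hypothetical and not taken from any specific product.

```python
import math
from collections import defaultdict


def train(examples):
    """Learn one centroid (mean vector) per label from (vector, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for vector, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        for i, value in enumerate(vector):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [component / counts[label] for component in total]
        for label, total in sums.items()
    }


def predict(model, vector):
    """Return the label whose centroid is closest to the given vector."""
    return min(model, key=lambda label: math.dist(model[label], vector))


# Hypothetical training data: (feature vector, label).
TRAINING_DATA = [
    ([1.0, 1.2], "invoice"),
    ([0.8, 1.0], "invoice"),
    ([4.0, 3.8], "contract"),
    ([4.2, 4.1], "contract"),
]

model = train(TRAINING_DATA)
print(predict(model, [1.1, 0.9]))
print(predict(model, [3.9, 4.0]))
```

The more labeled examples are available, the more stable the centroids become, which mirrors the point that a better pool of information leads to more accurate results.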


The process of machine learning

In text processing, machine learning methods such as word embeddings map terms into a multidimensional vector space. Each term receives a position within this space based on various parameters assigned to it (frequency, density, etc.) and can therefore be described mathematically. By means of vector calculations, relationships between terms can be described and new insights gained. Starting from the word shoe and a second term, for example, words such as ice skate or inline skates can be determined, because they lie between the two in the vector space.


Vectorized terms in a 2D space
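
The idea of finding a term that lies between two others in the vector space can be sketched as follows. The two-dimensional vectors and the chosen words (including "wheel" as the second term) are made up for illustration; real word embeddings are learned from large text corpora and typically have hundreds of dimensions.

```python
import math

# Hypothetical, hand-made 2D vectors; not real embeddings.
EMBEDDINGS = {
    "shoe": [1.0, 0.0],
    "wheel": [0.0, 1.0],
    "inline skates": [0.6, 0.6],
    "bread": [-1.0, -0.2],
}


def cosine(a, b):
    """Cosine similarity: 1.0 for the same direction, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def term_between(embeddings, first, second):
    """Return the term most similar to the midpoint of two given terms."""
    midpoint = [
        (x + y) / 2 for x, y in zip(embeddings[first], embeddings[second])
    ]
    candidates = [word for word in embeddings if word not in (first, second)]
    return max(candidates, key=lambda word: cosine(embeddings[word], midpoint))


print(term_between(EMBEDDINGS, "shoe", "wheel"))
```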

Working with text and language, however, involves a few pitfalls that until now could only be handled with considerable effort. The variety of language in terms of expression and meaning leads to problems when several words have the same meaning (synonyms) or one search word has several meanings (homonyms). Until now, such connections had to be established via lists and thesauri. Think, for example, of the German word "Gericht", which can mean a court of law but also a dish. Search engines have so far been unable to determine which meaning is intended.

Using word embeddings or word co-occurrence, two methods of machine learning, a context of meaning can be established through an associative network. The key question is in which contexts a word appears together with other words. The following example illustrates the idea with the German word "Flügel", which can mean both "wing" and "grand piano". Which meaning is intended can only be deduced from the context. If the word appears in a document alongside pedals, harpsichord or instruments, the musical instrument is meant. If, however, terms such as flies, humerus or birds occur, the organ of flight is meant.


The example of the word "Flügel" (wing or grand piano) as a homonym
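
A very reduced sketch of context-based disambiguation might look like this. It counts which words co-occur with each sense in a handful of hypothetical example contexts and then picks the sense whose profile overlaps most with a new document. The sense names and example contexts are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical example contexts per sense of the homonym.
SENSE_CONTEXTS = {
    "musical instrument": [
        ["pedals", "harpsichord", "instruments"],
        ["pedals", "keys", "concert"],
    ],
    "organ of flight": [
        ["flies", "humerus", "birds"],
        ["birds", "feathers", "flies"],
    ],
}


def build_profiles(sense_contexts):
    """Count how often each context word co-occurs with each sense."""
    profiles = {}
    for sense, contexts in sense_contexts.items():
        counts = Counter()
        for context in contexts:
            counts.update(context)
        profiles[sense] = counts
    return profiles


def disambiguate(profiles, tokens):
    """Pick the sense whose co-occurrence profile best matches the tokens."""
    def score(sense):
        return sum(profiles[sense][token] for token in tokens)
    return max(profiles, key=score)


profiles = build_profiles(SENSE_CONTEXTS)
print(disambiguate(profiles, "the concert flügel has three pedals".split()))
print(disambiguate(profiles, "birds move each flügel when it flies".split()))
```

If no context word matches at all, this sketch simply returns the first sense; a real system would need a fallback for that case.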

The added value for the search

What is the actual added value for a search function? For searching documents and texts, machine learning approaches are a clear gain. In particular, the classification of content in large amounts of data can now be left to a trained machine, which independently uncovers existing as well as new connections. In addition, documents can be tagged automatically based on their own content, which enriches the information.

The vectorization of words and their translation into mathematics provides another advantage: terms from different languages form similar patterns within the vector space.


Combined terms in a vector space


The German equivalent of the vectorized terms

So instead of translating content, the search can find similar patterns in the vector spaces of different languages.
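
As a rough sketch of this idea, assume that English and German terms have already been mapped into one shared vector space in which corresponding terms lie close together. The vectors, document names and texts below are invented for illustration; how such an alignment is produced in practice is outside the scope of this example.

```python
import math

# Hypothetical, hand-made vectors in one shared space for both languages.
SHARED_SPACE = {
    "car": [0.90, 0.10], "auto": [0.88, 0.12],
    "tire": [0.80, 0.30], "reifen": [0.79, 0.31],
    "bread": [0.10, 0.90], "brot": [0.12, 0.88],
    "bakery": [0.20, 0.95], "bäckerei": [0.21, 0.94],
}

# Hypothetical German documents.
GERMAN_DOCS = {
    "doc_reifen": "auto reifen",
    "doc_brot": "brot bäckerei",
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def embed(text):
    """Average the vectors of all known words; None if no word is known."""
    vectors = [SHARED_SPACE[w] for w in text.lower().split() if w in SHARED_SPACE]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def search(query, documents):
    """Rank document ids by similarity to the query, without any translation."""
    query_vector = embed(query)
    if query_vector is None:
        return []
    scored = []
    for doc_id, text in documents.items():
        doc_vector = embed(text)
        if doc_vector is not None:
            scored.append((cosine(query_vector, doc_vector), doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


print(search("car", GERMAN_DOCS))
```

The English query is never translated; it is only compared with the German documents by position in the shared space.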

The above examples also show another aspect of cognitive search. Terms in the vicinity of a search term can be used to extend the search to similar content. For example, if you search for a car, the search can extend the results to tires or trains.
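
Extending a query with neighboring terms could be sketched like this. The vectors are again invented for illustration, and the function name expand_query and the parameter k are assumptions rather than part of any particular search product.

```python
import math

# Hypothetical, hand-made 2D vectors; not real embeddings.
VECTORS = {
    "car": [0.90, 0.80],
    "tires": [0.85, 0.70],
    "trains": [0.80, 0.95],
    "banana": [0.10, 0.20],
    "contract": [0.20, 0.90],
}


def expand_query(term, k=2):
    """Return the search term followed by its k nearest neighbors."""
    origin = VECTORS[term]
    neighbors = sorted(
        (word for word in VECTORS if word != term),
        key=lambda word: math.dist(VECTORS[word], origin),
    )
    return [term] + neighbors[:k]


print(expand_query("car"))
```

Choosing k is a trade-off: a larger value finds more related content but also lets in terms that are only loosely connected to the original query.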

The methods of machine learning and especially of cognitive search open up new use cases:

  • Text classification: as part of a sorting and classification system, e-mails sent to a central address can be recognized and automatically assigned to the correct recipient. Organizations or companies with a constant and large volume of incoming data (insurance companies, banks, etc.) benefit from automated allocation.
  • Image recognition / face recognition: large image archives or media publishers using DAM systems still have to tag and categorize images manually. With machine learning methods, the content of image data can be recognized and assigned automatically. This also makes the images easier to find in search than before.
  • Conceptual search: this approach is used for large volumes of data whose contents need to be related to one another. Patent or contract searches are key use cases in which documents must be compared with and evaluated against existing ones. Capturing the content is very important here, and cognitive search offers a significant advantage.
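
To make the first use case more concrete, the following sketch routes incoming e-mails to a department. It assumes a simple multinomial Naive Bayes classifier with add-one smoothing, which is just one of many possible methods; the training e-mails and department names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled e-mails: (text, responsible department).
TRAINING_MAILS = [
    ("claim for water damage in my flat", "claims"),
    ("report a car accident claim", "claims"),
    ("please change my bank account details", "customer service"),
    ("update my address and account", "customer service"),
    ("invoice payment overdue reminder", "billing"),
    ("question about my invoice amount", "billing"),
]


def train_naive_bayes(labeled_texts):
    """Count words per label, documents per label, and collect the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for text, label in labeled_texts:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


mail_model = train_naive_bayes(TRAINING_MAILS)
print(classify(mail_model, "I want to report water damage"))
print(classify(mail_model, "my invoice is wrong"))
```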

This entry was posted in Corporate Blog and tagged "cognitive search", "machine learning", enterprise search, enterprise search solution, search engine.