A multi-lingual, multi-modal search and access system for biomedical information and documents.


Khresmoi stands for “Knowledge Helper for Medical and Other Information users”. It aims to automatically collect up-to-date medical information from various sources (journals, websites, books, images), make sense of this information, and make it quickly and easily available to the general public, doctors, and radiologists according to their level of expertise.

The system is multilingual and includes an image search so that, for example, radiologists can find X-ray images similar to those of a particular patient and then see what the diagnoses for these similar images were.


Khresmoi 2D image/article search system prototype

Khresmoi, medical information analysis and retrieval

Khresmoi 3D image search system prototype

Eye-Tracking of Radiology Medical Viewing

Components & resources

The following components are available:

Khresmoi semantic annotation pipeline

The KHRESMOI semantic annotation pipeline is available for processing users' own documents in the cloud.

Image Retrieval and Analysis Framework

The Image Retrieval and Analysis Framework will help radiologists compare images to reach a better diagnosis in less time. It can access external repositories and helps radiologists find relevant articles in the abundant scientific literature. The component combines several search techniques to provide an image information extraction system.

Adaptive user interface framework

The Adaptive User Interface Framework is a development tool set for building user interfaces for search systems. Its main benefit is that it makes it easy to develop usable tools for end users.

Machine Translation (MT) system dedicated to medical domain

This component offers multilingual techniques for indexing and search in the biomedical and other domains, with components for building cross-lingual services. It will support cross-language search, including multilingual queries, and return machine-translated documents or summaries when necessary. The main benefit for users is access to medical literature in their own language.

Biomedical Knowledge Info-structure

This component customizes and tailors knowledge base repositories in the healthcare domain and provides them with a knowledge infrastructure in which different types of data (public and private) are integrated. The main benefit is the reuse of existing datasets in web portals or in medical knowledge systems, such as those provided by hospital information system vendors.

Multilingual Spelling correction

In the medical field, many different terms can be used for the same disease or pathology, and this component is meant to address that problem. It provides spelling correction for different languages, including a language detection service, and proposes suggestions in the 14 supported languages. It is also used for annotating resources, and it helps with the problem of many synonyms in translation.
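As a rough illustration of how spelling suggestions against a domain vocabulary can work, the sketch below uses Python's standard `difflib` module. It is not the Khresmoi implementation: the `suggest` function, the tiny `VOCABULARY` list, and the similarity cutoff are all assumptions chosen for the example, and a real service would use per-language vocabularies and language detection.

```python
import difflib

# Hypothetical toy vocabulary; a real system would load one per language.
VOCABULARY = ["pneumonia", "pneumothorax", "bronchitis"]

def suggest(word, vocabulary=VOCABULARY, max_suggestions=3):
    """Return vocabulary terms similar to `word`, best match first."""
    # get_close_matches ranks candidates by difflib's similarity ratio
    # and drops those scoring below the cutoff.
    return difflib.get_close_matches(
        word.lower(), vocabulary, n=max_suggestions, cutoff=0.6
    )

if __name__ == "__main__":
    print(suggest("pneumonai"))
```

The cutoff trades recall for precision: a lower value returns more distant candidates, which may be useful for short or heavily misspelled queries.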

Semantically annotated corpus from the medical domain

This corpus consists of semantically annotated literature for search and retrieval. By semantic annotation, we mean the linkage of regions of text (annotations) to concepts in ontologies. Semantic annotation is used to facilitate more complex search and retrieval, to provide features for classification algorithms (such as trustability and readability), and to provide linkage to biomedical knowledge resources. Once manual annotation has been done, feedback loops should be put in place to improve automatic annotation models and pattern grammars.
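To make the idea of linking text regions to ontology concepts concrete, here is a minimal sketch of a dictionary-based annotator. It is only an illustration under stated assumptions: the `Annotation` class, the `annotate` function, and the `EX:` concept identifiers are hypothetical and do not come from the Khresmoi pipeline or from any real ontology.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: int        # character offset where the region begins
    end: int          # character offset just past the region
    text: str         # the annotated surface text
    concept_id: str   # identifier of the linked ontology concept

# Hypothetical mini-ontology mapping terms to made-up concept identifiers.
TOY_ONTOLOGY = {
    "pneumonia": "EX:0001",
    "lung": "EX:0002",
}

def annotate(text, ontology=TOY_ONTOLOGY):
    """Link whole-word occurrences of ontology terms to their concepts."""
    annotations = []
    for term, concept_id in ontology.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append(
                Annotation(match.start(), match.end(), match.group(0), concept_id)
            )
    # Return annotations in document order.
    return sorted(annotations, key=lambda a: a.start)

if __name__ == "__main__":
    for ann in annotate("Pneumonia affects the lung."):
        print(ann)
```

Storing character offsets alongside the concept identifier keeps the original text untouched, so manual corrections can later be compared against automatic annotations region by region.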

Annotated image data (anatomical location, pathology)

This is a dataset of medical images. The PACS dump in Vienna was collected between 2010 and 2011, and the lung CT data between 2006 and 2009.

Khresmoi collection of query logs Monolingual & Translated

A representative evaluation dataset of 1,508 search queries, manually translated from English into German, French, and Czech and thoroughly reviewed. It reflects both the medical domain and the genre of the texts to be translated, i.e., short search queries, and can be used for evaluating search engines.

The data set is now available through the LINDAT/Clarin repository and has already been used in a large-scale CLIR experiment.

Biomedical multilingual corpora

The biomedical corpora include:

  1. A comparable corpus of monolingual texts (CS and EN indexed biomedical data),
  2. Some parallel corpora (aligned data).

CLEFeHealth2013 Task 3 Evaluation Package

The CLEFeHealth 2013 Task 3 Evaluation Package contains data used for the user-centred health information retrieval shared task at the CLEF (Conference and Labs of the Evaluation Forum) eHealth Lab conducted in 2013, with an emphasis on multilingual and multimodal information.


There are 3 available KHRESMOI prototypes:

Khresmoi for Everyone

A search engine aimed at making medical information search easier for the broadest group of end users.

Khresmoi Professional

A search engine providing professional search features for medical professionals.


ParaDISE (Parallel Distributed Image Search Engine) is an image retrieval engine developed by the medGIFT research group as a replacement for the GIFT search engine in the context of the KHRESMOI project. The main concepts behind its design are scalability, flexibility, expandability and interoperability, allowing it to be used in standalone applications, in integrated systems and for research purposes.



