QuestMiner – Detection and evaluation of anomalies in graphs

An anomaly is generally defined as a deviation from the norm and from expected behavior. Such anomalies often indicate incidents and constellations that require immediate attention and reaction. In a social network, for example, an anomaly can indicate spontaneous gatherings such as demonstrations. Detecting them early is crucial for managing them. With regard to graph data, anomalies can be modeled as subgraphs whose nodes deviate significantly from the norm in their attribute values and edge distributions. In the case of dynamic graphs, historical states can also be taken into account.
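To make this modeling idea concrete, here is a minimal sketch of such a subgraph score, assuming a single numeric node attribute and using node degree as a stand-in for the edge distribution; the function name, attribute, and z-score measure are illustrative assumptions, not the method developed in this project:

```python
import statistics
import networkx as nx

def subgraph_anomaly_score(G: nx.Graph, candidate: set, attr: str = "value") -> float:
    """Score how strongly a candidate subgraph deviates from the graph-wide
    norm in its attribute values and edge distribution (here: node degree).
    Hypothetical z-score measure, for illustration only."""
    values = [G.nodes[n][attr] for n in G.nodes]
    degrees = [G.degree(n) for n in G.nodes]
    mu_v, sd_v = statistics.mean(values), statistics.stdev(values) or 1.0
    mu_d, sd_d = statistics.mean(degrees), statistics.stdev(degrees) or 1.0
    score = 0.0
    for n in candidate:
        # Deviation of the node's attribute value from the norm ...
        score += abs(G.nodes[n][attr] - mu_v) / sd_v
        # ... plus deviation of its degree from the typical edge count.
        score += abs(G.degree(n) - mu_d) / sd_d
    return score / len(candidate)  # average deviation per node
```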

Existing methods for the detection of anomalies in graph data are largely based on a statistical analysis of the graph. Some methods are capable of evaluating heterogeneous graphs, in which different node types are treated and evaluated differently. This is particularly advantageous for processing complex data structures with a variety of different entity types.

Other methods specialize in a uniform assessment of detected anomalies in order to better compare results among individual anomalies and estimate their severity. This is important for interpreting the results because it enables the user to compare a detected anomaly with other, previously evaluated anomalies. The user can thereby judge the severity of the anomaly and decide how to proceed.
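One simple way to achieve such a uniform assessment is to map each raw anomaly score to its percentile among previously evaluated anomalies; the following sketch is a hypothetical illustration of that idea (class and method names are assumptions, not part of the project description):

```python
from bisect import bisect_left, insort

class AnomalyRanker:
    """Maintain a sorted history of raw anomaly scores and express each new
    score as a percentile in [0, 1], so anomalies from different subgraphs
    and points in time can be compared on one scale. Illustrative only."""

    def __init__(self) -> None:
        self._history: list[float] = []

    def evaluate(self, raw_score: float) -> float:
        # Fraction of previously seen scores that this score exceeds.
        rank = bisect_left(self._history, raw_score)
        percentile = rank / len(self._history) if self._history else 0.5
        insort(self._history, raw_score)  # remember for future comparisons
        return percentile
```

A percentile-based scale has the advantage that it is independent of the units and magnitude of the underlying raw scores, which is what makes results comparable across anomalies.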

So far there is no method that combines both aspects: a method to uniformly evaluate detected anomalies in heterogeneous graphs. Both aspects, however, are necessary in many applications in order to achieve meaningful results. In addition, there are few procedures in the area of anomaly detection and evaluation that are designed for real-time operation. This is essential for the continuous monitoring of data sets, particularly when anomalies require a timely response.

The goal of the project is to develop a method for dynamic heterogeneous graphs that is able to continuously detect and evaluate anomalies. The procedure should be designed for real-time operation and be continuously supplied with a stream of new data. In addition, the method should be scalable and able to process large amounts of data. This requires not only an algorithm specifically adapted to the problem, but also supporting index and data structures that provide efficient access to historical data (see the sketch below). The applicability and practicability of the procedure will be assessed during the course of the project by means of a prototype implementation, for which we want to use the SDIL platform.
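As a hypothetical illustration of such a supporting data structure, the following sketch maintains a sliding window of timestamped edge events so that a detector operating on a stream can query recent history efficiently; all names and the event format are assumptions, not the project's actual design:

```python
from bisect import bisect_left
from collections import deque

class EdgeEventWindow:
    """Sliding window over timestamped edge events (t, u, v), kept in
    arrival order so that expired events can be evicted cheaply and
    time-range queries answered by binary search. Illustrative sketch."""

    def __init__(self, horizon: float) -> None:
        self.horizon = horizon  # how much history to retain
        self.events: deque = deque()

    def add(self, t: float, u: str, v: str) -> None:
        self.events.append((t, u, v))
        # Evict events that have fallen out of the retention horizon.
        while self.events and self.events[0][0] < t - self.horizon:
            self.events.popleft()

    def since(self, t0: float) -> list:
        # Binary search works because events arrive in time order.
        idx = bisect_left(self.events, (t0,))
        return list(self.events)[idx:]
```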

Data Innovation Community

Smart Infrastructure

Contact person

Simon Sudrich, KIT, sudrich@teco.edu

Project duration

December 2017 – April 2018