The SDIL Platform is a powerful in-memory computing infrastructure offered free of charge to research projects by our operating partners.

It is operated by the Steinbuch Centre for Computing (SCC) at KIT and offers state-of-the-art software and hardware.

This infrastructure includes:

SDIL Platform Overview

SAP HANA

SAP HANA is an in-memory data platform that allows customers to explore and analyze large amounts of data in real time, create flexible analytic models, and develop and deploy real-time applications. The SAP HANA in-memory appliance is available for use on the SDIL Platform.

In addition, we have installed the Application Function Library (AFL) on the HANA instances. These algorithms can be leveraged directly in development projects, speeding projects up by avoiding the need to write custom algorithms. AFL operations also offer very high performance, as AFL functions run in the core of the SAP HANA in-memory database. The AFL package includes (see the example after the list below):

  • The Predictive Analysis Library (PAL) is a set of functions in the AFL. It contains commonly used and parameterizable algorithms, primarily related to predictive analysis and data mining. It supports multiple algorithms, e.g., cluster analysis (k-means), association analysis, C4.5 decision trees, multiple linear regression, and exponential smoothing. Please refer to the official SAP HANA PAL user guide for further information (SAP HANA PAL Library Documentation).

  • The Business Function Library (BFL) is a set of functions in the AFL. It contains commonly used and parameterizable algorithms primarily related to the analysis of financial data. Please refer to the official SAP HANA BFL user guide for further information (SAP HANA BFL Library Documentation).
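
For illustration, here is a minimal sketch of calling a PAL algorithm (k-means) from Python via SAP's hana_ml client. The host name, credentials, table, and column names are placeholders, and the exact constructor signature may vary with the installed hana_ml version:

```python
# Sketch: running PAL k-means inside SAP HANA via the hana_ml client.
# Host, port, credentials, table, and columns are placeholders.
from hana_ml.dataframe import ConnectionContext
from hana_ml.algorithms.pal.clustering import KMeans

conn = ConnectionContext(address="hana.example.org", port=30015,
                         user="PROJECT_USER", password="secret")

# Reference a table already loaded into HANA; the data never leave the database.
df = conn.table("SENSOR_READINGS")

# k-means executes inside the HANA engine via PAL.
kmeans = KMeans(n_clusters=3, max_iter=100)
result = kmeans.fit_predict(df, key="ID", features=["TEMP", "PRESSURE"])
print(result.collect().head())  # fetch cluster assignments as a pandas frame
```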

System: SAP HANA
Cores: 320 (4 servers with 80 cores each)
RAM: 4 TB (1 TB per server)
Disk Space: 80 TB (20 TB per server)
Network: 10 Gbit/s Ethernet
Software:
  • SAP HANA Database System
  • Predictive Analysis Library
  • Business Function Library


Terracotta BigMemory Max

Terracotta BigMemory Max is an in-memory data management platform for real-time big data applications, developed by Software AG. It supports distributed in-memory data storage, enabling data to be shared among multiple caches and in-memory data stores on multiple machines. BigMemory Max uses a Terracotta Server Array to manage data shared by multiple application nodes in a cluster.

BigMemory Max is installed and available on the SDIL Platform. A single active Terracotta Server is configured and running on the platform. The server manages Terracotta clients, coordinates shared objects, and persists data. Terracotta clients run on the application servers alongside the applications being clustered by Terracotta. The data are held on the remote server, with a subset of recently used data held in each application node.

System: Software AG Terracotta
Cores: on request
RAM: on request
Disk Space: on request
Software:
  • BigMemory Max


IBM Open Platform with Hadoop and Spark

The SDIL Platform provides the IBM Open Platform and runs Apache Hadoop on a cluster of IBM Power8 nodes. This includes, for instance, Spark as a distributed processing engine, MapReduce, HBase, Hive, and Pig. To store the data, we use a central IBM Spectrum Scale cluster file system.
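
As an illustration of working with Spark on the cluster, here is a minimal PySpark sketch; the application name and input path are placeholders:

```python
# Sketch: a word count with the PySpark DataFrame API.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("sdil-example").getOrCreate()

# Read a text file from the shared cluster file system (path is a placeholder).
lines = spark.read.text("/projects/myproject/input.txt")

# Split each line into words and count the occurrences of each word.
words = lines.select(explode(split(lines.value, r"\s+")).alias("word"))
counts = words.groupBy("word").count().orderBy("count", ascending=False)
counts.show(10)

spark.stop()
```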

IBM SPSS Modeler

IBM SPSS Modeler is a data mining and text analytics software application from IBM. It provides a range of advanced algorithms and techniques, including text and entity analytics, decision management, and optimization, to build predictive models and conduct a wide range of data analysis tasks. (SPSS Modeler User Guide)

IBM SPSS Analytic Server

With our installation of IBM SPSS Analytic Server, we enable data analysis with SPSS Modeler on the IBM Open Platform and Apache Hadoop. This avoids transferring large amounts of data between systems, and users benefit from optimal performance for their analysis tasks.

System: IBM Watson Foundation Power 8
Cores: 140 (7 servers with 20 cores each)
RAM: 4 TB
Disk Space: 300 TB
Network: 40 Gbit/s Ethernet
Software:
  • IBM Open Platform with Hadoop/Spark
  • SPSS Modeler
  • SPSS Analytic Server
  • DB2 with BLU Acceleration


Huawei FusionInsight

FusionInsight provides a comprehensive Big Data software platform for batch and real-time analytics using open-source Hadoop and Spark technologies.

The system leverages HDFS, HBase, MapReduce, and YARN/ZooKeeper for Hadoop clustering, along with Apache Spark for faster real-time analytics and interactive queries. Solr adds powerful full-text search over rich-text documents (Word and PDF files), and rich APIs and development tools let you customize the system for specialized data analysis.

Huawei positions FusionInsight as an enterprise-class data analysis platform for extracting value from Big Data faster and more easily.
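
To illustrate the Solr full-text search mentioned above, here is a minimal sketch using the pysolr Python client; the collection URL, field names, and query are placeholders rather than the platform's actual endpoints:

```python
# Sketch: querying a Solr collection of indexed rich-text documents.
import pysolr

# URL of a hypothetical Solr collection on the cluster.
solr = pysolr.Solr("http://fusioninsight.example.org:8983/solr/documents",
                   timeout=10)

# Full-text query over indexed document content (field names are placeholders).
results = solr.search("content:energy", rows=5)
for doc in results:
    print(doc.get("id"), doc.get("title"))
```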

System: Huawei FusionInsight
Cores: 356 (16 servers)
RAM: 5 TB
Disk Space: 362 TB
Network: 10 Gbit/s Ethernet
Software:
  • Hadoop
  • Spark
  • Storm
  • Hive

Virtualization and Resource Allocation

HTCondor

To use the SDIL resources efficiently and to avoid interference between users, we employ the HTCondor batch system. HTCondor takes care of resource management and guarantees that users get exclusive access to the resources they request.

While a program runs, it consumes memory (RAM) and CPU. If many users run many programs at once, the total available memory might not be sufficient, and a program, or even the compute server itself, might crash. A batch system avoids such overload and crashes by managing resources centrally. Users specify which computing task they would like to perform and what resources the task requires. This so-called job is submitted to the batch system, which executes it as soon as the requested resources become available. Users can get an overview of their submitted and running jobs via an API, and can optionally be notified by email when a job has finished.
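
As an illustration, here is a minimal sketch of submitting and monitoring such a job with the HTCondor Python bindings, assuming they are available on the login machines; the executable, resource requests, and file names are placeholders:

```python
# Sketch: describing, submitting, and monitoring a batch job with HTCondor.
import htcondor

# Describe the task and the resources it requires (values are placeholders).
job = htcondor.Submit({
    "executable": "run_analysis.sh",
    "arguments": "--input data.csv",
    "request_cpus": "4",
    "request_memory": "8GB",
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
    "notification": "Complete",  # send an email when the job has finished
})

# Submit to the scheduler; the job runs once the resources become available.
schedd = htcondor.Schedd()
result = schedd.submit(job)
print("Submitted cluster", result.cluster())

# Overview of submitted and running jobs.
for ad in schedd.query(projection=["ClusterId", "JobStatus"]):
    print(ad.get("ClusterId"), ad.get("JobStatus"))
```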

System: HTCondor
Cores: 128 (32 × 4)
RAM: 1 TB
Network: 1 Gbit/s Ethernet
Software:
  • RapidMiner
  • Python
  • R
  • Matlab

SDIL Security and Data Protection

The SDIL platform is protected by several layers of firewalls. Access to the platform is only possible via dedicated login machines and only for users who have been approved beforehand in our identity management system. The hardware itself is operated in a segregated server room with a dedicated access control system.

All data processing takes place in compliance with German data protection rules and regulations. Data sources are only accessible if such access was expressly granted by the data provider in advance.

To protect against data loss, we regularly perform encrypted backups to our tape library. All data is deleted from the platform after the project has finished.

Project Data Storage

Once you have successfully registered for the SDIL service, you can upload and work with your data on the SDIL Platform. You can upload your data using the SFTP or SCP protocols (see the example below). All users get a dedicated private home directory for their files. For projects involving multiple users, a project directory is available that is only accessible to the project members.
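
For example, an upload over SFTP could look like the following sketch using the paramiko Python library; the host name, user name, and paths are placeholders:

```python
# Sketch: uploading a data set to the SDIL platform over SFTP with paramiko.
import paramiko

ssh = paramiko.SSHClient()
ssh.load_system_host_keys()  # trust hosts already known to the local machine
ssh.connect("login.sdil.example.org", username="myuser")  # placeholder host

# Open an SFTP session and upload the file into a project directory.
sftp = ssh.open_sftp()
sftp.put("local_data.csv", "/project/myproject/local_data.csv")
sftp.close()
ssh.close()
```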

To make it possible to restore unintentionally deleted files, we have introduced snapshots of the file system. These snapshots are available for both user and project directories.

Tutorials and Support

Please refer to our platform documentation for detailed information about using the platform. The tutorials provide examples of how to perform data analyses with our software: