The SDIL Platform is a powerful in-memory computing infrastructure that our operating partners offer free of charge to research projects.
It is operated by the Steinbuch Centre for Computing (SCC) at KIT and offers state-of-the-art software and hardware.
This infrastructure includes:
- SAP HANA in-memory platform,
- Software AG Terracotta in-memory data management system,
- IBM SPSS Modeler and Server, and the IBM InfoSphere BigInsights analytics platform,
- Virtualization and resource-sharing environment for custom analytics and open-source software.
SAP HANA
SAP HANA is an in-memory data platform that lets users explore and analyze large amounts of data in real time, build flexible analytic models, and develop and deploy real-time applications. The SAP HANA in-memory appliance is available for use on the SDIL Platform.
In addition, we have installed the Application Function Library (AFL) on the HANA instances. Its algorithms can be used directly in development projects, which speeds them up because no custom algorithms need to be written. AFL functions also offer very high performance because they run in the core of the SAP HANA in-memory database. The AFL package includes:
The Predictive Analysis Library (PAL) is a set of functions in the AFL. It contains commonly used, parameterizable algorithms, primarily for predictive analysis and data mining. Examples include cluster analysis (k-means), association analysis, C4.5 decision trees, multiple linear regression, and exponential smoothing. Please refer to the official SAP HANA PAL user guide for further information (SAP HANA PAL Library Documentation).
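To illustrate what a k-means clustering run computes, here is a minimal pure-Python sketch of Lloyd's k-means algorithm on 2-D points. It is not the PAL interface. Inside SAP HANA you would call the corresponding PAL procedure as documented in the PAL user guide. The function name `kmeans`, the iteration cap, and the choice of the first k points as initial centers are assumptions made only for this illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate assignment and center updates until stable.

    Illustrative sketch only; the first k points serve as initial centers.
    """
    centers = list(points[:k])
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centers.append(
                    (sum(x for x, _ in members) / len(members),
                     sum(y for _, y in members) / len(members))
                )
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_labels == labels and new_centers == centers:
            break  # converged: neither assignments nor centers changed
        labels, centers = new_labels, new_centers
    return centers, labels


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
    print(kmeans(data, k=2))
```

Taking the first k points as initial centers keeps the sketch deterministic. Production implementations usually use more robust initialization strategies.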
The Business Function Library (BFL) is a set of functions in the AFL. It contains commonly used, parameterizable algorithms, primarily for the analysis of financial data. Please refer to the official SAP HANA BFL user guide for further information (SAP HANA BFL Library Documentation).
Further information on SAP HANA:
- openSAP HANA learning material
- SAP HANA Blog
- SAP HANA Academy
- Official SAP HANA appliance user guides
Terracotta BigMemory Max
Terracotta BigMemory Max is an in-memory data management platform for real-time big data applications, developed by Software AG. It supports distributed in-memory data storage, which lets multiple caches and in-memory data stores on multiple machines share data. BigMemory Max uses a Terracotta Server Array to manage data that are shared by multiple application nodes in a cluster.
BigMemory Max is installed and available on the SDIL Platform. A single active Terracotta Server is configured and running on the platform. The server manages the Terracotta clients, coordinates shared objects, and persists data. Terracotta clients run on the application server alongside the applications that Terracotta clusters. The data are held on the remote server, and each application node keeps a subset of recently used data.
- System: Software AG Terracotta
- Cores: on request
- RAM: on request
- Disk Space: on request
- BigMemory Max
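The Terracotta setup above keeps the full data set on the server and a subset of recently used entries on each application node. That division of labor resembles a two-tier cache. The following Python sketch models the idea with a local least-recently-used (LRU) cache in front of a dictionary that stands in for the server.

This is a conceptual illustration only. The classes `RemoteStore` and `NodeCache` are hypothetical and are not part of the BigMemory Max API. Coherence between several nodes, which the Terracotta Server coordinates, is not modeled.

```python
from collections import OrderedDict
from typing import Any, Dict, Hashable


class RemoteStore:
    """Hypothetical stand-in for the Terracotta Server: holds the complete data set."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, Any] = {}
        self.reads = 0  # counts round trips to the "server"

    def get(self, key: Hashable) -> Any:
        self.reads += 1
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value


class NodeCache:
    """Hypothetical local tier on one application node: keeps recently used entries."""

    def __init__(self, remote: RemoteStore, capacity: int) -> None:
        self._remote = remote
        self._capacity = capacity
        self._local: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Any:
        if key in self._local:
            self._local.move_to_end(key)  # mark as most recently used
            return self._local[key]
        value = self._remote.get(key)  # local miss: fetch from the server
        self._remember(key, value)
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._remote.put(key, value)  # write through: the server stays authoritative
        self._remember(key, value)

    def _remember(self, key: Hashable, value: Any) -> None:
        self._local[key] = value
        self._local.move_to_end(key)
        if len(self._local) > self._capacity:
            self._local.popitem(last=False)  # evict the least recently used entry
```

With a capacity of 2, writing three keys evicts the oldest one locally. Reading it again then costs one round trip to the server, while reads of the two recent keys are served from the node.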
IBM Open Platform with Hadoop and Spark
The SDIL Platform provides the IBM Open Platform and runs Apache Hadoop on a cluster of IBM POWER8 nodes. The stack includes, for instance, Spark as a distributed processing engine, MapReduce, HBase, Hive, and Pig. Data are stored on a central IBM Spectrum Scale cluster file system.
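MapReduce, one of the processing models in this stack, splits a job into three phases:
- a map phase that emits key/value pairs,
- a shuffle that groups the pairs by key,
- a reduce phase that aggregates each group.

The single-process Python sketch below walks through those phases for the classic word count. It only illustrates the model. On the cluster, the same logic would run as a Hadoop MapReduce or Spark job distributed across nodes. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Aggregate all values collected for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["Spark and Hadoop", "Hadoop and Hive"]))
    # {'spark': 1, 'and': 2, 'hadoop': 2, 'hive': 1}
```

Because each map call looks at one line and each reduce call at one key, both phases can run in parallel on different nodes. Only the shuffle requires moving data between them.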
IBM SPSS Modeler
IBM SPSS Modeler is a data mining and text analytics application. It provides a range of advanced algorithms and techniques, including text and entity analytics, decision management, and optimization, for building predictive models and carrying out a wide range of data analysis tasks. (SPSS Modeler User Guide)
IBM SPSS Analytic Server
With our installation of the IBM SPSS Analytic Server, the SPSS Modeler can analyze data directly on the IBM Open Platform and Apache Hadoop. This avoids transferring large amounts of data between the systems, and users benefit from optimal performance for their analysis tasks.
Huawei FusionInsight
FusionInsight provides a comprehensive big data software platform for batch and real-time analytics using open-source Hadoop and Spark technologies.
The system uses HDFS, HBase, MapReduce, YARN, and ZooKeeper for Hadoop clustering, along with Apache Spark for faster real-time analytics and interactive queries. Solr adds full-text search over rich-text documents such as Word and PDF files, and APIs and development tools let you customize the system for specialized data analysis.
Huawei's enterprise-class FusionInsight platform is designed to make extracting value from big data faster and easier.
Virtualization and Resource Allocation
To use the SDIL resources efficiently and to avoid interference between users, we use the HTCondor batch system. HTCondor handles resource management and guarantees that users get exclusive access to the resources they request. A running program consumes memory (RAM) and CPU time until it finishes. If many users run many programs at once, the total available memory may not suffice, and the programs, or even the entire server, might crash. A batch system prevents such overloads and crashes by managing resources centrally.

To use it, you specify the computing task you want to perform and the resources it requires. This so-called job is submitted to the batch system, which executes it as soon as the requested resources become available. Users can get an overview of their submitted and running jobs via an API and can be notified by email when a job has finished.
- System: HTCondor
- Cores: 32 x 4 = 128
- RAM: 1 TB
- Network: 1 Gbit/s Ethernet
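An HTCondor job is described in a submit description file that names the program to run and the resources it needs. The file is handed to the batch system with `condor_submit`, and `condor_q` lists queued and running jobs. The Python sketch below assembles such a file.

The submit commands it uses (`executable`, `arguments`, `output`, `error`, `log`, `request_cpus`, `request_memory`, `notification`, `notify_user`, `queue`) are standard HTCondor commands. The helper `make_submit_description`, the script name, and the example values are hypothetical. Site-specific settings for the SDIL Platform may differ, so please check the platform documentation.

```python
from typing import Optional


def make_submit_description(
    executable: str,
    arguments: str,
    cpus: int,
    memory_mb: int,
    email: Optional[str] = None,
) -> str:
    """Build an HTCondor submit description for a single job (hypothetical helper).

    HTCondor interprets a request_memory value without a unit as megabytes.
    """
    if cpus < 1 or memory_mb < 1:
        raise ValueError("a job must request at least one CPU and some memory")
    lines = [
        f"executable = {executable}",
        f"arguments = {arguments}",
        # HTCondor expands $(Cluster) and $(Process) to the job's identifiers.
        "output = job.$(Cluster).$(Process).out",
        "error = job.$(Cluster).$(Process).err",
        "log = job.$(Cluster).log",
        f"request_cpus = {cpus}",
        f"request_memory = {memory_mb}",
    ]
    if email:
        # Ask HTCondor to send an email once the job has completed.
        lines.append("notification = Complete")
        lines.append(f"notify_user = {email}")
    lines.append("queue")  # queue one instance of this job
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Save the output as, e.g., analysis.sub and run: condor_submit analysis.sub
    print(make_submit_description("run_analysis.sh", "input.csv", cpus=4,
                                  memory_mb=16384, email="user@example.org"))
```

Stating `request_cpus` and `request_memory` explicitly is what lets the batch system hold a job until those resources are free, instead of letting it compete with other users' programs.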
SDIL Security and Data Protection
The SDIL platform is protected by several layers of firewalls. Access to the platform is possible only via dedicated login machines and only for users who have been approved beforehand in our identity management system. The hardware itself is operated in a segregated server room with a dedicated access control system.
All data processing complies with German data protection rules and regulations. Data sources are accessible only if the data provider has expressly granted such access in advance.
To protect against data loss, we frequently make encrypted backups to our tape library. All data are deleted from the platform once the project has finished.
Project Data Storage
Once you have successfully registered for the SDIL service, you may upload and work with your data on the SDIL Platform. You can upload your data using the SFTP or SCP protocols. Every user gets a dedicated private home directory. For projects involving multiple users, a project directory is available that is accessible only to the project members.
To make it possible to restore accidentally deleted files, we have introduced file system snapshots. These snapshots are available for both user and project directories.
Tutorials and Support
Please refer to our platform documentation for detailed information about using the platform. The tutorials provide examples of how to perform data analyses with our software: