Topics


Hauptseminar (advanced seminar), summer semester 2019

The suggested topics are listed below. The literature indicated here is intended as a starting point only; further literature should be researched independently.

1) Intel's Global Extensible Open Power Manager (GEOPM)

GEOPM is a framework for power and energy optimization in High Performance Computing (HPC). It dynamically coordinates hardware settings across the compute nodes used by a given parallel application, in response to the application's behavior and to requests from the resource management and scheduling system.

2) Parallel and Distributed Deep Learning

Deep Neural Networks (DNNs) are becoming an important vehicle for modern large-scale computing applications, while their training in most cases still requires a significant amount of time. The topic discusses the training problem and describes possible approaches for its parallelization. Final technical report

Ben-Nun, Tal, and Torsten Hoefler. "Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis." arXiv preprint arXiv:1802.09941 (2018).
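
One common parallelization approach for training is synchronous data parallelism: each worker computes a gradient on its own shard of the mini-batch, and the gradients are averaged before the shared weights are updated. The following is a minimal sketch under stated assumptions, not an excerpt from the cited survey. It uses a one-parameter model y = w * x in pure Python, simulates the workers sequentially, and all function names are hypothetical.

```python
# Toy illustration of synchronous data-parallel SGD (pure Python, no framework).
# Model: y = w * x, loss: mean squared error. All names are illustrative.

def gradient(w, shard):
    """Gradient of the mean squared error of y = w * x over one data shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)

def split(data, num_workers):
    """Split the batch into equally sized shards, one per simulated worker."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]

def data_parallel_step(w, data, num_workers, lr):
    """One SGD step: local gradients, then averaging (the 'all-reduce')."""
    local_grads = [gradient(w, shard) for shard in split(data, num_workers)]
    avg_grad = sum(local_grads) / num_workers
    return w - lr * avg_grad

def train(data, num_workers, steps=200, lr=0.01):
    """Run a fixed number of synchronous steps starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        w = data_parallel_step(w, data, num_workers, lr)
    return w

# Synthetic data generated from y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w_parallel = train(data, num_workers=4)
w_serial = train(data, num_workers=1)
```

Because the shards have equal size, the averaged gradient equals the full-batch gradient, so the four-worker run and the single-worker run reach the same weight up to floating-point rounding.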

3) Tensor Processing Units (TPUs) and their In-Datacenter Deployment Analysis

A Tensor Processing Unit (TPU) is an application-specific integrated circuit (ASIC) developed by Google for accelerating the inference phase of artificial neural networks. The topic describes these AI accelerators and discusses their in-datacenter performance.

Jouppi, Norman P., Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates et al. "In-datacenter performance analysis of a tensor processing unit." In Computer Architecture (ISCA), 2017 ACM/IEEE 44th Annual International Symposium on, pp. 1-12. IEEE, 2017.

4) Container technologies (e.g. Singularity, Docker, Charliecloud, Shifter)

Containers, as instances of Operating System (OS) level virtualization, are becoming more appealing because they are more efficient than full, hardware-level virtualization. The topic describes the concept of containers and discusses the differences among the main container technologies currently used in HPC, such as Singularity, Docker, Kubernetes, Charliecloud, and Shifter. Final technical report

5) Deep Neural Networks for Malware Detection in Executables

As the use of computing devices increases in everyday life, malware detection remains a serious challenge for corporations, governmental agencies, and individuals. Today's malware detection systems still rely heavily on heuristic and signature-based methods, where a signature represents a set of rules that are generally specific and thus usually fail to capture new malware. This topic discusses an alternative approach that uses neural networks and the raw bytes of the binary program itself to determine maliciousness without executing the target application. Final technical report

6) Apache Kafka: a distributed streaming platform

The topic discusses Apache Kafka, a streaming platform that allows applications to publish and subscribe to streams of data and that supports storing and processing those streams. Kafka is used by tens of thousands of organizations, is among the fastest growing open source projects, and is currently one of the key technologies for managing and processing streams of data. The topic provides a general overview of the system and describes Kafka's key concepts (writing data to and reading data from Kafka). Final technical report

Narkhede, Neha, Gwen Shapira, and Todd Palino. Kafka: The Definitive Guide: Real-time Data and Stream Processing at Scale. O'Reilly Media, Inc., 2017.
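
The publish/subscribe idea behind writing to and reading from Kafka can be illustrated with a toy append-only log. The sketch below is an in-memory model only and deliberately does not use the Kafka client API; the class ToyLog and its methods publish and poll are hypothetical names chosen for illustration. It shows two properties: records stay in the log after being read, and each consumer group tracks its own read offset.

```python
# Toy in-memory model of an append-only log with per-group offsets.
# This is NOT the Kafka client API; all names are hypothetical.
from collections import defaultdict

class ToyLog:
    def __init__(self):
        self._topics = defaultdict(list)   # topic name -> list of records
        self._offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def publish(self, topic, record):
        """Append a record to the topic and return its offset."""
        self._topics[topic].append(record)
        return len(self._topics[topic]) - 1

    def poll(self, group, topic, max_records=10):
        """Return unread records for a consumer group and advance its offset."""
        start = self._offsets[(group, topic)]
        batch = self._topics[topic][start:start + max_records]
        self._offsets[(group, topic)] = start + len(batch)
        return batch

log = ToyLog()
for event in ["login", "click", "logout"]:
    log.publish("events", event)

first_read = log.poll("analytics", "events", max_records=2)
second_read = log.poll("analytics", "events")
other_group = log.poll("billing", "events")
```

Here the "analytics" group reads the log in two batches, while the "billing" group still sees every record from the beginning, since reading does not remove records from the log.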

7) Streaming and batch processing with Apache Flink

According to a recent article from Forbes, around 90 percent of the data in the world today has been created in the last two years alone, and with the growth of Internet of Things (IoT) devices this number is expected to increase further. The topic discusses Apache Flink, an open source distributed stream processing framework developed by an open community and engineered to overcome certain tradeoffs that have limited the effectiveness and/or ease of use of other approaches to processing streaming and batch data. Final technical report

8) Contemporary HPC and the World's Fastest Supercomputer Summit

The topic discusses supercomputing today by analyzing the current TOP500 list and identifying the major trends. Additionally, it describes the Summit system, deployed at Oak Ridge National Laboratory and recognized as the world's fastest computer in June 2018 at the International Supercomputing Conference, after delivering 122.3 PFLOPS in the LINPACK run. Since then, the system has been upgraded to deliver 143.5 PFLOPS LINPACK performance, which keeps it the fastest supercomputer in the world (TOP500 rankings of November 2018). The topic discusses the system design aspects and describes its compute and cooling infrastructure. Final technical report

9) Bridges: Converging HPC, AI, and Big Data

The Bridges system, deployed at the Pittsburgh Supercomputing Center (PSC), is specifically designed to meet the requirements of both traditional and non-traditional HPC communities. The system features an interconnected set of interacting systems that offer exceptional flexibility for data analytics, simulation, workflows and gateways, leveraging interactivity, parallel computing, Spark and Hadoop. The topic describes the highly heterogeneous architecture of the Bridges system and outlines its design advantages. Final technical report

10) Can less complex processors enable HPC?

Currently, three architectural approaches are evolving for HPC. The first takes commodity processors and combines them with a high-performance interconnect to create a supercomputer. The second takes commodity processors and augments them with accelerators (e.g. GPUs) that boost the delivered FLOP rate, i.e. the performance of the system. This topic discusses the third category, which uses very lightweight processor cores that have very simple structures and are less complex than commodity ones. More specifically, the topic addresses the possibility of building HPC systems from ARM multicore chips and provides a detailed performance and energy efficiency evaluation. Final technical report

11) An empirical survey of performance and power variation of recent Intel processors

Traditional performance and energy/power efficiency analysis of HPC systems that are homogeneous in terms of installed hardware and system software stack assumes that the performance and power characteristics of the underlying compute nodes are homogeneous as well. As a result, processor power and performance variation has been relegated to the background during system and application evaluation and benchmarking. This topic considers the performance and power variation of HPC-grade Intel processors of recent generations.

Marathe, Aniruddha, Yijia Zhang, Grayson Blanks, Nirmal Kumbhare, Ghaleb Abdulla, and Barry Rountree. "An empirical survey of performance and energy efficiency variation on Intel processors." In Proceedings of the 5th International Workshop on Energy Efficient Supercomputing, p. 9. ACM, 2017.

12) Machine Learning Based Classification Over Encrypted Data

Today's HPC systems enable the application of Machine Learning (ML) based methods in various domains, ranging from face recognition to medical or genomics predictions, as they provide the required data storage and computational power. However, many of these ML based applications rely on sensitive data, making it important to protect the privacy of both the data and the underlying classifier. The topic discusses ML based classification techniques over encrypted data. Final technical report

13) Cross-Architectural Modelling of Power Consumption Using Neural Networks

Energy consumption is becoming a dominating factor in the Total Cost of Ownership of many supercomputers, making it important to keep energy costs within budget and to operate within the available capacities of power distribution and cooling systems. The topic considers the prediction of power consumption of HPC systems using artificial neural networks that take data obtained from hardware performance counters as input. The topic discusses the accuracy of the proposed methodology, which is portable across different micro-architecture implementations, and outlines its advantages over simpler, linear-regression-based counterparts.

Elisseev, Vadim V., Milos Puzovic, and Eun Kyung Lee. "A Study on Cross-Architectural Modelling of Power Consumption Using Neural Networks." Supercomputing Frontiers and Innovations 5, no. 4 (2018): 24-41.

14) Next generation arithmetic for HPC and AI

Posit arithmetic, a form of universal number (unum) computer arithmetic, is designed as a direct drop-in replacement for IEEE Standard 754 floating-point numbers (floats). Posits provide compelling advantages over floats, including larger dynamic range, higher accuracy, better closure, bitwise identical results across systems, simpler hardware, and simpler exception handling. Posits never overflow to infinity or underflow to zero, and 'Not-a-Number' (NaN) indicates an action instead of a bit pattern. The topic discusses posit arithmetic and outlines its advantages over the fixed-point arithmetic approaches currently used for AI and signal processing. Final technical report

Gustafson, John L., and Isaac T. Yonemoto. "Beating floating point at its own game: Posit arithmetic." Supercomputing Frontiers and Innovations 4, no. 2 (2017): 71-86.
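
To make the posit format more concrete, the sketch below decodes a posit bit pattern into a Python float. It assumes the layout described for posits with n total bits and es exponent bits: a sign bit, a variable-length regime (a run of identical bits ended by the opposite bit), up to es exponent bits, and the remaining bits as fraction, with value useed^regime * 2^exponent * (1 + fraction) and useed = 2^(2^es). The function name decode_posit is hypothetical, and the code is an illustration rather than a reference implementation.

```python
# Minimal sketch of posit decoding (illustrative; decode_posit is a hypothetical name).

def decode_posit(bits, n=8, es=0):
    """Decode an n-bit posit with es exponent bits into a Python float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        # The single exception pattern (treated as +/-infinity in the cited 2017 paper).
        return float("inf")
    sign = -1.0 if bits >> (n - 1) else 1.0
    if sign < 0:
        bits = (-bits) & mask              # negative posits: two's complement first
    body = bits & ((1 << (n - 1)) - 1)     # the n-1 bits after the sign bit

    # Regime: run of identical bits, terminated by the opposite bit (or end of word).
    pos = n - 2
    first = (body >> pos) & 1
    run = 0
    while pos >= 0 and ((body >> pos) & 1) == first:
        run += 1
        pos -= 1
    regime = run - 1 if first else -run
    pos -= 1                               # skip the terminating bit, if present

    # Exponent: up to es bits; bits cut off by the end of the word count as 0.
    exponent = 0
    for _ in range(es):
        exponent <<= 1
        if pos >= 0:
            exponent |= (body >> pos) & 1
            pos -= 1

    # Fraction: whatever bits remain, with an implicit leading 1.
    frac_bits = max(pos + 1, 0)
    fraction = body & ((1 << frac_bits) - 1)
    mantissa = 1.0 + fraction / (1 << frac_bits)

    # useed**regime * 2**exponent == 2**(regime * 2**es + exponent)
    return sign * 2.0 ** (regime * (1 << es) + exponent) * mantissa
```

For example, with n = 8 and es = 0 the pattern 0b01000000 decodes to 1.0 and 0b01111111 decodes to 64.0, the largest finite value in that format, which is why arithmetic in this format saturates at that value instead of overflowing to infinity.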
