This is a Keynote Lecture delivered at the Big Data Analytics Summer School 2020 by Dr. Gloria Bordogna, Italian National Research Council IREA.
This is a Lecture delivered at the Big Data Analytics Summer School 2020 by Dr. Simon Scerri, Fraunhofer IAIS.
This introductory lecture discusses the Big Data processing pipeline and the Big Data landscape from the following perspectives:
- Big Data Frameworks
- NoSQL Platforms and Knowledge Graphs
- Stream Processing Data Engines
- Big Data Preprocessing
- Big Data Analytics
- Big Data Visualization Tools.
Industry-scale knowledge graphs (KGs) were pioneered by Google and soon followed by large-scale KGs built by Facebook, Microsoft, and Amazon, as well as open KGs such as DBpedia and Wikidata. Driven by the growing interest in KGs and advanced AI-based services, companies and organizations across sectors are adopting KG technology. The technology has quickly reached industry, and big companies have started to build their own graphs, such as the industrial Knowledge Graph at Siemens.
Big data plays a key role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Semantic Web technologies have also experienced great progress, and scientific communities and practitioners have contributed to the problem of big data management with ontological models, controlled vocabularies, linked datasets, data models, query languages, and tools for transforming big data into knowledge on which decisions can be based.
Although each European government, together with its public administration services, can be treated as a big data ecosystem, interconnecting, integrating, and processing these data at the EU level remains a real challenge. Discussions of the public benefit of integrating and opening such data can be found in our previous work, where we examine the use of the Linked Data approach in European e-Government systems.
This module will cover the setup, APIs, and different layers of SANSA. At the end of this module, the audience will be able to execute examples and create programs that use the SANSA APIs. The final part of this lecture is planned as an interactive session to wrap up the introduced concepts and present attendees with some open research questions currently studied by the community.
This module will cover the needs and challenges of distributed analytics and then dive into the details of the Scalable Semantic Analytics Stack (SANSA), which is used to perform scalable analytics over knowledge graphs. It will cover the different SANSA layers and the underlying principles used to achieve scalability for knowledge graph processing.
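To give a flavor of the stack, the sketch below loads an RDF file into a distributed collection using SANSA's RDF layer on top of Spark. It is a minimal sketch: the file path is a placeholder, and the exact imports and implicit API may vary between SANSA versions.

```scala
import net.sansa_stack.rdf.spark.io._   // adds the rdf(...) reader to SparkSession
import org.apache.jena.riot.Lang
import org.apache.spark.sql.SparkSession

object SansaIntro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SANSA RDF example")
      .master("local[*]")               // local mode for experimentation
      .getOrCreate()

    // Read an N-Triples file into an RDD of Jena Triple objects,
    // partitioned across the cluster ("data.nt" is a placeholder path).
    val triples = spark.rdf(Lang.NTRIPLES)("data.nt")

    println(s"Number of triples: ${triples.count()}")
    triples.take(5).foreach(println)

    spark.stop()
  }
}
```

Running this requires a Spark installation and the SANSA RDF dependency on the classpath, so it is meant as orientation rather than a self-contained exercise.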
Please download from the following link.
At the practical level, Big Data frameworks provide different APIs for graph computation and graph processing. This lecture covers the most important libraries built on top of Apache Spark: Spark SQL, GraphX, and MLlib. The audience will learn to build scalable algorithms in Spark using Scala.
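As a small taste of one of these libraries, the sketch below registers an in-memory dataset as a temporary view and queries it with Spark SQL; the data and view name are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlTaste {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Spark SQL example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._            // enables toDF on Scala collections

    // Build a tiny DataFrame from a local Seq (illustrative data)
    val people = Seq(("Alice", 34), ("Bob", 28)).toDF("name", "age")

    // Expose it to SQL and run a declarative query over it
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 30").show()

    spark.stop()
  }
}
```

The same pattern scales unchanged from this local toy dataset to tables with billions of rows on a cluster, which is what makes the Spark SQL API attractive for Big Data work.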
Please download from the following link.
Processing frameworks are among the most essential components of a Big Data system. They fall into three categories: batch-only frameworks (Hadoop), stream-only frameworks (Storm, Samza), and hybrid frameworks (Spark, Hive, and Flink). In this lecture, we will introduce these categories and cover one of the major Big Data frameworks, Apache Spark. We will cover Spark fundamentals and the model of Resilient Distributed Datasets (RDDs), which Spark uses to implement in-memory batch computation.
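The RDD model can be illustrated with the classic word-count example: an RDD is created from an input source, transformed lazily, and only materialized when an action runs. This is a minimal sketch assuming a local Spark setup; the input path is a placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // textFile creates an RDD; flatMap/map/reduceByKey are lazy
    // transformations that build up a lineage graph, not results.
    val counts = sc.textFile("input.txt")       // placeholder path
      .flatMap(_.split("\\s+"))                 // split lines into words
      .map(word => (word, 1))                   // pair each word with a count
      .reduceByKey(_ + _)                       // aggregate counts per word
      .cache()                                  // keep the RDD in memory for reuse

    // take(10) is an action: only now does Spark execute the pipeline.
    counts.take(10).foreach(println)

    sc.stop()
  }
}
```

The `cache()` call highlights the in-memory aspect of the RDD model: once computed, the counts stay in executor memory, so subsequent actions reuse them instead of rereading the input.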