Survey on Big Data Tools

Posted on: Thu, 06/11/2020 - 09:53 By: valentina.janev

This introductory lecture discusses the Big Data processing pipeline and the Big Data landscape from the following perspectives:

  • Big Data Frameworks
  • NoSQL Platforms and Knowledge Graphs
  • Stream Processing Data Engines
  • Big Data Preprocessing
  • Big Data Analytics
  • Big Data Visualization Tools.

Reasoning on Financial Knowledge Graphs: The Case of Company Networks

Posted on: Fri, 06/05/2020 - 10:53 By: valentina.janev

Knowledge Graphs (KGs) were first released at industry scale by Google, followed by other large-scale KGs from Facebook, Microsoft and Amazon, as well as DBpedia, Wikidata and many more. Driven by the growing hype around KGs and advanced AI-based services, more and more companies and organizations are adopting KGs. The technology reached industry quickly, and big companies have started to build their own graphs, such as the industrial Knowledge Graph at Siemens.

Data Lakes and Federated Query Processing

Posted on: Wed, 04/01/2020 - 20:54 By: valentina.janev

Big data plays a relevant role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Semantic web technologies have also experienced great progress, and scientific communities and practitioners have contributed to the problem of big data management with ontological models, controlled vocabularies, linked datasets, data models, query languages, as well as tools for transforming big data into knowledge from which decisions can be made.

Open and Big Data – Utilization Perspective

Posted on: Wed, 04/01/2020 - 20:09 By: valentina.janev

Although each government in Europe, with its public administration services, can be treated as a big data ecosystem, interconnecting, integrating and processing the data at EU level remains a real challenge. Discussions of the public benefit of integrating and opening the data can be found in our previous work, where we examine the use of the Linked Data approach in European e-Government systems.

Distributed Semantic Analytics II

Posted on: Mon, 12/24/2018 - 16:22 By: valentina.janev

This module will cover the setup, APIs and different layers of SANSA. At the end of this module, the audience will be able to execute examples and create programs that use SANSA APIs. The final part of this lecture is planned as an interactive session to wrap up the introduced concepts and present attendees with some open research questions currently studied by the community.

Distributed Semantic Analytics I

Posted on: Mon, 12/24/2018 - 16:21 By: valentina.janev

This module will cover the needs and challenges of distributed analytics and then dive into the details of the Scalable Semantic Analytics Stack (SANSA), which is used to perform scalable analytics for knowledge graphs. It will cover the different SANSA layers and the underlying principles for achieving scalability in knowledge graph processing.

Please download from the following link.

Distributed Big Data Libraries

Posted on: Mon, 12/24/2018 - 16:21 By: valentina.janev

At the practical level, Big Data frameworks use different APIs for graph computation and graph processing. This lecture covers the important libraries built on top of Apache Spark, including SparkSQL, GraphX and MLlib. The audience will learn to build scalable algorithms in Spark using Scala.

Please download from the following link.

Distributed Big Data Frameworks

Posted on: Mon, 12/24/2018 - 16:20 By: valentina.janev

“Processing frameworks” are among the most essential components of a Big Data system. There are three categories of such frameworks: batch-only frameworks (Hadoop), stream-only frameworks (Storm, Samza), and hybrid frameworks (Spark, Hive and Flink). In this lecture, we will introduce them and cover one of the major Big Data frameworks, Apache Spark. We will cover Spark fundamentals and the model of “Resilient Distributed Datasets (RDDs)” that Spark uses to implement in-memory batch computation.
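
To give a feel for the RDD model before the lecture, here is a minimal single-process sketch in plain Python. It is not Spark's API: the class `ToyRDD` and its methods are hypothetical names chosen for illustration, and the sketch assumes only two ideas from the RDD model, namely that data is split into partitions and that transformations are recorded as lineage and evaluated lazily when an action is called.

```python
import functools
import math


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API):
    partitioned source data plus a recorded lineage of transformations."""

    def __init__(self, partitions, lineage=()):
        self._partitions = partitions    # source data, split into partitions
        self._lineage = tuple(lineage)   # pending ("map" | "filter", function) steps

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        """Split an in-memory collection into contiguous partitions."""
        data = list(data)
        size = max(1, math.ceil(len(data) / num_partitions))
        return cls([data[i:i + size] for i in range(0, len(data), size)])

    # Transformations: nothing is computed here, only the lineage grows.
    def map(self, fn):
        return ToyRDD(self._partitions, self._lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._partitions, self._lineage + (("filter", pred),))

    def _compute_partition(self, part):
        """Replay the lineage over one partition. Because the lineage is kept,
        a lost partition could be rebuilt the same way from its source data."""
        items = iter(part)
        for kind, fn in self._lineage:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    # Actions: these trigger the actual computation.
    def collect(self):
        return [x for part in self._partitions
                for x in self._compute_partition(part)]

    def reduce(self, fn):
        return functools.reduce(fn, self.collect())


numbers = ToyRDD.parallelize(range(1, 11), num_partitions=3)
squares_of_evens = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

print(squares_of_evens.collect())                   # [4, 16, 36, 64, 100]
print(squares_of_evens.reduce(lambda a, b: a + b))  # 220
```

In real Spark the partitions live on different machines and the replay happens in parallel; the sketch only illustrates why recording transformations, rather than materializing every intermediate result, makes recomputation of lost data possible.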
