This module will introduce the topic of Knowledge Graphs. We will cover what a Knowledge Graph is, the similarities and differences between “world” Knowledge Graphs and Enterprise Knowledge Graphs, as well as theory and practice in the area. In particular, we will discuss the Vadalog Knowledge Graph Management System developed at the University of Oxford.
This module will discuss data extraction for Knowledge Graphs, focusing on web data extraction. Web data extraction is essential for making information available on the web accessible and usable by Knowledge Graphs. We will provide a thorough introduction to the topic, featuring both Oxford’s Vadalog and OXPath systems.
This module will discuss reasoning in Knowledge Graphs. Reasoning is essential to gain value from Knowledge Graphs by deriving insights and making implicit data explicitly available from existing data. We will cover the theory and practice of reasoning in Knowledge Graphs, and provide a number of easily accessible examples based on Oxford’s Vadalog system.
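To give a flavour of what "deriving implicit data from existing data" means, consider the classic transitive-closure example: from parent facts, a recursive Datalog-style rule such as ancestor(X,Z) :- ancestor(X,Y), parent(Y,Z) derives all ancestor facts. The sketch below is not Vadalog syntax; it is a minimal plain-Scala fixpoint computation that illustrates the same idea, with made-up example facts.

```scala
object Reasoning {
  // Derive new pairs by joining existing pairs until no new facts appear
  // (a naive fixpoint, mirroring what a Datalog-style reasoner computes).
  def transitiveClosure(facts: Set[(String, String)]): Set[(String, String)] = {
    var derived = facts
    var changed = true
    while (changed) {
      // join: (a, b) and (b, c) derive the implicit fact (a, c)
      val inferred = for {
        (a, b)  <- derived
        (b2, c) <- derived
        if b == b2
      } yield (a, c)
      val updated = derived ++ inferred
      changed = updated.size != derived.size
      derived = updated
    }
    derived
  }

  def main(args: Array[String]): Unit = {
    // explicit facts: alice is a parent of bob, bob is a parent of carol
    val parents = Set("alice" -> "bob", "bob" -> "carol")
    // the reasoner additionally derives the implicit fact alice -> carol
    println(transitiveClosure(parents))
  }
}
```

A production reasoner like Vadalog evaluates such recursive rules far more efficiently and over far more expressive logics, but the input/output behaviour on this example is the same: one new, implicit fact is materialized.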
This lecture will cover existing advanced Big Data architectures, following a bottom-up approach. It will introduce the knowledge needed to design and architect scalable solutions for challenging problems. The primary components in the architecture of such systems will be presented and discussed, including, inter alia, distributed kernels and cluster managers, distributed file systems, and storage systems.
This lecture focuses on architecting Big Data solutions. We will discuss the role and importance of the components in realizing system architectures. The participants will be introduced to the unique problem characteristics that drive Big Data and the wide range of technology options available to address them. The application of the introduced concepts and components will be discussed through real-world examples of practical use cases.
Processing frameworks are among the most essential components of a Big Data system. There are three categories of such frameworks, namely: batch-only frameworks (Hadoop), stream-only frameworks (Storm, Samza), and hybrid frameworks (Spark, Hive and Flink). In this lecture, we will introduce them and cover one of the major Big Data frameworks, Apache Spark. We will cover Spark fundamentals and the model of Resilient Distributed Datasets (RDDs), which Spark uses to implement in-memory batch computation. Furthermore, essential practical techniques will be introduced, such as the Hadoop Distributed File System (HDFS) for data resiliency, the "lineage" property of the Directed Acyclic Graph (DAG) of transformations for computation resiliency, and the use of the Catalyst optimizer for code optimization.
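The RDD programming model is built around transformations such as map, flatMap and reduceByKey, whose API deliberately mirrors Scala's collection operations; the DAG of these transformations is what Spark records as lineage, so lost partitions can be recomputed. As a local, dependency-free sketch (plain Scala collections rather than a real SparkContext, and made-up input), the canonical word-count pipeline looks like this:

```scala
object WordCount {
  // The same flatMap / map / group-and-sum pipeline that, on an RDD,
  // would be written with flatMap, map and reduceByKey. Each step is a
  // transformation; in Spark the chain of such steps forms the lineage DAG.
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))          // split lines into words
      .filter(_.nonEmpty)                // drop empty tokens
      .map(word => (word, 1))            // pair each word with a count of 1
      .groupBy(_._1)                     // stand-in for reduceByKey's shuffle
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    println(count(Seq("to be or not to be")))
}
```

On a real cluster the only change is the data source and the entry point (for example sc.textFile(...) instead of an in-memory Seq); the transformation chain itself reads the same, which is a large part of Spark's appeal.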
At the practical level, Big Data frameworks offer different APIs for graph computation and graph processing. In this lecture, the important libraries built on top of Apache Spark will be covered. These include Spark SQL, GraphX and MLlib. The audience will learn to build scalable algorithms in Spark using Scala.
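As an indication of the kind of scalable graph algorithm expressible in GraphX's vertex-centric (Pregel-style) model, the sketch below runs a single PageRank-like power-iteration step over an edge list in plain Scala; the function name, toy graph and damping factor are illustrative assumptions, not GraphX API.

```scala
object MiniPageRank {
  // One iteration step: each vertex sends rank / outDegree to its
  // neighbours, then updates its rank from the incoming contributions.
  // In GraphX this message-passing pattern is what Pregel parallelizes.
  def step(edges: Seq[(String, String)],
           ranks: Map[String, Double],
           damping: Double = 0.85): Map[String, Double] = {
    val outDegree = edges.groupBy(_._1).map { case (v, es) => (v, es.size) }
    val contributions = edges.groupBy(_._2).map { case (v, incoming) =>
      (v, incoming.map { case (src, _) => ranks(src) / outDegree(src) }.sum)
    }
    ranks.map { case (v, _) =>
      (v, (1 - damping) + damping * contributions.getOrElse(v, 0.0))
    }
  }

  def main(args: Array[String]): Unit = {
    // a toy 3-cycle: every vertex has one in- and one out-edge
    val edges = Seq("a" -> "b", "b" -> "c", "c" -> "a")
    val ranks = Map("a" -> 1.0, "b" -> 1.0, "c" -> 1.0)
    println(step(edges, ranks))
  }
}
```

The point of libraries like GraphX is precisely that this per-vertex logic stays this small while the framework handles partitioning the edge list across a cluster and iterating to convergence.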
This module will cover the needs and challenges of distributed analytics, and then dive into the details of the Scalable Semantic Analytics Stack (SANSA), which is used to perform scalable analytics over knowledge graphs. It will cover the different SANSA layers and the underlying principles for achieving scalability in knowledge graph processing.
This module will cover the setup, APIs and different layers of SANSA. By the end of this module, the audience will be able to execute examples and create programs that use the SANSA APIs. The final part of this lecture is planned as an interactive session to wrap up the introduced concepts and present attendees with some open research questions currently studied by the community.