- Scalable RDF Analytics with SANSA - Half-day tutorial at the 19th International Semantic Web Conference (ISWC 2020) - 09:00 AM – 12:00 PM (CET), 1st November 2020 (virtual) [1]
Knowledge graphs have grown to a size at which centralised analytical approaches are no longer feasible. Recent technological progress has produced powerful tools for distributed in-memory analytics. These tools have been shown to work well on elementary data structures, but they are not specialised for knowledge graph (KG) processing. The Scalable Semantic Analytics Stack (SANSA) is a library built on top of one such tool, Apache Spark, and it offers several APIs covering different facets of scalable KG processing. SANSA is organised into several layers: (1) RDF data handling, e.g. filtering, computation of RDF statistics, and quality assessment; (2) SPARQL querying; (3) inference; and (4) analytics over KGs. In addition to processing native RDF, SANSA allows users to query a wide range of heterogeneous data sources (e.g. files stored in Hadoop, or popular NoSQL stores) uniformly using SPARQL. This tutorial aims to provide an up-to-date overview of the stack, together with detailed discussions of the previous releases, technical add-ons, and developments. It also includes a hands-on session on SANSA that covers all of the layers above through simple use cases.
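To give a feel for the operations grouped under the RDF data handling layer, here is a minimal sketch of triple filtering and a simple RDF statistic (property usage counts), computed partition by partition and then merged. It is plain Python on a single machine and is not SANSA's API: the sample triples, the function names, and the `partition` helper are all assumptions made for illustration, and the list chunks merely stand in for the partitions that a framework such as Apache Spark would distribute across a cluster.

```python
from collections import Counter
from functools import reduce

# Toy (subject, predicate, object) triples; the data is made up for illustration.
TRIPLES = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:carol", "foaf:knows", "ex:alice"),
]


def partition(items, n):
    """Split items into n chunks, standing in for cluster partitions (hypothetical helper)."""
    return [items[i::n] for i in range(n)]


def filter_by_predicate(triples, predicate):
    """Keep only the triples whose predicate matches."""
    return [t for t in triples if t[1] == predicate]


def predicate_counts(chunk):
    """Per-partition statistic: how often each predicate occurs in this chunk."""
    return Counter(p for _, p, _ in chunk)


def property_usage(triples, n_partitions=2):
    """Compute partial counts per partition, then merge them into one result."""
    partials = [predicate_counts(c) for c in partition(triples, n_partitions)]
    return reduce(lambda a, b: a + b, partials, Counter())


knows = filter_by_predicate(TRIPLES, "foaf:knows")
stats = property_usage(TRIPLES)
print(knows)
print(dict(stats))
```

Because the partial counts are merged by addition, the result does not depend on how the triples are split. That independence is what lets this kind of statistic be computed in parallel over very large graphs.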
Link to Tutorial and CookBook.