Posted on: Mon, 05/31/2021 - 15:36 By: valentina.janev

Linked open data sources and the semantic web have become valuable data sources for data analytics and data integration tasks. The growing size of RDF knowledge graph datasets calls for scalable processing and analytics techniques. In-memory frameworks for scalable distributed semantic analytics, such as SANSA, use Apache Spark and Apache Jena to provide end-to-end scalable analytics on RDF knowledge graphs. Setting up such a technical system with all its dependencies and environments can be a tough challenge and may also require considerable processing power. To lower the entry barrier for evaluating and testing the SANSA framework, and even for bringing this technology to production using only a browser, this paper introduces how to get the SANSA stack running within Databricks, with no need for special Apache Spark skills or any local installation. This simplified usage makes distributed large-scale processing of RDF data accessible even from mobile devices. In addition, the availability of hands-on sample notebooks increases the reproducibility of complex framework evaluation experiments. This paper shows that starting up a very complex, scalable semantic data analytics stack does not need to be complicated.