As part of the LAMBDA H2020 project (Learning, Applying, Multiplying, Big Data Analytics)[1], a staff exchange was arranged between the Mihajlo Pupin Institute[2] and the University of Oxford[3] in the period from February 3rd to February 14th, 2020. During this time, three PhD students working as Junior Research Assistants at Pupin visited Oxford, UK in order to learn about tools that are being promoted in the project.
Figure 2. Oxford, United Kingdom
Training Agenda
Figure 3. Agenda for the first two working days
Training Report
The first working day was started with a welcoming lunch at Linacre College[4] and presentation from Prof. Emanuel Sallinger about Knowledge graphs and VADA[5] system at the Department for Computer Science[6]. Prof. Sallinger introduced researchers working on VADA system, an extension of DATALOG language, as well as key concepts of the knowledge graphs and their use. He talked about several project he and his team are currently working on, or they completed using knowledge graphs. He introduced the VADALOG language and talked about its scalability and expressive power. He compared it with SQL and standard procedural programming languages, and showed its advantages, especially evident in cases involving definitions of recurrent relations. In the evening, researchers from Pupin and Oxford attended a lecture given by Georg Gottlob, professor at the University of Oxford and Vienna University of Technology, held at St John’s College[7].
Prof. Gottlob talked about use of knowledge graphs, Google knowledge graph and advantages that semantic technologies have over machine learning. He explained the difference between simply querying a knowledge base and actually inferring knowledge from it and differences between solely applying machine learning versus using it
in conjunction with semantic reasoning.
Figure 4. Emanuel Sallinger talking about knowledge graphs (left); Georg Gottlob’s lecture (right)
Figure 5. Pupin researchers working on VADALOG examples
On the second working day, dr Sahar Vahdati talked about DATALOG language and how the engine and reasoning behind it works. She talked about the structure of the queries, their expressiveness, and restrictions. Later, she introduced one of the main algorithms that are being used to process DATALOG queries. After lunch, Sahar held a hands-on session with VADALOG using Jupiter notebooks deployed on the department’s server. She started with some relatively simple tasks such as deducing relations in a family tree and finding distance between towns that are located on the opposite sides of a river and later covered some more advanced queries and principles.
The third and fourth day were spent mostly in brain storming sessions either between Pupin researchers internally or as a collaboration between Oxford staff and Pupin staff. Namely, during this time, the plan was to develop at least two scenarios where the VADA system[8] is supposed to be applied. As the basis of one of the scenarios, the energy domain was selected since the group from Pupin that the visiting researchers belong predominantly works on research projects related to energy efficiency, energy management and energy trading. Therefore, discussions were conducted in order to find ways in which existing ontologies, already deployed in smart services, can be extended and improved using semantic technologies. Potential applications with peer-to-peer energy trading were also discussed for future energy network.
Figure 6. Networking with VADA Lab colleagues
On the fifth day, Dea, Marko and Dušan were introduced to a knowledge graph embedding library developed by researchers at Oxford and the University of Bonn. However, due to poor cross-platform support, adapting the library, originally developed for Linux-based server applications, for Windows took slightly more work that originally expected. Nevertheless, by adapting the provided scripts, the team managed to run the training script on a remote computer with an appropriate GPU. By the end of the first working day of the following week, Pupin researcher started working in parallel on running the knowledge graph embedding algorithm for example use cases, preprocessing a large multi-gigabyte dataset of research papers for embedding applications by converting JSON data to triplets and decoding preexisting entity labels. Previously mentioned use case exploration continued with rule mining on Monday of the second week using AMIE[9] which was applied to the energy ontology from Pupin that holds smart sensor deployment metadata.
On Tuesday of the second week of the visit, the LAMBDA consortium assembled at the Computer Science Department of the University of Oxford for a previously scheduled and organized plenary meeting. During this day, active discussions were conducted on ongoing issues, future plans and preparations for the upcoming Big Data Analytics (BDA) Summer School that is to be held in Belgrade, Serbia in June 2020. Dissemination activities and LAMBDA as an organizer of the upcoming special track “Data Analytics Trends and Applications” at the 10th International Conference on Information Society and Technology (ICIST 2020)[10] and the “Knowledge Representation & Representation Learning” (KR4L)[11] workshop on the 24th European Conference on Artificial Intelligence (ECAI 2020)[12]. In light of these upcoming events, Pupin and Oxford are planning a joint research effort as a couple of research papers for the KR4L workshop with several papers already submitted by Pupin and accepted for presentation at ICIST 2020.
After the project management activities were concluded (the plenary meeting), researchers from Pupin continued working on previously initiated activities on data preparation and knowledge graph embeddings. On Wednesday of the second week, first successful training attempts of knowledge graph embeddings were completed. This opened up the opportunity for the research paper data, currently in preparation, to be later used for training. In order to achieve this, the raw data is parsed and prepared for embedding.
However, on Thursday, it turned out that the full set of papers was too big to be processed by a single GPU and, hence, a subset of the data with papers related only to healthcare was selected. After re-running the parser for this subset of the data, it was left to be trained overnight. Finally, on Friday, the healthcare paper data had finished training and was ready for further processing. As the visit was concluding on this day, a short meeting was organized between Pupin and Oxford in order to discuss further plans regarding the research efforts for the upcoming workshop and future in general.
[1] https://project-lambda.org
[2] http://www.pupin.rs/
[3] http://www.ox.ac.uk/
[4] https://www.linacre.ox.ac.uk/
[5] http://www.cs.ox.ac.uk/projects/vada/
[6] http://www.cs.ox.ac.uk/
[7] https://www.sjc.ox.ac.uk/
[8] http://vada.org.uk/
[9] https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/amie/
[10] https://www.eventiotic.com/eventiotic/conference/icist2020
[11] https://smartdataanalytics.github.io/KR4L/#Submissions
[12] http://ecai2020.eu/