Accepted Presentations

Presentation 1: Open Data, Open Database: PostgreSQL

Speaker:
Simon Riggs
Affiliation:

2ndQuadrant develops and supports the PostgreSQL database server, the open source database that is used worldwide for mission critical and high performance applications.
Europe's data is Big, Linked and Open. The PostgreSQL database offers an open source platform that is fast and robust enough to meet these challenges with a variety of innovative features. Surprising to many is PostgreSQL's ability to host a variety of different datatypes, covering standard relational designs, arrays, key-value stores and XML/JSON document-centric storage. From current capabilities to EU-funded future enhancements, we will discuss the opportunities for many different data stores and the road map for the future.
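As a minimal sketch of the multi-model point (not from the talk), the following Python snippet uses psycopg2 to create one PostgreSQL table that mixes a plain relational column, an array column and a JSON document column; the database name and data are placeholders.

```python
# Minimal sketch: one PostgreSQL table mixing the data models named above.
# Assumes a local database called "demo" and psycopg2 installed.
import psycopg2

conn = psycopg2.connect(dbname="demo")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS datasets (
        id    serial PRIMARY KEY,
        name  text NOT NULL,   -- plain relational column
        tags  text[],          -- array-based design
        doc   json             -- document-centric storage
    )
""")
cur.execute(
    "INSERT INTO datasets (name, tags, doc) VALUES (%s, %s, %s)",
    ("eurostat-energy", ["open-data", "energy"],
     '{"format": "csv", "rows": 120000}'),
)
conn.commit()

# Array containment and the JSON document queried side by side.
cur.execute("SELECT name, doc FROM datasets WHERE tags @> ARRAY['energy']")
print(cur.fetchall())
```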


Presentation 2: Using LOD to share clean energy data and knowledge

Speaker:
Florian Bauer

The Renewable Energy and Energy Efficiency Partnership (REEEP) is leading the development of a clean energy information portal, reegle.info. The portal (funded by several EU countries) takes advantage of (Linked) Open Data technologies, both providing and consuming open data in the field of clean energy, and is a well-recognized best-practice example of the benefits of offering and re-using open data.
This presentation will focus on the benefits of using (Linked) Open Data, based on the clean energy info portal reegle, and showcase some existing integrations with other Open Data portals. In addition, it will introduce why open data is essential to accelerate the clean energy marketplace.
One example of how mashing up data from different open data sources can add value are the reegle clean energy country profiles (http://www.reegle.info/countries). On these pages reegle consumes high-quality open data to offer its users a unique dossier on every country worldwide. These comprehensive country energy profiles include DBpedia definitions, relevant reegle stakeholders (actors) as well as project outcomes from the region. Statistics from UN Data, the World Bank and Eurostat feed clear graphs that display the data in the most sensible units. For most countries there are also Policy and Regulatory Overviews focusing on issues related to energy and efficiency. This application has proved very valuable to those active in the field of renewable energies, and several access points (e.g. the start page and the reegle map) guide the user to this comprehensive and unique overview.
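To make the mash-up idea concrete, here is a hedged Python sketch, not reegle's actual code, that assembles a miniature country profile from two of the open sources named above; the endpoints, the indicator code and the response shapes are assumptions based on the public DBpedia and World Bank APIs.

```python
# Hedged sketch of a country-profile mash-up from open data sources.
import json
import urllib.parse
import urllib.request

def dbpedia_abstract(resource):
    """Fetch the English abstract of a DBpedia resource via its SPARQL endpoint."""
    query = (
        "PREFIX dbo: <http://dbpedia.org/ontology/> "
        "SELECT ?a WHERE { <http://dbpedia.org/resource/" + resource + "> "
        "dbo:abstract ?a . FILTER (lang(?a) = 'en') }"
    )
    params = urllib.parse.urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    with urllib.request.urlopen("http://dbpedia.org/sparql?" + params) as r:
        rows = json.load(r)["results"]["bindings"]
    return rows[0]["a"]["value"] if rows else None

def worldbank_indicator(iso2, indicator="EG.USE.ELEC.KH.PC"):
    """Most recent value of a World Bank indicator (here: electricity use per capita)."""
    url = ("https://api.worldbank.org/v2/country/%s/indicator/%s"
           "?format=json&mrv=1" % (iso2, indicator))
    with urllib.request.urlopen(url) as r:
        meta, data = json.load(r)
    return data[0]["value"] if data else None

profile = {
    "summary": dbpedia_abstract("Austria"),
    "electricity_use_kwh_per_capita": worldbank_indicator("AT"),
}
print(profile)
```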


Presentation 3: Enabling open data interoperability - The case for the Core Business Vocabulary

Speaker:
Phil Archer

This presentation introduces the initiative of the ISA Programme of the European Commission on the collaborative development and implementation support of e-Government Core Vocabularies as an enabler of open data interoperability. In November 2011 the European Commission launched a multidisciplinary Working Group of experts from EU institutions, academia and standardisation bodies to develop three common core vocabularies (Core Person, Core Location and Core Business) to be used in the development of public sector IT systems. The outputs of this work will be published by the W3C Government Linked Data (GLD) Working Group as First Public Working Drafts for further consultation within the typical W3C standardization process. The desired outcome of that process is the publication of these vocabularies as open Web standards available under W3C's Royalty-Free License. The presentation further exemplifies how and why this line of work promotes open data interoperability. The case for the Core Business Vocabulary is made by bringing to the audience a real-life implementation of this vocabulary: the implementation by OpenCorporates showcases the use of core information on more than 40 million corporate entities.
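As a purely illustrative sketch of what publishing company data with such a core vocabulary can look like, the snippet below builds a tiny RDF description with rdflib; the namespace URI and term names are assumptions modelled on the Registered Organization draft that grew out of the Core Business Vocabulary work, so consult the published draft for the normative terms.

```python
# Illustrative only: describing one registered company with rdflib.
# Namespace and terms are assumptions, not the normative vocabulary.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

ROV = Namespace("http://www.w3.org/ns/regorg#")  # assumed namespace

g = Graph()
g.bind("rov", ROV)

company = URIRef("https://opencorporates.com/companies/gb/00000000")
g.add((company, RDF.type, ROV.RegisteredOrganization))
g.add((company, ROV.legalName, Literal("Example Trading Ltd")))
g.add((company, ROV.orgStatus, Literal("Active")))

print(g.serialize(format="turtle"))
```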


Presentation 4: (Copy)right information in the digital age

Speaker:
Andrew Farrow

The objective of the Linked Content Coalition is to lay the foundations for a more coherent organisation of metadata and rights information through the adoption of cross-media rights communication standards.
The purpose of the Linked Content Coalition is to provide answers to the following questions:

  1. How do people who want to trade in rights find each other?
  2. How can they trade cost-effectively?
  3. How can more people be persuaded to trade in rights?
  4. How do we support a trading infrastructure in rights in the long term?


Presentation 5: Open Bank Project

Speaker:
Simon Redfern
Affiliation:

TESOBE builds web applications, APIs and mobile applications using technologies such as Python/Django, Scala/Lift, Node.js, Postgres and MongoDB, and is the founder of the Open Bank Project (openbankproject.com). The Open Bank Project is an open source API for banks that seeks to "raise the bar of financial transparency" and facilitate application and data innovation in the banking domain.
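As a hedged sketch of what consuming such a banking API could look like, the snippet below lists banks from a sandbox instance; the host, path and response shape are assumptions drawn from the project's public documentation and may differ from a live deployment.

```python
# Hedged sketch: listing banks from an assumed Open Bank Project sandbox.
import json
import urllib.request

SANDBOX = "https://apisandbox.openbankproject.com"  # assumed host

with urllib.request.urlopen(SANDBOX + "/obp/v1.2.1/banks") as r:
    banks = json.load(r).get("banks", [])  # assumed response shape

for bank in banks:
    print(bank.get("id"), "-", bank.get("full_name"))
```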


Presentation 6: How the Biggest Open Database of Companies was Built

Speaker:
Chris Taggart
Affiliation:

OpenCorporates is the largest open database of companies in the world, having grown in little over a year to more than 40 million companies across over 50 jurisdictions. Not only does this increase corporate transparency and understanding, it is also an important tool for anyone dealing with cross-border corporate information, from journalists to regulators, campaigners and other companies. We have also worked with governments and official bodies to improve the access to and quality of data, including helping UK Companies House with its linked data URIs, the EU/W3C ISA programme with the Core Business Vocabulary, and the G20 Financial Stability Board on its global LEI programme. All this from a micro-startup that has been going for less than 18 months.
This presentation will explain why open company data is important for all groups, from companies to citizens and governments to journalists, how we grew so quickly with the help of the open data community, and why we think our innovative business model is a way to make open data sustainable.
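The sketch below illustrates the kind of reuse the open database enables, querying the OpenCorporates search API by company name; the URL and response shape are assumptions from the public API documentation and may have changed.

```python
# Hedged sketch: searching the OpenCorporates API for companies by name.
import json
import urllib.parse
import urllib.request

def search_companies(name):
    """Return company records matching a free-text name query (assumed API shape)."""
    url = ("https://api.opencorporates.com/companies/search?q="
           + urllib.parse.quote(name))
    with urllib.request.urlopen(url) as r:
        payload = json.load(r)
    return [c["company"] for c in payload["results"]["companies"]]

for company in search_companies("2ndQuadrant"):
    print(company["jurisdiction_code"], company["name"])
```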


Presentation 7: FactForge: Data Service and the Value of Inferred Knowledge over LOD

Speaker:
Mariana Damova

The Linked Open Data movement is maturing. Not only does the LOD cloud grow by billions of triples yearly, but technologies and guidelines are being developed for producing LOD fast, assuring its quality, and providing vertically oriented data services (LOD2, LATC, baseKB). Little is said, however, about how to include reasoning in the LOD framework and how to cope with its diversity. In this talk we will present FactForge, a reason-able view on the web of data, which comprises a segment of the LOD cloud (e.g. DBpedia, Freebase, Geonames, WordNet, NY Times, MusicBrainz, Lingvoj, Lexvo, CIA Factbook) loaded into a single repository (OWLIM) and forming a compound dataset on which inference is performed. This results in a 40% increase of the knowledge available for querying, to about 15 billion statements.
The diversity of LOD makes its use and querying extremely challenging, as one has to be intimately familiar with the schemata underlying each dataset. Initiatives and research projects like schema.org, UMBEL, BLOOMS+ and ALOCUS, which try to introduce the notion of a gold standard at the schema level to allow better interoperability of LOD and the WWW in general, are indicative of the search for a solution along these lines. The new version of FactForge, which will be shown in this talk and has been in the making for several years, aligns with these views. It is supplied with a reference layer based on the upper-level ontology PROTON, which is mapped to the ontologies of the LOD datasets in FactForge, making their instances accessible via PROTON concepts and properties. This reference layer makes loading the LOD ontologies unnecessary, optimizing the reasoning processes, and allows quick and seamless integration of new datasets with the entire LOD segment of FactForge.
It also ensures better interfacing with other components via SPARQL: the queries are more compact and easier to formulate; response times are faster, because fewer joins are employed; and the wealth of inferred knowledge across the datasets allows for a real journey of knowledge discovery and navigation from different standpoints. FactForge is the largest body of general knowledge and LOD on which inference is performed. We will present applications that make use of FactForge, emphasize the role in them of the inferred knowledge produced by the reason-able views, and argue for a new paradigm of data services based not only on linked data verticals but also on inferred knowledge.
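As a hedged illustration of what querying through the reference layer might look like, the sketch below sends one SPARQL query to the public FactForge endpoint using PROTON terms only; the endpoint URL, the namespace and the specific class name are assumptions based on published PROTON documentation, not code from the talk.

```python
# Hedged sketch: one query over the whole compound dataset via PROTON terms.
import json
import urllib.parse
import urllib.request

QUERY = """
PREFIX ptop: <http://proton.semanticweb.org/protontop#>
SELECT ?org WHERE { ?org a ptop:Organization . } LIMIT 10
"""

params = urllib.parse.urlencode({
    "query": QUERY,
    "format": "application/sparql-results+json",
})
with urllib.request.urlopen("http://factforge.net/sparql?" + params) as r:
    for row in json.load(r)["results"]["bindings"]:
        # Each binding is an organization drawn from any underlying dataset
        # (DBpedia, Freebase, ...), reached through the PROTON mapping.
        print(row["org"]["value"])
```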


Presentation 8: Linking and Analyzing Big Data

Speaker:
Kostas Tzoumas

In this talk, I will provide an overview of two projects at TU Berlin and the research and innovation challenges at their intersection. Stratosphere (www.stratosphere.eu, funded by the German Research Foundation) is an open platform for Big Data analytics. It features a cloud-enabled execution engine with flexible fault tolerance schemes, a novel programming model centered around second-order functions that extends MapReduce, and a cost-based query optimizer. Stratosphere is validated by several use-case scenarios, including climate data analysis, text mining in bioinformatics, and data cleansing on Linked Open Data. DOPA (an FP7 STREP project) focuses on linking large data pools of both structured and unstructured data using data supply chains. The goal is to multiply the utility of each individual service while simultaneously sharing the costs between them. In this way DOPA lowers the barrier of entry for SMEs that need to perform advanced analytics across multiple data pools, since neither the required input data nor the processing environment has to be provided by the SME itself.
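The following plain-Python sketch (deliberately not the Stratosphere API) illustrates the second-order function idea: operators such as Map, Reduce and Match take a user-defined first-order function and apply it to the groups of records the system forms, with Match extending MapReduce by an equi-join over two inputs.

```python
# Conceptual illustration of second-order operators, not Stratosphere code.
from collections import defaultdict

def pact_map(udf, records):
    """MAP: call the user function independently on every record."""
    return [out for rec in records for out in udf(rec)]

def pact_reduce(udf, records, key):
    """REDUCE: group records by key, call the user function once per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return [out for group in groups.values() for out in udf(group)]

def pact_match(udf, left, right, key):
    """MATCH: pair records from two inputs with equal keys (an equi-join)."""
    index = defaultdict(list)
    for rec in right:
        index[key(rec)].append(rec)
    return [out for l in left for r in index[key(l)] for out in udf(l, r)]

# Tiny word-count-style usage of the sketch above.
lines = ["open data", "big data"]
words = pact_map(lambda line: [(w, 1) for w in line.split()], lines)
counts = pact_reduce(lambda grp: [(grp[0][0], sum(n for _, n in grp))],
                     words, key=lambda rec: rec[0])
print(counts)  # [('open', 1), ('data', 2), ('big', 1)]
```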


Presentation 9: SRBench - A Benchmark For Streaming RDF Storage Engines

Speaker:
Peter Boncz

In this talk, we present SRBench, the first benchmark for streaming RDF storage engines, which is based entirely on real-world datasets. With streaming data growing far faster than the tools available to derive knowledge from it, researchers have set out for solutions in which Semantic Web technologies are adapted and extended for publishing, sharing, analysing and understanding such data. Various approaches are emerging, e.g., C-SPARQL, SPARQLStream, StreamSPARQL and CQELS. To help researchers and users compare streaming RDF engines in a standardised application scenario, we propose SRBench, with which one can assess the ability of a streaming RDF engine to cope with a broad range of use cases typically encountered in real-world scenarios. The design of SRBench is based on an extensive study of state-of-the-art techniques in both data stream management systems and streaming RDF processing engines, as well as existing RDF/SPARQL benchmarks. This ensures that the benchmark captures all important aspects of streaming RDF processing.
The first goal of SRBench is to evaluate the functional completeness of a streaming RDF engine. The benchmark contains a concise yet comprehensive set of queries covering the major aspects of streaming SPARQL query processing, ranging from simple pattern matching queries to queries with complex reasoning tasks. The main advantages of applying Semantic Web technologies to streaming data include better search facilities through the semantics added to the data, reasoning through ontologies, and integration with other data sets. The ability of a streaming RDF engine to process these distinctive features is assessed by the benchmark with queries that apply reasoning not only over the streaming sensor data, but also over the metadata and even other data sets in the Linked Open Data (LOD) cloud.
To give a first baseline and illustrate the state of the art, we show results obtained from implementing SRBench on the streaming RDF engine developed at the Universidad Politécnica de Madrid (UPM), which supports the streaming RDF query language SPARQLStream. The evaluation shows that the functionality supported by SPARQLStream is fairly complete. At the language level, it can express all benchmark queries easily and concisely. At the query processing level, some missing features have been discovered, for all of which preliminary code has been added for further development.
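To make the streaming setting concrete, here is a toy Python illustration (not part of the benchmark; the property URI and threshold are made up) of the core operation a streaming RDF engine must support: continuously evaluating a pattern over a time window that slides along a stream of timestamped sensor triples.

```python
# Toy sliding-window pattern matching over a stream of timestamped triples.
import time
from collections import deque

WINDOW_SECONDS = 60.0
window = deque()  # entries: (timestamp, subject, predicate, object)

def on_triple(ts, s, p, o):
    """Handle an arriving triple: expire old entries, re-evaluate the query."""
    window.append((ts, s, p, o))
    while window and window[0][0] < ts - WINDOW_SECONDS:
        window.popleft()
    # Continuous query: sensors that reported a wind speed above 25.0
    # within the last WINDOW_SECONDS (property URI is hypothetical).
    return [s_ for (_, s_, p_, o_) in window
            if p_ == "ex:windSpeed" and float(o_) > 25.0]

now = time.time()
print(on_triple(now, "ex:sensor1", "ex:windSpeed", "30.2"))  # ['ex:sensor1']
```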


Presentation 10: Abstract Access Control Model for Dynamic RDF Datasets

Speaker:
Irini Fundulaki

Given the increasing amount of sensitive RDF data available on the Web, it becomes critical to guarantee secure access to this content. Access control is complicated when RDFS inference rules and other dependencies between the access permissions of triples need to be considered; this is necessary, e.g., when we want to associate the access permissions of inferred triples with the access permissions of the triples that contributed to their implication. The standard way to enforce selective access to sensitive information is using access control tags.
Unfortunately, this simple scheme is problematic in the above setting, because after every change in the dataset or in the access control tags, one has to recompute the access permissions for the entire dataset. To address this problem, we consider abstract access control models, which use abstract tokens and operators to describe the access permission of a triple. This way, the access label of a triple is a complex expression that encodes how the label was produced. This allows us to know exactly the effects of any possible change, thereby avoiding a complete recomputation of the labels after a change. An additional side-effect of our approach is that it allows the simultaneous enforcement of different access control policies by different applications accessing the same data, as well as easy experimentation with different policies by the same application. This is achieved through different concretizations of the access labels and operators via concrete access control policies, which are used to determine the access permissions of triples.
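A minimal Python sketch of the abstract-label idea (the token names and operator set are illustrative, not the paper's formalism): the label of an inferred triple is an expression over the tokens of the triples it was derived from, and a concrete policy later interprets the tokens and operators, so changing the policy never forces relabeling the dataset.

```python
# Illustrative abstract labels: tokens for explicit triples, expressions
# for inferred ones, concretized only when a policy is applied.
def evaluate(label, policy):
    """Concretize an abstract label under a policy mapping tokens to booleans."""
    if isinstance(label, str):     # a base token on an explicit triple
        return policy[label]
    op, left, right = label
    if op == "AND":                # inferred triple: both premises must be accessible
        return evaluate(left, policy) and evaluate(right, policy)
    if op == "OR":                 # triple derivable in two independent ways
        return evaluate(left, policy) or evaluate(right, policy)
    raise ValueError("unknown operator: " + op)

# A triple inferred (e.g. via an RDFS rule) from two explicit triples
# labeled t1 and t2: its abstract label records how it was produced.
inferred = ("AND", "t1", "t2")

public  = {"t1": True, "t2": True}    # one application's policy
private = {"t1": True, "t2": False}   # another policy over the same data
print(evaluate(inferred, public), evaluate(inferred, private))  # True False
```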


Presentation 11: Big Data Public Private Forum (BIG) initiative

Speaker:
Nuria de Lama Sanchez

Big Data Public Private Forum (BIG) is an initiative that aims to create an industrial community around Big Data in Europe. We will present the strategy proposed by the selected consortium, a balanced set of partners representing academia and especially industry.