EU Data Cloud

Realising and Exploiting the EU Data Cloud

This break-out session, hosted by the LATC project, will take place on the first day, June 6, from 4pm to 6pm in room S.05 on the ground floor (pass the information stand on its left-hand side and go straight ahead; the room is on your left).

We especially encourage representatives of EU institutions, such as Eurostat and the Publications Office of the European Union, to join us. In this session we will explain the vision of the EU Data Cloud and the opportunities it offers, and demonstrate tools that enable you to participate in this exciting dataspace.

Program

16:00 Introduction
Speaker: Knud Möller
16:10 The Web of Data and its 5 Stars

This talk gives a general introduction to the Web of Data, to which the EU Data Cloud belongs. A special focus will be on Tim Berners-Lee's 5-star scheme for publishing open data. The talk will cover the advantages and challenges of the scheme, starting by emphasising the benefits of 1-star open data and continuing with the added potential of achieving higher star ratings.

Speaker: Richard Cyganiak
16:40 LATC and the Data Cloud

We will give a high-level overview of the LATC project, starting with its mission and the partners involved, followed by a look at the various tools that the project provides. In particular, we will introduce the 24/7 Interlinking Platform.

Speaker: Michael Hausenblas
17:00 Creating and Consuming Data for the EU Data Cloud

Going into more detail, this presentation will take a closer look at some of the concrete EU datasets that the LATC project provides. What are these datasets about? What was involved in creating them? Why are they relevant, and how can they be used?

Speaker: Jens Lehmann
17:20 Link Sets and Why They Are Important

Links are at the heart of the EU Data Cloud. This presentation will focus on what links in the Web of Data look like and why they are important. The audience will also learn how the LATC platform can be used to create linksets between datasets, and what these linksets are good for.

Speaker: Anja Jentzsch
17:40 On to the Cloud

Wrapping up the session, we will look at the larger LATC context, take a peek at other examples of how Linked Open Data is used, and consider what is in store for the future.

Speaker: Knud Möller
17:50 Q&A Session

This is your chance to ask all your questions about publishing and consuming Linked (Open) Data in general, and about LATC and the EU Data Cloud in particular!


Q&A Session Summary

We collected a number of questions during our Q&A session; you can find the answers below.

What About Metadata? How Can Provenance Be Addressed?

In the general case, DCAT and ADMS can be used to provide metadata and provenance information. For RDF data sources in particular, we advise using VoID (the Vocabulary of Interlinked Datasets).
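As an illustration of the VoID advice, the sketch below writes a minimal VoID description in Turtle for a single dataset and records provenance with Dublin Core terms (`dcterms:source`, `dcterms:license`). It uses plain string formatting from the Python standard library; the function name `void_description` and all URIs used with it are hypothetical, and a real publisher would more likely use an RDF library and add further properties such as `dcterms:creator` or `void:dataDump`.

```python
# Minimal sketch: emit a VoID description (Turtle) for one dataset.
# The function name and any URIs passed to it are hypothetical examples.

PREFIXES = """@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
"""


def void_description(dataset_uri: str, title: str, endpoint: str,
                     triples: int, source: str, license_uri: str) -> str:
    """Return a Turtle snippet describing a dataset with VoID and Dublin Core terms."""
    # Escape backslashes and double quotes so the title is a valid Turtle string.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return PREFIXES + f"""
<{dataset_uri}> a void:Dataset ;
    dcterms:title "{safe_title}" ;
    dcterms:source <{source}> ;  # provenance: where the data came from
    dcterms:license <{license_uri}> ;
    void:sparqlEndpoint <{endpoint}> ;
    void:triples {triples} .
"""
```

The `#` inside the generated text is a Turtle comment, so the output stays valid RDF while documenting which triple carries the provenance information.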

How Can Interlinking of Datasets Be Automated?

  • fully automatic linking is probably not feasible
  • instead, semi-automatic linking, with a human checking and controlling the results, makes sense
  • linking becomes a lot easier when it is driven by rules, e.g. written in the Silk link specification language (Silk-LSL)
  • such rules can be created visually using the LATC Workbench
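The rule-based, semi-automatic approach can be sketched in a few lines of Python. This is neither Silk-LSL nor the LATC Workbench: it is a hypothetical stand-in that compares labels with `difflib.SequenceMatcher.ratio()`, accepts a link automatically above one threshold, and queues borderline candidates for a human to confirm. The thresholds, function names and `ex:` URIs are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; a real link specification would tune these per dataset pair.
ACCEPT = 0.95  # at or above: accept the link automatically
REVIEW = 0.70  # between REVIEW and ACCEPT: a human must confirm the link


def normalise(label: str) -> str:
    """Lower-case and collapse whitespace before comparing labels."""
    return " ".join(label.lower().split())


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def link(source: dict[str, str], target: dict[str, str]):
    """Compare every source label with every target label (URI -> label maps).

    Returns (accepted, review): owl:sameAs triples accepted automatically, and
    (source, target, score) candidates that need a human decision.
    """
    accepted, review = [], []
    for s_uri, s_label in source.items():
        # Keep only the best-scoring target for each source resource.
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is None:
            continue
        if best_score >= ACCEPT:
            accepted.append((s_uri, "owl:sameAs", best_uri))
        elif best_score >= REVIEW:
            review.append((s_uri, best_uri, round(best_score, 2)))
    return accepted, review


# Toy data with hypothetical URIs.
SOURCE = {"ex:a/Germany": "Germany", "ex:a/Czechia": "Czech Republic", "ex:a/Atlantis": "Atlantis"}
TARGET = {"ex:b/de": "germany", "ex:b/cz": "Czech Rep.", "ex:b/fr": "France"}
accepted, review = link(SOURCE, TARGET)
# accepted == [("ex:a/Germany", "owl:sameAs", "ex:b/de")]
# review   == [("ex:a/Czechia", "ex:b/cz", 0.75)]
```

The Germany pair is accepted because the labels are identical after normalisation, "Czech Republic"/"Czech Rep." lands in the review band, and Atlantis finds no candidate at all. The review band is where the human control component comes in.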

How Can One Search for Datasets Using the 5-Star Rating?

At the moment, this is partially possible on catalogues such as data.gov.uk; hopefully, others will adopt similar functionality.
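Where a catalogue records the relevant facts about each dataset, the rating can be derived and used as a search filter. The sketch below assumes a hypothetical record type with one flag per star criterion; the point it illustrates is that the stars are cumulative, so a dataset only earns a star if it also meets every lower level.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical catalogue record; field names are illustrative only.
    name: str
    open_licence: bool  # 1 star: on the web under an open licence
    structured: bool    # 2 stars: machine-readable structured data
    open_format: bool   # 3 stars: non-proprietary format (e.g. CSV instead of Excel)
    uses_uris: bool     # 4 stars: W3C open standards (RDF, SPARQL), URIs to identify things
    links_out: bool     # 5 stars: linked to other people's data


def stars(d: Dataset) -> int:
    """Count criteria met in order and stop at the first one that is missed."""
    count = 0
    for met in (d.open_licence, d.structured, d.open_format, d.uses_uris, d.links_out):
        if not met:
            break
        count += 1
    return count


def search(catalogue: list[Dataset], min_stars: int) -> list[str]:
    """Names of datasets rated at least min_stars."""
    return [d.name for d in catalogue if stars(d) >= min_stars]
```

For example, given openly licensed Excel, CSV and linked RDF versions of the same (hypothetical) budget data, `search(catalogue, 3)` keeps only the CSV and RDF versions.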

Is SPARQL Suitable for End Users?

  • probably not, at least not for average end users: in this respect, SPARQL is like SQL, which end users would not normally use either
  • SPARQL is a tool for data engineers, data scientists, experts, developers, etc.
  • end users should see and interact with apps that might use SPARQL in the background
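To make the point about apps concrete, the sketch below shows two pieces an application typically hides from its users: building a GET request according to the SPARQL 1.1 Protocol (the query goes in the `query` parameter) and flattening a response in the SPARQL 1.1 Query Results JSON format into plain rows for display. The endpoint URL, the query and the class URI are hypothetical, and the HTTP call itself is left out so the example runs offline.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and query, for illustration only.
ENDPOINT = "http://example.org/sparql"
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE { ?country a <http://example.org/Country> ; rdfs:label ?label }
LIMIT 10
"""


def request_url(endpoint: str, query: str) -> str:
    """Build a SPARQL 1.1 Protocol GET URL. The app would fetch it with an
    'Accept: application/sparql-results+json' header."""
    return endpoint + "?" + urlencode({"query": query})


def rows(results_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into plain dicts a UI can display.
    Unbound variables are simply absent from a binding, so they are skipped."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [
        {v: b[v]["value"] for v in variables if v in b}
        for b in data["results"]["bindings"]
    ]
```

The end user only ever sees the rows; the query text and the result format stay inside the application.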

What About Scraping Images and Audiovisual (AV) Content?

This is a research area of its own and can require image analysis algorithms, crowdsourcing approaches (including gamification), and more. It is important, but out of the scope of LATC.

What About Inferencing? Is It Being Used?

  • it depends on the size of the knowledge base: no or only lightweight inference for large knowledge bases; full OWL reasoning only for small to medium-scale knowledge bases
  • in the future: SPARQL 1.1 entailment regimes
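As an example of the lightweight end of this spectrum, the sketch below materialises two RDFS rules by naive forward chaining until nothing new is derived: transitivity of `rdfs:subClassOf` (rule rdfs11 in the RDF Semantics specification) and propagation of `rdf:type` to superclasses (rule rdfs9). Representing triples as plain tuples of prefixed names is an assumption made for brevity; the nested loops are fine for a toy graph, whereas a store holding a large knowledge base would need indexes and a more efficient strategy.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

Triple = tuple[str, str, str]


def rdfs_closure(triples: set[Triple]) -> set[Triple]:
    """Apply rdfs9 and rdfs11 repeatedly until a fixpoint is reached."""
    inferred = set(triples)
    while True:
        sub = {(s, o) for s, p, o in inferred if p == SUBCLASS}
        new = set()
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
        # rdfs9: x type A, A subClassOf B  =>  x type B
        for s, p, o in inferred:
            if p == TYPE:
                for a, b in sub:
                    if a == o:
                        new.add((s, TYPE, b))
        new -= inferred
        if not new:
            return inferred
        inferred |= new
```

Given `ex:Berlin rdf:type ex:Capital`, `ex:Capital rdfs:subClassOf ex:City` and `ex:City rdfs:subClassOf ex:Place` (hypothetical names), the closure adds that Berlin is a City and a Place, and that Capital is a subclass of Place.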

Which RDF Store Should I Use?

A lot depends on your use case, but the following benchmarks can aid in the decision process:

How Do I Know Which Datasets Are Interesting to Me?

  • metadata on data hubs and data portals can help: datasets should be categorised and tagged, and should carry licensing information, information on links to other datasets, etc.
  • example: http://thedatahub.org/dataset/bbc-programmes
  • this metadata can be used for searching
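Searching on such metadata can be as simple as filtering and ranking catalogue records. The sketch below keeps datasets that carry all requested tags and one of an accepted set of licences, then ranks them by their number of outgoing links. The record layout, licence identifiers and dataset names are hypothetical and do not reflect the schema of any particular data hub.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    # Hypothetical catalogue record; not the schema of any real data hub.
    name: str
    tags: set[str]
    licence: Optional[str]                                # None = no licence stated
    links: dict[str, int] = field(default_factory=dict)  # target dataset -> link count


def find(catalogue: list[Entry], wanted_tags: set[str], open_licences: set[str]) -> list[str]:
    """Return names of datasets that carry every wanted tag and an accepted
    licence, ordered by total number of outgoing links (most linked first)."""
    hits = [
        e for e in catalogue
        if wanted_tags <= e.tags and e.licence in open_licences
    ]
    hits.sort(key=lambda e: sum(e.links.values()), reverse=True)
    return [e.name for e in hits]
```

Datasets without licensing information drop out of the results entirely, which is one practical reason why catalogues should record it.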

How Do I Learn What a Dataset Can Tell Me?

  • again, data hubs are expected to contain information that helps answer this question
  • example SPARQL queries are very useful additional metadata
  • datasets on Kasabi often contain such example queries: http://kasabi.com/dataset/world-geography
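As an illustration of such example-query metadata, here are a few generic exploratory SPARQL 1.1 queries that work on most RDF datasets (which classes are used and how often, which properties occur, a raw sample), plus a small helper that fills a class URI into a template. The dictionary, its keys and the `instances_of` helper are hypothetical; they are not the format that Kasabi or any data hub used.

```python
# Generic exploratory SPARQL 1.1 queries; the keys are hypothetical labels.
EXAMPLE_QUERIES = {
    "classes": """
SELECT ?class (COUNT(?s) AS ?instances)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?instances)
LIMIT 20
""",
    "properties": """
SELECT ?property (COUNT(*) AS ?uses)
WHERE { ?s ?property ?o }
GROUP BY ?property
ORDER BY DESC(?uses)
LIMIT 20
""",
    "sample": """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
""",
}


def instances_of(class_uri: str, limit: int = 10) -> str:
    """Template query listing instances of one class (double braces are literal braces)."""
    return f"SELECT ?s WHERE {{ ?s a <{class_uri}> }} LIMIT {limit}"
```

Running the "classes" query first and then `instances_of` on one of the returned classes is a quick way to get a feel for an unfamiliar dataset.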