Linguistic Linked Open Data.

Information about the current status of the growing cloud of linguistic linked open data.

What is LLOD?

Linguistic Linked Open Data is a movement about publishing data for linguistics and natural language processing using the following principles:

  • Data should be openly licensed, using licenses such as the Creative Commons licenses.
  • The elements in a dataset should be uniquely identified by means of a URI.
  • The URI should resolve, so users can access more information using web browsers.
  • Resolving an LLOD resource should return results using web standards such as HTML, RDF or JSON-LD (content negotiation may be used to serve different representations to different clients).
  • Links to other resources should be included to help users discover new resources and provide semantics.
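
The content negotiation mentioned in the principles above can be sketched as follows. This is a minimal illustration, not a reference implementation: the list of supported media types, the fallback to HTML, and the function names are all assumptions made for the example, and a production server would normally rely on its web framework's negotiation support.

```python
# Minimal sketch of server-side content negotiation for an LLOD resource.
# SUPPORTED and the HTML fallback are assumptions chosen for illustration.
SUPPORTED = ["text/html", "text/turtle", "application/rdf+xml", "application/ld+json"]


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(accept_header, supported=SUPPORTED, default="text/html"):
    """Pick the representation to serve for a given Accept header."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return default
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" -> "text/"
            for candidate in supported:
                if candidate.startswith(prefix):
                    return candidate
    # Assumption: fall back to HTML; a stricter server could answer 406 instead.
    return default


print(negotiate("text/turtle"))
print(negotiate("application/ld+json;q=0.9, text/html;q=0.5"))
```

With this sketch, a browser sending an HTML-first Accept header gets the HTML page, while an RDF client asking for `text/turtle` or `application/ld+json` gets a machine-readable version of the same resource.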

The primary benefits of LLOD have been identified as:

  • Representation: Linked graphs are a more flexible representation format for linguistic data.
  • Interoperability: Common RDF models can easily be integrated.
  • Federation: Data from multiple sources can trivially be combined.
  • Ecosystem: Tools for RDF and linked data are widely available under open source licenses.
  • Expressivity: Existing vocabularies such as OWL, lemon and NIF help express linguistic resources.
  • Semantics: Common links express what you mean.
  • Dynamicity: Web data can be continuously improved.

Resources for LLOD.

OKFN

The Open Linguistics Working Group (OWLG)

LIDER

Linked Data as an Enabler for Enterprise

LDL-2015

The 4th Workshop on Linked Data in Linguistics (Beijing, July 2015).

LingHub

Linguistic Metadata Repository

More information about LLOD.

Guidelines for Linguistic Linked Data Generation: Multilingual Dictionaries (BabelNet).

These guidelines describe the process of creating a Linked Data (LD) version of a lexical resource, in particular BabelNet. They contain advice on vocabulary selection, the RDF generation process and publication of the results. The document describes the models used and the design decisions taken during the conversion of BabelNet into the well-known lemon representation. More generally, it describes common patterns that naturally emerge when converting a lexical resource into RDF.

Guidelines for Linked Data corpus creation using NIF.

This document describes best practices to follow for the generation of Linked Data text corpora, using the NLP Interchange Format (NIF). NIF is an RDF/OWL-based format that aims to achieve interoperability between NLP tools, language resources and annotations. It can be used to assign URIs to strings and annotate the resulting resources. The Brown corpus serves as the running example throughout these guidelines.
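
As a rough illustration of how NIF assigns URIs to strings, the sketch below builds offset-based URIs of the form `#char=begin,end` and emits N-Triples for a document context and one substring. The base URI, the sample sentence and the helper names are assumptions made for this example (the sentence is not taken from the Brown corpus), and offsets are taken to be plain Python string indices.

```python
# Sketch: offset-based NIF URIs and a few NIF core triples, using only the
# standard library. Base URI, sample text and helper names are hypothetical.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_NNI = "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"


def char_uri(base, begin, end):
    """URI for the substring [begin, end) of the document identified by base."""
    return f"{base}#char={begin},{end}"


def literal(value, datatype=None):
    """Serialize a string as an N-Triples literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"^^<{datatype}>' if datatype else f'"{escaped}"'


def triple(subject, predicate, obj):
    return f"<{subject}> <{predicate}> {obj} ."


def offsets(uri, begin, end):
    return [
        triple(uri, NIF + "beginIndex", literal(str(begin), XSD_NNI)),
        triple(uri, NIF + "endIndex", literal(str(end), XSD_NNI)),
    ]


def context_triples(base, text):
    """The whole text as a nif:Context."""
    ctx = char_uri(base, 0, len(text))
    return ctx, [
        triple(ctx, RDF_TYPE, f"<{NIF}Context>"),
        triple(ctx, NIF + "isString", literal(text)),
    ] + offsets(ctx, 0, len(text))


def string_triples(base, text, begin, end):
    """One substring as a nif:String anchored in its context."""
    if not 0 <= begin <= end <= len(text):
        raise ValueError("offsets outside the text")
    ctx = char_uri(base, 0, len(text))
    uri = char_uri(base, begin, end)
    return uri, [
        triple(uri, RDF_TYPE, f"<{NIF}String>"),
        triple(uri, NIF + "referenceContext", f"<{ctx}>"),
        triple(uri, NIF + "anchorOf", literal(text[begin:end])),
    ] + offsets(uri, begin, end)


BASE = "http://example.org/doc1"   # hypothetical document URI
TEXT = "Linked Data is fun."       # made-up sample sentence

ctx_uri, ctx = context_triples(BASE, TEXT)
word_uri, word = string_triples(BASE, TEXT, 0, 6)
print("\n".join(ctx + word))
```

Annotations such as part-of-speech tags can then be attached as further triples whose subject is the substring's URI.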

Guidelines for developing NIF-based NLP services.

This document describes best practices to follow for the implementation of RESTful NLP web services that rely on the NLP Interchange Format (NIF). NIF is an RDF/OWL-based format that aims to achieve interoperability between NLP tools, language resources and annotations. As a proof of concept, we have implemented NIF wrappers for the Stanford POS tagger and the Stanford parser.

Guidelines for LLD exploitation.

This best practice describes how to exploit Linguistic Linked Data resources. The suggested steps for exploitation are: searching for and discovering relevant resources, verifying the license of the dataset, navigating to the distribution of the data (download or SPARQL endpoint), and extracting the data that is relevant for a particular purpose or application.
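
The later steps (license check, locating the distribution, extraction) might look roughly like the sketch below. The shape of the metadata record, the allow-list of licenses, the endpoint URL and the function names are all hypothetical. The code only builds the request URL for a SPARQL endpoint, using the `query` parameter defined by the SPARQL 1.1 Protocol, and does not contact any server.

```python
from urllib.parse import urlencode

# Assumption: an allow-list of license URIs that the application accepts.
ACCEPTED_LICENSES = {
    "http://creativecommons.org/licenses/by/4.0/",
    "http://creativecommons.org/publicdomain/zero/1.0/",
}


def check_license(dataset):
    """Verify the license: reject datasets that are not on the allow-list."""
    if dataset.get("license") not in ACCEPTED_LICENSES:
        raise ValueError(f"license not accepted: {dataset.get('license')!r}")


def find_distribution(dataset, kind="sparql"):
    """Navigate to the distribution: return the first access URL of this kind."""
    for dist in dataset.get("distributions", []):
        if dist["kind"] == kind:
            return dist["url"]
    raise LookupError(f"no {kind} distribution listed")


def sparql_get_url(endpoint, query):
    """Extraction: encode a query as a SPARQL 1.1 Protocol GET request URL."""
    return f"{endpoint}?{urlencode({'query': query})}"


# Hypothetical metadata record, as a discovery step might return it.
dataset = {
    "title": "Example lexicon",
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "distributions": [
        {"kind": "download", "url": "http://example.org/lexicon.ttl"},
        {"kind": "sparql", "url": "http://example.org/sparql"},
    ],
}

QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

check_license(dataset)
url = sparql_get_url(find_distribution(dataset), QUERY)
print(url)
```

The resulting URL can be fetched with any HTTP client. Keeping the license check as a separate step means that a dataset with an unsuitable license is rejected before any data is requested.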

LLOD aware services.

This document recommends best practices for building Linguistic Linked Open Data (LLOD) aware web services. LLOD denotes the representation of linguistic resources in accordance with linked data principles. These principles include that entities are identified by HTTP URIs that provide RDF-based information about the entity, including links to related entities. LLOD-aware services consume resources available as Linked Data as input, process them, and output an RDF resource that can in turn be published as Linked Data.

Getting started with lemon.

Lemon is an RDF model for representing lexical information relative to ontologies. This guide helps you get started with the lemon model.
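
To give a feel for what "lexical information relative to ontologies" means, the sketch below prints a small Turtle description of one lexical entry whose sense points at an ontology entity. The namespace URI is assumed to be the one used by the lemon model's published vocabulary. The lexicon and entry URIs, the choice of a DBpedia resource as the reference, and the helper name are illustrative assumptions.

```python
# Sketch: emit Turtle for a single lemon lexical entry.
# The namespace below is an assumption; all example.org URIs are hypothetical.
LEMON = "http://lemon-model.net/lemon#"


def lemon_entry(lexicon_uri, entry_uri, written_rep, lang, reference_uri):
    """Return Turtle linking a word form to an ontology entity via a sense."""
    form = written_rep.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        f"@prefix lemon: <{LEMON}> .",
        "",
        f"<{lexicon_uri}> a lemon:Lexicon ;",
        f'    lemon:language "{lang}" ;',
        f"    lemon:entry <{entry_uri}> .",
        "",
        f"<{entry_uri}> a lemon:LexicalEntry ;",
        f'    lemon:canonicalForm [ lemon:writtenRep "{form}"@{lang} ] ;',
        f"    lemon:sense [ lemon:reference <{reference_uri}> ] .",
    ])


turtle = lemon_entry(
    lexicon_uri="http://example.org/lexicon",
    entry_uri="http://example.org/lexicon/cat",
    written_rep="cat",
    lang="en",
    reference_uri="http://dbpedia.org/resource/Cat",
)
print(turtle)
```

The entry itself carries the linguistic information (here only the canonical written form), while the meaning is delegated to the ontology entity that the sense refers to.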