SPARQL

Introduction to Resource Description Framework & SPARQL

Jesko Rehberg
Apr 28, 2024
Linked Open Data (photo by Stephen Picilaidis)

The World Wide Web Consortium (W3C) first standardized the Resource Description Framework (RDF) in 1999 as a data model for metadata. Its versatility soon led to much broader use: RDF grew into a general framework for representing knowledge in any domain and became a foundation of the Semantic Web, enabling structured data to be represented and exchanged on the internet.

RDF stands for:

  • Resource: Refers to anything that can be uniquely identified with a Uniform Resource Identifier (URI), encompassing entities such as web pages, online resources, individuals, products, and more.
  • Description: Encompasses the attributes, properties, functions, and relationships associated with the identified resources. This includes metadata about the resources themselves as well as their interconnections.
  • Framework: Denotes the overarching structure provided by RDF: the abstract data model for representing information, the concrete syntaxes used to write it down (such as RDF/XML, Turtle, or JSON-LD), and related standards for creating, querying, and manipulating RDF data. Together, these components form a comprehensive framework for representing and exchanging semantic data on the web.

What is a triple?

RDF is a general data model for representing information on the web. Every piece of information expressed in RDF is a triple, which is structured like a simple sentence: it has a “subject”, a “predicate” and an “object”. In text-based RDF syntaxes such as N-Triples and Turtle, each triple statement ends with a period, just like a sentence.

  • Subject: This is the entity or resource being described. It is identified by a Uniform Resource Identifier (URI), or in some cases by a blank node that has no global identifier, and represents the “source” of the statement.
  • Predicate: Also called a property, the predicate describes the relationship between the subject and the object. It is always identified by a URI and specifies the nature of the connection between the subject and the object.
  • Object: The object is the value associated with the predicate. It can be another resource identified by a URI, or it can be a literal value such as a string, number, or date.
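To make the subject-predicate-object structure concrete, here is a minimal Python sketch that models a triple and writes it out in the N-Triples syntax, where URIs go in angle brackets, literals in double quotes, and each statement ends with a period. It is an illustration under simplifying assumptions: only URIs and plain string literals are supported (no blank nodes, datatypes, or language tags), and the `example.org` resources and class names like `Triple` are hypothetical; the `foaf:` property URIs are from the real FOAF vocabulary.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str  # plain string literal; datatypes and language tags are omitted here


Term = Union[IRI, Literal]


@dataclass(frozen=True)
class Triple:
    subject: IRI
    predicate: IRI
    obj: Term  # the object may be another resource (IRI) or a literal value


def format_term(term: Term) -> str:
    """Render one term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    # Escape characters that would otherwise break a quoted string literal
    escaped = (
        term.value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triple: Triple) -> str:
    """Serialize a triple as one N-Triples line: subject predicate object ."""
    parts = (triple.subject, triple.predicate, triple.obj)
    return " ".join(format_term(t) for t in parts) + " ."


alice = IRI("http://example.org/people/alice")  # hypothetical resource
bob = IRI("http://example.org/people/bob")      # hypothetical resource
name = IRI("http://xmlns.com/foaf/0.1/name")
knows = IRI("http://xmlns.com/foaf/0.1/knows")

print(to_ntriples(Triple(alice, name, Literal("Alice"))))  # object is a literal
print(to_ntriples(Triple(alice, knows, bob)))              # object is another resource
```

The first line printed is `<http://example.org/people/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .`; the second links two resources, which is what later lets triples connect into a graph.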

Triples are the basis for creating structured, interconnected data in RDF, enabling the representation of diverse information on the web in a machine-readable format.

What is a Graph?

RDF data forms a graph. An RDF graph is a set of triples, each expressing a subject-predicate-object relationship between entities. Because the same resource can appear in many triples, the triples interconnect into a network of information: subjects and objects are the nodes of the graph, and each predicate is a directed, labelled edge from subject to object.

This graph-based representation allows RDF to model complex relationships and express rich semantic information in a machine-readable format. It forms the backbone of the Semantic Web, enabling data interoperability and facilitating the exchange of knowledge across disparate systems and applications.
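The following Python sketch treats an RDF graph literally as a set of triples and reads the node/edge structure out of it. The data is hypothetical, `ex:` and `foaf:` are shorthand for full URIs, and literals are kept as quoted strings only so they are visually distinct from resources. It also shows one practical consequence of the graph being a set: merging data from another source is a set union, and a triple stated twice is stored once.

```python
# Hypothetical data. "ex:" and "foaf:" abbreviate full URIs; literals keep their quotes.
graph = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", '"Alice"'),
}


def nodes(g):
    """Subjects and objects are the nodes of the graph."""
    return {s for s, _, _ in g} | {o for _, _, o in g}


def edges_from(g, node):
    """Each triple is a directed edge labelled with its predicate."""
    return sorted((p, o) for s, p, o in g if s == node)


def reachable(g, start):
    """All nodes reachable from `start` by following edges forward (breadth-first)."""
    seen, frontier = {start}, [start]
    while frontier:
        next_frontier = []
        for node in frontier:
            for _, obj in edges_from(g, node):
                if obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return seen


# Merging data from another source is a set union; a repeated triple adds nothing.
other_source = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:carol", "foaf:name", '"Carol"'),
}
merged = graph | other_source
```

For example, `reachable(graph, "ex:bob")` follows the `foaf:knows` edge and returns `{"ex:bob", "ex:carol"}`.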

How to represent data in RDF?

Representing data in RDF involves defining classes, properties, and vocabulary terms to describe the entities and relationships within the data. Here’s a breakdown of each component:

  1. Classes: Classes represent categories of things in the real or information world. Examples include “person” and “organization”, or abstract concepts like “health” or “freedom”. Classes define the types of entities that can exist in the RDF data.
  2. Properties: Properties represent attributes of entities or relationships between them. There are two main types of properties:
      • Object properties: These link one entity to another. For example, the relationship between a document and the organization that publishes it is an object property.
      • Datatype properties: These link an entity to a literal value, such as the official name of an organization or the date and time when an observation was made.
  3. Vocabulary: A vocabulary is a set of terms (classes, properties, and relationships) that can be used to describe data and metadata. RDF vocabularies define the terms available when representing data in RDF.
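A small Python sketch can make the three components tangible, using the document and organization examples from the list above. Class membership is stated with the standard `rdf:type` predicate; `ex:publishedBy` links two resources (an object property), while `ex:officialName` and `ex:observedAt` attach literal values (datatype properties). All `ex:` names are hypothetical, and the `property_kind` helper infers the kind from the data purely for illustration: in OWL you declare it explicitly with `owl:ObjectProperty` or `owl:DatatypeProperty`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: str  # XSD datatype, written as a prefixed name


RDF_TYPE = "rdf:type"

triples = [
    # Class membership: which category each resource belongs to
    ("ex:report42", RDF_TYPE, "ex:Document"),
    ("ex:acme", RDF_TYPE, "ex:Organization"),
    # Object property: links one resource to another resource
    ("ex:report42", "ex:publishedBy", "ex:acme"),
    # Datatype properties: link a resource to a literal value
    ("ex:acme", "ex:officialName", Literal("ACME Corporation", "xsd:string")),
    ("ex:obs7", "ex:observedAt", Literal("2024-04-28T10:00:00Z", "xsd:dateTime")),
]


def instances_of(cls):
    """Resources declared (via rdf:type) to belong to the class `cls`."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


def property_kind(prop):
    """Illustrative only: classify a property by what its objects look like here."""
    objects = [o for _, p, o in triples if p == prop]
    if objects and all(isinstance(o, Literal) for o in objects):
        return "datatype property"
    if objects and not any(isinstance(o, Literal) for o in objects):
        return "object property"
    return "mixed or unused"
```

Here `instances_of("ex:Organization")` returns `{"ex:acme"}`, and `property_kind("ex:publishedBy")` returns `"object property"`.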

In RDF, terms such as classes and properties are identified using URIs (Uniform Resource Identifiers). These URIs uniquely identify the terms and ensure interoperability between different RDF datasets. By defining classes, properties, and vocabulary terms, RDF provides a structured and standardized way to represent data and metadata on the web.
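Because terms are full URIs, RDF syntaxes let you abbreviate them with prefixes such as `foaf:` or `rdf:`. The prefix label is only local shorthand: what identifies a term is the URI it expands to. The Python sketch below shows this expansion and its reverse. The `rdf`, `rdfs`, and `foaf` namespaces are the real published ones, while `ex:` is a hypothetical namespace, and `expand`/`compact` are illustrative helpers rather than any library's API.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/",  # hypothetical namespace for our own terms
}


def expand(curie, prefixes=PREFIXES):
    """Turn a prefixed name such as 'foaf:Person' into its full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def compact(uri, prefixes=PREFIXES):
    """Abbreviate a full URI, preferring the longest matching namespace."""
    by_length = sorted(prefixes.items(), key=lambda kv: len(kv[1]), reverse=True)
    for prefix, namespace in by_length:
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return f"<{uri}>"  # no known namespace: write the URI out in full


# Two datasets may use different prefix labels for the same namespace;
# after expansion they still refer to exactly the same term.
other_dataset_prefixes = {"f": "http://xmlns.com/foaf/0.1/"}
same_term = expand("f:Person", other_dataset_prefixes) == expand("foaf:Person")
```

`expand("foaf:Person")` yields `http://xmlns.com/foaf/0.1/Person`, and `same_term` is `True`, which is the interoperability property the paragraph above describes.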

Model your own vocabulary as an RDF schema

If no suitable, authoritative, reusable vocabulary exists to describe your data, you can write your own vocabulary using one of these W3C standards:
  • RDF Schema (RDFS)
  • Web Ontology Language (OWL)
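As an illustration of what an RDFS vocabulary adds, the Python sketch below declares a few hypothetical `ex:` classes and a property with `rdfs:subClassOf`, `rdfs:domain`, and `rdfs:range`, then applies a small subset of the RDFS entailment rules (rdfs2, rdfs3, rdfs9, rdfs11) until no new triples appear. This naive forward-chaining loop is a sketch for clarity, not how production reasoners work, and OWL's richer constructs are not covered.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"

# A tiny hypothetical vocabulary written in RDFS
schema = {
    ("ex:Person", TYPE, "rdfs:Class"),
    ("ex:Employee", TYPE, "rdfs:Class"),
    ("ex:Organization", TYPE, "rdfs:Class"),
    ("ex:Employee", SUBCLASS, "ex:Person"),
    ("ex:worksFor", TYPE, "rdf:Property"),
    ("ex:worksFor", DOMAIN, "ex:Employee"),
    ("ex:worksFor", RANGE, "ex:Organization"),
}

# Instance data that uses the vocabulary
data = {("ex:alice", "ex:worksFor", "ex:acme")}


def rdfs_closure(triples):
    """Apply a subset of the RDFS entailment rules until a fixpoint is reached."""
    g = set(triples)
    while True:
        new = set()
        for s, p, o in g:
            for s2, p2, o2 in g:
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:  # rdfs11: transitivity
                    new.add((s, SUBCLASS, o2))
                if p == TYPE and p2 == SUBCLASS and o == s2:      # rdfs9: inherit types
                    new.add((s, TYPE, o2))
                if p2 == DOMAIN and p == s2:                      # rdfs2: domain
                    new.add((s, TYPE, o2))
                if p2 == RANGE and p == s2:                       # rdfs3: range
                    new.add((o, TYPE, o2))
        if new <= g:  # nothing new was derived
            return g
        g |= new


inferred = rdfs_closure(schema | data)
```

From the single data triple, the schema lets us derive that `ex:alice` is an `ex:Employee` and therefore an `ex:Person`, and that `ex:acme` is an `ex:Organization`, none of which was stated explicitly.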

SPARQL

SPARQL is the W3C standard query language and protocol for Linked Open Data and RDF databases. Because it operates on the RDF data model rather than on any particular file format, it can extract information from datasets that originated in different sources and formats. Much as SQL is used to query relational databases, SPARQL is used to query and update RDF graph databases (triplestores) such as Ontotext’s GraphDB.
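At its core, a SPARQL `SELECT` query matches a basic graph pattern: triple patterns containing variables (written `?name`) that must all match the data with consistent bindings. The query string below asks for the names of everyone who knows `ex:bob`; the Python code underneath is a minimal sketch of how such a pattern can be evaluated over an in-memory list of triples. It is not how GraphDB or any real engine works, it ignores `FILTER`, `OPTIONAL`, and other features, and the data, the `ex:` namespace, and helper names like `evaluate_bgp` are hypothetical. IRIs are written as prefixed names and literals as plain strings.

```python
# The SPARQL query being mimicked (shown for reference; not parsed here)
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex:   <http://example.org/>
SELECT ?name WHERE {
  ?person foaf:knows ex:bob .
  ?person foaf:name  ?name .
}
"""

graph = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:carol", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:carol", "foaf:name", "Carol"),
    ("ex:bob", "foaf:name", "Bob"),
]


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for pat, value in zip(pattern, triple):
        if is_var(pat):
            if pat in extended and extended[pat] != value:
                return None  # variable already bound to something else
            extended[pat] = value
        elif pat != value:
            return None  # constant term does not match
    return extended


def evaluate_bgp(patterns, graph):
    """Every triple pattern must match; variables shared across patterns join them."""
    solutions = [{}]  # start with one empty binding
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                result = match(pattern, triple, binding)
                if result is not None:
                    next_solutions.append(result)
        solutions = next_solutions
    return solutions


patterns = [("?person", "foaf:knows", "ex:bob"), ("?person", "foaf:name", "?name")]
names = sorted(b["?name"] for b in evaluate_bgp(patterns, graph))  # SELECT ?name
```

The shared variable `?person` acts like a join key, so `names` ends up as `["Alice", "Carol"]`, while Bob's own name is excluded.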

What sets SPARQL apart is its versatility. It is not limited to native RDF databases: with suitable middleware, it can query any data source that can be presented as RDF. For instance, Relational Database to RDF (RDB2RDF) mapping software makes relational databases queryable with SPARQL. Beyond matching graph patterns, the language supports computed values, filtering, aggregation, and subqueries.
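To show what an RDB2RDF mapping does, here is a simplified Python sketch loosely modeled on the W3C Direct Mapping: each table row becomes a subject URI built from the table name and primary key, each non-NULL column becomes a triple, and the row is typed with a class named after the table. It uses the standard-library `sqlite3` module with an in-memory database; the `People` table, the base URI, and `table_to_triples` are hypothetical, and real mapping tools also handle foreign keys, datatypes, and custom mappings.

```python
import sqlite3
from urllib.parse import quote

BASE = "http://example.org/db/"  # hypothetical base URI for the mapped data
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def table_to_triples(conn, table, key):
    """Map each row of `table` to triples, with one subject URI per row."""
    # The table name is trusted here; never interpolate untrusted input into SQL.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{key}={quote(str(record[key]))}"
        triples.append((subject, RDF_TYPE, f"{BASE}{table}"))  # row's class
        for column, value in record.items():
            if value is not None:  # SQL NULLs produce no triple
                triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO People VALUES (?, ?, ?)",
    [(1, "Alice", "Berlin"), (2, "Bob", None)],
)
triples = table_to_triples(conn, "People", "id")
```

Once rows are exposed as triples like these, the same graph-pattern queries used on native RDF data apply to them; middleware typically does this translation on the fly rather than copying the data.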

Semantic Web vs Semantic Search

The Semantic Web and Azure AI Search are rooted in different approaches and pursue different objectives, but they share some commonalities in how they handle and process data.

Semantic Web:

  • Objective: The Semantic Web aims to make data on the web more meaningful and interconnected by adding semantic metadata. It focuses on enriching the web with machine-readable data that is explicitly linked and can be interpreted by machines.
  • Core Concept: At its core, the Semantic Web employs technologies such as RDF (Resource Description Framework), OWL (Web Ontology Language), and SPARQL (SPARQL Protocol and RDF Query Language) to represent, link, and query data in a structured and interoperable manner. It emphasizes the use of URIs to uniquely identify resources and relationships between them.
  • Functionality: Semantic technologies enable data to be processed in a more intelligent and context-aware manner. This facilitates tasks such as automated reasoning, knowledge discovery, and semantic search, where the meaning and context of data play a crucial role.

Azure AI Search:

  • Objective: Azure AI Search, particularly its embedded vector database functionality, focuses on providing advanced search capabilities powered by AI and machine learning algorithms. It aims to enhance the search experience by understanding the context, intent, and relevance of user queries.
  • Core Concept: The embedded vector database in Azure AI Search utilizes vector embeddings, a technique in machine learning where data points are represented as high-dimensional vectors in a continuous vector space. This allows for efficient similarity search and ranking of documents based on their semantic content.
  • Functionality: By leveraging vector embeddings, Azure AI Search can perform semantic search, where documents are not only matched based on exact keyword matches but also on their semantic similarity to the user query. This enables more accurate and context-aware search results, improving the user experience.
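The idea behind vector-based semantic search can be sketched in a few lines of Python: documents and the query are represented as vectors, and results are ranked by cosine similarity. This is a generic illustration, not Azure AI Search's API. The three-dimensional vectors are made-up stand-ins for real embeddings (which come from a model and have hundreds of dimensions), and this brute-force loop compares against every document, whereas production vector indexes typically use approximate nearest-neighbor structures such as HNSW.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, docs, top_k=2):
    """Return the ids of the `top_k` documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Made-up "embeddings" for illustration only
docs = {
    "rdf-intro": [0.9, 0.1, 0.0],
    "sparql-guide": [0.8, 0.3, 0.1],
    "cooking-tips": [0.0, 0.2, 0.9],
}
query = [0.85, 0.2, 0.05]  # a query about RDF and SPARQL
top = rank(query, docs)
```

The two RDF-related documents rank far above the cooking article even though no keywords are compared, which is the behaviour the bullet above describes.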

In summary, while the Semantic Web focuses on enriching web data with semantic metadata to enable interoperability and intelligent data processing, Azure AI Search with its embedded vector database emphasizes leveraging AI-driven techniques, such as vector embeddings, to enhance search capabilities and deliver more relevant search results based on semantic similarity.


Jesko Rehberg

Data scientist at https://en.digitalsalt.de/. Views and opinions expressed are entirely my own and may not necessarily reflect those of my company