Abstract

Media producers publish large amounts of multimedia content online: text, audio and video. As the online media market grows, managing and delivering this content is becoming a challenge. Semantic and linking technologies can be used to organize and exploit it. This dissertation addresses the problem of integrating Semantic Web and Linked Data technologies into Vilynx’s platform, a system used by media producers to manage and exploit their content. For that purpose, Knowledge Graphs (KGs) and their maintenance through multimodal Knowledge Base Population (KBP) from online data extracted from the Web are studied. The Web is a very large unstructured data source containing millions of text documents, images, videos and audio files. This thesis aims to develop solutions that facilitate automatic learning from these multimodal data and to apply them in real product applications for media.

This thesis is structured in three parts. The first part will cover the construction of a multimodal KG, which will be the core of the system for knowledge extraction, standardization and contextualization.

The second part will consist of building the tools that will be used for KBP. To that end, we will construct a multimodal semantic tagging framework based on the previously mentioned KG. This block addresses some typical challenges of KBP and data mining, such as named entity recognition (NER), entity linking (EL), context set construction (CSC), structured data creation, standardization, entity matching and data fusion.

The third part will focus on extracting knowledge from the Web to populate the knowledge base. As the KG domain is media, we will populate the KG with events detected from news streams, using a multimodal perspective. To detect events, we will construct a news aggregator system. This part will deal with the problems of Topic Detection and Tracking (TDT), Topic Modeling (TM) and multi-document summarization. From these data we will learn relations between world entities, which will populate our KG, and we will address the automatic detection and update of concepts and relations. Social media information will also be analyzed to understand trendiness and world interests.