GPI Seminar Series: ACM talks

Four talks are scheduled in connection with the ACM Multimedia Conference in Barcelona: Amaia Salvador, Carles Ventura, Brendan Jou (Columbia University), and Matthias Zeppelzauer (Interactive Media Systems Group, Vienna University of Technology).
Monday October 21st, 10:00 - 12:30, Seminar Room D5-007

10:00 Amaia Salvador: Crowdsourced Object Segmentation with a Game

10:30 Carles Ventura: Open discussion and rehearsal for ACM MultiMedia Doctoral Symposium
This research talk will explore two of the most widely used image models for object detection, 3D reconstruction, and visual search: region-based and interest-point-based image representations. It will then propose a new image model that exploits the strengths and overcomes the weaknesses of both approaches. More specifically, we will focus on the gPb-owt-ucm segmentation algorithm and SIFT local features, since they are among the most thoroughly validated techniques in their respective fields. Furthermore, using an object retrieval benchmark, this dissertation research will analyze three basic questions: (i) the usefulness of an interest-point hierarchy based on a contour strength signal, (ii) the influence of context on both interest-point location and description, and (iii) the analysis of regions as spatial support for bundling interest points.
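As a rough illustration of question (i), interest points can be read off a contour strength map as its local maxima and ordered by strength. This is only a toy sketch under assumed inputs; the talk's actual hierarchy is built on the gPb-owt-ucm algorithm, which this does not implement.

```python
import numpy as np

def local_maxima_interest_points(strength, threshold=0.5):
    """Toy sketch: select interest points as local maxima of a
    contour-strength map, then order them by strength so the
    list forms a simple strongest-first hierarchy."""
    points = []
    h, w = strength.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = strength[y, x]
            if v < threshold:
                continue
            # Local maximum over its 3x3 neighbourhood.
            if v >= strength[y - 1:y + 2, x - 1:x + 2].max():
                points.append((y, x, float(v)))
    points.sort(key=lambda p: -p[2])
    return points

# Synthetic strength map with one strong and one weak contour response.
s = np.zeros((7, 7))
s[2, 2] = 0.9
s[5, 5] = 0.6
pts = local_maxima_interest_points(s)
print(pts)  # strongest point first: (2, 2, 0.9) then (5, 5, 0.6)
```

A real pipeline would describe each point with a local descriptor such as SIFT; here only the localization step is sketched.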

11:00 Brendan Jou: News Rover: Exploring Topical Structures and Serendipity in Heterogeneous Multimedia News
News stories are rarely understood in isolation. Every story is driven by key entities that give the story its context. Persons, places, times, and several surrounding topics can often succinctly represent a news event, but are only useful if they can be both identified and linked together. We introduce a novel architecture called News Rover for re-bundling broadcast video news, online articles, and Twitter content. The system utilizes these many multimodal sources to link and organize content by topics, events, persons, and time. News Rover comprises end-to-end data ingestion, from raw data gathering to intuitive user navigation over heterogeneous news sources, which include over 18k hours of broadcast video news, 3.58M online articles, and 430M public Twitter messages. Our system addresses the challenge of extracting "who," "what," "when," and "where" from a truly multimodal perspective, leveraging audiovisual information in broadcast news and in articles, as well as textual cues in both closed captions and the raw document content of articles and social media. By performing this extraction over time, we are able to study the trend of topics in the news and detect interesting peaks in news coverage over the life of a topic. We visualize these peaks in trending news topics using automatically extracted keywords and iconic images, and introduce a novel multimodal algorithm for naming speakers in the news. We also present two intuitive and novel interfaces for navigating news content by topics and their related news events, as well as for serendipitously viewing a news topic. These two interfaces trade off between user-controlled search and serendipitous exploration of news while retaining the story context.
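The peak detection over topic trends mentioned above can be pictured with a minimal sketch: flag days whose article count for a topic sits far above the series average. This is a hypothetical stand-in under assumed data, not the News Rover algorithm itself.

```python
import numpy as np

def coverage_peaks(counts, z=2.0):
    """Toy sketch: flag time steps whose count exceeds the series
    mean by z standard deviations, as a simple proxy for detecting
    peaks in a topic's news coverage."""
    counts = np.asarray(counts, dtype=float)
    mean, std = counts.mean(), counts.std()
    if std == 0:
        return []
    return [i for i, c in enumerate(counts) if c > mean + z * std]

# Hypothetical daily article counts for one topic over two weeks.
daily = [3, 4, 2, 5, 3, 4, 40, 6, 3, 2, 4, 3, 5, 4]
print(coverage_peaks(daily))  # the spike on day 6
```

A production system would also need to group articles into topics before counting; only the final thresholding step is sketched here.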

11:45 Matthias Zeppelzauer: Detection and Tracking of Elephants in Wildlife Video
Biologists often have to investigate large amounts of video in behavioral studies of animals. These videos are usually not sufficiently indexed, which makes finding objects of interest a time-consuming task. In this talk I present a fully automated method for the detection and tracking of elephants in wildlife video. The method dynamically learns a color model of elephants from a few training images. Based on the color model, it localizes elephants in video sequences with different backgrounds and lighting conditions. The method exploits temporal cues from the video to improve robustness and to obtain spatially and temporally consistent detections. The proposed method detects elephants (and groups of elephants) of different sizes and poses performing different activities. The method is robust to occlusions (e.g. by vegetation) and correctly handles camera motion and different lighting conditions. Experiments show that both near and distant elephants can be detected and tracked reliably. The proposed method gives biologists efficient and direct access to their video collections, which facilitates further behavioral and ecological studies. The method does not impose hard constraints specific to elephants and is thus easily adaptable to other animal species.
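The color-model step described above can be pictured as fitting a Gaussian to pixel colors sampled from a few training images and then thresholding each frame pixel by Mahalanobis distance. This is a minimal sketch under assumed synthetic data, not the method presented in the talk.

```python
import numpy as np

def learn_color_model(pixels):
    """Fit a simple Gaussian color model (mean + inverse covariance)
    to RGB pixels sampled from training images. A toy stand-in for
    the dynamically learned elephant color model."""
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False) + 1e-6 * np.eye(3)  # regularized
    return mean, np.linalg.inv(cov)

def color_likelihood_mask(image, mean, inv_cov, max_dist=3.0):
    """Mark pixels whose Mahalanobis distance to the model mean is
    small as candidate animal pixels."""
    diff = image.reshape(-1, 3) - mean
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
    return (np.sqrt(d2) <= max_dist).reshape(image.shape[:2])

# Synthetic training pixels around a grayish 'elephant' color.
rng = np.random.default_rng(0)
train = rng.normal([120, 115, 110], 5.0, size=(500, 3))
mean, inv_cov = learn_color_model(train)

# Tiny test frame: top half elephant-colored, bottom half green foliage.
img = np.zeros((4, 4, 3))
img[:2] = [120, 115, 110]
img[2:] = [40, 180, 60]
mask = color_likelihood_mask(img, mean, inv_cov)
print(mask)  # True in the top rows, False in the bottom rows
```

The actual method additionally exploits temporal consistency across frames, which a per-frame color threshold like this cannot provide.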