Xavier Giró

Position | Email |
---|---|
Associate Professor | xavier.giro@upc.edu |
Office | Phone |
---|---|
D5-117 (Barcelona - Campus Nord), TR2-102 (Terrassa - ESEIAAT) | +34 934 015 769 |
Biography
Xavier Giro-i-Nieto is an associate professor at the Universitat Politècnica de Catalunya (UPC) in Barcelona, a member of the Intelligent Data Science and Artificial Intelligence Research Center (IDEAI-UPC) and the Image Processing Group (GPI), and a visiting researcher at the Barcelona Supercomputing Center (BSC). He graduated in Telecommunications Engineering at ETSETB (UPC) in 2000, after completing his master's thesis on image compression at the Vrije Universiteit in Brussels (VUB) with Prof. Peter Schelkens. After working for one year at Sony Brussels, he started a PhD on computer vision, supervised by Prof. Ferran Marqués. In parallel, he designed and taught courses at the ESEIAAT (video content delivery) and ETSETB (deep learning) schools at UPC, as well as in the Master in Computer Vision of Barcelona (video analysis). Between 2008 and 2014 he made multiple visits to the Digital Video and MultiMedia laboratory directed by Prof. Shih-Fu Chang at Columbia University in New York, with whom he keeps collaborating. He also works closely with the Insight Center for Data Analytics at Dublin City University, as well as with his industrial partners at Vilynx, Mediapro, and Crisalix. He serves as associate editor at IEEE Transactions on Multimedia and reviews for top-tier conferences in machine learning, computer vision and multimedia.
Top 10 Latest News
- June 2020: I will serve as chair for the ICMR 2020 Doctoral Symposium in Dublin. If you are working on a PhD on multimedia and retrieval, please consider submitting your work.
- June 2020: I will present the tutorial One Perceptron to Rule Them All: Language, Vision, Audio and Speech at ACM ICMR 2020 in Dublin.
- December 2019: Paper on video object visual grounding to be presented in the NeurIPS 2019 Workshop on Visually Grounded Interaction and Language (ViGIL).
- December 2019: Paper on hate speech detection in memes to be presented in the NeurIPS 2019 Workshop on AI for Social Good.
- October 2019: Two papers (one and two) to be presented at the ICCV 2019 Workshop on Geometry Meets Deep Learning (GMDL) and the ICCV 2019 3D Face Alignment in the Wild Challenge (3DFAW). Joint work with Crisalix.
- October 2019: Industry track & demo paper at ISWC 2019 on event detection from broadcaster news RSS feeds.
- October 2019: Extended abstract on video object linguistic grounding to be presented in the ACM Multimedia Workshop on Multimodal Understanding and Learning for Embodied Applications (MULEA).
- September 2019: Paper on video saliency prediction presented in BMVC 2019, joint work with Insight Center for Data Analytics at Dublin City University.
- June 2019: Best paper award at the CVPR 2019 DeepVision workshop on weak labels for segmentation.
- June 2019: Two papers presented at CVPR 2019: end-to-end video object segmentation with UOC, and inverse cooking with Facebook.
Awards: Best paper award at the CVPR 2019 DeepVision Workshop, Best scanpath prediction in Salient360 ICME Challenge 2017, Best poster award at LSCVS NIPS workshop 2016, Best poster award at ICMR 2016, Among Top 10% papers in ICIP 2015, Winner of the LSUN Saliency prediction challenge in CVPRW 2015, 2nd place in ChaLearn Cultural Event Recognition Challenge in CVPRW 2015, 2nd place in MediaEval Social Event Detection 2014, 3rd place in MediaEval Social Event Detection 2013, Winner of the Videobrowser Showdown in MMM 2012.
Scientific IDs: Google Scholar, WoK Researcher ID: M-5834-2013, ORCID: 0000-0002-9935-5332, Scopus Author ID: 35098596700, UPC Futur
Selected Service: Associate Editor of IEEE Transactions on Multimedia (2017-2019), Associate Editor of ACM SIGMM Records, Area Chair of ACM Multimedia (2016, 2019), Doctoral Symposium Chair at ICMR 2020, Organizer of the Lifelogging Tools and Applications (LTA) workshop at ACM Multimedia 2016 & 2017.
Conference Committees: NeurIPS (2019, 2018, 2017), ICLR (2018, 2017), ICML (2019, 2018), CVPR 2019, ICCV (2019, 2017), ACM ICMR (2019, 2018, 2017), ACM Multimedia (2019, 2018, 2017, 2016, 2014), ICIP (2014, 2003), EUSIPCO 2011.
Workshop Committees: MUSA 2017@ACMMM, VSM 2016@ECCV & 2017@ICCV, EPIC 2016@ECCV & 2017@ICCV, ISM 2016, CBMI (2016, 2015, 2014), CrowdMM 2015, MMSys 2015 Dataset Track, SMAP 2015, MediaEval 2014, SMAP 2014, SEWM 2014, MMSys Dataset 2014, SMAP 2013, ICIP 2003.
Journal reviewer: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Multimedia Tools and Applications (MTAP), EURASIP Journal on Image and Video Processing, Multimedia Systems (MMSJ), Image and Vision Computing (IMAVIS).
Journal Articles
- “Multiresolution co-clustering for uncalibrated multiview segmentation”, Signal Processing: Image Communication, 2019.
- “Scanpath and Saliency Prediction on 360 Degree Images”, Elsevier Signal Processing: Image Communication, 2018.
- “Introduction to the special issue: Egocentric Vision and Lifelogging”, Journal of Visual Communication and Image Representation, 2018.
- “From Pixels to Sentiment: Fine-tuning CNNs for Visual Sentiment Prediction”, Image and Vision Computing, 2017.
- “Assessment of Crowdsourcing and Gamification Loss in User-Assisted Object Segmentation”, Multimedia Tools and Applications, vol. 23, no. 75, 2016.
Book Chapters and Books
- “VLX-Stories: building an online Event Knowledge Base with Emerging Entity detection”, in The Semantic Web – ISWC 2019, Auckland, New Zealand, In Press.
- “Sentiment concept embedding for visual affect recognition”, in Multimodal Behavior Analysis in the Wild, 1st ed., Elsevier, 2018.
- “Object Retrieval with Deep Convolutional Features”, in Deep Learning for Image Processing Applications, vol. 31, Amsterdam, The Netherlands: IOS Press, 2017.
- “Hierarchical Object Detection with Deep Reinforcement Learning”, in Deep Learning for Image Processing Applications, vol. 31, Amsterdam, The Netherlands: IOS Press, 2017.
- “Hierarchical Navigation and Visual Search for Video Keyframe Retrieval”, in Advances in Multimedia Modeling, vol. 7131, Springer Berlin / Heidelberg, 2012, pp. 652-654.
Conference Papers
- “Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation”, in NeurIPS 2019 Workshop on AI for Social Good, Vancouver, Canada, In Press.
- “Recurrent Instance Segmentation using Sequences of Referring Expressions”, in NeurIPS workshop on Visually Grounded Interaction and Language (ViGIL), Vancouver, Canada, In Press.
- “VLX-Stories: a Semantically Linked Event platform for media publishers”, in International Semantic Web Conference (ISWC), Auckland, New Zealand, In Press.
- “Multi-View 3D Face Reconstruction in the Wild using Siamese Networks”, in ICCV 2019 3D Face Alignment in the Wild Challenge Workshop (3DFAW), Seoul, South Korea, 2019.
- “Assessing Knee OA Severity with CNN attention-based end-to-end architectures”, in International Conference on Medical Imaging with Deep Learning (MIDL) 2019, London, United Kingdom, 2019.
Theses
- “Computer Vision beyond the visible: Image understanding through language”, Universitat Politècnica de Catalunya, Barcelona, 2019.
- “Visual Object Analysis using Regions and Local Features”, 2016.
- “Part-Based Object Retrieval With Binary Partition Trees”, Universitat Politècnica de Catalunya (UPC), Barcelona, Catalonia, 2012.
Other
- “Multimodal Hate Speech Detection in Memes”. 2019. MSc thesis.
- “One Perceptron to Rule Them All: Language and Vision”. 2019. Presentation.
- “Video Saliency Prediction with Deep Neural Networks”. 2019. MSc thesis.
- “Deep Learning Representations for All (a.k.a. the AI hype)”. 2019. Presentation.
- “Wav2Pix: Enhancement and Evaluation of a Speech-conditioned Image Generator”. 2019. MSc thesis.
Projects
Project | Type | Start | End |
---|---|---|---|
Deep Learning for Video Analytics in Sport Events | National | Feb 2018 | Jan 2021 |
MALEGRA - Multimodal Signal Processing and Machine Learning on Graphs | National | Jan 2017 | Dec 2020 |
Deep Learning for 3D reconstruction and Simulation of Aesthetic Procedures | National | Jul 2017 | Jun 2020 |
Large Scale Video Tagging with Knowledge Bases | National | Jun 2017 | May 2020 |
Speech2Signs: Spoken to Sign Language Translation using Neural Networks | Other | Nov 2017 | Nov 2018 |
Research Areas
Research area | Type | Start | End |
---|---|---|---|
Region-based image and video processing | Internal | Jan 1992 | Dec 2020 |
Lifelogging | Internal | Feb 2014 | Dec 2020 |
Affective Computing | Internal | Jan 2015 | Dec 2020 |
Deep learning | Internal | Jun 2014 | Dec 2020 |
Saliency prediction | Internal | Feb 2015 | Dec 2019 |
Demos and Resources
Resource | Type | Date |
---|---|---|
EgoMon Gaze & Video Dataset | Dataset | Jul 2016 |
UPC at CVPRW Visual Question Answering Challenge 2016 | Software | Jun 2016 |
UPC at CVPRW ActivityNet Challenge 2016 | Software | Jun 2016 |
Faster R-CNN Features for Instance Search (software) | Software | May 2016 |
Sentiment maps generator | Software | Apr 2016 |
Teaching
Acronym | Title | Level | College |
---|---|---|---|
BIOM | Biometric Technologies | Master in Telecommunications Engineering (MET) | ETSETB TelecomBCN |
DLAI | Deep Learning for Artificial Intelligence | Master in Telecommunications Engineering (MET) | ETSETB TelecomBCN |
DLCV | Deep Learning for Computer Vision | Master in Telecommunications Engineering (MET) | ETSETB TelecomBCN |
DLMM | Deep Learning for Multimedia | Master & PhD | Dublin City University |
DLSL | Deep Learning for Speech and Language | BSc, MSc & PhD | ETSETB TelecomBCN |
IDL | Introduction to Deep Learning | BSc | ETSETB TelecomBCN |
GDSA | Multimedia Content Management and Delivery | Degree in Audiovisual Systems (3rd year) | Escola Superior d'Enginyeries Industrials, Aeroespacial i Audiovisual de Terrassa (ESEIAAT) |
VA | Video Analysis | Master in Computer Vision (MCV) | UAB, UOC, UPC & UPF |