Exploring Automatic Speech Recognition with TensorFlow

Escur J. Exploring Automatic Speech Recognition with TensorFlow. Costa-jussà MR, Giró-i-Nieto X. 2018.

Abstract

Advisors: Marta R. Costa-jussà (TALP, UPC) and Xavier Giró-i-Nieto (GPI, UPC)

Grade: A (9.8/10.0)

Speech recognition is the task of identifying words in spoken language and converting them into text. This bachelor's thesis focuses on using deep learning techniques to build an end-to-end speech recognition system. As a preliminary step, we review the most relevant methods proposed over the last several years. Then, we study one of the latest proposals for this end-to-end approach, which uses a sequence-to-sequence model with an attention mechanism. Next, we successfully reproduce the model and test it on the TIMIT database. We analyze the similarities and differences between the current implementation and the original theoretical work. Finally, we experiment with different parameters (e.g., number of layer units, learning rates and batch sizes) and reduce the Phoneme Error Rate by almost 12% relative.
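For readers unfamiliar with the metric, the following is a minimal sketch of how a Phoneme Error Rate and a relative reduction are commonly computed. It assumes the usual definition (edit distance between reference and hypothesized phoneme sequences, divided by the number of reference phonemes). The function names and the toy phoneme sequences are hypothetical and are not taken from the thesis or from the Nabu code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences
    (substitutions, insertions and deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Total edit distance divided by total number of reference phonemes."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def relative_reduction(baseline, improved):
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# Toy example (illustrative values only, not results from the thesis):
reference = ["sh", "iy", "hh", "ae", "d"]
hypothesis = ["sh", "iy", "ae", "d"]  # one phoneme deleted
per = phoneme_error_rate([reference], [hypothesis])  # 1 error / 5 phonemes
```

Under this definition, a "12% relative" reduction means the new error rate is about 0.88 times the baseline rate, not that it dropped by 12 absolute percentage points.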

Source code containing the fork of the Nabu project used in this work.

Exploring Automatic Speech Recognition with TensorFlow from Universitat Politècnica de Catalunya

Projects

	Deep learning
	Speech2Signs: Spoken to Sign Language Translation using Neural Networks

Image Processing Group
