Deep learning models achieve superior performance not only in image recognition tasks but also in predicting where and when users focus their attention. This talk will provide an overview of how convolutional neural networks have been trained to predict saliency maps, which describe the probability of fixating the gaze on each image location. Different solutions have been proposed for this task, and our recent work has added a temporal dimension by predicting gaze scanpaths over 360-degree images for VR/AR. These techniques make it possible to simulate eye-tracker data without collecting data from users.