Video prediction has long received attention in computer vision, and it has gained importance over the last decade with the popularization of deep neural networks and their application to vision tasks. The main focus of this thesis is to linearize the dynamics of time sequences by exploiting the spatial context that video offers, with the final goal of obtaining better predictions. We first provide the theoretical background on dynamics. We then present several modifications to an existing deterministic predictor network, the Dynamical Atoms-based Network (DYAN) [1], which models time sequences as the outputs of Linear Time-Invariant (LTI) systems using system identification and dynamics foundations. The proposed modifications achieve different levels of success, and in some cases they beat the State Of The Art (SOTA) for at least one dataset on the SSIM, MSE, and MMF metrics. We also present two novel convolutional autoencoder architectures (LODAEs) for low-order-dynamics manifold embedding, strongly based on deep neural networks, whose primary aim is to provide a generalized solution for mapping video sequences into a new manifold, adapting them to the pipeline of system-identification-based predictors such as DYAN. The results for the LODAEs are promising: they appear to achieve their goal on a very simple synthetic dataset, lowering the order of the latent-space sequences and providing good reconstructions and, in some cases, predictions.