Deep Neural Networks have been used to tackle a wide variety of tasks, achieving strong performance. However, little is known about how the training of these models converges and how the learned weights relate to a network's properties. In this thesis we investigate the structure of the weight space and attempt to disentangle its properties. We introduce attention mechanisms that capture relations among neurons' weights, which aid weight reconstruction, hyperparameter classification, and accuracy prediction. Our approach further has the potential to handle variable input sizes, allowing for different network widths, depths, or even architecture types.