The present invention is directed, in general, to a method and a system for automatically transforming an image using neural networks. More specifically, the invention relates to controllable image generation from an image representation and one or more conditions using a conditional Neural Network.

The method comprises receiving, by a processing unit, at least one image and processing the received image to obtain an image representation thereof (i.e., an intermediate representation of the initial image that captures both high-level features and low-level properties of the image and that is structured so that a conditional Neural Network, such as a deep generative Neural Network, can process it). The method also includes receiving, by an encoding unit, one or more references (e.g., other images, text, labels, combinations thereof, or other data describing how the received image should be transformed) and encoding the received one or more references into one or more features, which are provided to the conditional Neural Network as one or more conditions. The method further applies the conditional Neural Network to transform the obtained image representation into a resulting conditioned image based on said one or more conditions.
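The three steps above (obtain an image representation, encode a reference into a condition, apply a conditioned transformation) can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration only: the function names, the tiny label table, and above all the fixed scale-and-shift modulation, which merely stands in for a trained conditional Neural Network and is not the network described here.

```python
from typing import List

# A grayscale image as rows of pixel intensities in [0, 1] (illustrative type).
Image = List[List[float]]


def image_to_representation(image: Image) -> List[float]:
    """Hypothetical processing unit: flatten the pixels (low-level properties)
    and append the mean intensity as a crude high-level feature."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return pixels + [mean]


def encode_reference(label: str) -> List[float]:
    """Hypothetical encoding unit: map a text label to a (scale, shift)
    feature pair that serves as the condition."""
    table = {
        "brighter": [1.0, 0.2],
        "darker": [1.0, -0.2],
        "invert": [-1.0, 1.0],
    }
    if label not in table:
        raise ValueError(f"unknown reference label: {label!r}")
    return table[label]


def conditional_transform(
    representation: List[float], condition: List[float], height: int, width: int
) -> Image:
    """Toy stand-in for the conditional network: modulate the pixel part of
    the representation with the condition, clamp to [0, 1], and reshape."""
    scale, shift = condition
    pixels = representation[: height * width]  # drop the appended mean feature
    out = [min(1.0, max(0.0, scale * p + shift)) for p in pixels]
    return [out[r * width : (r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    image = [[0.0, 0.5], [1.0, 0.25]]
    rep = image_to_representation(image)
    cond = encode_reference("invert")
    print(conditional_transform(rep, cond, height=2, width=2))
    # prints [[1.0, 0.5], [0.0, 0.75]]
```

In a real system each of the three functions would be replaced by a learned component; the sketch only shows how the representation and the condition are produced separately and combined in the final step.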