The Earth’s surface is mostly covered by water, and the ocean is the source of a significant share of natural resources and renewable energy. However, only a small fraction of the ocean has been surveyed. Estimating a 3D model of the environment from a single video eases the task of surveying the underwater environment, reduces costs, and opens the door to autonomous exploration of unknown environments. To estimate the 3D structure of a vehicle’s surroundings, we propose a deep-learning-based Simultaneous Localization and Mapping (SLAM) method. Our method predicts a depth map for a given video frame while simultaneously estimating the motion of the vehicle between frames. It is fully self-supervised: training requires only a dataset of videos, without ground truth. We further propose a novel learned depth-map prior based on Generative Adversarial Networks (GANs) to improve depth prediction. We evaluate our method on the KITTI dataset and on a private dataset of subsea inspection videos, and show that it outperforms state-of-the-art SLAM methods in both depth prediction and pose estimation. In particular, our method achieves a mean Absolute Trajectory Error of 1.6 feet on our private subsea test set.