The Unmanned Aerial Multi-hop Network, or Flying Ad-Hoc Network, is a special type of mobile ad hoc network consisting of multiple UAVs (Unmanned Aerial Vehicles) that perform a variety of missions such as ISR (Intelligence, Surveillance, and Reconnaissance) and sensing data collection. In general, UAV networks suffer from poor connectivity and low network performance because of the dynamic mobility and limited resources of the nodes. It is therefore important to design a routing protocol that is lightweight and adaptive. In this paper, we propose a double Q-learning based routing protocol that takes into account both the minimum number of hops and the link quality towards a destination. The proposed scheme also adjusts the interval of node discovery messages according to network conditions in order to reduce control message overhead. Using the OPNET simulator, we validated the proposed scheme and found that it achieves a higher packet delivery ratio and a lower data transmission delay than both the existing QMR (Q-learning based Multi-objective optimization Routing) scheme and the simplest Q-learning based routing protocol.
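To make the idea of double Q-learning based next-hop selection concrete, the following is a minimal sketch of the standard double Q-learning update applied to a toy topology. It is not the protocol evaluated in this paper: the class `DoubleQRouter`, the reward function `link_reward`, the parameter values, and the four-link topology are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


class DoubleQRouter:
    """Illustrative double Q-learning tables keyed by (node, next_hop)."""

    def __init__(self, alpha=0.5, gamma=0.9, seed=None):
        self.alpha = alpha              # learning rate (assumed value)
        self.gamma = gamma              # discount factor (assumed value)
        self.qa = defaultdict(float)    # first Q-table
        self.qb = defaultdict(float)    # second Q-table
        self.rng = random.Random(seed)

    def best_next_hop(self, node, neighbors):
        # Act greedily on the sum of both tables.
        return max(neighbors,
                   key=lambda n: self.qa[(node, n)] + self.qb[(node, n)])

    def update(self, node, next_hop, reward, next_neighbors):
        # Randomly choose which table learns; the other one evaluates.
        if self.rng.random() < 0.5:
            learn, evaluate = self.qa, self.qb
        else:
            learn, evaluate = self.qb, self.qa
        if next_neighbors:
            # Select the best onward hop with the learning table,
            # but take its value from the other table.
            best = max(next_neighbors, key=lambda n: learn[(next_hop, n)])
            future = evaluate[(next_hop, best)]
        else:
            future = 0.0  # next_hop is the destination (no onward hops)
        key = (node, next_hop)
        learn[key] += self.alpha * (reward + self.gamma * future - learn[key])


def link_reward(link_quality, hop_cost=1.0, weight=1.0):
    """Hypothetical reward: each hop costs hop_cost, good links earn more.

    link_quality is assumed to lie in [0, 1].
    """
    return weight * link_quality - hop_cost


# Toy topology (assumed): S reaches D via A (good links) or B (poor links).
links = {("S", "A"): 0.9, ("A", "D"): 0.9, ("S", "B"): 0.2, ("B", "D"): 0.2}
onward = {"A": ["D"], "B": ["D"], "D": []}

router = DoubleQRouter(seed=0)
for _ in range(200):
    for (node, nxt), quality in links.items():
        router.update(node, nxt, link_reward(quality), onward[nxt])

choice = router.best_next_hop("S", ["A", "B"])
```

Both paths in this toy topology have the same hop count, so the learned values differ only through link quality, and the router ends up preferring the relay with the better links. Keeping two tables, with one selecting the best onward hop and the other supplying its value, is what distinguishes double Q-learning from the single-table update, which tends to overestimate action values.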