A multi-UAV network is a wireless multi-hop network consisting of several unmanned aerial vehicles (UAVs) that communicate with a centralized control center. Because the UAVs are highly mobile, the network topology changes frequently, which degrades wireless link quality and causes frequent disconnections. The computational capability of UAVs is also limited. It is therefore important to design a routing protocol that is both lightweight and adaptive. We propose an intelligent routing protocol for multi-UAV networks that employs Q-learning to minimize the number of hops to the destination while favoring better link quality. The performance of the proposed scheme was evaluated with the OPNET simulator. Preliminary results show that it can improve routing performance in terms of end-to-end delivery and packet delivery ratio compared to the de facto ad hoc routing protocol.
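To make the idea of Q-learning-based next-hop selection concrete, the following is a minimal sketch, not the protocol evaluated in this work. It assumes each UAV keeps a table of Q-values per (destination, neighbor) pair and updates it with the standard Q-learning rule. The reward shape (link quality minus a per-hop penalty, plus a bonus on delivery), the constants, and all names such as `QRouter`, `HOP_PENALTY`, and `DELIVERY_BONUS` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical constants chosen for illustration only.
HOP_PENALTY = 0.5      # cost charged for every hop, discourages long routes
DELIVERY_BONUS = 1.0   # extra reward when the neighbor is the destination


class QRouter:
    """Per-UAV Q-table: q[dest][neighbor] estimates the value of that next hop."""

    def __init__(self, alpha=0.5, gamma=0.9):
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.q = defaultdict(dict)    # q[dest][neighbor] -> float

    def best_value(self, dest):
        """Best Q-value this node knows for dest (0.0 if nothing is learned yet)."""
        values = self.q.get(dest)
        return max(values.values()) if values else 0.0

    def update(self, dest, neighbor, link_quality, neighbor_best, delivered=False):
        """Apply one Q-learning update after forwarding a packet to `neighbor`.

        link_quality  : assumed normalized to [0, 1]
        neighbor_best : the neighbor's own best_value(dest), fed back to us
        delivered     : True when the neighbor is the destination itself
        """
        reward = link_quality - HOP_PENALTY
        if delivered:
            reward += DELIVERY_BONUS
        # No future value is added once the packet has reached the destination.
        future = 0.0 if delivered else self.gamma * neighbor_best
        old = self.q[dest].get(neighbor, 0.0)
        new = (1.0 - self.alpha) * old + self.alpha * (reward + future)
        self.q[dest][neighbor] = new
        return new

    def next_hop(self, dest, neighbors):
        """Greedy choice: the reachable neighbor with the highest Q-value."""
        return max(neighbors, key=lambda n: self.q[dest].get(n, 0.0))


if __name__ == "__main__":
    router = QRouter()
    router.update("gcs", "uav_b", link_quality=0.9, neighbor_best=0.0, delivered=True)
    router.update("gcs", "uav_c", link_quality=0.3, neighbor_best=0.0, delivered=True)
    print(router.next_hop("gcs", ["uav_b", "uav_c"]))
```

In this sketch the per-hop penalty stands in for the minimum-hop objective and the link-quality term for the link-quality objective; a practical scheme would also need exploration and a way to age out entries for neighbors that move out of range.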