Abstract:
In distributed networks such as ad-hoc and device-to-device (D2D) networks, no base station exists and conveying global channel state information (CSI) between users is costly or simply impractical. When the CSI is time-varying and unknown to the users, the users must both learn the channel statistics online and converge to a good channel allocation. This gives rise to a multi-armed bandit (MAB) problem with multiple decision makers. If two or more users choose the same channel, a collision occurs and they all receive zero reward. We propose a distributed channel allocation algorithm in which each user converges to the optimal allocation while achieving an order-optimal regret of O(log T), where T denotes the length of the time horizon. The algorithm is based on a carrier-sense multiple access (CSMA) implementation of the distributed auction algorithm. It does not require any exchange of information between users. Users need only observe a single channel at a time and sense whether there is a transmission on that channel, without decoding the transmissions or identifying the transmitting users. We compare the performance of the proposed algorithm with the state-of-the-art scheme using simulations of realistic Long Term Evolution (LTE) channels.
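
To make the collision model and the notion of an optimal allocation concrete, the following is a minimal Python sketch under stated assumptions. It is not the proposed CSMA-based distributed algorithm: it runs a plain centralized auction (in the style of Bertsekas) on a table of expected rewards that is assumed to be known, so no online learning takes place. It also assumes that "optimal" means the collision-free allocation with the largest sum of expected rewards, and that there are as many channels as users. The names `collision_rewards`, `auction_assignment`, `values`, and `eps` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def collision_rewards(choices, values):
    """Per-user rewards under the collision model.

    choices[n] is the channel picked by user n; values[n][k] is the
    (assumed known) expected reward of user n on channel k. Any channel
    picked by two or more users yields zero reward for all of them.
    """
    counts = Counter(choices)
    return [values[n][k] if counts[k] == 1 else 0.0
            for n, k in enumerate(choices)]


def auction_assignment(values, eps=0.01):
    """Centralized auction sketch for a square reward table.

    Returns assigned, where assigned[n] is the channel held by user n.
    For eps > 0 the total reward is within len(values) * eps of the
    best collision-free allocation.
    """
    n = len(values)
    prices = [0.0] * n        # current price of each channel
    owner = [None] * n        # owner[k] = user currently holding channel k
    assigned = [None] * n     # assigned[i] = channel held by user i
    unassigned = list(range(n))

    while unassigned:
        i = unassigned.pop()
        # Net value of each channel for user i at current prices.
        net = [values[i][k] - prices[k] for k in range(n)]
        best = max(range(n), key=lambda k: net[k])
        second = max((net[k] for k in range(n) if k != best),
                     default=net[best])
        # Raise the price by the bidding increment (gap to second best plus eps).
        prices[best] += net[best] - second + eps
        # Evict the previous owner, if any, who must bid again.
        prev = owner[best]
        if prev is not None:
            assigned[prev] = None
            unassigned.append(prev)
        owner[best] = i
        assigned[i] = best
    return assigned


# Illustrative (made-up) expected rewards for 3 users and 3 channels.
values = [
    [0.9, 0.2, 0.4],
    [0.8, 0.7, 0.1],
    [0.3, 0.6, 0.5],
]
allocation = auction_assignment(values)
print(allocation, collision_rewards(allocation, values))
```

In this toy table, users 0 and 1 both prefer channel 0; if both pick it, `collision_rewards([0, 0, 2], values)` gives them zero reward, whereas the auction raises the price of channel 0 until user 1 settles on channel 1 instead.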