Abstract:
Botnets pose a major threat to the information security of organizations and individuals. The bots (malware infected hosts) receive commands and updates from the Command and Control (C&C) servers, and hence, contacting and communicating with these servers is an essential requirement of bots. However, once a malware is identified in the infected host, it is easy to find its C&C server and block it, if the domain names of the servers are hard-coded in the malware. To counter such detection, many malwares families use probabilistic algorithms known as domain generation algorithms (DGAs) to generate domain names for the C&C servers. This makes it difficult to track down the C&C servers of the Botnet even after the malware is identified. In this paper, we propose a probabilistic approach for the identification of domain names which are likely to be generated by a malware using DGA. The proposed solution is based on the hypothesis that human generated domain names are usually inspired by the words from a particular language (say English), whereas DGA generated domain names should contain random sub-strings in it. Results show that the percentage of false negatives in the detection of DGA generated domain names using the proposed method is less than 29% across 30 DGA families considered by us in our experimentation.