Spam Collection and Detection System
People now feel more comfortable socializing over the internet, through popular social networking and media websites, than face to face. As a result, social media websites are thriving more than ever. Like many others, YouTube is a hugely popular social media site that is expanding at a very fast pace. YouTube depends mostly on user-created content and on users sharing and spreading it. Businesses and public figures take advantage of this popularity by creating their own pages and sharing information with a large number of visitors. However, this popularity has also made YouTube more susceptible to various types of unwanted and malicious spammers. Currently, YouTube has no mechanism for handling video spammers: it treats only mass comments or messages as spam. To increase the popularity of a video, malicious users post video response spam, that is, video responses whose content is unrelated to the topic of the original video or that do not contain the media they are supposed to. In this research, we explore attributes that could help identify video spammers. We first collect data on YouTube videos and manually classify each one as either legitimate or spam. We then devise a number of video attributes that could potentially be used to detect spam. Finally, we apply Microsoft SQL Server Data Mining Tools (SSDT) to derive a heuristic for classifying an arbitrary video as spam or legitimate. Our results demonstrate that, in the long run, videos can be successfully classified as spam or legitimate in most cases.
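The abstract does not list the attributes used or the SSDT algorithm settings. As a rough illustration of the general idea (learning a classification rule from manually labeled videos described by a few attributes), the Python sketch below trains a one-level decision tree, or decision stump. This is a swapped-in stand-in, not the SSDT procedure used in the study. The attributes `title_tag_overlap` and `duration_s`, the helper names `train_stump` and `classify`, and the toy training data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Video:
    # Hypothetical attributes for illustration only; not the study's attribute set.
    title_tag_overlap: float  # fraction of tag words that also appear in the title
    duration_s: float         # video length in seconds
    label: str                # "spam" or "legitimate", assigned manually


FEATURES = ("title_tag_overlap", "duration_s")


def majority(labels):
    # Most frequent label; ties go to the alphabetically first label.
    return max(sorted(set(labels)), key=labels.count)


def train_stump(videos):
    """Pick the single (feature, threshold) split with the fewest training errors."""
    best = None
    for feat in FEATURES:
        values = sorted({getattr(v, feat) for v in videos})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [v.label for v in videos if getattr(v, feat) <= t]
            right = [v.label for v in videos if getattr(v, feat) > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(x != l_lab for x in left) + sum(x != r_lab for x in right)
            if best is None or errors < best[0]:
                best = (errors, feat, t, l_lab, r_lab)
    if best is None:
        raise ValueError("need at least two distinct values for some feature")
    _, feat, t, l_lab, r_lab = best
    return feat, t, l_lab, r_lab


def classify(stump, video):
    # Apply the learned rule: values at or below the threshold get the left label.
    feat, t, l_lab, r_lab = stump
    return l_lab if getattr(video, feat) <= t else r_lab


# Made-up toy data, not results from the study.
training = [
    Video(0.05, 30, "spam"),
    Video(0.10, 12, "spam"),
    Video(0.15, 600, "spam"),
    Video(0.70, 240, "legitimate"),
    Video(0.80, 25, "legitimate"),
    Video(0.90, 420, "legitimate"),
]
stump = train_stump(training)
print(stump)
print(classify(stump, Video(0.08, 200, "?")))
```

On this toy data the stump splits on `title_tag_overlap` and labels the unseen low-overlap video as spam. A real system would use more attributes and a deeper model, such as the decision-tree or other mining algorithms a tool like SSDT provides.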