- Open Access
- Authors : Mahantesh Borgi , Viraj Malik , Breznew Colaco , Pratik Dessai , Harsha Chari , Shailendra Aswale
- Paper ID : IJERTV10IS050336
- Volume & Issue : Volume 10, Issue 05 (May 2021)
- Published (First Online): 29-05-2021
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Advertisement Click Fraud Detection System: A Survey
Mahantesh Borgi
Dept. Of Computer Engineering Shree Rayeshwar Institute Of Engineering & Information Technology Shiroda-India
Pratik Dessai
Dept. Of Computer Engineering Shree Rayeshwar Institute Of Engineering & Information Technology Shiroda-India
Viraj Malik
Dept. Of Computer Engineering Shree Rayeshwar Institute Of Engineering & Information Technology Shiroda-India
Harsha Chari
Dept. Of Computer Engineering Shree Rayeshwar Institute Of Engineering & Information Technology Shiroda-India
Breznew Colaco
Dept. Of Computer Engineering Shree Rayeshwar Institute Of Engineering & Information Technology Shiroda-India
Shailendra Aswale
Dept. Of Computer Engineering Shree Rayeshwar Institute Of Engineering & Information Technology Shiroda-India
Abstract:- Web services have become an integral part of our day-to-day life including the various advertisements viewed on websites. Revenue is generated by companies through advertisement by selling clicks (known as Pay-Per-Click model). The company is paid for each click performed by the user on link published on the webpage. The clicked link redirects the user to sponsoring companys content. Invalid clicks that are generated either by humans or through a software as a malpractice for earning money is known as click- fraud. Several different time features are combined into a time print. Machine learning is performed for understanding up to what extent the unusual time prints occur in a data for distinguishing invalid clicks and identifying click fraud. The results generated shows that time prints are useful tool for the improvement of the quality of click fraud analysis and increases the overall accuracy of the observations.
Keywords Fraud detection, Logistic Regression, Support Vector Machine, Traffic Analysis, JavaScript
I. INTRODUCTION
According to [1], in the online advertising market, the advertiser takes decision about the investments he would make for the advertising, the advertiser then chooses a rate- plan created by the publisher which is suitable for his fixed budget and submits the advertisement for publishing. The ad network is responsible for contracts and legal matters with a publisher that displays the advertisement and receive a part of payment as a commission depending on the impressions made on the advertisement displayed.
A variety of revenue models are used by the users and the publishers which can attract more customers to them and can help them earn profit as well. One such revenue model which is most commonly seen in the industry is the Pay-Per- Click(PPC) model. According to [2], a fixed amount is paid by advertiser to the publisher for each click made by the user. Click-fraud indulges in malpractices for deceiving the revenue model such as PPC by creating records for false clicking activities on online advertisement. The deceiver may have intentions of creating losses to the advertiser by increasing the false impression so that the advertiser pays
more or he may be the agent of the publisher to increase the profits.
Pay Per Click (PPC) model has been implemented in
-
which shows that auctions and bidding play a vital role in the advertising economy. This has created opportunities for various IT companies and have boosted their wealth at a very high rate. The model exhibits properties such as a greater relevance and effective conversion rates where keywords are entered by a user for searching a specific advertised product.
According to [8] the Advertiser is the one who aims to make a business with a profit through product selling by means of online advertising. Promotions can be shown on webpage directly as a part of website or can be added as a pop-up related to an action of a user. The ad-network generates economy by taking payments from the advertiser for the promotion on a website or an application. The creation, development, maintenance and the deployment of the webpage or the application is the responsibility of the publisher. Ad Network behaves like a middle-man between the advertiser and publisher and streamlines the process.
Click fraud should not only be controlled from the technical point of view, but also involve the issue of integrity. If you want to cheat, no technology can be completely overcome and directly impact its market future. To solve click fraud well, or develop an effective strategy to prevent click fraud, or replace the current bidding ranking with another new business model, the current legislation is not perfect, consideration to make use of laws to strengthen the constraint on click fraud is required. The purpose of the study in [21] is to prevent the click fraud, reduce the harm of click fraud to the Internet, and make efforts to build a healthy and harmonious network.
This paper has been categorized as follows: Section II gives the overview about the different existing research works and also explains the methodologies used previously. Section III provides an overview to the ideas and concepts
in the proposed model. Section IV concludes the paper with the observations and scope for future improvements.
II. LITERATURE REVIEW
The advertiser may encounter the problem of Online Click Fraud in various ways. With the high increase in the internet usage the rate of click frauds has seen a steep climb over past few years. Multiple approaches for tackling the issues have been put forth by various researchers. The following literature gives a basic descriptive idea about various existing computational methodologies available.
-
REGRESSION
Machine Learning methods such as time-print and traditional classification model are used in [3]. Click data associated with time to check the behavior of users at particular time is analyzed. In [7] Traffic module, Analysis module, and Calculation module are used to monitor the behavior of user for click fraud detection.
Pixel cluster used in [8] for transit monitoring cursor movements uses applet or java script to keep track of transit of the cursor, it analysis the cursor movements and determines the confidence level to check whether cursor movements are indicative of fraud. Client-side scripting and
server-side scripting is performed in [10]. It has click verification website to identify valid and invalid clicks and prevents advertising from getting billed for fraudulent activity.
Support vector machine (SVM) model are implanted in
-
where AUC (Area Under Curve) value of proposed algorithm is significantly improved in comparison with ELM and the weighted ELM based method. The investigation on Bot signature, Distribution tests, Scale families and reference curves were used to design Microsoft click filtration system. The design in [21] includes real-time and near real-time components and it also has consistence rapid update capability in fraud detection.
In [22] JavaScript support and mouse event test were carried out along with browser functionality test and browser behavior examination that collected information about real world click traffic. To perform effective identification of human and bot clicks a vital element is the testing throughout the clients functionality at the advertisers side, which is implemented at both the server as well as client sides incurring negligible overhead. Filtering on Rule based, Classification based and Clustering based are carried over data in [30].
TABLE I: REGRESSION
Sr No.
Title
Methodology And Tols
Techniques
Dataset
Merits
Demerits
Results
Accuracy
[3] Identification of Click Fraud and Review of Existing Detection Algorithm. (2019)
1.Machine learning 2.Naïve-Bayes
Mix Adjustment Algorithm
NA
A loop is created to monitor and optimize upon testing on many systems which are dynamic in nature.
The modification of the algorithm as well as its extension still needs to be accurately worked upon.
Testing can be done using the algorithm as well as comparison can be done to enhance its performance further
99.6%
[7] Real-Time Ad Click Fraud Detection (2020)
1.Machine Learning, 2.Naïve-Bayes 3.Heuristics
Kaggle Records: 184,903,890
1.There is a very limited cost of utility for classification in a real time scenario 2.Datasets required for detection of click fraud is sparse in availability
On the basis of precision the methods of machine learning performed using neural network have best precision
MLP
= 95.43%,
Naïve-bayes
=89.97%
Gradient Boosting
=71%
[8] Deep Learning- based Model to Fight Against Ad Click Fraud (2020)
1.Machine Learning. 2.Multinomial Naïve bayes
Kaggle Records: 184,903,890
1. High accuracy is achieved even on small dataset. 2.Probability of fake and real clicks is computed.
Supervised GAN gives good accuracy.
89.7%
[10] Click Fraud Detection and Prevention System for Ad Network. (2018)
1.Machine Learning 2.Naïve-Bayes 3.Python Language
Rues:
1.Online Methods 2.Offline methods
NA
Good performance against different types of attack
Click Fraud Detection using Online and Offline rules gives good result.
NA
-
Logged Event based ,Client Based
-
Sessionization, Heuristics 3.Explored, Conversion Sparsity
-
Achieves high precision on unbalance dataset.
-
Compute costs below 1µs per click.
-
Logistic regression,
-
Random Forest
-
Auto Encoder
-
Semi Supervised GAN
-
Require prior classification to anomalies.
-
Loss error will be high for bots.
-
Low frequency attacks are undetected.
-
Lack of public data sets about click related subjects,
[17] Click Fraud Detection: Adversarial Pattern Recognition over 5 Years at Microsoft (2017)
1.Machine learning. 2.Naïve-Bayes 3.Data mining
1.Bot Signatures 2.Machine induced decision trees
Kaggle
Clickbots can be configured to blend in to avoid detection.
Adversarial Pattern Recognition has been steadily improved, enhanced and is very consistent in detection of frauds.
NA
[21] The study on preventing click fraud in internet advertising. (2020)
1.Machine Learning 2.Naïve-Bayes
Random forest classification algorithm
Records: 5772649
Third Party Detection technologies to prevent click fraud based on technical measures is efficient than singular algorithms.
93 %
[22] Machine Learning based Ad- click detection system. (2019)
1.Machine Leaning 2.Naïve Bayes.
1.Time stamp 2.Scikit Learning Technique
NA
Dataset required for training analysis is difficult to get.
Logistic regression effectively predicts weather the person clicked on ads or not
96%
[30] Malicious Click Detection in Web Advertising Data. (2018)
1.Machine Learning 2.Naïve Bayes 3.Decision table,
1.Random Forest 2.Classifiers 3.Filtering:
Rule-based
Classification- based
Clustering-based
Records: 35 million
1.Coverage of a blacklist is limited. 2.For the limitation of supervised method, false positive is inevitable.
Malicious clicks are detected by performing an in- depth analysis using framework
97.60%
-
The design containing components based on real-time as well as near-real-time are consistent
-
The rapid update capabilities are very consistent in detection of frauds.
-
High accuracy is obtained in this method.
-
It can efficiently unbalance data.
-
Encountered problems in data analysis.
-
Real dataset and deep neural network for analysis is difficult to find.
-
Logistic regression. 3.Naïve Bayes.
-
Decision Tree.
-
Uses machine learning techniques to give best precision.
-
As the data keeps on growing, the accuracy increases.
-
Highly efficient
-
Result Verification achieves high prediction using Filtering.
-
-
-
SUPPORT VECTOR MACHINE (SVM)
The analysis in [9] concludes that performance of classification schemes is improved when data is pre- processed by performing feature selection and resampling. Resampling is done using SMOTE and then performing classification using Adaboost with random forest as base classifier which in turn produced the most improved results. Weights, ranks and compared rank are assigned in [13] along with the threshold. The algorithm can be implemented in a system that involves the processing of multiple entities which are required to test and compare the results in multiple environments.
Python Programming language has been used in [16] where online rules and offline rules are implemented to check the IP address, the number of clicks on the IP address and movement of cursor before clicking on the ad. This data is used to determine the normal person behavior and fraudulent behavior. Probability-based model approach is implemented in [25] which is better than learning based probabilistic estimator model. It has distorted towards particular time indexed label that hampers the accuracy, it is possible to even predict a behavior on the seconds scale of time or even a smaller timescale which gives better results.
TABLE II: SUPPORT VENTOR MACHINE(SVM)
Sr No.
Title
Methodology And Tools
Techniques
Dataset
Merits
Demerits
Results
Accuracy
[9] A Click Fraud Detection Scheme based on Cost Sensitive BPNN and ABC in Mobile Advertising.
(2018)
1.Support Vector Machine(SVM) 2.Neural Network
3. Artificial Bee Colony
Classification
Buzz City Training: 5,862,839
Testing: 2,598,815
For Cost ratio 7 it does not give best results.
Click Fraud detection using BPNN and ABC gives good result as per this approach
NA
[13] User click fraud detection method based on Top rank k frequent pattern mining
(2019)
1.Support Vector Machine(SVM) 2.Top Rank k algorithm 3.Linear Regression
1.Click frequency 2.Time complexity (time
spent on ad, arrival time)
NA
Dynamic clicks cannot be dealt well when te algorithm of frequent pattern mining is used without the optimization of Top-Rank-k.
The outcome proves that the method used is quite efficient in
its correctness and checks patterns using graphs.
98%
[16] Advertisement Click- Through Rate Prediction Based on the
Weighted-ELM and
1.Support Vector Machine(SVM) 2.WELM-
Adaboost
1.Data preprocessing 2.Area Under Curve
NA
The value of AUC (Area Under Curve) of the algorithm
The accuracy in prediction is quite low because of the imbalance in the
WELM-
Adaboost algorithm is significantly
NA
-
CS-BPNN
-
Synthetic Minority Over-Sampling Technique (SMOTE)
-
Gives best result for almost all cost ratio.
-
Optimizes the feature selection and weights simultaneously.
-
It is highly accurate.
-
It is better than traditional frequent pattern mining.
Adaboost Algorithm (2017)
algorithm 3.Logistic regression 4.Prediction Model based on Deep Neural Network
proposed has a significant improvement when it is compared to the methods based on ELM and the Weighted -ELM.
advertising data being distributed
better than ELM and Weighted ELM method
[25] Click Stream Data Analysis
(2016)
1.Support Vector Machine(SVM) 2.MARS
3. Decision tree 4.Naive Bayes
1.WEKA:
Clustering
SVC classifier
Train: 3,173,834
Test: 2,598,815
Decision tree is consistent in fraud detection.
Can be used for the pre- processing when online processing is required.
59.38%
-
Access to user click information is limited.
-
Lower accuracy using more attributes.
-
-
TRAFFIC ANALYSIS
The algorithm in [5] helps to determine the frequently occurring patterns without using top-k optimization with real time clicks. The method extracts information about click process events, it also describes the frequency of click of the sample events. Using the weightage of click stream it is able to calculate evaluation score, the data is used to generate click stream density function. However, the selection of k value in this method is very important and is optimized in subsequent research. [11] refers to the Internet advertising service platform using defense system that have been established using the third party to detect the click fraud which uses random forest classification algorithm producing highly accurate results of positive class 93% and negative class 91%.
The proposed fraud detection in [14] uses CS-BPNN classification and SMOTE. BPNN is implemented in the variable click-fraud detection algorithm environment which avoids the local optimization of neural network and feature redundancy using Artificial Bee Colony. Data mining, honeypot, traffic analysis is used in [23] to verify whether the advertisement is normal, so that if any ad is being detected then honeypot sticks to it and exposes the bot leading to blocking of its IP. In [29] Content match advertising, social engineering attacks, Phishing and Blacklist evasion bidding style in which Bing search process is used which consists of policies. If any of the policy have not been satisfied by the ad then that ad will be detected as fraudulent.
TABLE III: TRAFFIC ANALYSIS
Sr No.
Title
Methodology And Tools
Techniques
Dataset
Merits
Demerits
Results
Accuracy
[5] Click Fraud Detection using Traffic Analysis (2019)
Traffic Analysis
1.Traffic matrix construction 2.Traffic partitioning
3. Pooling
Records:217, 334,190
It helps to separate out clicks made by malware that are present in legitimate click streams
Can be defeated by reducing the network loads.
Organic click fraud attacks are detected by efficiently searching for patterns which are repetitive in nature coming from clickstreams of an ad network.
NA
[11] Behavioral Verification: Preventing Report Fraud in Decentralized Advert Distribution System.
(2017)
1. Traffic Analysis 2.Ad-Report method. 3.Python
Language
1.Cost-Per-Click 2.Cost-Per- Impression
3.Cost-Per-Action.
NA
Filtering Ad- Reports on basis of honest and dishonest users.
NA
[14] Click fraud monitoring based on advertising traffic.
(2017)
Traffic Analysis
1.Traffic Module 2.Analysis module
3.Fraud module 4.Calculation Module
NA
It Includes techniques for analyzing multiple aspects of advertising traffic.
Client identification may be inefficient due to single aspect of advertisement traffic
The technology is directed towards analyzing aspects of advertising traffic and monitoring click fraud.
NA
[23] A survey on Online Advertising and Click Fraud detection. (2020)
1.Traffic analysis 2.k-Nearest Neighbours.
1.Data mining 2.Machine learning 3.Honeypot
Records: 200 million
1.High performance 2. Reliability 3.Availability over traditional Models.
The bluff ads may not be clicked if they are irrelevant.
Machine learning and deep learning provide accurate solution to the problem of click fraud.
98%
-
Multiple reports generated in short time.
-
It gets better result because it checks target users.
-
User privacy decreases
-
Digital signature were not completely detected.
[29] Search Advertiser Fraud
(2017)
1.Traffic Analysis 2.Bings search engine platform.
Anomaly detection strategies
Real Time Data (Input Data for Search Engine)
1. Accepts manual reporting. 2.Payment fraud detection is high.
Detection of the
Web crawler can be avoided by attackers using Cloaking on advertised web page.
Bings policies have successfully contained fraud that may be costly.
NA
-
-
JAVASCRIPT
Light-GBM (Light Gradient Boosting Machine) algorithm is implemented in [2]. When there is a data which contains multiple attributes,feature parallelism can be implemented concurrently. When a large amount of data needs to be processed concurrently the data parallelism is brought into use. Voting parallelism focuses on data in which attribute has multiple features and votes for decision making. The prime advantage is the high performance along with the reliability and availability over traditional logistic regression models. Feature engineering of the data along with appropriate selection allows to improve the detection performance. In [4] the methodologies used are Logistic Regression, Support Vector, Machine Random Forest and Multinomial Naïve bayes, where 100000 random inputs were given and the discriminator classified
63000 as valid clicks and remaining as invalid clicks and it also provided good accuracy in determining the valid and invalid click.
SDK library (real ad network) click generation attacks are detected
in [15]. Potentil risk of automated occurrence of an online fraud attacks in mobile advertising is evaluated and it also showed that 75% of the network was highly vulnerable to click fraud attacks. Pixel Clustering and methods like cost-per-click, cost-per-impression and cost-per-action are implemented in [15] where ads were filtered on the basis of user clicks and fraudulent users (whether Botnets or click frame) for great amount of data in minimal amount of time.
TABLE IV: JAVASCRIPT
Sr No.
Title
Methodology And Tools
Techniques
Dataset
Merits
Demerits
Results
Accuracy
[2] Fighting Click-Fraud from the User Side. (2016)
1.Fight Click-Fraud (FCFraud) 2.JavaScript
Records:165, 426
1.Inability to detect complex JavaScript.
FCFraud acts as addition to the techniques of server-based detection
85.6%
[4] Click Fraud Detection on the Advertiser Side. (2016)
1.JavaScript 2.Proactive functionality testing. 3.Passive browsing behavior examination.
Records: 9900
1.It is not easy for the clickbots to pass the functionality.
2 Functionality of client can be properly tested.
Superior clickbots can spoof the information of application layer.
Effective in identifying while incurring negligible overhead.
99.1%
[15] Method for performing real-time click fraud detection, prevention and reporting for online advertising
(2016)
1.Client-side Scripting code 2.Server-side tracking code
1. HTTP Requests(GET Function) 2)JavaScript support and mouse event test.
3.Browser functionality test.
NA
If the rules set are not satisfied, the user will be redirected to a non-paying advertisement.
It is prone to additional risks such as parasite ware.
Helps in Identification of clicks that are valid and invalid to prevent from being billed for fraudulent activity.
NA
[20] Pixel Cluster Transit Monitoring for detecting Click Fraud. (2016)
1.Pixel Clustering. 2.JavaScript
1.Cluster Analysis 2.Applet 3.Javascript.
NA
1.Real time analysis is done using JavaScript on Webpage.
JavaScript may crash during execution.
Uses applet or java script to keep track of transit of the cursor to detect false clicks
NA
-
HTTP Requests(GET Function)
-
JavaScript support and mouse event test. 3.Blacklist.
-
It detects the clickbots in a busy time period.
-
It works on different displays
-
JavaScript support and mouse event test.
-
Browser functionality test. 3)Browsing behavior examination
-
-
MISCELLANEOUS
-
In [1], Machine learning, Heuristic search and Naïve-Bayes method were used for fraud detection on very high dataset using
script on advertisers website is included which collects the important information about used behavior.
Java Script validation is performed in [2] to check click on
Logged Event Based, Client Base, Heuristic search, Click IPs andHTTP request to detect the malware whether it executes the attack Explored Conversion Sparsity which achieves very high precisionin a time-period when there is a high traffic(busy period). on unbalanced dataset. In [6] Data Analysis Algorithm such as k-FCFraud also works in case a user has a touchscreen monitor. Nearest neighbor for recognizing click fraud are used. Special javaAdding FCFraud to the OS can serve the online advertising.
Attention Mechanism, Stacked Autoencoder and Sparse Data
Prediction is implemented in [18] which has improved the click through rate and most importantly it automatically learns from the data with any human domain knowledge.
In [19] techniques like Logistics regression, Naïve Bayes, Support vector machine and decision tree have been used. Frequency of time-stamp of user on website is checked and the time user spend on the website is analyzed. Traffic partitioning, pooling and isolating click spam are used in
-
for traffic analysis. The network clickstream is analyzed thoroughly for recurring patterns that may show up due to anomalous activities thus increasing the efficiency .
Automated click fraud based on clickable captchas is used in [26] which requires simple operation and less storage space. Click fraud is identified based on valid users. In [27] an android application has been designed which provides high true positive rates and low false negative rates, it accesses the phone application to check for frauds on the web ads. Exploratory data analysis, data pre-processing and data prediction is performed in [28] for click fraud detection, the proposed model has been developed to detect and minimize the malwares.
TABLE V: MISCELLANEOUS
Sr No.
Title
Methodology And Tools
Techniques
Dataset
Merits
Demerits
Results
Accuracy
[1] Light GBM Machine Learning Algorithm to Online Click Fraud Detection.
(2019)
LightGBM (Light Gradient Boosting Machine) algorithm.
Kaggle (Records:200 million).
Models
Insufficient resources for training.
Improves the detection performance step by step.
98%
[6] Detection of Advertisement Click Fraud Using Machine Learning
(2020)
XGBoost algorithm
NA
The percentage of advertisement click fraud is found significant compared to regression model.
Cost of prediction utility for classification in is very limited.
Detect and minimize the malwares that monetizes using click fraud.
NA
[12] Crowdsourcing for Click Fraud Detection (2019)
Android Application.
CFCA
Records: 500,000
Performs additional step to merge its library.
Click Fraud Crowd- sourcing gives good result.
93%
[18] A Multi-time-scale Time Series Analysis for Click Fraud Forecasting using Binary Labeled Imbalanced Dataset (2019)
1.Learning Based Probabilistic Estimator 2.Probabilistic Based (PB)
Modelling
3.AR (Auto Regressive modelling)
1.Data Preprocessing 2.Data Smoothing
Kaggle Record: 184,903,890
This approach allows behavior to be forecasted in seconds and time scales .
It is difficult
to detect Botnet which can click an ad by impersonating itself as a genuine user.
Probability model is found to be good than the learning based probabilistic estimator model
96%
[19] Data Analysis Algorithm for Click Fraud Recognition. (2018)
1.k-Nearest Neighbours. 2.Differential Evolution.
1.Clustering 2.Classification 3.Data collection
Records: 30,000
1.The operator can evaluate succeeding clicks by his own. 2.It can be used for various sizes of data.
Real algorithms are hard to get which deals with user classifications of organic versus non organic.
k-NN
classifier application for clustering algorithm is very good.
97.11%
[24] A survey on online click fraud execution and analysis.
(2018)
CAPTCHA
Generation and Validation
NA
Click fraud is identified based on valid users using CAPTCHA
validation.
64.07%.
[26] Click Fraud Detection (2019)
SMOTE
ADASYN
NCL
1.AdaBoost 2.Bagging 3.LogitBoost 4.Rotation Forest
NA
Precision is high for large dataset compared to regression model.
Re-sampling resulted in reduction in performance measures.
Classification improved when data was pre- processed by performing feature selection.
92.60%
[27] Combating online fraud attacks in mobile-based advertising (2016)
1.Honey advertisement. 2.Detecting anomalous behaviours.
NA
Analyzed potential risks of services of mobile advertisement.
Current results are not sufficient because there are very less
Evaluation of potential risk in mobile advertising related to
75 %
-
Feature parallelism.
-
Data parallelism 3.Voting parallel
-
High performance
-
Reliability 3.Availability over traditional
-
Exploratory data analysis
-
Data pre- processing 3.Data prediction
-
Click Fraud Crowd- sourcing
-
Android Studio
-
Provides high true positive rate.
-
It has low false negative rate.
-
Group blooms filter algorithm.
-
Time blooms filter algorithm.
-
Require simpler operations.
-
Verifies Input from user which is difficult for a bot to generate.
-
Loading captchas require time and space.
-
Critical issue is to identify copy clicks.
-
DATA MINING:
-
ROC Plots
-
Implementation of android app
-
SDK library
-
METHODOLOGY
Data collection: Dataset is collected from Kaggle named TalkingData AdTracking Fraud Detection Challenge which contains attributes such as click_time, ip, app, device, OS, channel, attributed_type and is_attribute but we are focused on ip address and click_time to detect the fraud.
Data Preprocessing: Collected dataset from Kaggle needs to be preprocessed in order to eliminate the null and unwanted values by using WEKA/python tool. WEKA has application such as Preprocess, Classify, Cluster, Select Attribute and Visualize.
The system needs to implement the rules which are called as Offline Rules. Offline rules will appear such as Blacklist- Rule and Time-Period Rule, where Blacklist-Rule will maintain the Blacklist IPs which occur frequently. When the user visits the website its IP address will be matched with the Blacklist IPs to check if it a fraudulent IP. Time-Period Rule will detect the fraud based on the time that the normal person and fraud agent/botnet will utilize to click on the ad.
The model uses Logistic regression method. This method basically implements the Sigmoid Activation Function where the decisions can be made as following:
Fig1.: Working of Sigmoid Function
The result 1 denotes that the resulting click is genuine. The result 0 denotes that the resulting click is malicious.
Sigmoid Activation Function determine the independent variable based on the dependent variable in the form of 0 and 1.
A system architecture is proposed as shown below:
number of ad networks that are tested.
automated online fraud attacks.
[28] Click-through rate (CTR) prediction (2018)
1.AUC (area under ROC) 2.Logloss (cross entropy)
Training: 149,639,105
Test: 20,257,594
The pre-training stage introduces an overhead which reduces efficiency.
Model mines the relationship between features, which improves the CTR
79.81%
-
Sparse data prediction Methods
-
ASAE Model
-
Pre-training is not required.
-
High- and low- order feature interactions are learnt.
number of ad networks that are tested.
automated online fraud attacks.
[28] Click-through rate (CTR) prediction (2018)
1.AUC (area under ROC) 2.Logloss (cross entropy)
Training: 149,639,105
Test: 20,257,594
The pre-training stage introduces an overhead which reduces efficiency.
Model mines the relationship between features, which improves the CTR
79.81%
-
Sparse data prediction Methods
-
ASAE Model
-
Pre-training is not required.
-
High- and low- order feature interactions are learnt.
Fig.2: Proposed Model System
A request for data from server/google server/system is made from the database side. Only required data (Valid IP address and time of click) will be requested into database so that the memory consumption is efficient. The data will be further send for offline analysis which will check for rules such Blacklist rule and time period rule. Further the results will be validated and send to database. From database transaction of data will be analyzed by server for detection of click fraud activities.
-
-
CONCLUSION
-
The paper provides a brief overview and comparison between the different techniques used in advertisement click fraud detection. The traditional techniques used earlier for click detection and click validation are used for the purpose. Every technique mentioned has its own merits and demerits. A few common approaches like Logistic Regression, Traffic Analysis and JavaScript, Support Vector Machine are summarized and represented in the form of tables. The implementation of the traditional methods is cumbersome. The traditional methods are less accurate.
It has been observed that techniques under the JavaScript enabled system have an average accuracy of 90%. In system approach the advertisers can independently detect click fraud activities performed by clickbots and human clickers.
From the above tables most of the time logistic regression and offline rules based on user interface system gives a good accuracy in detecting advertisement click fraud. The Regression model gives higher accuracy for higher number of attributes and the offline rule(Blacklist Rule) helps in
minimizing overhead processes of similar data. The user interface system deals with the fetching of advertisement and usually also monitors the activity of a user with respect to the input functions mentioned. It also tracks the time spent by the users before clicking an advertisement to which is the most important validation property to verify if the click can be a genuine or a fraudulent.
REFERENCES
[1]. Elena-Adriana and Gabriela, Light GBM Machine Learning Algorithm to Online Click Fraud Detection, researchgate.net, 2019. [2]. Md Shahrear Iqbal, Mohammad Zulkernine, Fehmi Jaafar,Yuan Gu, FCFraud: Fighting Click-Fraud from the User Side, IEEE, 2016. [3]. Mukesh Patel School of Technology, Identification of Click Fraud and Review of Existing Detection Algorithms, IEEE, 2019. [4]. Haitao Xu, Daiping Liu, Aaron Koehl, Click Fraud Detection on the Advertiser Side, eecs.northwestern.edu, 2016. [5]. Shishir Nagaraja, Ryan Shah,University of Strathclyde, Clicktk: Click Fraud Detection using Traffic Analysis, WiSec '19: Proceedings of the 12th Conference on Security and Privacy in Wireless and Mobile Networks, 2019. [6]. B. Viruthika, Suman Sangeeta Das, E Manish Kumar, D Prabhu, Detection of Advertisement Click Fraud Using Machine Learning, International Journal of Advanced Science and Technology, 2020. [7]. Apoorva Srivastava, San Jose State University, REAL-TIME AD CLICK FRAUD DETECTION, scholarworks.sjsu.edu, 2020. [8]. G. S. Thejas, Kianoosh G. Boroojeni, Kshitij Chandna, Isha Bhatia,S. S. Lyenger, N. R. Sunitha, Deep Learning-based Model to Fight Against Ad Click Fraud, Florida International University, ACM SE '20: Proceedings of the 2020 ACM Southeast Conference, 2020.
[9]. Xin Zhang, A Click Fraud Detection Scheme based on Cost sensitive BPNN and ABC in Mobile Advertising, Xuejun Liu,Han Guo School of Computer Science and Technology Nanjing Tech University Nanjing, China, IEEE, 2018. [10]. Paulo S. Almeida, João J. C. Gondim, Click Fraud Detection and Prevention System for Ad Networks, researchgate.net, 2018. [11]. Qi Wang, Linzhi Li, Yadong Xu, LiLi Yang, Advertisement Click- through Rate Prediction, cse.scu.edu, 2017. [12]. Riwa Mouawi , Imad H. Elhajj, Ali Chehab and Ayman Kayssi, Crowdsourcing for click fraud detection, researchgate.net, 2019. [13]. Lijiao Pan, User click fraud detection method based on Top-Rank-k frequent pattern mining, Shibiao Mu and Yingyan Wang Yiwu Industrial & Commercial College, worldscientific.com, 2019. [14]. Kam Kouladjie, Seattle, CLICK FRAUD MONITORING BASED ON ADVERTISING TRAFFIC, US PATENT, 2017. [15]. Method For Performing Real-Time Click Fraud Detection, Prevention And Reporting For Online Advertising,VALIDCLICK, INC., Conway, US PATENT, 2016.
[16]. Sen Zhang,Qiang Fu,Wendong Xiao, Advertisement Click-Through Rate Prediction Based on the Weighted-ELM and Adaboost Algorithm, hindawi.com/journals/sp/2017/2938369, 2017. [17]. Jing Ying Zhang, Click Fraud Detection: Adversarial Pattern Recognition over 5 Years at Microsoft ,Brendan Kitts, ouyangchen.com, 2017. [18]. Thejas G. S., Jayesh Soni , Kianoosh G. Boroojeni, A Multi-time- scale Time Series Analysis for Click Fraud Forecasting using Binary Labeled Imbalanced Dataset, people.cis.fiu.edu, 2019. [19]. Marcin Gabryel, Data Analysis Algorithm for Click Fraud Recognition, Institute of Computational Intelligence, Czestochowa University of Technology, researchgate.net, 2018. [20]. Patrick O'Sullivan, Ballsbridge (IE), Pixel cluster transit monitoring for detecting click fraud, US PATENT, 2016. [21]. Zhi Li, Weichen Jia,School of Media and Law, The Study on Preventing Click Fraud in Internet Advertising, Zhejiang University Ningbo Institute of Technology, Ningbo, China, csroc.org.tw/journal/JOC31-3/JOC3103-20, 2020. [22]. S. Saraswathi, Vallidevi Krishnamurthy, Machine Learning Based Ad-click Prediction System, .researchgate.net, 2019. [23]. Nayanaba Gohil, Arvind D Meniya, A Survey on Online Advertising and Click fraud detection, Nayanaba Gohil Departmentof Information Technology Shantilal Shah Engineering College Bhavnagar, Gujarat, India, .researchgate.net, 2020.
[24]. Dr.R.Kayalvizhi,Kapil Khattar,Piya Mishra, A Survey on Online Click Fraud Execution and Analysis, ripublication.com/ijaer18/ijaerv13n18_59, 2018. [25]. Ladislav Beránek, Václav Nýdl, Radim Remes, Click Stream Data Analysis for Online Fraud Detection in E-Commerce, semanticscholar.org, 2016. [26]. Kaveri Kar, A Study on Machine Learning Approach using Ensemble Learners for Click Fraud Detection in Online Advertisement, gujaratresearchsociety.in/index.php /JGRS/article, 2019. [27]. Geumhwan , Junsung Cho, Youngbae Song, Combating online fraud attacks in mobile-based advertising, jis- eurasipjournals.springeropen.com, 2016. [28]. Qianqian Wang,Fangai Liu,Shuning Xing,Xiaohui Zhao, Click- through rate (CTR) prediction, hindawi.com/journals, 2018. [29]. Joe DeBlasio, Saikat Guha, Geoffrey M. Voelker, Alex C. Snoeren, Search Advertiser Fraud, 2017. [30]. Leyi Song , Xueqing Gong , Xiaofeng He , Rong Zhang , Aoying Zhou, Malicious Click Detection in Web Advertising Data, East China Normal University 3663 North Zhongshan Road, Shanghai, China, citeseerx.ist.psu.edu, 2018.