The User-Friendly Classification of Social Behaviour Data with Group Identification

DOI : 10.17577/IJERTV2IS100678

Download Full-Text PDF Cite this Publication

Text Only Version

The User-Friendly Classification of Social Behaviour Data with Group Identification

M.Deepthi Chaitanya (M.Tech) S.Archana -M.Tech

Senior Associate Professor,

Aurora's Scientific, Aurora's Scientific,

Technological & Research Academy. Technological & Research Academy.

Department of Computer Science Department of Computer Science and Engineering Engineering

ABSTRACT: In the instantaneous technological world social networks is racing top. There is a rapid growth in the social network with hundreds and thousands of users. The king sized data growing by the social media is to identified the collective behavior of the user .The social media network with huge number of users ,extracted huge social dimension cannot even be held in the memory, which is having large computational problem. This paper gives a new approach with edge-clustering algorithm, to obtain distributed social dimensions. We designed to identify the collective behavior of the users in the social media. The grouping of similar users into a single cluster in the network is done by understanding the individual user behavior over the network. Social Network is a group of different type of cluster. Groups are designed based on the user behavior in the network. The scalability of the millions of users in the real world network will progress high. Irrelevant data of the users group can come down in the social network. The relevant information is been sent to the appropriate groups of users. The computation issue, resource wastage is avoided to an extent, with which the processor speed will increased. Classification of network data, with different million of users can be handled affectively. The graphical view of the users is been generated accurately.

KEYWORDS: classification with network data, edge-clustering algorithm, group identification, collective behavior, social dimension.

  1. INTRODUCTION & MOTIVATION

    In the instantaneous technological world social networks is racing top. There is a rapid growth in the social network with hundreds and thousands of users. The king sized data growing by the social media is to identified the collective behavior of the user .The social media network with huge number of users ,extracted huge social dimension cannot even be held in the memory, which is having large computational problem. This paper gives a new approach with edge-clustering algorithm, to obtain distributed social dimensions. We designed to identify the collective behavior of the users in the social media. The grouping of similar users into a single cluster in the network is done by understanding the individual user behavior over the network. Social Network is a group of different type of cluster. Groups are designed based on the user behavior in the network. The scalability of the millions of users in the real world network will progress high. Irrelevant data of the users group can come down in the social network. The relevant information is been sent to the appropriate groups of users. The computation issue, resource wastage is avoided to

    a extent, with which the processor speed will increased. Classification of network data, with different million of users can be handled affectively. The graphical view of the users is been generated accurately.

  2. BACKGROUND & RELATED WORK

    Supporting work activities

    1. Collective Behavior through Social dimension Extraction

      The Socio Dim framework demonstrates promising results toward predicting collective behavior. However, many challenges require further works. For example, social media networks are constantly evolving, with new users joining the social media networks and new contacts establishing between existing users each day. This dynamic nature of networks totally efficient collective behavior prediction. [1]

      Connections in a social network shows n- number of relations i.e., a social learning based on the so called social dimensions that are being introduced

      This framework suggests extracting social dimensions that represent the latent affiliations associated with users, and then applying supervised learning to determine which dimensions are informative for predicting the behavior. The above mentioned framework has n number of advantages, especially for those who use large scale networks, paving the way for the study of collective behavior in many real-world applications. [1] Social media such as MySpace, Twitter, Blog Catalog, YouTube and Flickr, making user connect with each other anytime and anywhere. The prolific and expanded use of social media has turned online interactions into a vital part of human experience. The large population actively involved in social media also provides great opportunities for business. Concomitant with the opportunities indicated by the rocketing online traffic in social media are the challenges for user/customer profiling, accurate user matching at many individual domains, recommendation as well as effective advertising and marketing. Social networking advertising are the best examples. There are many tasks in currently advertising in social media is been encountered. A recent study made a clear point from the research IDC suggested that just 57% of all users of social net- works clicked on an ad in the last year, and only 11% of those clicks lead to a

      buy new things". [2] Focusing on social networking sites can only collect very limited user the profile information are caused due to privacy issue or with the sharing of information on the network. The connections social network advertizing problem can be generalized to the study of collective behavior. Here, behavior can include a broad range of users join group, connect to people

      , following ads on the screen , becoming interested in certain topics, chatting with people of certain type, etc. Collective behavior refers to behaviors of individuals who are exposed in a social network environment. [1] Collective behavior is not simply the aggregation of individuals' behavior.. Independent behavior is been exhibited by the users in there connected environments. So that is clear that the, one's behavior can be influenced by the behavior of his/her friends. This leads to behavior correlation between users those are connected. Thus the correlation of collective behavior can also be explained by homophily.[7]This is even observed in the online environment also. In other words, similar users tend to become friends, leading to similar behavior between connections in a social network. Take games as an example. If our friends are playing so much too any game definitely we will also, got attracted to that.

    2. Social Dimensions

      Actors DRS SPMVV AURORA

      priya 1 o 1

      user 1 1 1 0

      … … … . . .

      Differentiating pair wise relations based on network connectivity alone is by no means a task which can be easily completed. Here the social dimensions of the actors are also looked upon alternatively. Social dimensions are introduced to represent the relations associated with users, with each dimension denoting one relation. Suppose two users ui and uj are connected because of relationship R, both ui and uj should have a non- zero entry in the social dimension which represents

      R. Let us revisit the example in. The relations between the user and his friends can be characterized by three affiliations: DRS, SPMVV University, and Aurora

      The corresponding social dimensions of actors in Figure 1 are displayed in Table 1. In the above mentioned table, if one single actor belongs to one single affiliation, then the entry whch is corresponding to it is a non-zero. Since priya is a student DRS, his social dimension includes a non- zero entry

      For the DRS dimension to capture the relationship of his DRS friends and him. Social dimensions finds interaction patterns presented in a network. Note that one users is very likely to be involved in multiple different social dimensions (e.g., priya is participates in 3 different relations in above mentioned table). This can only be consistent with a kind of nature which is multi-facet in human social life that one is likely to be involved in distinctive relations with different people.

  3. Heterogeneous Relations in Social Networks

    Collective inference is adopted in machine learning community to make predictions about collective behavior. Collective inference is required towards an equilibrium status such that the inconsistency between connected actors is minimized. The heterogeneity presented in network connectivitys can hinder the success of collective inference. Users can connect to their family, colleagues, college classmates, or some buddies met online. Directly applying collective inference to this kind of networks does not differentiate these connections, thus becoming risky for prediction of collective-behavior.[1]

  4. SocioDim Framework

    The social dimensions shown in Table 1 are constructed based on the explicit information of relations. The friends of priya at DRS tend to interact with everyone and other as well. Thus implementing a latent social dimension. This boils down to a classical community identification problem. A requirement is that one user is allowed to be assigned to multiple users community. After we extract the social dimensions (social), we treat them as features that are normal and combine them with the behavioral information to conduct supervised learning. Different task might represent the user behavior in different ways. [1]In certain cases, we can represent the behavior output. For instance, whether a user joins a group, whether he likes a game, whether he belongs to that branch in college. In some other cases, it might be true that the behavior output is represented more properly using continuous numbers, like the probability that a user clicks on an ad and the frequency that a user visits a group that interests him. Depending on the representation of the behavior (discrete or continuous values), a classifier or a regression learner can be used. This supervised learning is critical as it will determine which dimensions are relevant

    to the target behavior and assign proper weights to different social dimensions. In

    Summary, a social-dimension based learning framework SocioDim can be applied to handle the network heterogeneity. [1]

    It consists of two steps, with each addressing one challenge sketched in the previous section:

    _ Extract meaningful social dimensions based on network connectivity via community detection.

    _ Determine relevant social dimensions through supervised learning.

    Prediction is straightforward once a learned model is ready, since the social dimensions have been calculated for all actors. Applying the constructed model to the social dimensions of the actors without behavior information, we obtain the behavior predictions. This Socio Dim framework basically assumes the affiliation membership of actors determines an individuals behavior. This can be understood more clearly in an example. [2] Social media such as blogs, Flickr, etc., presents data in a network format rather than classical IID distribution. To address the interdependency among instances, Relational learning has been advised, and collective inference depending on network connectivity is adopted. However, connections in social media are often multi-dimensional. An actor can connect to another actor for many reasons, e.g., alumni of an institution, colleagues, who are living in the same city, sharing similar interests, etc.

    Collective inference normally does not differentiate these type of connections.

    Social dimensions based on network information, and then utilize them as features for discriminative learning. These social dimensions describe diverse affiliations of actors hidden in the network, and the discriminative learning can automatically determine which affiliations are better aligned with the class labels. This kind of scheme are preferred when multiple diverse relations are associated with the same kind of network. We perform rigorous experiments on social media data (one from a real- world blog site and the other from a popular content sharing website).The proposed model outperforms representative relational learning methods based on collective inference, especially when few labelled data are available. Existing methods are also examined. [2] Concomitant with the opportunities indicated by the rocketing online traffic in social media1 are the challenges for profiling of user/customer, accurate search for users, recommendation as well as effective advertising and marketing. Another problem is social network advertisement. Present generation, advertising in social media has encountered many challenges. A recent study of the research firm IDC suggested that just 57% of all users of social networks clicked on an ad in the last year, and only 11% of those clicks were successful. Social networking sites can only collect limited user profile information, because of the privacy issue or because the user declines to share the true information. Each dimension can be considered at

  5. Social dimension

    There are many concerns regarding scalability of

    SocDim with modularity maximization:

    • The social dimensions extracted according to modularity maximization are dense. Suppose there are 1 million actors in a network and 1, 000 dimensions1 are extracted. Suppose standard double precision numbers are being used for holding the full matrix alone requires

    1M × 1K × 8 = 8G memory. This large-size dense matrix poses thorny challenges for the extraction of social dimensions as well as the subsequent discriminative learning. [4]

    The modularity maximization requires the computation of the top eigenvectors of a modularity matrix which is of size n × n where n is the number of actors in a network.

    In this research, we suggest, extract-latent.

    the description of a likely affiliation between social users. Social dimensions, the power of discriminative learning such as SVM or logistic regression to automatically select the relevant social dimensions for classification.

    In the prediction phase, different from existing relational learning methods, collective inference becomes unnecessary as the selected social dimensions have already included the relevant network connectivity information. This proposed framework is flexible and allows for the combination of other features such as user profiles or social content information [3]. The study of collective behavior is to understand how individuals behave in a social network environments. Social media generates large data ever day like Twitter, Flickr and YouTube present opportunities and challenges to studying collective behavior in a large scale.

    With the big size of the networking world, involving hundreds of thousands or even millions of users. The scale of networks entails scalable learning of models for predictions based on collective behavior. To talk about issues related to scalability, we suggest an edge-centric clustering scheme to extract sparse social dimensions. With the help of sparse social dimensions, the approach based on social dimension can efficiently handle networks of millions of actors while demonstrating comparable prediction performance as other

    Non-scalable,methods.

    Networks in social media tend to evolve, with there are many new users joins and many new connections are been established. This dynamic nature of networks entails efficient update of the model for collective behavior prediction. Efficient online up date ofeigenvectors with expanding matrices remains a challenge.

    Consequently, it is imperative to develop scalable methods that can handle large-scale networks efficiently without extensive memory requirement. In the next section, we elucidate an edge centric clustering scheme to extract sparse social dimensions. With the scheme, we can update the social dimensions efficiently when new nodes or new edges arrive in a network. [3]

  6. FINDINGS

  1. ALGORITHAMS THAT ARE MAINLY USED IN THIS WORK:

    1. Clustering edge instance (edge partition):

    Partition of the edges into disjoint sets, the edges as data instance with their nodes. The clustering algorithm like k-means clustering can be applied for the process of finding disjoint partitions.

    This sparsity can help accelerate the clustering process if worked upon wisely. We un approve that the centroids of k-means should also be feature- sparse.

    In order to efficiently identify instances relevant to one centroid, we build a mapping from features (nodes) to instances (edges) beforehand. Once when the mapping is done , we can easily identify the relevant instances by checking the non-zero features of the centroid . Similar to k-means, this algorithm also increases, within similarity of clusters.

    k-means clustering algorithm

    Algorithmic steps for k-means clustering

    Let X = {x1,x2,x3,..,xn} be the set of data points and V = {v1,v2,.,vc} be the set of centers.

    1. Randomly select c cluster centers.

      1. Calculate the distance between each cluster

      2. centers.

    1. Assign the data point to the cluster center whose distance from the cluster center is minimum of all the cluster centers..

    2. Recalculate the new cluster center using:

      where, ci represents the number of data points in

      ith cluster.

    3. Recalculate the distance between each data point and new obtained cluster centers.

    4. If no data point was reassigned then stop, otherwise repeat from step 3).

    As a simple k-means is adopted to extract dimensions that are social, it is easy to update dimensions that are social if a given network changes. If some new member is joining the

    k-means is one of the simplest unsupervised learning algorniethtwmosrkthaatndsoalvenetwhe cwoenlnl ekcntioown nccolmusetseriunpg , we problem. The procedure follows a simple and easy way taosscilganssifythae givneenwdlyata seemt etrhgreodugheadgceertaitno the number of clusters (assume k clusters) fixed apriori. The mcaoirnresidpeoandiisngto dcelfuinsteerks.cenTtheers, ounpedaftoironeacohf the cluster. These centers should be placed in a cunning cwenatyroidbsecwauitshe thoef dairfrfievraelntoflonceawtioncocnanuescetsions is different result. So, the better choice is to place them as amlsuochexaasctpo. sTsihbilse kf-amr eaawnasysfcrhoemmeeaicsh oatphpelri.ed for

    The next step is to take each point belonging to a given

    data set and associate it to the nearest center. When no point is pending, the

    dynamic large- scale networks.

  2. LEARNING COLLECTIVEBEHAVIOUR

    first step is completed and an early group age is done. At this point we need to re-calculate k new

    centroids as barycenter of

    the clusters resulting from the previous step. After we have these k new centroids, a new binding has to be done between the same data set points and the nearest new center. A loop has been generated. As a result of this loop we may notice that the k centers change their location step by step until no more changes are done or in other words centers

    do not move any more. Finally, this algorithm aims at minimizing an objective function know as squared error function given by:

    where,

    ||xi – vj|| is the Euclidean distance between xi and vj. ci is the number of data points in ith cluster.

    c is the number of cluster centers.

    Input: network dataset, labels of some nodes in the network data, consider some number of social dimensions;

    Output: labels of unlabeled nodes in the network dataset.

    1. convert network datasets into edge-centric view.

    2. perform k-means with edge- clustering.

    3. construct social dimensions based on edge partitioning . A node thus constructed based on edge partioning belongs to one community as long as any of its neighbouring edges is in that community.

    4. apply regularization pattern to social dimensions.

    5. construct classier based on social dimensions of labelled nodes.

    6. use the classifier to predict labels of unlabeled ones based on their social dimensions.

    7. HOMOPHILY: Some characteristics are more bounding than others that is, users are more likely to seek someone like themselves on that dimension. For example, bankers might want to find bankers other more so than people with same brank want to find other people with same branch We would say that smoking is more strongly bounding than eye color because people with a given smoking status are less likely to cross the boundary to choose someone with a different smoking status than someone with brown eyes would be to choose a partner with blue eyes.

      To determine the bounding strength of categorical and bucketed descriptors in the data set, we compared the percentage of contacts between two users who shared the same value for a characteristic

      .with the percentage of contacts we would expect to share the value if one male user and one female user from the active user population were paired randomly.

      1. Social networks usually involve collections of objects that are jointly linked into huge relational networks. Analysis on social networks is important due to the growing availability of data on novel social networks, e.g. citation networks, Web 2.0 social networks as in face book, and the hyperlinked internet. Recently, the infinite hidden relational model (IHRM) has been developed for the analysis of complex relational domains. The IHRM extends the expressiveness of a relational Social networks usually consist of rich collections of objects, which are linked into complex relational networks. Statistical relational learning (SRL) is an emerging area of machine learning research which attempts to combine expressive knowledge representation formalisms with statistical approaches to perform probabilistic inference and learning on relational networks. SRL provides effective tools for social network modeling and analysis, such as community discovery and product recommendation. Social networks are graphically represented as a sociogram. In this simple relational network, a common task is to make predictions on unknown relationships (friend-ship) based on known relationships and person profiles (e.g., gender). We can use probabilistic approaches to model the relational network such that the quantities of interest can be inferred with statistical techniques. For each potential edge, a random variable (RV) is introduced that describes the state of the edge.[6]

      2. A multi-mode network typically consists of multiple heterogeneous social actors among which various types of communications could occur. Finding the communities in a multi-mode network can help understand the structural properties of the social network, address the problems that are unbalanced and data shortage , and assist tasks like targeted marketing and finding influential actors within or in between groups. In more simpler

        model by introducing for each object an infinite- dimensional hidden variable as part of a Dirichlet process mixture model. In this paper we discuss how the IHRM can be used to model and analyze social networks. In such an IHRM-based social network model, each edge is as sociated with a random variable (RV) and the probabilistic dependencies between these RVs are specified by the model based on the relational structure. The variables that are hidden, one for every single object, are able to transport inormation such that on-local probabilistic dependencies can be obtained. The IHRM provides ejective relationship prediction and cluster analysis for social networks. The experimental analysis is performed on two social network applications. The first application is an analysis of the cooperative effect in a recommendation framework where both user properties and item properties are taken into consideration . The results of the experiment demonstrate that the IHRM provides good prediction accuracy for user preference on movies and gives interpretable clusters of users and items. In the second experiment we apply the IHRM to Sampson's monastery data, and obtain a grouping of the actors that agrees with results from previous publications.

        a new mean field approximation to inference in the IHRM.[6]

        terms, a network and the membership of groups often keep changing gradually. In a multi-mode network, both the actor membership and interactions can keep evolving constantly , which has a challenging problem of that of identifying community evolution. In this research, we address this issue by employing the temporal information to analyze a multi-mode network. A framework that is spectral and also whoes scalability issue are studied carefully. Experiments which were conducted on both synthetic data and real-world large scale networks demonstrate the efficacy of our algorithm and suggest its generality in solving problems with complex relationships.

    8. Proposed Approach: In a social networking environment, the behavior of the similar users are been identified, and the groups are been done automatically, so that the information interchange will be very affectively done. Graphical view is provided to the users so that they can easy understand the people they are connected. The admin will have a clear view how the users in the network are been connected and how they are involved with each other so that the affective recommendations are been given The edge view is untainted.

    9. End section

      In this paper we are creating the groups based on the users in the particular network. For avoiding the forward of information to other group related users. Here we are creating group directly according to that group friends are categorized. We are identifying the online users in friend list. Then we generate the graphical view of users based on the group.

    10. Conclusion:

We propose an edge-centric clustering scheme to extract social dimensions and a scalable k-means variant to handle edge clustering. Essentially, each edge is treated as one data instance, and the connected nodes are the corresponding features. Then, the proposed k-means clustering algorithm can be applied to partition the edges into disjoint sets, with each set representing one possible affiliation. With this edge-centric view, we show that the extracted social dimensions are guaranteed to be sparse. This model, based on the sparse social dimensions, shows comparable prediction performance with earlier social dimension approaches. An incomparable advantage of our model is that it easily scales to handle networks with millions of actors while the earlier models fail. This scalable approach offers a viable solution to effective learning of online collective behavior on a large scale.

References

  1. L. Tang and H. Liu, Toward predicting collective behavior via social dimension extraction, IEEE Intelligent Systems, vol. 25, pp. 1925, 2010.

  2. , Relational learning via latent social dimensions, in KDD 09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: ACM, 2009, pp. 817 826.

  3. L. Tang and H. Liu, Scalable learning of collective behavior based on sparse social dimensions, in CIKM 09: Proceeding of the 18th ACM conference on Information and knowledge management. New York, NY, USA: ACM, 2009, pp. 11071116.

[4]. Dr. M.H. Dunham ,Dr. Lee, Sin-Min San Jose State University.

[5]. Homophily in Online Dating: When Do You Like Someone Like Yourself?

Andrew T. Fiore and Judith S. Donath [6]. Z. Xu, V. Tresp, S. Yu, and K. Yu,

Nonparametric Relational Learning for Social Network Analysis, [7]. L. Tang, H. Liu, J. Zhang, and Z. Nazeri,

Community Evolution in Dynamic Multi-Mode Networks,

S.ARCHANA,received the M.Tech in Software Engineering from JNTU, Hyderabad AP-India. She is working as a Sr. Asst Prof in the Dept of CSE, ASTRA.

M.DEEPTHI CHAITANYA, completed her graduation from SRI PADMAVATHI MAHILA UNIVERSITY,THIRUPATHI, in the field of computer science. Currently she is working towards his masters degree in the field of COMPUTER SCIENCE from Auroras Scientific, Technological and Research Academy.

Leave a Reply