- Open Access
- Authors : Sainath Veerla, Bhavan Kumar Basavaraju, Sai Srivathsav Aripirala
- Paper ID : IJERTV13IS060043
- Volume & Issue : Volume 13, Issue 06 (June 2024)
- Published (First Online): 12-06-2024
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Enhancing Yelp’s Recommendation System: Personalized and Equitable Solutions for Business Visibility
Sainath Veerla, Bhavan Kumar Basavaraju, Sai Srivathsav Aripirala
Department of Applied Data Science, San Jose State University (SJSU), San Jose, USA
Abstract: In the digital age, platforms like Yelp revolutionize how consumers connect with local businesses. However, as reliance on such platforms grows, so do expectations for personalized and accurate recommendations. Yelp's current recommendation systems often fall short, leading to dissatisfaction and decreased engagement. This project aims to enhance user experience through improved recommendation accuracy, ensuring recommendations reflect individual preferences. Business visibility on Yelp is crucial, especially for smaller or newer establishments that struggle against more established ones. These businesses often miss growth opportunities due to inadequate exposure. By developing a more effective recommendation system, we aim to increase their visibility, driving higher traffic and enhancing their success, providing a more equitable platform for all businesses. Yelp collects extensive data, including user reviews, business attributes, and interaction logs. Despite this wealth of information, there is a gap in its use to enhance recommendations. This project leverages advanced data mining techniques to unlock deeper insights, substantially improving personalization and relevance, ensuring Yelp remains at the forefront of innovation.
Keywords: Yelp; recommendations; accuracy; businesses; reviews
INTRODUCTION
Yelp revolutionized how users discover, rate, and review local businesses, amassing vast amounts of data, including user reviews, business attributes, and interactions. This rich dataset offers unique insights to enhance connections between businesses and consumers. Despite advancements, Yelp's recommendation system faces challenges. The primary issue is the underutilization of available data, which limits the system's ability to understand and predict user preferences. Although Yelp offers personalized recommendations, there is room for improvement in real-time data processing and in integrating multi-modal data sources (e.g., images, videos).
The credibility of reviews and the authenticity of user ratings remain contentious. Yelp has invested in algorithms to detect and mitigate fake reviews and spam, which is crucial for maintaining user trust and system reliability. Enhancing Yelp's recommendation system is therefore imperative. By employing advanced data mining techniques, the project aims to address these challenges, exploring deep learning and natural language processing to refine recommendations and identify fake reviews more efficiently, improving Yelp's ecosystem. Initially, Yelp's system sorted businesses based on ratings and review volume. As the platform grew, it integrated metrics like review frequency and reviewer reliability. Incorporating AI now allows Yelp to dynamically update recommendations based on user interactions and feedback.
LITERATURE REVIEW
[1] The paper discusses advanced matrix factorization techniques tailored for recommender systems, which improve prediction and recommendation quality by capturing the underlying patterns in user-item interactions. These techniques address the scalability and sparsity challenges inherent to recommender systems, showcasing their effectiveness on large datasets.
[2] The authors present item-based collaborative filtering, a method that contrasts with traditional user-based approaches. It focuses on calculating item similarities rather than user similarities, leading to more scalable recommendations and improved handling of high-dimensional data.
[3] The paper discusses SCENE, a two-stage personalized news recommendation system designed to handle scalability. The system initially filters news articles using a clustering method and then applies personalized ranking to enhance recommendation relevance. The approach improves computational efficiency and accuracy in delivering tailored news content to users.
[4] The authors present a hierarchical recommendation system that enhances collaborative filtering techniques by structuring data at multiple levels. This hierarchy improves the system's scalability and accuracy in capturing diverse user preferences, making recommendations more precise and effective.
[5] The paper provides an in-depth analysis of various methods for recommending points of interest (POIs), addressing the unique challenges and requirements of POI recommendation systems. It comprehensively evaluates existing techniques, highlighting their strengths and weaknesses, and discusses metrics for assessing recommendation quality. The authors also identify future research directions, emphasizing the need for more sophisticated methods to handle dynamic user preferences and contextual information, thus aiming to enhance the overall user experience in POI recommendation systems.
[6] This study explores the integration of user-generated tips and reviews to improve location recommendations on Yelp. The authors demonstrate that combining tips, which are concise and focused pieces of advice, with detailed reviews enhances the recommendation system's performance. By leveraging both types of user input, the system can provide more accurate and relevant suggestions. The paper presents methodologies for effectively merging these data sources and showcases their application through experiments, revealing improved recommendation accuracy and user satisfaction.
[7] This seminal paper introduces the concept of collaborative filtering, which has become a cornerstone of modern recommender systems. The authors describe the Information Tapestry system, which utilizes collaborative filtering to filter and recommend information based on user interests. By aggregating preferences and feedback from multiple users, the system can identify patterns and suggest relevant content. This approach addresses the limitations of content-based filtering and significantly improves recommendation quality, particularly in scenarios with high-dimensional data. The paper's innovative ideas laid the groundwork for future developments in personalized recommendation technologies.
The insights and methodologies from these papers provide a solid foundation for enhancing Yelp's recommendation system. By incorporating advanced matrix factorization techniques, item-based collaborative filtering, hierarchical structures, and the complementary usage of tips and reviews, the project can significantly improve recommendation accuracy and personalization. Additionally, leveraging the foundational concepts of collaborative filtering will help address scalability and high-dimensional data challenges.
METHODOLOGY
Data Collection
The dataset, sourced from the Yelp website, provides detailed information on business attributes, user-generated reviews, and user interaction patterns. The business data includes location and category details, the review data captures user sentiments through ratings and textual content, and the user data offers insights into user behavior. This rich and diverse dataset is essential for developing robust content-based, collaborative filtering, and hybrid recommendation systems that provide accurate and personalized suggestions.
Data Preprocessing
The preprocessing phase involves cleaning and transforming the training dataset. The dataset was checked for null values in the columns critical for recommendation, such as text and business_id. To improve data quality, the review text was converted to lowercase, non-word characters were removed, and common stop words were eliminated. This process reduced noise in the data and kept the focus on the meaningful content of the reviews.
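The cleaning steps described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the stop-word list is a tiny hypothetical sample, and the helper names `clean_review` and `drop_incomplete` are assumptions introduced here.

```python
import re

# Tiny illustrative stop-word list (an assumption); a real pipeline would
# likely use a fuller list from an NLP library.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "is", "was", "were",
    "to", "of", "in", "it", "this", "that", "for", "on", "with",
}


def clean_review(text: str) -> str:
    """Lowercase the text, strip non-word characters, and drop stop words."""
    text = text.lower()
    # Replace anything that is not a word character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


def drop_incomplete(rows):
    """Keep only records whose text and business_id fields are present and non-empty."""
    return [r for r in rows if r.get("text") and r.get("business_id")]


if __name__ == "__main__":
    print(clean_review("The pizza was GREAT, and the service... wow!"))
```

Replacing punctuation with a space, rather than deleting it, keeps words such as "great,and" from being fused together when the source text omits a space.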
Model Architecture and Training
In this project, we implemented three recommendation systems: content-based, collaborative filtering-based, and hybrid. Each system leverages different aspects of the data to provide personalized recommendations and addresses the challenges of scalability and sparsity in collaborative filtering.
Content-Based Recommendation System: The content-based recommendation system relies on the features of the items to make recommendations. In this system, the review text is vectorized using Term Frequency-Inverse Document Frequency (TF-IDF) to create feature vectors representing the content of the reviews. This allows the system to quantitatively handle text data and effectively capture the essence of each review, making it possible to compare and analyze the textual content in a meaningful way. Cosine similarity is then used to measure the similarity between the user's profile (constructed from their review history) and the items. Recommendations are made based on the highest similarity scores, suggesting items that are most similar to what the user has previously liked. This method focuses on the specific attributes of items that the user has interacted with, providing personalized suggestions based on the content.
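A minimal sketch of this TF-IDF and cosine-similarity pipeline is shown below, using scikit-learn. It assumes one text document per business (for example, its concatenated reviews) and builds the user profile as the mean TF-IDF vector of the businesses the user has reviewed; both choices, and the function name `recommend_by_content`, are assumptions for illustration, since the paper does not spell out these details.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def recommend_by_content(business_docs, liked_ids, top_n=2):
    """Rank businesses by cosine similarity to a user's TF-IDF profile.

    business_docs: dict mapping business_id -> review text for that business.
    liked_ids: business_ids the user has already reviewed.
    """
    ids = list(business_docs)
    tfidf = TfidfVectorizer().fit_transform([business_docs[b] for b in ids])

    # User profile: mean TF-IDF vector of the businesses in the user's history.
    liked_rows = [ids.index(b) for b in liked_ids]
    profile = np.asarray(tfidf[liked_rows].mean(axis=0)).reshape(1, -1)

    scores = cosine_similarity(profile, tfidf).ravel()
    ranked = sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)
    # Exclude businesses the user has already interacted with.
    return [(b, float(s)) for b, s in ranked if b not in liked_ids][:top_n]


if __name__ == "__main__":
    docs = {
        "b1": "pizza pasta italian",
        "b2": "pizza italian wine",
        "b3": "sushi ramen japanese",
        "b4": "burger fries",
    }
    print(recommend_by_content(docs, ["b1"]))
```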
Figure 2. Scree plot
Collaborative Filtering-Based Recommendation System: The collaborative filtering-based recommendation system uses matrix factorization to address the sparsity of the interaction matrix. Initially, a sparse matrix is created to represent the strength of interaction between users and businesses. To manage this sparsity, Singular Value Decomposition (SVD) is performed on the sparse interaction matrix. SVD decomposes the matrix into three lower-dimensional matrices: user factors, item factors, and a diagonal matrix of singular values. Figure 2 shows the amount of variance captured by each component. This reduces the dimensionality of the data, making computations more efficient and highlighting the underlying structure in the interaction patterns. Once the matrix is decomposed, the interaction matrix is reconstructed by multiplying the decomposed factors. The reconstructed matrix fills in the missing values by leveraging the latent factors identified during the decomposition, which yields an estimate of the interaction strength between users and businesses that were unobserved in the original sparse matrix and thus overcomes the sparsity issue. This complete set of predicted interactions enables the system to suggest new businesses to users. By capturing the latent features that influence user-business interactions, the collaborative filtering model can provide more accurate and personalized recommendations.
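The decomposition and reconstruction steps can be illustrated with a truncated SVD from SciPy. This is a sketch under assumptions: the number of latent factors `k`, the toy interaction values, and the function name `predict_interactions` are hypothetical and not taken from the paper.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds


def predict_interactions(rows, cols, values, shape, k=2):
    """Reconstruct a dense user-by-business score matrix from a sparse one.

    rows, cols, values: coordinates and strengths of observed interactions.
    k: number of latent factors to keep (must be smaller than min(shape)).
    """
    matrix = csr_matrix((values, (rows, cols)), shape=shape, dtype=float)
    # Truncated SVD: user factors (u), singular values (s), item factors (vt).
    u, s, vt = svds(matrix, k=k)
    # Multiplying the factors back together fills in unobserved entries.
    return u @ np.diag(s) @ vt


if __name__ == "__main__":
    # Users 0-1 interact with businesses 0-1; users 2-3 with businesses 2-3.
    # The pair (user 1, business 1) is deliberately left unobserved.
    rows = [0, 0, 1, 2, 2, 3, 3]
    cols = [0, 1, 0, 2, 3, 2, 3]
    vals = [5, 5, 5, 3, 3, 3, 3]
    pred = predict_interactions(rows, cols, vals, shape=(4, 4), k=2)
    print(np.round(pred, 2))
```

In this toy matrix, the unobserved pair (user 1, business 1) receives a clearly higher predicted score than (user 1, business 2), because user 1 shares a latent factor with user 0.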
Hybrid Recommendation System: The hybrid recommendation system combines both content-based and collaborative filtering approaches to leverage the strengths of each method. In this system, recommendations from both the content-based and collaborative filtering models are combined. Initially, equal weights are assigned to both models, as customer feedback is not yet included to adjust these weights dynamically. In practice, weights would be adjusted based on user preferences: if users show a higher interest in content-based recommendations, that component would be given a higher weight, and vice versa. The scores from both recommendation systems are combined according to the assigned weights, and the final recommendations are sorted in descending order based on these combined scores. This approach aims to provide more balanced and personalized recommendations by considering both the content features and user interaction patterns.
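The weighted combination can be sketched as below. The equal default weights follow the description above; the min-max normalization that puts both score sets on a common scale is an assumption added for this illustration, as is the function name `hybrid_scores`.

```python
def hybrid_scores(content_scores, collab_scores, w_content=0.5, w_collab=0.5):
    """Blend two score dictionaries and rank businesses by the combined score.

    Scores are min-max normalized per model first (an assumption, so that
    cosine similarities and predicted interaction strengths are comparable).
    """

    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        return {k: (v - lo) / span if span else 0.0 for k, v in scores.items()}

    content = normalize(content_scores)
    collab = normalize(collab_scores)
    combined = {
        b: w_content * content.get(b, 0.0) + w_collab * collab.get(b, 0.0)
        for b in set(content) | set(collab)
    }
    # Final recommendations sorted in descending order of combined score.
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    content = {"b1": 0.9, "b2": 0.1, "b3": 0.5}
    collab = {"b1": 1.0, "b2": 5.0, "b3": 4.0}
    print(hybrid_scores(content, collab))
```

Passing different values for `w_content` and `w_collab` corresponds to the feedback-driven weight adjustment that the text describes as a later step.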
Experimental Setup
The experiments focused on leveraging clustering in both content-based and collaborative filtering models to improve recommendations. In the content-based recommendation system, items were clustered based on their TF-IDF vector representations. By clustering items with similar content features, the system could provide recommendations that are more focused and relevant. When a user interacts with an item, the system recommends other items from the same cluster, assuming that similar items will likely interest the user.
In the collaborative filtering recommendation system, users were clustered based on their predicted ratings using k-means clustering. This approach grouped users with similar preferences, allowing the recommendation system to leverage the preferences of users within the same cluster. Instead of relying solely on individual user data, the system considers the collective preferences of users in the same cluster, enhancing the robustness and accuracy of the recommendations. This method addresses the sparsity issue by aggregating preferences across similar users.
Both clustered models were then integrated into a hybrid system to test whether they yield better recommendations. The hybrid model, combining both content-based and collaborative filtering approaches, benefited from the strengths of both methods. By incorporating clustering in both models, the hybrid system provided more balanced and personalized recommendations, effectively leveraging content features and user interaction patterns.
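The user-clustering step can be sketched with scikit-learn's k-means. The paper does not state how preferences are aggregated within a cluster, so averaging the predicted ratings of cluster members is an assumption here, as are the number of clusters and the function name `cluster_recommend`.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_recommend(predicted, user_idx, n_clusters=2, top_n=2, seed=0):
    """Recommend items to a user from the mean ratings of the user's cluster.

    predicted: dense array of predicted ratings, shape (n_users, n_items).
    """
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(predicted)

    # Aggregate preferences over all users that share the target user's cluster.
    peers = predicted[labels == labels[user_idx]]
    cluster_mean = peers.mean(axis=0)

    top_items = np.argsort(-cluster_mean)[:top_n].tolist()
    return labels, top_items


if __name__ == "__main__":
    ratings = np.array([
        [5.0, 4.0, 0.0, 0.0],
        [4.0, 5.0, 0.0, 0.0],
        [0.0, 0.0, 5.0, 4.0],
        [0.0, 0.0, 4.0, 5.0],
    ])
    print(cluster_recommend(ratings, user_idx=0))
```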
Evaluation Metrics
Precision@N and Recall@N [9] indicate how well the system recommends relevant items to users.
Figure 3. Precision
Precision@N: evaluates the accuracy of the top-N recommendations by measuring how many of the recommended items are actually relevant.
Figure 4. Recall
Recall@N: evaluates the coverage of the relevant items in the top-N recommendations by measuring how many of the relevant items are included in the recommendations.
Where:
- TopN(ui) is the set of top-N items recommended to user ui.
- L(ui) is the set of relevant items for user ui in the test set.
- U is the set of all users.
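The formulas shown in Figures 3 and 4 are not reproduced in this text. A common formulation consistent with the definitions above, assumed here, averages over users: Precision@N = (1/|U|) Σ |TopN(ui) ∩ L(ui)| / N and Recall@N = (1/|U|) Σ |TopN(ui) ∩ L(ui)| / |L(ui)|. The sketch below implements that assumed form; the function name and the choice to skip users with no relevant items are illustrative.

```python
def precision_recall_at_n(recommended, relevant, n=10):
    """Average Precision@N and Recall@N over users.

    recommended: dict user -> ranked list of recommended item ids.
    relevant: dict user -> set of relevant item ids from the test set.
    Users with no relevant items are skipped (an assumption).
    """
    users = [u for u in relevant if relevant[u]]
    precision_sum = 0.0
    recall_sum = 0.0
    for u in users:
        top_n = recommended.get(u, [])[:n]
        hits = len(set(top_n) & relevant[u])
        precision_sum += hits / n
        recall_sum += hits / len(relevant[u])
    if not users:
        return 0.0, 0.0
    return precision_sum / len(users), recall_sum / len(users)


if __name__ == "__main__":
    recs = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
    truth = {"u1": {"a", "x"}, "u2": {"e", "f"}}
    print(precision_recall_at_n(recs, truth, n=3))
```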
The objective of these experiments is to investigate the effectiveness of using clustering and textual content to improve content-based recommendation systems. The findings of these analyses aid in comprehending the possible enhancements in system performance and suggestion relevancy that each technique offers.
Results
The analysis of the different recommendation techniques shows that the Unified Social Geographic (USG) method from Werneck et al. [5] has the highest Recall@10, suggesting superior effectiveness in identifying relevant items. However, it has lower precision than the collaborative filtering method, which exhibits the highest Precision@10, indicating greater accuracy. The GeoMF method records the lowest values in both recall and precision, highlighting potential challenges in merging geographical factors with matrix factorization.
Table 1. Results
| Metric | Content | Collab | Hybrid | USG | GeoMF |
| --- | --- | --- | --- | --- | --- |
| Recall@10 | 0.0667 | 0.0667 | 0.0893 | 0.0377 | 0.0312 |
| Precision@10 | 0.159 | 0.0248 | 0.0197 | 0.036 | 0.0309 |
The research resulted in the development of a robust recommendation system for Yelp using advanced data mining techniques, with the potential to significantly enhance user experience and business visibility. By leveraging a comprehensive dataset that includes user reviews, business attributes, and interaction logs, we implemented and compared three recommendation systems: content-based, collaborative filtering, and hybrid models.
Discussion and Future Improvements
The analysis of the different recommendation techniques suggests that clustering offers a better way to capture user interactions and to produce recommendations that reflect users' needs. Although the model shows promising performance, the methodology does not yet incorporate user feedback to validate the recommendations it makes. We therefore intend to add a real-time user feedback loop. By enabling users to rate the usefulness of recommendations immediately, the system could continually adjust and improve its models based on direct feedback, keeping its suggestions relevant and valuable over time.
Table 2 focuses on clustering-based approaches and indicates that collaborative filtering excels in Recall@10 with a score of 0.1271, pointing to its strength in capturing a broader set of relevant recommendations compared to content-based and hybrid methods. However, all methods show relatively low precision, with collaborative filtering again leading at 0.0332. These results underline the trade-offs between recall and precision in different recommendation and clustering approaches.
Table 2. Clustering results
| Clustering based | Content | Collab | Hybrid |
| --- | --- | --- | --- |
| Recall@10 | 0.0634 | 0.1271 | 0.1078 |
| Precision@10 | 0.0163 | 0.0332 | 0.0288 |
Conclusion
Data Preparation
The data preparation phase involved transforming the raw JSON files into CSV format for analysis. Initially, the business, review, and user datasets were loaded from their respective sources and combined based on unique identifiers (business_id and user_id). This merging process resulted in a unified dataset that includes user interactions with businesses, the content of reviews, and business locations.
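The merge on business_id and user_id might look like the following pandas sketch. The file paths in the comments are hypothetical, the column names other than business_id and user_id are illustrative, and the inner joins are an assumption about how unmatched records were handled.

```python
import pandas as pd


def build_interactions(business, review, user):
    """Join reviews with business and user records into one interaction table."""
    # Suffixes disambiguate columns that appear in more than one table.
    merged = review.merge(
        business, on="business_id", how="inner", suffixes=("", "_business")
    )
    merged = merged.merge(user, on="user_id", how="inner", suffixes=("", "_user"))
    return merged


if __name__ == "__main__":
    # With real data, the frames could be loaded from line-delimited JSON, e.g.
    #   review = pd.read_json("review.json", lines=True)  # hypothetical path
    # and the result written out with merged.to_csv("interactions.csv", index=False).
    business = pd.DataFrame(
        {"business_id": ["b1", "b2"], "state": ["X", "Y"], "stars": [4.0, 3.5]}
    )
    review = pd.DataFrame(
        {
            "review_id": ["r1", "r2", "r3"],
            "user_id": ["u1", "u2", "u1"],
            "business_id": ["b1", "b1", "b3"],
            "stars": [5, 4, 1],
            "text": ["good", "ok", "bad"],
        }
    )
    user = pd.DataFrame({"user_id": ["u1", "u2"], "name": ["A", "B"]})
    print(build_interactions(business, review, user))
```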
To focus the analysis on a region with significant user activity, interactions related to businesses located in Illinois (LV) were selected due to the higher interaction volume in this state compared to others.
Additionally, users with fewer than three interactions were filtered out to base the analysis on more active users, enhancing the reliability of the subsequent recommendations. The dataset was split into training, validation, and testing sets using stratified sampling based on user_id to ensure a consistent user distribution across datasets. The split ratio was set to 80% for training data, 10% for validation, and 10% for testing data.
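The filtering and 80/10/10 split can be approximated as below. The paper describes stratified sampling on user_id; this sketch instead splits each user's interactions separately, which is one way to keep every active user represented and is an assumption, not the authors' procedure. The function name `filter_and_split` and the rounding rule are also illustrative.

```python
import numpy as np
import pandas as pd


def filter_and_split(df, min_interactions=3, seed=0):
    """Drop inactive users, then split each user's rows roughly 80/10/10."""
    counts = df.groupby("user_id")["user_id"].transform("size")
    active = df[counts >= min_interactions]

    rng = np.random.default_rng(seed)
    parts = {"train": [], "val": [], "test": []}
    for _, group in active.groupby("user_id"):
        idx = rng.permutation(group.index.to_numpy())
        n_train = int(round(0.8 * len(idx)))
        n_val = int(round(0.1 * len(idx)))
        parts["train"].extend(idx[:n_train])
        parts["val"].extend(idx[n_train:n_train + n_val])
        # Whatever remains goes to the test set (may be empty for small users).
        parts["test"].extend(idx[n_train + n_val:])

    return tuple(active.loc[parts[name]] for name in ("train", "val", "test"))


if __name__ == "__main__":
    data = pd.DataFrame(
        {
            "user_id": ["u1"] * 10 + ["u2"] * 2,
            "business_id": [f"b{i}" for i in range(12)],
        }
    )
    train, val, test = filter_and_split(data)
    print(len(train), len(val), len(test))
```

Users with only a handful of interactions may end up with empty validation or test portions under this rule, so a real pipeline would need to decide how to treat them.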
Figure 1. Distribution of stars
The experimental results demonstrate that clustering has the potential to improve recommendation systems. By grouping similar items and users, these models can better capture user preferences and interaction patterns, leading to enhanced recommendation accuracy. Evaluation metrics, such as Precision@N and Recall@N, confirm the efficacy of our approach in generating relevant and comprehensive recommendations.
REFERENCES
[1] Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30-37.
[2] Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web (pp. 285-295). Hong Kong.
[3] Li, L., Wang, D., Li, T., Knox, D., & Padmanabhan, B.-H. (2011). SCENE: A scalable two-stage personalized news recommendation system. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 125-134). Beijing, China.
[4] Park, J. S., & Zhou, M. X. (1999). A hierarchical recommendation system based on collaborative filtering. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (pp. 395-399). Tokyo, Japan.
[5] Werneck, H., Silva, N., Viana, M., Pereira, A. C., Mourao, F., & Rocha, L. (2021). Points of interest recommendations: methods, evaluation, and future directions. Information Systems, 101, 101789.
[6] Gupta, S., Pathak, S., & Mitra, B. (2015). Complementary usage of tips and reviews for location recommendation in Yelp. In Lecture Notes in Computer Science (pp. 720-731). https://doi.org/10.1007/978-3-319-18032-8_56
[7] Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. (1992). Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12), 61-70. https://doi.org/10.1145/138859.138867
[8] Takács, G., Pilászy, I., Németh, B., & Tikk, D. (2007). Major components of the gravity recommendation system. SIGKDD Explorations Newsletter, 9(2), 80-83. https://doi.org/10.1145/1345448.1345466
[9] Gupta, J., & Gadge, J. (2015). Performance analysis of recommendation system based on collaborative filtering and demographics. In Proceedings of the 2015 International Conference on Communication, Information and Computing Technology (ICCICT 2015). https://doi.org/10.1109/ICCICT.2015.7045675