Contextual Personality-aware Recommender System Versus Big Data Recommender System

Many personality theories suggest that personality influences customer shopping preference. Thus, this research analyses the potential ability to improve the accuracy of the collaborative filtering recommender system by incorporating the Five-Factor Model personality traits data obtained from customer text reviews. The study uses a large Amazon dataset with customer reviews and information about verified customer product purchases. However, evaluation results show that the model leveraging big data by using the whole Amazon dataset provides better recommendations than the recommender systems trained in the contexts of the customer personality traits.


Introduction
Recommender systems (RSs) are now an important element shaping the customer digital experience in electronic services. Many major companies, such as Amazon, Netflix, and Spotify, successfully employ RSs in their businesses and seek to improve their algorithms even further. Therefore, research in this domain seems justified.
There are three main types of recommender systems: content-based (CB), collaborative filtering (CF), and hybrid recommender systems [1]. CB uses similarities among items, e.g., recommending movies of the same genre or news articles on the same topic. CF uses a slightly different technique: it exploits similarities and relationships among users to provide recommendations [2]. RS algorithms exploit different kinds of information about users or items to provide the most accurate recommendations [3].
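The user-to-user similarity idea behind CF can be illustrated with a minimal sketch. The ratings, user names, and the choice of cosine similarity below are assumptions made for illustration only; they are not taken from the study, which uses matrix factorization rather than neighbourhood-based CF.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}. Not data from the study.
ratings = {
    "u1": {"book": 5, "lamp": 3, "mug": 4},
    "u2": {"book": 4, "lamp": 2, "mug": 5},
    "u3": {"book": 1, "lamp": 5},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def most_similar_user(target, all_ratings):
    """Return the other user whose ratings are closest to the target's."""
    others = [u for u in all_ratings if u != target]
    return max(
        others,
        key=lambda u: cosine_similarity(all_ratings[target], all_ratings[u]),
    )
```

A neighbourhood-based CF system would then recommend to `u1` the items that the most similar user rated highly and `u1` has not yet rated.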
Exploiting customer personality data appeals to many researchers and has been explored in many studies related to RSs [4]-[6]. Personality theory researchers claim that human personality traits have a significant influence on customer preferences and subsequently on behavior [7], [8]. Therefore, these traits seem to be a promising predictor of customer behavior. This is especially important in digital markets, where customer personality characteristics can be inferred from digital footprints [9], [10].

Customer Personality Traits Identification
In the existing literature, there are many different personality models and personality descriptions [11]-[13]. However, the most commonly used personality model is the Five-Factor Model (FFM), also known as the Big Five model, proposed by [14] and extended by the work of [15]. According to this approach, there are five basic dimensions of personality: extraversion, neuroticism, openness to experience, agreeableness, and conscientiousness. The Big Five model has been verified in a significant number of empirical studies and has been subjected to psychometric verification on many occasions [16], [17].
Considering the above, personality traits can be successfully used in many different research applications and business scenarios. However, before personality traits can be used, they must first be identified. The most obvious and usually the most reliable approach for identifying FFM personality traits is through psychological questionnaires, and many different questionnaires have been developed for this purpose [18], [19]. However, those questionnaires take the user considerable time to complete, and it is not easy to persuade users to donate their time to complete them. Therefore, collecting such data using this technique can be very expensive, and large-scale datasets with personality traits data collected from questionnaires are extremely rare. For this reason, researchers and practitioners are trying to infer customer or user personality traits from other data sources such as social media [20], [21], multiple types of digital footprints [22], user-written texts [23]-[26], or speech and video (e.g., face detection and analysis) [25].

Personality-based Recommender Systems
Through the years, there have been many attempts to incorporate personality traits into RSs. Several publications by [27]-[30] propose and examine an interesting application of a personality-based RS (TWIN) in the online tourism domain. Their RS produces recommendations based on the user personality model retrieved from plain text. For their study, they collected 14,000 text reviews from 1,030 people. To evaluate the performance of the TWIN system, they applied their RS to suggest hotels by filtering out reviews produced by people with like-minded views to those of the user. Unfortunately, most of their work is focused on extracting the right personality type from the text, and little is said about the efficiency of the recommendations provided by the RS.
The study carried out by [31] introduced a novel Active-Learning (AL) technique for addressing the cold-start problem in RSs. Their proposed technique uses the FFM as its basis to provide a user with personalized rating requests, without completely relying on explicit feedback (e.g., ratings) or implicit feedback (e.g., item views or purchases), which is usually not available in cold-start situations. Their study claims that their AL method leads to a higher increase in the number of acquired user ratings in comparison to a state-of-the-art rating elicitation strategy. The downside of this study is undoubtedly its small evaluation dataset, which covered only 108 participants (who were required to fill in a personality questionnaire).
A very extensive overview of state-of-the-art research related to the application of personality data in RS was presented by [32]. The paper describes different personality models (with the main focus on FFM), a correlation between personality and user preferences, personality identification techniques, an overview of the publicly available datasets for RS, different applications of personality data in RS (cold-start problem, diversity, cross-domain recommendations, group recommendations), and open issues and challenges related to the usage of personality in RS.
An interesting example of how to incorporate a user personality profile, acquired through analysis of written reviews, into the RS domain is presented by [5]. Their goal in this study was to incorporate user personality traits into RS and find out whether it would allow improving the accuracy of predicted ratings. The technique used for rating prediction was Kernelized Probabilistic Matrix Factorization (KPMF). The evaluation of their study was based on an experiment conducted on the (crawled) IMDB dataset of 2,087 users and 3,500 movies. The ability to identify the personality traits was based on a supervised model trained on the publicly available MyPersonality dataset (social media dataset of 250 users with their personality traits). They trained six different models and calculated RMSEs based on the test dataset. The results suggest that the worst score was achieved by the non-optimized Matrix Factorization model, and the most effective model uses a combination of the textual features and the predicted personality scores. Unfortunately, KPMF does not seem to be easily scalable for big datasets.
The six-month study of 1,800 users described by [33] also suggests that it is possible to improve user satisfaction when users' personality traits are integrated into the process of generating recommendations. A recent line of research keeps investigating possible applications of personality data in RS, especially in the context of user digital footprints such as text reviews.
The study by [34] identifies the lifestyle of a customer by analyzing text reviews published on Amazon and predicts consumers' purchasing preferences. The interesting results of their experiment conducted on the Amazon Review Dataset show that online lifestyles significantly improve recommendation performance and outperform the widely used FFM personality traits as a whole.
A similar study on the Amazon Review Dataset was conducted by [35]. The paper suggests that movie preferences correlate with specific product purchase preferences. This finding seems to be in line with the lifestyle preferences correlation.
Another FFM personality-based RS based on text reviews was proposed by [36]. However, the authors also added the user's level of knowledge about various domains to the model. They report that the proposed model performs better in both MAE and RMSE metrics compared to the other two models (CTR and TWIN).
Finally, an RS for an e-clothing store based on personality traits, demographics, and behavior of customers in a time context was presented by [37]. Their proposed method was compared with different baselines (matrix factorization and ensemble). The results revealed that the proposed method led to a significant improvement in traditional CF performance and outperformed all baselines by a significant margin (more than 40%).

Literature Review Conclusions
Summing up this literature review, the most common model of personality is the FFM, which is composed of the factors openness, conscientiousness, extraversion, agreeableness, and neuroticism. It is suitable for RS since it can be quantified with feature vectors that describe the degree to which each factor is expressed in a user. There are different ways of acquiring personality traits. Generally, those techniques can be grouped into explicit techniques (e.g., questionnaires) and implicit techniques (e.g., identification based on social media, text, or other electronic behavior). While explicit techniques provide relatively accurate assessments of personality, they are intrusive and time-consuming for potential users. Predicting personality from online texts is therefore a growing trend among researchers.
Moreover, FFM traits can be incorporated into RS using pre-filtering [38], KPMF [39], Convex collective matrix factorization [40], or Consistent collective matrix [41]. However, all the approaches besides pre-filtering are not easily scalable or implementable in big data environments.
Most researchers working in the area of RS agree that user personality data can improve the quality of recommendations. However, there are still open issues and challenges that need to be addressed to improve the adoption of personality in RS. First of all, most of the studies were based only on a small number of participants (very often ranging from 50 to about 100 participants). Therefore, there is a significant research gap in studies leveraging Big Data for personality-based RS. Moreover, many of the state-of-the-art methods are not easily scalable for large datasets and Big Data technologies.

Research Framework
The main goal of the experiment conducted in this study was to integrate the information contained in the users' text reviews into an RS and, in particular, to investigate whether FFM personality traits, as reflected in the text generated by users, would allow improving the Root Mean Square Error (RMSE) of predicted ratings. To incorporate FFM personality traits into the collaborative filtering model, the contextual pre-filtering technique was chosen since it is easily scalable and the easiest to implement. Figure 1 presents a research framework for the designed experiment. The first step of the experiment was preprocessing the Amazon Reviews. Then, based on the text reviews, FFM personality traits of Amazon users were identified. For the given group of users, product purchases with ratings were extracted and merged with personality data. The next step involved creating an RS model that incorporated personality traits (using pre-filtering) and an RS model that did not take personality traits into account. Finally, both RS models were evaluated and compared.

Dataset
The analysis was carried out using a subset of the Amazon Reviews Dataset collected by [42], which is publicly available. The initial dataset covers 233.1 million Amazon reviews between 1998 and 2018. However, to capture the latest trends in customers' behavior and to limit the computational power required to process the data, the selected subset used for this study covered the last two years available in the dataset (from the 1st of October 2016 to the 1st of October 2018). Moreover, only reviews of users with at least five text reviews were selected. Additional filtering was applied to remove empty reviews, errors, and reviews that did not have a verified purchase status. The final dataset used in this study covered 34,467,155 reviews of 2,968,635 users. The dataset size is a significant advantage of this study since there are very few studies of personality-based recommender systems that leverage big data. Different subsets of the same Amazon Reviews Dataset were used in different research scenarios by [34], [36], [43]-[45] and many more scholars. For the purpose of machine learning algorithms, this dataset was divided into a training dataset and a testing dataset in an 80:20 proportion. Therefore, the training dataset covered 27,573,175 customer records and the testing dataset covered 6,893,980 customer records.
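The selection criteria described above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the field names (`user`, `date`, `text`, `verified`), the inclusive window boundaries, the order in which the filters are applied, and the synthetic records are all assumptions; the study itself processed the data at a much larger scale.

```python
from collections import Counter
from datetime import date

# Window and threshold as stated in the text (inclusive bounds are assumed).
WINDOW_START = date(2016, 10, 1)
WINDOW_END = date(2018, 10, 1)
MIN_REVIEWS = 5


def filter_reviews(records):
    """Keep verified, non-empty reviews inside the window, then keep
    only users who still have at least MIN_REVIEWS reviews."""
    valid = [
        r for r in records
        if WINDOW_START <= r["date"] <= WINDOW_END
        and r["verified"]
        and r["text"].strip()
    ]
    per_user = Counter(r["user"] for r in valid)
    return [r for r in valid if per_user[r["user"]] >= MIN_REVIEWS]


def make_review(user, day, text="ok", verified=True):
    """Build one synthetic review record (hypothetical schema)."""
    return {"user": user, "date": day, "text": text, "verified": verified}


# Synthetic records for illustration only.
sample = (
    [make_review("a", date(2017, 1, d)) for d in range(1, 6)]    # 5 valid reviews
    + [make_review("b", date(2017, 2, d)) for d in range(1, 5)]  # only 4 valid reviews
    + [make_review("b", date(2017, 2, 5), verified=False)]       # unverified, dropped
    + [make_review("c", date(2015, 6, d)) for d in range(1, 6)]  # outside the window
)

kept = filter_reviews(sample)
kept_users = {r["user"] for r in kept}

# The reported split sizes add up to the reported total.
TOTAL, TRAIN, TEST = 34_467_155, 27_573_175, 6_893_980
train_share = TRAIN / TOTAL  # approximately 0.8
```

In this toy sample only user `a` survives all filters; user `b` falls below five reviews once the unverified one is removed, and user `c` lies outside the time window.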

Personality Prediction Engine
To identify FFM personality traits from the text reviews, a pre-trained model was used, based on research with open-source code published by [46]. The author of the code was inspired by the work of [24]. According to the author, the publicly available pre-trained model was trained on four different datasets: Stream-of-consciousness Essays, The NRC Emotion Lexicon, Myers-Briggs Personality Type Dataset, and the Scraped Data From Reddit. The Stream-of-consciousness Essays dataset is a publicly available dataset of 2,468 anonymous essays tagged with the authors' FFM personality traits. It is considered a gold standard in psychology since the data was collected in a controlled environment [47]. The NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). It is also publicly available and covers about 14,000 words. The Myers-Briggs Personality Type Dataset can be found freely on Kaggle. This dataset was collected through the PersonalityCafe forum and provides 8,600 rows of data on people's personality types, as well as what they have written. Finally, the Scraped Data From Reddit is the only dataset that is not publicly available. This dataset was used in the research by [48] and was provided by the author of the paper. It covers scraped data from personality subreddits, where people show their personality types in the forum and therefore provide labeled text comments and posts.
The author of the pre-trained model combined all those different sources into one combined dataset, extracted features from text to vectorize the data with the bag-of-words and GloVe approaches, and tested several supervised classification learning algorithms (SVM, Decision Tree, Naive Bayes, Logistic Regression, and Random Forest). The best models for predicting specific FFM personality traits were selected for the final model. The evaluation of the model presented by the author reports the following accuracies: 77.18% (Extraversion), 61.74% (Neuroticism), 75.51% (Agreeableness), 70.34% (Conscientiousness), and 80.39% (Openness). Those results are within the range of the state-of-the-art papers analyzed in the literature review. Therefore, the usage of this pre-trained model seems justified. Similar approaches were presented in papers by [5] or [36].

Product Recommender Engine
For the purpose of the research, the CF algorithm was chosen for the RS engine since it is relatively easy to implement across different domains and the Amazon Review Dataset contains the product ratings that are essential for CF. Specifically, the Alternating Least Squares (ALS) matrix factorization technique available in the spark.ml library was implemented in PySpark according to the Spark documentation. The experiment was designed around two approaches. The first approach aimed to construct an RS based on one large dataset ignoring the personality traits, while the second approach involved a pre-filtering technique to incorporate personality traits into the RS.
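For orientation, the objective that ALS matrix factorization typically minimizes can be written as follows. This is the commonly used weighted-regularization form, given here as a sketch rather than as the exact formulation used in the study. The symbols are assumptions introduced for this illustration: $r_{ui}$ is the rating of item $i$ by user $u$, $\mathcal{K}$ is the set of observed ratings, $x_u, y_i \in \mathbb{R}^k$ are the user and item factors with $k$ corresponding to the rank hyperparameter, $\lambda$ corresponds to the regularization hyperparameter, and $n_u$, $n_i$ are the numbers of ratings given by user $u$ and received by item $i$.

```latex
\min_{X,\,Y} \sum_{(u,i) \in \mathcal{K}} \left( r_{ui} - x_u^{\top} y_i \right)^2
  + \lambda \left( \sum_{u} n_u \lVert x_u \rVert^2 + \sum_{i} n_i \lVert y_i \rVert^2 \right)

% With the item factors fixed, each user factor has a closed-form update,
% where Y_u stacks the factors of the items rated by u and r_u their ratings:
x_u = \left( Y_u^{\top} Y_u + \lambda\, n_u I \right)^{-1} Y_u^{\top} r_u
```

The algorithm alternates between this update for all users and the symmetric update for all items, which is why each step reduces to independent least-squares problems that parallelize well.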

Big Data RS.
The first approach was based on one large training dataset containing solely user ratings for products they purchased. To perform hyperparameter tuning based on 5-fold cross-validation, a sub-dataset containing 567,917 records (user ratings) was selected using random sampling. This significantly reduced the computing power required to train and cross-validate models with different sets of hyperparameters. The set of parameters for model tuning was based on experience with other similar projects and the literature, and the parameter grid for hyperparameter tuning was specified in code. Then, having selected the best hyperparameters, the model was trained on the whole training dataset (without sampling) and evaluated on the test dataset using the RMSE score.
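The original parameter grid is not available in this text, so the following is only a sketch of how a grid search with 5-fold cross-validation can be organized. All candidate values are hypothetical placeholders, except that rank 150 and regularization 0.15 are the values reported as selected in the evaluation. The helper names (`expand_grid`, `k_fold_indices`, `select_best`) and the `score_fn` callback are assumptions of this sketch and do not correspond to the Spark API.

```python
from itertools import product

# Hypothetical candidate values. Only rank=150 and regParam=0.15 appear in
# the text (as the selected values); every other entry is a placeholder.
param_grid = {
    "rank": [50, 100, 150],
    "regParam": [0.05, 0.15, 0.5],
}


def expand_grid(grid):
    """Return every combination of the candidate values as a dict."""
    names = sorted(grid)
    return [
        dict(zip(names, values))
        for values in product(*(grid[n] for n in names))
    ]


def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, valid_idx) for k contiguous folds.

    Rows are assumed to be shuffled beforehand.
    """
    sizes = [n_rows // k + (1 if f < n_rows % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, valid
        start += size


def select_best(grid, n_rows, score_fn, k=5):
    """Return the combination with the lowest mean validation score.

    score_fn(params, train_idx, valid_idx) is a caller-supplied function
    that trains a model and returns its validation RMSE.
    """
    best, best_score = None, float("inf")
    for params in expand_grid(grid):
        scores = [
            score_fn(params, train, valid)
            for train, valid in k_fold_indices(n_rows, k)
        ]
        mean_score = sum(scores) / len(scores)
        if mean_score < best_score:
            best, best_score = params, mean_score
    return best, best_score
```

With nine combinations and five folds, this sketch would train 45 models, which illustrates why tuning on a random sample rather than on the full training set reduces the required computing power.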

Contextual Personality-aware RS.
The second approach aimed to investigate whether FFM personality traits, as reflected in the text generated by users, would allow improving the Root Mean Square Error (RMSE) of predicted ratings. The pre-filtering technique was used to divide the training dataset into homogeneous datasets according to the identified personality traits of the users. The threshold for personality traits was set to 0.5 since the evaluation of this predictive model (described by the author) used the same value. This means that a user was assigned the value 1 for a given FFM trait if the predicted probability of that trait was more than 0.5; otherwise, 0 was assigned. The subsets of the training dataset were created using two filters: selecting user data according to every FFM combination and selecting user data according to particular personality traits (selecting users with a given FFM trait, ignoring other traits).
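The thresholding and the two kinds of filters can be sketched as follows. The trait abbreviations, the record layout, and the helper names are assumptions made for this illustration. Under the assumption that every combination of five binary traits is enumerated, the first filter yields 2^5 = 32 subsets and the second yields 5, giving 37 subset definitions in total.

```python
from itertools import product

# Trait abbreviations are assumptions of this sketch.
TRAITS = ("EXT", "NEU", "AGR", "CON", "OPN")
THRESHOLD = 0.5


def binarize(probabilities):
    """Map per-trait probabilities to 0/1 flags (1 when probability > 0.5)."""
    return tuple(int(probabilities[t] > THRESHOLD) for t in TRAITS)


def subset_keys():
    """All filter definitions: full combinations plus single-trait filters."""
    full = [("combo", combo) for combo in product((0, 1), repeat=len(TRAITS))]
    single = [("trait", trait) for trait in TRAITS]
    return full + single


def matches(key, flags):
    """Check whether a user's binary trait flags satisfy a filter."""
    kind, value = key
    if kind == "combo":
        return flags == value
    # Single-trait filter: the trait must be present, others are ignored.
    return flags[TRAITS.index(value)] == 1


def prefilter(ratings, user_flags, key):
    """Keep only the ratings whose user falls into the given subset."""
    return [r for r in ratings if matches(key, user_flags[r["user"]])]
```

Each subset returned by `prefilter` would then be used to train and tune its own model, and the matching part of the test set would be used for its evaluation.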

Evaluation Criteria
Recommender systems are commonly evaluated through two main measures: Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) [49], [50]. However, most cost functions in Machine Learning avoid using MAE and rather use a sum of squared errors or Root Mean Squared Error. Moreover, the famous Netflix Prize competition also selected the RMSE score as the evaluation criterion [51]. Therefore, for this study RMSE is used as the evaluation metric. The smaller the RMSE, the better the RS. In this case, it allowed comparing the RS without personality traits and the RS with incorporated personality traits.
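Both measures can be computed directly from pairs of actual and predicted ratings. The sketch below uses the standard definitions; the example ratings are made up for illustration.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute differences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Made-up ratings for illustration only.
actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 3]

example_rmse = rmse(actual, predicted)  # sqrt((1 + 0 + 1 + 4) / 4)
example_mae = mae(actual, predicted)    # (1 + 0 + 1 + 2) / 4
```

In this example the single error of two rating points raises the RMSE above the MAE, which reflects that RMSE penalizes large errors more heavily.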

Evaluation Results
Regarding the Big Data RS, the results of the hyperparameter tuning of the ALS model revealed that the best performing model (according to RMSE) used the following parameters: als.rank=150 and als.regParam=0.15. Then, the ALS model with those hyperparameters was trained on the whole training dataset. Evaluation conducted on the test dataset achieved an RMSE score of 1.1498. It seems to be a satisfactory result for an RS. Evaluation of the Contextual Personality-aware RS was also conducted on the same test dataset. 37 subsets were selected that correspond to the combinations of the personality traits. Then, for each personality-homogeneous group, an RS model was trained with hyperparameter tuning (using the same parameter grid as for the RS without personality traits). Each RS was evaluated on the part of the test dataset that covered users with the corresponding personality traits. Detailed evaluation results with comparison are presented in Table 1 and Table 2.
The above results indicate that the RS trained on personality-homogeneous groups achieves worse average RMSE scores than the RS trained on one diversified big dataset. It may indicate that applying big data is more efficient than using smaller homogeneous personality-based groups.
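The text does not state how the per-group RMSE scores were averaged. As a sketch under that caveat, two common options are an unweighted mean of the group scores and a pooled RMSE that weights each group by its number of test ratings. The function names and the example numbers below are hypothetical and are not results from the study.

```python
from math import sqrt


def mean_rmse(groups):
    """Unweighted mean of per-group RMSE scores.

    groups is a list of (n_ratings, rmse) pairs.
    """
    return sum(score for _, score in groups) / len(groups)


def pooled_rmse(groups):
    """RMSE over all groups combined, weighting each group by its size."""
    total_n = sum(n for n, _ in groups)
    total_squared_error = sum(n * score ** 2 for n, score in groups)
    return sqrt(total_squared_error / total_n)


# Hypothetical (n_ratings, rmse) pairs, not taken from the study.
example_groups = [(3, 2.0), (1, 0.0)]
```

The pooled value equals the RMSE that would be obtained by evaluating all group predictions together, so it is the figure directly comparable with a single model evaluated on the whole test set when the groups do not overlap.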

Discussion
Based on the evaluation results, the paper contributes by showing other researchers that, even though personality traits are indeed very important in RS, incorporating personality traits using contextual pre-filtering is not as efficient as leveraging the whole dataset. Since those findings are based on Amazon.com's large dataset covering many different product domains, the results can be expected to generalize to other e-commerce platforms as well.

Limitations
Every study has limitations and this research is no exception. First of all, this experiment was based only on verified reviews. Hence, people who purchased a product without reviewing it are not considered in this analysis. Secondly, to identify FFM personality traits from the text reviews, a pre-trained model trained on different texts was used. Otherwise, the study would require a huge number of Amazon users to fill in the FFM personality traits questionnaire, which would be a difficult task to accomplish. However, as mentioned before, similar approaches were used by other researchers in this domain as well. Finally, the analysis was based on user accounts that might be shared with others (e.g., members of the family).

Future Work
First of all, future research should investigate further fragmentation of personality trait levels rather than having only two states (0 and 1). Exploiting different levels of personality traits (e.g., low, medium, and high levels of extraversion) may improve the accuracy of RS. Moreover, techniques other than pre-filtering can be explored to incorporate personality traits into RS. Finally, exploring similarities in the way users write text reviews (other than FFM personality traits) by applying NLP techniques may also be a good direction to extend this study.