10,16,2021

Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks (2019-08-03)
Modeling sequential interactions between users and items/products is crucial in domains such as e-commerce, social networking, and education. Representation learning presents an attractive opportunity to model the dynamic evolution of users and items, where each user/item can be embedded in a Euclidean space and its evolution can be modeled by an embedding trajectory in this space. However, existing dynamic embedding methods generate embeddings only when users take actions and do not explicitly model the future trajectory of the user/item in the embedding space. Here we propose JODIE, a coupled recurrent neural network model that learns the embedding trajectories of users and items. JODIE employs two recurrent neural networks to update the embedding of a user and an item at every interaction. Crucially, JODIE also models the future embedding trajectory of a user/item. To this end, it introduces a novel projection operator that learns to estimate the embedding of the user at any time in the future. These estimated embeddings are then used to predict future user-item interactions. To make the method scalable, we develop a t-Batch algorithm that creates time-consistent batches and leads to 9x faster training. We conduct six experiments to validate JODIE on two prediction tasks, future interaction prediction and state change prediction, using four real-world datasets. We show that JODIE outperforms six state-of-the-art algorithms in these tasks by at least 20% in predicting future interactions and 12% in state change prediction.

Learning Dynamic Embeddings from Temporal Interactions (2018-12-05)
Modeling a sequence of interactions between users and items (e.g., products, posts, or courses) is crucial in domains such as e-commerce, social networking, and education to predict future interactions.
Representation learning presents an attractive solution to model the dynamic evolution of user and item properties, where each user/item can be embedded in a Euclidean space and its evolution can be modeled by dynamic changes in embedding. However, existing embedding methods either generate static embeddings, treat users and items independently, or are not scalable. Here we present JODIE, a coupled recurrent model to jointly learn the dynamic embeddings of users and items from a sequence of user-item interactions. JODIE has three components. First, the update component updates the user and item embeddings from each interaction using their previous embeddings with two mutually recursive recurrent neural networks. Second, a novel projection component is trained to forecast the embedding of users at any future time. Finally, the prediction component directly predicts the embedding of the item in a future interaction. For models that learn from a sequence of interactions, traditional training data batching cannot be done due to complex user-user dependencies. Therefore, we present a novel batching algorithm called t-Batch that generates time-consistent batches of training data that can run in parallel, giving massive speed-up. We conduct six experiments on two prediction tasks, future interaction prediction and state change prediction, using four real-world datasets. We show that JODIE outperforms six state-of-the-art algorithms in these tasks by up to 22.4%. Moreover, we show that JODIE is highly scalable and up to 9.2x faster than comparable models. As an additional experiment, we illustrate that JODIE can predict student drop-out from courses five interactions in advance.

Deep Coevolutionary Network: Embedding User and Item Features for Recommendation (2017-02-28)
Recommender systems often use latent features to explain the behaviors of users and capture the properties of items.
As users interact with different items over time, user and item features can influence each other, evolve and co-evolve over time. The compatibility of user and item features further influences the future interaction between users and items. Recently, point process based models have been proposed in the literature aiming to capture the temporally evolving nature of these latent features. However, these models often make strong parametric assumptions about the evolution process of the user and item latent features, which may not reflect reality and which limit their power to express the complex and nonlinear dynamics underlying these processes. To address these limitations, we propose a novel deep coevolutionary network model (DeepCoevolve) for learning user and item features based on their interaction graph. DeepCoevolve uses a recurrent neural network (RNN) over evolving networks to define the intensity function in point processes, which allows the model to capture complex mutual influence between users and items, and the feature evolution over time. We also develop an efficient procedure for training the model parameters, and show that the learned models lead to significant improvements in recommendation and activity prediction compared to previous state-of-the-art parametric models.

Dynamic Graph Collaborative Filtering (2021-01-07)
Dynamic recommendation is essential for modern recommender systems to provide real-time predictions based on sequential data. In real-world scenarios, the popularity of items and interests of users change over time. Based on this assumption, many previous works focus on interaction sequences and learn evolutionary embeddings of users and items. However, we argue that sequence-based models are not able to capture collaborative information among users and items directly.
Here we propose Dynamic Graph Collaborative Filtering (DGCF), a novel framework leveraging dynamic graphs to capture collaborative and sequential relations of both items and users at the same time. We propose three update mechanisms: zero-order 'inheritance', first-order 'propagation', and second-order 'aggregation', to represent the impact on a user or item when a new interaction occurs. Based on them, we update related user and item embeddings simultaneously when interactions occur in turn, and then use the latest embeddings to make recommendations. Extensive experiments conducted on three public datasets show that DGCF significantly outperforms the state-of-the-art dynamic recommendation methods by up to 30%. Our approach achieves higher performance when the dataset contains less action repetition, indicating the effectiveness of integrating dynamic collaborative information.

User Embedding based Neighborhood Aggregation Method for Inductive Recommendation (2021-02-15)
We consider the problem of learning latent features (a.k.a. embeddings) for users and items in a recommendation setting. Given only a user-item interaction graph, the goal is to recommend items for each user. Traditional approaches employ matrix factorization-based collaborative filtering methods. Recent methods using graph convolutional networks (e.g., LightGCN) achieve state-of-the-art performance. They learn both user and item embeddings. One major drawback of most existing methods is that they are not inductive; they do not generalize to users and items unseen during training. Besides, existing network models are quite complex and difficult to train and scale. Motivated by LightGCN, we propose a graph convolutional network modeling approach for collaborative filtering, CF-GCN. We learn only user embeddings and derive item embeddings using a light variant, CF-LGCN-U, that performs neighborhood aggregation, making it scalable due to reduced model complexity.
CF-LGCN-U models naturally possess the inductive capability for new items, and we propose a simple solution to generalize for new users. We show how the proposed models are related to LightGCN. As a by-product, we suggest a simple solution to make LightGCN inductive. We perform comprehensive experiments on several benchmark datasets and demonstrate the capabilities of the proposed approach. Experimental results show that generalization performance similar to or better than that of state-of-the-art methods is achievable in both transductive and inductive settings.

Dual-embedding based Neural Collaborative Filtering for Recommender Systems (2021-02-04)
Among various recommender techniques, collaborative filtering (CF) is the most successful one. A key problem in CF is how to represent users and items. Previous works usually represent a user (an item) as a vector of latent factors (a.k.a. embedding) and then model the interactions between users and items based on the representations. Despite its effectiveness, we argue that it's insufficient to yield satisfactory embeddings for collaborative filtering. Inspired by the idea of SVD++ that represents users based on themselves and their interacted items, we propose a general collaborative filtering framework named DNCF, short for Dual-embedding based Neural Collaborative Filtering, to utilize historical interactions to enhance the representation. In addition to learning the primitive embedding for a user (an item), we introduce an additional embedding from the perspective of the interacted items (users) to augment the user (item) representation. Extensive experiments on four public datasets demonstrate the effectiveness of our proposed DNCF framework by comparing its performance with several traditional matrix factorization models and other state-of-the-art deep learning based recommender models.
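The dual-embedding idea in the DNCF abstract above can be illustrated with a minimal NumPy sketch. It follows the SVD++-style construction the abstract cites: a user's own (primitive) embedding plus a normalized sum of embeddings of the items that user interacted with. The function name `dual_user_representation`, the array shapes, the square-root normalization, and the toy data are assumptions made for illustration; this is not the DNCF architecture itself.

```python
import numpy as np

def dual_user_representation(user_emb, item_side_emb, interactions, user):
    """Hypothetical helper: SVD++-style augmented representation of one user.

    user_emb:       (n_users, d) primitive user embeddings
    item_side_emb:  (n_items, d) additional item embeddings used to describe users
    interactions:   (n_users, n_items) binary interaction matrix
    """
    items = np.flatnonzero(interactions[user])
    if items.size == 0:
        # No history: fall back to the primitive embedding alone.
        return user_emb[user].copy()
    # Normalized sum of the embeddings of the items the user interacted with.
    history = item_side_emb[items].sum(axis=0) / np.sqrt(items.size)
    return user_emb[user] + history

# Toy data (made up for illustration): 2 users, 4 items, 3 dimensions.
user_emb = np.zeros((2, 3))
item_side_emb = np.ones((4, 3))
interactions = np.array([[1, 1, 1, 1],
                         [0, 0, 0, 0]])
rep_active = dual_user_representation(user_emb, item_side_emb, interactions, 0)
rep_cold = dual_user_representation(user_emb, item_side_emb, interactions, 1)
```

In this sketch a user with no history keeps only the primitive embedding, while an active user's vector is shifted by the interaction history. The symmetric item-side augmentation would swap the roles of users and items.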

A Correlation Maximization Approach for Cross Domain Co-Embeddings (2018-09-10)
Although modern recommendation systems can exploit the structure in users' item feedback, most are powerless in the face of new users who provide no structure for them to exploit. In this paper we introduce ImplicitCE, an algorithm for recommending items to new users during their sign-up flow. ImplicitCE works by transforming users' implicit feedback towards auxiliary domain items into an embedding in the target domain item embedding space. ImplicitCE learns these embedding spaces and transformation function in an end-to-end fashion and can co-embed users and items with any differentiable similarity function. To train ImplicitCE we explore methods for maximizing the correlations between model predictions and users' affinities and introduce Sample Correlation Update, a novel and extremely simple training strategy. Finally, we show that ImplicitCE trained with Sample Correlation Update outperforms a variety of state-of-the-art algorithms and loss functions on both a large-scale Twitter dataset and the DBLP dataset.

Feature-based factorized Bilinear Similarity Model for Cold-Start Top-n Item Recommendation (2019-04-22)
Recommending new items to existing users has remained a challenging problem due to the absence of users' past preferences for these items. User-personalized non-collaborative methods based on item features can be used to address this item cold-start problem. These methods rely on similarities between the target item and the user's previously preferred items. While computing similarities based on item features, these methods overlook the interactions among the features of the items and consider them independently. Modeling interactions among features can be helpful, as some features, when considered together, provide a stronger signal on the relevance of an item than when the features are considered independently.
To address this important issue, in this work we introduce the Feature-based factorized Bilinear Similarity Model (FBSM), which learns a factorized bilinear similarity model for top-n recommendation of new items, given the information about items preferred by users in the past as well as the features of these items. We carry out extensive empirical evaluations on benchmark datasets, and we find that the proposed FBSM approach improves upon traditional non-collaborative methods in terms of recommendation performance. Moreover, the proposed approach also learns insightful interactions among item features from data, which lead to a deeper understanding of how these interactions contribute to personalized recommendation.

Fusion Strategies for Learning User Embeddings with Neural Networks (2019-01-08)
Growing amounts of online user data motivate the need for automated processing techniques. In the case of user ratings, one interesting option is to use neural networks for learning to predict ratings given an item and a user. While training for prediction, such an approach at the same time learns to map each user to a vector, a so-called user embedding. Such embeddings can, for example, be valuable for estimating user similarity. However, there are various ways in which item and user information can be combined in neural networks, and it is unclear how the way of combining affects the resulting embeddings. In this paper, we run an experiment on movie ratings data, where we analyze the effect on embedding quality caused by several fusion strategies in neural networks. For evaluating embedding quality, we propose a novel measure, Pair-Distance Correlation, which quantifies the condition that similar users should have similar embedding vectors. We find that the fusion strategy affects results in terms of both prediction performance and embedding quality. Surprisingly, we find that prediction performance does not necessarily reflect embedding quality.
This suggests that if embeddings are of interest, the common tendency to select models based on their prediction ability should be reconsidered.

Pairwise Interactive Graph Attention Network for Context-Aware Recommendation (2019-11-18)
Context-aware recommender systems (CARS), which consider rich side information to improve recommendation performance, have attracted increasing attention in both academia and industry. How to predict user preferences from diverse contextual features is the core of CARS. Several recent models pay attention to user behaviors and use specifically designed structures to extract adaptive user interests from historical behaviors. However, few works take item history interactions into consideration, which leads to insufficient item feature representation and item attraction extraction. From these observations, we model the user-item interaction as a dynamic interaction graph (DIG) and propose a GNN-based model called Pairwise Interactive Graph Attention Network (PIGAT) to capture dynamic user interests and item attractions simultaneously. PIGAT introduces the attention mechanism to consider the importance of each interacted user/item to both the user and the item, which captures user interests, item attractions and their influence on the recommendation context. Moreover, confidence embeddings are applied to interactions to distinguish the confidence of interactions occurring at different times. Then more expressive user/item representations and adaptive interaction features are generated, which benefits the recommendation performance, especially when involving long-tail items. We conduct experiments on three real-world datasets to demonstrate the effectiveness of PIGAT.
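The PIGAT abstract above describes weighting each interacted user/item by its importance via attention. As a rough illustration of that general mechanism only, the following sketch uses plain dot-product attention with a softmax over a node's interacted neighbors. The function name `attention_aggregate`, the scoring function, and the toy vectors are assumptions; PIGAT's actual pairwise attention and confidence embeddings are not reproduced here.

```python
import numpy as np

def attention_aggregate(query, neighbor_embs):
    """Hypothetical helper: softmax-weighted average of neighbor embeddings.

    query:          (d,) embedding of the user (or item) being updated
    neighbor_embs:  (k, d) embeddings of the items (or users) it interacted with
    """
    scores = neighbor_embs @ query          # dot-product relevance of each neighbor
    scores = scores - scores.max()          # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()       # softmax: weights are positive and sum to 1
    return weights, weights @ neighbor_embs

# Toy data (made up): a zero query scores every neighbor equally.
neighbors = np.array([[1.0, 0.0],
                      [0.0, 1.0]])
uniform_weights, uniform_out = attention_aggregate(np.zeros(2), neighbors)
# A query aligned with the first neighbor shifts weight toward it.
skewed_weights, _ = attention_aggregate(np.array([2.0, 0.0]), neighbors)
```

With a zero query the aggregation reduces to a plain mean of the neighbors. A query aligned with one neighbor gives that neighbor the larger weight.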

Collaborative Filtering with Information-Rich and Information-Sparse Entities (2014-03-06)
In this paper, we consider a popular model for collaborative filtering in recommender systems where some users of a website rate some items, such as movies, and the goal is to recover the ratings of some or all of the unrated items of each user. In particular, we consider both the clustering model, where only users (or items) are clustered, and the co-clustering model, where both users and items are clustered, and further, we assume that some users rate many items (information-rich users) and some users rate only a few items (information-sparse users). When users (or items) are clustered, our algorithm can recover the rating matrix with $?(MK \log M)$ noisy entries while $MK$ entries are necessary, where $K$ is the number of clusters and $M$ is the number of items. In the case of co-clustering, we prove that $K^2$ entries are necessary for recovering the rating matrix, and our algorithm achieves this lower bound within a logarithmic factor when $K$ is sufficiently large. We compare our algorithms with a well-known algorithm called alternating minimization (AM) and a similarity score-based algorithm known as the popularity-among-friends (PAF) algorithm by applying all three to the MovieLens and Netflix data sets. Our co-clustering algorithm and AM have similar overall error rates when recovering the rating matrix, both of which are lower than the error rate under PAF. But more importantly, the error rate of our co-clustering algorithm is significantly lower than that of AM and PAF in the scenarios of interest in recommender systems: when recommending a few items to each user or when recommending items to users who only rated a few items (these users are the majority of the total user population). The performance difference increases even more when noise is added to the datasets.

Scalable Realistic Recommendation Datasets through Fractal Expansions (2019-02-20)
Recommender system research currently suffers from a disconnect between the size of academic data sets and the scale of industrial production systems. In order to bridge that gap we propose to generate more massive user/item interaction data sets by expanding pre-existing public data sets. User/item incidence matrices record interactions between users and items on a given platform as a large sparse matrix whose rows correspond to users and whose columns correspond to items. Our technique expands such matrices to larger numbers of rows (users), columns (items) and nonzero values (interactions) while preserving key higher order statistical properties. We adapt the Kronecker Graph Theory to user/item incidence matrices and show that the corresponding fractal expansions preserve the fat-tailed distributions of user engagements, item popularity and singular value spectra of user/item interaction matrices. Preserving such properties is key to building large realistic synthetic data sets which in turn can be employed reliably to benchmark recommender systems and the systems employed to train them. We provide algorithms to produce such expansions and apply them to the MovieLens 20 million data set comprising 20 million ratings of 27K movies by 138K users. The resulting expanded data set has 10 billion ratings, 864K items and 2 million users in its smaller version and can be scaled up or down. A larger version features 655 billion ratings, 7 million items and 17 million users.

Addressing the Item Cold-start Problem by Attribute-driven Active Learning (2018-05-23)
In recommender systems, cold-start issues are situations where no previous events, e.g. ratings, are known for certain users or items. In this paper, we focus on the item cold-start problem. Both content information (e.g.
item attributes) and initial user ratings are valuable for capturing users' preferences on a new item. However, previous methods for the item cold-start problem either 1) incorporate content information into collaborative filtering to perform hybrid recommendation, or 2) actively select users to rate the new item without considering content information and then do collaborative filtering. In this paper, we propose a novel recommendation scheme for the item cold-start problem by leveraging both active learning and items' attribute information. Specifically, we design useful user selection criteria based on items' attributes and users' rating history, and combine the criteria in an optimization framework for selecting users. By exploiting the feedback ratings, users' previous ratings and items' attributes, we then generate accurate rating predictions for the other unselected users. Experimental results on two real-world datasets show the superiority of our proposed method over traditional methods.

A Latent Source Model for Online Collaborative Filtering (2014-10-31)
Despite the prevalence of collaborative filtering in recommendation systems, there has been little theoretical development on why and how well it works, especially in the "online" setting, where items are recommended to users over time. We address this theoretical gap by introducing a model for online recommendation systems, casting item recommendation under the model as a learning problem, and analyzing the performance of a cosine-similarity collaborative filtering method. In our model, each of $n$ users either likes or dislikes each of $m$ items. We assume there to be $k$ types of users, and all the users of a given type share a common string of probabilities determining the chance of liking each item.
At each time step, we recommend an item to each user, where a key distinction from related bandit literature is that once a user consumes an item (e.g., watches a movie), then that item cannot be recommended to the same user again. The goal is to maximize the number of likable items recommended to users over time. Our main result establishes that after nearly $\log(km)$ initial learning time steps, a simple collaborative filtering algorithm achieves essentially optimal performance without knowing $k$. The algorithm has an exploitation step that uses cosine similarity and two types of exploration steps, one to explore the space of items (standard in the literature) and the other to explore similarity between users (novel to this work).

Sparse-Interest Network for Sequential Recommendation (2021-02-18)
Recent methods in sequential recommendation focus on learning an overall embedding vector from a user's behavior sequence for the next-item recommendation. However, from empirical analysis, we discovered that a user's behavior sequence often contains multiple conceptually distinct items, while a unified embedding vector is primarily affected by one's most recent frequent actions. Thus, it may fail to infer the next preferred item if conceptually similar items are not dominant in recent interactions. To this end, an alternative solution is to represent each user with multiple embedding vectors encoding different aspects of the user's intentions. Nevertheless, recent work on multi-interest embedding usually considers a small number of concepts discovered via clustering, which may not be comparable to the large pool of item categories in real systems. It is a non-trivial task to effectively model a large number of diverse conceptual prototypes, as items are often not conceptually well clustered in fine granularity. Besides, an individual usually interacts with only a sparse set of concepts.
In light of this, we propose a novel Sparse Interest NEtwork (SINE) for sequential recommendation. Our sparse-interest module can adaptively infer a sparse set of concepts for each user from the large concept pool and output multiple embeddings accordingly. Given multiple interest embeddings, we develop an interest aggregation module to actively predict the user's current intention and then use it to explicitly model multiple interests for next-item prediction. Empirical results on several public benchmark datasets and one large-scale industrial dataset demonstrate that SINE can achieve substantial improvement over state-of-the-art methods.

Neural Graph Collaborative Filtering (2020-07-03)
Learning vector representations (a.k.a. embeddings) of users and items lies at the core of modern recommender systems. Ranging from early matrix factorization to recently emerged deep learning based methods, existing efforts typically obtain a user's (or an item's) embedding by mapping from pre-existing features that describe the user (or the item), such as ID and attributes. We argue that an inherent drawback of such methods is that the collaborative signal, which is latent in user-item interactions, is not encoded in the embedding process. As such, the resultant embeddings may not be sufficient to capture the collaborative filtering effect. In this work, we propose to integrate the user-item interactions, more specifically the bipartite graph structure, into the embedding process. We develop a new recommendation framework, Neural Graph Collaborative Filtering (NGCF), which exploits the user-item graph structure by propagating embeddings on it. This leads to the expressive modeling of high-order connectivity in the user-item graph, effectively injecting the collaborative signal into the embedding process in an explicit manner.
We conduct extensive experiments on three public benchmarks, demonstrating significant improvements over several state-of-the-art models like HOP-Rec and Collaborative Memory Network. Further analysis verifies the importance of embedding propagation for learning better user and item representations, justifying the rationality and effectiveness of NGCF. Codes are available at https://github.com/xiangwang1223/neural_graph_collaborative_filtering.

Personalized Adaptive Meta Learning for Cold-start User Preference Prediction (2020-12-22)
A common challenge in personalized user preference prediction is the cold-start problem. Due to the lack of user-item interactions, directly learning from the new users' log data causes a serious over-fitting problem. Recently, many existing studies regard the cold-start personalized preference prediction as a few-shot learning problem, where each user is the task and recommended items are the classes, and the gradient-based meta learning method (MAML) is leveraged to address this challenge. However, in real-world applications, the users are not uniformly distributed (i.e., different users may have different browsing histories, recommended items, and user profiles). We define the major users as the users in the groups with large numbers of users sharing similar user information, and the other users as the minor users. Existing MAML approaches tend to fit the major users and ignore the minor users. To address this cold-start task-overfitting problem, we propose a novel personalized adaptive meta learning approach to consider both the major and the minor users with three key contributions: 1) We are the first to present a personalized adaptive learning rate meta-learning approach to improve the performance of MAML by focusing on both the major and minor users.
2) To provide better personalized learning rates for each user, we introduce a similarity-based method to find similar users as a reference and a tree-based method to store users' features for fast search. 3) To reduce the memory usage, we design a memory agnostic regularizer to further reduce the space complexity to constant while maintaining the performance. Experiments on MovieLens, BookCrossing, and real-world production datasets reveal that our method outperforms the state-of-the-art methods dramatically for both the minor and major users.

A Markov Decision Process Analysis of the Cold Start Problem in Bayesian Information Filtering (2014-10-28)
We consider the information filtering problem, in which we face a stream of items, and must decide which ones to forward to a user to maximize the number of relevant items shown, minus a penalty for each irrelevant item shown. Forwarding decisions are made separately in a personalized way for each user. We focus on the cold-start setting for this problem, in which we have limited historical data on the user's preferences, and must rely on feedback from forwarded articles to learn the fraction of items relevant to the user in each of several item categories. Performing well in this setting requires trading exploration vs. exploitation, forwarding items that are likely to be irrelevant, to allow learning that will improve later performance. In a Bayesian setting, and using Markov decision processes, we show how the Bayes-optimal forwarding algorithm can be computed efficiently when the user will examine each forwarded article, and how an upper bound on the Bayes-optimal procedure and a heuristic index policy can be obtained for the setting when the user will examine only a limited number of forwarded items. We present results from simulation experiments using parameters estimated using historical data from arXiv.org.
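The exploration/exploitation tension in the information filtering abstract above can be seen in a small sketch. It assumes a Beta prior over the fraction of relevant items in one category and a reward of +1 per relevant item and minus `penalty` per irrelevant item. It implements only a myopic (greedy) forwarding rule, which is not the Bayes-optimal policy the abstract refers to; the class `CategoryBelief`, the function `myopic_forward`, and the uniform prior are all hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class CategoryBelief:
    """Assumed Beta(alpha, beta) belief over the fraction of relevant items."""
    alpha: float = 1.0  # prior pseudo-count of relevant items
    beta: float = 1.0   # prior pseudo-count of irrelevant items

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, relevant: bool) -> None:
        # Conjugate Beta-Bernoulli update from one piece of user feedback.
        if relevant:
            self.alpha += 1.0
        else:
            self.beta += 1.0

def myopic_forward(belief: CategoryBelief, penalty: float) -> bool:
    """Forward only if the expected immediate reward is non-negative."""
    p = belief.mean()
    return p - penalty * (1.0 - p) >= 0.0

belief = CategoryBelief()
decision_before = myopic_forward(belief, penalty=1.0)  # posterior mean is 1/2
belief.update(relevant=False)                           # one irrelevant item seen
decision_after = myopic_forward(belief, penalty=1.0)   # posterior mean is 1/3
```

After a single negative observation the greedy rule stops forwarding, so it never learns more about the category. That is the behavior an exploration-aware policy is meant to avoid.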

Freudian and Newtonian Recurrent Cell for Sequential Recommendation (2021-02-11)
A sequential recommender system aims to recommend attractive items to users based on behaviour patterns. The predominant sequential recommendation models are based on natural language processing models, such as the gated recurrent unit, that embed items in some defined space and grasp the user's long-term and short-term preferences based on the item embeddings. However, these approaches lack fundamental insight into how such models are related to the user's inherent decision-making process. To provide this insight, we propose a novel recurrent cell, namely FaNC, from Freudian and Newtonian perspectives. FaNC divides the user's state into conscious and unconscious states, and the user's decision process is modelled by Freud's two principles: the pleasure principle and reality principle. To model the pleasure principle, i.e., free-floating user's instinct, we place the user's unconscious state and item embeddings in the same latent space and subject them to Newton's law of gravitation. Moreover, to recommend items to users, we model the reality principle, i.e., balancing the conscious and unconscious states, via a gating function. Based on extensive experiments on various benchmark datasets, this paper provides insight into the characteristics of the proposed model. FaNC initiates a new direction of sequential recommendations at the convergence of psychoanalysis and recommender systems.

Regret in Online Recommendation Systems (2020-10-23)
This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time. In each round, a user, randomly picked from a population of $m$ users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of $n$ items. Importantly, an item cannot be recommended twice to the same user.
The probabilities that a user likes each item are unknown. The performance of the recommendation algorithm is captured through its regret, considering as a reference an Oracle algorithm aware of these probabilities. We investigate various structural assumptions on these probabilities: we derive for each structure regret lower bounds, and devise algorithms achieving these limits. Interestingly, our analysis reveals the relative weights of the different components of regret: the component due to the constraint of not presenting the same item twice to the same user, that due to learning the chances users like items, and finally that arising when learning the underlying structure.
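The regret notion in the abstract above can be made concrete with a deliberately simplified sketch: a single user, known like-probabilities, and a fixed recommendation order, compared against an oracle that recommends items in decreasing order of probability. The function name `expected_regret` and the toy probabilities are assumptions; the paper's setting with $m$ randomly arriving users and unknown probabilities is not modeled here.

```python
def expected_regret(like_probs, policy_order, rounds):
    """Hypothetical helper: expected regret for one user over `rounds` rounds.

    like_probs:    probability that the user likes each item
    policy_order:  item indices in the order the policy recommends them
    """
    chosen = policy_order[:rounds]
    if len(set(chosen)) != len(chosen):
        raise ValueError("an item cannot be recommended twice to the same user")
    # The oracle knows the probabilities and recommends the best items first.
    oracle_order = sorted(range(len(like_probs)),
                          key=lambda i: like_probs[i], reverse=True)
    oracle_value = sum(like_probs[i] for i in oracle_order[:rounds])
    policy_value = sum(like_probs[i] for i in chosen)
    return oracle_value - policy_value

like_probs = [0.9, 0.5, 0.1]   # made-up probabilities
worst_first = [2, 1, 0]        # a policy that recommends the least liked item first
regret_one_round = expected_regret(like_probs, worst_first, rounds=1)
regret_all_rounds = expected_regret(like_probs, worst_first, rounds=3)
```

Because no item can be repeated, a policy that eventually shows every item collects the same total as the oracle in this toy setting. Its regret comes entirely from the order in which items are shown over a shorter horizon.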