1. Introduction
The digital economy is fundamentally underpinned by the ability to filter, rank and present information. Recommender Systems (RS) have evolved from auxiliary features into the central nervous system of modern internet platforms, driving user engagement, revenue and content discovery across e-commerce, streaming media, social networking and increasingly, critical sectors such as healthcare and finance. This report presents an exhaustive systematic review of the field from 2017 to 2025, a period marked by a paradigm shift from traditional Matrix Factorization (MF) techniques to sophisticated Deep Learning (DL) architectures and most recently, the transformative integration of Large Language Models (LLMs) and Generative AI.
The trajectory of recommender systems research reflects a broader trend in artificial intelligence: the transition from specialized, task-specific models to general-purpose, semantic-aware architectures. In the early 2010s, the field was dominated by Collaborative Filtering (CF) and Latent Factor Models, which excelled at capturing user-item interactions but struggled with data sparsity and “cold start” problems. The advent of Deep Learning introduced non-linear modeling capacity, enabling the processing of massive, heterogeneous datasets. By 2024 and 2025, the focus has shifted again towards “Generative Recommendation,” where agents powered by LLMs do not merely rank items but engage in multi-turn conversations, reason about user intent and generate explanations for their choices.1
This review synthesizes insights from over 80 recent studies, technical reports and industrial white papers. It addresses the “translation of theoretical advancements into practical solutions,” a critical gap often observed between academic research and industrial deployment.3 While academia often prioritizes incremental improvements in metrics like Normalized Discounted Cumulative Gain (NDCG), industry practitioners grapple with system-level challenges such as latency, scalability and the normative complexities of fairness and bias.4 Furthermore, we examine the specialized requirements of high-stakes domains like healthcare, where the cost of a bad recommendation is measured not in lost clicks, but in adverse health outcomes.6
2. Methodology of Systematic Reviews in Recommender Systems
To understand the state of the art, it is essential to first establish the rigorous frameworks used to curate and analyze the exploding volume of literature. The proliferation of publications across venues like ACM RecSys, SIGIR, KDD and arXiv necessitates disciplined filtering to separate noise from signal.
2.1 The PRISMA Framework and Standards of Rigor
Rigorous systematic reviews in this domain increasingly adhere to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The PRISMA 2020 statement provides a 27-item checklist designed to ensure transparency, reproducibility and comprehensiveness.7 This framework compels researchers to explicitly document their search strategies, including the databases queried (e.g., Scopus, Web of Science, Google Scholar, IEEE Xplore) and the specific Boolean query strings employed (e.g., (“recommender system” OR “recommendation engine”) AND (“deep learning” OR “LLM” OR “reinforcement learning”)).7
The selection process typically involves a multi-stage filtering mechanism:
- Identification: Gathering raw records from databases.
- Screening: Removing duplicates and applying exclusion criteria (e.g., non-English texts, papers published before 2017 or short workshop papers lacking empirical validation).3
- Eligibility: Full-text review to ensure the study addresses specific research questions (RQs).
- Inclusion: The final set of papers is analyzed for data extraction.
Data extraction in modern RS reviews goes beyond simple bibliographic details. Systematic reviews now extract structured data regarding algorithm classes (e.g., CNN, RNN, Transformer, GNN), specific datasets utilized (e.g., MovieLens, Amazon, MIMIC-III for healthcare), evaluation metrics (e.g., RMSE, Precision@K, Fairness metrics) and the hardware infrastructure used for experiments.3
2.2 Core Research Questions (RQs) Driving the Field (2024-2025)
An analysis of recent surveys reveals a convergence on four primary Research Questions that reflect the maturing priorities of the field 9:
- RQ1: Architectural Evolution. How have RS algorithms evolved from traditional matrix factorization to deep learning, graph-based models and generative AI? This question seeks to trace the lineage of technical innovation and the obsolescence of older methods.10
- RQ2: The Theory-Practice Gap. What strategies facilitate the translation of theoretical advancements (e.g., complex GNNs) into practical, scalable industrial applications? This addresses the “deployment gap” where state-of-the-art academic models often fail in production due to latency or cost.3
- RQ3: Evaluation and Normative Standards. Are existing evaluation metrics adequate for measuring user satisfaction, fairness and trust? This RQ highlights the shift from accuracy-only metrics to multi-objective optimization including diversity and serendipity.4
- RQ4: Domain Specificity. How do challenges and solutions differ across sectors such as e-commerce, healthcare, finance and education? This acknowledges that a “one-size-fits-all” algorithm is no longer viable.3
3. The Industrial Standard: Deep Learning Architectures
While academic literature explores a vast array of experimental models, industrial recommender systems—those powering platforms like Amazon, Netflix and TikTok—have converged on specific architectural patterns. These patterns are dictated by the need to balance prediction accuracy with the intense computational demands of serving millions of queries per second (QPS) within strict latency budgets, typically tens of milliseconds.
3.1 The Retrieval and Ranking Funnel
A central theme in industrial RS is the “Funnel Architecture.” It is computationally infeasible to score every item in a catalog of millions (or billions) for every user request using a complex model. Therefore, practical systems invariably employ a multi-stage pipeline 11:
- Retrieval (Candidate Generation): This stage utilizes fast, lightweight algorithms to select a small subset (e.g., hundreds or thousands) of potentially relevant items from the massive catalog. The goal is high recall—ensuring the “correct” items are in the candidate set.12
- Ranking (Scoring): The retrieved candidates are processed by a heavy, complex model that scores them with high precision. The goal here is to order the items optimally based on the predicted probability of engagement (click, purchase, watch).
- Re-ranking: The final list is adjusted to satisfy business logic, such as removing duplicates, ensuring diversity (not showing 10 shoes in a row), enforcing fairness constraints or boosting fresh content.11
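To make the funnel concrete, the sketch below wires the three stages together. It is a minimal illustration, not a production design: `user_encoder`, `ann_index`, `ranker` and `rerank` are hypothetical components standing in for whatever retrieval model, ANN index, scoring model and business-rules layer a given platform uses.

```python
def recommend(user_features, user_encoder, ann_index, ranker, rerank,
              k_retrieve=500, k_final=20):
    # Stage 1 - Retrieval: cheap ANN lookup over the full catalog (optimized for recall).
    user_vec = user_encoder(user_features)
    candidates = ann_index.query(user_vec, top_k=k_retrieve)

    # Stage 2 - Ranking: a heavier model scores only the retrieved candidates (optimized for precision).
    scores = ranker.score(user_features, candidates)
    ordered = [item for _, item in sorted(zip(scores, candidates), reverse=True)]

    # Stage 3 - Re-ranking: business logic such as de-duplication, diversity and freshness boosts.
    return rerank(ordered)[:k_final]
```

The key design point is that the expensive model never sees the full catalog; it only scores the few hundred candidates that survive retrieval.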
3.2 Two-Tower Architectures: The Engine of Retrieval
For the retrieval stage, the Two-Tower model (also known as the Dual Encoder) has emerged as the pervasive industry standard.13
3.2.1 Mechanism and Architecture
The Two-Tower architecture is defined by its separation of user and item processing into two distinct neural networks (towers) that do not interact until the final layer.
- User Tower: This network takes user features as input. These features may include stable attributes (User ID, demographics, geography) and dynamic features (sequence of recent clicks, search queries). The tower processes these through embedding layers and dense layers (MLP) to output a fixed-size vector embedding, $u$.13
- Item Tower: Similarly, this network processes item features (Item ID, category, description text, image embeddings) to output an item embedding, $v$.
- Similarity Computation: The relevance score is calculated as the dot product or cosine similarity between the user and item embeddings: $s(u, v) = u^T v$.15
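The following PyTorch sketch shows the dual-encoder structure described above. Feature choices, layer sizes and the shared `Tower` class are illustrative assumptions, not taken from any particular production system.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """Maps an ID plus dense side features to a fixed-size embedding."""
    def __init__(self, num_ids, feat_dim, emb_dim=64):
        super().__init__()
        self.id_emb = nn.Embedding(num_ids, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim + feat_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )

    def forward(self, ids, feats):
        x = torch.cat([self.id_emb(ids), feats], dim=-1)
        # Unit-normalized output so the dot product equals cosine similarity.
        return nn.functional.normalize(self.mlp(x), dim=-1)

class TwoTower(nn.Module):
    def __init__(self, num_users, num_items, user_feat_dim, item_feat_dim):
        super().__init__()
        self.user_tower = Tower(num_users, user_feat_dim)
        self.item_tower = Tower(num_items, item_feat_dim)

    def forward(self, user_ids, user_feats, item_ids, item_feats):
        u = self.user_tower(user_ids, user_feats)   # user embedding u
        v = self.item_tower(item_ids, item_feats)   # item embedding v
        return (u * v).sum(dim=-1)                  # s(u, v) = u^T v
```

Because the two towers never exchange information before the final similarity, the item tower can be run entirely offline, which is what enables the ANN-based serving described next.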
3.2.2 Scalability and Inference
The defining advantage of the Two-Tower model is its compatibility with Approximate Nearest Neighbor (ANN) search. Because the item tower does not depend on user input, item embeddings can be pre-computed and indexed for fast similarity search using ANN libraries and index structures (e.g., FAISS, HNSW).14 During online inference, the system only needs to compute the user embedding in real-time and query the vector index to retrieve the $k$ nearest items. This reduces the time complexity from $O(N)$ (scanning the whole catalog) to $O(\log N)$ or better, making it feasible to handle catalogs of billions of items.12
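A minimal sketch of this offline/online split using FAISS (one of the libraries mentioned above). The random arrays are placeholders for real tower outputs, and the flat inner-product index is the simplest choice; at billion-item scale one would swap in an approximate index such as HNSW or IVF.

```python
import numpy as np
import faiss

emb_dim, num_items = 64, 1_000_000

# Offline: encode every item with the item tower and build the index once.
item_embeddings = np.random.rand(num_items, emb_dim).astype("float32")  # placeholder for item-tower output
faiss.normalize_L2(item_embeddings)
index = faiss.IndexFlatIP(emb_dim)   # exact inner-product search; use an HNSW/IVF index for true ANN at scale
index.add(item_embeddings)

# Online: only the user embedding is computed per request, followed by a k-NN query.
user_embedding = np.random.rand(1, emb_dim).astype("float32")            # placeholder for user-tower output
faiss.normalize_L2(user_embedding)
scores, item_ids = index.search(user_embedding, 100)                     # top-100 candidates for ranking
```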
3.2.3 Addressing Cold Start
Unlike traditional Collaborative Filtering, which relies solely on historical ID interactions, Two-Tower models can naturally incorporate content features (side information). If a new item is added, the Item Tower can generate an embedding based on its text description or metadata immediately, allowing it to be recommended before it has received any user interactions. This significantly mitigates the Item Cold Start problem.16
3.3 Deep Learning Recommendation Model (DLRM): The Standard for Ranking
For the ranking stage, where the number of candidates is small (e.g., 500) and precision is paramount, Meta’s Deep Learning Recommendation Model (DLRM) has become a reference architecture, widely adopted and optimized across the industry.17
3.3.1 Architectural Components
DLRM is explicitly designed to handle the mix of categorical (sparse) and numerical (dense) features typical in Click-Through Rate (CTR) prediction tasks.
- Sparse Features & Embeddings: Categorical inputs (e.g., User ID, City, Device Type) are high-cardinality and sparse. DLRM maps these to dense vectors using massive embedding tables. These tables can be enormous (terabytes in size), requiring model-parallelism distributed across multiple GPUs/TPUs.18
- Dense Features & Bottom MLP: Numerical inputs (e.g., age, time since last visit) are processed by a “Bottom MLP” (Multilayer Perceptron) to transform them into the same vector dimension as the embeddings.
- Feature Interaction Layer: This is the core innovation. Instead of simply concatenating all features (which ignores how they relate), DLRM explicitly computes the dot product between all pairs of embedding vectors and processed dense features. This captures second-order interactions (e.g., the correlation between “User Location” and “Item Category”) efficiently.17
- Top MLP: The output of the interaction layer is concatenated with the processed dense features and fed into a “Top MLP,” which outputs the final prediction probability (e.g., probability of a click).19
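The sketch below illustrates the interaction layer described above: sparse embeddings and the bottom-MLP dense vector are stacked, all pairwise dot products are computed, and the result feeds the top MLP. It follows the published DLRM design in spirit, but cardinalities, dimensions and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class DLRMSketch(nn.Module):
    """Simplified DLRM: embeddings + bottom MLP + pairwise dot-product interactions + top MLP."""
    def __init__(self, cardinalities, num_dense, emb_dim=16):
        super().__init__()
        self.embeddings = nn.ModuleList([nn.Embedding(c, emb_dim) for c in cardinalities])
        self.bottom_mlp = nn.Sequential(nn.Linear(num_dense, emb_dim), nn.ReLU())
        n = len(cardinalities) + 1                       # sparse embeddings + processed dense vector
        num_pairs = n * (n - 1) // 2                     # number of second-order interactions
        self.top_mlp = nn.Sequential(nn.Linear(num_pairs + emb_dim, 64), nn.ReLU(),
                                     nn.Linear(64, 1))

    def forward(self, sparse_ids, dense_feats):
        dense_vec = self.bottom_mlp(dense_feats)                          # (B, emb_dim)
        vecs = [emb(sparse_ids[:, i]) for i, emb in enumerate(self.embeddings)]
        T = torch.stack(vecs + [dense_vec], dim=1)                        # (B, n, emb_dim)
        Z = torch.bmm(T, T.transpose(1, 2))                               # all pairwise dot products
        i, j = torch.triu_indices(T.size(1), T.size(1), offset=1)
        interactions = Z[:, i, j]                                         # keep each pair once
        x = torch.cat([interactions, dense_vec], dim=1)
        return torch.sigmoid(self.top_mlp(x))                             # predicted click probability
```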
3.3.2 Variants: Wide & Deep and DeepFM
DLRM is part of a family of ranking architectures.
- Wide & Deep: Developed by Google, this model combines a “Wide” linear model (which memorizes specific feature co-occurrences, e.g., “User bought milk AND cookies”) with a “Deep” neural network (which generalizes to unseen combinations). This architecture balances the memorization power of linear models with the generalization power of deep learning.20
- DeepFM: This architecture replaces the “Wide” component with a Factorization Machine (FM), which automatically learns feature interactions without manual feature engineering, further automating the ranking process.19
3.4 Graph Neural Networks (GNNs): Capturing Structural Signals
While Two-Tower and DLRM models excel at processing tabular and sequence data, they often treat user-item interactions as independent pairs. Graph Neural Networks (GNNs) have gained traction for their ability to model the high-order connectivity inherent in interaction graphs.21
3.4.1 High-Order Connectivity
In a user-item bipartite graph, a GNN can propagate information beyond direct neighbors. A user node aggregates information not just from items they interacted with (1-hop), but also from items interacted with by similar users (2-hop and 3-hop neighbors). This allows the system to capture collaborative signals that are missed by simpler models.21 Algorithms like LightGCN simplify standard GCNs by removing feature transformations and non-linear activations, showing that the primary benefit comes from the graph propagation mechanism itself.
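A compact sketch of this LightGCN-style propagation, assuming a pre-built symmetrically normalized adjacency matrix over the combined user/item node set. There are no learned transforms or non-linearities inside the loop; each iteration simply mixes in one more hop of collaborative signal, and the layer outputs are averaged.

```python
import torch

def lightgcn_propagate(norm_adj, user_emb, item_emb, num_layers=3):
    """Propagate embeddings over the normalized user-item graph (LightGCN-style sketch).

    norm_adj: sparse (U+I) x (U+I) normalized adjacency; user_emb/item_emb: layer-0 embeddings.
    """
    e = torch.cat([user_emb, item_emb], dim=0)          # layer-0 embeddings for all nodes
    layer_outputs = [e]
    for _ in range(num_layers):
        e = torch.sparse.mm(norm_adj, e)                # aggregate one more hop of neighbors
        layer_outputs.append(e)
    final = torch.stack(layer_outputs, dim=0).mean(0)   # combine layers by simple averaging
    return final.split([user_emb.size(0), item_emb.size(0)], dim=0)
```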
3.4.2 Graph Foundation Models (GFMs)
An emerging trend in 2024–2025 is the fusion of GNNs with Large Language Models to create Graph Foundation Models (GFMs). Traditional GNNs suffer from “structural bias” (struggling with nodes that have few edges) and lack semantic understanding of node content. GFMs address this by using LLMs to encode the textual content of nodes (users/items) and GNNs to model the structural relationships. This hybrid approach combines the reasoning power of LLMs with the structural awareness of GNNs.21 For instance, frameworks like LLMGR inject GNN-learned structural embeddings into the token sequence of an LLM, fine-tuning the model to understand both text and graph context.23
4. The Generative Revolution: Large Language Models (LLMs) in RS
The integration of Large Language Models (LLMs) represents the most significant disruption in Recommender Systems since the advent of deep learning. This shift, often termed LLM4Rec or GenRec, moves beyond predicting IDs to understanding and generating content.24 Traditional RS suffer from a lack of semantic understanding; an ID-based model knows that User 123 liked Item 456, but it doesn’t know why. LLMs, pre-trained on vast textual corpora, bring “world knowledge” and reasoning capabilities to the table.1
4.1 Taxonomy of LLM-Enhanced RS
Recent systematic reviews classify LLM integration into three primary paradigms 1:
4.1.1 Discriminative LLM for Recommendation (DLLM4Rec)
In this paradigm, the LLM is used as a powerful feature encoder within a traditional recommendation pipeline.
- Mechanism: The LLM processes textual data (item descriptions, user reviews) to generate high-quality dense vector representations (embeddings). These embeddings are then fed into standard ranking models like DLRM or Two-Tower networks.
- Benefit: This approach significantly improves the handling of textual side information and enhances performance in cold-start scenarios where interaction data is sparse but text data is available.25
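As a small illustration of the encoder role, the snippet below embeds item descriptions with a pretrained text-embedding model (the `sentence-transformers` library and the `all-MiniLM-L6-v2` checkpoint are illustrative choices, not prescribed by the surveyed work). The resulting vectors can feed an item tower or ranking model immediately, which is precisely why this paradigm helps with item cold start.

```python
from sentence_transformers import SentenceTransformer

# Illustrative pretrained text encoder; any LLM-based embedding model could play this role.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

item_descriptions = [
    "Wireless noise-cancelling over-ear headphones with 30-hour battery life",
    "Lightweight trail-running shoes with a waterproof membrane",
]

# Dense semantic embeddings, available even for items with zero interactions.
item_text_embeddings = encoder.encode(item_descriptions, normalize_embeddings=True)
print(item_text_embeddings.shape)  # (2, 384) for this particular encoder
```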
4.1.2 Generative LLM for Recommendation (GLLM4Rec)
Here, the LLM is used to directly generate recommendations. The recommendation task is formulated as a language generation problem.
- Mechanism: The system constructs a prompt, such as: “User History: [Matrix, Inception, Interstellar]. Recommend a next movie.” The LLM generates the title of the recommended item as text.
- Advantage: This allows for zero-shot recommendation capabilities, leveraging the LLM’s pre-trained knowledge of item relationships. It enables the system to provide explanations (“I recommended ‘Tenet’ because you enjoy complex time-travel plots…”) naturally.2
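A minimal sketch of how such a prompt might be assembled. The helper name and the optional candidate list are assumptions for illustration; `llm.generate` is a placeholder for whichever LLM interface is used.

```python
def build_rec_prompt(history, candidates=None):
    """Frame next-item recommendation as a text-generation task."""
    prompt = (
        "You are a movie recommender.\n"
        f"The user has recently watched: {', '.join(history)}.\n"
    )
    if candidates:
        # Constraining generation to a retrieved candidate list reduces hallucinated titles.
        prompt += f"Choose the best next movie from: {', '.join(candidates)}.\n"
    else:
        prompt += "Recommend one movie to watch next.\n"
    prompt += "Answer with the title and a one-sentence explanation."
    return prompt

prompt = build_rec_prompt(["The Matrix", "Inception", "Interstellar"])
# response = llm.generate(prompt)   # placeholder call to the chosen LLM backend
```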
4.1.3 Hybrid and Interactive Paradigms
These approaches move RS from a passive prediction tool to an active Conversational Recommender System (CRS). The system engages in multi-turn dialogues to elicit user preferences.
- Example: If a user asks for “a good shoe,” the system might ask, “Are you looking for running shoes or formal wear?” before making a suggestion. This mimics the interaction with a human sales assistant.1
4.2 LLM-Powered Agents: The Next Frontier
A sophisticated evolution of the generative paradigm is the LLM-Based Agent. These agents simulate a human-like recommender and typically possess a modular architecture 1:
- Profile Module: This module constructs and maintains a comprehensive user persona. It synthesizes interaction history into high-level traits (e.g., “User is a sci-fi enthusiast who prefers complex plots and dislikes horror”).
- Memory Module: Agents require both short-term memory (for the current conversation context) and long-term memory (to recall preferences across sessions). This allows for systematic experience accumulation.1
- Planning Module: This module formulates strategies. For complex requests (e.g., “Plan a travel itinerary”), the agent decomposes the task into sub-tasks (search flights, book hotels, suggest restaurants) and optimizes for multiple objectives.
- Action Module: The agent executes decisions by interacting with the environment. This includes querying a vector database, calling an external API to check item availability or generating the final response to the user.1
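A rough class skeleton mapping the four modules above to code. Everything here is illustrative: `llm`, the tool registry and the method bodies are placeholders meant only to show how profile, memory, planning and action fit together in one agent loop.

```python
from dataclasses import dataclass, field

@dataclass
class RecAgent:
    """Illustrative skeleton of an LLM-based recommendation agent."""
    llm: object                                        # any text-generation backend with .generate()
    tools: dict = field(default_factory=dict)          # Action module: e.g. {"search": vector_db_query, ...}
    profile: str = ""                                  # Profile module: persona synthesized from history
    short_term: list = field(default_factory=list)     # Memory module: current conversation turns
    long_term: list = field(default_factory=list)      # Memory module: preferences across sessions

    def plan(self, request):
        # Planning module: ask the LLM to decompose the request into sub-tasks.
        return self.llm.generate(f"Decompose into numbered steps: {request}").splitlines()

    def act(self, step):
        # Action module: route a step to a registered tool, or fall back to text generation.
        tool = self.tools.get(step.split()[0].lower()) if step else None
        return tool(step) if tool else self.llm.generate(step)

    def respond(self, request):
        self.short_term.append(request)
        findings = [self.act(step) for step in self.plan(request)]
        return self.llm.generate(
            f"Profile: {self.profile}\nFindings: {findings}\nWrite the recommendation for: {request}"
        )
```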
Table 1: Comparison of Traditional vs. LLM-Based Recommender Systems
| Feature | Traditional RS (CF/DL) | LLM-Based RS (GenRec) |
| --- | --- | --- |
| Input Data | User/Item IDs, Interaction Matrix | Natural Language, Text, Prompts |
| Core Mechanism | Matrix Factorization, Dot Product | Causal Language Modeling, Reasoning |
| Output | Ranked List of IDs | Textual Items, Explanations, Conversations |
| Cold Start | Poor performance (requires history) | Strong performance (uses semantic content) |
| Explainability | Low (Black box) | High (Natural language rationale) |
| Latency | Low (<100 ms) | High (autoregressive generation can take seconds) |
4.3 Techniques: Tuning and RAG
To adapt general-purpose LLMs for the specific requirements of recommendation, several techniques are employed:
- Fine-Tuning: This involves updating the model weights on recommendation datasets. Interaction logs are converted into instruction-tuning formats (Input: “User history: A, B, C.” Output: “D”). This aligns the LLM’s generation probability with the user’s preference distribution.24
- Prompt Tuning / In-Context Learning (ICL): Instead of updating weights, researchers design prompts that include few-shot examples of successful recommendations. This allows the model to “learn” the recommendation task dynamically within the context window.29
- Retrieval-Augmented Generation (RAG): To solve the “hallucination” problem (recommending non-existent items) and the “freshness” problem (LLMs having outdated knowledge), RAG frameworks retrieve relevant item documents from an external knowledge base and insert them into the LLM’s context window. This grounds the generation in factual, up-to-date data.28
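A minimal sketch of the RAG loop just described. The `retriever` and `llm` objects are placeholders for a vector store and an LLM backend; the point is the three-step pattern of retrieving current catalog documents, injecting them into the prompt and generating a grounded answer.

```python
def rag_recommend(query, retriever, llm, k=5):
    """Ground LLM recommendations in an up-to-date item catalog to curb hallucination."""
    # 1. Retrieve: fetch the k most relevant item documents (title, description, availability).
    docs = retriever.search(query, top_k=k)
    catalog_context = "\n".join(f"- {d['title']}: {d['description']}" for d in docs)

    # 2. Augment: insert the retrieved snippets so the model can only recommend items
    #    that actually exist (and are current) in the catalog.
    prompt = (
        f"Catalog items:\n{catalog_context}\n\n"
        f"User request: {query}\n"
        "Recommend the single best item FROM THE CATALOG ABOVE and explain briefly."
    )

    # 3. Generate: the answer is grounded in the retrieved, up-to-date data.
    return llm.generate(prompt)
```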
5. Specialized Domains: Healthcare and Drug Discovery
Healthcare Recommender Systems (HRS) represent a “high-stakes” domain where the cost of error is not merely user dissatisfaction, but potential harm to patient health. Consequently, the architectural choices and evaluation criteria here differ significantly from media or e-commerce.6
5.1 Unique Challenges in HRS
- Data Sparsity and Heterogeneity: Unlike e-commerce users who may generate thousands of clicks, patients typically have sparse interaction histories (e.g., a few hospital visits per year). Data is also highly heterogeneous, scattered across Electronic Health Records (EHRs), medical imaging, wearable device streams and genomic databases.31
- Trust and Explainability: A “black box” recommendation (e.g., “Take Drug X”) is unacceptable in clinical settings. Systems must provide reasoning grounded in medical literature or patient history. “Explainable AI” (XAI) is a mandatory requirement, not an optional feature.6
- Privacy and Regulation: Adherence to strict regulations like HIPAA (USA) and GDPR (Europe) restricts the use of centralized model training. This has driven the adoption of Federated Learning, where models are trained locally on patient devices or hospital servers without sharing raw data, aggregating only model updates.6
5.2 Drug Recommendation Systems
Recent case studies from 2024–2025 highlight the sophistication of AI in personalized medication management:
- Sequence and Graph Modeling: Drug recommendation is often modeled as a sequence prediction task (predicting the next medication based on a sequence of diagnosis codes) or a graph link prediction task (linking patient nodes to drug nodes in a medical knowledge graph).
- Diabetes Management Case Study: Research has focused on HRS for patients with diabetes using collaborative filtering and clustering. By analyzing data from devices like Continuous Glucose Monitors (CGMs), systems can identify patient clusters with similar metabolic profiles. The system then recommends insulin adjustments or lifestyle changes that have proven effective for “neighboring” patients in the same cluster.32
- Medical Emergency Case Study: In emergency room settings, time is critical. Systems have been developed using stacked Artificial Neural Networks (ANNs) to recommend drugs by rapidly analyzing vital signs and initial diagnostic codes. These systems also act as a safety layer, checking for potential Adverse Drug Reactions (ADRs) before a physician administers medication.6
5.3 Mental Health and Therapy
The domain of mental health has seen the rise of Personalized Therapy Recommender Systems:
- Intervention Recommendation: These systems recommend specific therapeutic activities (e.g., Cognitive Behavioral Therapy exercises, breathing techniques) based on real-time mood tracking from mobile apps. They utilize context-aware filtering to suggest interventions that match the user’s current emotional state and location.34
- Hybrid Neuro-Symbolic Models: To ensure safety, modern systems combine deep learning (for pattern recognition in voice or text logs) with symbolic reasoning (using medical knowledge graphs). This ensures that recommendations strictly adhere to clinical guidelines and do not suggest harmful or inappropriate actions.36
6. Reinforcement Learning: Optimizing for the Long Term
While supervised learning models (like DLRM) excel at predicting immediate feedback (CTR), they are often “myopic,” failing to optimize for long-term user satisfaction. Reinforcement Learning (RL) addresses this by modeling recommendation as a sequential decision-making process.37
6.1 The MDP Formulation
In an RL-based RS, the problem is formulated as a Markov Decision Process (MDP):
- State ($S_t$): The user’s current context, including interaction history and demographic profile.
- Action ($A_t$): The item (or list of items) recommended by the system.
- Reward ($R_t$): The user’s feedback (click, rating, dwell time).
- Policy ($\pi$): The function that maps states to actions to maximize the cumulative reward over time (e.g., maximizing session duration or Customer Lifetime Value).37
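To make the MDP concrete, the sketch below implements a tabular Q-learning update for this formulation, with states as recent-history tuples and actions as recommended items. It is a toy illustration of the sequential objective, not an industrial algorithm (production systems use deep, off-policy variants such as DQN or Actor-Critic).

```python
import random
from collections import defaultdict

Q = defaultdict(float)                     # Q[(state, action)] = estimated long-term value
gamma, alpha, epsilon = 0.9, 0.1, 0.1      # discount factor, learning rate, exploration rate

def choose_action(state, catalog):
    if random.random() < epsilon:                      # explore: try a random item
        return random.choice(catalog)
    return max(catalog, key=lambda a: Q[(state, a)])   # exploit: best item under current policy

def update(state, action, reward, next_state, catalog):
    # Bellman update: the value of recommending `action` in `state` is the immediate
    # feedback (click, dwell time) plus the discounted value of the best follow-up.
    best_next = max(Q[(next_state, a)] for a in catalog)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```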
6.2 Algorithms and Challenges
Common algorithms include Deep Q-Networks (DQN) and Actor-Critic methods. The Actor network generates recommendations, while the Critic network estimates the value (expected future reward) of those recommendations.
- Off-Policy Evaluation (OPE): The primary bottleneck for RL in industry is evaluation. Unlike robotics or games, where an agent can simulate millions of trials, an RS cannot easily “simulate” users without introducing bias. Training an RL agent directly on live users is risky (bad recommendations can drive users away). Therefore, heavy research focuses on Offline Reinforcement Learning and building accurate User Simulators to train agents on historical log data before deployment.37
- Hybrid Approaches: Recent innovations involve integrating GNNs with RL. The GNN generates high-quality state representations by aggregating graph information, which the RL agent then uses to make more informed policy decisions.39
7. Challenges and Normative Considerations
As RS become ubiquitous, the focus of research has broadened from technical optimization to “Responsible AI,” addressing the societal and ethical implications of these systems.
7.1 Fairness and Bias: The Matthew Effect
Recommender systems are prone to amplifying biases present in training data. A pervasive issue is Popularity Bias, where models over-recommend popular items (the “head”) and ignore niche items (the “long tail”).
- The Matthew Effect: This “rich-get-richer” phenomenon creates a feedback loop where popular items get more exposure, generating more interaction data, which in turn makes them even more likely to be recommended. This hurts aggregate diversity and stifles discovery for smaller content creators or businesses.41
- Fairness Definitions:
- User Fairness: Ensuring equitable recommendation quality across different demographic groups (e.g., ensuring a job recommendation engine does not favor one gender).
- Provider Fairness: Ensuring items from different providers (e.g., minority authors) have an equal opportunity to be exposed to relevant users.43
- Mitigation: Techniques include pre-processing (re-sampling training data to balance representation), in-processing (adding fairness regularization terms to the loss function) and post-processing (re-ranking the output list to enforce diversity constraints).42
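As a concrete example of post-processing mitigation, the sketch below greedily fills the top-k slate while capping the share of slots any single provider can occupy. It is a deliberately simple stand-in for more principled exposure-fairness re-rankers; the function name, cap parameter and `provider_of` accessor are illustrative.

```python
from collections import Counter

def rerank_with_provider_cap(ranked_items, provider_of, max_share=0.5, k=10):
    """Greedy post-processing re-rank: no provider may exceed `max_share` of the top-k slots."""
    cap = max(1, int(max_share * k))
    counts, result, overflow = Counter(), [], []
    for item in ranked_items:
        provider = provider_of(item)
        if len(result) < k and counts[provider] < cap:
            result.append(item)
            counts[provider] += 1
        else:
            overflow.append(item)          # over-represented or surplus items are pushed down
        if len(result) == k:
            break
    # Backfill only if the cap left empty slots (relevance still wins over an unfilled slate).
    return result + overflow[: max(0, k - len(result))]
```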
7.2 Scalability and the Deployment Gap
The “deployment gap” remains a critical challenge.
- Latency vs. Accuracy: A complex graph attention network or an autoregressive LLM might provide better recommendations, but if it takes 500ms to respond, it is unusable for a real-time feed.
- Solutions: Industry relies on Knowledge Distillation (training smaller student models to mimic large teacher models) and Vector Search (offloading candidate generation to optimized ANN indices) to bridge this gap.13
- Multi-Stage Evaluation: Industrial systems use a “funnel” of evaluation metrics. Offline metrics (AUC, LogLoss) are used for model development. Online metrics (A/B testing on Clicks, Dwell Time, Retention) are the final arbiter of success. A model with better offline AUC does not always yield better online retention.12
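The bullet on Knowledge Distillation can be made concrete with a standard distillation loss: a small student ranker is trained both to match the teacher's softened score distribution over a candidate list and to fit the real click labels. Temperature, mixing weight and the listwise/pointwise combination shown here are conventional choices, not taken from a specific deployed system.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of (a) KL divergence against the teacher's softened candidate distribution
    and (b) the usual click-prediction loss against real labels."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)          # teacher's "dark knowledge"
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.binary_cross_entropy_with_logits(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```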
7.3 Reproducibility and “RS4Good”
A growing movement, “RS4Good,” challenges the sole focus on profit-driven metrics. Researchers advocate for RS that optimize for societal well-being, such as reducing polarization, promoting educational content or minimizing addiction.44 Simultaneously, the field faces a reproducibility crisis. Studies have shown that many “state-of-the-art” DL models fail to outperform simple, well-tuned baselines (like Nearest Neighbors) when evaluated rigorously. This suggests that much apparent progress may be due to inconsistent experimental setups rather than genuine architectural breakthroughs.45
8. Future Directions: The Road to 2025 and Beyond
The trajectory of Recommender Systems is pointing towards integration, universality and simulation.
8.1 Universal Behavioral Profiles (RecSys Challenge 2025)
The ACM RecSys Challenge 2025 highlights the industry’s push towards Universal Behavioral Profiles. The challenge tasks participants with building a single, comprehensive user representation from diverse raw interaction logs (clicks, add-to-cart, page views, search queries).
- Goal: Create a user embedding that is generalizable. Instead of training separate models for “Churn Prediction,” “Product Propensity,” and “Category Propensity,” a single “Foundation Model for User Behavior” should be able to support all these downstream tasks via simple transfer learning.46
- Dataset: The challenge utilizes a massive dataset of ~150 million page views and ~19 million users, underscoring the scale at which these universal profiles must operate.
8.2 Green Recommender Systems
As models like DLRM and LLMs grow to billions of parameters, their energy consumption becomes a significant concern. Green RecSys is an emerging research area focused on training efficient models—using techniques like quantization, pruning and knowledge distillation—to reduce the carbon footprint of AI without sacrificing performance.47
8.3 Multi-Agent Simulation
To overcome the limitations of offline evaluation, researchers are building Multi-Agent Simulators. In these virtual environments, LLM-powered agents act as “users” and “items.” These agents interact with each other and the recommender system, allowing researchers to test new policies (e.g., an RL strategy to reduce boredom) in a safe sandbox before deploying them to real human users.1
9. Conclusion
The field of Recommender Systems has matured from a narrow focus on matrix factorization into a multidisciplinary domain at the intersection of deep learning, natural language processing and behavioral economics. The period from 2017 to 2025 has been defined by the industrialization of deep architectures like Two-Tower models and DLRM, which now serve as the backbone of the internet economy.
However, the current frontier is defined by the Generative Turn. Large Language Models are not merely enhancing existing pipelines; they are reimagining the recommendation interface from a static list of items to a dynamic, conversational exchange. This shift offers solutions to longstanding problems like cold start and explainability but introduces new challenges in latency, cost and hallucination.
As we look forward, the tension between optimization (clicks, revenue) and normative values (fairness, well-being) will define the next generation of systems. Whether through “RS4Good” initiatives, rigorous regulation in healthcare or the development of energy-efficient “Green” models, the future of RS lies in building systems that are not just accurate, but trustworthy, efficient and universally beneficial. The development of Universal Behavioral Profiles and Multi-Agent Simulators suggests a future where RS are more robust, generalizable and aligned with complex human needs than ever before.
Works cited
- A Survey on LLM-powered Agents for … – ACL Anthology, https://aclanthology.org/2025.findings-emnlp.620.pdf
- GR-LLMs: Recent Advances in Generative Recommendation Based on Large Language Models – arXiv, https://arxiv.org/html/2507.06507v2
- A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice, https://arxiv.org/html/2407.13699v4
- We’re Still Doing It (All) Wrong: Recommender Systems, Fifteen Years Later – arXiv, https://arxiv.org/html/2509.09414v1
- A Survey of Real-World Recommender Systems: Challenges, Constraints and Industrial Perspectives – arXiv, https://arxiv.org/html/2509.06002v1
- Recommendation Systems in Healthcare: Challenges, Limitations and Its Applications, https://ieeexplore.ieee.org/document/10773682/
- Evidence-Based Software Engineering: A Checklist-Based Approach to Assess the Abstracts of Reviews Self-Identifying as Systematic Reviews – MDPI, https://www.mdpi.com/2076-3417/12/18/9017
- PRISMA statement, https://www.prisma-statement.org/
- Systematic Literature Review on Recommender System: Approach, Problem, Evaluation Techniques, Datasets – eClass, https://eclass.hmu.gr/modules/document/file.php/TP374/Homework/1st%20Homework%20-%20SLR%20evaluation/Systematic%20Literature%20Reviews/2024%20-%20Systematic%20Literature%20Review%20on%20Recommender%20Systems.pdf
- A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice, https://arxiv.org/html/2407.13699v1
- A Survey of Real-World Recommender Systems: Challenges, Constraints and Industrial Perspectives – arXiv, https://arxiv.org/pdf/2509.06002
- Innovative Recommendation Applications Using Two Tower Embeddings at Uber, https://www.uber.com/blog/innovative-recommendation-applications-using-two-tower-embeddings/
- Understanding Two-Tower Architecture in Recommendation Systems – Tredence, https://www.tredence.com/blog/understanding-the-twotower-architecture-in-recommendation-systems
- The Two-Tower Model for Recommendation Systems: A Deep Dive | Shaped Blog, https://www.shaped.ai/blog/the-two-tower-model-for-recommendation-systems-a-deep-dive
- Training Highly Scalable Deep Recommender Systems on Databricks (Part 1), https://www.databricks.com/blog/training-deep-recommender-systems-1
- Towards a Theoretical Understanding of Two-Stage Recommender Systems – arXiv, https://arxiv.org/html/2403.00802v1
- Deep Learning Recommendation Models on AMD GPUs – ROCm™ Blogs, https://rocm.blogs.amd.com/artificial-intelligence/dlrm/README.html
- [1906.00091] Deep Learning Recommendation Model for Personalization and Recommendation Systems – arXiv, https://arxiv.org/abs/1906.00091
- Mastering Feature Interactions: A Deep Dive into DLRM-Style Ranking Models (Wide & Deep, DeepFM, etc.) – Shaped.ai, https://www.shaped.ai/blog/mastering-feature-interactions-a-deep-dive-into-dlrm-style-ranking-models-wide-deep-deepfm-etc
- Using Neural Networks for Your Recommender System | NVIDIA Technical Blog, https://developer.nvidia.com/blog/using-neural-networks-for-your-recommender-system/
- Graph Foundation Models for Recommendation: A Comprehensive …, https://arxiv.org/pdf/2502.08346
- [2011.02260] Graph Neural Networks in Recommender Systems: A Survey – arXiv, https://arxiv.org/abs/2011.02260
- Graph Foundation Models for Recommendation: A Comprehensive Survey – arXiv, https://arxiv.org/html/2502.08346v1
- LLM4Rec: A Comprehensive Survey on the Integration of Large Language Models in Recommender Systems—Approaches, Applications and Challenges – MDPI, https://www.mdpi.com/1999-5903/17/6/252
- [2305.19860] A Survey on Large Language Models for Recommendation – arXiv, https://arxiv.org/abs/2305.19860
- Recommender Systems in the Era of Large Language Models (LLMs) – IEEE Xplore, https://ieeexplore.ieee.org/document/10506571/
- Evaluating User Experience in Conversational Recommender Systems: A Systematic Review Across Classical and LLM-Powered Approaches – arXiv, https://arxiv.org/html/2508.02096
- A Comprehensive Survey on LLM-Powered Recommender Systems: From Discriminative, Generative to Multi-Modal Paradigms – IEEE Xplore, https://ieeexplore.ieee.org/iel8/6287639/10820123/11129085.pdf
- Large Language Models for Recommendation: Progresses and …, https://www2024.thewebconf.org/docs/tutorial-slides/large-language-models-for-recommendation.pdf
- Recommender Systems in Health Professions Education: Protocol for a Scoping Review, https://www.researchprotocols.org/2025/1/e69979
- (PDF) Health Recommender Systems: A Survey – ResearchGate, https://www.researchgate.net/publication/334373139_Health_Recommender_Systems_A_Survey
- Drug Recommendation System for Diabetes Using a Collaborative Filtering and Clustering Approach: Development and Performance Evaluation – Journal of Medical Internet Research, https://www.jmir.org/2022/7/e37233/
- (PDF) A Drug Recommendation System for Medical Emergencies using Machine Learning, https://www.researchgate.net/publication/384168966_A_Drug_Recommendation_System_for_Medical_Emergencies_using_Machine_Learning
- (PDF) Revolutionizing Dementia Care: A Brief Survey of Personalized Therapy Recommender Systems – ResearchGate, https://www.researchgate.net/publication/379558172_Revolutionizing_Dementia_Care_A_Brief_Survey_of_Personalized_Therapy_Recommender_Systems
- Can a Recommender System Support Treatment Personalisation in Digital Mental Health Therapy? — MIT Media Lab, https://www.media.mit.edu/publications/recommender-system-treatment-personalisation-in-digital-mental-health/
- (PDF) Recommender System for Personalised Wellness Therapy – ResearchGate, https://www.researchgate.net/publication/270553924_Recommender_System_for_Personalised_Wellness_Therapy
- A comprehensive survey on reinforcement learning-based recommender systems: State-of-the-art, challenges and future perspective – CEUR-WS, https://ceur-ws.org/Vol-3917/paper62.pdf
- [2109.10665] A Survey on Reinforcement Learning for Recommender Systems – arXiv, https://arxiv.org/abs/2109.10665
- Reinforcement Learning-Based Recommender Systems Enhanced With Graph Neural Networks – IEEE Xplore, https://ieeexplore.ieee.org/iel8/6287639/10820123/11123173.pdf
- Efficient Integration of Reinforcement Learning in Graph Neural Networks-Based Recommender Systems – IEEE Xplore, https://ieeexplore.ieee.org/iel8/6287639/10380310/10795136.pdf
- Fairness and Diversity in Recommender Systems: A Survey – arXiv, https://arxiv.org/html/2307.04644v2
- Fairness and Diversity in Recommender Systems: A Survey – arXiv, https://arxiv.org/pdf/2307.04644
- Popularity Bias in Recommender Systems: The Search for Fairness in the Long Tail – MDPI, https://www.mdpi.com/2078-2489/16/2/151
- [2411.16645] Recommender Systems for Good (RS4Good): Survey of Use Cases and a Call to Action for Research that Matters – arXiv, https://arxiv.org/abs/2411.16645
- State of Recommender Systems in 2025: Algorithms, Libraries and Trends – Reddit, https://www.reddit.com/r/recommendersystems/comments/1iwwxpr/state_of_recommender_systems_in_2025_algorithms/
- RecSys Challenge 2025, http://www.recsyschallenge.com/
- RS_c – Recommender-Systems News and Curated Lists of Resources, https://recommender-systems.com/