1 Introduction

Owing to its successful application in recommendation [1], the Conversational Recommender System (CRS) has developed rapidly in recent years. A CRS elicits the user's interests by dynamically asking about preferences over item attributes in a question-and-answer format, and ultimately recommends items [2, 3, 4]. Different from traditional recommendation methods [5, 6], a CRS generates recommendation results through dynamic interaction with users, so it has a natural advantage in revealing the reasons behind its recommendations. A CRS must address two important questions: what attributes to ask, and when to recommend items? In terms of architecture, a CRS mainly has two components [2], namely the conversation module and the recommendation module. These two modules automatically decide when to ask and when to recommend according to what the system has learned so far. Various conversational recommendation tasks have been proposed to date [2, 7, 8]. This paper focuses on the multi-round conversational recommendation (MCR) setting, whose goal is to produce a satisfactory recommendation within the smallest possible number of conversation turns.

In the process of interacting with the user, MCR derives the user's dynamic preferences from the user's feedback on items and attributes, and these dynamic preferences are generally stored in candidate sets. In particular, the candidate attribute set preserves the attributes on which the user gave positive feedback (preferred attributes). However, existing methods estimate whether a user likes a certain attribute relying only on the user's binary yes/no answer. We argue that this is unreasonable: the user may simply be unfamiliar with an attribute and therefore answer no, so the attribute is placed in the set of negative-feedback attributes, causing the system to miss the best opportunity to capture the user's preference. For example, Estimation–Action–Reflection (EAR) [2] only encodes attribute-level feedback as an input feature and uses item-level feedback as a training instance for online updates, without considering how reliable the user's attribute feedback is. Simple conversational path reasoning (SCPR) [9] proposes a conversational path reasoning framework (CPR) that models conversational recommendation as a walk on a graph and exploits the graph structure to use 1-hop attributes as the candidate attributes to ask about. Although many attributes are filtered out and the action space is narrowed, SCPR does not consider whether the filtered attributes are reasonable; it simply filters. The unified conversational recommender (UNICORN) [10] proposes a unified conversational recommendation policy that models the recommendation module, the conversation module, and the decision of when to ask or recommend as a single whole, making the method scalable and general and keeping the training process stable; its filtering of candidate attribute sets is similar to SCPR [9]. The feedback-guided preference adaptation network (FPAN) [11] designs a gating component to modify the embeddings of rejected items according to positive-feedback attributes, which are aggregated to estimate users' preferences.

To solve the aforementioned problems, inspired by [9, 10], this paper proposes a novel conversational recommendation framework, Attribute Clipping based on Dynamic Graph (ACDG). Assume that before the first round of dialogue, the candidate attribute set is {a1, a2, a3, a4, a5, a7, ...}. After the first round of dialogue starts, the system first asks the user for his attitude towards attribute a1. If the user accepts a1, then a1 is removed from the candidate attribute set before the second round of questioning (a1 has already been asked and accepted, so it need not be asked again), and the candidate attribute set becomes {a2, a3, a4, a5, a7, ...}. If the user rejects a1, then a1 is likewise removed before the second round (because it was rejected in the first round), and the candidate attribute set again becomes {a2, a3, a4, a5, a7, ...}; however, because a1 was merely not accepted, the rejected attribute a1 may re-enter the candidate attribute set as the conversation progresses. Even so, the candidate attribute space is still very large at this point; after passing through the attribute clipper, noisy attributes are removed. ACDG thus reduces the attribute space and improves the quality of the dialogue (a minimal sketch of this update rule follows below). In addition, to make good use of user feedback on attributes and items, this paper uses deep reinforcement learning to guide the dialogue process, and to learn a stable dialogue system that makes full use of state information, a residual structure is designed to strengthen the utilization of state information.
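As an illustration, the candidate-set update just described can be sketched as follows. This is a toy example: the set names and the stand-in feedback function are hypothetical, not taken from the paper's implementation.

```python
# Minimal sketch of the candidate attribute set update (illustrative only).
a_cand = {"a1", "a2", "a3", "a4", "a5", "a7"}   # candidates before turn 1
a_accepted, a_rejected = set(), set()

def user_responds(attr: str) -> bool:
    """Stand-in for real user feedback (accept/reject)."""
    return attr == "a1"  # toy answer

attr = "a1"                      # attribute asked in turn 1
if user_responds(attr):
    a_accepted.add(attr)
else:
    a_rejected.add(attr)         # may re-enter a_cand in later turns

# Either way, the asked attribute leaves the candidate set before the next turn.
a_cand -= {attr}
print(sorted(a_cand))            # ['a2', 'a3', 'a4', 'a5', 'a7']
```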

Figure 1 illustrates the inference process with an example. A user, Tom, is looking for a "pop" music artist. The system maintains the candidate attributes produced by the Attribute Clipping Module. It first asks Tom about his preference for the attribute "country", and his response is positive. Tom is then asked about his preference for the attribute "folk", and his response is negative. We nevertheless keep the attribute "folk" among the candidate attributes for a later turn: much folk music is also quite popular nowadays, and the likely reason the user answers no is that he is not yet familiar with the attribute "folk", since the dialogue has only reached the third turn and, from the system's side, an accurate representation of the user's preferences has not yet been learned. Following such rules, the quality of the conversation improves as it progresses, and in the end the user receives a satisfactory recommendation.

Fig. 1 The illustrative example of ACDG

In summary, the contributions of this paper are as follows:

  • We propose a novel framework, ACDG, for conversational recommendation. Its attribute clipping module identifies noisy attributes and retains the most ask-worthy ones, improving both dialogue quality and predictive performance.

  • We adopt a Deep Q-network (DQN) with a residual structure to strengthen state learning, make full use of state information, stabilize the dialogue process, and improve the fluency of the dialogue.

  • We conducted a set of experiments on four public benchmark datasets that are widely used in CRS. The experimental results show that the proposed method significantly outperforms existing methods.

2 Related works

2.1 Conversational recommender systems

The conversational recommender system, an emerging topic in the recommendation field, has attracted the attention of many researchers in recent years. It overcomes two shortcomings of traditional recommendation methods [5, 6, 12, 13, 14]: they can neither offer good interpretability of recommendation results nor dynamically obtain user preferences. The essence of a conversational recommender system is to reach a recommendation through multiple dynamic interactions with the user. Current research mainly follows four directions [3]: (1) Question-based user preference elicitation [7, 15, 16, 17] elicits user preferences by asking explicit questions. This approach must address two questions: what to ask, and how to adjust recommendations based on user responses? The former focuses on constructing questions that obtain as much information as possible; the latter uses feedback to make more appropriate recommendations. (2) Multi-turn conversational recommendation strategies [2, 9, 14, 18, 19] require the system to interact with users repeatedly and adapt dynamically to feedback across multiple conversations; an excellent strategy hinges on when to ask questions and when to recommend. (3) Dialogue understanding and generation [20, 21, 22, 23] focuses on understanding user interests and intentions, enabling a CRS to accurately infer user emotions and intentions from raw natural language and to generate readable, fluent, consistent, and meaningful responses. (4) Exploitation–exploration trade-offs [24, 25, 26] balance the E&E problem for cold-start users through interaction models such as MAB-based and meta-learning methods. Among these, this paper focuses on multi-turn conversational recommendation. Closely related to our work, the conversational recommender model (CRM) [1] considers multi-turn conversation, in which the system asks users for attribute-level preferences multiple times and recommends in the final round. EAR [2] extends this to the multi-round conversational recommendation setting, allowing the system to make multiple recommendations. SCPR [9] models conversational recommendation as interactive path reasoning on a graph; although it eliminates some irrelevant candidate attributes, many irrelevant attributes remain in the candidate set. FPAN [11] uses a gate mechanism to gather users' online feedback and adjust user preferences accordingly. However, existing methods concentrate on exploiting item-level and attribute-level feedback and do not consider whether the filtering of the candidate item and attribute sets is reasonable.

2.2 Dynamic graph reasoning filter system

Because graph-based recommendation methods offer strong interpretability and contain rich latent semantic information [27], many researchers use graph structures to infer user preferences. Wang et al. [28] proposed an end-to-end framework, RippleNet, that utilizes a knowledge graph (KG) to assist recommender systems; it automatically discovers users' hierarchical latent interests by iteratively propagating their preferences over the KG. Although they used graph-structure constraints and multi-hop ideas to filter many nodes, they did not consider whether those nodes actually help mine user preferences. Lei et al. [9] borrowed the multi-hop reasoning idea of RippleNet to propose conversational path reasoning, which combines multi-round conversational recommendation with graph path reasoning: the user–system dialogue is modeled as a walk on the graph, and user feedback updates the walking path. Although this method exploits the graph structure and uses 1-hop reasoning to filter most attributes, for a large attribute space many irrelevant attributes remain after filtering, leading to long dialogues that consume the user's patience. This paper proposes a method that dynamically clips attributes based on the user's explicit feedback, which not only shrinks the filtered attribute space but also, to some extent, increases user satisfaction with it.

Table 1 lists the notations used frequently throughout this paper; it does not include symbols that are used only locally.

Table 1 Main notations used in the paper

3 Problem definition

This paper focuses on the recommendation setting closest to real-world scenarios: multi-round conversational recommendation (MCR), in which the system asks the user about attributes and may make multiple recommendations within one conversation [9, 10, 29]. On the system side, the CRS maintains a comprehensive collection of items (denoted V) for recommendation, and each item v is associated with a set of attributes \({A}_{v}\). On the user side, at the start of each conversation session, the user u specifies an attribute \({a}_{u}\) indicating a preferred attribute. Subsequently, the CRS may either inquire about the user's preference for an attribute chosen from the pool of candidate attributes \({A}_{cand}\) or recommend a number of items (e.g., top-K) from the set of candidate items \({V}_{cand}\). The user accepts or rejects the queried attribute or the recommended items according to their preferences. Following Lei et al. [2], we assume the user maintains clear preferences over all attributes and items. This process of system asking and user responding continues until the CRS achieves a successful recommendation or reaches the maximum number of turns T. A sketch of this interaction loop is given below.
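The loop can be sketched as follows; this is a hedged illustration, and the `crs` and `user` objects with their method names are our assumptions, not part of the formal definition.

```python
# A minimal sketch of the MCR interaction protocol defined above.
def run_session(crs, user, T: int = 15, K: int = 10) -> bool:
    a_u = user.initial_attribute()          # user states a preferred attribute
    crs.observe_initial_attribute(a_u)
    for t in range(T):
        action = crs.decide()               # "ask" or "recommend"
        if action == "ask":
            attr = crs.pick_attribute()     # chosen from A_cand
            crs.update(attr, accepted=user.likes_attribute(attr))
        else:
            items = crs.top_k_items(K)      # chosen from V_cand
            if user.accepts_any(items):
                return True                 # successful recommendation
            crs.update_rejected_items(items)
    return False                            # failed within T turns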

4 Proposed methods

The overview of the proposed method, ACDG, is depicted in Fig. 2, which consists of four main components: Offline Representation Learning, Reasoning Module, Attribute Clipping Module, and Residual Deep Q-Learning Network.

Fig. 2 Overview of the proposed framework, ACDG

4.1 Offline representation learning

Inspired by [2, 9, 30], this paper chooses the Factorization Machine (FM) as the pre-training model; the embeddings of users, items, and attributes are obtained through the FM. In the MCR scenario, the FM performs both item prediction and attribute prediction by using a multi-task pairwise loss.

4.1.1 Item preference prediction

Following the improved FM of [9], the first task is to capture the user's preference for items, with the scoring function:

$${s}_{uv}=f\left(u,v,{A}_{u}\right)={u}^{T}v+\sum_{a\in {A}_{u}}{v}^{T}a$$
(1)

where u, v, and a denote the embeddings of the user, item, and attribute, respectively. The first term captures the user's general interest in the item; the second term captures the affinity between the item and the user's preferred attributes.

Following [9], a pairwise loss is used for optimization:

$${L}_{item}=\sum_{\left(u,v,v^{\prime}\right)\in {D}_{1}}-\mathrm{ln}\,\sigma \left(f\left(u,v,{A}_{u}\right)-f\left(u,v^{\prime},{A}_{u}\right)\right) + \sum_{\left(u,v,v^{\prime}\right)\in {D}_{2}}-\mathrm{ln}\,\sigma \left(f\left(u,v,{A}_{u}\right)-f\left(u,v^{\prime},{A}_{u}\right)\right)+{\lambda }_{\Theta }{\Vert \Theta \Vert }^{2}$$
(2)

where

$${D}_{1} := \{ (u,v,v^{\prime}) \mid v^{\prime} \in {V}_{u}^{-} \},\qquad {V}_{u}^{-} := V \setminus {V}_{u}^{+}$$
$${D}_{2} := \{ (u,v,v^{\prime}) \mid v^{\prime} \in \widehat{{V}_{u}^{-}} \},\qquad \widehat{{V}_{u}^{-}} := {V}_{cand} \setminus {V}_{u}^{+}$$

where \({D}_{1}\) and \({D}_{2}\) denote the training instances, \({V}_{u}^{-}\) denotes the set of items the user has not interacted with, \({V}_{u}^{+}\) denotes the set of items the user has interacted with, \(\sigma\) denotes the sigmoid function, and \({\lambda }_{\Theta }\) is a regularization parameter to prevent overfitting. \({V}_{cand}\) denotes the set of candidate items, which changes dynamically and is discussed in later sections.

4.1.2 Attribute preference prediction

Accurate attribute prediction is the second task; its main role is to pick attributes to query during the session. The scoring function is:

$$\widehat{g}\left(a\mid u,{A}_{u}\right)={u}^{T}a+\sum_{{a}_{i}\in {A}_{u}}{a}^{T}{a}_{i}$$
(3)

The first term represents the user's preference for an attribute; the second term can be understood as the affinity between two attributes.

Similar to \({L}_{item}\), a pairwise loss is leveraged for attribute prediction:

$${L}_{attr}=\sum_{\left(u,a,a^{\prime}\right)\in {D}_{3}}-\mathrm{ln}\,\sigma \left(\widehat{g}\left(a\mid u,{A}_{u}\right)-\widehat{g}\left(a^{\prime}\mid u,{A}_{u}\right)\right)+{\lambda }_{\Theta }{\Vert \Theta \Vert }^{2}$$
(4)

where

$${D}_{3} := \{ (u,a,a^{\prime}) \mid a \in {A}_{v},\ a^{\prime} \in A \setminus {A}_{v} \}$$

\({A}_{v}\) represents the attribute set of item v, so \(a\) and \(a^{\prime}\) denote attributes that the item does and does not contain, respectively.

4.1.3 Multi-task learning

Since Lei et al. [9] found that item prediction and attribute prediction can be learned jointly to improve each other's performance, this paper also adopts a multi-task pairwise loss, following their implementation:

$$L={L}_{item}+{L}_{attr}$$
(5)
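To make Eqs. (1)–(5) concrete, the following is a minimal PyTorch sketch of the two FM scorers and the pairwise loss term. The embedding dimension, batching, and the sampling of the (u, v, v′) and (u, a, a′) triples are our own assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FMScorer(nn.Module):
    """FM-style scorers for item prediction (Eq. 1) and attribute prediction (Eq. 3)."""
    def __init__(self, n_users: int, n_items: int, n_attrs: int, dim: int = 64):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)
        self.item = nn.Embedding(n_items, dim)
        self.attr = nn.Embedding(n_attrs, dim)

    def item_score(self, u, v, A_u):
        # Eq. (1): u^T v + sum_{a in A_u} v^T a; A_u has shape (batch, n_hist)
        u_e, v_e, a_e = self.user(u), self.item(v), self.attr(A_u)
        return (u_e * v_e).sum(-1) + (v_e.unsqueeze(1) * a_e).sum(-1).sum(-1)

    def attr_score(self, u, a, A_u):
        # Eq. (3): u^T a + sum_{a_i in A_u} a^T a_i
        u_e, a_e, hist = self.user(u), self.attr(a), self.attr(A_u)
        return (u_e * a_e).sum(-1) + (a_e.unsqueeze(1) * hist).sum(-1).sum(-1)

def pairwise_loss(pos_score, neg_score):
    # The -ln sigma(pos - neg) term shared by Eqs. (2) and (4); the
    # lambda * ||Theta||^2 regularizer maps to the optimizer's weight decay.
    return -F.logsigmoid(pos_score - neg_score).mean()

# Multi-task objective (Eq. 5): L = L_item + L_attr, summed over sampled triples.
```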

4.2 Reasoning module

First, we construct an undirected heterogeneous graph \(G=(U, V, A)\) consisting of an item set V, an attribute set A, and a user set U. Each item v is connected to some attributes a, and each user u is connected to some items v, so there are two triplet relations in the graph: {user, relation, item} and {item, relation, attribute}. Next, we initialize the user's preferred attribute: before the dialogue starts, an item v is randomly selected from the items the user has interacted with, and an attribute a is randomly selected from that item to serve as the target attribute \({a}_{u}\) for the dialogue. During the dialogue, the system asks the user's preference about an attribute \({a}^{(t)}\) at time step t; the queried attribute \({a}^{(t)}\) then becomes the target attribute \({a}_{u}^{(t)}\). Using \({a}_{u}^{(t)}\) as the seed node and its neighbor nodes as intermediary nodes, the 2-hop attribute nodes are collected as candidate attributes, where the intermediary nodes are users and items; examples of such paths are {\({a}_{u}^{(t)}\) – v1 – a1 – v2 – a2} and {\({a}_{u}^{(t)}\) – v1 – u1 – v2 – a1}. This yields a dynamic graph \({G}_{u}^{(t)} = ({A}_{cand}^{(t)}, {V}_{cand}^{(t)}, {u}^{(t)})\), extracted from G, that contains a user, items, and attributes. After the candidate attributes are clipped by the Attribute Clipping Module (introduced in Sect. 4.3), the final set of candidate attributes is obtained. Then, through the actions generated by Residual Deep Q-Learning (introduced in Sect. 4.4), the dynamic graph is updated online according to the user's feedback. A sketch of the candidate extraction follows.
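The candidate extraction can be sketched as a bounded-radius search from the seed attribute through intermediary user/item nodes, matching the example paths above. The use of networkx and the `type` node attribute are illustrative assumptions.

```python
import networkx as nx

def candidate_attributes(G: nx.Graph, a_u, radius: int = 4) -> set:
    """Collect attribute nodes reachable from the seed a_u within `radius`
    edges, e.g. paths {a_u - v1 - a1 - v2 - a2} and {a_u - v1 - u1 - v2 - a1}."""
    reachable = nx.single_source_shortest_path_length(G, a_u, cutoff=radius)
    return {n for n, d in reachable.items()
            if d > 0 and G.nodes[n].get("type") == "attr"}
```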

4.3 Attribute clipping module

This section introduces how to obtain the final candidate attribute set \({F}_{{A}_{cand}}^{(t)}\) at time step t. \({A}_{cand}^{(t)}\) contains the candidate attributes at time step t; after \({A}_{cand}^{(t)}\) is obtained through the Reasoning Module, it enters the attribute clipping stage. Although many attributes have been filtered out by exploiting the natural advantages of the graph structure, the resulting attribute space \({A}_{cand}^{(t)}\) is still large, so this section details how attribute nodes are cut further. We first find the set of users \({U}_{{a}_{u}}^{(t)}\) who like the target attribute \({a}_{u}\), and then collect the attributes \({A}_{U}^{(t)}\) preferred by \({U}_{{a}_{u}}^{(t)}\). To avoid noisy users in \({U}_{{a}_{u}}^{(t)}\), we use the score f(u, \({U}_{{a}_{u}}^{(t)}\)) to measure the importance of each user in \({U}_{{a}_{u}}^{(t)}\) to the current user u. We then use the score f(\({a}_{u}\), \({A}_{U}^{(t)}\)) to measure the similarity between each attribute in \({A}_{U}^{(t)}\) and \({a}_{u}\), thereby clipping attribute nodes.

$${V}_{cand}^{(t)}={V}_{{A}_{u}^{(t)}}\setminus {V}_{rej}^{(t)},\qquad {A}_{cand}^{(t)}={A}_{cand}^{(t)}\setminus ({A}_{u}^{(t)}\cup {A}_{rej}^{(t)})$$
(6)
$$f\left(u,{U}_{{a}_{u}}^{(t)}\right) = \frac{u \odot {U}_{{a}_{u}}^{(t)}}{\Vert u\Vert \times \Vert {U}_{{a}_{u}}^{(t)}\Vert}$$
(7)
$${F}_{{A}_{cand}}^{(t)} = f\left({a}_{u},{A}_{U}^{(t)}\right) = \frac{{a}_{u} \odot {A}_{U}^{(t)}}{\Vert {a}_{u}\Vert \times \Vert {A}_{U}^{(t)}\Vert}$$
(8)

where u and \({U}_{{a}_{u}}^{(t)}\) denote user embeddings obtained through offline representation learning, and \({a}_{u}\) and \({A}_{U}^{(t)}\) denote attribute embeddings obtained in the same way. \(\odot\) denotes the Hadamard product, i.e., the element-wise multiplication of two matrices or vectors of the same shape. \(\Vert \cdot \Vert\) denotes the L2 norm of a vector.
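A sketch of the clipping scores in Eqs. (7) and (8) follows. We read the norm-divided Hadamard product as a cosine-style similarity obtained by summing the element-wise products; this scalar aggregation, and the `keep_ratio` threshold used to drop low-scoring attributes, are our assumptions for illustration.

```python
import numpy as np

def similarity(x: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """x: (d,) seed embedding; Y: (n, d) embeddings scored against x."""
    num = (x * Y).sum(axis=1)                           # summed Hadamard terms
    den = np.linalg.norm(x) * np.linalg.norm(Y, axis=1) # ||x|| * ||y_i||
    return num / den

def clip_attributes(a_u, A_U, attr_ids, keep_ratio: float = 0.5):
    """Keep the candidate attributes most similar to the target attribute a_u."""
    scores = similarity(a_u, A_U)
    k = max(1, int(len(attr_ids) * keep_ratio))
    top = np.argsort(-scores)[:k]                       # highest scores first
    return [attr_ids[i] for i in top]
```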

4.4 Residual deep Q-learning

This section introduces how ACDG generates and selects actions. During the dialogue between the system and the user, we design three vectors to represent the dialogue state \(s\): the conversation history, the number of candidate items, and the number of positive-feedback attributes. The input of Residual Deep Q-Learning is the state \(s\), and the output action is either ask or recommend. The state \(s\) is the concatenation of the three vectors:

$$s = {s}_{his}\oplus {s}_{item} \oplus {s}_{pos}$$
(9)

where \({s}_{his}\) encodes the conversation history, which helps the agent make smarter decisions and generate more fluent dialogues; \({s}_{item}\) records the number of candidate items, since a small candidate item space signals the best time to recommend; and \({s}_{pos}\) records the number of positive-feedback attributes, because users' attribute feedback is the most direct and explicit signal, and the number of such attributes reflects, to some extent, the breadth of the user's interests.
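The residual Q-network itself can be sketched as follows. The layer sizes and the exact placement of the skip connection are our assumptions, not the paper's exact architecture; the point is that the state representation is reused via the residual path.

```python
import torch
import torch.nn as nn

class ResidualDQN(nn.Module):
    """Q-network with a residual block over the dialogue state (Sect. 4.4)."""
    def __init__(self, state_dim: int, n_actions: int = 2, hidden: int = 64):
        super().__init__()
        self.inp = nn.Linear(state_dim, hidden)
        self.block = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        self.out = nn.Linear(hidden, n_actions)   # Q(s, ask), Q(s, recommend)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.inp(s))
        h = torch.relu(h + self.block(h))          # residual: reuse state info
        return self.out(h)

# The input follows Eq. (9), e.g.:
# s = torch.cat([s_his, s_item, s_pos], dim=-1)
```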

When the output action is ask, the system selects the most ask-worthy attribute from the candidate attributes \({A}_{cand}^{(t)}\). When the output action is recommend, the system selects the top-K items from the candidate items \({V}_{cand}^{(t)}\). We use the inner product between embeddings as the user's affinity score for an item and rank the candidate items by this score:

$${s}_{uv}={u}^{T}v + \sum_{a\in {A}_{u}}{v}^{T}a$$
(10)

where u, v, and a denote the embeddings of the user, item, and attribute, respectively, and \({A}_{u}\) denotes the set of the user's preferred attributes.

To select the attribute to ask, we adopt the weighted entropy criterion proposed in [9], since weighted entropy favors attributes that most reduce the uncertainty over the candidate items:

$$f\left(u,a,{V}_{cand}\right)=-prob\left(a\right)\cdot {log}_{2}\left(prob\left(a\right)\right), prob\left(a\right)= \frac{\sum_{v\in {V}_{cand}\cap {V}_{a}}\delta \left({s}_{uv}\right)}{\sum_{v\in {V}_{cand}}\delta \left({s}_{uv}\right)}$$
(11)

where \(\updelta\) denotes the sigmoid function, \({V}_{cand}\) denotes the candidate items, and \({V}_{a}\) denotes the items that contain attribute a.
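Eq. (11) translates directly into code; a small sketch, assuming `s_uv` is a precomputed map from item id to the affinity score of Eq. (10):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def weighted_entropy(V_cand, V_a, s_uv) -> float:
    """Eq. (11): score of attribute a given candidate items V_cand and the
    items V_a that carry attribute a; s_uv maps item id -> affinity score."""
    total = sum(sigmoid(s_uv[v]) for v in V_cand)
    hit = sum(sigmoid(s_uv[v]) for v in V_cand if v in V_a)
    p = hit / total if total > 0 else 0.0
    return -p * math.log2(p) if 0.0 < p < 1.0 else 0.0
```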

5 Experiments

In this section, we will evaluate our proposed ACDG framework on four real datasets.

5.1 Dataset description

We conduct experiments on four multi-turn conversational recommendation benchmark datasets. Table 2 gives the details of the dataset.

  • LastFM and Yelp. The LastFM dataset is used to evaluate music artist recommendation, while the Yelp dataset is used for business recommendation. To facilitate modeling, Lei et al. [9] manually merged the original attributes of LastFM into 33 coarse-grained groups and constructed a two-level taxonomy for Yelp, with 29 first-level categories and 590 second-level attributes.

  • LastFM* and Yelp*. Lei et al. [9] argue that manually merging attributes is not necessarily the best practice in real applications, so they rebuilt the two datasets using the original attributes. For comparison, we use both versions in our experiments.

Table 2 Summary statistics of datasets

5.2 Experimental setup

5.2.1 Training details

We split each dataset into training, validation, and test sets in a ratio of 7:1.5:1.5, set the number of recommended items (top-K) to 10, and set the maximum turn T to 15. Reinforcement learning is trained online during the conversation: we use a user simulator (cf. Sect. 5.2.2) on the validation set to interact with the policy network. The rewards used to train the policy network are rec_suc = 1, rec_fail = -0.1, ask_suc = 0.01, ask_fail = -0.1, and quit = -0.3. The DQN parameters are set empirically as follows: the experience replay memory size is 50,000, the sample batch size is 128, and the discount factor γ is 0.999. We optimize the policy network with the RMSprop optimizer and update the target network every 20 episodes. These settings are collected in the configuration sketch below.
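For reference, the hyperparameters above can be gathered into a single configuration; the key names are our own, not from any released code.

```python
CONFIG = {
    "split": (0.7, 0.15, 0.15),      # train / validation / test ratio
    "top_k": 10,                     # items per recommendation
    "max_turn": 15,                  # T
    "rewards": {
        "rec_suc": 1.0, "rec_fail": -0.1,
        "ask_suc": 0.01, "ask_fail": -0.1, "quit": -0.3,
    },
    "replay_memory": 50_000,         # experience replay size
    "batch_size": 128,
    "gamma": 0.999,                  # discount factor
    "optimizer": "RMSprop",
    "target_update_every": 20,       # episodes between target-network syncs
}
```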

5.2.2 User simulator for MCR

Due to the interactive nature of MCR, it is essential to train and evaluate it by interacting with users. Following the user simulator adopted in [1, 2, 9, 10], we simulate a conversation session for each observed user–item interaction pair (u, v). The item v is treated as the ground-truth target item, and its attribute set \({A}_{v}\) is treated as the oracle set of attributes preferred by user u in this session. The session is initiated by the simulated user, who randomly chooses an attribute from \({A}_{v}\); the session then follows the "System Ask, User Respond" process, as sketched below.
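A minimal simulator sketch under this protocol; the class and method names are illustrative assumptions.

```python
import random

class SimulatedUser:
    """User simulated from one observed interaction pair (u, v)."""
    def __init__(self, target_item: int, oracle_attrs: set):
        self.v = target_item
        self.A_v = oracle_attrs              # attributes of the target item

    def initial_attribute(self):
        return random.choice(sorted(self.A_v))   # attribute that starts the session

    def likes_attribute(self, attr) -> bool:
        return attr in self.A_v              # accept iff the target item has it

    def accepts_any(self, items) -> bool:
        return self.v in items               # success iff v is in the top-K list
```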

5.2.3 Baselines

We compare our method ACDG with the following several models:

  • Max Entropy. This method adopts a rule-based strategy for asking and recommending. The selection of attributes to ask is based on the criteria of maximum entropy, while the top-ranked items can be recommended with a predetermined probability [2].

  • Abs Greedy. This method focuses solely on recommendation and updates the model by integrating item-level feedback. The process continues until the CRS successfully provides a recommendation or exceeds the maximum number of conversation turns.

  • CRM [1]. This model incorporates a belief tracker to record the user's preferences and employs reinforcement learning (RL) algorithms to determine the optimal interaction policy with the user. The RL algorithm is implemented using a policy network, where the state vector is derived from the belief tracker.

  • EAR [2]. This approach adopts a three-stage solution that emphasizes deep interaction between the conversation component and the recommendation component, using an RL framework similar to that of CRM.

  • SCPR [9]. This method utilizes interactive path reasoning on the graph to eliminate candidate attributes and employs the DQN [31] framework to determine the optimal time to ask or recommend.

  • FPAN [11]. The proposed method employs a gating mechanism to aggregate user online feedback information and update the user's preference representation in order to achieve more accurate user preference estimation.

5.2.4 Evaluation metrics

In line with previous research on multi-round conversational recommendation [2, 9, 11], this study employs success rate at turn t (SR@t) [30] as a metric to measure the cumulative ratio of successful conversational recommendations up to turn t. Additionally, average turn (AT) is used to evaluate the average number of turns across all sessions. If a conversation session reaches the maximum turn T, the turn count for that session is capped at T. A higher SR@t indicates better CRS performance at turn t, while a lower AT signifies overall higher efficiency.
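Both metrics are simple to compute from per-session logs; a sketch, assuming each session is recorded as a (success, turn) pair:

```python
def sr_at(sessions, t: int) -> float:
    """SR@t: cumulative ratio of sessions that succeeded by turn t."""
    return sum(1 for ok, turn in sessions if ok and turn <= t) / len(sessions)

def avg_turn(sessions, T: int = 15) -> float:
    """AT: average turns across sessions; failed sessions are capped at T."""
    return sum(min(turn, T) if ok else T for ok, turn in sessions) / len(sessions)
```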

5.3 Performance comparison of ACDG with existing models

5.3.1 Overall performance

Table 3 shows the performance comparison between the proposed method, ACDG, and all baselines on the four datasets. A higher SR and a lower AT indicate better model performance. Our method outperforms the baselines overall. In particular, on the LastFM* and Yelp* datasets, it outperforms all baseline methods, achieving higher success rates and lower average turns; this demonstrates that the attribute clipping module works well on datasets with large attribute spaces. However, the performance on the LastFM and Yelp datasets does not exceed FPAN. This is likely because the manual processing of these datasets [9] left a smaller attribute space, giving our attribute clipping module less room to work; even then, our method achieves performance similar to SCPR (Table 3).

Table 3 Experimental results

5.3.2 Comparison at different conversation turns

Figure 3 shows a more fine-grained comparison of the success rate at each turn (SR@t) against a representative approach, SCPR, chosen because it performs relatively well on all datasets. To better visualize the difference between the two methods, we plot the success rate relative to SCPR; the line y = 0 in the figure represents SCPR's success rate relative to itself. There are several notable observations:

  (i) The proposed method, ACDG, performs relatively well on all four datasets, and the improvement is more evident on datasets with large attribute spaces, such as LAST_FM*.

  (ii) On the LAST_FM and LAST_FM* datasets, ACDG comprehensively outperforms SCPR, which indicates that the attribute clipping module plays an important role. On the coarse-grained YELP dataset with its small attribute space, the performance of our method is close to that of SCPR. In the first few turns on Yelp*, our method is also very close to SCPR, most likely because YELP* itself has a small attribute space (only 590 attributes); but as the conversation continues, our method becomes much more capable of capturing the user's interests, and the recommendation accuracy keeps improving.

  (iii) Since the music artist recommendation datasets LAST_FM and LAST_FM* contain (user1, friend, user2) triplet relations, we also consider filtering some users by computing the similarity between the preferences of the user currently in the dialogue and those of the user's friends, and using the attributes connected to the retained users as the candidate attribute set. As shown by the ACDG-user curve in Fig. 3, experiments demonstrate that this variant is effective.

  (iv) To further examine the reasonableness of the attribute set filtered in (iii), which, unlike ACDG, does not consider the similarity between attributes, we additionally measure the similarity between the filtered attributes and clip some of them further, shown as the ACDG-user-attr curve in Fig. 3. Notably, on the coarse-grained LAST_FM dataset the attribute set is very small (only 33 attributes), and the SR falls below SCPR in later turns; a large part of the reason may be that the excessive clipping by ACDG-user-attr sharply reduces the attribute space and filters out attributes the user is likely to prefer. This is well demonstrated by the ACDG-user-attr curve on the LAST_FM* dataset (8438 attributes).

  (v) Summarizing Fig. 3, our method ACDG is particularly suitable for datasets with large attribute spaces; by contrast, the attribute spaces of Yelp and Yelp* contain only 29 and 590 attributes, respectively.

Fig. 3 Comparisons at different conversation turns

5.4 Case analysis

To intuitively study the difference between the proposed ACDG and SCPR, we randomly sample a real-world interaction from LAST_FM*. In Fig. 4, the conversations are generated by ACDG and SCPR, respectively, and \(\left|{{\text{A}}}_{{\text{cand}}}\right|\) denotes the number of candidate attributes at the current turn. Faced with a large attribute space, SCPR can exploit the graph structure to filter many attributes, but a large number of irrelevant attributes remain in the candidate set, which is likely to lead to asking about attributes the user dislikes, or even hates; a long dialogue is then needed to obtain a positive response. Our method ACDG, in contrast, further considers the relevance between the candidate attributes and the user's current preferred attributes, clipping some of the irrelevant attributes and further filtering the candidate set. As can be seen in Fig. 4a, the attributes asked by ACDG are very similar to the user's preferred attributes, while the attributes asked by SCPR in Fig. 4b deviate from them.

Fig. 4 Sample conversations

6 Conclusion and future work

In this paper, we propose a novel framework, ACDG, for conversational recommendation. In ACDG, we design an explainable and adaptive attribute clipping module to select the user's preferred attribute sets. In addition, we use a residual deep Q-learning method to obtain a flexible policy network that selects the appropriate action at each conversation turn and can better exploit users' responses in a global context. Experimental results on four real-world datasets verify the effectiveness of the proposed method compared with baseline CRS methods. Since the current experiments use a single-agent strategy that applies the same recommendation policy to all users, in future work we will consider using different agents for different groups of users, i.e., multi-agent solutions.