Introduction

Recommender systems are one of the most important techniques used to provide information that matches user needs, including related services, by analyzing user actions [1, 2]. In a recommender system, a collaborative filtering approach is used to suggest information that will meet the needs of the user. Collaborative filtering is based on users with similar tastes making the same choices, and on the idea that what users bought in the past predicts what they will buy in the future [3]. The input to the collaborative filtering process is user interest or user behavior in the form of a feature vector. This vector is compared with the vectors of all other users, and the most similar users are selected to form the target user's neighborhood. Recommendations are then drawn from items previously liked by the users in that neighborhood [4]. However, collaborative filtering often suffers from weaknesses [5] that degrade the quality of the neighborhood, such as cold start, sparsity, and rating credibility.

With current technologies, user actions and personal data can be stored on social networks or e-commerce websites [6]. This makes it easier to analyze user preferences, which is useful for recommender systems [7]. Many researchers are interested in community detection in complex networks. Hui et al. [8] identified k-clique communities among students and used this information to design an efficient forwarding scheme for mobile networks. The k-clique method has also been used in social network analysis [9, 10] and introduced into movie recommender systems to improve their accuracy. Ryu et al. [11] predicted the unemployment rate using social media analysis. Hao et al. [12] studied similarity evolution between graphs using a formal concept analysis approach. Carlos et al. [13] proposed fuzzy linguistic recommender systems for the selective diffusion of information in digital libraries.

The main purpose of this paper is to achieve a more effective solution than collaborative filtering. The proposed movie recommendation system is based on the maximal clique method; to our knowledge, this is the first time that this social network analysis method has been used in a movie recommender system, and it is found to be very efficient. The k-clique, a subgraph that is fully connected on k vertices [14], is then proposed as a very effective method for building groups in social network analysis. In the proposed approach, the cosine similarity measure is used to measure similarities between users. Then, k-cliques are used to create clusters, and movies are recommended within similar groups using the collaborative filtering method. The proposed solution offers an improved k-clique method that performs more efficiently than the existing collaborative filtering and maximal clique methods. The result of each experiment depends on the value of k, so to find a more effective solution, an improved k-clique method is proposed. The improved k-clique method determines the optimal value of k, that is, the value that gives the recommender system its maximum accuracy. The best value of k is the one that minimizes the mean absolute percentage error. For performance evaluation, the MovieLens dataset, which is commonly used in movie recommendation systems, is adopted. To assess effectiveness, the MovieLens dataset is divided into experimental and test data, as is widely done in artificial intelligence. Collaborative filtering using k nearest neighbors, the maximal clique method, the k-clique method, and the improved k-clique method are compared to evaluate performance.

The remainder of the article is organized as follows: the “Related work” section presents relevant work in the field. The “An efficient movie recommendation algorithm based on improved k-clique” section describes the proposed methodology in detail. The “Experimental analysis” section presents the experiments; their results are discussed, and the proposed method is compared with the collaborative filtering method using k nearest neighbors and with the maximal clique method. Finally, the “Conclusions” section presents the conclusions and future directions.

Related work

We briefly describe the background needed to explain the proposed improved k-clique method and the existing methods used for performance evaluation.

Recommendation systems

Recommendation systems, often referred to as recommender systems, are algorithms that aim to provide relevant and accurate items to users by filtering useful information from large data sets. A recommender system discovers patterns in the data set by learning about consumers' choices and generates results relevant to their needs and interests [15].

Recommender systems are becoming more popular and are used today in many areas, such as movies, music, news, books, research articles, search queries, social tags, and products in general. There are also recommender systems for experts, collaborators, jokes, restaurants, clothing, financial services, insurance, and more. Most systems use collaborative filtering or content-based filtering to create a list of recommendations [16].

Collaborative filtering

The idea of the collaborative filtering approach is to collect and analyze a large amount of information about user actions and preferences and then predict what users will like based on their similarity to other users [17]. The advantage of collaborative filtering is that it does not rely on content that can be machine-analyzed, so it can accurately recommend complex items. Algorithms such as k nearest neighbors and the Pearson correlation are used to calculate user similarities or item similarities in recommender systems. Another premise of collaborative filtering is the assumption that people who bought in the past will buy in the future and will like the same kinds of products they liked in the past.

When models are built from user actions, differences often occur between the actual and predicted data. One of the most popular examples of collaborative filtering is item-to-item collaborative filtering (“users who bought A also bought B”). The weaknesses of collaborative filtering methods include cold start, scalability, and sparsity. There are two types of collaborative filtering methods: memory-based and model-based collaborative filtering [18].
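To make the memory-based idea concrete, the following is a minimal sketch of user-based collaborative filtering with k nearest neighbors and cosine similarity; the rating matrix and parameter choices are illustrative, not the configuration used in this paper.

```python
# Minimal sketch of user-based collaborative filtering with k nearest neighbors.
import numpy as np

def predict_rating(ratings, user, item, k=3):
    """Predict ratings[user, item] from the k most similar users (cosine similarity).

    `ratings` is a users x items matrix in which 0 marks an unrated item.
    """
    target = ratings[user]
    candidates = []
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue  # skip the target user and users who never rated this item
        denom = np.linalg.norm(target) * np.linalg.norm(ratings[other])
        sim = np.dot(target, ratings[other]) / denom if denom else 0.0
        candidates.append((sim, ratings[other, item]))
    neighbors = sorted(candidates, reverse=True)[:k]       # k most similar raters
    weight = sum(s for s, _ in neighbors)
    if weight == 0:
        return None                                        # cold start: no usable neighbors
    return sum(s * r for s, r in neighbors) / weight       # similarity-weighted average

# Illustrative 4-user x 4-item rating matrix
ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [1, 0, 4, 4]], dtype=float)
print(predict_rating(ratings, user=0, item=2, k=2))        # predicted rating for item 2
```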

k-clique detection in social networks

A k-clique is a complete graph with k nodes. Complex networks usually contain many complete subgraphs of different sizes. In general, the value of k is at least 3; if k equals 2, the clique is just a single edge, which has little practical meaning.

As an example, Fig. 1 shows an undirected graph G consisting of eight nodes and their relationships.

Fig. 1 An example of k-clique in graph G

Evidently, the nodes C, D, E, and F form a 4-clique, since these nodes are all connected to each other. Similarly, the nodes D, F, and G form a 3-clique.
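The following short sketch enumerates k-cliques with the networkx library; the edge list is an assumed reconstruction consistent with the description of Fig. 1 (C, D, E, F form a 4-clique and D, F, G a 3-clique), not the exact graph of the figure.

```python
# Enumerating k-cliques with networkx on an assumed reconstruction of graph G.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("A", "B"), ("B", "C"),                      # assumed connections for the other nodes
    ("C", "D"), ("C", "E"), ("C", "F"),
    ("D", "E"), ("D", "F"), ("E", "F"),          # the 4-clique on C, D, E, F
    ("D", "G"), ("F", "G"),                      # the 3-clique on D, F, G
    ("G", "H"),
])

def k_cliques(graph, k):
    """Return every complete subgraph (clique) on exactly k nodes."""
    return [c for c in nx.enumerate_all_cliques(graph) if len(c) == k]

print(k_cliques(G, 3))   # five 3-cliques, e.g. the one on D, F, G
print(k_cliques(G, 4))   # one 4-clique, on C, D, E, F
```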

To obtain improved accuracy, various researchers have presented new methods. Here, the k-clique method from social network analysis is applied to the movie recommendation system in order to improve its accuracy.

There has been some theoretical and empirical work on how k-cliques can be detected in social networks. Hao et al. [10] used k-clique mining in dynamic social networks. Hao et al. [14] applied the k-clique method to social networks. Hao et al. [19] used a k-clique method based on formal concept analysis for community detection in social networks. Gregori et al. [20] used a parallel k-clique approach for community detection in large-scale networks. Palla et al. [21] used CFinder to determine k-clique communities and subsequently extracted a set of k-clique communities. Kumpula et al. [22] proposed the sequential clique percolation algorithm to improve detection efficiency. CFinder is a tool for finding overlapping clusters in biological graphs based on k-cliques [23]. Saito et al. [24] presented k-dense, an efficient method for extracting communities from complex networks. Adamcsek et al. [23] provided a faster CFinder for finding k-cliques. Farkas et al. [25] introduced a clustering algorithm for weighted network modules using the k-clique method, since earlier k-clique methods did not consider weighted graphs. The edge weights of the discovered k-cliques were measured for their intensity; when the intensity value was below a threshold, the clique was not considered when building a cluster, and vice versa [25]. Duan et al. [26] solved the problem of k-clique clustering in dynamic social networks.

Cosine similarity measure

Cosine similarity is a similarity function often used in information retrieval; it is one of the most popular similarity measures applied to text documents, in numerous information retrieval applications [27] as well as in clustering [28]. When documents are represented as vectors, the similarity of two documents corresponds to the correlation between their vectors, quantified as the cosine of the angle between them. Given two documents \(\vec{t}_a\) and \(\vec{t}_b\), their cosine similarity is

$$SIM_{C}\left(\vec{t}_{a}, \vec{t}_{b}\right) = \frac{\vec{t}_{a} \cdot \vec{t}_{b}}{\left|\vec{t}_{a}\right| \times \left|\vec{t}_{b}\right|}$$
(1)

where \(\vec{t}_a\) and \(\vec{t}_b\) are m-dimensional vectors over the term set T = {t1,…, tm}. Each dimension represents a term weight in the document, which is non-negative and bounded in [0, 1]. For example, combining two identical copies of a document d into a new pseudo-document d′ gives a cosine similarity between d and d′ of 1, meaning that the two documents are regarded as identical. At the same time, given another document l, d and d′ will have the same similarity value with respect to l, that is, \(SIM(\vec{t}_d, \vec{t}_l) = SIM(\vec{t}_{d'}, \vec{t}_l)\). In other words, documents with the same composition but different totals are treated identically. When the term vectors are normalized to unit length, the representations of d and d′ are the same.
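A minimal sketch of Eq. (1) applied to two term-weight vectors is given below; the vectors are illustrative and also demonstrate that scaling a document's weights does not change its cosine similarity.

```python
# A direct implementation of Eq. (1) with illustrative vectors.
import numpy as np

def cosine_similarity(a, b):
    """SIM_C(a, b) = (a . b) / (|a| * |b|)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

d       = [0.2, 0.0, 0.5, 0.3]
d_prime = [0.4, 0.0, 1.0, 0.6]        # same composition, doubled weights
print(cosine_similarity(d, d_prime))  # about 1.0: scaling the weights does not change the cosine
```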

Maximal cliques

A maximal clique is a special clique that cannot be extended by adding any other node [29]. The phrase “maximal clique” is usually used in terms of a subgraph of a given graph G. A subgraph H of a graph G is a maximal clique in G if H is isomorphic to a complete graph and there is no vertex v ∈ V(G) ∖ V(H) such that v is adjacent to every vertex of H.

In other words, a subgraph H of a graph G is a maximal clique in G if H is a clique (there is an edge between every pair of vertices in H) and there is no vertex in G but not in H that sends an edge to every vertex of H; so a bigger clique in G cannot be created by adding another vertex to H, as shown in Fig. 2 below.

Fig. 2 An example of maximal cliques

The red subgraph of the first graph is not a clique because two of its vertices are not connected by an edge. The red subgraph of the second graph is a clique, but because a vertex in the larger graph is connected to all three vertices of the subgraph, it is not a maximal clique. The red subgraph of the third graph is a maximal clique because it is a clique and the remaining vertex not included in the subgraph does not send an edge to every vertex in the subgraph. The red subgraph of the fourth graph is a maximal clique because it is a clique and none of the vertices outside the subgraph send an edge to every vertex in the subgraph. Note that the third and fourth red subgraphs are both maximal cliques even though the third red subgraph is larger.
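The sketch below enumerates maximal cliques with networkx's Bron-Kerbosch implementation (find_cliques); the example graph is illustrative and is not the graph of Fig. 2.

```python
# Enumerating maximal cliques with networkx's find_cliques (Bron-Kerbosch).
import networkx as nx

G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"),   # triangle A-B-C
              ("B", "D"), ("C", "D"),               # B, C, D form another triangle
              ("D", "E")])                           # pendant edge

for clique in nx.find_cliques(G):
    print(sorted(clique))
# Expected maximal cliques: ['A', 'B', 'C'], ['B', 'C', 'D'], ['D', 'E']
```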

An efficient movie recommendation algorithm based on improved k-clique

The process of gathering the necessary information and the workflow of the recommendation system are shown in Fig. 3, which illustrates the movie recommendation system using the k-clique method.

Fig. 3 Flow chart of proposed recommendation system

Process 1 in Fig. 3: we collect some necessary personalization information about users. The personalization information used here includes gender, age, and occupation. A new user needs to sign up to the system and provide this personalization information.

Process 2 in Fig. 3: we use the personalization information from the MovieLens data to make up the experimental data and the test data. The experimental data is used to calculate the similarity between users; the similarities between users are measured using the cosine similarity algorithm. At the end of this process, the adjacency matrix of user similarity is created, as shown in Table 1.

Table 1 Adjacency matrix of user relationships
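A minimal sketch of how such a 0/1 adjacency matrix could be built from the personalization attributes is given below; the one-hot encoding of age, gender, and occupation and the 0.6 similarity threshold are assumptions for illustration, not the paper's exact preprocessing.

```python
# Sketch of building the 0/1 user-similarity adjacency matrix of Process 2.
import numpy as np

# Assumed one-hot encoding of (age group, gender, occupation) per user
users = np.array([
    [1, 0, 1, 0, 1, 0, 0],   # user 0: young, male, student
    [1, 0, 1, 0, 1, 0, 0],   # user 1: young, male, student
    [1, 0, 1, 0, 0, 1, 0],   # user 2: young, male, engineer
    [0, 1, 0, 1, 0, 0, 1],   # user 3: older, female, artist
], dtype=float)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return np.dot(a, b) / denom if denom else 0.0

threshold = 0.6                                     # assumed cut-off for "similar"
n = len(users)
adjacency = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(i + 1, n):
        if cosine(users[i], users[j]) >= threshold:
            adjacency[i, j] = adjacency[j, i] = 1   # 1 means "similar", as in Table 1
print(adjacency)
```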

Process 3 in Fig. 3: as previously described, the adjacency matrix shows the relationships between users. If the value for two users is 1, they have similar characteristics; otherwise, they are not considered similar. The adjacency matrix is used to cluster users into several groups based on the k-clique method. After this process is completed, several clusters of users are obtained.
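One concrete way to perform this step is clique percolation (k-clique communities) as implemented in networkx; whether this matches the paper's exact grouping rule is an assumption, so the sketch below is only illustrative.

```python
# Sketch of Process 3 using clique percolation (k-clique communities) from networkx.
import networkx as nx
import numpy as np
from networkx.algorithms.community import k_clique_communities

# A small 0/1 user-similarity matrix of the kind produced in Process 2
adjacency = np.array([[0, 1, 1, 0, 0],
                      [1, 0, 1, 1, 0],
                      [1, 1, 0, 1, 0],
                      [0, 1, 1, 0, 1],
                      [0, 0, 0, 1, 0]])
G = nx.from_numpy_array(adjacency)

k = 3
clusters = [set(c) for c in k_clique_communities(G, k)]
print(clusters)   # [{0, 1, 2, 3}]: users joined through overlapping 3-cliques; user 4 is left out
```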

Process 4 in Fig. 3: after the clusters of users are found, we need to find the cluster most similar to the new user. In this process, the personalization information of the new user is compared with the personalization information of the users in each group, using the cosine similarity method to calculate the similarity between them. After the comparison, the number of similar users in each group is counted, and the group with the highest count is chosen as the most similar cluster for the new user, as shown in Table 2; the detail of the process is shown in Fig. 4.

Table 2 List of the users in the group that the new user belongs to
Fig. 4 The process of matching the new user to a suitable cluster
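A minimal sketch of this matching step (Process 4) is shown below; the one-hot user features, the similarity threshold, and the cluster contents are illustrative assumptions.

```python
# Sketch of Process 4: choose the cluster holding the most users similar to the new user.
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return np.dot(a, b) / denom if denom else 0.0

def most_similar_cluster(new_user, clusters, features, threshold=0.6):
    """Return the index of the cluster containing the most users similar to `new_user`."""
    counts = [sum(cosine(new_user, features[m]) >= threshold for m in members)
              for members in clusters]
    return int(np.argmax(counts))

# Assumed one-hot (age group, gender, occupation) vectors
features = {0: np.array([1., 0., 1., 0., 1., 0., 0.]),     # young, male, student
            1: np.array([1., 0., 1., 0., 0., 1., 0.]),     # young, male, engineer
            2: np.array([0., 1., 0., 1., 0., 0., 1.]),     # older, female, artist
            3: np.array([0., 1., 1., 0., 0., 0., 1.])}     # older, male, artist
clusters = [{0, 1}, {2, 3}]
new_user = np.array([1., 0., 1., 0., 0., 1., 0.])          # young, male, engineer
print(most_similar_cluster(new_user, clusters, features))  # 0: the first cluster is most similar
```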

Process 5 in Fig. 3: when the group most similar to the new user is found, the movies rated by the group members are arranged in order of popularity: movies rated nearest the maximum score are listed at the top, and movies rated nearest the minimum score at the bottom. The top 15 movies with the highest ratings are then recommended to the new user, as shown in Table 3. In Table 3, the first column shows the id of the new user and the second column shows the id of the movie.

Table 3 List of movies recommended to the new user
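A short sketch of the ranking step in Process 5 is given below; the (user, movie, rating) triples are illustrative data, not MovieLens records.

```python
# Sketch of Process 5: rank the movies rated by the chosen group by mean rating.
from collections import defaultdict

group_ratings = [                 # illustrative (user_id, movie_id, rating) triples
    (7, 50, 5), (7, 181, 4), (12, 50, 4), (12, 100, 3),
    (31, 181, 5), (31, 100, 2), (31, 50, 5),
]

def top_movies(ratings, n=5):
    sums, counts = defaultdict(float), defaultdict(int)
    for _, movie, rating in ratings:
        sums[movie] += rating
        counts[movie] += 1
    mean = {movie: sums[movie] / counts[movie] for movie in sums}
    return sorted(mean, key=mean.get, reverse=True)[:n]   # best-rated movies first

print(top_movies(group_ratings, n=2))   # [50, 181]
```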

Process 6 in Fig. 3: the top 5 movies on the list are recommended to the new user. It is then up to the new user to decide which of the recommended movies to choose.

The details of the k-clique algorithm are shown below.

figure a

We now describe the improved k-clique method. Its workflow is similar to that of the k-clique method shown in Fig. 3. We performed various experimental runs with the k-clique method in order to arrive at the improved k-clique method, using these runs to determine the most appropriate grouping and recommendation count. First, we ran an experiment on a dataset in which the users were randomly selected from the list of users who rated at least 20 movies; we clustered the users into several groups using values of k from 3 to 14 and recommended 5, 10, and 15 movies to each user. Afterwards, we repeated the experiment on datasets in which the users were randomly selected from the lists of users who rated at least 50, 100, and 200 movies, respectively, again recommending 5, 10, and 15 movies. We performed 10 runs to calculate the accuracy and used the mean value as the accuracy value.
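The search over k and over the recommendation count RM can be summarized by the runnable sketch below; the `evaluate` callable stands in for Processes 2 to 5 followed by Eq. (2), and the dummy evaluation function is only a placeholder shaped after the reported trend, not real experimental code.

```python
# Condensed sketch of the improved k-clique idea: sweep candidate settings and
# keep the (k, RM) pair with the smallest mean absolute percentage error.
def best_setting(evaluate, k_values=range(3, 15), rm_values=(5, 10, 15)):
    best = None
    for k in k_values:
        for rm in rm_values:
            score = evaluate(k, rm)                  # mean MAPE for this (k, RM)
            if best is None or score < best[0]:
                best = (score, k, rm)
    return best                                      # (minimum MAPE, best k, best RM)

def dummy_evaluate(k, rm):
    # Placeholder error surface with its minimum near k = 11, RM = 5; a real
    # evaluation would run the clustering and recommendation pipeline instead.
    return abs(k - 11) + 0.5 * (rm - 5) + 18.85

print(best_setting(dummy_evaluate))   # (18.85, 11, 5)
```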

The details of the improved k-clique algorithm are shown below.

figure b

Experimental analysis

The experimental set-up is used to validate the performance of the existing approaches and the proposed approach. The MovieLens data is used for performance evaluation. The collaborative filtering method using k nearest neighbors, the maximal clique method, the k-clique method, and the improved k-clique method are examined on the MovieLens data in order to evaluate their performance.

Experimental set-up

The hardware and software used to implement the proposed method are shown in Table 4.

Table 4 Resources used for experimental set-up

R is a leading tool for machine learning, statistics, and data analysis, as well as a programming language. R can be used to create objects, functions, and packages. The R language is platform-independent, so it can be used on any operating system, and its installation is free, so it can be used without purchasing a license. It is not only a statistical package; it is also open source, which means that anyone can examine the source code to see exactly what is being done, and anyone can add features and fix bugs without waiting for a vendor to do so. It also allows integration with other languages (C, C++) and enables interaction with many data sources and statistical packages (SAS, SPSS). R has a large and growing community of users, and its strengths lie in its use in academia, data wrangling, data visualization, machine learning, and its wide availability.

Datasets

The proposed approach is implemented using the MovieLens dataset [30], which is the most commonly used data in movie recommendation systems. This dataset is divided into experimental data and test data, each of which consists of 10 random subsets of users who rated at least 20 movies, at least 50 movies, at least 100 movies, and at least 200 movies, respectively. There are 800 users in the experimental data, 143 users in the test data, and 100,000 ratings from 943 users on 1684 movies. Simple demographic information for users includes age, gender, and occupation. Details of the dataset are given below.

The testing dataset is made up of a list of users' personalization information records. The users in the testing dataset are a random selection of users from the MovieLens dataset. The number of users in the testing dataset is 143, with each user's data consisting of three categories of personalization information: age, gender, and occupation. The experimental dataset is likewise made up of a list of users' personalization information records. The users in the experimental dataset are the users remaining after the random selection from the MovieLens dataset. The number of users in the experimental dataset is 800, with each user's data consisting of the same three categories of personalization information: age, gender, and occupation.

Experimental result

The results of the experiments depend on the value of k, as shown in Fig. 5.

Fig. 5 The result of the experiments

After developing the proposed movie recommendation system using the improved k-clique method, we predicted the number of movies that the new user would rate among the movies recommended by the system. This paper adopts the most widely used evaluation metric for performance comparison of recommendation systems: the mean absolute percentage error (MAPE), a measure of the prediction accuracy of a forecasting method in statistics, defined by the formula [31,32,33]:

$$\text{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_{t} - F_{t}}{A_{t}}\right|$$
(2)

where \(A_{t}\) is the actual value and \(F_{t}\) is the forecast value.
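A minimal sketch of Eq. (2) is shown below; the sample values are illustrative, not the experimental data.

```python
# Direct implementation of Eq. (2) with illustrative actual and forecast values.
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))

print(mape([5, 4, 3, 5], [4, 4, 4, 5]))   # about 13.33 (percent)
```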

We compared the MAPE values of the k-clique method, the maximal clique method, and collaborative filtering using k nearest neighbors to evaluate the performance of the proposed method; a smaller MAPE indicates a more useful method. MAPE is calculated using Eq. (2). For consistency of performance, 10 replications were performed for each trial and the mean values were taken.

First, we calculated the MAPE for the proposed method. The MAPE value is 18.85% (see below) when RM = 5, users rated at least 200 movies, and k = 11.

$$\text{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_{t} - F_{t}}{A_{t}}\right| = 18.85\%$$
(3)

The detailed MAPE values for the improved k-clique method are shown in Fig. 5 above. Figure 5 shows that when the number of recommended movies is RM = 5, the minimum MAPE is 18.85%, obtained with k = 11 and users who rated at least 200 movies; when RM = 10, the minimum MAPE is 20.37%, with k = 10 and at least 200 rated movies; and when RM = 15, the minimum MAPE is 21.41%, with k = 11 and at least 200 rated movies. Therefore, the best number of recommended movies is RM = 5.

Second, we calculated the MAPE for the maximal clique method. The MAPE value is 28.63% (see below) when RM = 5 and users rated at least 200 movies.

$$\text{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_{t} - F_{t}}{A_{t}}\right| = 28.63\%$$
(4)

The detailed MAPE values for the maximal clique method are shown in Fig. 6 below. Figure 6 shows that when RM = 5 the minimum MAPE is 28.63%, when RM = 10 the minimum MAPE is 30.29%, and when RM = 15 the minimum MAPE is 32.00%, in each case for users who rated at least 200 movies. Therefore, the best number of recommended movies is RM = 5.

Fig. 6 Mean absolute percentage error using maximal cliques

Third, we calculated the MAPE for collaborative filtering using k nearest neighbors. The MAPE value is 19.69% (see below) when RM = 5 and users rated at least 200 movies.

$$\text{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_{t} - F_{t}}{A_{t}}\right| = 19.69\%$$
(5)

The detailed MAPE values for collaborative filtering using k nearest neighbors are shown in Fig. 7 below. Figure 7 shows that when RM = 5 the minimum MAPE is 19.69%, when RM = 10 the minimum MAPE is 22.44%, and when RM = 15 the minimum MAPE is 24.09%, in each case for users who rated at least 200 movies. Therefore, the best number of recommended movies is RM = 5.

Fig. 7 Mean absolute percentage error of collaborative filtering using k nearest neighbors

Finally, we compared the mean MAPE values of the improved k-clique method, the k-clique method, collaborative filtering using k nearest neighbors, and the maximal clique method. As shown in Fig. 8 below, the movie recommendation system based on the improved k-clique method is more accurate and efficient than the systems based on the k-clique method, collaborative filtering using k nearest neighbors, and the maximal clique algorithm. Of the four methods, our method performs the best.

Fig. 8 Comparative results of the improved k-clique, k-clique, kNN-CF, and maximal clique methods

Conclusions

In order to achieve greater accuracy than collaborative filtering methods, the maximal clique method from social network analysis was introduced; to our knowledge, this paper is the first to use it in a movie recommendation system, and its output is very effective. To achieve still greater accuracy, the k-clique method, which is very effective in social network analysis, was introduced in this experiment, and the results showed it to be more effective than the maximal clique method. This paper therefore also proposed an improved k-clique method in order to find an even more efficient method than the k-clique method. Finally, after several experiments, in terms of the mean absolute percentage error (calculated as the mean values shown in Fig. 5), the best result was obtained when k = 11 and users rated at least 200 movies, with five movies recommended to the user.

For performance evaluation, we evaluated the collaborative filtering method using k nearest neighbors, the maximal clique method, the k-clique method, and the improved k-clique method. The results showed that the improved k-clique method improved the precision of the movie recommendation system more than the other methods used in this study.

At present, the k-clique method takes a long time to compute. In future studies, we will investigate ways to shorten this time, and data mining methods will be combined with the improved k-clique method to further increase the accuracy and effectiveness of the movie recommendation system.