1 Introduction

TikTok (Footnote 1), also known as Douyin, is a social network that allows its users to make funny and creative videos of short duration, typically 15 to 60 sec. It has quickly become the social network of choice for several categories of users, especially for the so-called Generation Z (Singh and Dangmei 2016). Several features distinguish TikTok from other social networks. They include: (i) the HD resolution and full screen display; (ii) the presence of advanced video editing features; (iii) the possibility of adding a music clip to a posted video; (iv) FYP and the associated recommendation algorithm; (v) a much higher prevalence of challenge-related posts than in the other social networks. In particular, the last two features are highly characteristic of TikTok.

FYP is the acronym of For You Page. It is the first page shown to a user who opens TikTok. The videos it contains are determined by a recommender system very different from the one of the other social networks. It starts from the consideration that each person’s profile is unique and tailored to her. The system recommends content by ranking videos based on a combination of factors that reflects what the user is interested in and what she is not. These factors include her interactions, video information, device and account settings. All these factors are processed by the recommender system in order to rank videos to show. Interestingly, unlike many other social platforms, neither the follower count of the user posting the video nor whether she had previously high-performing videos are direct factors in the recommendation algorithm. Ultimately, FYP is powered by user feedback; it is designed to continuously improve, correct and learn from user engagement with TikTok. FYP and its recommendation algorithm are crucial to the distribution of challenges on TikTok.

A challenge is a viral showdown/competition. It is identified by a hashtag and starts with a user who posts a video with that hashtag and invites other ones to replicate the same video in their own way. Most challenges are fun and harmless; however, there are also other ones related to harmful or dangerous behaviors. Actually, the latter kinds of behaviors existed before TikTok and have been inherited by the challenges of this social medium, in particular by what we call “dangerous challenges”. Harmful or dangerous behaviors have become so common in social media that they have recently attracted the interest of many researchers (Lawson 2018; Fritz and Gonzales 2018; Wood et al. 2019; Page et al. 2018; Linabary and Corple 2019; Mitchell et al. 2014; Lerman et al. 2017; Guan et al. 2015; Zhu et al. 2016; Pater and Mynatt 2017; Robert et al. 2015). Further analyses have been carried out to assess the role of social platforms with regard to sexual or aggressive behaviors that place youth at risk (Livingstone and Smith 2014). Finally, several researchers have investigated online self-injury and cyber-suicide (Patton et al. 2014; Muñoz-Sánchez et al. 2018; Khasawneh et al. 2020). TikTok removes challenges reported as dangerous and has increased safety controls. However, considering the huge number of users and challenges created every day on this social platform, as well as the usage of some tricks exploited by the authors of dangerous challenges to bypass controls, the risk that dangerous challenges are accessible is real.

In this paper, we want to make a contribution to address this problem and propose an investigation of TikTok challenges. In particular, we analyze their lifespans to extract time patterns that allow the classification of challenges into dangerous and non-dangerous ones. By the term “lifespan” we do not mean the time interval between the moment a challenge is launched and the moment it disappears permanently. In fact, there are challenges that never disappear even though they have not been active for a long time. From our point of view, the lifespan of a challenge is the period that elapses from the time it is launched to the time it is no longer capable of eliciting at least limited interactions with users. As will be clear in the following, the classification approach we are proposing in this paper is currently able to support the detection of dangerous challenges only near the end of their lifespan, or at least after a presumably long time period. Indeed, the early detection of dangerous challenges is not the objective of this paper. Rather, we want to propose a challenge classification approach that, once its validity has been verified, represents a first step in the direction of early detection of dangerous challenges. To reach the latter goal, in the future, we can think of adopting much finer time intervals than the ones taken into account here (which, as we will see, are currently coarse), in such a way as to identify the time patterns allowing the detection of dangerous challenges at an early stage.

Despite its young age, TikTok has already been the subject of many studies in the literature. However, although challenges are one of the most important aspects of TikTok, only very few authors have analyzed them so far (Zulli and Zulli 2020; Klug 2020; Chen et al. 2021b; Su et al. 2020). Furthermore, these authors investigated aspects very different from the challenge lifespan and the ability to distinguish non-dangerous challenges from dangerous ones, which represent the core of our paper.

To perform our analysis, we followed the evolution of seven non-dangerous and seven dangerous challenges. For each challenge, we considered the corresponding videos posted, and, for each video, we considered a set of features (e.g., duration, number of likes received, number of followers of its authors, etc.). Next, we defined a social network-based model to represent a TikTok challenge. At this point, we began our analysis to find features capable of distinguishing non-dangerous challenges from dangerous ones. First, we focused on the characteristics of the videos and the basic structural parameters of social networks (for example, number of nodes, average clustering coefficient, density, etc.). Then, we considered the challenge lifespans and could see that the two types of challenges showed very different lifespans.

In order to capture such a difference, we divided each lifespan into suitable intervals. After that we performed a clustering activity to group intervals into homogeneous clusters. To define the characteristics of each cluster, we used the properties of the videos and social networks corresponding to the challenges, which the cluster’s intervals referred to. Then, we defined the sequence of intervals that characterized the lifespan of each challenge. From the examination of such sequences, after a further study aimed at demonstrating that some clusters were substantially equivalent to each other, we were able to determine a time pattern that characterized all the non-dangerous challenges and three time patterns that could be found in the dangerous ones. Finally, we verified if what we had found with the 14 initial challenges was valid in general. To do this, we performed two further tests with a much higher number of challenges and were able to verify that our results were very accurate also in this case.

This paper is structured as follows: In Sect. 2, we review the related literature. In Sect. 3, we illustrate the dataset used for our investigation. In Sect. 4, we describe our social network-based model for representing TikTok challenges. In Sect. 5, we present a structural analysis of the social networks associated with challenges. In Sect. 6, we define the intervals of a challenge lifespan. In Sect. 7, we illustrate our approach to search for time patterns characterizing challenge lifespans. In Sect. 8, we describe the current limitations of our approach, which represent the starting point for our future research in this field. Finally, in Sect. 9, we draw our conclusions and examine some possible future developments of our research efforts.

2 Related literature

Similar to what happened in the past with other social media, such as Instagram, Facebook (Pensa et al. 2019), Twitter (Davidson et al. 2020; Boutet et al. 2013), Yelp and Reddit (Cauteruccio et al. 2020; Corradini et al. 2021b; Cauteruccio et al. 2022), TikTok has recently attracted the interest of researchers from different fields (Stokel-Walker 2020). For instance, it has been the subject of investigation by researchers working in the context of marketing (Choudhary et al. 2020), social network analysis, machine learning and deep learning, health and politics (Zhu et al. 2020; Li et al. 2021; Chen et al. 2021a; Medina Serrano et al. 2020; Lujain et al. 2020; Sodani and Mendenhall 2021), just to cite a few of these fields.

Being considerably popular among teenagers (Herrman 2019), TikTok has led to the emergence of new types of influencers (Kennedy 2020). Many people have become influencers in this social medium without even planning to do so. The authors of De Veirman et al. (2020) studied how teenage influencers perceive the process of becoming an influencer and how they perceive the impact they can have on other users through their social media activities. In Ishihara and Oktavianti (2020), the authors focus on “personal branding”, i.e., the process of creating a brand from a person’s profile. In Azpeitia (2021), the authors wanted to understand whether social media marketers successfully reach their target audience and generate sales in TikTok.

In a successful social medium like TikTok, many privacy and security issues need to be addressed (Neyaz et al. 2020; Khoa et al. 2020; Meral 2021).

Clearly, TikTok has greatly stimulated the interest of researchers working in the context of Social Network Analysis. For example, in Weimann and Masri (2020), the authors analyze the behavior of TikTok with respect to extremist users.

Another aspect of TikTok that has caught the attention of researchers concerns its recommendation algorithm (Davis 2021; Xu et al. 2019; Simpson and Semaan 2021). In fact, when a user scrolls through her home page, TikTok suggests some videos for her to watch. The authors of Zhao (2021) show that TikTok’s recommendations are based on user preferences and needs. This content-based component of the algorithm is complemented by a second, collaborative filtering one. The authors of Klug et al. (2021) investigate the principles behind this recommendation algorithm and find some expedients to trick it into suggesting certain trends to other users. They found that some of the aspects taken into consideration by TikTok’s recommendation algorithm are hashtags, time of posting and user engagement. In Bandy and Diakopoulos (2020), the authors explore the role of TikTok’s recommendation algorithm in amplifying call-to-action videos promoting collective action against the Tulsa rally. They show how these videos, promoted by 600 TikTok users, received more views than other ones not suggested by TikTok’s recommendation algorithm.

Some authors have focused on developing and/or applying machine learning and deep learning algorithms to understand the dynamics of TikTok. For example, the authors of Yang et al. (2021) developed an algorithm to predict the effects of influencer advertising on product sales. The authors of Zulli and Zulli (2020) analyze how TikTok challenges encourage the principle of imitation. To this end, they leverage the concept of memes and theorize the one of “imitation publics”. In Klug (2020), the author analyzes the strategies behind the creation of a video for a challenge. For this purpose, he studies the #distantdance challenge in depth. In Chen et al. (2021b), the authors investigate the processes by which challenges can influence TikTok users. Finally, the authors of Su et al. (2020) investigate how TikTok challenges can be exploited to spread particular messages among the users of this social platform.

Social media have provided enormous opportunities for communication and knowledge seeking (Zhang and Wang 2013). However, they also present many risks, including cyberbullying, dangerous contacts with strangers, pornography, self-harm activities, and even suicide (Livingstone and Smith 2014; Smahel et al. 2020). This is a topic much debated by researchers who study human behavior in social platforms. In the past, these authors have identified a wide range of “dangerous behaviors”, such as: (i) harassing, discriminating (Lawson 2018; Fritz and Gonzales 2018), doxing (Wood et al. 2019) and socially disenfranchising vulnerable individuals (Page et al. 2018; Linabary and Corple 2019; Mitchell et al. 2014); (ii) stimulating suicidal tendencies and depressive symptoms among adolescents and young adults (Mitchell et al. 2014; Lerman et al. 2017; Guan et al. 2015); (iii) stimulating adolescents and young adults to engage in self-harming behavior (Zhu et al. 2016; Pater and Mynatt 2017; Robert et al. 2015); (iv) stimulating social and aggressive behaviors; (v) stimulating online non-suicidal self-injury; (vi) discussing acts of self-harm and of cyber-suicide (Patton et al. 2014; Muñoz-Sánchez et al. 2018; Khasawneh et al. 2020). Such behaviors have been inherited by what we call “dangerous challenges” in TikTok. In this context, one example that has received great attention in recent years is the Blue Whale challenge (Mukhra et al. 2019). It involved mainly vulnerable teenagers, across different social media, like Facebook and Twitter. It consisted of a series of games that ended with the “death” of the participant. The Blue Whale challenge is one of several dramatic examples of harmful behavior in online social platforms. In fact, this phenomenon is so relevant, especially among adolescents, that it has prompted many researchers to investigate it.
For example, in Hilton (2017), the author explores self-harm behaviors on Twitter, which are a global concern for health and social care practice. She analyzed 362 Twitter messages and saw that self-harm behaviors are largely misunderstood by the general public and are often treated in the wrong way.

The authors of Mitchell et al. (2014) analyze 12-month prevalence rates of youth exposure to websites that encourage self-harm or suicide. They aim to examine whether such exposure is related to thoughts of self-harm and suicide over the past 30 days. Their experiment shows that young people who visited these websites are seven times more likely to say they have thought of hurting or killing themselves. In Duggan et al. (2012), the authors focus on the problem of self-injury. In particular, they point out that non-suicidal self-injury (NSSI) is an increasing concern for mental health professionals working with youth. This type of behavior has grown considerably in the last decade, especially in Internet media. The authors examine the scope and nature of NSSI content in informational and interactive websites and in social networking platforms.

Several authors in the past have been interested in Internet challenges that have led to harmful behavior, especially among young people. For example, the authors of Ortega-Baron et al. (2022) propose an exploratory study on Internet challenges in preadolescents. In particular, they propose a Viral Internet Challenge Scale (VICH-S) to assess this phenomenon. They also analyze some psychometric properties of this scale, such as the types of challenges (e.g., dangerous) and their performance. They found that 7.7% of online challenges are dangerous.

Like all other social platforms, TikTok hosts both positive and negative behaviors. For example, in Zeng et al. (2020) the authors analyze the role of TikTok in stimulating science memes. In Hayes et al. (2020), the authors illustrate the use of TikTok in facilitating scientific public engagement and contextualization of chemistry. A report on the role of TikTok as a widely used source of information on popular culture, as well as on other issues, and even news, can be found in Newman et al. (2020).

Alongside these examples of positive behavior, researchers have also studied instances of negative behavior. One such example involves the Blackout challenge (Tan and Wegmann 2021), also known as the Choking challenge or Pass-out challenge. It encourages users to hold their breath until they pass out due to a lack of oxygen. Several children aged 12 and under have already died after attempting this challenge. In fact, a low oxygen supply to the brain for over 3 min can cause brain damage, while a low oxygen supply to the brain for over 5 min can result in death. A major problem with this challenge is that children may not understand the dangers of this activity.

In Roth et al. (2022), the authors analyze the participation of adolescents in TikTok challenges, as well as the potential impact that the latter exert on them. Their results show that the participation in these challenges makes adolescents feel confident thanks to the obtained views and likes. In Khasawneh et al. (2021), the authors mention the Cinnamon challenge, which requires participants to ingest spoonfuls of ground cinnamon powder without any liquid. Other challenges have prompted adolescents to commit crimes, such as stealing something at school and posting an incriminating video online (Marples 2021). In Atherton (2021), the author reports that a 14-year-old girl was hospitalized because of the Nutmeg challenge. This challenge involves participants ingesting a spoonful of ground nutmeg mixed with water, which produces a hallucinogenic high similar to that of LSD. In Minhaj and Leonard (2021), the authors describe the Benadryl challenge, which resulted in the death of at least one 15-year-old girl. This challenge encourages participants to ingest large amounts of diphenhydramine to get high and record their response. This led to numerous cases of diphenhydramine poisoning, with the first case reported in May 2020. Due to its dangerousness and the increase of serious cases, the US Food and Drug Administration issued a warning in September 2020.

Our paper is focused on TikTok challenges, in particular on their lifespan and the possibility of identifying time patterns capable of distinguishing non-dangerous challenges from dangerous ones. To the best of our knowledge, it is the first paper that addresses this issue. To achieve its goals, it uses an articulated set of concepts and techniques derived from Social Network Analysis (Corradini et al. 2021a, 2020; Tsvetovat and Kouznetsov 2011), Data Mining (Cassavia et al. 2017; Han et al. 2011) and Statistics (Bruce et al. 2020). In particular, it constructs a suitable social network for each challenge and uses several parameters and concepts from Social Network Analysis to characterize it. Furthermore, it borrows clustering techniques and Principal Component Analysis from Data Mining to build a first rough version of time patterns. Finally, it uses the classic t-test (Bruce et al. 2020), Bartlett’s test (Bartlett 1935) and Welch’s t-test (Bruce et al. 2020) to test some hypotheses that allow the refinement of the time patterns previously built.

3 Dataset description

As specified in the Introduction, the first step of our research consisted in building the dataset for our experiments. Indeed, to the best of our knowledge, there was no dataset of TikTok challenges already available and suitable for our goals.

To construct this dataset, we first considered a period of interest for our challenge analysis. The choice fell on the period January 2018–April 2021 as it encompassed the most recent challenges and was sufficiently extensive. Among the challenges whose lifespan spanned this period, we considered those mentioned most frequently on Google News. From them, we had to exclude the extremely dangerous ones, already removed by TikTok, since it would have been impossible to recover their data (see below for details). Finally, among the challenges still available, we chose some that we could assume had been highly recommended in TikTok. With regard to this, as seen in the Introduction, the recommender system underlying a user’s FYP in TikTok depends heavily on her past behavior and returns highly personalized results, which vary rapidly over time. All this makes it impossible to determine with certainty which challenges have been most recommended. Furthermore, TikTok does not publicly provide detailed information about them (e.g., how many times a challenge has been recommended, its level of popularity in various countries, etc.). However, we assumed that if a challenge has many views, receives many likes and comments and has many videos associated, then it has been seen by many users and is popular. We chose challenges based on this assumption. Clearly, ours is an assumption and not an objective and incontrovertible criterion. Therefore, it is prone to sample bias. However, we believe that, with the limitations on the information made available by TikTok mentioned above, any sampling choice we made would not have eliminated this risk. Our choice was aimed at reducing it by adopting criteria and indicators that seemed reasonable to us.

Among the available challenges, we selected seven “non-dangerous” and seven “dangerous” ones. These last challenges, besides complying with all the previous constraints, meet an additional one, that is the fact that all the news that mentioned them judged them “dangerous”. Before continuing with the discussion, some considerations on the concept of “dangerous challenges” are in order. First of all, as specified in the Introduction, dangerous challenges can be considered as a particular case, related to TikTok, of harmful or dangerous behaviors in social media. Regarding this concept, we must point out that the definition of “dangerous” is not necessarily objective, nor can it be taken-for-granted as widely accepted. It is also adult-centric since the people who talk about it are almost always adults. Clearly, the aim of this paper is not to propose a scientific and systematic treatment of dangerous behavior in social media. It is up to experts of human behavior, some of whom have been cited above. Our goal is the definition of computer science-based approaches to search for “dangerous” challenges, considering the latter based on the “mainstream” views of a general public.

We are aware that media and journalistic analyses can be politically, ideologically and geographically/culturally biased. However, we want to point out that our approach, from a technical point of view, can work with any definition of “dangerous” challenge. Therefore, if a user wants to apply it considering a different definition of “dangerous” challenge, our approach still works. The only condition for it to work is that the user provides (perhaps with the support of experts on human behavior) training data that reflect the definition of “dangerous” challenge that she wants to consider.

As pointed out in the Introduction, a challenge is identified by the hashtag used to post a video related to it.

The seven non-dangerous challenges we selected are the following:

  • #bussitchallenge: it consists of a change of clothes following the song “Buss It” by Erica Banks.

  • #copinesdancechallenge: it consists of a series of dance movements following the song “Fly” by Aya Nakamura.

  • #emojichallenge: it consists of imitating several emoji; it does not have an associated song.

  • #colpiditesta: it consists of virtually hitting a soccer ball with the head; it does not have an associated song.

  • #boredinthehouse: it consists of filming a subject, mostly an animal, in different parts of the house. The associated song is “Bored in the House” by Curtis Roach.

  • #itookanap: it consists of filming a subject, mostly an animal, sleeping. The associated song is “I Took A Nap” published by the user “gunnarolla”.

  • #plankchallenge: it consists of performing dance movements based on physical training exercises to the rhythm of a song, which is not unique.

The seven dangerous challenges we selected are the following:

  • #silhouttechallenge: it consists of exposing the body covered by a red light filter following the song “Put Your Head On My Shoulder” by Giulia di Nicolantonio. It is considered dangerous because often the body of the author of the video is naked and the filter, being digital, can be easily removed.

  • #bugsbunny: the authors of the corresponding videos lie down on their stomach and lift their legs upwards to show their feet sticking out of their head like the ears of a rabbit; at this point they start to move their feet to the rhythm of a song. It is considered dangerous because it has an explicit variant in which the authors show parts of their bodies “inappropriate” for young people aged 0–18 (Footnote 2).

  • #strippatok: it consists of publishing videos related to strippers (both men and women). It is considered dangerous because it deals with subjects “inappropriate” for young people aged 0–18.

  • #firewroks: it consists of posting videos with fireworks for which the authors risk their own safety. The apparently wrong hashtag is a trick of the authors of videos to bypass TikTok’s controls.

  • #fightchallenge: it consists of publishing videos with fights organized by the authors themselves. It is considered dangerous because it can lead to the injury of the author or other participants.

  • #sugarbaby: it consists of videos regarding “sugar babies”, i.e., young people having sex with older ones for economic reasons only. It is considered dangerous because it deals with topics “inappropriate” for young people aged 0–18.

  • #updownchallenge: it consists of moving intimate parts of the bodies to the rhythm of a song. It is considered dangerous because it deals with issues “inappropriate” for young people aged 0–18.

We point out that challenges much more dangerous than the seven ones selected by us were spread on TikTok in the past, such as those mentioned in Sect. 2. They were promptly blocked by TikTok and, therefore, the recovery of the corresponding data was impossible.

Regarding the choice to consider seven non-dangerous and seven dangerous challenges, some discussion is in order. Indeed, the classification problem we are dealing with is a typical “rare class problem” (Bruce et al. 2020). It arises when there is a strong imbalance between the two classes to predict, and the class of greatest interest (which we call “positive”) is precisely the rare one. In this scenario, a false negative (which, in our case, would imply classifying a dangerous challenge as non-dangerous) is much more serious than a false positive. Paradoxically, in a case like this, the most accurate classification model might be the one that simply classifies all challenges as non-dangerous. However, such a model would be useless. In our context, it is better to have a model that is less accurate but is able to detect as many dangerous challenges as possible, even if it were to misclassify some non-dangerous challenges along the way (Bruce et al. 2020). It is precisely this reasoning that led us to use the same number of dangerous and non-dangerous challenges in the sample.

In practice, it is very difficult to find data on dangerous challenges because they are rare and are removed from TikTok as soon as they are recognized as dangerous. For this reason, in order to have a balanced dataset, we had to undersample the non-dangerous challenges. As pointed out above, this way of proceeding can lead to a worsening of the overall accuracy of our approach, but allows us to obtain very high values of sensitivity (i.e., recall). The latter allows our approach to correctly classify the maximum possible number of dangerous challenges.
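To make the trade-off between accuracy and sensitivity concrete, the following minimal Python sketch compares a trivial model that labels every challenge as non-dangerous with a model that flags more challenges. The function name and all the counts (1000 challenges, 10 of them dangerous) are purely illustrative assumptions; they do not come from our dataset.

```python
def accuracy_and_recall(tp, fp, tn, fn):
    """Return (accuracy, recall) from confusion-matrix counts.

    The "positive" class is the rare one, i.e., dangerous challenges.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Recall (sensitivity): fraction of dangerous challenges that are detected.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return accuracy, recall


# Hypothetical population: 1000 challenges, only 10 of which are dangerous.
# Model A labels every challenge as non-dangerous.
acc_a, rec_a = accuracy_and_recall(tp=0, fp=0, tn=990, fn=10)

# Model B detects 9 of the 10 dangerous challenges but raises 60 false alarms.
acc_b, rec_b = accuracy_and_recall(tp=9, fp=60, tn=930, fn=1)

print(f"Model A: accuracy={acc_a:.3f}, recall={rec_a:.3f}")
print(f"Model B: accuracy={acc_b:.3f}, recall={rec_b:.3f}")
```

In this invented scenario, Model A is the more accurate of the two but detects no dangerous challenge, whereas Model B gives up some accuracy in exchange for a much higher recall.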

After the choice of the challenges, we developed a crawler capable of obtaining public data about the videos associated with a given challenge identified by its hashtag. Our crawler is written in Python and uses several libraries of this language, such as Pandas. The DBMS used to store the corresponding data is MongoDB. Our crawler is primarily a web scraper that, given as input the hashtag of a challenge, returns the list of all videos related to that hashtag.

After downloading data through our crawler, and after performing some pre-processing tasks, we obtained a record for each video. This record contains the following fields:

  • challenge_id: the hashtag of the challenge which the video belongs to;

  • createTime: the publication date of the video;

  • video_id: the identifier of the video;

  • video_duration: the video duration, expressed in seconds;

  • author_id: the identifier of the author of the video;

  • author_verified: it indicates whether the user is verified (Footnote 3);

  • music_id: the identifier of the music track or sound used in the video;

  • music_title: the title of the music track or sound used in the video;

  • stats_diggCount: the number of likes obtained by the video;

  • stats_playCount: the number of views of the video;

  • authorStats_diggCount: the total number of likes expressed by the author of the video for other videos;

  • authorStats_followingCount: the number of users followed by the author of the video;

  • authorStats_followerCount: the number of users following the author of the video;

  • authorStats_heartCount: the total number of likes received by the author of the video;

  • originalVideo: it is set to 1 if the video began the challenge it belongs to; otherwise, it is set to 0;

  • likedBy_ids: the list of identifiers of the users who liked the video and have their privacy policy set to “public”.
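As an illustration, a record with the fields listed above could be represented in Python as follows. This is only a minimal sketch: the class name `VideoRecord`, the field types and all the values in the example are assumptions introduced here, and they do not describe the actual implementation of our crawler.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class VideoRecord:
    """One pre-processed video; field names mirror the list above."""
    challenge_id: str
    createTime: datetime
    video_id: str
    video_duration: int              # seconds
    author_id: str
    author_verified: bool
    music_id: str
    music_title: str
    stats_diggCount: int             # likes obtained by the video
    stats_playCount: int             # views of the video
    authorStats_diggCount: int       # likes expressed by the author
    authorStats_followingCount: int  # users followed by the author
    authorStats_followerCount: int   # users following the author
    authorStats_heartCount: int      # likes received by the author
    originalVideo: int               # 1 if the video began the challenge, else 0
    likedBy_ids: List[str] = field(default_factory=list)


# Hypothetical record: every value below is invented for illustration.
example = VideoRecord(
    challenge_id="#plankchallenge",
    createTime=datetime(2020, 5, 1, 12, 0, tzinfo=timezone.utc),
    video_id="v001",
    video_duration=15,
    author_id="u001",
    author_verified=False,
    music_id="m001",
    music_title="example sound",
    stats_diggCount=120,
    stats_playCount=3400,
    authorStats_diggCount=75,
    authorStats_followingCount=210,
    authorStats_followerCount=980,
    authorStats_heartCount=15000,
    originalVideo=0,
    likedBy_ids=["u002", "u003"],
)

# A plain dictionary view of the record, e.g., for a document-oriented store.
document = asdict(example)
```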

Table 1 displays the number of videos we collected for each challenge, along with the date of the first and last one.

Table 1 Number of videos, date of the first and last one for each challenge

It is worth pointing out that, in the period in which we conducted our experimental campaign (June 2021–August 2021), the lifespan of all the challenges of our reference dataset could be considered concluded. In fact, as pointed out earlier in this section and shown in Table 1, the reference period of all our challenges is January 2018–April 2021. After April 2021 and until the end of our experiments (i.e., August 2021), the challenges involved, while continuing to exist, no longer generated significant interactions with users.

Finally, a consideration about the completeness of the dataset is due. In fact, as we said before, TikTok does not make available in an official way the data of the videos published. Since our data were not officially provided by TikTok, we cannot guarantee the completeness of our dataset. However, we can guarantee that, for each challenge, our crawler extracted all the information about its videos that were detectable on TikTok.

4 A social network-based model representing TikTok challenges

The second step of our research activity consists in the construction of a social network for each challenge. Specifically, let \({{\mathcal {C}}}\) be the set of challenges considered in the dataset and let \({{\mathcal {C}}}'\) (resp., \({{\mathcal {C}}}''\)) be the set of non-dangerous (resp., dangerous) challenges. Let \(C_i\) be a challenge of \({{\mathcal {C}}}\); a social network \({{\mathcal {N}}}_i = \langle N_i, A_i \rangle\) can be associated with it.

\(N_i\) is the set of nodes of \({{\mathcal {N}}}_i\). There is a node \(n_{i_j}\) for each author \(a_{i_j}\) who posted at least one video for \(C_i\). A label \(l_{i_j}\) can be associated with \(n_{i_j}\); it indicates the publication timestamp of the first video on \(C_i\) posted by \(a_{i_j}\) (Footnote 4). Since there is a one-to-one correspondence between a node \(n_{i_j} \in N_i\) and the corresponding author \(a_{i_j}\), in the following we will use these two terms interchangeably.

\(A_i\) is the set of arcs of \({{\mathcal {N}}}_i\). An arc \((n_{i_j}, n_{i_k})\) indicates that the author \(a_{i_k}\) liked a video posted by the author \(a_{i_j}\) and that the timestamp corresponding to \(l_{i_j}\) precedes the one corresponding to \(l_{i_k}\). Intuitively, the presence of this arc indicates a form of propagation of the challenge \(C_i\) toward new users. In fact, it denotes that \(a_{i_j}\) published a video for \(C_i\), \(a_{i_k}\) liked it and decided to publish her own video, thus participating in \(C_i\).
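The construction of \(N_i\), of the labels and of \(A_i\) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each video is reduced to a triple (author, timestamp, users who liked it), timestamps are integers, and the function name and the toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# A video is reduced here to (author_id, timestamp, ids of users who liked it).
Video = Tuple[str, int, List[str]]


def build_challenge_network(videos: Iterable[Video]):
    """Return (labels, arcs) for the videos of one challenge.

    labels maps each author (node) to the timestamp of her first video;
    arcs contains (a_j, a_k) if a_k liked a video posted by a_j and the
    label of a_j precedes the label of a_k.
    """
    videos = list(videos)

    # Nodes and labels: one node per author, labeled with her earliest video.
    labels: Dict[str, int] = {}
    for author, ts, _ in videos:
        if author not in labels or ts < labels[author]:
            labels[author] = ts

    # Arcs: likes coming from authors whose first video was published later.
    arcs: Set[Tuple[str, str]] = set()
    for author, _, liked_by in videos:
        for liker in liked_by:
            # Users who never posted a video for the challenge are not nodes.
            if liker in labels and labels[author] < labels[liker]:
                arcs.add((author, liker))
    return labels, arcs


# Toy data, invented for illustration: "x9" liked a video but never posted one.
toy_videos = [
    ("u1", 100, ["u2", "u3", "x9"]),
    ("u2", 200, ["u3", "u1"]),
    ("u3", 300, []),
]
labels, arcs = build_challenge_network(toy_videos)
```

In the toy data, the like of `u1` to the video of `u2` does not produce an arc, because the first video of `u1` precedes the one of `u2`.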

To give an idea of the structure of the networks thus obtained, in Fig. 1 (resp., Fig. 2) we report the structure of the non-dangerous (resp., dangerous) networks. The more internal a node, the older the corresponding label and the more senior the associated author in the community of \(C_i\).

In both figures there are nodes of different colors. In particular, we can find red, black and yellow nodes. The red node, if present, represents the author of the original video of the challenge, i.e., the author who started it. The yellow nodes represent the leaf nodes of the network, i.e., authors who have been stimulated to publish a video, but have not been able to stimulate other authors to do so. Black nodes are all the other nodes in the network; they represent authors who were stimulated to publish a video and in turn were able to stimulate other authors to do so.

Fig. 1 Structure of non-dangerous networks

Fig. 2 Structure of dangerous networks

5 Analysis of the structure of the social networks associated with the challenges

In this section, we begin by analyzing the structure of the networks associated with the non-dangerous and dangerous challenges of our dataset to verify if there are structural differences between the networks corresponding to the two types of challenges. Tables 2 and 3 show the basic structural characteristics of the two types of networks. From the analysis of these tables, we can draw the following conclusions: (i) the networks associated with non-dangerous challenges are on average larger than those associated with dangerous challenges; (ii) there is no significant difference for the average degree and the clustering coefficient of the two types of networks; (iii) the networks associated with dangerous challenges have a density higher than the ones associated with non-dangerous challenges.

Table 2 Basic structural characteristics of the networks associated with non-dangerous challenges
Table 3 Basic structural characteristics of the networks associated with dangerous challenges

In the next analysis, we focused on the characteristics of the videos for the two types of challenges. The main basic characteristics are shown in Table 4. From the analysis of this table we can observe that: (i) the average duration of the videos is similar in the two types of challenges; (ii) the average number of music tracks is higher in the non-dangerous challenges than in the dangerous ones; (iii) the average number of likes, comments, shares and views is higher for the dangerous challenges than for the non-dangerous ones.

Table 4 Differences between the main basic characteristics of the videos for non-dangerous and dangerous challenges

After examining videos, we focused on the main basic characteristics of their authors. These characteristics are reported in Table 5. From the analysis of this table we can observe that: (i) there is a slight difference in the average number of followers for the two types of authors; (ii) the authors of non-dangerous challenges tend to give more likes, follow many more authors and publish many more videos than the authors of dangerous challenges; (iii) the authors of dangerous challenges receive many more likes than the authors of non-dangerous ones.

Table 5 Differences between the main basic characteristics of the authors of videos for non-dangerous and dangerous challenges

The last structural analysis we performed regarded the evolution of the network structure over time during the challenge lifespan. It is also a starting point for the next analyses that represent the core of our paper. In particular, this analysis focused on the average duration of the lifespan and the growth of the network size over time. The results obtained are reported in Table 6. From the analysis of this table we can observe important differences between non-dangerous and dangerous challenges. First of all, the average lifespan of dangerous challenges is longer than that of non-dangerous ones. Furthermore, the growth of non-dangerous challenges is much more gradual than that of dangerous ones. In fact, in the latter case, the growth is very limited up to about 75% of the lifespan, while it becomes “explosive” later. The investigation of the detailed differences concerning challenge lifespans represents the main topic of the research described in this paper.

Table 6 Differences between the main basic characteristics of the lifespan for non-dangerous and dangerous challenges

6 Definition of the lifespan intervals of a challenge

In the last experiment of the previous section we have seen that the growth of non-dangerous networks seems to show a totally different trend from the one characterizing dangerous networks. In this section, we explore this aspect more deeply.

As a first step, we considered the variation of the size of each network during its lifespan. Clearly, the functions thus obtained would be broken lines, whatever the sampling frequency. Actually, we chose a very high sampling frequency, equal to 1% of the lifespan. However, for reasons we will see later, we wanted to have continuous curves, rather than broken lines. For this reason, we interpolated the points using a univariate spline. Given the high sampling frequency we chose, we assumed that the difference between the broken line and the curve obtained by interpolation was minimal. To test this hypothesis, we computed the Mean Absolute Error (MAE) between them, considering 100 additional equidistant points for each interval (thus considering 10,000 points for each lifespan). Afterwards, for each point, we normalized this value against the corresponding one of the broken line. The obtained results are reported in Table 7. From the analysis of this table we can observe that the average normalized differences are very low. Therefore, the interpolation we made can be considered acceptable.
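One plausible way to write the normalized error described above is the following. The symbols \(s(\cdot )\) for the spline, \(b(\cdot )\) for the broken line, and \(t_1, \ldots , t_M\) for the \(M = 10{,}000\) equidistant evaluation points are introduced here only for illustration, and the formula assumes \(b(t_m) > 0\) at every evaluation point:

\[
\mathrm{NMAE} = \frac{1}{M} \sum_{m=1}^{M} \frac{\left| s(t_m) - b(t_m) \right|}{b(t_m)}
\]

Under this reading, a value close to 0 means that the spline deviates from the broken line by a negligible fraction of the network size at the same instant.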

Table 7 Normalized MAE between the continuous function returned by the univariate spline interpolation and the real values for non-dangerous challenges (at left) and dangerous ones (at right)

The reason we wanted to have a continuous curve is that it allows the computation of the first derivative and, then, the identification of the points of the lifespan where the curve slope inverts.

Let \(C_i\) be a challenge, let \({{\mathcal {N}}}_i\) be the corresponding network and let \(\nu _i(\cdot )\) be the function representing the variation of the number of nodes of \({{\mathcal {N}}}_i\) during the lifespan of \(C_i\). \(\nu _i(\cdot )\) is obtained by applying the univariate spline on the data of \(C_i\). Let \(X = \{ x_1, x_2, \cdots , x_N \}\) be the set of points for which the first derivative of \(\nu _i(\cdot )\) is null. The lifespan of \(C_i\) can be divided into \(N-1\) intervals \((x_q, x_{q+1})\), \(1 \le q \le N-1\), such that \(\nu _i(\cdot )\) is always increasing or always decreasing within each interval. As we will see later, such intervals play a key role in our approach.
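The splitting of a lifespan into monotone intervals can be sketched as follows. As a simplifying assumption, this example works directly on the sampled sizes and detects slope inversions through the sign of finite differences, rather than through the zeros of the first derivative of the spline; the function name `monotone_intervals` is hypothetical.

```python
def monotone_intervals(sizes):
    """Split sample indices into maximal runs where `sizes` only rises or only falls.

    Returns a list of (start_index, end_index) pairs; consecutive intervals
    share their boundary sample, which is where the slope inverts.
    """
    def sign(value):
        return (value > 0) - (value < 0)

    breakpoints = [0]
    previous = 0  # sign of the last non-flat step seen so far
    for k in range(1, len(sizes)):
        current = sign(sizes[k] - sizes[k - 1])
        if current == 0:
            continue  # flat step: it does not change the direction
        if previous != 0 and current != previous:
            breakpoints.append(k - 1)  # the slope inverts at sample k - 1
        previous = current
    breakpoints.append(len(sizes) - 1)
    return list(zip(breakpoints[:-1], breakpoints[1:]))


intervals = monotone_intervals([0, 2, 5, 4, 3, 6, 9])
```

For the toy series above, the slope inverts at samples 2 and 4, so the lifespan is divided into three intervals.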

In Fig. 3 (resp., 4) we show the trend of the function \(\nu _i(\cdot )\) and the corresponding intervals for non-dangerous (resp., dangerous) challenges. Already from the examination of these figures we can see how the two types of challenges show very different trends of \(\nu _i(\cdot )\). Capturing such differences is the next goal of this paper.

Fig. 3 Trend of the function \(\nu _i(\cdot )\) and corresponding intervals for non-dangerous challenges

Fig. 4 Trend of the function \(\nu _i(\cdot )\) and corresponding intervals for dangerous challenges

6.1 Definition of features to characterize lifespan intervals

As the next step of our approach, we determined a set of features capable of characterizing an interval of a challenge lifespan. To this end, we tried to maximize the number of features to consider taking all those available from the dataset plus several others derived from Social Network Analysis. The latter were possible thanks to the Social Network-based model for the representation of a challenge described in Sect. 4. Proceeding in this way, given a challenge \(C_i\), the corresponding social network \({{\mathcal {N}}}_i\), and an interval \({{\mathcal {I}}}\), we identified the following 26 features characterizing it:

  • video_number: number of videos of \(C_i\) posted during \({{\mathcal {I}}}\);

  • video_difference: difference between the number of videos posted during \({{\mathcal {I}}}\) and the number of videos posted in the previous interval;

  • begin_percentage: percentage of the lifespan at which \({{\mathcal {I}}}\) begins;

  • end_percentage: percentage of the lifespan at which \({{\mathcal {I}}}\) ends;

  • duration: duration of \({{\mathcal {I}}}\) (expressed in days);

  • average_hours_between: average number of hours elapsed between the posting of two videos during \({{\mathcal {I}}}\);

  • likes: total number of likes obtained by \(C_i\) during \({{\mathcal {I}}}\);

  • average_likes: average number of likes obtained by \(C_i\) during \({{\mathcal {I}}}\);

  • average_comments: average number of comments obtained by \(C_i\) during \({{\mathcal {I}}}\);

  • average_shares: average number of shares obtained by \(C_i\) during \({{\mathcal {I}}}\);

  • average_views: average number of views obtained by \(C_i\) during \({{\mathcal {I}}}\);

  • average_followers: average number of followers of the authors of the videos posted during \({{\mathcal {I}}}\);

  • average_following: average number of users followed by the authors of the videos posted during \({{\mathcal {I}}}\);

  • average_likes_authors: average number of likes received by the authors of the videos posted during \({{\mathcal {I}}}\);

  • verified_authors: number of verified authors (see Sect. 3) posting videos during \({{\mathcal {I}}}\);

  • number_nodes: number of nodes of \({{\mathcal {N}}}_i\);

  • number_arcs: number of arcs of \({{\mathcal {N}}}_i\);

  • network_density: density of \({{\mathcal {N}}}_i\);

  • connected_components: number of connected components of \({{\mathcal {N}}}_i\);

  • maximum_size_components: number of nodes of the maximum connected component of \({{\mathcal {N}}}_i\);

  • average_degree_centrality: average degree centrality of the nodes of \({{\mathcal {N}}}_i\);

  • average_eigenvector_centrality: average eigenvector centrality of the nodes of \({{\mathcal {N}}}_i\);

  • average_pagerank: average PageRank of the nodes of \({{\mathcal {N}}}_i\);

  • average_closeness_centrality: average closeness centrality of the nodes of \({{\mathcal {N}}}_i\);

  • average_betweenness_centrality: average betweenness centrality of the nodes of \({{\mathcal {N}}}_i\);

  • average_clustering_coefficient: average clustering coefficient of the nodes of \({{\mathcal {N}}}_i\).
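To make some of the dataset-derived features concrete, the following sketch computes a few of them for one interval. It rests on several assumptions that are not stated in the text: each video is a dictionary with an `hour` field (posting time in hours from an arbitrary origin) and a `likes` field, the interval is half-open, and average_likes is read as the average number of likes per video posted in the interval.

```python
def interval_features(videos, start, end, life_start, life_end):
    """Compute a subset of the interval features for the interval [start, end).

    All times are in hours; `videos` is a list of dicts with the
    hypothetical keys 'hour' and 'likes'.
    """
    inside = sorted(
        (v for v in videos if start <= v["hour"] < end),
        key=lambda v: v["hour"],
    )
    hours = [v["hour"] for v in inside]
    gaps = [later - earlier for earlier, later in zip(hours, hours[1:])]
    total_likes = sum(v["likes"] for v in inside)
    span = life_end - life_start
    return {
        "video_number": len(inside),
        "begin_percentage": 100.0 * (start - life_start) / span,
        "end_percentage": 100.0 * (end - life_start) / span,
        "duration": (end - start) / 24.0,  # expressed in days
        "average_hours_between": sum(gaps) / len(gaps) if gaps else 0.0,
        "likes": total_likes,
        "average_likes": total_likes / len(inside) if inside else 0.0,
    }


videos = [
    {"hour": 0, "likes": 10},
    {"hour": 12, "likes": 30},
    {"hour": 36, "likes": 20},
    {"hour": 60, "likes": 5},
]
features = interval_features(videos, start=0, end=48, life_start=0, life_end=96)
```

The network-based features in the second half of the list would be computed on \({{\mathcal {N}}}_i\) with a Social Network Analysis library and are not covered by this sketch.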

However, such a large number of features is difficult to manage. Therefore, we decided to carry out a study of their correlations to see if some of them could be filtered out. In Fig. 5, we show the correlation matrix thus obtained. This figure provides several valuable pieces of information that can help us to better understand the mutual interrelationships between the features, as well as the interrelationships between the features and the structure of the underlying network.

Fig. 5 Correlation matrix for the 26 features selected for characterizing lifespan intervals

In particular, the following observations can be derived from it; they help us to select a manageable number of features to characterize lifespans:

  • There is a high direct correlation between video_number, video_difference, number_nodes, number_arcs, maximum_size_components and average_degree_centrality. Therefore, to characterize lifespans, it is sufficient to keep only one of them and discard the others. We decided to keep video_number.

  • There is a high direct correlation between begin_percentage and end_percentage. For this reason, we decided to keep begin_percentage and discard end_percentage.

  • There is a low correlation between duration and all the other features. Therefore, we decided to keep this feature. A similar reasoning applies to average_hours_between, average_following and average_betweenness_centrality.

  • There is a high direct correlation between likes, average_likes, average_comments, average_shares, average_views and verified_authors. For this reason, it is sufficient to keep only one of them and discard all the others. We decided to keep average_likes.

  • The features average_followers and average_likes_authors have a high direct correlation with each other. Furthermore, each of them also has a high direct correlation with average_clustering_coefficient. Therefore, we decided to keep the latter feature and discard the first two.

  • The feature connected_components has both direct and inverse medium-high correlations with many features. Therefore, we decided to discard it. A similar reasoning applies to maximum_size_components and average_degree_centrality.

  • The features average_eigenvector_centrality and average_closeness_centrality have a high direct correlation with each other and with network_density. As a consequence, one might think of keeping only one of these features and discarding the others. However, we observe that all of them have a high inverse correlation with several other ones, for example with video_number, which we have already kept, and end_percentage. In turn, the latter has a very high correlation with begin_percentage, which we have already kept. For this reason, we decided to discard all these features.
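The kind of inspection carried out on the correlation matrix can be illustrated with a small sketch that lists the pairs of features whose Pearson correlation is high in absolute value. The threshold of 0.9, the toy values and the function names are assumptions made for this example; the text does not state a numeric threshold, and the final choice of which feature to keep in each group was made by examining Fig. 5.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return the feature pairs whose absolute correlation reaches the threshold."""
    names = sorted(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


# Toy values for three features over four intervals (not taken from the dataset).
columns = {
    "video_number": [10, 20, 30, 40],
    "number_nodes": [9, 19, 28, 41],
    "duration": [3, 1, 4, 2],
}
pairs = highly_correlated_pairs(columns)
```

With these toy values only video_number and number_nodes are reported as a highly correlated pair, which mirrors the reasoning of the first item of the list above.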

Summarizing, at the conclusion of this examination, we decided to select the following eight features for characterizing lifespan intervals:

  • average_likes;

  • average_following;

  • video_number;

  • duration;

  • average_betweenness_centrality;

  • average_clustering_coefficient;

  • average_hours_between;

  • begin_percentage.

6.2 Characterizing the intervals of challenge lifespans

In the previous sections, we determined, through the function \(\nu (\cdot )\), the lifespan of the 14 challenges of our interest. Afterwards, through the computation of the first derivative of \(\nu (\cdot )\), we divided each lifespan into intervals. In this section, we illustrate our approach for characterizing these intervals. Roughly speaking, it consists of grouping them into homogeneous clusters, based on the eight features identified above, and, then, determining the characteristics of each cluster.

As a first task of this activity, we considered a new dataset consisting of one table whose rows were associated with intervals and whose columns corresponded to the eight features. Each row of the table reported the values of the eight features for the corresponding interval.

Afterward, we applied the principal component analysis (hereafter, PCA) (Han et al. 2011) to this dataset and reduced the number of dimensions from 8 to 2. This allowed us to represent the intervals in a plane, in order to favor a visual representation of the clusters obtained.

After this task, we applied Autoclass (Cheeseman and Stutz 1996), a classical algorithm that uses Naive–Bayes in combination with Expectation-Maximization to find the probability distribution parameters best fitting the data. We chose Autoclass because, among the various strengths characterizing it, there is also the capability of automatically determining the number of clusters (Han et al. 2011). In fact, it was not possible to make any preliminary conjecture about this number, and the elbow method performed with k-means returned no results. Autoclass allowed us to group the intervals into five clusters. Thanks to the preliminary application of PCA, these clusters can be represented in a plane whose coordinates correspond to the two dimensions returned by PCA. The five clusters thus obtained are shown in Fig. 6.

Fig. 6 The five clusters of intervals returned by Autoclass

From the analysis of this figure we can observe that these clusters actually appear quite homogeneous. However, in order for them to be useful for our analysis, it is necessary to understand what type of intervals each cluster represents. By carefully examining the features of the intervals belonging to each cluster, we were able to draw the following characterizations:

  • Cluster A: it includes final intervals of lifespans, when the challenge is less attractive to users. When compared to the other intervals of the same challenge, the ones of Cluster A are characterized by: (i) a lower average number of likes; (ii) the presence of less important authors (in fact, verified authors are few and most of them have few followers). Finally, the time interval between the publication of two consecutive videos is longer. The networks associated with these intervals are more connected and have higher centrality values than the ones corresponding to other intervals. This is further evidence that we are in a well-established phase of the challenge.

  • Cluster B: it includes intervals belonging to a peak phase of the challenge. In fact, they are characterized by a very high number of likes and videos published. There are many verified authors, as well as many authors with many followers. The time interval between the publication of two consecutive videos is short.

  • Cluster C: it includes initial intervals of lifespans. The number of likes is lower than that characterizing the intervals of Cluster B. However, it is quite high, and this means that the challenge is arousing curiosity and will probably have a peak in a later interval. The users are generally not verified but have a high number of followers. This makes the number of views and shares very high. The time interval between the publication of two consecutive videos is quite long. The networks associated with these intervals are poorly connected. This indicates that, in these intervals, video postings are made by people still unconnected to each other. This is further evidence that we are in an initial phase of the challenge.

  • Cluster D: it includes lifespan intervals that follow a challenge peak. Intervals belonging to this cluster are characterized by a low number of likes. Most of the authors of the videos are verified and have many followers and interactions. The average number of videos posted is high. The time elapsed between the posting of two consecutive videos is short, although it tends to increase as we move toward the end of the intervals. The network associated with these intervals is fairly connected.

  • Cluster E: it includes initial intervals of lifespans. They are characterized by a high number of likes and published videos. There are many views but few comments and few shares. This implies that the interaction level between users is low. The time elapsed between the publication of two consecutive videos is long. The network associated with these intervals is quite disconnected.

To give also a quantitative idea of the characteristics of the clusters, in Table 8 we report the average values assumed in each cluster by the eight features that we selected to represent the lifespan intervals.

Table 8 Average values assumed in each cluster by the features representing lifespan intervals

7 Searching for time patterns in the challenge lifespans

After grouping the intervals into homogeneous clusters, we were able to perform the second main investigation of this paper, namely the extraction of time patterns allowing us to distinguish non-dangerous challenges from dangerous ones.

As a first step, we considered the lifespan of the 14 challenges under examination and verified to which cluster the corresponding intervals belonged. If two consecutive intervals belonged to the same cluster we considered them as if they were a single one. At the end of this activity, we obtained the following sequences of intervals for non-dangerous challenges:

  • #boredinthehouse: B, A

  • #bussitchallenge: B, A

  • #colpiditesta: B, A

  • #copinesdancechallenge: B, A

  • #emojichallenge: C, B, D

  • #ITookANap: E, B, A

  • #plankchallenge: E, B, A

Instead, the sequences of intervals characterizing the dangerous challenges were as follows:

  • #bugsbunnychallenge: E, C, D

  • #fightchallenge: C, B, A

  • #firewroks: C, B

  • #silhouettechallenge: E, A

  • #strippatok: E, D

  • #sugarbaby: E, A

  • #updownchallenge: E, B

From the examination of the previous sequences, we drew some interesting information. In particular, we observed that:

  • In non-dangerous challenges, the pattern B, A tends to repeat often. In any case, an interval belonging to the cluster B is always present. However, it is always followed by an interval belonging to the clusters A or D.

  • In dangerous challenges there is no dominant pattern. However, the presence of an interval belonging to the cluster E is often observed.
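The merging of consecutive intervals belonging to the same cluster, mentioned at the beginning of this section, can be sketched in a few lines. The function name `merge_consecutive` and the input sequence are hypothetical.

```python
from itertools import groupby


def merge_consecutive(labels):
    """Collapse runs of identical consecutive cluster labels into a single label."""
    return [label for label, _run in groupby(labels)]


sequence = merge_consecutive(["E", "E", "B", "A", "A"])
```

Here a lifespan whose five intervals fall into clusters E, E, B, A, A is summarized by the sequence E, B, A.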

We noticed that the intervals of type D generally followed the peak of a challenge and that the ones of type A generally were the final ones of a challenge. These characteristics, along with the features of clusters A and D reported in Sect. 6.2, led us to hypothesize that the intervals of types A and D represented the same reality, i.e., the conclusion of a challenge. More precisely, they represented two slightly different ways of challenge conclusion. In fact, the intervals of type A described a faster conclusion, while those of type D represented a slower one.

To deepen this hypothesis we decided to perform a t-test based on the following null hypothesis H0: “The means of the samples for the intervals of types A and D are equal”. The metrics we used to perform this test are the eight features we selected to characterize the intervals of the challenge lifespans, namely: average_likes, average_following, video_number, duration, average_betweenness_centrality, average_clustering_coefficient, average_hours_between, begin_percentage (see Sect. 6.1 for all details).

Actually, in order to apply the classical t-test it is necessary that the elements of the two samples have equal variance; otherwise, it is necessary to use Welch’s t-test (Bruce et al. 2020).

In order to decide which kind of t-test was appropriate, we applied Bartlett’s test (Bartlett 1935) to the intervals of types A and D, using the same metrics adopted for the t-test. Bartlett’s test is used to check whether two samples, possibly with different numbers of elements, have the same variance. In our application of it, we considered the following null hypothesis H0: “The variances of the samples for the intervals of types A and D are equal”. We computed the corresponding p-value and obtained 0.003. Since this value is smaller than 0.05, we rejected the null hypothesis and, therefore, it was necessary to apply Welch’s t-test, instead of the classical one, to test the hypothesis H0: “The means of the samples for the intervals of types A and D are equal”. Applying this test, we obtained a p-value of 0.67, which is much greater than 0.05. Therefore, the null hypothesis cannot be rejected.
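For reference, the statistic of Welch’s t-test and its approximate degrees of freedom (the Welch-Satterthwaite formula) can be computed as in the sketch below for a single metric. The sample values are invented for illustration, and the function name `welch_t` is hypothetical; how the eight metrics were combined in our tests is not reproduced here. Obtaining the p-value additionally requires the Student’s t distribution, which statistical libraries provide (for example, SciPy exposes `scipy.stats.bartlett` and `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_d):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_d = len(sample_a), len(sample_d)
    # Squared standard errors of the two sample means (sample variances use n - 1).
    se2_a = variance(sample_a) / n_a
    se2_d = variance(sample_d) / n_d
    t = (mean(sample_a) - mean(sample_d)) / sqrt(se2_a + se2_d)
    df = (se2_a + se2_d) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_d ** 2 / (n_d - 1))
    return t, df


# Invented values of one metric for intervals of two types.
t_statistic, degrees_of_freedom = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Unlike the classical t-test, this statistic does not pool the two variances, which is why it remains applicable when the equal-variance hypothesis is rejected.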

As a consequence, this further investigation through the t-test did not invalidate our hypothesis that the intervals of types A and D represent the conclusion of a challenge. Despite their minor differences, for the purposes of our research, we can assume that A and D are equivalent.

Based on this assumption, the sequences of intervals for non-dangerous challenges were the following:

  • #boredinthehouse: B, A

  • #bussitchallenge: B, A

  • #colpiditesta: B, A

  • #copinesdancechallenge: B, A

  • #emojichallenge: C, B, A

  • #ITookANap: E, B, A

  • #plankchallenge: E, B, A

Instead, the sequences of intervals for dangerous challenges were the following:

  • #bugsbunnychallenge: E, C, A

  • #fightchallenge: C, B, A

  • #firewroks: C, B

  • #silhouettechallenge: E, A

  • #strippatok: E, A

  • #sugarbaby: E, A

  • #updownchallenge: E, B

After this, we considered the intervals of types C and E. The description given above allowed us to hypothesize that both of them were initial lifespan intervals. Also, the number of likes and the number of videos posted during them were comparable. The properties of the networks associated with them were also similar. Analogously to what we did for A and D, we carried out a statistical analysis to deepen our hypothesis. In this case, Bartlett’s test with the null hypothesis H0: “The variances of the samples for the intervals of types C and E are equal”, and with the same metrics used for the previous tests, gave us a p-value of 0.55, which is much greater than 0.05. Therefore, we could conclude that the null hypothesis cannot be rejected. Consequently, we could apply the classical t-test with the following null hypothesis H0: “The means of the samples for the intervals of types C and E are equal” and with the metrics used for all the previous t-tests. In this case, the computation of the p-value returned 0.91. Therefore, the null hypothesis cannot be rejected.

As a consequence, also for the intervals of type C and E, the further investigation through t-test did not invalidate our hypothesis, namely that both intervals represent the beginning of a challenge, albeit with some minor specificities. Despite them, for the purposes of our research, we can assume that C and E are equivalent.

Based on this assumption, the interval sequences for non-dangerous challenges were the following:

  • #boredinthehouse: B, A

  • #bussitchallenge: B, A

  • #colpiditesta: B, A

  • #copinesdancechallenge: B, A

  • #emojichallenge: C, B, A

  • #ITookANap: C, B, A

  • #plankchallenge: C, B, A

Instead, the interval sequences for dangerous challenges were the following:

  • #bugsbunnychallenge: C, A

  • #fightchallenge: C, B, A

  • #firewroks: C, B

  • #silhouettechallenge: C, A

  • #strippatok: C, A

  • #sugarbaby: C, A

  • #updownchallenge: C, B
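The two equivalences adopted above (D treated as A, E treated as C) can be applied mechanically, followed by the merging of consecutive intervals with the same label. The sketch below reproduces this on some of the sequences listed earlier in this section; the names `EQUIVALENT` and `canonical_sequence` are introduced only for illustration.

```python
from itertools import groupby

# Equivalences assumed after the statistical tests: D behaves like A, E like C.
EQUIVALENT = {"D": "A", "E": "C"}


def canonical_sequence(labels):
    """Relabel equivalent clusters, then collapse consecutive duplicates."""
    relabelled = [EQUIVALENT.get(label, label) for label in labels]
    return [label for label, _run in groupby(relabelled)]


bugsbunny = canonical_sequence(["E", "C", "D"])   # original sequence of #bugsbunnychallenge
emoji = canonical_sequence(["C", "B", "D"])       # original sequence of #emojichallenge
```

For #bugsbunnychallenge, the original sequence E, C, D first becomes C, C, A and then collapses to C, A, in agreement with the list above.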

Thanks to this result, we were able to identify some time patterns characterizing non-dangerous and dangerous challenges. As we will see below, since these time patterns are different in the two cases, they are also able to differentiate one type of challenge from the other.

Let us first examine non-dangerous challenges. In this case, we always have the presence of a sequence of intervals of type B, A. This sequence is very often preceded by an interval of type C, so that we have a time pattern of type C, B, A. Recall that: (i) the intervals of type C are initial ones in a challenge lifespan; (ii) the intervals of type B correspond to a peak of a challenge; (iii) the intervals of type A indicate the end of a challenge. We argued that the typical time pattern of a non-dangerous sequence is C, B, A. In fact, the challenges showing a B, A time pattern already existed when our research on them began although the interactions with users that they were able to elicit were almost negligible.

Let us now examine dangerous challenges. In this case, unlike the previous one, there is no single sequence of intervals characterizing all of them. Instead, we identified three dominant sequences that correspond to three different “fates” generally characterizing the challenges of this type. In particular, the three time patterns are:

  • C, B: these challenges had a standard initial phase with an interval of type C; then, they reached a peak phase. Finally, they almost suddenly ceased to have meaningful interactions with users. This may have happened because they ran out of steam very quickly or they were recognized by TikTok as dangerous and were stopped or removed from the social network.

  • C, A: these challenges had an initial phase, which was followed by a decay one. In other words, they never reached the peak. They were born, survived for a certain period on the social network, and then died.

  • C, B, A: as we will see below, these challenges are a small minority among the dangerous ones. They behaved like the non-dangerous ones, in that they were born, had a peak and, finally, decayed.
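One simple way to turn the time patterns above into a rule is sketched below. This is an illustrative reading of the patterns and not a description of a classifier used in this research: the sequence B, A is treated as a truncated C, B, A (as argued above for non-dangerous challenges), and C, B, A is labeled as typical of non-dangerous challenges even though a small minority of dangerous ones also follows it. The names used are hypothetical.

```python
DANGEROUS_PATTERNS = {("C", "B"), ("C", "A")}
NON_DANGEROUS_PATTERNS = {("C", "B", "A"), ("B", "A")}


def pattern_type(sequence):
    """Map a canonical sequence of interval types to the kind of pattern it matches."""
    key = tuple(sequence)
    if key in NON_DANGEROUS_PATTERNS:
        return "typical of non-dangerous challenges"
    if key in DANGEROUS_PATTERNS:
        return "typical of dangerous challenges"
    return "other"


fireworks_like = pattern_type(["C", "B"])
plank_like = pattern_type(["C", "B", "A"])
```

A sequence that matches none of the patterns is reported as "other" and would require a separate examination.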

In order to verify the goodness of our approach, we decided to test it on a new dataset, larger than the previous one. It stores data on 175 challenges; 150 of them are non-dangerous while 25 are dangerous. Due to space limitations, we cannot detail these challenges as we did for the 14 challenges defined in Sect. 3. However, in Table 9, we report the aggregate values of some fields that refer to them and whose meaning we had illustrated in Sect. 3.

Table 9 Aggregate values of some fields that refer to non-dangerous and dangerous challenges

The results obtained are the following:

  • As for non-dangerous challenges:

    • 134 (i.e., 89.33% of them) followed the time pattern C, B, A. This is the only one we identified as significant for this type of challenges.

    • 16 (i.e., 10.67% of them) followed several other sequences of intervals.

  • As for dangerous challenges:

    • 10 (i.e., 40.00% of them) followed the time pattern C, B;

    • 11 (i.e., 44.00% of them) followed the time pattern C, A;

    • 2 (i.e., 8.00% of them) followed the time pattern C, B, A;

    • 2 (i.e., 8.00% of them) followed other sequences of intervals.

As a further analysis, having trained our model on a balanced dataset, we decided to create a third dataset of 300 challenges (150 non-dangerous and 150 dangerous ones). The 150 non-dangerous challenges are those of the previous dataset. As for the dangerous challenges, since they are very rare, they have been obtained from the 25 challenges of the previous dataset using the oversampling technique implemented through bootstrap (Bruce et al. 2020). The results obtained by applying our approach to the new dataset are the following:

  • As for non-dangerous challenges:

    • 132 (i.e., 88.00% of them) followed the time pattern C, B, A.

    • 18 (i.e., 12.00% of them) followed a variety of other sequences of intervals; these were partially different from the ones found in the previous dataset, because they were influenced by the new composition of the dataset.

  • As for dangerous challenges:

    • 65 (i.e., 43.33% of them) followed the time pattern C, B;

    • 69 (i.e., 46.00% of them) followed the time pattern C, A;

    • 7 (i.e., 4.67% of them) followed the time pattern C, B, A;

    • 9 (i.e., 6.00% of them) followed a variety of other sequences of intervals.
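The oversampling of the dangerous challenges through bootstrap, used to build the third dataset, amounts to sampling with replacement from the 25 available challenges until 150 items are obtained. A minimal sketch follows; the placeholder identifiers, the fixed seed and the function name `bootstrap_oversample` are assumptions made for reproducibility of the example only.

```python
import random


def bootstrap_oversample(items, target_size, seed=0):
    """Draw `target_size` items from `items` with replacement."""
    rng = random.Random(seed)  # local generator, so the global state is untouched
    return rng.choices(items, k=target_size)


dangerous = [f"dangerous_{k}" for k in range(25)]  # placeholder identifiers
oversampled = bootstrap_oversample(dangerous, target_size=150)
```

Since the 150 items are copies of the original 25, the percentages obtained on the oversampled dataset are expected to stay close to those of the second dataset, as the results above show.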

The results obtained from both these datasets confirm that the time patterns we detected actually exist for the two types of challenges under consideration and are capable of discriminating them. In addition, they show that the patterns we found are able to capture almost all the behaviors of TikTok challenges.

Note that with both datasets the sensitivity of our approach is very high. In fact, it is equal to 92.00% in the case of the second dataset (i.e., the one containing only real challenges), while it rises to 94.00% in the case of the third dataset (i.e., the one balanced through the oversampling of dangerous challenges).
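The two sensitivity values can be checked against the counts reported above. They are reproduced when a dangerous challenge is counted as captured if it follows any of the three patterns C, B; C, A; or C, B, A; this reading is inferred from the numbers and is stated here as an assumption.

```python
def sensitivity(captured_counts, total):
    """Percentage of dangerous challenges that follow one of the identified patterns."""
    return 100.0 * sum(captured_counts) / total


second_dataset = sensitivity([10, 11, 2], total=25)    # patterns C,B / C,A / C,B,A
third_dataset = sensitivity([65, 69, 7], total=150)    # same patterns, oversampled data
```

In both cases the challenges left out are exactly those following other sequences of intervals (2 out of 25 and 9 out of 150, respectively).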

8 Limitations of our approach

The approach proposed in this paper is the first step of a research line that aims to identify new ways to distinguish TikTok challenges into dangerous and non-dangerous. Precisely because it is the first step of a path, it suffers from limitations that we examine in this section.

First, as specified in the Introduction, we are currently able to perform a classification of challenges into dangerous and non-dangerous. In particular, we are only able to classify a challenge near the end of its lifespan, or at least after a presumably long period of time. This is currently a limitation of our approach, because, if we were able to classify a challenge as dangerous early in its lifespan, we could build a system for the early detection of dangerous challenges. This capability is very important for being able to detect and remove dangerous challenges before they become too successful and achieve exponential growth. Such a feature would transform our approach from descriptive-diagnostic to predictive-prescriptive, making it much more powerful. We believe that if we were able to reduce the granularity of time intervals, so as to make it much finer, we could test the possibility of extending our approach to identify temporal patterns capable of distinguishing the two kinds of challenge already at the beginning of their lifespan. The early detection of dangerous challenges using time interval analysis could have important applications. For example, it could enrich the set of approaches used by TikTok to detect dangerous challenges for removing them. In addition, it could be used by government regulators to identify dangerous challenges and then ask TikTok to remove them. Last but not least, it could be used to offer a service reporting dangerous challenges or challenges with content “inappropriate” for young people. This service could be extremely valuable for parents and educators (recall that TikTok is currently the most popular social network among adolescents, and therefore among minors).

A second limitation of our approach concerns the low number of challenges considered in the reference dataset (see Sect. 3). This is due in part to the rarity of dangerous challenges and in part to the way analyses on TikTok are typically conducted. In fact, these analyses often consider few challenges, each characterized by many videos. For example, Ng et al. (2021) analyze 12 challenges, Alonso-López et al. (2021) examine 8 challenges, Bruno (2020) considers 8 challenges and a total of 100 videos, and Fiallos et al. (2021) study only one challenge characterized by 1,495 videos; finally, Medina Serrano et al. (2020) and Qiyang and Jung (2019) each analyze two challenges. As we have seen above, our 14 challenges still led us to examine 6,005 videos, which is a significant number in the context of TikTok analyses.

There is a well-defined reason why analyses on TikTok have these characteristics (i.e., a low number of challenges, each presenting many videos). TikTok does not provide an API to fetch its data. Therefore, it is necessary to implement a crawler based on a web scraper to achieve this goal. On the other hand, having to create a web scraper means that our crawler does not suffer from time or rate limitations set by TikTok.

The data downloaded by our crawler are those publicly visible on TikTok. In other words, they are the same data that any user would see when opening this app. In fact, our crawler can operate only on users who have set their privacy policy to “public”, and it complies with the Terms and Conditions of TikTok. Thanks to this, and to the fact that it does not take any data from users who have their privacy policy set to “private”, we can say that the use of our crawler does not pose ethical issues.

Our crawler suffers from some technical limitations due to its nature as a web scraper. In particular, the time needed to download the data for an experimental campaign is very long. The number of videos available for a challenge can be very high, and the web scraper has to download and process the data of each video and of its author. Moreover, for each video so identified, it retrieves the list of its likes. For each like, it determines: (i) the user who gave it; (ii) whether this user has her privacy policy set to “public” or not; (iii) a video (if it exists) about the same challenge published by her. All these operations must necessarily be performed in sequence by the crawler. Furthermore, we had to perform some pre-processing and cleaning activities on our data. First of all, we had to verify immediately the privacy settings of each user whose data we wanted to download. If those privacy settings were set to “private”, we had to discard that user. This happened for about 30% of the users considered. For the remaining ones, we carried out the classic ETL operations on their data. In particular, we removed all rows with null fields or inconsistencies. Next, we performed aggregations of numeric values. For instance, we had to transform the likes given to a certain video from a list of nicknames into an overall count. More generally, wherever possible, we converted lists and non-numeric values into numeric ones, because they are easier to process and much more suitable for data analyses. Clearly, all the operations described above are time consuming, and this limits the number of challenges that both we and past TikTok researchers have been able to include in the datasets supporting experiments. For example, it took more than one week to download the data we used for the training activities of our experimental campaign (which involved 14 challenges), whereas the download of the data for the testing activities (involving 175 challenges) took about 2 months.
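The pre-processing steps described above (discarding private users, removing rows with null fields, and turning the list of likes into a count) can be illustrated with a minimal sketch. It is not the actual code of the crawler: the record layout, the field names (`video_id`, `author`, `privacy`, `likes`), the sample data and the function `clean_videos` are all assumptions introduced purely for illustration.

```python
# Hypothetical raw records, shaped as a scraper might return them.
# All field names and values are illustrative assumptions.
RAW_VIDEOS = [
    {"video_id": "v1", "author": "alice", "privacy": "public", "likes": ["bob", "carol"]},
    {"video_id": "v2", "author": "dave", "privacy": "private", "likes": ["alice"]},
    {"video_id": "v3", "author": "erin", "privacy": "public", "likes": None},
    {"video_id": "v4", "author": "frank", "privacy": "public", "likes": []},
]


def clean_videos(raw: list[dict]) -> list[dict]:
    """Apply the three cleaning steps to a list of raw video records."""
    cleaned = []
    for row in raw:
        # Step 1: discard users whose privacy policy is not "public".
        if row.get("privacy") != "public":
            continue
        # Step 2: remove rows containing null fields.
        if any(value is None for value in row.values()):
            continue
        # Step 3: replace the list of liker nicknames with a numeric count.
        cleaned.append(
            {
                "video_id": row["video_id"],
                "author": row["author"],
                "like_count": len(row["likes"]),
            }
        )
    return cleaned


if __name__ == "__main__":
    for record in clean_videos(RAW_VIDEOS):
        print(record)
```

In this toy input, `v2` is dropped because its author is private and `v3` because its `likes` field is null; the remaining records keep only numeric like counts, which are easier to aggregate in later analyses.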

Although computation time plays a key role in limiting the number of challenges that can be used in experiments, other, no less important, factors contribute to this limitation. The first is that many challenges are not identified by a unique hashtag or sound. Such challenges cannot be recognized by the web scrapers that researchers must use to extract data from TikTok. The second factor is the dual of the previous one: a hashtag associated with a challenge can also be used to tag other videos completely unrelated to it. When this happens and the number of such videos is significant, it is impossible to use the corresponding challenge in experiments, because its data would be contaminated by these videos. The alternative to discarding the challenge would be to implement a very sophisticated ad-hoc filtering system capable of identifying videos that have the same hashtag as a challenge but do not belong to it. Such a filtering system would have to examine the semantics of the videos and would therefore be very time consuming. This would further worsen the web scraper’s computation time which, as seen above, already represents a major limitation of the current way of proceeding.

A third factor relates to challenges in which many of the authors liking a posted video have a private profile. We have already seen above that, on average, about 30% of the authors of the videos of a challenge have their privacy policy set to “private”. This percentage can be considered normal. However, when it is much higher in a given challenge (for example, 50% or 70%), the data extracted from it are excessively distorted and, therefore, the whole challenge must be discarded. A final factor limiting the number of challenges in the datasets used by TikTok researchers was mentioned earlier and relates to TikTok’s policy of removing challenges verified as dangerous. The removal of such challenges might not have affected scientific experiments on TikTok if this social platform had provided an API-based mechanism to access its data. However, since such a mechanism does not exist and the only way to access TikTok’s data is through a web scraper, removing a challenge leads to the definitive loss of its content not only for users but also for researchers who want to carry out scientific investigations on it.
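The discarding rule based on the share of private profiles can be sketched as follows. This is only an illustration under stated assumptions: the text gives 30% as the normal share and 50% or 70% as examples of excessive ones, but does not fix a cut-off, so the threshold of 0.5 used here, as well as the function names `private_share` and `keep_challenge`, are hypothetical.

```python
def private_share(is_private: list[bool]) -> float:
    """Return the fraction of users in a challenge whose profile is private."""
    if not is_private:
        raise ValueError("at least one user is required")
    # True counts as 1 and False as 0 when summed.
    return sum(is_private) / len(is_private)


def keep_challenge(is_private: list[bool], max_private_share: float = 0.5) -> bool:
    """Keep a challenge only if its private share is below the threshold.

    The default threshold of 0.5 is an assumption chosen for illustration.
    """
    return private_share(is_private) < max_private_share


if __name__ == "__main__":
    typical = [True] * 3 + [False] * 7    # 30% private: the usual case
    distorted = [True] * 7 + [False] * 3  # 70% private: too distorted
    print(keep_challenge(typical))
    print(keep_challenge(distorted))
```

Making the threshold a parameter allows the sensitivity of the resulting dataset to this choice to be checked before a challenge is definitively discarded.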

9 Conclusion

In this paper, we have proposed an approach to extract time patterns from the lifespans of non-dangerous and dangerous TikTok challenges. We have seen that the patterns we found for the two types of challenges are different. As a consequence, the presence of a certain pattern can be a strong indicator of the (non-)dangerousness of the corresponding challenge.

In light of our results, we can say that our goal of identifying a new model to classify challenges as dangerous or non-dangerous has been achieved. In fact, our approach has proved capable of distinguishing the two kinds of challenge. We point out again that it must be considered a first step in our overall research. In this sense, the early detection of dangerous challenges, as described in Sect. 8, is certainly its first future development. A second development involves an effort to speed up the retrieval of data about TikTok challenges, so that richer datasets can be obtained in a reasonable time. We have seen that the only way to retrieve TikTok data at present is to use a web scraper. Some activities of such a scraper must necessarily be performed sequentially, while others can be parallelized. In the future, we plan to parallelize these activities wherever possible. In addition, we would like to investigate challenges further through Social Network Analysis, in order to find indicators capable of distinguishing the two types of challenges based on how the corresponding communities evolve over time. Last but not least, we would like to extend the analyses carried out for challenges to TikTok’s trends. These certainly have some similarities with challenges, but they also have several specificities. Consequently, it is reasonable to presume that many of the results found for challenges can be extended to trends, after suitable changes that take these specificities into account.