A data-driven investigation of successful local film profiles in the Indonesian box office

Many local film industries in developing economies are struggling to compete with foreign film hits. Surprisingly, despite the high exposure from imported films, the Indonesian film industry has maintained a robust domestic market in the last decade. This study aims to gather insights from this phenomenon by examining the key factors that distinguish the profiles of non-successful and successful local films at the Indonesian box office. The analysis was conducted based on the characteristics of 225 local films from 2017 to 2018. We employed a recent method to generate such profiles, namely the tree-based comparative analysis, which uses a machine learning model to augment the cross-case comparisons from a large pool of data. The study result suggests that the success of local films is mainly driven by the actor's popularity and the presence of foreign movie hits at the box office. Besides, we also found that the distinction between moderately successful and highly successful local films predominantly lies in story familiarity, suggesting the efficacy of the "brand extension" strategy in the Indonesian film industry. The study concludes by offering some managerial insights for producers and distributors to deliver successful films at the box office.

filmgoers. Although the number of films released each year is relatively low, the total admission for local films has increased sharply from year to year (Mediarta, 2018). For example, there were 124 Indonesian films released in 2016, with a total of 37 million admissions. The following year, the total number of film releases dropped by 6%, but the total admission rose by 14%. These facts indicate that Indonesian films have attracted many filmgoers and maintained substantial growth in the domestic market.
Despite the increasing interest from audiences, Indonesian films also face intense competition from foreign films. As of 2018, Hollywood films still dominated the domestic market in Indonesia, with a 65% market share (Sari & Kartika, 2018). They are often backed by big budgets, allowing them to cast superstars and mount large promotional campaigns, which makes it difficult for local films to compete. Consequently, many local film producers rely on a timing policy to avoid direct competition with big international studios (Einav, 2002; Yang & Kim, 2014). Interestingly, a few Indonesian films have prevailed even when released in tandem with a foreign film hit. For example, in 2016, "Ada Apa Dengan Cinta 2", a sequel to an Indonesian film hit, was released during the same period as the Hollywood blockbuster Captain America: Civil War. The local film attracted more than 3.6 million admissions and became one of the best-selling Indonesian films of all time (Film Indonesia, 2016). This phenomenon indicates that there are still many uncertainties about what truly drives Indonesian film success at the box office. While many local producers in the Asia-Pacific struggle to survive global competition (Changsong, 2019; McKenzie & Walls, 2013; Yang & Kim, 2014), Indonesian films have successfully maintained a strong position in the domestic theatrical market.
This study aims to investigate the profiles of successful local films and interpret the behavior of filmgoers in Indonesia. Currently, such studies are lacking in the literature. Many existing studies on film success are based on global film performance in the international market (e.g., Lash & Zhao, 2016; Litman & Ahn, 1998; Pokorny & Sedgwick, 2010). Little attention has been given to the performance of local films in the domestic market. Among the few exceptions, Jansen (2005) examined the success factors of local films in Germany and found that managerial skills in film production play an essential role in the success of German films. McKenzie and Walls (2013) studied local film performance in Australia and found that higher levels of advertising support and opening screens do not significantly improve the financial outcome of Australian films. Changsong (2019) found that local films in Malaysia are unsuccessful since most of the filmgoers in Malaysia come from a lower-income group. Verma and Verma (2019) studied Bollywood motion pictures and concluded that the most important predictor of film success in India is the rating of the film's soundtrack. Bartosiewicz and Orankiewicz (2020) studied the impact of foreign films on the Polish box office and suggested that the presence of foreign films does not negatively affect local film performance. These findings suggest that every country has its own profile for achieving cinematic success. Thus, it is necessary to understand the characteristics that drive local film success in a domestic market context. This study contributes by providing the profile of successful local films in a developing market such as Indonesia, which has yet to be adequately examined in the literature. With a lower budget and less star power, local films in Indonesia are prone to receive fewer admissions and less financial success. The profiling of local film performance can give producers insights into the success factors of local films.
The rest of this paper is structured as follows. In Section 2, we provide a literature review. The data and methods used in the study are described in Section 3 and Section 4, respectively. In Section 5, we present the results and discuss the findings. Finally, we provide the main findings and study limitations in Section 6.

II. LITERATURE REVIEW
Although film success is unpredictable, researchers and practitioners have long been interested in the key attributes that may have a significant effect on it. For instance, many filmmakers believe that popular actors can secure a return on investment despite their high fees (Wei, 2006). The fame of actors can attract significant attention from filmgoers, which indirectly helps the film's promotion. However, empirical studies have reached mixed conclusions on the effect of famous actors on film success. Some studies support a significant effect of actor popularity on film success (e.g., Bagella & Becchetti, 1999; Elberse, 2007; Litman & Kohl, 1989), whereas other studies suggest otherwise (e.g., Hennig-Thurau et al., 2006; Sochay, 1994). Recently, Wirtz, Mermann, and Daiser (2016) evaluated several variables that may affect actors' contributions to a film's success. Among those variables, the price-competence ratio of an actor has the most powerful impact on film performance.
Thus, an actor's popularity alone is not a guarantee of success: other factors attributed to actors affect the film's performance.
Diverging conclusions were also observed on the impact of genre on film success. Litman and Kohl (1989) examined the performance of global films in the 1980s. They suggest that drama positively influences a film's financial performance, whereas horror and comedy genres are insignificant. Craig et al. (2005) evaluated the performance of American films in other countries and found that drama and comedy do not have a significant impact. Instead, action and horror films influence film revenues substantially. These results indicate that no specific genre is widely preferable across markets, but rather, each market has its preference for a particular genre.
The script type is also one of the traits often associated with film success. A script can be written as an original, adapted from a famous story, remade from an older film, or produced as a sequel. In marketing terminology, a sequel or remake is a form of "brand extension" in which producers re-develop a successful brand into a new version in order to achieve greater success than the original brand (Gunter, 2018). Studies have found that audiences are excited to have more stories about famous characters or tales, making sequels and remakes financially attractive. Pokorny and Sedgwick (2010) examined the performance of sequels across seven major studios from 2004 to 2009 and found that sequels deliver a high return on investment. Bohnenkamp et al. (2015) also suggest that sequels, remakes, and adaptations often provide higher returns and are less risky to produce. One of the reasons is that remakes and sequels offer the audience familiarity and positive associations and are thus more appealing (Hennig-Thurau et al., 2006). Terry, Butler, and De'Armond (2003) examined the performance of 505 Hollywood films and found that sequels contribute significantly to the success of a film. In contrast, Ginsburgh, Pestieau, and Weyers (2006) found that remakes often perform worse than originals. They suggest that for a remake to be successful, it should retain sufficient originality to attract attention from filmgoers.
Timing is also one of the critical factors in film success. Even though films are released throughout the year, filmgoers are not always equally available. Chang and Ki (2005) found that film revenues in the United States are higher during the summer holidays. Einav (2002) also suggests that films released during the holidays had higher admissions than those released on regular days despite the higher competition. However, if all films were released during the high season, competition at the box office would be intense, and the market would quickly become saturated (Gunter, 2018). Since the time and budget of filmgoers are limited, some films will be preferred while others are neglected, affecting the film admission rate. Yang and Kim (2014) found that the Hollywood film release pattern affects the release schedule for local films in South Korea, where many local film producers avoid direct competition with foreign film hits. Einav (2002) also suggests that avoiding direct competition with big studios could be financially beneficial. Foreign film hits are often backed by big stars, high budgets, and big promotions, forcing smaller studios to release during the low season to avoid competition.
Similarly, a film's age category has long been considered a determinant of film success. In the United States, the Motion Picture Association (MPA), formerly the Motion Picture Association of America, rates a film's age category. This rating determines the appropriate age category of a film before its release. The study by Litman and Kohl (1989) suggested that specific age categories can lead to higher financial performance because they appeal to a larger potential audience than others. Films with a more restrictive age category typically had a worse financial performance at the box office (De Vany & Walls, 2002).
From the reviewed literature, we observe that the direct associations between film attributes and their performance in the market are less than conclusive. Despite the vast array of studies on film success, most of them are based on the performance of global box office movies. More scholarly attention is needed to analyze the critical factors that define local film success at the box office. With the high exposure of foreign films in the domestic market, local films will likely struggle to compete. This study aims to complement the existing body of knowledge on the film industry by providing profiles of successful local films at domestic theatres based on empirical data.
Local film success is the primary concern of our study. To support the analysis, we collected data pertinent to Indonesian films at the box office. We collected 225 films released between 2017 and 2018, along with their performance and attributes. The attributes include eight factors such as actor popularity, story origins, release date, genre, number of competitors at the theater, exhibition period, and foreign film revenues. The data were collected from multiple sources on the internet, such as Google Trends, filmindonesia.or.id, imdb.com, bisokoptoday.com, and boxofficemojo.com.

We derived several measures to augment the data attributes. The actors' popularity in our data was measured based on the search term volume on the internet, as recorded by Google Trends. In Google Trends, the search term volume is measured on a scale of 0 to 100, with "0" denoting the lowest popularity and "100" the highest popularity during the observation period. This study assumes that the actors were cast one year before the film's release date. Thus, an actor's popularity was measured based on that timeline to avoid bias from the post-release effect. The exposure of foreign films was estimated based on the total worldwide gross of the foreign films shown during a local film's exhibition period. Thus, we assume that a high total gross indicates substantial exposure from foreign films. We also classified the exhibition period of local films into two seasons: the holiday season (June, July, December, and January) and the regular season (the rest of the months). Story familiarity was coded based on whether the film was produced as a sequel/remake. The story origin was classified based on whether the film was developed from an existing story (e.g., novels, popular tales). We made the dataset publicly available via the following link: https://doi.org/10.7910/DVN/BJTJTI. The statistical summary of the data is shown in Table 1, and the details of the film release time are shown in Figure 1.
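The season coding and the one-year popularity reference point described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names and the example date are hypothetical, and classifying a film by the month of its release date is an assumption, since the text only lists the holiday months.

```python
from datetime import date

# Holiday months as listed in the text: January, June, July, December.
HOLIDAY_MONTHS = {1, 6, 7, 12}


def season_of(release: date) -> str:
    """Return 'holiday' or 'regular' from the release month (assumed rule)."""
    return "holiday" if release.month in HOLIDAY_MONTHS else "regular"


def popularity_reference_date(release: date) -> date:
    """Return the date one year before release, where popularity is read."""
    try:
        return release.replace(year=release.year - 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28.
        return release.replace(year=release.year - 1, day=28)


example_release = date(2018, 6, 15)  # hypothetical release date
print(season_of(example_release), popularity_reference_date(example_release))
```

Reading popularity at a point before release, rather than around the release itself, keeps the measure free of the attention a film generates for its own cast.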

III. RESEARCH METHODOLOGY
We adopt the tree-based comparative analysis (TBCA) method to generate the success profiles of Indonesian films. The TBCA is an exploratory data analysis tool based on the popular qualitative comparative analysis (QCA) method to perform a cross-case evaluation. In TBCA, each case is represented by a combination of conditions and a specific outcome. In statistical terms, the outcome can be considered the response/target variable, while the conditions can be conceived as factors/determinants that cause an outcome. Like QCA, the TBCA uses set theory as the foundation to provide evidence on causal effects. However, it also employs a machine learning tool, i.e., the decision tree model, to augment the search process for finding the minimum set of conditions sufficient to cause an outcome. The TBCA method was first introduced by Hartono et al. (2020) to improve the efficiency of traditional QCA in dealing with a dataset that has a large combination of conditions but a limited number of cases.

A. Case Outcome
In the TBCA, the case outcomes need to be translated into binary terms, where "0" means the absence and "1" means the presence of an outcome. Our study determines the case outcomes based on the local film performance at the Indonesian box office. Several previous studies used financial performance as a proxy for film success (Galvão & Henriques, 2018; Litman & Kohl, 1989; McKenzie & Walls, 2013; Simonton, 2009). However, the financial-related data for Indonesian films are not publicly available. Therefore, we used film admission rates as the proxy, as has been done by Changsong (2019) and Gevaria et al. (2015) in their studies.
Based on the film admission rates, we translate the film success level into two classes, i.e., "non-successful" and "successful" films. A local film is considered successful if it is attended by at least 200,000 viewers (about twice the median admission). As summarized in Table 2, films that reach this threshold are classed as successful, and the rest are considered non-successful since fewer than 200,000 viewers attended them. Further, we observe that the local film admission distribution is highly skewed. Although the median film admission is about 100,000 viewers, some films reach more than 1,000,000 attendees. Therefore, to address this skewness, we further break down the successful film category into two classes: "moderately successful" and "highly successful". The highly successful films are local films that attracted more than 1,000,000 attendees during their release period, and the moderately successful films are those with between 200,000 and 1,000,000 attendees.
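The outcome coding can be written as a small Python sketch. It is an illustration under stated assumptions, not the study's code: the function names are hypothetical, exactly 200,000 admissions is treated as successful (following "at least 200,000"), and exactly 1,000,000 is treated as moderately successful (following "more than 1,000,000" for the top class).

```python
SUCCESS_THRESHOLD = 200_000        # "at least 200,000 viewers"
HIGH_SUCCESS_THRESHOLD = 1_000_000  # "more than 1,000,000 attendees"


def binary_outcome(admissions: int) -> int:
    """Outcome for observation #1: 1 = successful, 0 = non-successful."""
    return 1 if admissions >= SUCCESS_THRESHOLD else 0


def success_class(admissions: int) -> str:
    """Three-level label used when successful films are split further."""
    if admissions > HIGH_SUCCESS_THRESHOLD:
        return "highly successful"
    if admissions >= SUCCESS_THRESHOLD:
        return "moderately successful"
    return "non-successful"


# Hypothetical admission figures, for illustration only.
for admissions in (95_000, 450_000, 2_300_000):
    print(admissions, binary_outcome(admissions), success_class(admissions))
```

Observation #1 would use `binary_outcome` on all films, while observation #2 would compare only the films labeled moderately or highly successful.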

B. Potential factors and conditions
Based on the recent literature, we collected potential factors that may significantly affect a film's success (see Section 2). We found at least eight factors pertaining to film success in the literature. We then categorized the factors into two aspects: the production aspect and the distribution aspect. The production aspect consists of film attributes that are pertinent to film production, such as story origins, production type, actor popularity, age category, and film genre. In contrast, the distribution aspect consists of film attributes relevant to managerial decisions, such as release time decision, competition level, and foreign film exposure. In TBCA, the data values of the factors need to be transformed into categorical terms to represent conditions. Table 3 summarizes the potential factors and the conditions used in the study. There are seven nominal variables, with four dummy variables representing the film genre.
Notes to Table 3: (1) Actor popularity is estimated based on the search term volume in Google Trends one year before the film release date. (2) A film can have more than one genre, such as action-comedy or drama-horror. Therefore, genre is coded as dummy variables.
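To make the dummy coding of multi-genre films concrete, the following Python sketch turns a hyphenated genre label into one 0/1 indicator per genre. The four genre names are an assumption taken from the genres mentioned in the text (Table 3 is not reproduced here), and the function and column names are hypothetical.

```python
# Assumed genre list; the text states only that four genre dummies are used.
GENRES = ("action", "comedy", "drama", "horror")


def genre_dummies(label: str) -> dict:
    """Map a label such as 'action-comedy' to one 0/1 indicator per genre."""
    parts = {part.strip().lower() for part in label.split("-")}
    unknown = parts - set(GENRES)
    if unknown:
        raise ValueError(f"unrecognized genre(s): {sorted(unknown)}")
    return {f"genre_{genre}": int(genre in parts) for genre in GENRES}


print(genre_dummies("action-comedy"))
print(genre_dummies("drama-horror"))
```

Because each genre gets its own indicator, a film belonging to two genres simply has two indicators set to 1, which a single nominal genre variable could not express.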

C. Comparative analysis
In the traditional QCA, all possible conditions need to be listed and compared based on the number of cases that comply with the conditions. This process can be laborious, especially when the number of possible combinations of conditions is enormous. Our case has eight potential factors with various conditions. In total, 18,432 possible sets of conditions need to be evaluated, while there are only 225 cases available. This is where the machine learning tool in TBCA comes in handy. The classification tree helps to automatically find the minimum set of conditions that best predicts the outcome. It serves as a decision support tool in the form of a tree-like model of decisions and their possible outcomes, including the likelihood of each outcome based on set theory.
Unlike most machine learning tools, the classification tree is a white-box model that is transparent and highly interpretable. The model learns the complex interrelationship between conditions using a greedy procedure. It starts by selecting the variable that best classifies the sample and then repeatedly partitions the dataset based on the identified conditions until a stopping criterion is met. In this study, we used the Gini index as the splitting rule. The Gini index measures the heterogeneity level in the sample. A value of "0" denotes an entirely homogeneous sample, and higher values denote a more heterogeneous sample (for a two-class outcome, the maximum is 0.5, reached when both classes are equally represented). The heterogeneity level indicates the consistency of a condition. Therefore, a condition whose split does not reduce the Gini index would be insignificant since the condition gives a mixed conclusion. To maintain the interpretability of the tree, we require that the number of samples in each leaf be at least 10% of the total sample and that the depth of the tree be at most three splits. We evaluated the performance of the classification tree based on its accuracy, which is the ratio of correctly predicted cases to the total observed cases. The classification tree analysis was executed using the Python programming language version 3.6 with the scikit-learn library version 0.20.0.
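As an illustration of the splitting rule, the sketch below computes the Gini index of a node and the weighted Gini index after a split, using only the Python standard library. The class counts are taken from the figures reported in the results (59 films with popular actors, of which 34 were successful; 166 films with less popular actors, of which 120 were non-successful); the function names are hypothetical, and this is not the scikit-learn implementation.

```python
def gini(counts):
    """Gini index of a node given its class counts, e.g. (successful, not)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in counts)


def weighted_gini(children):
    """Size-weighted average Gini index of the child nodes of a split."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)


# (successful, non-successful) counts derived from the reported figures.
popular = (34, 59 - 34)
less_popular = (166 - 120, 120)
root = (popular[0] + less_popular[0], popular[1] + less_popular[1])

before = gini(root)
after = weighted_gini([popular, less_popular])
print(f"root Gini = {before:.3f}, after split = {after:.3f}, "
      f"decrease = {before - after:.3f}")
```

A greedy tree builder evaluates this decrease for every candidate condition and keeps the split with the largest value. In scikit-learn, the settings described above would plausibly map to the `criterion`, `max_depth`, and `min_samples_leaf` parameters of `DecisionTreeClassifier`, although the exact call used in the study is not given.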
The resulting tree consists of nodes and branches. A node represents a condition that splits the data into branches. Following the Boolean logic in the QCA, we employed a binary classification tree, meaning that every node has only two subdivisions, denoting True and False paths (see Figure 2). The binary paths reflect compliance with a particular combination of conditions for an outcome to happen. The branches are then used to generate a "truth table," which helps interpret the causal effect of multiple conditions on an outcome.

Fig. 2. The basic structure of a classification tree in the TBCA
Table 4 summarizes the number of cases and the success rate pertinent to each condition. Although the summary does not embody a multivariate analysis, it gives initial insights into which conditions are critical for film success. From the table, we can observe the high-level characteristics of Indonesian films. Specifically, we found that 75% of local films were based on original stories, and only 7% were produced as a sequel/remake. More than half of local films were rated 13+, and 66% have a drama genre. Besides, most films were released during a non-holiday season (77%), and about half of them needed to compete with more than ten other films at the theaters. Further, every condition's success rate indicates how the condition affects the outcome. The table shows some notable conditions that might be critical for film success. The data show that films with popular actors and familiar stories tend to be successful. Besides, films released during the holiday season or alongside popular foreign films also tend to attract more viewers. Interestingly, other factors such as story origins, film genre, age category, and the number of competitors at the theaters have a low effect on the success rate of a film. However, the data summarized in Table 4 are based on univariate analysis. They do not represent the complex interaction amongst conditions during a film exhibition.
Besides, the total number of cases representing a condition should also be considered. A condition that has a high success rate but represents only a small number of cases should be interpreted with caution. This is where the TBCA has an advantage in giving reliable insights from several possible conditions with a limited number of cases.
Figure 3 shows the classification tree model from observation #1. The model aims to find the critical features that distinguish the profiles of "non-successful" and "successful" local films at the Indonesian box office. Note that the classification tree has an accuracy score of 70.2%, which is reasonably accurate for a comparative purpose. Only three out of eight potential factors are critical in distinguishing successful and non-successful films. The three factors are the actor's popularity, foreign film revenues, and genre. The remaining five factors were insignificant and therefore excluded from the profile.

Fig. 3. Classification tree of observation #1
Based on the resulting tree, we deduce the comparative profiles of non-successful and successful films, as shown in Table 5. The result implies that successful films are mainly dominated by famous actors, as suggested by the root node, where 34 out of 59 films (58%) with famous actors (popularity condition > 2) proved to be successful at the box office. This finding is consistent with the conclusions from previous studies (e.g., Bagella & Becchetti, 1999; Elberse, 2007; Litman & Kohl, 1989), which suggest that famous actors play an essential role in securing financial success. The presence of big stars in a film is often recognized as a sign of a high-quality film (Gunter, 2018). Their presence can also serve as a promotion to potential viewers, creating awareness and anticipation for the film.
Interestingly, the success rate of local films with famous actors rose by 11 percentage points (from 58% to 69%) if they were exhibited alongside popular foreign films. Our sample shows that 24 out of 35 films (69%) fitting the condition attracted more than 200,000 filmgoers. This evidence contrasts with the common belief of local film producers. Previous studies have shown that most producers believe that avoiding direct competition with foreign film hits would increase the film success rate (Einav, 2002; Yang & Kim, 2014). However, our evidence shows otherwise. We found that local films tend to have more viewers when released simultaneously with foreign film hits. The exact reason behind this phenomenon still needs to be determined. A possible explanation is that the popularity of foreign films entices filmgoers to visit theaters, thus increasing the discovery rate of local films ("halo effect"). Another possible explanation is that local films distributed at the same time as popular foreign films were already highly anticipated. A popular foreign film typically occupies most screens in the theater. Therefore, if a local film is not highly anticipated, producers would not risk exhibiting it alongside foreign movie hits.
The resulting tree also suggests that local films with non-famous actors were mostly non-successful. Our sample shows that 120 out of 166 films (73%) with less famous actors (popularity condition ≤ 2) attracted fewer than 200,000 viewers during their exhibition. The chance of failure was even greater when non-famous actors starred in drama films. Our data indicate that 90 out of 113 drama films (80%) with unknown actors could not reach 200,000 viewers. This result is interesting since most Indonesian films (66%) have a drama genre. It appears, however, that many filmgoers are attracted to drama films only if famous actors star in them.
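The counts reported for observation #1 are enough to rebuild a simple truth table and to check the stated accuracy. The Python sketch below is a reconstruction under assumptions, not the authors' output: the tree layout (foreign film exposure splitting the popular-actor branch, drama splitting the other branch) is inferred from the text because Figure 3 is not reproduced here, the leaf counts not stated directly are obtained by subtraction, and each leaf is assumed to predict its majority class.

```python
# (conditions along the path, films in the leaf, successful films in the leaf)
leaves = [
    ("popular actors, popular foreign films", 35, 24),
    ("popular actors, no popular foreign films", 59 - 35, 34 - 24),
    ("less popular actors, drama", 113, 113 - 90),
    ("less popular actors, not drama", 166 - 113, (166 - 120) - (113 - 90)),
]


def truth_table(leaves):
    """Summarize each leaf: success rate, predicted class, correct cases."""
    rows = []
    for conditions, films, successful in leaves:
        rate = successful / films
        predicted = "successful" if rate > 0.5 else "non-successful"
        correct = successful if predicted == "successful" else films - successful
        rows.append({"conditions": conditions, "films": films,
                     "success_rate": rate, "predicted": predicted,
                     "correct": correct})
    return rows


rows = truth_table(leaves)
n_films = sum(row["films"] for row in rows)
accuracy = sum(row["correct"] for row in rows) / n_films

for row in rows:
    print(f'{row["conditions"]:<45} n={row["films"]:>3} '
          f'rate={row["success_rate"]:.0%} -> {row["predicted"]}')
print(f"films = {n_films}, accuracy = {accuracy:.1%}")
```

Under these assumptions the four leaves cover all 225 films, and the share of correctly classified cases agrees with the accuracy reported for the tree, which lends some support to the inferred layout.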

C. TBCA #2: Moderately Successful vs. Highly Successful Films
The classification tree resulting from observation #2 is presented in Figure 4. The tree aims to find the critical factors that distinguish the profiles of "moderately successful" and "highly successful" films at the Indonesian box office. The resulting tree has an accuracy of 76.25%, which is reasonably accurate for a comparative purpose and higher than that of the previous model. From Figure 4, we observe that only one factor significantly separates moderately successful films from highly successful films: "story familiarity". The other seven factors did not distinguish the two groups, as indicated by the Gini performance. The comparative profiles of moderately successful and highly successful films are summarized in Table 6. The figure also suggests that a film with a familiar story, such as a sequel/remake, is likelier to attain "high success". Our sample shows that 8 out of 13 successful films (62%) produced as a sequel/remake attracted more than one million viewers. This finding supports the efficacy of a sequel/remake strategy as a "brand extension" in the film industry, as also reported by Bohnenkamp et al. (2015). Previous studies have also shown that remakes and sequels offer filmgoers a positive association, raising their desire for more stories of the same kind (Hennig-Thurau et al., 2006).
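A back-of-the-envelope check, sketched in Python, shows what the reported accuracy implies for the films that are not sequels or remakes. Everything beyond the three reported figures is an assumption: the total of 80 successful films is derived from the counts given for observation #1 (34 + 46), the tree is assumed to have a single split on story familiarity, and each leaf is assumed to predict its majority class. The derived counts are therefore indicative only.

```python
successful_total = 80            # assumed: 34 + (166 - 120) from observation #1
sequels, sequels_high = 13, 8    # reported in the text
reported_accuracy = 0.7625       # reported in the text


def implied_counts(total, seq, seq_high, accuracy):
    """Infer leaf composition for a one-split tree from its accuracy."""
    correct = round(accuracy * total)
    non_seq = total - seq
    # The sequel leaf predicts "highly successful" (8 of 13 is a majority);
    # the other leaf is assumed to predict "moderately successful".
    non_seq_moderate = correct - seq_high
    non_seq_high = non_seq - non_seq_moderate
    return {"correct": correct, "non_sequels": non_seq,
            "non_sequel_moderate": non_seq_moderate,
            "non_sequel_high": non_seq_high,
            "highly_successful_total": seq_high + non_seq_high}


counts = implied_counts(successful_total, sequels, sequels_high,
                        reported_accuracy)
print(counts)
print(f"sequel/remake high-success rate = {sequels_high / sequels:.0%}")
```

If these assumptions hold, the high-success rate among sequels and remakes is well above the rate among the remaining successful films, which is the contrast the tree picks up.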

V. CONCLUSIONS AND RECOMMENDATION
Producing a thriving local film is challenging, especially in this era of globalization. Our data show that only a small portion of local films (35%) in Indonesia managed to attract more than 200,000 viewers, and only about a quarter of those reached one million. To understand this phenomenon, we conducted a data-driven investigation to seek the profiles of successful Indonesian films. The objective was to discover the critical conditions that drive and distinguish local film performance. To do so, we collected data from multiple sources, covering 225 Indonesian films. As part of the study contribution, we made this dataset available to the public for future research.
Our study used the TBCA method to generate successful local film profiles based on the dataset. The method efficiently found the minimal set of conditions sufficient to explain the outcomes (success/non-success). The result implies that four critical factors drive local film success in Indonesia: (1) actor popularity, (2) foreign film popularity, (3) genre, and (4) story familiarity. The TBCA also presents the interaction between conditions affecting the film's success rate. Specifically, we found that combining famous actors, a familiar script, and a deliberate release strategy can significantly improve the success rate of local films. In contrast, combining a drama film with unknown actors can reduce the chance of success. While the positive effect of famous actors and a familiar story is well understood in the literature, the exact reason local films attract significant viewers when exhibited together with popular foreign films remains to be determined. This phenomenon deserves future research since it contrasts with the common belief of local producers.
We propose managerial implications for local film producers who wish to reach significant viewers in the Indonesian theatrical market. First, acquiring famous actors for a local film can be a safe strategy to attract significant viewers and secure a return on investment. Second, local drama films with unknown actors are the least favorable type at the Indonesian box office. Therefore, producers with a small budget might want to produce a drama film only if they can afford famous actors to star in it. Third, releasing a local film in the same period as foreign film blockbusters could help to increase film popularity. However, such a strategy might be expensive since exhibitors tend to allocate most screens to the more anticipated films to maximize profits. Finally, creating a remake/sequel of a successful film could be a promising strategy to attract many viewers due to the positive association gained from memorable stories or characters.
This study also has some potential limitations. Some financial-related variables in a film, such as film revenues, production costs, marketing expenses, and actor salaries, were unavailable for the study. Therefore, not all potential success factors derived from previous research can be explored in this study. Moreover, the conclusions from this study were derived based on the performance of Indonesian films in 2017-2018. Even though the data may reflect the recent trend in the Indonesian market, they may not capture the emerging trend over a prolonged period. Future studies may address these limitations by using a more extended period to gain more reliable insights into the Indonesian film industry.