**Abstract**

Dealing with sparse data is a common problem for recommender systems, especially for crowdfunding recommendations. Collaborative filtering (CF) tends to recommend to a user only those items directly connected to similar users, and fails to recommend items that are only indirectly connected to them. Therefore, CF performs poorly on sparse data such as Kickstarter's. We propose a method for indirect crowdfunding campaign recommendation based on a bipartite graph. PersonalRank can calculate global similarity, as opposed to local similarity: for any node in the network, we apply PersonalRank iteratively to produce a recommendation list in cases where CF fails. Furthermore, we propose a bipartite graph-based CF model that combines CF and PersonalRank. The model classifies nodes into two types: user nodes and campaign nodes. For any pair of nodes, the global similarity between them is calculated by PersonalRank. Finally, a recommendation list is generated for any node through the CF algorithm. Experimental results show that the bipartite graph-based CF achieves better recommendation performance on the extremely sparse data of crowdfunding campaigns.

**Keywords:** crowdfunding, recommender, bipartite graph, network structure

#### **1. Introduction**

As the largest crowdfunding platform in the world, Kickstarter has attracted 8,604,863 users who participated in 230,850 campaigns with 22,525,091 investment behaviors (www.kickstarter.com). However, about 60% of campaigns fail to reach their funding goals. The main reason is that many campaigns fail to find enough investors, rather than that their ideas are not good enough [1]. A recommender for crowdfunding is therefore key to solving this problem.

A survey has shown that the sparsity of user behaviors on Kickstarter is about 99.99%, which makes commonly used recommendation algorithms inefficient. For example, the collaborative filtering (CF) algorithm based on cosine similarity first finds users with the same preferences by calculating interest similarity, and then produces a recommendation list. However, it is difficult for the algorithm to find similar users in sparse data, which is one of the main problems faced by recommender systems [2].
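As a rough sanity check on the sparsity figure, the short Python sketch below computes the share of empty cells in the user-campaign matrix from the Kickstarter counts quoted above. It assumes, as a simplification of ours, that each investment behavior fills a distinct user-campaign cell; repeated backings of the same campaign would only make the matrix sparser. The function name `sparsity` is illustrative.

```python
def sparsity(n_users: int, n_items: int, n_interactions: int) -> float:
    """Share of empty cells in an n_users x n_items interaction matrix.

    Assumes every interaction fills a distinct (user, item) cell.
    """
    total_cells = n_users * n_items
    return 1.0 - n_interactions / total_cells


# Counts quoted in the introduction (www.kickstarter.com).
s = sparsity(n_users=8_604_863, n_items=230_850, n_interactions=22_525_091)
print(f"sparsity = {s:.4%}")  # sparsity = 99.9989%
```

Under this assumption the matrix is even sparser than the rounded 99.99% figure, with roughly one filled cell per 88,000.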

Network analysis algorithms are effective approaches to large-scale sparse data. For example, the PageRank algorithm can calculate the weight of web nodes. However, as a global iterative algorithm, PageRank does not distinguish between types of nodes, which makes it hard to use for improving recommendation performance. An improved algorithm based on PageRank (i.e., the bipartite graph model) offers a way forward. Using the bipartite graph model, we divide the network into an item-user structure in which there are no direct edges between items or between users. The global similarity is then calculated by bipartite graph analysis, as opposed to the local similarity calculated by the cosine function, and it handles sparse data better.


*A Bipartite Graph-Based Recommender for Crowdfunding with Sparse Data*


*DOI: http://dx.doi.org/10.5772/intechopen.92781*


Experiments show that the bipartite graph model can effectively produce recommendation lists from sparse data. Furthermore, during the global iterative process of the bipartite graph model, the similarities between items and between users are also calculated, in addition to the similarity between items and users. Unlike the cosine function, which can only relate directly adjacent users, this similarity is extracted from the whole network, so it can address the computational problem caused by sparse data. We therefore propose a bipartite graph-based CF model that combines the similarity calculated by the bipartite graph model with the CF algorithm.
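A minimal sketch of how a global similarity could be plugged into CF is given below. It assumes a precomputed user-user similarity table (standing in for PersonalRank output), takes the *k* most similar users as the neighborhood, scores each campaign the target has not yet backed by summing the similarities of the neighbors who backed it, and returns the top *n*. The function `graph_cf_recommend`, the top-*k* rule, the additive weighting, and the toy data are hypothetical illustrations, not the exact formulation of the proposed model.

```python
from typing import Dict, List, Set


def graph_cf_recommend(
    target: str,
    user_sim: Dict[str, Dict[str, float]],
    backed: Dict[str, Set[str]],
    k: int = 2,
    n: int = 3,
) -> List[str]:
    """Hypothetical bipartite-graph-based CF step.

    user_sim[u][v]: global similarity between users u and v (e.g. from PersonalRank).
    backed[u]: campaigns user u has backed.
    """
    sims = user_sim.get(target, {})
    # Neighborhood = the k most similar other users.
    neighbors = sorted((v for v in sims if v != target), key=lambda v: -sims[v])[:k]
    seen = backed.get(target, set())
    scores: Dict[str, float] = {}
    for v in neighbors:
        for item in backed.get(v, set()) - seen:  # skip campaigns already backed
            scores[item] = scores.get(item, 0.0) + sims[v]
    # Highest score first; ties broken by name for a deterministic order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


# Toy data (hypothetical, not from the chapter).
sim = {"u1": {"u2": 0.3, "u3": 0.2, "u4": 0.05}}
backed = {"u1": {"c1"}, "u2": {"c1", "c3"}, "u3": {"c2", "c3"}, "u4": {"c4"}}
print(graph_cf_recommend("u1", sim, backed, k=3))  # ['c3', 'c2', 'c4']
```

Because the similarities come from the whole graph rather than from co-backed campaigns, a user such as `u4` can still contribute to the list even if it shares nothing with `u1`.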

#### **2. Literature review**

#### **2.1 Graph model**

PageRank is a classic algorithm for calculating node weights [3, 4]. PageRank determines the importance of all web pages based on the assumption that pages linked from high-quality pages are also high-quality: a page is given a higher weight if more high-ranking pages point to it. Prior studies have proposed improved PageRank algorithms, e.g., topic-sensitive PageRank [5], and an algorithm that considers linked pages that are relevant in content but not directly adjacent, instead of only directly adjacent pages [6]. PageRank is computationally expensive and time-consuming, and its efficiency can be improved by several variants [7, 8].
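For reference, a textbook power-iteration form of PageRank is sketched below. The damping factor 0.85, the convergence tolerance, and the uniform redistribution of rank from pages without out-links are common conventions assumed here, not details taken from [3, 4].

```python
from typing import Dict, List


def pagerank(
    links: Dict[str, List[str]],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Power-iteration PageRank; links[u] lists the pages u points to."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Every page first receives the teleportation share.
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            outs = links.get(u, [])
            targets = outs if outs else nodes  # dangling page: spread evenly
            share = damping * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# c ranks highest: both a and b link to it.
print(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}))
```

Note that every node is treated identically here, which is the homogeneity the bipartite graph model later relaxes.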

When node weights are calculated by PageRank, link weight and content weight are not distinguished [9]. The HITS algorithm separates node quality into link authority (hub) and content authority (authority) [10]. The link authority of pages is determined from their content authority, and an overall evaluation of web pages is then given. A good hub is a page that points to many good authorities; a good authority is a page that is pointed to by many good hubs. This mutually reinforcing relationship between hubs and authorities is useful for discovering authoritative pages and for automatically identifying web structure and resources. Because HITS suffers from topic drift and irrelevant links, several improved methods have been proposed [11, 12].
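The mutually reinforcing update can be written as two alternating sums followed by normalization. The sketch below is a plain iterative HITS under our own choices (L2 normalization, a fixed 100 iterations, a toy link graph); it is not the exact procedure of [10].

```python
import math
from typing import Dict, List, Tuple


def hits(
    links: Dict[str, List[str]], iters: int = 100
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Iterative HITS; returns (hub, authority) scores, each L2-normalized."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    hub = {v: 1.0 for v in nodes}
    auth = {v: 0.0 for v in nodes}
    for _ in range(iters):
        # Authority: sum of hub scores of the pages pointing to it.
        auth = {v: 0.0 for v in nodes}
        for u, outs in links.items():
            for v in outs:
                auth[v] += hub[u]
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {v: x / norm for v, x in auth.items()}
        # Hub: sum of authority scores of the pages it points to.
        hub = {u: sum(auth[v] for v in links.get(u, [])) for u in nodes}
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {u: x / norm for u, x in hub.items()}
    return hub, auth


hub, auth = hits({"h1": ["p1", "p2"], "h2": ["p1"]})
print(max(auth, key=auth.get), max(hub, key=hub.get))  # p1 h1
```

Page `p1` is the best authority because both hubs point to it, and `h1` is the best hub because it points to both authorities.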

A bipartite graph is an extension of network theory and has attracted much attention, for example in social network analysis [13]. A bipartite graph divides network nodes into two types, unlike PageRank, which treats nodes as homogeneous. Only nodes of different types are directly connected, while nodes of the same type are connected only indirectly [14, 15]. The crowdfunding network can be abstracted as a bipartite graph in which one group of nodes is investors and the other is items. With appropriate algorithms, the bipartite graph model can calculate distances between nodes, such as the Laplacian distance [16]. The Laplacian matrix can measure the reachability of nodes in graph models. Once distances between nodes are calculated in a bipartite graph, they can be transformed into similarities between nodes [17]. Typical algorithms include mean similarity [18], and subsequent research has shown upper and lower bounds for bipartite approximations [19]. On this basis, combined with hierarchical subgraphs, the Hausdorff edit distance has been proposed to improve calculation accuracy and reduce computational complexity [20]. Visualization methods have also been suggested [21]. In practice, bipartite graphs have been applied to image segmentation [22]. In recommender systems, researchers have used an aggregated bipartite graph model to reduce the computational complexity of graph models, at the cost of lower recommendation accuracy [23].
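The distance-to-similarity step can be illustrated with a much simpler distance than the Laplacian distance of [16]. The sketch below, our own stand-in, computes breadth-first hop distances in an undirected user-campaign bipartite graph and maps them to similarities with `1 / (1 + d)`. Because edges only join users to campaigns, nodes of the same type always lie an even number of hops apart. The helper names, the transform, and the toy edges are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def build_bipartite(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from (user, campaign) pairs."""
    adj: Dict[str, Set[str]] = {}
    for user, campaign in edges:
        adj.setdefault(user, set()).add(campaign)
        adj.setdefault(campaign, set()).add(user)
    return adj


def hop_distances(adj: Dict[str, Set[str]], source: str) -> Dict[str, int]:
    """Breadth-first shortest-path lengths (in edges) from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def similarity_from_distance(d: int) -> float:
    """One common distance-to-similarity transform: 1 / (1 + d)."""
    return 1.0 / (1.0 + d)


edges = [("u1", "c1"), ("u2", "c1"), ("u2", "c2"), ("u3", "c2")]
dist = hop_distances(build_bipartite(edges), "u1")
for node, d in dist.items():
    # Users appear at even distances, campaigns at odd ones.
    print(node, d, round(similarity_from_distance(d), 3))
```

Here `u3` shares no campaign with `u1` yet still receives a nonzero similarity through the path `u1-c1-u2-c2-u3`.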

#### **2.2 Collaborative filtering**


*Banking and Finance*


Collaborative filtering (CF) techniques are widely used in recommender systems [24]. Relying on users' historical behaviors, CF calculates the similarity between users and then recommends products purchased by similar users. CF techniques are classified into item-based CF and user-based CF. A user-based CF algorithm first identifies the user preference profile [25], then calculates similarity based on that profile, and finally applies the resulting user similarity (distance) in the recommendation algorithm [26]. Because most users have purchased only a few products, the sparse data problem hinders the efficiency of recommendation [27, 28]. One solution is data clustering, which solves the problem to some extent.

Evaluating the performance of recommender systems is a complex topic. In general, recommender systems tend to produce narrow recommendation lists. Inspired by the Gini index, directed weighted conduction (DWC) has been proposed. DWC is an evaluation metric based on the bipartite graph model that can effectively avoid recommendation congestion and greatly improve the novelty and diversity of recommendations [29].
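DWC itself is not defined in this section, so the sketch below illustrates only the Gini index it is said to draw on: a Gini coefficient over how often each catalog item appears across all users' recommendation lists. Values near 0 mean exposure is spread evenly; values approaching 1 mean a few items dominate, which is the narrow-list tendency described above. The function `exposure_gini` and its inputs are our own illustration, not DWC.

```python
from collections import Counter
from typing import Iterable, List


def exposure_gini(rec_lists: Iterable[List[str]], catalog: List[str]) -> float:
    """Gini coefficient of item exposure across recommendation lists.

    Catalog items that are never recommended count as zero exposure.
    """
    counts = Counter(item for recs in rec_lists for item in recs)
    x = sorted(counts.get(item, 0) for item in catalog)  # ascending
    n, total = len(x), sum(x)
    if n == 0 or total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_i / (n * sum(x)), with i = 1..n on sorted values.
    weighted = sum((2 * i - n - 1) * xi for i, xi in enumerate(x, start=1))
    return weighted / (n * total)


print(exposure_gini([["a"], ["a"], ["a"], ["a"]], ["a", "b", "c", "d"]))  # 0.75
```

With four catalog items the maximum value is 0.75, reached when every list recommends the same single item.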

#### **3. Research gaps and problem definitions**

Take **Figure 1** as an example, where black nodes *A*, *B*, *C*, and *D* denote users and gray nodes *e*, *f*, *g*, and *h* denote items. With user-based CF using cosine similarity, user *A* has neighbors *B* and *C*. Item *f* cannot be recommended to user *A*, because none of *A*'s neighbors has a direct link to *f*. Similarly, *f* cannot be recommended to *A* by item-based CF. Cosine similarity is a local measure and cannot calculate similarities between nodes that are not directly connected in a sparse network. Local similarity can guarantee recommendation accuracy with dense data, but it rarely performs well on sparse data.

**Figure 1.** *The diagram of the application of CF algorithm in the network structure.*
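The limitation can be reproduced on a small hypothetical graph (the edges of Figure 1 are not listed in the text, so the data below is ours). In the sketch, user-based CF scores a campaign only through users who share at least one campaign with the target, measured by cosine similarity on binary backing vectors; campaign `c3`, backed only by a user with no campaign in common with `u1`, therefore never receives a score.

```python
import math
from typing import Dict, Set


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of two binary interaction vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def user_cf_scores(target: str, backed: Dict[str, Set[str]]) -> Dict[str, float]:
    """User-based CF: score unseen items by the summed cosine similarity of backers."""
    scores: Dict[str, float] = {}
    for other, items in backed.items():
        if other == target:
            continue
        sim = cosine(backed[target], items)
        if sim == 0.0:
            continue  # no common campaign, so this user is not a neighbor
        for item in items - backed[target]:
            scores[item] = scores.get(item, 0.0) + sim
    return scores


backed = {"u1": {"c1"}, "u2": {"c1", "c2"}, "u3": {"c2", "c3"}}
print(user_cf_scores("u1", backed))  # c3 never appears: u3 shares nothing with u1
```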

In bipartite graph algorithms such as PersonalRank, the distance between items and users can be obtained directly, so recommendation results can be obtained by transforming this distance into similarity. For instance, the network in **Figure 1** is transformed into the bipartite graph shown in **Figure 2**.
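PersonalRank can be sketched as a random walk with restart on the undirected user-campaign graph: from the current node the walker moves to a random neighbor with probability alpha, or returns to the target user with probability 1 - alpha, and the stationary visiting probabilities serve as similarities. The value alpha = 0.8, the tolerance, and the toy edges are our assumptions (the chapter does not state them here), so the sketch will not reproduce the iteration count or the scores reported for Figure 2.

```python
from typing import Dict, List, Set, Tuple


def personal_rank(
    edges: List[Tuple[str, str]],
    root: str,
    alpha: float = 0.8,
    tol: float = 1e-8,
    max_iter: int = 1000,
) -> Tuple[Dict[str, float], int]:
    """Random walk with restart from `root` on an undirected bipartite graph.

    With probability alpha the walker moves to a uniformly chosen neighbor;
    otherwise it jumps back to `root`. Returns (score per node, iterations).
    """
    adj: Dict[str, Set[str]] = {}
    for user, campaign in edges:
        adj.setdefault(user, set()).add(campaign)
        adj.setdefault(campaign, set()).add(user)
    rank = {v: 0.0 for v in adj}
    rank[root] = 1.0
    for it in range(1, max_iter + 1):
        new = {v: 0.0 for v in adj}
        for v, nbs in adj.items():
            share = alpha * rank[v] / len(nbs)
            for nb in nbs:
                new[nb] += share
        new[root] += 1.0 - alpha  # restart mass
        if max(abs(new[v] - rank[v]) for v in adj) < tol:
            return new, it
        rank = new
    return rank, max_iter


# Hypothetical graph: u3 shares no campaign with u1, and only u3 backed c3.
edges = [("u1", "c1"), ("u2", "c1"), ("u2", "c2"), ("u3", "c2"), ("u3", "c3")]
scores, iters = personal_rank(edges, root="u1")
campaigns = {c for _, c in edges}
ranking = sorted((c for c in campaigns if c != "c1"), key=lambda c: -scores[c])
print(ranking)  # ['c2', 'c3']
```

On this toy graph `c3` receives a nonzero score even though its only backer shares no campaign with `u1`, which is the indirect case that cosine-based CF misses. The user nodes also receive scores, which correspond to the implicit user similarities the walk produces along the way.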

PersonalRank is applied to the bipartite graph in **Figure 2**. To make a recommendation to user *A*, the iterative calculation starts at *A*. The result converges after 62 iterations, giving the following similarities between *A* and each item:

$$\begin{aligned} s(A, e) &= 0.07709, & s(A, f) &= 0.01791\\ s(A, g) &= 0.09499, & s(A, h) &= 0.26949 \end{aligned} \tag{1}$$

Excluding item *h*, on which *A* already has a direct action, the remaining three items are ranked *g* first, *e* second, and *f* last. During the calculation, PersonalRank also iteratively generates user similarities, although they are not output explicitly. In **Figure 2**, the implicit similarities between *A* and the other users are:

$$s(A, B) = 0.13602, \quad s(A, C) = 0.13602, \quad s(A, D) = 0.04213 \tag{2}$$

These similarities differ from those based on the cosine or Pearson function. The cosine and Pearson functions yield local similarity between users (i.e., only nodes directly adjacent to the user are considered), whereas the bipartite graph algorithm yields global similarity. In **Figure 2**, for example, *A* and *D* have no common actions, so their similarity cannot be calculated by the cosine or Pearson function, i.e., *s*(*A*, *D*) = 0. Local similarity is effective with dense data, since there are enough users with common actions to obtain a sufficiently wide neighborhood. With sparse data, however, calculating user similarity globally is clearly more effective.
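Using only the numbers reported in Eqs. (1) and (2), the ranking step can be made concrete: the sketch drops the item *A* has already acted on and sorts the remaining items by score, and it sorts users by their implicit PersonalRank similarity to form a neighborhood. Breaking the tie between *B* and *C* alphabetically is our own choice.

```python
# Scores for user A reported in Eq. (1) (items) and Eq. (2) (users).
item_scores = {"e": 0.07709, "f": 0.01791, "g": 0.09499, "h": 0.26949}
user_scores = {"B": 0.13602, "C": 0.13602, "D": 0.04213}
already_backed = {"h"}  # A already has a direct action on h

# Recommendation list: unseen items in descending PersonalRank score.
recommendations = sorted(
    (i for i in item_scores if i not in already_backed),
    key=lambda i: item_scores[i],
    reverse=True,
)
print(recommendations)  # ['g', 'e', 'f']

# Global neighborhood: D gets a nonzero weight even though cosine gives s(A, D) = 0.
neighbors = sorted(user_scores, key=lambda u: (-user_scores[u], u))
print(neighbors)  # ['B', 'C', 'D']
```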

The research progress related to this paper is summarized in **Table 1**. For personalized recommendation of crowdfunding campaigns, although graph models have been used in existing research, the bipartite graph model is rarely used, especially for solving the problem of sparse data in crowdfunding communities.

Crowdfunding platforms represented by Kickstarter use an *all-or-nothing* funding model, and the funding success rate is about 40%. Founders spend a lot of energy maintaining campaigns. A survey found that during the preparation period, founders spent an average of 30 minutes a day and 11 hours on weekends; during the fundraising period, they spent 2–11 hours a day for 0.5–2 months [34]. If funding fails, the founder gets nothing. Failure may be due either to poor campaign quality or to not finding the right investors. In the latter case, a well-designed personalized recommender system will increase the funding success rate. Therefore, taking advantage of PersonalRank for computing on bipartite graphs, combined with the advantages of CF algorithms, we pose the following research questions on recommenders for crowdfunding campaigns:

1. In view of the extremely sparse data in crowdfunding communities, we extract user behaviors into a bipartite graph structure and calculate the global similarity between nodes in the graph model.

2. Based on the node similarity matrix of the bipartite graph model, we propose a bipartite graph-based CF model, combined with a CF algorithm, to generate recommendation lists for crowdfunding campaigns.

3. We conduct experiments on the Kickstarter dataset to evaluate the effectiveness of the bipartite graph-based recommender algorithm, comparing differences between algorithms and suggesting feasible solutions.

| **Authors** | **Year** | **Conclusions** | **Comments** |
|---|---|---|---|
| Rakesh and Choo [30] | 2015 | • Four research dimensions: temporal traits, personal traits, geo-location traits, and network traits<br>• The backing habits of investors are influenced by their social circle | • Analysis is focused on project features, while the recommender is just an application<br>• It is not aimed at the current situation of sparse data in crowdfunding communities<br>• Supervised machine learning lacks reporting on sparse data |
| An et al. [1] | 2014 | • Social networks can identify user preference more accurately<br>• Different recommender strategies are adopted for different types of projects | • Information in social networks is needed, which reduces operability<br>• There are similarities in research questions, but it is not aimed at sparse data |
| Lu et al. [31] | 2014 | • Social networks can identify user preference | • Identifying a large amount of text from social media is costly |
| Stone et al. [32] | 2013 | • Traditional collaborative filtering algorithms are not suitable in the field of venture capital<br>• Recommendation performance is improved by hierarchy information (group, segment, code) | • VC and crowdfunding have similarities but also many differences<br>• There are differences between KNN and the method used in this paper |
| Zhou et al. [33] | 2010 | • Global recommendation algorithms overcome some shortcomings of local recommendation algorithms<br>• Nodes of "weak ties" have value in identifying user preference<br>• The bipartite graph model can improve the diversity of recommendation | • The goal of the study is to balance the diversity and accuracy of recommendation, without considering the processing of sparse data |

**Table 1.** *The main research progress related to this paper.*


**Figure 2.** *The diagram of bipartite graph transformation.*




