**4. Model overview**

PageRank is an algorithm that measures the weight of a specific web page relative to other web pages, which is often used in page ranking. PageRank assumes that a user randomly selects a page to visit from all pages and then jumps to other pages through hyperlinks. After reaching each page, the user has two options: end here, or continue visiting by selecting a link randomly. Let *d* be the probability of continuing visiting. The user selects a hyperlink at random with the same probability from the current page to continue visiting, which is a random walk process. After many rounds of walks, the probability of visiting each page will converge to a stable value. This value is the weight of a web page. The algorithm is shown in Eq. (3):

$$PR(i) = \frac{1-d}{N} + d \sum\_{j \in in(i)} \frac{PR(j)}{|out(j)|} \tag{3}$$

**Figure 2** is a typical bipartite graph structure, containing four users and four items. The actions between users and items are mapped as edges in the graph. For simplicity, the weights of edges are assumed the same. Take crowdfunding as an example, where *U* denotes investors; *I* denotes crowdfunding campaigns; and the edges denote users' investment behaviors in campaigns. *G* is actually a matrix structure, which can be calculated to obtain the global similarity by PersonalRank. The core idea of CF is the calculation of similarity between users (user-based) or between items (item-based). The similarity algorithm commonly used is cosine

*A Bipartite Graph-Based Recommender for Crowdfunding with Sparse Data*

*A* � *B* k k *<sup>A</sup>* k k*<sup>B</sup>* <sup>¼</sup>

P*<sup>n</sup> <sup>i</sup>*¼<sup>1</sup>*Ai* � *Bi* ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

�

j j *User* � j j *Item* (6)

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi P*<sup>n</sup> <sup>i</sup>*¼1ð Þ *Bi* 2 <sup>q</sup> (5)

P*<sup>n</sup> <sup>i</sup>*¼1ð Þ *Ai* 2

q

In bipartite graph model, the similarities between all nodes are calculated, which can be integrated with CF algorithm, and may achieve better performance than

The research data was collected from Kickstarter, which contains 32,226 investment behaviors from 14,506 users which invest on 787 campaigns. This paper used an offline evaluation method to evaluate the recommender system, dividing the dataset into a training set and a test set. If a user has only one investment behavior, the recommendation list cannot be produced because if the behavior is classified into the training set, the accuracy of the recommendation list cannot be evaluated; if classified into the test set, preference similarity cannot be obtained through the

Data sparseness is defined as the probability of matrix elements without data, which is calculated by Eq. (6). The sparseness of the experimental data is 96–99%, that is, about 96–99% of the matrix elements in the users' behavior matrix lack

*sparsity* <sup>¼</sup> <sup>1</sup> � j j *Behavior*

Firstly, the parameters are setting. PersonalRank has two parameters:

1.Convergence coefficient. According to the present research, it is set to 0.85.

2.Number of iterations. There is no fixed value, and it needs to be set depending on the data by following two methods: (a) specify the number of iterations

In the dataset, most users support less than five campaigns, also leading to the extremely sparseness of the dataset. Many campaigns have a small number of supporters, while popular campaigns have won a large number of supporters. Statistics show that campaigns in the dataset have one supporter at least, 9046 sup-

function, as shown in Eq. (5):

**5.1 Experimental data**

user's behavior.

values:

**203**

*similarity A*ð Þ¼ , *B* cosð Þ¼ *θ*

*DOI: http://dx.doi.org/10.5772/intechopen.92781*

directly recommendation by bipartite graphs.

porters at most, and 41 supporters on average.

**5.2 Experimental settings**

**5. Experimental data and experimental settings**

*PR*(*i*) is the probability of visiting page *i, d* is the probability of continuing visiting pages (i.e., the damping coefficient), *N* is the total number of pages, *in (i)* is the page set pointing to page *i* (i.e., in-links), and *out (j)* is the page set pointed by page j (i.e., out-links).

PageRank is a global algorithm, which does not distinguish the types of nodes. However, the recommender system for crowdfunding campaigns is faced with both user nodes and campaign nodes. We can only obtain the weight of nodes themselves by PageRank, rather than the similarity between nodes. Based on PageRank, the improved algorithm PersonalRank is a bipartite graph algorithm [6], which can generate personalized item list for users, as shown in the Eq. (4):

$$\begin{aligned} PR(i) &= (1 - d)r\_i + d \sum\_{j \in in(i)} \frac{PR(j)}{|out(j)|} \\ r\_i &= \begin{cases} 1 & , \text{ } \text{if } \ i = u \\ 0 & , \text{ } \text{ } i \neq u \end{cases} \end{aligned} \tag{4}$$

The difference between Eqs. (3) and (4) is that 1/*N* is replaced by *ri*. In other words, initial probabilities vary in different nodes. In bipartite graph model, *u* is the target user, and Eq. (4) actually calculates the similarity of all nodes relative to node *u*.

Specifically, unlike PageRank randomly selecting a node to walk, PersonalRank starts from the special node *u* and can only walk to different types of nodes. Taking crowdfunding as an example, user nodes can only walk to campaign nodes, while campaign nodes can only walk to user nodes. After reaching a new node, the walk stops and restarts from *u* with a probability of 1-*d* or continues walking to a node in the other type with a probability of *d.* After many rounds of walk, the probabilities of visiting each node tend to be stable. Therefore, before running PersonalRank algorithm, an initial probability must be set for each node. In PageRank, if *u* is the user, let the initial probability of visiting node *u* be 1 and other nodes be 0. But in PageRank, initial probability of visiting each node is equal, and the initial probability is *1/N*.

A bipartite graph is a graph model composed of two groups of nodes with different properties, and the nodes in the same group are not connected. A bipartite graph can be defined as a network structure *G* = <*U*, *I*, *E*>, where *U* denotes the user set; *I* denotes the item set; and *E* denotes the edges of bipartite graph model.

*A Bipartite Graph-Based Recommender for Crowdfunding with Sparse Data DOI: http://dx.doi.org/10.5772/intechopen.92781*

**Figure 2** is a typical bipartite graph structure, containing four users and four items. The actions between users and items are mapped as edges in the graph. For simplicity, the weights of edges are assumed the same. Take crowdfunding as an example, where *U* denotes investors; *I* denotes crowdfunding campaigns; and the edges denote users' investment behaviors in campaigns. *G* is actually a matrix structure, which can be calculated to obtain the global similarity by PersonalRank.

The core idea of CF is the calculation of similarity between users (user-based) or between items (item-based). The similarity algorithm commonly used is cosine function, as shown in Eq. (5):

$$\text{similarity}(A, B) = \cos \left( \theta \right) = \frac{A \cdot B}{||A|| ||B||} = \frac{\sum\_{i=1}^{n} A\_i \times B\_i}{\sqrt{\sum\_{i=1}^{n} (A\_i)^2} \times \sqrt{\sum\_{i=1}^{n} (B\_i)^2}} \tag{5}$$

In bipartite graph model, the similarities between all nodes are calculated, which can be integrated with CF algorithm, and may achieve better performance than directly recommendation by bipartite graphs.

### **5. Experimental data and experimental settings**

#### **5.1 Experimental data**

**4. Model overview**

*Banking and Finance*

page j (i.e., out-links).

node *u*.

ity is *1/N*.

model.

**202**

PageRank is an algorithm that measures the weight of a specific web page relative to other web pages, which is often used in page ranking. PageRank assumes that a user randomly selects a page to visit from all pages and then jumps to other pages through hyperlinks. After reaching each page, the user has two options: end here, or continue visiting by selecting a link randomly. Let *d* be the probability of continuing visiting. The user selects a hyperlink at random with the same probability from the current page to continue visiting, which is a random walk process. After many rounds of walks, the probability of visiting each page will converge to a stable value. This

*<sup>N</sup>* <sup>þ</sup> *<sup>d</sup>* <sup>X</sup>

PageRank is a global algorithm, which does not distinguish the types of nodes. However, the recommender system for crowdfunding campaigns is faced with both user nodes and campaign nodes. We can only obtain the weight of nodes themselves by PageRank, rather than the similarity between nodes. Based on PageRank, the improved algorithm PersonalRank is a bipartite graph algorithm [6], which can

*PR*(*i*) is the probability of visiting page *i, d* is the probability of continuing visiting pages (i.e., the damping coefficient), *N* is the total number of pages, *in (i)* is the page set pointing to page *i* (i.e., in-links), and *out (j)* is the page set pointed by

*j* ∈*in i*ð Þ

*j* ∈*in i*ð Þ

*PR j* ð Þ j j *out j* ð Þ

( (4)

*PR j* ð Þ

j j *out j* ð Þ (3)

value is the weight of a web page. The algorithm is shown in Eq. (3):

*PR i*ðÞ¼ <sup>1</sup> � *<sup>d</sup>*

generate personalized item list for users, as shown in the Eq. (4):

*PR i*ðÞ¼ ð Þ <sup>1</sup> � *<sup>d</sup> ri* <sup>þ</sup> *<sup>d</sup>* <sup>X</sup>

0 , *if i* 6¼ *u*

The difference between Eqs. (3) and (4) is that 1/*N* is replaced by *ri*. In other words, initial probabilities vary in different nodes. In bipartite graph model, *u* is the target user, and Eq. (4) actually calculates the similarity of all nodes relative to

Specifically, unlike PageRank randomly selecting a node to walk, PersonalRank starts from the special node *u* and can only walk to different types of nodes. Taking crowdfunding as an example, user nodes can only walk to campaign nodes, while campaign nodes can only walk to user nodes. After reaching a new node, the walk stops and restarts from *u* with a probability of 1-*d* or continues walking to a node in the other type with a probability of *d.* After many rounds of walk, the probabilities of visiting each node tend to be stable. Therefore, before running PersonalRank algorithm, an initial probability must be set for each node. In PageRank, if *u* is the user, let the initial probability of visiting node *u* be 1 and other nodes be 0. But in PageRank, initial probability of visiting each node is equal, and the initial probabil-

A bipartite graph is a graph model composed of two groups of nodes with different properties, and the nodes in the same group are not connected. A bipartite graph can be defined as a network structure *G* = <*U*, *I*, *E*>, where *U* denotes the user set; *I* denotes the item set; and *E* denotes the edges of bipartite graph

*ri* <sup>¼</sup> 1 , *if i* <sup>¼</sup> *<sup>u</sup>*

The research data was collected from Kickstarter, which contains 32,226 investment behaviors from 14,506 users which invest on 787 campaigns. This paper used an offline evaluation method to evaluate the recommender system, dividing the dataset into a training set and a test set. If a user has only one investment behavior, the recommendation list cannot be produced because if the behavior is classified into the training set, the accuracy of the recommendation list cannot be evaluated; if classified into the test set, preference similarity cannot be obtained through the user's behavior.

Data sparseness is defined as the probability of matrix elements without data, which is calculated by Eq. (6). The sparseness of the experimental data is 96–99%, that is, about 96–99% of the matrix elements in the users' behavior matrix lack values:

$$sparsity = 1 - \frac{|Behavior|}{|User| \times |Item|} \tag{6}$$

In the dataset, most users support less than five campaigns, also leading to the extremely sparseness of the dataset. Many campaigns have a small number of supporters, while popular campaigns have won a large number of supporters. Statistics show that campaigns in the dataset have one supporter at least, 9046 supporters at most, and 41 supporters on average.

#### **5.2 Experimental settings**

Firstly, the parameters are setting. PersonalRank has two parameters:


forcibly; and (2) judge whether the global computing result has converged, and stop iteration if converged. We integrate these two methods and use the following method for iteration setting.

> In addition to cosine similarity function, other similarity functions have also been tried. The results show that cosine similarity function performs best in the recommender for crowdfunding campaigns. Therefore, cosine-based CF is used as

1 Cosine-based CF Collaborative filtering algorithm based on cosine similarity

graphs

recommend

*A Bipartite Graph-Based Recommender for Crowdfunding with Sparse Data*

2 PersonalRank Recommendation directly using PersonalRank to calculate bipartite

3 Bipartite graph-based CF Using PersonalRank to calculate node similarity then using CF to

Recommendation according to the content

The most popular items (users) are recommended to users (items)

The sparseness of user behaviors is larger than 99%, and many users cannot find similar users. As a local similarity method, cosine function hardly produces recommendation lists in this situation. Thus, the similarity between any users without intersection is set to 0. **Tables 3** and **4** show the recommendation performance of

From **Tables 3** and **4**, when *K* = 40 and *K* = 55, the best performances are achieved, respectively. However, on the whole, the accuracy is extremely low, which has large room for improvement. The reason is that on the extremely sparse dataset, users have few intersections, which makes it difficult to find users with

**Table 5** shows the performance of using PersonalRank to produce recommendation lists. Compared to cosine similarity CF algorithm, the accuracy of recommendation by PersonalRank has at least doubled, which indicates that on sparse data network, the global similarity algorithm can effectively solve the computing

Then we use bipartite graph-based CF algorithm. The recommendation result for *N* = 5 is shown in **Table 6**, where the algorithm achieves the best performance when *K* = 30. The recommendation result for *N* = 10 is shown in **Table 7**, where the algorithm achieves the best performance when *K* = 30. However, compared to the result of recommendation directly by bipartite graph model, bipartite graph-based CF algorithm does not perform better. It indicates that on this dataset, the accuracy

Comparing **Tables 5**–**7**, we can get the conclusion that the result of recommendation by bipartite graph model is superior to bipartite graph-based CF algorithm. The possible reasons are as follows. (1) Although bipartite graph-based CF algorithm can obtain the similarity between items (users) and generate neighbor items (users), which cannot be done by cosine similarity algorithm, the CF algorithm cannot extract enough items from the neighborhood for recommendation due to the extremely sparse data (e.g., *A* and *B* are very similar, but if *B* has few actions, the accuracy of the recommendation list is still quite low). Therefore, the recommender

similar interests, resulting in the low accuracy of recommendation.

problem of node similarity and improve the recommendation accuracy.

of recommendation calculated by bipartite graph model is higher.

one of the benchmarks for comparing.

**Compared algorithms Description**

*DOI: http://dx.doi.org/10.5772/intechopen.92781*

CF algorithm based on cosine similarity.

**6. Experimental result**

4 Content-based recommender

5 Popularity-based recommender

*Compared algorithms and description.*

**Table 2.**

**205**

**Algorithm 1.** Iteration setting of PersonalRank Algorithm

```
Input: network structure G
   Output: computing results of PersonalRank
1. Define G; #Construct the network
2. Define max_iteration; # Define a maximum number of iterations
3. Define item; #Define the starting point of PersonalRank walking
4. Define previous_iteration = [Null]; # Predefine iteration result
5. For iteration in range(0,max_iteration)
6. For I in G.nodes():
7. Pr[i] = PersonalRank(G);
8. End For
9. If previous_iteration == Pr:
10. Break; #Converged
11. End If
12. previous_iteration = Pr;
13. End For
14. Output Pr;
```
The time complexity of Algorithm 1 is *O*(*max\_iteration*\*|*item*|), where *max\_iteration* is the predefined number of iterations and |*item*| is the number of nodes of items. The complexity of the algorithm means the complexity of the number of iterations, not the complexity of the complete algorithm. We tried to calculate the network, showing that the PersonalRank converges after 100 iterations.

All of the CF in experiments use item-based algorithm, for the following reasons: (1) The number of items is much smaller than the number of users so that the computing cost of the similarity between items is much lower than between users. (2) Item-based methods are used more often in practical applications due to computing convenience, such as Amazon recommender system.

The compared algorithms in this study are summarized in **Table 2**. The contentbased recommender is based on the similarity of items. For instance, if a user has supported a "music" campaign, the content-based recommender algorithm assumes that the user has a greater preference for "music" campaigns. In the compared experiments, we chose six indicators to measure the similarity of campaigns: campaign category, social network of founders, funding status, number of pledge levels, minimum pledge money, and average funding amount.

Popularity-based recommender means the most popular items are directly recommended to users (user-based) or the most popular users are recommended to items (item-based). Popularity-based recommender is independent of neighbor nodes, which means the recommendation lists are the same for any users.

Two parameters need to be set in CF algorithms:


*A Bipartite Graph-Based Recommender for Crowdfunding with Sparse Data DOI: http://dx.doi.org/10.5772/intechopen.92781*


**Table 2.**

forcibly; and (2) judge whether the global computing result has converged, and stop iteration if converged. We integrate these two methods and use the

The time complexity of Algorithm 1 is *O*(*max\_iteration*\*|*item*|), where *max\_iteration* is the predefined number of iterations and |*item*| is the number of nodes of items. The complexity of the algorithm means the complexity of the number of iterations, not the complexity of the complete algorithm. We tried to calculate the network, showing that the PersonalRank converges after 100

(1) The number of items is much smaller than the number of users so that the computing cost of the similarity between items is much lower than between users. (2) Item-based methods are used more often in practical applications due to com-

puting convenience, such as Amazon recommender system.

minimum pledge money, and average funding amount.

Two parameters need to be set in CF algorithms:

interested in are recommended to target users.

used in present studies.

**204**

All of the CF in experiments use item-based algorithm, for the following reasons:

The compared algorithms in this study are summarized in **Table 2**. The contentbased recommender is based on the similarity of items. For instance, if a user has supported a "music" campaign, the content-based recommender algorithm assumes that the user has a greater preference for "music" campaigns. In the compared experiments, we chose six indicators to measure the similarity of campaigns: campaign category, social network of founders, funding status, number of pledge levels,

Popularity-based recommender means the most popular items are directly recommended to users (user-based) or the most popular users are recommended to items (item-based). Popularity-based recommender is independent of neighbor nodes, which means the recommendation lists are the same for any users.

1.The number of neighbors *K*. *K* similar users (items) are selected as the source for producing recommendation lists, and the items which the users are

2.The list length *N*. *N* items are recommended to target users (or *N* users are recommended to target items). Generally, *N* is set to 5 or 10, which is widely

following method for iteration setting.

**Algorithm 1.** Iteration setting of PersonalRank Algorithm

**Output**: computing results of PersonalRank

7. *Pr*[*i*] = PersonalRank(*G*);

2. **Define** *max\_iteration*; # Define a maximum number of iterations 3. **Define** *item*; #Define the starting point of PersonalRank walking 4. **Define** *previous\_iteration = [Null]*; # Predefine iteration result

**Input**: network structure *G*

*Banking and Finance*

1. **Define** *G*; #Construct the network

5. **For** *iteration* in range(0,*max\_iteration*) 6. **For** *I* in *G*.nodes():

9. **If** *previous\_iteration == Pr*: 10. Break; #Converged

12. *previous\_iteration = Pr*;

8. **End For**

11. **End If**

13. **End For** 14. **Output** *Pr*;

iterations.

*Compared algorithms and description.*

In addition to cosine similarity function, other similarity functions have also been tried. The results show that cosine similarity function performs best in the recommender for crowdfunding campaigns. Therefore, cosine-based CF is used as one of the benchmarks for comparing.

#### **6. Experimental result**

The sparseness of user behaviors is larger than 99%, and many users cannot find similar users. As a local similarity method, cosine function hardly produces recommendation lists in this situation. Thus, the similarity between any users without intersection is set to 0. **Tables 3** and **4** show the recommendation performance of CF algorithm based on cosine similarity.

From **Tables 3** and **4**, when *K* = 40 and *K* = 55, the best performances are achieved, respectively. However, on the whole, the accuracy is extremely low, which has large room for improvement. The reason is that on the extremely sparse dataset, users have few intersections, which makes it difficult to find users with similar interests, resulting in the low accuracy of recommendation.

**Table 5** shows the performance of using PersonalRank to produce recommendation lists. Compared to cosine similarity CF algorithm, the accuracy of recommendation by PersonalRank has at least doubled, which indicates that on sparse data network, the global similarity algorithm can effectively solve the computing problem of node similarity and improve the recommendation accuracy.

Then we use bipartite graph-based CF algorithm. The recommendation result for *N* = 5 is shown in **Table 6**, where the algorithm achieves the best performance when *K* = 30. The recommendation result for *N* = 10 is shown in **Table 7**, where the algorithm achieves the best performance when *K* = 30. However, compared to the result of recommendation directly by bipartite graph model, bipartite graph-based CF algorithm does not perform better. It indicates that on this dataset, the accuracy of recommendation calculated by bipartite graph model is higher.

Comparing **Tables 5**–**7**, we can get the conclusion that the result of recommendation by bipartite graph model is superior to bipartite graph-based CF algorithm. The possible reasons are as follows. (1) Although bipartite graph-based CF algorithm can obtain the similarity between items (users) and generate neighbor items (users), which cannot be done by cosine similarity algorithm, the CF algorithm cannot extract enough items from the neighborhood for recommendation due to the extremely sparse data (e.g., *A* and *B* are very similar, but if *B* has few actions, the accuracy of the recommendation list is still quite low). Therefore, the recommender


*N* **Recall (%) Precision (%) Coverage (%) Popularity** 0.16 1.17 1.72 1.586 0.28 1.02 3.17 1.545 0.41 0.99 4.48 1.476 0.52 0.93 5.65 1.434 0.59 0.85 6.75 1.425 0.67 0.81 7.81 1.415 0.72 0.75 8.82 1.401 0.80 0.72 9.80 1.382 0.85 0.68 10.78 1.372 0.90 0.66 11.76 1.365

*K* **Recall (%) Precision (%) Coverage (%) Popularity** 0.39 0.56 7.72 1.298 0.38 0.55 6.22 1.468 0.39 0.57 5.62 1.572 0.39 0.56 5.40 1.607 0.40 0.58 5.27 1.627 **0.41 0.59 5.19 1.641** 0.41 0.59 5.16 1.653 0.40 0.59 5.10 1.654 0.41 0.59 5.09 1.663 0.40 0.59 5.08 1.655 0.39 0.57 5.07 1.638 0.40 0.58 5.04 1.643 0.40 0.58 5.03 1.645 0.40 0.58 5.02 1.647 0.40 0.58 5.01 1.649 0.40 0.58 5.00 1.65 0.40 0.58 5.00 1.652

*Performance of CF algorithm based on cosine similarity (*N *= 10).*

*DOI: http://dx.doi.org/10.5772/intechopen.92781*

*K* **Recall (%) Precision (%) Coverage (%) Popularity** 0.18 0.3 4.4 0.993 0.18 0.3 4.4 0.993 0.18 0.3 4.4 0.993 0.18 0.29 4.4 0.993

*A Bipartite Graph-Based Recommender for Crowdfunding with Sparse Data*

**Table 5.**

**Table 4.**

*Result of recommendation by PersonalRank.*

#### **Table 3.**

*Performance of CF algorithm based on cosine similarity (*N *= 5).*



*A Bipartite Graph-Based Recommender for Crowdfunding with Sparse Data DOI: http://dx.doi.org/10.5772/intechopen.92781*

**Table 4.**

*K* **Recall (%) Precision (%) Coverage (%) Popularity** 0.07 0.25 2.60 0.871 0.08 0.28 2.58 0.935 0.09 0.30 2.52 0.96 0.09 0.31 2.46 1.013 0.09 0.31 2.38 1.035 0.10 0.32 2.36 1.043 0.10 0.33 2.34 1.048 **0.10 0.35 2.33 1.052** 0.10 0.33 2.33 1.054 0.10 0.33 2.33 1.057 0.10 0.34 2.32 1.054 0.10 0.34 2.32 1.055 0.10 0.34 2.32 1.056 0.10 0.33 2.32 1.057 0.10 0.33 2.32 1.057 0.10 0.33 2.32 1.057 0.10 0.34 2.32 1.057 0.10 0.34 2.32 1.057 0.10 0.34 2.32 1.058 0.10 0.34 2.32 1.058

*K* **Recall (%) Precision (%) Coverage (%) Popularity** 0.13 0.22 4.97 0.842 0.13 0.22 4.9 0.9 0.15 0.25 4.79 0.922 0.16 0.26 4.64 0.958 0.16 0.27 4.5 0.97 0.16 0.27 4.45 0.976 0.17 0.28 4.43 0.981 0.17 0.29 4.42 0.985 0.17 0.29 4.41 0.988 0.18 0.29 4.41 0.99 **0.18 0.3 4.4 0.99** 0.18 0.3 4.4 0.992 0.18 0.29 4.4 0.992 0.18 0.29 4.4 0.993 0.18 0.3 4.4 0.993 0.18 0.3 4.4 0.992

**Table 3.**

*Banking and Finance*

*Performance of CF algorithm based on cosine similarity (*N *= 5).*

*Performance of CF algorithm based on cosine similarity (*N *= 10).*


#### **Table 5.** *Result of recommendation by PersonalRank.*


## *Banking and Finance*


*K* **Recall (%) Precision (%) Coverage (%) Popularity** 0.01 0.04 1.82 0.846 0.03 0.09 1.69 0.927 0.03 0.11 1.54 1.029 0.04 0.12 1.45 1.121 0.04 0.14 1.36 1.243 0.04 0.15 1.24 1.337 0.05 0.16 1.14 1.427 0.05 0.17 1.05 1.52 0.05 0.17 0.95 1.607 0.05 0.17 0.88 1.647 0.05 0.17 0.83 1.685 0.06 0.19 0.77 1.729 0.06 0.20 0.72 1.763 0.06 0.20 0.68 1.801 0.06 0.20 0.65 1.839 0.06 0.22 0.62 1.867 0.07 0.22 0.59 1.901 **0.08 0.25 0.55 1.933** 0.07 0.25 0.53 1.957 0.07 0.25 0.50 1.993

*A Bipartite Graph-Based Recommender for Crowdfunding with Sparse Data*

*DOI: http://dx.doi.org/10.5772/intechopen.92781*

*K* **Recall (%) Precision (%) Coverage (%) Popularity** 0.02 0.03 3.45 0.819 0.04 0.06 3.25 0.876 0.05 0.08 3.02 0.936 0.05 0.09 2.91 0.999 0.06 0.10 2.76 1.082 0.07 0.11 2.56 1.152 0.07 0.11 2.38 1.232 0.07 0.11 2.20 1.316 0.07 0.12 2.03 1.393 0.07 0.12 1.85 1.452 0.08 0.13 1.71 1.507 0.08 0.14 1.56 1.549 0.09 0.15 1.45 1.581 0.09 0.15 1.36 1.608 0.10 0.16 1.29 1.645 0.10 0.17 1.23 1.673

**Table 8.**

*Result of content-based recommender (*N *= 5).*

**Table 6.**

*Result of bipartite graph-based CF algorithm (*N *= 5).*


#### **Table 7.** *Result of bipartite graph-based CF algorithm (*N *= 10).*

performance of CF algorithm is poor. (2) Local algorithms only produce local optimal solutions, also resulting in the poor performance of bipartite graph-based CF algorithm, whereas recommendation directly by bipartite graph model is a global recommendation algorithm, which can overcome the shortage of sparse matrix.

The results of content-based recommender are shown in **Tables 8** and **9**, where the best performances are achieved when *K* = 90 and *K* = 100, respectively. The accuracy of content-based recommender is the lowest, which might be determined by the investment preference of investors on crowdfunding campaigns. For example, many investors have participated in multiple categories of campaigns, rather than focusing on one or several categories.

The comprehensive comparison result of various algorithms is summarized in **Table 10**. On this dataset, PersonalRank is the most effective in computing the node distance of bipartite graph model and converting it into similarity, followed by CF


*A Bipartite Graph-Based Recommender for Crowdfunding with Sparse Data DOI: http://dx.doi.org/10.5772/intechopen.92781*

#### **Table 8.**

performance of CF algorithm is poor. (2) Local algorithms only produce local optimal solutions, also resulting in the poor performance of bipartite graph-based CF algorithm, whereas recommendation directly by bipartite graph model is a global recommendation algorithm, which can overcome the shortage of sparse

*K* **Recall (%) Precision (%) Coverage (%) Popularity** 0.56 0.41 14.11 1.235 0.58 0.42 11.51 1.406 0.59 0.43 10.32 1.466 0.60 0.43 9.91 1.495 0.62 0.45 9.67 1.521 **0.64 0.47 9.54 1.532** 0.64 0.47 9.49 1.532 0.64 0.47 9.41 1.536 0.63 0.45 9.39 1.544 0.62 0.45 9.41 1.541 0.62 0.45 9.39 1.536 0.61 0.44 9.34 1.539 0.60 0.44 9.29 1.541 0.60 0.44 9.26 1.543 0.60 0.44 9.24 1.546 0.60 0.44 9.23 1.547 0.60 0.44 9.23 1.548 0.60 0.44 9.22 1.55 0.60 0.44 9.21 1.551 0.61 0.44 9.20 1.552

*K* **Recall (%) Precision (%) Coverage (%) Popularity** 0.40 0.58 5.00 1.654 0.40 0.58 4.99 1.655 0.40 0.58 4.99 1.656

The results of content-based recommender are shown in **Tables 8** and **9**, where the best performances are achieved when *K* = 90 and *K* = 100, respectively. The accuracy of content-based recommender is the lowest, which might be determined by the investment preference of investors on crowdfunding campaigns. For example, many investors have participated in multiple categories of campaigns, rather

The comprehensive comparison result of various algorithms is summarized in **Table 10**. On this dataset, PersonalRank is the most effective in computing the node distance of bipartite graph model and converting it into similarity, followed by CF

matrix.

**Table 7.**

**Table 6.**

*Banking and Finance*

than focusing on one or several categories.

*Result of bipartite graph-based CF algorithm (*N *= 10).*

*Result of bipartite graph-based CF algorithm (*N *= 5).*

*Result of content-based recommender (*N *= 5).*



applicable to calculate the node similarity, such as SimRank [36]. Other graph models could be applied to recommendation for crowdfunding campaigns in the future. (2) Due to computing complexity, all of CF algorithms used in this paper are item-based, rather than user-based. However, we have to use user-based recommender in some cases. For example, when a new user enters the system, user-based method is more suitable in recommendation. Future research could make a comparison with user-based recommender algorithms. (3) The datasets are all from Kickstarter, but there are other crowdfunding platforms, such as Indiegogo [37]. Research could use other crowdfunding platforms to verify the applicability of bipartite graph model. (4) Based on the data from the crowdfunding platform, we have verified the usefulness of bipartite graph model. However, not all the information in crowdfunding communities is used. For example, some research found the home bias is a common phenomenon in investment [38], that is, offline relationships between founders and investors may have already been established, such as friends, classmates, acquaintances, colleagues, etc. Consequently, there is a psychological and cultural convergence between founders and investors, and the physical distance is relatively close. Therefore, in personalized recommender, the physical distance in graph model could be considered, and the physical distance between users could be modeled into binary graph model to improve the perfor-

*A Bipartite Graph-Based Recommender for Crowdfunding with Sparse Data*

*DOI: http://dx.doi.org/10.5772/intechopen.92781*

This work is partially supported by the NSFC Grant (71771177), the University Innovation Fund from the Science and Technology Development Center of the Ministry of Education (2019J01012), and Standardization of Trade in Service Fund

School of Economics and Management, Tongji University, Shanghai, China

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium,

mance of recommender.

**Acknowledgements**

(FMBZH-1947).

**Author details**

**211**

Hongwei Wang\* and Shiqin Chen

provided the original work is properly cited.

**Conflict of interest**

The authors declare no conflict of interest.

\*Address all correspondence to: hwwang@tongji.edu.cn

**Table 9.**

*Result of content-based recommender (*N *= 10).*


#### **Table 10.**

*Comprehensive comparison result of various algorithms.*

algorithm using global similarity distance, while content-based recommender has the worst performance. Popularity-based recommender algorithm is superior to cosine-based CF and content-based recommender in precision, but its coverages are too low (0.035 and 0.069), since popularity-based algorithm always recommends those most popular users to the target campaign.
