Next Article in Journal
Machine Learning-Guided Dual Heuristics and New Lower Bounds for the Refueling and Maintenance Planning Problem of Nuclear Power Plants
Previous Article in Journal
Trajectory Clustering and k-NN for Robust Privacy Preserving k-NN Query Processing in GeoSpark
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Influence Maximization with Priority in Online Social Networks

1
Faculty of Information and Security Technology, People’s Security Academy, Hanoi 100000, Vietnam
2
Vietnam National University, Hanoi 100000, Vietnam
*
Author to whom correspondence should be addressed.
Algorithms 2020, 13(8), 183; https://doi.org/10.3390/a13080183
Submission received: 16 June 2020 / Revised: 19 July 2020 / Accepted: 21 July 2020 / Published: 29 July 2020
(This article belongs to the Section Combinatorial Optimization, Graph, and Network Algorithms)

Abstract

:
The Influence Maximization ( IM ) problem, which finds a set of k nodes (called seedset) in a social network to initiate the influence spread so that the number of influenced nodes after propagation process is maximized, is an important problem in information propagation and social network analysis. However, previous studies ignored the constraint of priority that led to inefficient seed collections. In some real situations, companies or organizations often prioritize influencing potential users during their influence diffusion campaigns. With a new approach to these existing works, we propose a new problem called Influence Maximization with Priority ( IMP ) which finds out a set seed of k nodes in a social network to be able to influence the largest number of nodes subject to the influence spread to a specific set of nodes U (called priority set) at least a given threshold T in this paper. We show that the problem is NP-hard under well-known IC model. To find the solution, we propose two efficient algorithms, called Integrated Greedy ( IG ) and Integrated Greedy Sampling ( IGS ) with provable theoretical guarantees. IG provides a 1 ( 1 1 k ) t -approximation solution with t is an outcome of algorithm and t 1 . The worst-case approximation ratio is obtained when t = 1 and it is equal to 1 / k . In addition, IGS is an efficient randomized approximation algorithm based on sampling method that provides a 1 ( 1 1 k ) t ϵ -approximation solution with probability at least 1 δ with ϵ > 0 , δ ( 0 , 1 ) as input parameters of the problem. We conduct extensive experiments on various real networks to compare our IGS algorithm to the state-of-the-art algorithms in IM problem. The results indicate that our algorithm provides better solutions interns of influence on the priority sets when approximately give twice to ten times higher than threshold T while running time, memory usage and the influence spread also give considerable results compared to the others.

1. Introduction

Presently, Online Social Networks (OSNs) have become an important platform in communication as well as e-commerce. Companies and businesses have leveraged a rapid spread of information thanks to the “word of mouth” effect among friends in social networks as a powerful tool for viral marketing. For instance, companies can provide some ones with free samples over an OSN so that much more people may know about their products and they have more chances to sell them. Influence Maximization ( IM ) problem [1], a key problem in viral marketing, has been extensively studied for this decade due to its tremendous value in business, viral marketing and influence propagation. Basically, IM aims to find some nodes (called seedset) in a social network to inject opinion, innovation or influence that can effect the largest the number of nodes. Kempe et al. [1] first studied IM as an optimization problem combined with two well-known models, Independent Cascade ( IC ) and Linear Threshold ( LT ). Since IM is NP-hard, they designed a native greedy algorithm that returned an ( 1 1 / e ) -approximation solution. The research shows that IM is not only a potential commercial role in viral marketing [2,3] but also a foundation of various applications in many fields such as epidemics control in social network [4,5,6,7,8], social network monitoring [9,10], recommendation system [11], etc. Hence, IM has been extensively studied recently [2,4,12,13,14,15,16,17,18,19].
Although IM has a lot of great applications in viral marketing, previous studies ignored considering the impact on priority users who could play an important role for effectiveness of viral marketing campaigns. In fact, companies often prioritize specific potential customers, who are financially competent or suitable for their products. For examples, if a company produces baby diapers, they tend to introduce the product to married women aged 20 to 45. Supposing that they have some data about user accounts on a social network, hence they launch a promotion with suitable amount of gifts to married female users via this social network. If we only care about the number of influenced individuals, as in the case of IM , we will not evaluate the impact to the potential users and lead to wrong selection of a seed set. Figure 1 shows an example. This network contains 8 nodes and 9 edges, the priority set is { b , d } and the weight of each edge (or influence probability) is assigned to 1. Considering the case when the budget k = 1 (number of seed nodes), the optimal solution of IM is { f } influences to 6 nodes including { f , d , g , c , e , h } except b. Hence, IM cannot take effect to all priority nodes.The solution must be { a } that has the total influence is only 5.
Motivated by such interesting scenarios, in this paper we investigate the Influence Maximization with Priority ( IMP ) problem, which takes into account the priority constraint for influence process. Given a social network G = ( V , E ) , a priority set U V , a budget k and a priority threshold T , ( T k ) , the goal is to find the seed set S sized at k so that it influences to U at least T and the influence of the cascade is maximized. In fact, IMP is more suitable than IM . Besides, it generalizes IM problem. Nevertheless this problem faces with complicated challenges caused by the constraint of priority. To address this problem, we propose two approximation algorithms, Integrated Greedy ( IG ) and Integrated Greedy-based Sampling ( IGS ), with provable theoretical guarantees. IG meets the theoretical guarantee based on a modification of the natural greedy algorithm while IGS is an efficient randomized approximation algorithm based on sampling method [13,14,15,20]. This algorithm combines two novel techniques. Firstly, we propose Targeted Reverse Reachable (TRR) concept by modifying the Reverse Reachable Sampling (RR) technique [13,14,15,20] to estimate influence from a seed set to a given priority set. Secondly, we develop a new strategy to select a set of seeds in accordance with the priority constraint and set the number of samples to give a theoretical guarantees. Because IMP is a separate case of IM , we have built extensive experiments on various real networks to compare our IGS algorithm to the state-of-the-art algorithms for IM problem such as DSSA [15], BCT [2], OPIM about the influence on a given priority set, running time and memory used while the influence spread approximations are ensures as in IM .
Our contributions are summarized as follows:
  • We propose the Influence Maximization with Priority ( IMP ) problem that considers priority constraint in Influence Maximization ( IM ) problem. It means we expand the IM by adding a constraint to influence on a given set of users. IMP aims to find the seed set S with size k so that total influence of priority users is at least a given threshold T , ( k T ) and still maintain the influence of cascade maximized.
  • We propose two approximation algorithms, IG and IGS , for the IMP problem. IG algorithm provides an approximation ratio of 1 ( 1 1 k ) t , where t k T is an output of the algorithm. In addition, IGS is a randomized approximation algorithm providing an approximation ratio of 1 ( 1 1 k ) t ϵ with probability at least 1 δ , where ϵ > 0 , δ ( 0 , 1 ) are input parameters and t is an output of algorithm.
  • We conduct extensive experiments on various real networks such as netHEPT, netPHY, Email-Enron, DBLP, and Twitter ReTweet. The results indicate that our algorithm, IGS , often outperforms state-of-the-art IM algorithms in terms of influence, running time and memory used. In particular, IGS provides the solution which ensures that the influence on the priority set is approximately from twice to 10 times greater than its threshold T while still maintains influence spread approximations as in IM algorithms. Further, we also demonstrate that IGS is faster and uses lower memory than the others in a lot of cases. On the whole, although IGS has to care about how influences to a target given users, IGS still gives considerable fast runtime, low memory used and high maximized influence on all nodes such as state-of-the-art algorithms such as DSSA, BCT, OPIM-C. It proves that IGS has been very well designed.
Related work. Kempe et al. [1] first studied the Influence Maximization ( IM ) problem inspired by exploiting the influence among users in social networks for viral marketing [21]. They formulated IM as a discrete optimization problem under two classical information diffusion models, Independent Cascade ( IC ) and Linear Threshold ( LT ). They proved that IM could be approximated within a ratio of 1 1 / e + ϵ for any ϵ ( 0 , 1 ) and proposed a greedy algorithm that provided an approximation ratio of 1 1 / e ϵ for ϵ > 0 . Later, Chen et al. [12,16] continued to study IM and proved that to calculate exactly the influence spread of a seeding set was #P-Hard. Hence although many heuristics algorithms have been proposed to solve this problem in large networks, they still have failed to retain the approximation ratio of 1 1 / e ϵ and have provided a low quality solutions such as the cost-effective lazy-forward heuristic (CELF) proposed by Leskovec et al. [22] which is based on improving greedy algorithm to get 700 times faster than the greedy algorithm with Mote-Carlo simulation; a fast heuristics algorithm called PMIA proposed by Chen et al. [12] which constructs a directed acyclic graph to estimate the influence under IC model or the algorithm proposed by the authors in [16] which uses a local directed acyclic graphs (LDAG) to calculate the local influence of nodes under LT model. To keep the 1 1 / e ϵ ratio, research on the approximation approach continues to be explored. Borgs et al. [13] first presented an ( 1 1 / e ϵ ) -approximation algorithm with probability at least 1 δ in O ( k l 2 ( m + n ) log 2 n / ϵ 3 ) time complexity by introducing Reverse Influence Sampling (RIS) model. This model has formed the foundation for further algorithm development. [14,15,20,23].
From then on, many works expanded IM in contexts of viral marketing. Nguyen et al. [24] investigated the Budged Influence Maximization (BIM) problem which considered the cost of selecting a node and proposed a ( 1 1 / e ϵ ) approximation algorithm. The authors in [2] studied the a generalization of IM and BIM problems, called Cost-aware Targeted Viral Marketing (CTVM). In this work, each node u had an arbitrary cost c ( u ) and a benefit b ( u ) and the goal of CTVM was to select a seed set within a given budget so that the total benefit was maximized. We believe that this is the closest problem to our work. In CTVM problem, we can set parameters that maximize the influence on a given target set of users but cannot simultaneously maximize the influence of the others as in our problem. Later, several works improve the approximation as well as the scalability of CTVM algorithms [25,26].
Moreover, there are also many variants of IM problem that were studied. Some works studied the constraints of IM such as [17,18,27], in which edges were associated with a topic influence weight. These problems aimed to find a set of k users that maximized influenced users according to a topic query. However, the proposed algorithms did not provide any theoretical guarantee. Li et al. [28] proposed the Location-aware Influence Maximization (LIM) problem with the goal was to select the k-seed set so that the number of influenced nodes in the given query region was maximized. [29] investigated the Distance-aware Influence Maximization (DAIM) problem which considered the role of distance between users and the promoted location in seed selection. They extended a RIS process model and provided an unbiased estimator for the DAIM problem.
Besides, some works investigated the problem of Competitive Influence Maximization (CIM), which considered the context of IM under the competition of many rivals. Bharathi et al. [30] first formulated the CIM problem under a new competitive propagation model which was an extension of IC model. Chen et al. [12] investigated CIM under the combating with negative opinions based on an assumption that negative information was often more attractive than official information. Some authors considered the problem under many different cases in viral marketing, such as proposing a distance-aware problem [31], expanding the LT model to reflect competition [13,32,33,34], proposing a heuristic algorithm [35], etc.
Recently, some authors studied the selection of seed nodes in a social network to influence groups of users or communities instead of individuals [36,37,38,39]. They argue that in real-world scenarios, creating impact on groups is more beneficial than the individuals in a network. Tsang et al. [36] investigated the Fairness Group Maximization problem with two fairness criteria including maximin fairness and diversity. While the maximin fairness aimed to maximize the minimum influence nodes of any per their population, the criterion of diversity was an alternate fairness concept by extending the notion of individual rationality to group rationality. They proposed an approximation algorithm based on multi-submodular objective function processing techniques. More recent, the authors in [37] proposed exact algorithms for fairness group influence with multiple criteria based on mix integer linear programming formulation on a specific set of sample graphs under IC model. In [38], the authors characterize the intricate relationship between diversity and efficiency, which sometimes may be at odds but may also reinforce each other. Nguyen et al. [39] considered the Influence Maximization problem at the Community level problem, which found seed set of k nodes that influenced to largest number of communities. They showed that the objective function was neither sub-modular nor super-modular and proposed some approximation algorithms with provable guarantees. Different to our studied problem in this paper, these studies did not address the priority set in influence maximization context. Hence the proposed algorithms cannot be applied to the IMP problem.
Organization. The rest of the paper is organized as follows: Section 2 presents information diffusion model and problem definitions. Section 3 and Section 4 present our proposed Integrated Greedy and Integrated Greedy-based Sampling algorithms for IMP problem with the theoretical analysis. Experimental results are shown in Section 5. In Section 6 we discuss the future work and conclude this paper.

2. Model and Problem Definition

In this section, we introduce about network model and the well-known Independent Cascade ( IC ) diffusion information model [1]. Under IC model, we formally define the Influence Maximization with Priority ( IMP ) problem.

2.1. Graph Notation and Independent Cascade Model

Let G = ( V , E ) be a directed graph representing a social network with a node set V and a directed edge set E, | V | = n and | E | = m . Let N i n ( v ) and N o u t ( v ) be two sets of in-neighbors and out-neighbor of a node v, respectively. The notations of S and S * represent to a seed set that is a solution and an optimal solution of IMP , respectively. We also note OPT = σ ( S * ) is the influence of an optimal solution.
In Independent Cascade ( IC ) model, each edge e = ( u , v ) E has an influence probability p ( u , v ) ( 0 , 1 ) that represents the information transmission from u to v. Each node v V has two possible states, active and inactive. Given a seed set S V , the diffusion process from S happens in discrete steps t = 0 , 1 , , as follow:
  • At step t = 0 , all nodes in S is activated.
  • At step t 1 , for an activated node u in previous steps, it has a single chance to activate each inactive neighbour v with the successful probability p ( u , v ) . An activated node remains a c t i v e till the end of the diffusion process.
  • The propagation process ends when no more node is activated.
Kempe et al. [1] show that IC model is equivalent to live-edge model and estimating the quantity of influence nodes can be done as follows. We first generate a sample graph g from original graph G by selecting each edge e = ( u , v ) E , independently, with probability p ( u , v ) , and no select edge ( u , v ) with probability 1 p ( u , v ) . The probability that a realization g can be generated from G (denoted as g G ) is
Pr [ g G ] = e E ( g ) p ( u , v ) e E \ E ( g ) ( 1 p ( u , v ) )
In this equation, E ( g ) is the set edge of g. The number of sample graphs is 2 | E | . The influence spread of a seed set S in G is calculated as follows:
σ ( S ) = g G Pr [ g G ] | R ( g , S ) |
where R ( g , S ) denotes the set of reachable nodes from S in g. For a set of priority nodes U, the influence spread of S to U is calculated as follows:
σ U ( S ) = g G Pr [ g G ] | R ( g , S U ) |
where R g ( S P ) denotes the set of nodes in U that can reach from S in g. Kempe et al. [1] also show that, σ ( · ) is a monotone and sub-modular function, i.e, for any A V , and v V \ B , we have:
σ ( A + { v } ) σ ( A )
and for any A B V , and v V \ B , we have:
σ ( A + { v } ) σ ( A ) σ ( B + { v } ) σ ( B )
We also easy to see that σ U ( · ) is a monotone and sub-modular function.

2.2. Problem Definition

We investigate Influence Maximization with Priority ( IMP ) defined as follows:
Definition 1
( IMP problem). Given a graph G = ( V , E ) under IC model, a positive integer k (budget), the priority set U V , and the threshold T with T k , T | U | . IMP problem asks to find the seed set S V , with | S | k and σ U ( S ) T so that influence spread, σ ( S ) , is maximized, i.e, find S that is the solution to the following optimization problem:
maximize : σ ( S )
subject   to : | S | k
σ U ( S ) T
IMP becomes IM problem when U = . Therefore, IM is a special case of IMP and IMP is also NP-hard. In addition, the calculation of the influence function from the seed set is proven to be #P-hard [12]. Thus finding the solution to the problem within the time allowed is very challenging.

3. Integrated Greedy Algorithm

In this section, we first propose Integrated Greedy ( IG ) Algorithm which is well-known to resolve monotone and sub-modular problems that ensures an lower-bounded of optimization solution. The details of algorithm is described in Algorithm 1.
Algorithm 1: Integrated Greedy ( IG ) algorithm
Algorithms 13 00183 i001
Assume S 1 is the solution of the problem that finds the minimum seed nodes such that the influence on the priority set is greater than threshold T, and S 2 is a solution of IM problem. The main idea of this algorithm is to modify the native greedy algorithm [1] by combining two above solutions.
The algorithm is divided into two main phases. In the first phase, it tries to find a solution S 1 by a greedy strategy (line 2–4). In each iterator, the algorithm chooses a node u with largest influence incremental to set U into S 1 (line 3–4) until the σ U ( S 1 ) T . Since T < k , | S 1 | T < k . Denote t = k T as the remaining budget (line 6). The algorithm next finds the candidate solution S 2 for IM with the remaining budget t by using a greedy method in the second phase (line 6–10). In each iterator i, it selects a node u with largest influence incremental (line 7). If u already belongs to S 1 , the algorithm increases t by 1 (line 8–9). This phase ends when the remaining budget t is exhausted (line 6). Finally, the algorithm returns the solution S which unites S 1 and S 2 . It is easy to confirm that | S | = k , and t > T k 1 since k > T . Theorem 1 shows the approximation guarantee of IG algorithm.
Theorem 1.
IG algorithm returns ( S , t ) , where S is a feasible solution and t 1 , satisfies:
σ ( S ) 1 1 1 k t σ ( S * )
The worst-case approximation ratio is obtained when t = 1 and it is equal to 1/k.
Proof. 
Denote S IM * = { s 1 , s 2 , , s k } is an optimal solution of IM problem for input data of Algorithm 1 (the graph G and budget k). Obviously, we have σ ( S IM * ) σ ( S * ) . After ending the second phase, assume that S 2 = { u 2 1 , u 2 2 , , u 2 t } , S 2 i = { u 2 1 , u 2 2 , , u 2 i } , and S 2 0 = . In the second phase, the algorithm repeatedly selects a node u of which incremental influence gain is largest and due to the function σ ( · ) is monotone and sub-modular [1], so we have:
σ ( S IM * ) σ ( S 2 i ) σ ( S IM * S 2 i ) σ ( S 2 i )
j = 1 k σ ( S 2 i { s 1 , s 2 , , s j } ) σ ( S 2 i { s 1 , s 2 , , s j 1 } )
j = 1 k σ ( S 2 i { s j } ) σ ( S 2 i ) ( Dueto   σ   is   a   sub-modular   function )
k · max s S IM * σ ( S 2 i { s } ) σ ( S 2 i )
k · σ ( S 2 i + 1 ) σ ( S 2 i )
Therefore, for any i = 0 , , t 1 , we have
σ ( S 2 i + 1 ) σ ( S 2 i ) 1 k ( σ ( S IM * ) σ ( S 2 i ) )
Minus two inequality terms to σ ( S IM * ) , we have:
σ ( S 2 i + 1 ) σ ( S 2 i ) σ ( S IM * ) 1 k σ ( S IM * ) σ ( S IM * ) 1 k σ ( S 2 i )
Rearrange the terms of the above inequality, we have
σ ( S 2 i + 1 ) σ ( S IM * ) 1 1 k ( σ ( S 2 i ) σ ( S IM * ) )
1 1 k t ( σ ( S 2 0 ) σ ( S IM * ) )
Together with the fact that S 2 0 = and σ ( ) = 0 , the above inequality implies
σ ( S 2 ) = σ ( S 2 t ) 1 1 1 k t σ ( S IM * )
Since σ U ( S 1 ) T and S = S 1 S 2 , S is feasible solution of IMP , and
σ ( S ) σ ( S 2 t ) 1 1 1 k t σ ( S IM * ) 1 1 1 k t σ ( S * )
which proves the theorem! ☐
Although Algorithm 1 can provide an approximation guarantee, but it cannot work with real-social networks because the calculation of the influence function σ ( S ) is # P -hard under IC model [12]. To overcome this challenge, we propose a randomize algorithm with provable approximation guarantee based on combining IG with a sampling technique.

4. Sampling Algorithm with Provable Guarantees

In this section, we present an efficient algorithm for IMP problem called Integrated Greedy Sampling ( IGS ) algorithm that can provide an guarantee theoretical. In addition, we show that our algorithm can also be applied to large networks in experiments.

4.1. Estimator of Influence Functions

Firstly, we recap the concept of Reachable Reverse (RR) set [40] to estimate influence function σ ( · ) . Base on that, we propose the concept of Targeted Reachable Reverse (TRR) set to estimate influence function σ U ( S ) . Then we propose IGS algorithm and provide theoretical analysis based on statistical evidence.
Definition 2 
(Reachable Reverse (RR) set [40]). Given a graph G = ( V , E ) under IC model. A random RR set R j is generated from G by:
1. 
Picking a source node u with probability 1 n .
2. 
Generating a sample graph g from G, and returning R j as nodes which can be reached from u in g.
For a random RR set R j , define a random variable X g ( S ) = min { 1 , | R g S | } . Borgs et al. [40] show that RR samples can be used to estimate the influence function by applying the following Lemma.
Lemma 1.
For any set of nodes S V , we have σ ( S ) = n · E [ X g ( S ) ] .
Given a set of RR set R , and a set node S, we can approximate the value of σ ( S ) by σ ^ ( S ) defined as follow:
σ ^ ( S ) = n | R | R g R X g ( S )
Generating RR sets can be accomplished by using IM algorithms in [13,14,15,20,23]. The common algorithm for generating RR set R j is described in Algorithm 2. This algorithm first selects a source node u with a probability 1 n to add into R j . The algorithm uses a queue Q to store the visited nodes. Initially, u is also added to Q. The algorithm next retrieves each node v in Q and picks an incoming node x with probability p ( x , v ) (line 6). If successful, it adds x in to Q and R j . This process takes place until the set Q is empty.
Algorithm 2: Generating RR sample under IC model
Algorithms 13 00183 i002
We now introduce the definition of Targeted Reachable Reverse (TRR) Set on the basis of modifying RR concept.
Definition 3 
(Targeted Reachable Reverse (TRR) Set). Given a graph G = ( V , E ) under IC model. A random TRR set R j U is generated from G by:
1. 
Picking a source node u U with probability 1 | U | .
2. 
Generating a sample graph g from G, and returning R g U as nodes which can be reached from u in g.
We define a random variable Y g ( A ) = min { 1 , | R g U S | } . Similar to Lemma 1, Lemma 2 shows that we can use the value of Y g ( S ) to estimate function σ U ( S ) .
Lemma 2.
For any set of nodes S V , we have σ U ( S ) = | U | · E [ Y g ( S ) ]
Proof. 
Denote R g U ( u ) is a TRR sample with a source node u for the sample graph g, we have:
σ U ( S ) = g G | R ( g , S U ) | = u U g G Pr [ g G ] [ v S : u is   reached   from   v ] = u U g G Pr [ g G ] [ v S : v R g U ( u ) ] = | U | u V 1 | U | g G Pr [ g G ] Y g ( S ) = | U | u V g G Pr [ u is   source   node ] Pr [ g G ] Y g ( S ) = | U | · E [ Y g ( S ) ]
The transition from the second equality to the third equality comes from the definition of R g U ( u ) and from the third to the fourth then to the fifth is caused by the distribution of choosing a node u as a source node. ☐
Given a set of TRR samples R and a set node S, we define and an approximation value of σ U ( S ) as follow:
σ ^ U ( S ) = | U | | R | R g U R Y g ( S )
From Lemma 2, we can give a good approximation of σ U ( · ) when the number of TRR samples is large enough. We can re-use Algorithm 2 to generate a TRR set R j U by a modification. We replace line 1 in the algorithm by picking source node u U with probability 1 | U | and leave the rest as is.

4.2. Algorithm Description and Theoretical Analysis

Algorithm description. The algorithm is detailed in Algorithm 3. It generates the set of N U TRR sets R 1 , and set two candidate solutions S 1 , S 2 empty at first. Then the body of the algorithm divides into two phases. In phase 1, it finds a candidate solution S 1 with minimum-size so that σ ^ ( S ) ( 1 + α ) T by using a greedy strategy with potential function σ ^ over R 1 . In each iterator, it selects a node u with maximal incremental value of the potential function (line 4) until σ ^ ( S ) ( 1 + α ) T . The candidate solution S 1 obtained by this phase satisfies the priority constraint, σ U ( S 1 ) T with probability at least 1 δ (Lemma 4).
The phase 2 selects a candidate solution S 2 with the remaining budget ( t = k | S 1 | ) so that the influence spread σ ( · ) is maximized. In this phase, it first sets the parameters ϵ 1 , t m a x , N m a x and generates N 1 set of RR samples R 2 . The main of this phase operates in several iterators (line 12–27) until meeting the stopping condition (line 22). In each iterator, it finds a candidate solution S 2 by a greedy strategy. It picks a node u with maximal incremental of approximation influence σ ^ ( · ) over R 2 (line 12) until t nodes are selected. Similar to IG algorithm, if u already belongs to S 1 , the algorithm increases t by 1. After that, the algorithm checks the quality of candidate solution S 2 (line 17). It calculates F l ( S 2 , R 2 , δ ) - a lower-bounded of σ ( S 2 ) , and F u ( S 2 , R 2 , δ ) -an upper-bounded of an optimal solution respect to IMP problem. These functions ensure the statistical criterion, which are claimed in the Lemmas 5 and 6. If solution S 2 meets the approximation condition (line 19), the algorithm returns S 2 . If not, it moves to the next iterator and stops when the number of TRR samples is at least N m a x (line 21).
Algorithm 3: Integrated Greedy -based Sampling ( IGS ) algorithm
Algorithms 13 00183 i003
Theoretical analysis. Fortunately, the sequence of random variables X g ( S ) and Y g ( S ) constructed from the RR and TRR samples can be shown to form a martingale. For any random variable X g ( S ) [ 0 , 1 ] , let a random variable M i = j = 1 i ( X g i ( S ) μ ) , i 1 , where μ = E [ X g ] . For a sequence of random variables M 1 , M 2 , we have E [ M i | M 1 , , M j 1 ] = E [ M i 1 ] + E [ X g i ( S ) μ ] = E [ M i 1 ] . Hence, M 1 , M 2 , be a form of martingale [41]. Similarly, Y g is also a form of martingale. Therefore, the following concentration inequality [41] applies:
Lemma 3.
If M 1 , M 2 , be a form of martingale, | M 1 | a , | M j M j 1 | a for j [ 1 , i ] , and
Var [ M 1 ] + j = 2 i Var [ M j | M 1 , M 2 , , M j 1 ] = b
where Var [ · ] denotes the variance of a random variable. Then, for any λ, we have:
Pr [ M i E [ M i ] λ ] exp λ 2 2 3 a λ + 2 b
Apply this Lemma with | M 1 | = | X g 1 ( S ) | 1 , | M j M j 1 | = | X g j ( S ) X g j 1 ( S ) | 1 , Var [ M 1 ] = Var [ X g 1 ( S ) μ ] = Var [ X g ( S ) ] , Var [ M j | M 1 , M 2 , , M j 1 ] = Var [ X g j ( S ) μ ] = Var [ X g ( S ) ] , and Var [ X g ( S ) ] μ ( 1 μ ) μ , we have:
Pr i = 1 | R | X g i ( S ) | R | · μ λ exp λ 2 2 3 λ + 2 μ | R |
Similarly, M 1 , , M i , also form a Martingale, so apply Lemma 3, we have:
P r i = 1 | R | X g i ( S ) | R | · μ λ exp λ 2 2 μ | R |
Let λ = ϵ μ | R | and put it in two above inequalities, we have:
Pr [ i = 1 | R | X g i | R | · μ ϵ | R | μ ] exp ϵ 2 | R | μ 2 + 2 3 ϵ
Pr [ i = 1 | R | X g i | R | · μ ϵ | R | μ ] exp ϵ 2 | R | μ 2
The following Lemma shows the lower-bound of the influence of candidate solution S 1 .
Lemma 4.
The candidate solution S 1 obtained by phase 1 of Algorithm 3 satisfies Pr [ σ U ( S 1 ) T ] 1 δ .
Proof. 
Denote μ Y = E [ Y g ] = σ U ( S 1 ) | U | , and μ ^ Y = 1 N U i = 1 N U Y g i = σ ^ U ( S 1 ) | U | ( T + α T ) | U | . Apply (27) for set R 1 , we have:
Pr [ μ ^ Y ( 1 α ) μ Y ] = Pr Y g i N U μ Y α N U μ Y exp ϵ 2 N U μ Y 2 exp ϵ 2 μ ^ Y N U 2 ( 1 α ) exp ϵ 2 ( T + α T ) 2 ( 1 α ) | U | N U exp ( 2 + 2 3 α ) ln ( | U | | U | / 2 / δ ) 2 ( 1 α ) exp ln ( | U | | U | / 2 / δ ) δ | U | / 2
We assume that the event μ ^ Y ( 1 α ) μ Y happens, apply (26) for set R 1 , we have:
Pr [ σ U ( S 1 ) T ] Pr σ U ( S 1 ) σ ^ U ( S 1 ) 1 + α
= Pr σ ^ U ( S 1 ) ( 1 + α ) σ U ( S 1 )
= Pr | U | N U i = 1 N U Y g i | U | μ Y | U | α μ Y
= Pr i = 1 N U Y g i N U μ Y N U α μ Y
exp α 2 μ Y 2 + 2 3 α N U
exp α 2 μ ^ Y ( 2 + 2 3 α ) ( 1 α ) N U
exp α 2 σ ^ U ( S 1 ) ( 2 + 2 3 α ) ( 1 α ) ( T + α T ) N U
exp ln ( | U | | U | / 2 / δ )
δ | U | | U | / 2
Assume that | S 1 | = k 1 , there are at most n k 1 possibilities for the candidate solutions S 1 . Therefore,
Pr [ S 1 : σ U ( S 1 ) T ] n k 1 δ | U | | U | / 2 δ
 ☐
Lemma 5
(Lower-bound). For any δ ( 0 , 1 ) , a set of RR samples R , let c = ln ( 1 δ ) , and
F l ( R , S , δ ) = min σ ^ ( S ) n c 3 | R | , σ ^ ( S ) + n | R | 2 c 3 4 c 2 9 + 2 | R | c σ ^ ( S ) n
We have Pr [ σ ( S ) F l ( R , δ ) ] 1 δ .
Proof. 
Denote μ = E [ X g ( S ) ] = σ ( S ) n and μ ^ = 1 n R g R X g ( S ) = σ ^ ( S ) n . Apply (24) with λ = c 3 + c 2 9 + 2 c μ | R | , we have:
Pr j = 1 T X g ( S ) | R | · μ λ δ
Therefore, the following event happens with probability at least 1 δ
j = 1 T Z j ( S ) | R | · μ λ | R | μ ^ | R | μ c 3 c 2 9 + 2 c μ | R |
We consider two following cases:
Case 1: 
If | R | μ ^ | R | μ c 3 0 , then μ μ ^ c 3 | R | .
Case 2: 
If | R | μ ^ | R | μ c 3 > 0 , (40) becomes:
| R | μ ^ | R | μ c 3 2 c 2 9 + 2 c μ | R |
( μ ^ μ ) 2 | R | + 4 c 3 ( μ ^ μ ) 2 c μ ^ 0
Solve the above inequality for μ , we obtain:
μ μ ^ + 1 T 2 c 3 4 c 2 9 + 2 | R | c μ ^
Combine two above cases and replace μ = σ ( S ) n , μ ^ = σ ^ ( S ) n , we obtain the proof. ☐
Lemma 6
(Upper-bound). For any δ ( 0 , 1 ) , in an iterator t of Algorithm 3, denote R 2 t is a set of RR samples with N t = | R t | , S 2 t is a candidate solution of phase 2, and
F u ( R 2 t , S 2 t , δ ) = σ ^ ( S 2 t ) 1 1 1 k t + n N t c 2 + 2 N t c σ ^ ( S 2 t ) 1 1 1 k t n c
We have Pr [ OPT F u ( R 2 t , S 2 t , δ ) ] 1 δ .
Proof. 
Let λ = 2 c μ N t , apply inequality (25), we have:
Pr i = 1 N t X g i ( S ) | R | · μ λ exp λ 2 2 μ | R | δ
Therefore, the following event happens with the probability at least 1 δ :
N t μ ^ N t μ 2 c μ N t N t ( μ ^ μ ) 2 c μ N t
Solve the above quadratic inequality for μ , we obtain upper-bound for μ is,
μ max μ ^ , μ ^ + 1 N t c 2 + 2 N t c μ ^ c
= μ ^ + 1 N t c 2 + 2 N t c μ ^ c
Denote S 0 = arg max S , | S | k σ ^ ( S ) , where σ ^ is calculated over R t 2 . Since the phase of Algorithm 3 selects a candidate solution S 2 t by a greedy strategy. Similar to Theorem 1, we have:
σ ^ ( S 2 t ) 1 ( 1 1 k ) t σ ^ ( S 0 ) 1 ( 1 1 k ) t σ ^ ( S * )
Replace μ = σ ( S 2 t ) n , μ ^ = σ ^ ( S 2 t ) n into (47) and combine it with (48), we have:
OPT = σ ( S * ) σ ^ ( S * ) + n N t c 2 + 2 N t c σ ^ ( S * ) n c
σ ^ ( S 2 t ) 1 ( 1 1 k ) t + n N t c 2 + 2 N t c σ ^ ( S 2 t ) n 1 ( 1 1 k ) t c
which completes the proof. ☐
Based on above theoretical analysis, the following Theorem Approximation guarantee of IGS algorithm.
Theorem 2.
The Algorithm 3 provides a solution S and an integer t, satisfies:
  • Pr [ σ U ( S ) T ] 1 δ
  • Pr [ σ ( S ) 1 ( 1 1 k ) t OPT ] 1 δ
Proof. 
Since S = S 1 S 2 and Lemma 4, we have:
Pr [ σ U ( S ) T ] Pr [ σ U ( S 1 ) T ] 1 δ
We consider two following cases:
Case 1: 
If the algorithm stops with the condition | R 2 t | N m a x , apply (26) with set S * and R 2 , we have:
Pr [ σ ^ ( S * ) ( 1 ϵ 1 ) σ ( S * ) ] exp ϵ 1 2 N m a x OPT 2 n
exp ϵ 1 2 N m a x k 2 n ( Due   to   OPT k )
exp ϵ 1 2 N m a x t 0 2 n ( Due   to   k t 0 )
δ 2 / n t m a x δ 2
From (27), we have:
Pr [ σ ( S 2 t ) σ ^ ( S 2 t ) ( 1 + ϵ 1 ) ] exp ϵ 1 2 N m a x σ ( S 2 t ) ( 2 + 2 3 ϵ 1 ) n
exp ϵ 1 2 N m a x t ( 2 + 2 3 ϵ 1 ) n ( Due   to   σ ( S 2 t ) t )
δ 2 / n t m a x δ 2
Apply an union probability that the events (54) and (57) happen with the probability at most δ 1 + δ 1 = δ / 3 . Assume that they do not happen, we have:
σ ( S 2 t ) σ ^ ( S 2 t ) 1 + ϵ 1 1 ( 1 1 k ) t σ ^ ( S 0 ) 1 + ϵ 1
1 ( 1 1 k ) t σ ^ ( S * ) 1 + ϵ 1
1 ϵ 1 1 + ϵ 1 1 ( 1 1 k ) t σ ( S * )
= 1 ( 1 1 k ) t 2 ϵ 1 1 + ϵ 1 1 ( 1 1 k ) t σ ( S * )
1 ( 1 1 k ) t 2 ϵ 1 1 + ϵ 1 ( 1 1 e ) σ ( S * )
1 ( 1 1 k ) t ϵ σ ( S * )
Hence, in this case the algorithm satisfies approximation guarantee with probability at least 1 δ 3 .
Case 2: 
If the algorithm stops at any iterator i , i = 1 , 2 , , i m a x . At this iterator, the condition in line 19 is satisfied, apply Lemma 5 and Lemma 6, the following thing happens with the probability at least 1 2 i m a x δ 2 = 1 2 δ / 3 :
σ ( S 2 t ) OPT F l ( S 2 t , R 2 , δ 2 ) F u ( S 2 t , R 2 , δ 2 ) 1 ( 1 1 k ) t ϵ
Combine two above cases, the algorithm meets the approximation ratio condition with the probability at least 1 δ / 3 2 δ / 3 = 1 δ . ☐

5. Experiments

In this section, we implement and compare our algorithm IGS to other influence maximization methods about the influence in general, the influence on priority nodes, running time and memory usage. The dataset includes several network databases with thousands or even millions nodes and edges (Table 1).

5.1. Experimental Settings

All the implementations are on Linux machine with configurations are 2× Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz and 4 × 16 GB DIMM ECC DDR4 @ 2400MHz.
Algorithm comparisons. Since IMP is an expansion of IM , we compare IGS algorithm with several state-of-the-art IM algorithms including: DSSA [15], BCT [2], OPIM -C [23]. In addition, we use the basic algorithm, Max degree ( Degree ), which is the common baseline for information diffusion problems. In IMP , there are two factors that impact the solution in practice: the budget (k) of selecting seed node and the priority set of nodes (U). As a result, these two factors also affect the algorithms. From the above observation, we conduct experiments under two settings: varies k and fixed T; varies T and fixed k.
The dataset. For experimental purpose, we choose 5 types of databases from various resources: NetHept, NetPhy, DBLP are citation networks, Email-Enron is communication network [15] and Twitter Retweet is online social networks [42]. The brief of these ones are described on Table 1. These databases are experimented because they are popular in information diffusion problems, especially used in the state-of-the-art algorithms what we are comparing.
Parameter Settings. Graphs are formatted as each edge e = ( u , v ) E has the weight w ( u , v ) formulated as w ( u , v ) = 1 d i n ( v ) where d i n ( v ) is the in-degree of node v [14,15,20].
For the first case, k is assigned with 150, 160, 170, 180, 190 and 200, respectively, while T is fixed at 100. In addition, set U is generated with 200 nodes. With the second case, the value of k is fixed at 500. U set includes about 1000 nodes. We change the value of T increasing from 100 to 500. In all experiments, we keep ϵ = 0.1 , δ = 1 / n according setting for IM algorithms [14,15,20] and α = 0.01 .

5.2. Experimental Results

We install IGS to compare with state-of-the-art algorithms such as BCT , DSSA , OPIM C and Degree then calculate the spread of influence on all nodes and to U, the priority set, U V . Results are shown in following tables and figures.
The Influence. The Figure 2 and the Table 2 indicate IGS outperforms the others when influencing to priority nodes by a given threshold T.
The above figure gives information about the influence values in case k changes from 150 to 200, U includes 200 nodes and the threshold T is 100. The terms “infU”, “inf” mean the influences to set U ( σ U ( S ) ) and to all nodes ( σ ( S ) ), respectively. These algorithms output differently on various databases. Looking at red bars, we can see IGS approximately affects the set U twice the value of the threshold T on most databases except Re-Tweet but still higher than T. Conversely, the influence on U of the remaining sharply fluctuate according to the databases. While DSSA and BCT influence on U over T with netHEPT and ENRON, they work quite low with the others. OPIM C and Degree often affect U much lower than T. Besides, the σ ( S ) of BCT is highest on netHEPT whereas the one of IGS keeps at top in all other cases. In general, the values of σ ( S ) of DSSA , OPIM C and Degree have similarities with each others.
Besides, Table 2 describes the experiment while T comes from 100 to 500, k = 500 and enlarge U up to 1K nodes. This setting is to check the case when U is large and when the threshold T is incremental. Certainly, the condition that k T has to be maintained so we fixed k = 500. Looking at bold values, we can see although U and S both become large and T increments gradually, the influence on U of IGS is always significantly higher than T, even up to more than ten times. DSSA , BCT and OPIM C also give the outputs over threshold T in many cases, they still have values lower than T = 500 on netPHY, DBLP and RETWEET however. The σ U ( S ) of Degree is lowest, especially, is only 22.77 on Re-Tweet.
From Figure 2 and Table 2, we can see σ U ( S ) of IGS is significantly higher than T and produces better results than the state-of-the-art algorithms. This is because IGS always prioritizes affecting U until over the threshold T then affects other nodes as well even with large values of k, U size and T. The other algorithms show that they are not always possible to influence U to exceed the desired threshold. On the whole, the state-of-the-art IM algorithms cannot influence the given priority set as well as IGS can.
Running time.Figure 3 compares running time of these algorithms. They indicate time of IGS gives lowest values on netHEPT, ENRON and netPHY databases. Nevertheless, IGS stays at top 3 on DBLP while it costs highest running time on the remaining of the dataset to find 150 and 160 seed sets but return to top 3 at the other values of budget k. IGS only takes about 0.1 s to find out the seed set in most cases except RETWEET. Besides, the figures also give information about the other algorithms. First, BCT runs significantly slow on netHEPT than the others. This method often stays at top 3 or top 4 on ENRON, DBLP and RETWEET. Second, running time of DSSA and IGS look similiar, while that of OPIM -C and Degree is usually higher than the above two algorithms. As the whole, IGS ’s running time gives the most stable results and usually runs around the 0.1-s mark.
The time of IGS is fast and stable because of parallel programming and this algorithm costs most of time to find out S 1 while the loop to calculate S 2 usually stops at 1–2 rounds. The TRR sampling technique also helps to quickly identify which seeds will affect to the priority U.
Memory Usage. The Table 3 illustrates the memory consumption of IGS and state-of-the-art methods including DSSA , BCT , OPIM C and Degree . The smallest numbers are highlighted in bold while the largest ones are in red. The output shows that IGS outperforms the others, especially on small databases with tens of thousands of nodes and from tens to hundreds of thousands of edges such as netHEPT, ENRON, and netPHY. IGS also consumes sharply less memory than OPIM C and Degree when testing with larger databases such as DBLP and RETWEET. When IGS spends only more than 130 MB and more than 200 MB, OPIM C and Degree spend about four times higher with DBLP and RETWEET, respectively. Besides, DSSA also results less expensive memory usage in all cases. BCT is less stable than IGS and DSSA because it works as DSSA does on ENRON, netPHY, DBLPB and RETWEET but suddenly costs the most memory in NetHEPT.
TRR sampling technique focuses on finding the seeds that influence the priority U first then Algorithm 3 explores another seeds to push on the seed set. Hence the algorithm 3 saves memory to run loop more than the others because of must not check whether a seed node influences to U set or not. Moreover, the condition of F l ( S 2 , R 2 , δ ) F u ( S 2 , R 2 , δ ) 1 ( 1 1 k ) t ϵ helps S 2 generated soon without waiting for the stop condition of the repeat.
Finally, our algorithm, IGS , was designed very well to get a balance between the target to influence on the given priority set and the influence that has to propagate to the largest number of nodes. Hence, running time, memory used and the influence of IGS give significantly high results and even more steadily rather than the others in general.

6. Conclusions

In this paper, we investigate the IMP problem, which is a variant of the IM problem with priority constraint that arises in a realistic scenario in which companies or organizations often prioritize influencing potential users during their viral marketing campaigns. The goal of the IMP problem is to select a seed set with k nodes can influence of a given priority set U greater than a threshold T which adjusts the influence of the seed set to the priority set. Although the objective function (influence spread function) is still a monotone and sub-modular function, but when considering the priority constraint the state-of-the-art IM algorithms cannot be applied.
To address this challenge, we propose two algorithms with provable theoretical guarantees, called IG and IGS . We show that IG provides a 1 ( 1 1 k ) t -approximation solution; IGS is an efficient randomized approximation algorithm based on sampling method that returns a 1 ( 1 1 k ) t ϵ -approximation solution with probability at least 1 δ with ϵ > 0 , δ ( 0 , 1 ) as input parameters of the problem. Experiments on real world social networks show our algorithm outperforms state-of-the-art IM algorithms including DSSA [15], BCT [2] and OPIM [23] in terms of influences, running time, and memory used.
In the future, we are going to improve our algorithm to expand it with large networks to billions scale with acceptable time. In addition, the problem with multiple priority user sets and thresholds is going to be considered.

Author Contributions

Methodology and writing—original draft preparation, C.V.P. and D.K.T.H.; investigation Q.C.V., A.N.S.; Conceptualization, H.X.H.; Data curation, Q.C.V. and A.N.S.; Investigation, C.V.P. and D.K.T.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

There is no conflict of interest.

References

  1. Kempe, D.; Kleinberg, J.M.; Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–27 August 2003; pp. 137–146. [Google Scholar] [CrossRef] [Green Version]
  2. Nguyen, H.T.; Thai, M.T.; Dinh, T.N. A Billion-Scale Approximation Algorithm for Maximizing Benefit in Viral Marketing. IEEE/ACM Trans. Netw. 2017, 25, 2419–2429. [Google Scholar] [CrossRef]
  3. Li, Y.; Zhang, D.; Tan, K. Real-time Targeted Influence Maximization for Online Advertisements. PVLDB 2015, 8, 1070–1081. [Google Scholar] [CrossRef] [Green Version]
  4. Pham, C.V.; Thai, M.T.; Duong, H.V.; Bui, B.Q.; Hoang, H.X. Maximizing misinformation restriction within time and budget constraints. J. Comb. Optim. 2018, 35, 1202–1240. [Google Scholar] [CrossRef]
  5. Tong, G.A.; Wu, W.; Guo, L.; Li, D.; Liu, C.; Liu, B.; Du, D. An efficient randomized algorithm for rumor blocking in online social networks. In Proceedings of the 2017 IEEE Conference on Computer Communications, INFOCOM 2017, Atlanta, GA, USA, 1–4 May 2017; pp. 1–9. [Google Scholar] [CrossRef] [Green Version]
  6. Budak, C.; Agrawal, D.; El Abbadi, A. Limiting the spread of misinformation in social networks. In Proceedings of the 20th International Conference on World Wide Web, WWW 2011, Hyderabad, India, 28 March–1 April 2011; pp. 665–674. [Google Scholar] [CrossRef]
  7. Nguyen, H.T.; Cano, A.; Tam, V.; Dinh, T.N. Blocking Self-avoiding Walks Stops Cyber-epidemics: A Scalable GPU-based Approach. IEEE Trans. Knowl. Data Eng. 2020, 32, 1263–1275. [Google Scholar] [CrossRef] [Green Version]
  8. Nguyen, N.P.; Yan, G.; Thai, M.T. Analysis of misinformation containment in online social networks. Comput. Netw. 2013, 57, 2133–2146. [Google Scholar] [CrossRef]
  9. Zhang, H.; Alim, M.A.; Li, X.; Thai, M.T.; Nguyen, H.T. Misinformation in Online Social Networks: Detect Them All with a Limited Budget. ACM Trans. Inf. Syst. 2016, 34, 18:1–18:24. [Google Scholar] [CrossRef]
  10. Zhang, H.; Kuhnle, A.; Zhang, H.; Thai, M.T. Detecting misinformation in online social networks before it is too late. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2016, San Francisco, CA, USA, 18–21 August 2016; pp. 541–548. [Google Scholar] [CrossRef]
  11. Ye, M.; Liu, X.; Lee, W. Exploring social influence for recommendation: A generative model approach. In Proceedings of the 35th International ACM SIGIR conference on research and development in Information Retrieval, SIGIR ’12, Portland, OR, USA, 12–16 August 2012; pp. 671–680. [Google Scholar] [CrossRef]
  12. Chen, W.; Collins, A.; Cummings, R.; Ke, T.; Liu, Z.; Rincón, D.; Sun, X.; Wang, Y.; Wei, W.; Yuan, Y. Influence Maximization in Social Networks When Negative Opinions May Emerge and Propagate. In Proceedings of the Eleventh SIAM International Conference on Data Mining, SDM 2011, Mesa, AZ, USA, 28–30 April 2011; pp. 379–390. [Google Scholar] [CrossRef] [Green Version]
  13. Borodin, A.; Filmus, Y.; Oren, J. Threshold Models for Competitive Influence in Social Networks. In Proceedings of the Internet and Network Economics—6th International Workshop, WINE 2010, Stanford, CA, USA, 13–17 December 2010; pp. 539–550. [Google Scholar] [CrossRef]
  14. Tang, Y.; Shi, Y.; Xiao, X. Influence Maximization in Near-Linear Time: A Martingale Approach. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, 31 May–4 June 2015; pp. 1539–1554. [Google Scholar] [CrossRef]
  15. Nguyen, H.T.; Thai, M.T.; Dinh, T.N. Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, 26 June–1 July 2016; pp. 695–710. [Google Scholar] [CrossRef] [Green Version]
  16. Chen, W.; Yuan, Y.; Zhang, L. Scalable Influence Maximization in Social Networks under the Linear Threshold Model. In Proceedings of the ICDM 2010, The 10th IEEE International Conference on Data Mining, Sydney, Australia, 14–17 December 2010; pp. 88–97. [Google Scholar] [CrossRef]
  17. Chen, S.; Fan, J.; Li, G.; Feng, J.; Tan, K.; Tang, J. Online Topic-Aware Influence Maximization. PVLDB 2015, 8, 666–677. [Google Scholar] [CrossRef]
  18. Aslay, Ç.; Barbieri, N.; Bonchi, F.; Baeza-Yates, R.A. Online Topic-aware Influence Maximization Queries. In Proceedings of the 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, 24–28 March 2014; pp. 295–306. [Google Scholar] [CrossRef]
  19. Pham, C.V.; Duong, H.V.; Hoang, H.X.; Thai, M.T. Competitive Influence Maximization within Time and Budget Constraints in Online Social Networks: An Algorithmic Approach. Appl. Sci. 2019, 9, 2274. [Google Scholar] [CrossRef] [Green Version]
  20. Tang, Y.; Xiao, X.; Shi, Y. Influence maximization: Near-optimal time complexity meets practical efficiency. In Proceedings of the International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, 22–27 June 2014; pp. 75–86. [Google Scholar] [CrossRef]
  21. Domingos, P.M.; Richardson, M. Mining the network value of customers. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, San Francisco, CA, USA, 26–29 August 2001; pp. 57–66. [Google Scholar]
  22. Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.M.; Glance, N.S. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 420–429. [Google Scholar] [CrossRef] [Green Version]
  23. Tang, J.; Tang, X.; Xiao, X.; Yuan, J. Online Processing Algorithms for Influence Maximization. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD’ 18), Houston, TX, USA, 10–15 June 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 991–1005. [Google Scholar] [CrossRef]
  24. Nguyen, H.; Zheng, R. On Budgeted Influence Maximization in Social Networks. IEEE J. Sel. Areas Commun. 2013, 31, 1084–1094. [Google Scholar] [CrossRef] [Green Version]
  25. Pham, C.V.; Duong, H.V.; Thai, M.T. Importance Sample-Based Approximation Algorithm for Cost-Aware Targeted Viral Marketing. In Proceedings of the Computational Data and Social Networks—8th International Conference, CSoNet 2019, Ho Chi Minh City, Vietnam, 18–20 November 2019; pp. 120–132. [Google Scholar] [CrossRef]
  26. Li, X.; Smith, J.D.; Dinh, T.N.; Thai, M.T. TipTop: (Almost) Exact Solutions for Influence Maximization in Billion-Scale Networks. IEEE/ACM Trans. Netw. 2019, 27, 649–661. [Google Scholar] [CrossRef]
  27. Barbieri, N.; Bonchi, F.; Manco, G. Topic-aware social influence propagation models. Knowl. Inf. Syst. 2013, 37, 555–584. [Google Scholar] [CrossRef]
  28. Li, G.; Chen, S.; Feng, J.; Tan, K.-L.; Li, W.-S. Efficient Location-Aware Influence Maximization. In Proceedings of the 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, 16–19 April 2018; pp. 1569–1572. [Google Scholar] [CrossRef]
  29. Wang, X.; Zhang, Y.; Zhang, W.; Lin, X. Efficient Distance-Aware Influence Maximization in Geo-Social Networks. IEEE Trans. Knowl. Data Eng. 2017, 29, 599–612. [Google Scholar] [CrossRef]
  30. Bharathi, S.; Kempe, D.; Salek, M. Competitive Influence Maximization in Social Networks. In Proceedings of the Internet and Network Economics, Third International Workshop, WINE 2007, San Diego, CA, USA, 12–14 December 2007; pp. 306–311. [Google Scholar] [CrossRef]
  31. Liu, W.; Yue, K.; Wu, H.; Li, J.; Liu, D.; Tang, D. Containment of competitive influence spread in social networks. Knowl.-Based Syst. 2016, 109, 266–275. [Google Scholar] [CrossRef]
  32. He, X.; Song, G.; Chen, W.; Jiang, Q. Influence Blocking Maximization in Social Networks under the Competitive Linear Threshold Model. In Proceedings of the Twelfth SIAM International Conference on Data Mining, Anaheim, CA, USA, 26–28 April 2012; pp. 463–474. [Google Scholar] [CrossRef]
  33. Lu, W.; Bonchi, F.; Goyal, A.; Lakshmanan, L.V.S. The bang for the buck: Fair competitive viral marketing from the host perspective. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, 11–14 August 2013; pp. 928–936. [Google Scholar] [CrossRef]
  34. Chen, W.; Lakshmanan, L.V.S.; Castillo, C. Information and Influence Propagation in Social Networks; Synthesis Lectures on Data Management; Morgan & Claypool Publishers: San Rafael, CA, USA, 2013. [Google Scholar] [CrossRef]
  35. Bozorgi, A.; Samet, S.; Kwisthout, J.; Wareham, T. Community-based influence maximization in social networks under a competitive linear threshold model. Knowl.-Based Syst. 2017, 134, 149–158. [Google Scholar] [CrossRef]
  36. Tsang, A.; Wilder, B.; Rice, E.; Tambe, M.; Zick, Y. Group-Fairness in Influence Maximization. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, 10–16 August 2019; pp. 5997–6005. [Google Scholar] [CrossRef] [Green Version]
  37. Farnadi, G.; Babaki, B.; Gendreau, M. A Unifying Framework for Fairness-Aware Influence Maximization. In Proceedings of the Companion of The 2020 Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 714–722. [Google Scholar] [CrossRef]
  38. Stoica, A.; Han, J.X.; Chaintreau, A. Seeding Network Influence in Biased Networks and the Benefits of Diversity. In Proceedings of the WWW ’20: The Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 2089–2098. [Google Scholar] [CrossRef]
  39. Nguyen, L.N.; Zhou, K.; Thai, M.T. Influence Maximization at Community Level: A New Challenge with Non-submodularity. In Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019, Dallas, TX, USA, 7–10 July 2019; pp. 327–337. [Google Scholar] [CrossRef]
  40. Borgs, C.; Brautbar, M.; Chayes, J.T.; Lucier, B. Maximizing Social Influence in Nearly Optimal Time. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, OR, USA, 5–7 January 2014; pp. 946–957. [Google Scholar] [CrossRef] [Green Version]
  41. Chung, F.R.K.; Lu, L. Survey: Concentration Inequalities and Martingale Inequalities: A Survey. Internet Math. 2006, 3, 79–127. [Google Scholar] [CrossRef] [Green Version]
  42. Rossi, R.A.; Ahmed, N.K. The Network Data Repository with Interactive Graph Analytics and Visualization; AAAI: Palo Alto, CA, USA, 2015. [Google Scholar]
Figure 1. A toy example shows the difference between the influence maximization and our proposed problem.
Figure 1. A toy example shows the difference between the influence maximization and our proposed problem.
Algorithms 13 00183 g001
Figure 2. Comparisons of Influence Spreading with k = 100 500 , T = 100 and U size = 200
Figure 2. Comparisons of Influence Spreading with k = 100 500 , T = 100 and U size = 200
Algorithms 13 00183 g002
Figure 3. Comparisons about Runtime (s) with k varies from 150 to 200 between IGS and the others.
Figure 3. Comparisons about Runtime (s) with k varies from 150 to 200 between IGS and the others.
Algorithms 13 00183 g003
Table 1. Dataset’s statistics.
Table 1. Dataset’s statistics.
Database#Nodes#EdgesTypesAvg. Degree
netHEPT [15]15 K59 Kdirected4.1
ENRON [15]37 K184 Kdirected5
netPHY [15]37 K181 Kdirected13.4
DBLP [15]655 K2 Mdirected6.1
TWITTER RETWEET [42]1 M2 Mdirected4
Table 2. Comparisons about σ ( S ) and σ U ( S ) between IGS and the others with k = 500, U size = 1 K and T = 100 500 .
Table 2. Comparisons about σ ( S ) and σ U ( S ) between IGS and the others with k = 500, U size = 1 K and T = 100 500 .
Dataset
T NetHeptEnronnetPHYDBLPRETWEET
IGS 100 σ ( S ) 5666.1614,267.401865.9254,033.5017,307.70
σ U ( S ) 1482.041075.771192.841271.62511.08
200 σ ( S ) 5581.3414,162.201805.2653,553.9018,581.50
σ U ( S ) 1478.931079.741175.321267.52491.35
300 σ ( S ) 5645.4014,284.801773.3353,240.5019,459.10
σ U ( S ) 1476.081074.301153.321264.79492.39
400 σ ( S ) 5640.2114,196.501688.5352,918.8018,832.20
σ ( S ) 1468.481075.681125.691260.31490.46
500 σ ( S ) 5039.4514,245.501593.6652,130.90228,801.00
σ U ( S ) 1238.541079.281104.201252.70994.40
DSSA σ ( S ) 4098.639960.353230.2758,197.738,253.7
σ U ( S ) 1093.7857.608174.479474.635168.087
BCT σ ( S ) 11,088.1019,901.706675.95117,197.0077,316.90
σ U ( S ) 1280.541701.60386.49474.635159.77
OPIM-C σ ( S ) 3779.0919,326.36262.5112,33472,026.1
σ U ( S ) 600.93894.18194.04459.801173.41
Degree σ ( S ) 3824.4419,349.106345.86114,24973,936
σ U ( S ) 292.82779.84164260.9422.77
Table 3. Memory usage (MB) comparisons between IGS and the others.
Table 3. Memory usage (MB) comparisons between IGS and the others.
DatasetAlgorithmBudget k
150160170180190200
NetHEPTIGS9.909.909.909.899.899.95
DSSA22.8422.8422.8422.8422.8422.84
BCT1023.791017.521021.601012.211020.181020.74
OPIM-C47.7647.9148.0348.1148.3048.46
Degree49.1449.1849.4849.6849.8650.13
ENRONIGS16.8216.7916.8116.8116.8216.82
DSSA30.4828.0728.0728.0728.0730.48
BCT30.3530.3530.3930.3930.3930.39
OPIM-C27.1627.2042.0027.2227.2527.30
Degree27.9828.0843.7728.1928.2728.41
NetPHYIGS15.1815.1815.1815.1815.1815.04
DSSA52.1252.1252.1252.1238.5052.14
BCT34.8234.8234.8234.8234.8234.80
OPIM-C87.8888.3988.9289.3190.2690.51
Degree92.2692.7193.3393.8894.6894.98
DBLPIGS138.66138.66138.66138.66138.66138.66
DSSA152.90152.87152.87152.91152.91152.83
BCT162.88162.87162.87162.88162.88162.89
OPIM-C475.05373.72373.78373.95477.18477.51
Degree500.87395.00394.26395.35504.52505.26
RETWEETIGS214.67214.67214.67214.67214.67214.67
DSSA253.14253.14253.14253.14253.14253.14
BCT282.50282.50282.50282.47282.50282.48
OPIM-C877.31874.20722.91876.99886.78877.80
Degree918.53916.23756.93920.00930.33921.95

Share and Cite

MDPI and ACS Style

Pham, C.V.; Ha, D.K.T.; Vu, Q.C.; Su, A.N.; Hoang, H.X. Influence Maximization with Priority in Online Social Networks. Algorithms 2020, 13, 183. https://doi.org/10.3390/a13080183

AMA Style

Pham CV, Ha DKT, Vu QC, Su AN, Hoang HX. Influence Maximization with Priority in Online Social Networks. Algorithms. 2020; 13(8):183. https://doi.org/10.3390/a13080183

Chicago/Turabian Style

Pham, Canh V., Dung K. T. Ha, Quang C. Vu, Anh N. Su, and Huan X. Hoang. 2020. "Influence Maximization with Priority in Online Social Networks" Algorithms 13, no. 8: 183. https://doi.org/10.3390/a13080183

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop