1 Introduction

The problem of maximizing a concave, strictly increasing, and differentiable utility function \(f: {\mathbb {R}} \rightarrow {\mathbb {R}}\) over a discrete set plays an important role in many applications. Examples can be found in, among others, investment problems with discrete choices such as infrastructure projects, venture capital, and private equity deals (Ahmed and Atamtürk [2]), project selection (Klastorin [21], Mehrez and Sinuany-Stern [27], Weingartner [36]), competitive facility location (Berman and Krass [4], Aboolian et al. [1], Ljubić and Moreno [25]), and combinatorial auctions (Feige et al. [11], Lehmann et al. [23]). In such utility-maximization problems, the function f typically models the decision maker’s risk-averse attitude. When some of the coefficients of f are subject to uncertainty, the tools of stochastic optimization are used to maximize it in expectation over a typically discrete set of scenarios.

Together with his coauthors, Shabbir Ahmed made substantial contributions in advancing the state-of-the-art of mathematical programming methods for solving this challenging set of problems (Ahmed and Atamtürk [2], Yu and Ahmed [38, 39]). In this article, we study optimization problems in which the expected value of the function f composed with a set union operator is maximized over a discrete feasible region. The expected value of the objective function is computed w.r.t. a set of coefficients taking values from a discrete set of scenarios. In this way, we generalize the family of problems addressed in the three aforementioned articles.

1.1 The problem introduced in Ahmed and Atamtürk [2]

Let N be a ground set of n items and M a finite set of m scenarios, each occurring with a strictly positive probability \(\pi _i\), \(i \in M\), and let \(a_{ij} \in {\mathbb {R}}_+\), \(i \in M, j \in N\), and \(d_i \in {\mathbb {R}}_+\), \(i \in M\), be nonnegative real numbers. Letting the binary variable \(x_j \in \{0,1\}\), \(j \in N\), be equal to 1 if and only if item \(j \in N\) is chosen, the problem tackled in Ahmed and Atamtürk [2] can be cast as the following Mixed Integer Nonlinear Program (MINLP):

$$\begin{aligned} \max _{x \in \{0,1\}^n} \left\{ \, \sum _{i \in M} \pi _i \, f \Big (\sum _{j \in N} a_{ij} \, x_j + d_i\Big ): \, x \in X \right\} . \end{aligned}$$
(1)

Here, X is a polyhedron encapsulating the constraints of the problem and \(X \cap \{0,1\}^n\) is the discrete set of feasible solutions. From a stochastic-optimization perspective, Problem (1) asks for maximizing

$$\begin{aligned} \mathbb {E}_\xi \left[ f\left( \sum _{j \in N} a_{j}(\xi ) \, x_j + d(\xi ) \right) \right] \end{aligned}$$

subject to \(x \in X \cap \{0,1\}^n\), where \(\xi \) is a random variable and the discrete realizations of \(a_j(\xi )\), \(j \in N\), and \(d(\xi )\) are, respectively, \(a_{1j}, \dots , a_{mj}\) and \(d_1, \dots , d_m\), and occur with strictly positive probabilities \(\pi _1, \dots , \pi _m\).

The objective function of Problem (1) can be interpreted as a linear combination of m set functions \(h_i: 2^{N} \rightarrow {\mathbb {R}}_+\) with weights \(\pi _i\), \(i \in M\). For each \(i \in M\), \(h_i\) is defined as the composition of f with an affine set function \(q_i: 2^N \rightarrow {\mathbb {R}}\) that maps subsets of N into the reals. Formally:

$$\begin{aligned} h_i(S) := f \bigg (\underbrace{\sum _{j \in S} a_{ij} + d_i}_{q_i(S)}\bigg ), \quad i \in M, S \subseteq N. \end{aligned}$$

It is not difficult to see that each function \(h_i\), \(i \in M\), is submodular (see Ahmed and Atamtürk [2], Yu and Ahmed [39]) and, hence, so is the whole objective function of the problem. Thanks to this, Problem (1) can be reformulated as a Mixed Integer Linear Program (MILP) by exploiting a reformulation of Nemhauser et al. [29] featuring an exponential number of linear inequalities, one per subset S of the ground set N. Ahmed and Atamtürk [2] showed how to tighten such inequalities by a sequence-independent lifting procedure. In a follow-up article, Yu and Ahmed [39] further improved the strength of such inequalities for the case where the set of discrete choices X is subject to a single knapsack constraint. Further results on the nature of the exact lifting function adopted in Ahmed and Atamtürk [2] are reported in Shi et al. [33].

The specific function f considered in Ahmed and Atamtürk [2] and Yu and Ahmed [39] is the negative exponential function

$$\begin{aligned} f(z) := 1 - \exp \Big (-\frac{z}{\lambda }\Big ), \end{aligned}$$
(f)

which is frequently used to model the behavior of risk-averse decision makers. Here, \(\lambda > 0\) represents the risk-tolerance parameter: the larger the value of \(\lambda \) (and, thus, the closer f is to being linear), the less risk-averse the decision maker is. An illustration of such a function for different values of \(\lambda \) is provided in Fig. 1.

Fig. 1

The utility function \(f(z)=1 - \exp (-\frac{z}{\lambda })\) graphed for different values of \(\lambda \).
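For concreteness, the utility (f) and its derivative are simple one-liners; the following is a minimal Python sketch (the names `f`, `f_prime`, and `lam` are ours, and the default `lam = 10.0` is arbitrary), which will also be handy when generating the outer-approximation cuts discussed in Sect. 3:

```python
import math

def f(z: float, lam: float = 10.0) -> float:
    """Negative-exponential utility (f): f(z) = 1 - exp(-z / lam)."""
    return 1.0 - math.exp(-z / lam)

def f_prime(z: float, lam: float = 10.0) -> float:
    """Derivative f'(z) = exp(-z / lam) / lam > 0, so f is strictly increasing."""
    return math.exp(-z / lam) / lam
```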

1.2 A generalization of Problem (1)

In this work, we study a generalization of Problem (1) in which, besides the set of items N, we are given an additional ground set \({\hat{N}}\) of \({\hat{n}}\) metaitems. We assume that the two ground sets N and \({\hat{N}}\) are linked by a covering relationship modeled by a bipartite graph \(G = ({\hat{N}} \cup N, E)\) where, for each \(j \in N\) and \(\ell \in {\hat{N}}\), \(\{j,\ell \} \in E\) if and only if item j is covered by metaitem \(\ell \). For each item \(j \in N\), we denote by \({\hat{N}}(j) \subseteq {\hat{N}}\) the set of metaitems of \({\hat{N}}\) by which j is covered and, for each \(\ell \in {\hat{N}}\), we denote by \(N(\ell ) \subseteq N\) the set of items that are covered by \(\ell \).

The key feature of this generalized problem is that, in order to access an item j of N and benefit from its profit, the decision maker has to select at least one of the metaitems \(\ell \in {\hat{N}}\) covering it. An illustration of an instance of the problem is reported in Fig. 2.

Fig. 2

An instance with \(n = 19\) items (denoted by circles) and \({\hat{n}} = 8\) metaitems (denoted by squares). In the instance, the covering relationship between a metaitem \(\ell \in {\hat{N}}\) and an item \(j \in N\) (corresponding to the edge \(\{\ell ,j\}\)) is induced by the “covering radius” represented by the dotted circles. This way, the sets \({N}(\ell )\), \(\ell \in {\hat{N}}\), are determined by the discs centered on the metaitems; for, e.g., \(\ell =1\), we have \(N(1)=\{5,6,17\}\).

After introducing a binary variable \(y_\ell \in \{0,1\}\) for each \(\ell \in {\hat{N}}\), equal to 1 if and only if metaitem \(\ell \) is chosen, the problem can be cast as the following MINLP:

$$\begin{aligned} \max _{\begin{array}{c} x \in \{0,1\}^{n}\\ y \in Y \cap \{0,1\}^{{\hat{n}}} \end{array}}&\quad \sum _{i \in M} \pi _i \, f \left( \sum _{j \in N} a_{ij} \, x_j + d_i \right) \end{aligned}$$
(2a)
$$\begin{aligned} \text {s.t.}&\quad x_j \le \sum _{\ell \in {\hat{N}}(j)} y_\ell , \qquad \qquad \qquad \qquad \,\,\,j \in N\end{aligned}$$
(2b)
$$\begin{aligned}&\quad y_{\ell } \le x_j, \qquad \qquad \qquad \qquad \qquad \qquad \ell \in {\hat{N}}, j \in N(\ell ), \end{aligned}$$
(2c)

where Y is a polyhedron encapsulating the constraints of the problem and \(Y \cap \{0,1\}^{{\hat{n}}}\) is the discrete set of feasible choices of metaitems. The parameters \(\pi \), a, and d have the same meaning as in Problem (1), i.e., M contains the discrete-scenario realizations of the uncertain parameters a and d, and the objective function corresponds to the expected value of f over such realizations. Constraints (2b) and (2c) imply that an item \(j \in N\) is taken if and only if at least one metaitem \(\ell \in {\hat{N}}(j)\) is chosen. Since the objective function of the problem is strictly increasing, Constraints (2c) are automatically satisfied in any optimal solution and can be dropped. Besides being binary, the x variables are not subject to any further constraints.

To see that Problem (2) generalizes Problem (1), it suffices to observe that the latter is obtained from the former by defining \({\hat{N}}\) as a copy of N and defining the edge set E as a matching; this way, \(x = y\) is imposed and the constraints \(x \in X\) are translated into \(y \in Y\).

We define the set-union operator expressing the covering relationship between \({\hat{N}}\) and N as

$$\begin{aligned} N({\hat{S}}) := \bigcup _{\ell \in {\hat{S}}} N(\ell ), \qquad {\hat{S}} \subseteq {\hat{N}}. \end{aligned}$$
(3)

With it, we can rewrite the objective function of Problem (2) as a linear combination with weights \(\pi _i\), \(i \in M\), of m set functions \({\hat{h}}_i: 2^{{\hat{N}}} \rightarrow \mathbb {R}_{+}\), each of which is defined as the composition of f with a set function \({\hat{q}}_i: 2^{{\hat{N}}} \rightarrow \mathbb {R}_{+}\) that maps the choice of metaitems \({\hat{S}}\) into the value of the items that \({\hat{S}}\) covers. For each \(i \in M\), \({\hat{h}}_i\) is defined as

$$\begin{aligned} {\hat{h}}_i({\hat{S}}) := f \Big ( \underbrace{ \sum _{j \in N({\hat{S}})} a_{ij} + d_i }_{{\hat{q}}_i({\hat{S}})} \Big ) \end{aligned}$$

and \({\hat{q}}_i\) is defined as

$$\begin{aligned} {\hat{q}}_i({\hat{S}}) := \sum _{j \in N({\hat{S}})} a_{ij} + d_i, \qquad i \in M, {\hat{S}} \subseteq {\hat{N}} . \end{aligned}$$
(4)

The function \({\hat{q}}_i\) frequently occurs in problems featuring a set-covering component and has been heavily studied in the combinatorial optimization literature—see, e.g., Schrijver [32], p. 768.

Due to \(d_i\) and \(a_{ij}\) being nonnegative for all \(i \in M, j \in N\), \({\hat{q}}_i\) is submodular. Thanks to f being strictly increasing and concave, it is not difficult to see that its composition with \({\hat{q}}_i\) (i.e., \({\hat{h}}_i\)) is submodular as well.

Problem (2) is \({\mathcal{N}\mathcal{P}}\)-hard even in the simple setting where Y is defined by a single cardinality constraint and f is the trivial identity function \(f(z) := z\). This is because, letting \(m = 1\), \(a_{1j} = 1\) for all \(j \in N\), \(d_1 = 0\), and \(Y = \{{\hat{S}} \subseteq {\hat{N}}: |{\hat{S}}| \le k\}\), Problem (2) admits a solution of value n if and only if there is a feasible solution to the \({\mathcal{N}\mathcal{P}}\)-complete problem of covering N with at most k subsets from \(\{N (\ell )\}_{\ell \in {\hat{N}}}\).

1.3 Applications of Problem (2)

Problem (2) arises in different applications. An important one is found in social-network analysis when looking for “key players” or “influencers” to quickly distribute a piece of information through the network (Güney et al. [16], Wu and Küçükyavuz [37], Kempe et al. [20], Karimi et al. [19]). Both the widely adopted independent cascade (IC) and linear threshold (LT) propagation models (based on probabilistic networks and random sampling) described in Kempe et al. [20] fit into the framework of Problem (2). In IC, the influence propagates on the graph according to a specified edge-propagation probability. In LT, each node of the graph is influenced only if the fraction of the node’s neighbors that have become active (weighted by the corresponding influence weights) exceeds a node-specific stochastic threshold. Given a graph \({\tilde{G}} = ({\tilde{V}}, {\tilde{E}})\), m so-called live-edge graphs \({\tilde{G}}_i=({\tilde{V}}, {\tilde{E}}_i)\), \(i \in M := \{1,\dots ,m\}\), are created by sampling the edges. For each \(i \in M\), \(\pi _i\) is defined as the probability of sampling graph i. The sampling process and, accordingly, the live-edge graphs one obtains depend on the propagation mechanism in use. A bipartite graph \(G_i = (N \cup {\hat{N}}, E_i)\) is then constructed for each \(i \in M\). In it, the set of items N is equal to \({\tilde{V}}\) (therefore, N contains all the users of the social network) and the set of metaitems \({\hat{N}}\) is equal to the set of influencers in \({\tilde{V}}\). For each user-influencer pair \(j, \ell \), \(E_i\) contains the edge \(\{j, \ell \}\) if and only if there is a path between \(\ell \) and j in \({\tilde{G}}_i\). Thus, for each \(i \in M\) and \(\ell \in {\hat{N}}\), \(N_i(\ell ) \subseteq N\) is the subset of users that can be reached by influencer \(\ell \) in the live-edge graph \({\tilde{G}}_i\) due to belonging to the same connected component as \(\ell \) (see also Wu and Küçükyavuz [37] for further details). The goal is then to choose a subset of influencers so as to maximize the expected number of users that can be reached. Modeling such a problem as an instance of Problem (2) requires a slight modification of the set-union operator as well as of the function \({\hat{q}}_i\) defined in (3)–(4) to make them scenario-dependent:

$$\begin{aligned}&N_i({\hat{S}}) := \bigcup \limits _{\ell \in {\hat{S}}} N_i(\ell ),&\quad i \in M, {\hat{S}} \subseteq {\hat{N}}, \\&\qquad \quad {\hat{q}}_i({\hat{S}}) := \sum \limits _{j \in N_i({\hat{S}})} a_{ij} + d_i,&\quad i \in M, {\hat{S}} \subseteq {\hat{N}}. \end{aligned}$$

For each \(i \in M, j \in N\), \(a_{ij}\) is set to 1. All the results presented in this paper carry over to such a scenario-dependent case. To simplify the presentation, we restrict ourselves to the original definitions given in (3)–(4). We remark that, while in the standard applications of influence maximization referenced above the function f is the identity function, the methods we propose in this paper allow for taking a more general risk-averse utility function into account to measure the total influence in the network.
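As a concrete illustration of how the coverage sets \(N_i(\ell )\) can be built from one live-edge graph, here is a plain-Python sketch based on a BFS labeling of connected components (all names and the data layout are our own; the live-edge graph is given as an undirected edge list):

```python
from collections import deque

def reach_sets(nodes, live_edges, influencers):
    """Compute N_i(ell) for one live-edge graph: the users lying in the
    same connected component as influencer ell."""
    adj = {v: [] for v in nodes}
    for u, v in live_edges:
        adj[u].append(v)
        adj[v].append(u)
    comp = {}                           # node -> component label
    for s in nodes:                     # BFS from every unlabeled node
        if s in comp:
            continue
        comp[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp[v] = s
                    queue.append(v)
    members = {}                        # component label -> set of nodes
    for v, c in comp.items():
        members.setdefault(c, set()).add(v)
    return {ell: members[comp[ell]] for ell in influencers}
```

With \(a_{ij} = 1\), the value \({\hat{q}}_i({\hat{S}})\) of a choice of influencers is then simply the cardinality of the union of the returned sets, plus \(d_i\).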

A second application arises in marketing problems. Let M be the set of products that a marketer can advertise to a set N of potential customers, and let \({\hat{N}}\) be a set of marketing campaigns, where each campaign \(\ell \in {\hat{N}}\) can only reach a subset \(N(\ell )\) of customers. In such a case, the value of \(a_{ij}\), \(i \in M, j \in N\), corresponds to customer j’s demand for product i. Letting \(\pi _i\) be a weight measuring the relevance of a product \(i \in M\), the utility function f expresses the decreasing marginal utility of serving an additional unit of customer demand.

A third application arises in a class of stochastic competitive facility location problems with uncertain customer demands. In the deterministic competitive facility location problem (see Ljubić and Moreno [25], Berman and Krass [4], Aboolian et al. [1] and further references therein), a subset of facilities has to be opened subject to a budget constraint in such a way that the customer demand that is satisfied is maximized. In this application, the set \({\hat{N}}\) corresponds to a set of potential facility locations and the customers are represented by the set N. When adopting, e.g., the multinomial logit model, one can show that the market share function satisfies the properties of function f. In the stochastic setting, the expected market share is then calculated over the set M of possible scenario realizations, each with probability \(\pi _i\), \(i \in M\), where \(a_{ij}\) corresponds to the demand of customer \(j \in N\) under scenario \(i \in M\).

1.4 Contribution and outline of the paper

Throughout the paper, we focus on the case with \(n \gg {\hat{n}}\), i.e., with a number of items orders of magnitude larger than the number of metaitems, which holds in many applications, including those we introduced before. The main contribution of the paper is an exact method for solving Problem (2) based on a double-hypograph decomposition by which the many variables associated with the item set N are projected out. Such a decomposition exploits the structural properties of the utility function f and of the set-union operator. In it, f is linearized via an outer-approximation method, whereas the set-union operator is linearized in two ways: either (i) via a reformulation based on submodular cuts or (ii) via a Benders decomposition. In particular, we show that, assuming f can be computed in constant time, the inequalities arising from the outer approximation combined with the Benders decomposition can be separated in linear time even for fractional points. We compare the strength of the two resulting mixed integer linear programming reformulations from a theoretical perspective, and also show how to extend them to the case where the utility function is not necessarily increasing. Lastly, we embed the reformulations in two branch-and-cut algorithms, the most efficient of which, according to our computational experiments, allows for solving to optimality instances with up to \(n=20{,}000\), \({\hat{n}}=60\), and \(m=100\) in a short amount of computing time.

The paper is organized as follows. After recalling some background notions on submodularity in Sect. 2, we introduce our decomposition approach and our two reformulations in Sect. 3. The strength of the inequalities of these two reformulations is compared in Sect. 4. Section 5 presents an extension of our method to the case where the utility function is not necessarily increasing. In Sect. 6, we demonstrate via computational experiments the advantages offered by the algorithms we propose. Concluding remarks are reported in Sect. 7. In the Appendix, we also investigate the construction of worst-case instances for the case where Problem (2) is approximately solved via the classical greedy algorithm proposed in Nemhauser et al. [29] for maximizing an increasing submodular function subject to a k-cardinality constraint.

2 Preliminaries

In this section, we briefly summarize the basic terminology and notions used in optimization problems involving submodular functions as well as the key result of Nemhauser et al. [29] that leads to a direct MILP reformulation of Problem (2).

Let \(h: 2^{N} \rightarrow {\mathbb {R}}\) be a generic set function defined for some ground set N. The associated set function \(\varrho ^h_j(S) := h(S \cup \{j\}) - h(S)\), \(j \in N, S \subseteq N\), is called the marginal contribution of j with respect to S. We call the function h increasing if \(\varrho ^h_j(S) \ge 0\) for all \(j \in N, S \subseteq N\) (in the literature, such a function is sometimes called nondecreasing), and submodular if \(\varrho ^h_j(S) \ge \varrho ^h_j(T)\) for all \(S,T \subseteq N\) with \(S \subseteq T\) and \(j \in N \setminus T\). Throughout the paper, we indicate the submodular function to which a marginal contribution refers by a superscript (as in \(\varrho ^h_j\)).
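These definitions translate directly into code. The following minimal sketch (our own; `h` is any set function given as a callable on frozensets) computes marginal contributions and checks submodularity by brute-force enumeration, which is only viable for tiny ground sets:

```python
from itertools import combinations

def marginal(h, j, S: frozenset) -> float:
    """Marginal contribution rho^h_j(S) = h(S u {j}) - h(S)."""
    return h(S | {j}) - h(S)

def is_submodular(h, ground: frozenset) -> bool:
    """Check rho^h_j(S) >= rho^h_j(T) for all S <= T and j outside T
    (exponential-time brute force; for illustration only)."""
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    return all(marginal(h, j, S) >= marginal(h, j, T)
               for S in subsets for T in subsets if S <= T
               for j in ground - T)
```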

Every submodular set function h enjoys the following properties:

Proposition 1

(Nemhauser et al. [29]) If h is submodular, then:

$$\begin{aligned}&\quad h(T) \le h(S) - \sum \limits _{j \in S \setminus T} \varrho ^h_j \big (N \setminus \{j\} \big ) + \sum \limits _{j \in T \setminus S} \varrho ^h_j(S), \quad \quad S, T \subseteq N,\\&h(T) \le h(S) - \sum \limits _{j \in S \setminus T} \varrho ^h_j \big (S \setminus \{j\} \big ) + \sum \limits _{j \in T \setminus S} \varrho ^h_j(\emptyset ), \quad \quad \, \, S, T \subseteq N. \end{aligned}$$

By applying Proposition 1 to each function \({\hat{h}}_i\), \(i \in M\), featured in Problem (2), we straightforwardly obtain the following MILP reformulation, which we refer to as the Reformulation based on Submodular Cuts, or Ref(SC):

$$\begin{aligned} \Big (\text {Ref(SC)}\Big ) \quad \max _{\begin{array}{c} w \in {\mathbb {R}}^{m}\\ y \in Y \cap \{0,1\}^{{\hat{n}}} \end{array}} \quad&\sum _{i \in M} \pi _i \, w_i\\ \text {s.t.} \quad&w_i \le {\hat{h}}_i({\hat{S}}) + \sum _{\ell \in {\hat{N}} \setminus {\hat{S}}} \varrho ^{{\hat{h}}_i}_\ell ({\hat{S}}) \, y_\ell - \sum _{\ell \in {\hat{S}}} \varrho ^{{\hat{h}}_i}_\ell \big ({\hat{N}} \setminus \{\ell \}\big ) \, (1-y_\ell ),&i \in M, {\hat{S}} \subseteq {\hat{N}} \qquad (\text {SC}_1^{{\hat{h}}})\\&w_i \le {\hat{h}}_i({\hat{S}}) + \sum _{\ell \in {\hat{N}} \setminus {\hat{S}}} \varrho ^{{\hat{h}}_i}_\ell (\emptyset ) \, y_\ell - \sum _{\ell \in {\hat{S}}} \varrho ^{{\hat{h}}_i}_\ell \big ({\hat{S}} \setminus \{\ell \}\big ) \, (1-y_\ell ),&i \in M, {\hat{S}} \subseteq {\hat{N}}. \qquad (\text {SC}_2^{{\hat{h}}}) \end{aligned}$$

For the correctness of the reformulation, we refer the reader to Nemhauser et al. [29].

An alternative correct reformulation is obtained by replacing Constraints (SC\(_1^{{\hat{h}}}\)) by

$$\begin{aligned} w_i \le {\hat{h}}_i({\hat{S}}) + \sum _{\ell \in {\hat{N}} \setminus {\hat{S}}} \varrho ^{{\hat{h}}_i}_\ell ({\hat{S}}) \, y_\ell , \qquad i \in M, {\hat{S}} \subseteq {\hat{N}}. \qquad (\text {SC}_3^{{\hat{h}}}) \end{aligned}$$

Such constraints are strictly dominated by Constraints (SC\(_1^{{\hat{h}}}\)) if there is an item \(j \in N\) with \(a_{ij} > 0\) that is covered by a single metaitem \(\ell \in {\hat{S}}\) (i.e., such that \({\hat{N}}(j) = \{\ell \}\)), and equally strong otherwise. This is because, by definition of \({\hat{h}}_i\), \(\varrho ^{{\hat{h}}_i}_\ell ({\hat{N}} \setminus \{\ell \}) = 0\) unless there is an item \(j \in N\) such that \({\hat{N}}(j) = \{ \ell \}\) and \(a_{ij} > 0\). Constraints (SC\(_3^{{\hat{h}}}\)) will be useful in the analysis carried out in Sect. 4.

The main advantage of Ref(SC) is that it is linear and works solely in the y space. Similar reformulations have been extensively used in many works involving submodular functions. In spite of this, they are known to be rather weak (Ahmed and Atamtürk [2], Ljubić and Moreno [25], Nemhauser et al. [28]). This is confirmed by the computational experiments we report in Sect. 6.

3 Decomposition methods and MILP reformulations for Problem (2)

In this section, we propose a decomposition approach for Problem (2) that relies on the structural properties of f as well as on those of the set-union operator underlying \({\hat{q}}_i\), \(i \in M\). We then combine this decomposition with an outer approximation of f (see Duran and Grossmann [9], Fletcher and Leyffer [13] for general references on outer approximations) and a linear approximation of the hypograph of \({\hat{q}}_i\), \(i \in M\), based on either submodular cuts or Benders cuts. This results in two MILP reformulations that only work in the y space. As we will see, both reformulations are solved much more efficiently by a branch-and-cut method than Ref(SC).

3.1 Double-hypograph decomposition

The key idea of our decomposition approach is reformulating Problem (2) in such a way that the function \({\hat{h}}_i\), \(i \in M\), is decomposed into its two parts f and \({\hat{q}}_i\).

Letting y be the characteristic vector of \({\hat{S}} \subseteq {{\hat{N}}}\), we rewrite the set function \({\hat{q}}_i({\hat{S}})\), \(i \in M\), as the following function \(\hat{{\mathcal {Q}}}_i: \{0,1\}^{{\hat{n}}} \rightarrow {\mathbb {R}}_+\):

$$\begin{aligned} \hat{{\mathcal {Q}}}_i(y) := \max _{x \in \{0,1\}^n} \bigg \{ \sum _{j \in N} a_{ij} \, x_j + d_i : \; x_j \le \sum _{\ell \in {\hat{N}}(j)} y_\ell , \; j \in N \bigg \}. \qquad (\text {VF}) \end{aligned}$$

\(\hat{{\mathcal {Q}}}_i(y)\) can be interpreted as the value function of the subproblem of computing \({\hat{q}}_i({\hat{S}})\) for a subset of metaitems \({\hat{S}} \subseteq {\hat{N}}\) with incidence vector \(y \in Y \cap \{0,1\}^{{\hat{n}}}\). Such a function is studied, e.g., by Cordeau et al. [6] in the context of maximal-covering location problems.

The problem of computing \(\hat{{\mathcal {Q}}}_i(y)\) can be solved in linear time O(|E|) by letting \(x_{j}^* := 1\) for each \(j \in N\) with \(\sum _{\ell \in {\hat{N}}(j)} y_\ell \ge 1\) and letting \(x_j^* := 0\) otherwise. Moreover, as the problem only features upper bounds on the x variables (equal to \(\min \{1,\sum _{\ell \in {\hat{N}}(j)} y_\ell \}\) for each \(j \in N\)), its Linear Programming (LP) relaxation is integer. This property will be useful later.
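A minimal sketch of this linear-time evaluation (data layout of our choosing: `covers[ell]` is \(N(\ell )\), `a_i[j]` is \(a_{ij}\), and `y` maps each metaitem to 0 or 1):

```python
def Q_hat(y, covers, a_i, d_i):
    """Evaluate the value function (VF): item j contributes a_i[j] iff at
    least one chosen metaitem covers it. Each edge {j, ell} is scanned once,
    so the total work is O(|E|)."""
    covered = set()
    for ell, items in covers.items():
        if y[ell] >= 1:                 # metaitem ell is chosen
            covered.update(items)
    return d_i + sum(a_i[j] for j in covered)
```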

For each \(i \in M\), let \(\eta _i\) and \(w_i\) be two auxiliary variables. We introduce the following double-hypograph decomposition, in which two hypograph reformulations are applied in sequence:

$$\begin{aligned} \max _{\begin{array}{c} w, \eta \in {\mathbb {R}}^{m}\\ y \in Y \cap \{0,1\}^{{\hat{n}}} \end{array}} \quad&\sum _{i \in M} \pi _i \, w_i\\ \text {s.t.} \quad&w_i \le f(\eta _i), \qquad i \in M \qquad (\text {HYPO}_1)\\&\eta _i \le \hat{{\mathcal {Q}}}_i(y), \qquad i \in M. \qquad (\text {HYPO}_2) \end{aligned}$$

This decomposition is correct as, in any optimal solution, \(w_i = f(\eta _i)\) holds due to \(\pi _i > 0\), \(i \in M\), and \(\eta _i = \hat{{\mathcal {Q}}}_i(y)\), \(i \in M\), holds due to f being strictly increasing.

Starting from this decomposition, in the following subsections we propose two MILP reformulations for Problem (2) that belong to the (w, y) space and feature \(O({\hat{n}} + m)\) variables. Both reformulations are obtained by projecting out the n decision variables x associated with the items \(j \in N\) as well as the m auxiliary variables \(\eta _i\), \(i \in M\).

3.2 Projecting out the \(\eta \) variables via an outer approximation

The auxiliary variables \(\eta _i\), \(i \in M\), can be projected out via Fourier-Motzkin elimination after applying an outer-approximation technique to Constraints (HYPO\(_1\)).

Proposition 2

Constraints (HYPO\(_1\))–(HYPO\(_2\)) are equivalent to

$$\begin{aligned} w_i \le \big (f(p) - f'(p) \, p\big ) + f'(p) \, \hat{{\mathcal {Q}}}_i(y), \qquad i \in M, p \in \bigg [0, \sum _{j \in N} a_{ij} + d_i\bigg ]. \qquad (\text {OA}) \end{aligned}$$

Proof

Since f is concave, for each \(i \in M\) the set of pairs \((w_i,\eta _i) \in {\mathbb {R}}^2\) that satisfy Constraint (HYPO\(_1\)) forms a convex set. By means of an outer-approximation technique, we can therefore restate Constraints (HYPO\(_1\)) as:

$$\begin{aligned}&w_i \le \big (f(p) - f'(p) \, p \big ) + f'(p) \, \eta _i,&\quad i \in M, p \in \left[ 0, \sum _{j \in N} a_{ij} + d_i\right] , \end{aligned}$$
(5)

where \(f'\) is the first derivative of f and \([0, \sum _{j \in N} a_{ij} + d_i]\) is a superset of the set of values that \(\eta _i\) (and, therefore, \(\hat{{\mathcal {Q}}}_i\) and \({\hat{q}}_{i}\)) can take. As \(f' > 0\) due to f being strictly increasing, we can combine Constraints (5) and (HYPO\(_2\)) using Fourier-Motzkin elimination to project out the \(\eta _i\) variables. This results in Constraints (OA). \(\square \)

In the following two subsections, we discuss two ways of obtaining MILP reformulations of Problem (2) starting from Constraints (OA). The first one exploits the submodularity of \({\hat{q}}_i\), \(i \in M\). The second one relies on the integrality property of the LP relaxation of (VF), and it exploits LP duality in a Benders-cuts fashion. The idea is to combine Constraints (OA) with a finite collection of affine functions yielding a relaxation of the hypograph of, respectively, \({\hat{q}}_i\) and \(\hat{{\mathcal {Q}}}_i\), \(i \in M\), that is tight at every integer point.

3.3 Reformulation based on outer approximation plus submodular cuts: Ref(OA+SC)

As \({\hat{q}}_i\) is submodular, the following constraints are valid due to Proposition 1:

$$\begin{aligned}&\eta _i \le {\hat{q}}_i({\hat{S}}) + \sum _{\ell \in {\hat{N}} \setminus {\hat{S}}} \varrho ^{{\hat{q}}_i}_\ell ({\hat{S}}) \, y_\ell - \sum _{\ell \in {\hat{S}}} \varrho ^{{\hat{q}}_i}_\ell \big ({\hat{N}} \setminus \{\ell \}\big ) \, (1-y_\ell ), \qquad i \in M, {\hat{S}} \subseteq {\hat{N}} \qquad (\text {SC}_1^{{\hat{q}}})\\&\eta _i \le {\hat{q}}_i({\hat{S}}) + \sum _{\ell \in {\hat{N}} \setminus {\hat{S}}} \varrho ^{{\hat{q}}_i}_\ell (\emptyset ) \, y_\ell - \sum _{\ell \in {\hat{S}}} \varrho ^{{\hat{q}}_i}_\ell \big ({\hat{S}} \setminus \{\ell \}\big ) \, (1-y_\ell ), \qquad i \in M, {\hat{S}} \subseteq {\hat{N}}. \qquad (\text {SC}_2^{{\hat{q}}}) \end{aligned}$$

Combining these constraints with Constraints (OA) of Proposition 2, we obtain the following constraints for each choice of \({\hat{S}} \subseteq {\hat{N}}\), \(i \in M\), and \(p \in [0, \sum _{j \in N} a_{ij} + d_i]\):

$$\begin{aligned}&w_i \le f(p)-f'(p) \, p + f'(p) \, {\hat{q}}_i({\hat{S}}) + \sum _{\ell {\in {\hat{N}} \setminus {\hat{S}}}} f'(p) \, \varrho ^{{\hat{q}}_i}_\ell ({\hat{S}}) \, y_\ell \\&\quad \qquad - \sum _{\ell \in {\hat{S}}} f'(p) \, \varrho ^{{\hat{q}}_i}_\ell \big ({\hat{N}} \setminus \{\ell \}\big ) \, (1-y_\ell ),\\&w_i \le f(p)-f'(p) \, p + f'(p) \, {\hat{q}}_i({\hat{S}}) + \sum _{\ell {\in {\hat{N}} \setminus {\hat{S}}}} f'(p) \, \varrho ^{{\hat{q}}_i}_\ell (\emptyset ) \, y_\ell \\&\qquad \quad - \sum _{\ell \in {\hat{S}}} f'(p)\, \varrho ^{{\hat{q}}_i}_\ell \big ({\hat{S}} \setminus \{\ell \}\big ) \, (1-y_\ell ). \end{aligned}$$

By restricting ourselves to \(p = {\hat{q}}_i({\hat{S}})\) (which suffices to obtain a correct reformulation—see Proposition 3), we obtain the following MILP reformulation of Problem (2), to which we refer as the Reformulation based on Outer Approximation plus Submodular Cuts, or Ref(OA+SC):

$$\begin{aligned} \Big (\text {Ref(OA+SC)}\Big ) \quad \max _{\begin{array}{c} w \in {\mathbb {R}}^{m}\\ y \in Y \cap \{0,1\}^{{\hat{n}}} \end{array}} \quad&\sum _{i \in M} \pi _i \, w_i\\ \text {s.t.} \quad&w_i \le f\big ({\hat{q}}_i({\hat{S}})\big ) + \sum _{\ell \in {\hat{N}} \setminus {\hat{S}}} f'\big ({\hat{q}}_i({\hat{S}})\big ) \, \varrho ^{{\hat{q}}_i}_\ell ({\hat{S}}) \, y_\ell - \sum _{\ell \in {\hat{S}}} f'\big ({\hat{q}}_i({\hat{S}})\big ) \, \varrho ^{{\hat{q}}_i}_\ell \big ({\hat{N}} \setminus \{\ell \}\big ) \, (1-y_\ell ),&i \in M, {\hat{S}} \subseteq {\hat{N}} \qquad (\text {OA+SC}_1)\\&w_i \le f\big ({\hat{q}}_i({\hat{S}})\big ) + \sum _{\ell \in {\hat{N}} \setminus {\hat{S}}} f'\big ({\hat{q}}_i({\hat{S}})\big ) \, \varrho ^{{\hat{q}}_i}_\ell (\emptyset ) \, y_\ell - \sum _{\ell \in {\hat{S}}} f'\big ({\hat{q}}_i({\hat{S}})\big ) \, \varrho ^{{\hat{q}}_i}_\ell \big ({\hat{S}} \setminus \{\ell \}\big ) \, (1-y_\ell ),&i \in M, {\hat{S}} \subseteq {\hat{N}}. \qquad (\text {OA+SC}_2) \end{aligned}$$

Proposition 3

Ref(OA+SC) is valid.

Proof

Let \(y^* \in Y \cap \{0,1\}^{{\hat{n}}}\) be a feasible solution to Problem (2). Letting \({\hat{S}}^* := \{\ell \in {\hat{N}}: y^*_\ell = 1\}\), for each \(i \in M\) the pair of Constraints (OA+SC\(_1\)) and (OA+SC\(_2\)) corresponding to \({\hat{S}} = {\hat{S}}^*\) boils down to \(w_i \le f({\hat{q}}_i({\hat{S}}^*))\), thereby guaranteeing (due to \(\pi _i > 0\), \(i \in M\)) that \(\sum _{i \in M} \pi _i w_i = \sum _{i \in M} {\pi _i} f({\hat{q}}_i({\hat{S}}^*))\) holds in any optimal solution. The latter coincides with the objective function of Problem (2). \(\square \)

When solving Ref(OA+SC) with a branch-and-cut method, to obtain a correct algorithm it suffices to check for the existence of violated constraints among (OA+SC\(_1\)) and (OA+SC\(_2\)) after a solution \((w^*,y^*)\) with \(y^*\) integer is found (thus carrying out a so-called lazy cut separation). Such solutions can be separated very efficiently:

Proposition 4

Assume that f and \(f'\) can be computed in constant time. Given a pair \((w^*, y^*)\) with \(w^* \in {\mathbb {R}}^m\) and \(y^* \in \{0,1\}^{{\hat{n}}}\), the separation problem asking for a constraint among (OA+SC\(_1\))–(OA+SC\(_2\)) that is violated by \((w^*, y^*)\) can be solved in linear time O(|E|) for each \(i \in M\).

Proof

As observed in the proof of Proposition 3, Constraints (OA+SC\(_1\))–(OA+SC\(_2\)) impose \(w_i \le f({\hat{q}}_i({\hat{S}}))\) for every set \({\hat{S}} \subseteq {\hat{N}}\). Let \({\hat{S}}^* := \{\ell \in {\hat{N}}: y_\ell ^* = 1\}\). If \((w^*, y^*)\) is infeasible, there is some \(i \in M\) such that the two constraints in (OA+SC\(_1\))–(OA+SC\(_2\)) corresponding to \(i \in M\) and \({\hat{S}} = {\hat{S}}^*\) are violated. To solve the separation problem, we only need to compute the coefficients of such constraints. Since \({\hat{q}}_i\) can be computed in linear time by construction and f and \(f'\) can be computed in constant time by assumption, all these coefficients can be computed in linear time O(|E|). Moreover, the coefficients \(\varrho ^{{\hat{q}}_i}_\ell ({\hat{N}} \setminus \{\ell \})\), \(\ell \in {\hat{S}}\), and \(\varrho _\ell ^{{\hat{q}}_i}(\emptyset )\), \(\ell \in {\hat{N}} \setminus {\hat{S}}\), can be precomputed since they do not depend on \({\hat{S}}^*\). \(\square \)
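A minimal sketch of this lazy separation for (OA+SC\(_1\)) and a single scenario i follows (our own naming; `rho_full[ell]` stores the precomputed \(\varrho ^{{\hat{q}}_i}_\ell ({\hat{N}} \setminus \{\ell \})\), while `covers`, `a_i`, and `d_i` are as in the sketch of Sect. 3.1):

```python
def separate_oa_sc1(w_star_i, y_star, covers, a_i, d_i, f, f_prime,
                    rho_full, eps=1e-9):
    """Return the cut w_i <= const - sum coef_in[l]*(1 - y_l)
    + sum coef_out[l]*y_l violated by (w*, y*), or None (y* integer)."""
    S_hat = {ell for ell, v in y_star.items() if v >= 0.5}
    covered = set().union(*(covers[ell] for ell in S_hat)) if S_hat else set()
    q_S = d_i + sum(a_i[j] for j in covered)
    if w_star_i <= f(q_S) + eps:
        return None                     # (OA+SC_1) for S_hat is satisfied
    fp = f_prime(q_S)
    # fp * rho^{q_i}_ell(S_hat): value of the items newly covered by ell
    coef_out = {ell: fp * sum(a_i[j] for j in covers[ell] if j not in covered)
                for ell in covers if ell not in S_hat}
    coef_in = {ell: fp * rho_full[ell] for ell in S_hat}
    return f(q_S), coef_in, coef_out
```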

3.4 Reformulation based on outer approximation plus Benders cuts: Ref(OA+BC)

Let us focus on the subproblem underlying the value function \(\hat{{\mathcal {Q}}}_i\), \(i \in M\), defined in (VF). Assuming \(y \in Y \cap \{0,1\}^{{\hat{n}}}\), the LP relaxation of the subproblem of computing \(\hat{{\mathcal {Q}}}_i\), in which \(x \in \{0,1\}^{n}\) is replaced by \(x \in [0,1]^{n}\), always admits a binary optimal solution. As the subproblem is bounded, we can consider its LP dual. Let \({\delta }\) and \(\sigma \) be the dual variables associated with, respectively, the constraints linking the x and the y variables and the unit upper-bound constraints on x. Let

$$\begin{aligned} {\mathcal {P}}_i := \bigg \{ ({\delta }, \sigma ) \in {\mathbb {R}}^{n + n}_+ : ~~{\delta }_j + \sigma _j \ge a_{ij},~~ j \in N \bigg \}, \quad i \in M, \end{aligned}$$

be the polyhedron of the dual feasible solutions to (VF) (for simplicity, we drop the index i when referring to \(({\delta },\sigma )\)). By LP duality, the following holds:

$$\begin{aligned} \hat{{\mathcal {Q}}}_i(y) = \min _{({\delta }, \sigma ) \in {\mathcal {P}}_i} \bigg \{ d_i + \sum _{j \in N} \sigma _j + \sum _{j \in N} {\delta }_j \bigg ( \sum _{\ell \in {\hat{N}}(j)} y_\ell \bigg ) \bigg \}. \qquad (\text {BS}) \end{aligned}$$

In the Benders decomposition literature, (BS) is called the Benders subproblem. Letting \({\mathcal {P}}^e_i\), \(i \in M\), denote the set of extreme points of \({\mathcal {P}}_i\), we derive the following Benders cuts:

$$\begin{aligned} \eta _i \le d_i + \sum _{j \in N} {\tilde{\sigma }}_j + \sum _{j \in N} {\tilde{\delta }}_j \bigg ( \sum _{\ell \in {\hat{N}}(j)} y_\ell \bigg ), \qquad i \in M, ({\tilde{\delta }}, {\tilde{\sigma }}) \in {\mathcal {P}}^e_i. \qquad (\text {BC}) \end{aligned}$$

Thanks to Proposition 2 and by restricting ourselves to \(p = \hat{{\mathcal {Q}}}_i({\hat{y}})\) for all (not necessarily integer) \({\hat{y}} \in Y\) (this suffices to obtain a correct reformulation—see Proposition 5), we obtain the following alternative MILP reformulation of Problem (2), which we refer to as the Reformulation based on Outer Approximation plus Benders Cuts, or Ref(OA+BC):

$$\begin{aligned} \Big (\text {Ref(OA+BC)}\Big ) \quad \max _{\begin{array}{c} w \in {\mathbb {R}}^{m}\\ y \in Y \cap \{0,1\}^{{\hat{n}}} \end{array}} \quad&\sum _{i \in M} \pi _i \, w_i\\ \text {s.t.} \quad&w_i \le f(p) - f'(p) \, p + f'(p) \bigg ( d_i + \sum _{j \in N} {\tilde{\sigma }}_j + \sum _{j \in N} {\tilde{\delta }}_j \bigg ( \sum _{\ell \in {\hat{N}}(j)} y_\ell \bigg ) \bigg ),\\&\qquad i \in M, ({\tilde{\delta }}, {\tilde{\sigma }}) \in {\mathcal {P}}^e_i, p = \hat{{\mathcal {Q}}}_i({\hat{y}}), {\hat{y}} \in Y. \qquad (\text {OA+BC}) \end{aligned}$$

For all \(i \in M\) and \(({\tilde{\delta }}, {\tilde{\sigma }}) \in {\mathcal {P}}^e_i\), if \({\hat{y}}\) is integer, (OA+BC) coincides with the constraint

$$\begin{aligned}&w_i \le f({\hat{q}}_i({\hat{S}}))-f'({\hat{q}}_i({\hat{S}})) \, {\hat{q}}_i({\hat{S}}) + f'({\hat{q}}_i({\hat{S}})) \, d_i + \sum _{j \in N} f'({\hat{q}}_i({\hat{S}})) \, {\tilde{\sigma }}_j \\&\qquad \quad + \sum _{j \in N} f'({\hat{q}}_i({\hat{S}})) \, {\tilde{\delta }}_j \Bigg ( \sum _{\ell \in {\hat{N}}(j)} y_\ell \Bigg ), \end{aligned}$$

where \({\hat{S}} := \{\ell \in {\hat{N}}: {\hat{y}}_\ell = 1\}\). Notice that, even if we restrict ourselves to \({\hat{y}} \in Y \cap \{0,1\}^{{\hat{n}}}\) (i.e., to \({\hat{S}} \subseteq {\hat{N}}\)), we obtain a valid formulation. This is shown in the following proposition:

Proposition 5

Ref(OA+BC) is valid even if we restrict \({\hat{y}}\) to belong to \(Y \cap \{0,1\}^{{\hat{n}}}\).

Proof

Let \(y^* \in Y\) be an optimal integer solution to Problem (2). For each \(i \in M\), pick an optimal pair \((\delta ^*, \sigma ^*) \in {\mathcal {P}}^e_i\) satisfying \(d_i + \sum _{j \in N} \sigma _j^* + \sum _{j \in N} \delta _j^* \Big ( \sum _{\ell \in {\hat{N}}(j)} y_\ell ^* \Big ) = \hat{{\mathcal {Q}}}_i(y^*)\). It is easy to verify that the instance of Constraints (OA+BC) corresponding to \({\hat{y}} = y^*\) and \(({\tilde{\delta }}, {\tilde{\sigma }}) = (\delta ^*, \sigma ^*)\) boils down to \(w_i \le f(\hat{{\mathcal {Q}}}_i(y^*))\). In particular, due to \(\pi _i > 0\), \(i \in M\), this guarantees that \(\sum _{i \in M} \pi _i w_i = \sum _{i \in M} {\pi _i} f(\hat{{\mathcal {Q}}}_i(y^*))\). \(\square \)

For a given (not necessarily integer) \(y^* \in [0,1]^{{\hat{n}}}\), optimal solutions to (BS) can be computed in closed form according to the following lemma.

Lemma 1

A pair \(({\tilde{\delta }}, {\tilde{\sigma }}) \in {\mathbb {R}}^{n+n}\) is an optimal solution to (BS) if and only if it satisfies the following conditions:

$$\begin{aligned} \begin{array}{llll} {{\tilde{\delta }}_j} = a_{ij}, &{}&{} {\tilde{\sigma }}_j = 0, &{}\quad j \in N: \sum _{\ell \in {\hat{N}}(j)} y_\ell { < 1} \\ {{\tilde{\delta }}_j} = 0, &{}&{} {\tilde{\sigma }}_j = a_{ij}, &{}\quad j \in N: \sum _{\ell \in {\hat{N}}(j)} y_\ell { > 1} \\ {{\tilde{\delta }}_j} = \gamma _j a_{ij}, &{}&{} {\tilde{\sigma }}_j = (1-\gamma _j)a_{ij}, \text{ with } \gamma _j \in [0,1], &{}\quad j \in N: \sum _{\ell \in {\hat{N}}(j)} y_\ell = 1. \end{array} \end{aligned}$$

Proof

Problem (BS) decomposes into n subproblems, one per item \(j \in N\). Each subproblem asks for values of \({\delta }_j\) and \(\sigma _j\) satisfying \({\delta }_j + \sigma _j \ge a_{ij}\) at minimum cost \(\big ( \sum _{\ell \in {\hat{N}}(j)} y_\ell \big ) {\delta }_j + \sigma _j\); we refer to an optimal solution as \(({\tilde{\delta }}, {\tilde{\sigma }})\). If \(\sum _{\ell \in {\hat{N}}(j)} y_\ell < 1\), the coefficient of \({\delta }_j\) is strictly smaller than the coefficient of \(\sigma _j\) and, thus, we have \({{\tilde{\delta }}_j} = a_{ij}\) and \({\tilde{\sigma }}_j = 0\). If \(\sum _{\ell \in {\hat{N}}(j)} y_\ell > 1\), the coefficient of \({\delta }_j\) is strictly larger than the coefficient of \(\sigma _j\) and, thus, we have \({{\tilde{\delta }}_j} = 0\) and \({\tilde{\sigma }}_j = a_{ij}\). If \(\sum _{\ell \in {\hat{N}}(j)} y_\ell = 1\), the coefficients of \({\delta }_j\) and \(\sigma _j\) are identical and \(({\tilde{\delta }}, {\tilde{\sigma }})\) with \({\tilde{\delta }}_j = \gamma _j a_{ij}\) and \({\tilde{\sigma }}_j = (1-\gamma _j) a_{ij}\) is optimal for every \(\gamma _j \in [0,1]\). \(\square \)

The third case in the lemma arises when the Benders subproblem is (dual) degenerate and admits multiple optimal solutions. Thanks to this, Lemma 1 generalizes the results given in Cordeau et al. [6] (Propositions 1, 2, and 4).

Since any dual pair \(({\tilde{\delta }}, {\tilde{\sigma }})\) with \(\gamma _j \in (0,1)\) for some \(j \in N\) can be obtained as a convex combination of two other dual pairs \(({\tilde{\delta }}^1, {\tilde{\sigma }}^1)\), \(({\tilde{\delta }}^2, {\tilde{\sigma }}^2)\) (one corresponding to \(\gamma _j=1\) and one to \(\gamma _j=0\)), the Benders cut corresponding to \(({\tilde{\delta }}, {\tilde{\sigma }})\) is implied by the Benders cuts derived from \(({\tilde{\delta }}^1, {\tilde{\sigma }}^1)\) and \(({\tilde{\delta }}^2, {\tilde{\sigma }}^2)\). Thus, in Lemma 1 we can w.l.o.g. restrict ourselves to \(\gamma _j \in \{0,1\}\), \(j \in N\).
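In code, Lemma 1 gives the duals directly, with a single scan of the item set; the sketch below (our own naming; `covers_item[j]` is \({\hat{N}}(j)\)) uses a fixed \(\gamma _j = \gamma \in \{0,1\}\) in the degenerate case:

```python
def benders_duals(y_star, covers_item, a_i, gamma=1.0):
    """Closed-form optimal duals (delta, sigma) for (BS), per Lemma 1."""
    delta, sigma = {}, {}
    for j, metas in covers_item.items():
        s = sum(y_star[ell] for ell in metas)
        if s < 1.0:
            delta[j], sigma[j] = a_i[j], 0.0
        elif s > 1.0:
            delta[j], sigma[j] = 0.0, a_i[j]
        else:                           # degenerate: any convex split is optimal
            delta[j], sigma[j] = gamma * a_i[j], (1.0 - gamma) * a_i[j]
    return delta, sigma
```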

Thanks to Lemma 1, the following holds:

Proposition 6

Assume that f and \(f'\) can be computed in constant time. Given a pair \((w^*, y^*)\) with \(w^* \in {\mathbb {R}}^m\) and \(y^* \in [0,1]^{{\hat{n}}}\) (not necessarily integer), the separation problem asking for a constraint among (OA+BC) that is violated by \((w^*, y^*)\) can be solved in linear time O(|E|) for each \(i \in M\).

Proof

Due to Proposition 5, \((w^*, y^*)\) is infeasible for a given \(i \in M\) if and only if \(w_i^* > f(\hat{{\mathcal {Q}}}_i(y^*))\), which implies that \(w_i^*\) is strictly larger than the outer approximation of f at \(\hat{{\mathcal {Q}}}_i(y^*)\). Thanks to Lemma 1, the value of the dual multipliers \((\delta ^*, \sigma ^*)\) satisfying \(d_i + \sum _{j \in N} \sigma _j^* + \sum _{j \in N} \delta _j^* \Big ( \sum _{\ell \in {\hat{N}}(j)} y_\ell ^* \Big ) = \hat{{\mathcal {Q}}}_i(y^*)\) can be computed in linear time O(|E|). Since \({\hat{q}}_i\) can be computed in linear time and f and \(f'\) in constant time, the claim follows. \(\square \)

The main advantage of Ref(OA+BC) over Ref(OA+SC) is that, as we show in the following section, Constraints (OA+BC) subsume Constraints (OA+SC\(_1\))–(OA+SC\(_2\)). Moreover, if we assume that f and \(f'\) can be computed in constant time, the separation problem for Constraints (OA+BC) can be solved in linear time even when \(y^*\) is fractional, a property which is not enjoyed by Constraints (OA+SC\(_1\))–(OA+SC\(_2\)). From a branch-and-cut perspective, this allows for the efficient separation of Constraints (OA+BC) not just when a solution \((w^*,y^*)\) with an integer \(y^*\) is found, but also at each node of the branch-and-bound tree to produce tighter bounds.
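Putting Lemma 1 and the outer approximation together, the full (OA+BC) separation for a possibly fractional point amounts to one pass over the edges, as in the following self-contained sketch (our own naming, with \(\gamma _j = 1\) in the degenerate case):

```python
def separate_oa_bc(w_star_i, y_star, covers_item, a_i, d_i, f, f_prime,
                   eps=1e-9):
    """Return the cut w_i <= const + sum coef[ell] * y_ell violated by
    (w*, y*), or None. Works for fractional y*; total work is O(|E|)."""
    delta, sigma = {}, {}
    for j, metas in covers_item.items():            # duals via Lemma 1
        s = sum(y_star[ell] for ell in metas)
        delta[j], sigma[j] = (a_i[j], 0.0) if s <= 1.0 else (0.0, a_i[j])
    sum_sigma = sum(sigma.values())
    q_val = d_i + sum_sigma + sum(                  # = Q_hat_i(y*) by LP duality
        delta[j] * sum(y_star[ell] for ell in covers_item[j]) for j in delta)
    if w_star_i <= f(q_val) + eps:
        return None
    fp = f_prime(q_val)
    const = f(q_val) - fp * q_val + fp * (d_i + sum_sigma)
    coef = {}
    for j, dj in delta.items():
        for ell in covers_item[j]:
            coef[ell] = coef.get(ell, 0.0) + fp * dj
    return const, coef
```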

4 On the strength of the new inequalities

In this section, we provide a theoretical comparison of the relative strength of the inequalities featured in Ref(SC), Ref(OA+SC), and Ref(OA+BC). Throughout, we compare pairs of constraints of different families corresponding to the same subset \({\hat{S}} \subseteq {\hat{N}}\) and to the same scenario \(i \in M\). When comparing two constraints, we say that one dominates the other if the coefficients of the former are at least as tight as those of the latter, and that the domination is strict if at least one of the coefficients of the former is strictly tighter.

Let \(N_1\) be the set of items in N that are covered by exactly one metaitem \(\ell \in {\hat{N}}\), namely, \(N_1 := \{j \in N: |{\hat{N}}(j)| = 1\}\), and let \(\ell (j)\) be such a metaitem. Similarly, for every subset of metaitems \({\hat{S}} \subseteq {\hat{N}}\), let \(N_1({\hat{S}})\) be the set of items in N that are covered by exactly one metaitem \(\ell \in {\hat{S}}\), namely, \(N_1({\hat{S}}) := \{j \in N: |{\hat{S}} \cap {\hat{N}}(j)| = 1 \}\). These definitions will be useful in the following analysis.

4.1 Comparison between the inequalities in Ref(SC) and Ref(OA+SC)

In this subsection, we make a pairwise comparison between (SC\(_1^{{\hat{h}}}\)) and (OA+SC\(_1\)) and between (SC\(_2^{{\hat{h}}}\)) and (OA+SC\(_2\)). After that, we show that (OA+SC\(_1\)) and (OA+SC\(_2\)) do not dominate one another.

4.1.1 Comparison between (SC\(_1^{{\hat{h}}}\)) and (OA+SC\(_1\))

Sufficient conditions showing that there are instances in which (SC\(_1^{{\hat{h}}}\)) and (OA+SC\(_1\)) do not dominate one another are reported in the following proposition, as well as a sufficient condition subject to which (SC\(_1^{{\hat{h}}}\)) dominates (OA+SC\(_1\)).

Proposition 7

For each \(i \in M\), \({\hat{S}} \subseteq {\hat{N}}\), we have:

  1. 1.

    If \(\varrho _{\ell }^{{\hat{q}}_i}({\hat{S}}) > 0\) for some \(\ell \in {\hat{N}} \setminus {\hat{S}}\) and \({\hat{q}}_i({\hat{N}} \setminus \{\ell '\}) \ge {\hat{q}}_i({\hat{S}})\) and \(\varrho _{\ell '}^{{\hat{q}}_i} ({\hat{N}} \setminus \{\ell '\}) >0\) for some \(\ell ' \in {\hat{S}}\), (SC\(_1^{{\hat{h}}}\)) and (OA+SC\(_1\)) do not dominate one another.

  2. 2.

    If \({\hat{q}}_i({\hat{S}}) = {\hat{q}}_i({\hat{N}})\), (SC\(_1^{{\hat{h}}}\)) dominates (OA+SC\(_1\)).

The proof is reported in Appendix A. The appendix also reports a numerical illustration of the non-domination result between the two constraints.

4.1.2 Comparison between (SC\(_2^{{\hat{h}}}\)) and (OA+SC\(_2\))

Sufficient conditions showing that there are instances in which (SC\(_2^{{\hat{h}}}\)) and (OA+SC\(_2\)) do not dominate one another are reported in the following proposition. The proposition also provides sufficient conditions subject to which (SC\(_2^{{\hat{h}}}\)) dominates (OA+SC\(_2\)) and vice versa.

Proposition 8

For each \(i \in M\) and \({\hat{S}} \subseteq {\hat{N}}\), we have:

  1. 1.

    If \({\hat{q}}_i ({\hat{S}}) \ge {\hat{q}}_i (\{\ell \})\) and \(\varrho _{\ell }^{{\hat{q}}_i} (\emptyset ) >0\) for some \(\ell \in {\hat{N}} \setminus {\hat{S}}\) and \(\varrho _{\ell '}^{{\hat{q}}_i}({\hat{S}} \setminus \{\ell '\}) > 0\) for some \(\ell ' \in {\hat{S}}\), (SC\(_2^{{\hat{h}}}\)) and (OA+SC\(_2\)) do not dominate one another.

  2. 2.

    If \({\hat{q}}_i ({\hat{S}}) \ge {\hat{q}}_i (\{\ell \})\) for all \(\ell \in {\hat{N}} \setminus {\hat{S}}\) and \(\varrho _{\ell '}^{{\hat{q}}_i}({\hat{S}} \setminus \{\ell '\}) = 0\) for all \(\ell ' \in {\hat{S}}\), then (OA+SC\(_2\)) dominates (SC\(_2^{{\hat{h}}}\)).

  3. 3.

    If \({\hat{q}}_i({\hat{S}}) = {\hat{q}}_i(\emptyset )\), then (SC\(_2^{{\hat{h}}}\)) dominates (OA+SC\(_2\)).

The proof is reported in Appendix B, together with a numerical illustration of the non-domination result between the two constraints.

4.1.3 Comparison between (OA+SC\(_1\)) and (OA+SC\(_2\))

One may wonder whether the two constraints featured in Ref(OA+SC) dominate one another. We answer the question in the negative:

Remark 1

There is at least an instance of Problem (2) where (OA+SC\(_1\)) and (OA+SC\(_2\)) do not dominate one another.

To see this, consider an instance with \(N := \{1, \dots , 6\}\), \({\hat{N}} := \{1, \dots , 5\}\), \(G=(N \cup {\hat{N}}, E)\) with \(N(1) = \{2\}\), \(N(2) = \{2,3\}\), \(N(3) = \{3\}\), \(N(4) = \{1,4\}\), \(N(5) = \{2,3,5,6\}\), \(m = 1\), \(f(z) = 1-\exp (-\frac{1}{\lambda } z)\) with \(\lambda = 10\), and \(a = (0.7, 0.6, 0.5, 0.4, 0.3, 0.2)\). Let \({\hat{S}} = \{1,2\}\). (OA+SC\(_1\)) and (OA+SC\(_2\)) read:

$$\begin{aligned} w&\le 0.105 -0.000 (1-y_1) -0.000 (1-y_2) + 0.000 y_3 +0.099 y_4 +0.045 y_5\\ w&\le 0.105 -0.000 (1-y_1) -\mathbf{0.044} (1-y_2) + \underline{0.045} y_3 +0.099 y_4 +\underline{0.144} y_5. \end{aligned}$$

As the latter is tighter at the coefficient of \(1-y_2\) (in boldface) and weaker at the coefficients of \(y_3\) and \(y_5\) (underlined), the two inequalities do not dominate one another.

Notice that, as the OA part of the two constraints is identical, the same remark applies to (SC\(_1^{{\hat{h}}}\)) and (SC\(_2^{{\hat{h}}}\)).

4.2 Comparison between the inequalities in Ref(OA+SC) and Ref(OA+BC)

As the outer-approximation part of the constraints featured in Ref(OA+SC) and Ref(OA+BC) is identical, comparing Ref(OA+SC) to Ref(OA+BC) requires understanding the relationship between the submodular cuts (SC\(_1^{{\hat{q}}}\)), (SC\(_2^{{\hat{q}}}\)) and the Benders cut (BC).

4.2.1 Comparison between (OA+SC\(_1\)) and (OA+BC)

To simplify the analysis, we first consider the following additional submodular cut:

$$\begin{aligned} \eta _i \le {\hat{q}}_i({\hat{S}}) + \sum _{\ell \in {\hat{N}} \setminus {\hat{S}}} \varrho ^{{\hat{q}}_i}_\ell ({\hat{S}}) \, y_\ell , \qquad i \in M, {\hat{S}} \subseteq {\hat{N}}. \qquad (\text {SC}_3^{{\hat{q}}}) \end{aligned}$$

Similarly to what we observed in Sect. 2 when comparing (SC\(_3^{{\hat{h}}}\)) to (SC\(_1^{{\hat{h}}}\)), for every \(i \in M\) and \({\hat{S}} \subseteq {\hat{N}}\), (SC\(_3^{{\hat{q}}}\)) and (SC\(_1^{{\hat{q}}}\)) are equally strong if \(\varrho ^{{\hat{q}}_i}_\ell ({\hat{N}} \setminus \{\ell \}) = 0\) for all \(\ell \in {\hat{S}}\), i.e., unless there is an item \(j \in N_1\) with \(\ell (j) \in {\hat{S}}\) and \(a_{ij} > 0\); in such a case, (SC\(_1^{{\hat{q}}}\)) strictly dominates (SC\(_3^{{\hat{q}}}\)). We establish the following:

Proposition 9

For each pair of \(i \in M\) and \({\hat{S}} \subseteq {\hat{N}}\), the submodular cut (SC\(_3^{{\hat{q}}}\)) coincides with the Benders cut (BC) obtained with dual multipliers defined, for each \(j \in N\), as follows:

$$\begin{aligned} {{\tilde{\delta }}_j} = {\left\{ \begin{array}{ll} a_{ij} &{} \text { if }\, j \not \in N({\hat{S}}) \\ 0 &{} \text { otherwise}, \end{array}\right. } \quad \quad \quad \tilde{\sigma }_j = {\left\{ \begin{array}{ll} a_{ij} &{} \text { if }\,j \in N({\hat{S}}) \\ 0 &{} \text { otherwise}. \end{array}\right. } \end{aligned}$$
(6)

Proof

Due to Lemma 1, \(({{\tilde{\delta }}}, {\tilde{\sigma }})\) is an extreme point of \({\mathcal {P}}_i\) (i.e., an element of \({\mathcal {P}}^e_i\)) obtained by setting \(\gamma _j := 0\) for all \(j \in N\). By the choice of multipliers in (6), we have \(\sum _{j \in N} {\tilde{\sigma }}_j = \sum _{j \in N({\hat{S}})} a_{ij}\). Thus, \(d_i + \sum _{j \in N} {\tilde{\sigma }}_j = {\hat{q}}_i({\hat{S}})\) holds. Furthermore, the coefficient of \(y_{\ell }\) (i.e., \(\sum _{j \in N(\ell )} {{\tilde{\delta }}_j}\)) is equal to \(\sum _{j {\in N(\ell ) \setminus } N({\hat{S}})} a_{ij} = \varrho ^{{\hat{q}}_i}_\ell ({\hat{S}})\) for all \(\ell \in {\hat{N}}\). The claim follows. \(\square \)

As observed, Constraints (SC\(_3^{{\hat{q}}}\)) are dominated by (SC\(_1^{{\hat{q}}}\)). We show that the latter are subsumed by Constraints (BC).

Proposition 10

For each \(i \in M, {\hat{S}} \subseteq {\hat{N}}\), (SC\(_1^{{\hat{q}}}\)) is equal to the (BC) obtained by setting, for each \(j \in N\):

$$\begin{aligned} {{\tilde{\delta }}_j} = {\left\{ \begin{array}{ll} a_{ij} &{} \text { if }\, j \not \in N({\hat{S}})\,\text { or }\,j \in N_1 \\ 0 &{} \text { otherwise}, \end{array}\right. } \quad \quad \tilde{\sigma }_j = {\left\{ \begin{array}{ll} a_{ij} &{} \text { if }\,j \in N({\hat{S}}) \setminus N_1 \\ 0 &{} \text { otherwise} \end{array}\right. } \end{aligned}$$
(7)

(i.e., by setting \(\gamma _j = 1\) for all \(j \in N_1\), according to Lemma 1). Such a (BC) reads:

$$\begin{aligned} \eta _i \le d_i + \sum _{j \in N({\hat{S}}) \setminus N_1} a_{ij} + \sum _{j \in (N \setminus N({\hat{S}})) \cup N_1} a_{ij} \bigg ( \sum _{\ell \in {\hat{N}}(j)} y_\ell \bigg ). \qquad (\text {BC}_1) \end{aligned}$$

The proof is reported in Appendix C.

Proposition 10 implies that, for each \(i \in M\) and \({\hat{S}} \subseteq {\hat{N}}\), Constraint (OA+SC\(_1\)) coincides with the following (OA+BC) constraint:

$$\begin{aligned} w_i \le&\; f\big ({\hat{q}}_i({\hat{S}})\big ) - f'\big ({\hat{q}}_i({\hat{S}})\big ) \, {\hat{q}}_i({\hat{S}})\\&+ f'\big ({\hat{q}}_i({\hat{S}})\big ) \bigg ( d_i + \sum _{j \in N({\hat{S}}) \setminus N_1} a_{ij} + \sum _{j \in (N \setminus N({\hat{S}})) \cup N_1} a_{ij} \sum _{\ell \in {\hat{N}}(j)} y_\ell \bigg ). \qquad (\text {OA+BC}_1) \end{aligned}$$

4.2.2 Comparison between (OA+SC\(_2\)) and (OA+BC)

We start by comparing (SC\(_2^{{\hat{q}}}\)) to (BC).

Proposition 11

For each \(i \in M\), \({\hat{S}} \subseteq {\hat{N}}\), (SC\(_2^{{\hat{q}}}\)) is dominated by the (BC) obtained by setting, for each \(j \in N\):

$$\begin{aligned} {{\tilde{\delta }}_j} = {\left\{ \begin{array}{ll} a_{ij} &{} \text { if }\,j \not \in N({\hat{S}})\text { or }\,j \in N_1({\hat{S}}) \\ 0 &{} \text { otherwise}, \end{array}\right. } \quad \quad \tilde{\sigma }_j = {\left\{ \begin{array}{ll} a_{ij} &{} \text { if }\,j\in N({\hat{S}}) \setminus N_1({\hat{S}}) \\ 0 &{} \text { otherwise} \end{array}\right. } \end{aligned}$$
(8)

(i.e., by setting \(\gamma _j = 1\) for all \(j \in N\) according to Lemma 1), and strictly dominated if \(\exists \ell {\in {\hat{N}} \setminus {\hat{S}}}: \varrho _{\ell }^{{\hat{q}}_i}(\emptyset ) > \varrho _{\ell }^{{\hat{q}}_i}({\hat{S}}) + \sum _{j \in N(\ell ) \cap N_1({\hat{S}})} a_{ij}\). Such a (BC) reads:

$$\begin{aligned} \eta _i \le d_i + \sum _{j \in N({\hat{S}}) \setminus N_1({\hat{S}})} a_{ij} + \sum _{j \in (N \setminus N({\hat{S}})) \cup N_1({\hat{S}})} a_{ij} \bigg ( \sum _{\ell \in {\hat{N}}(j)} y_\ell \bigg ). \qquad (\text {BC}_2) \end{aligned}$$

The proof is reported in Appendix D.

Proposition 11 implies that, for each \(i \in M\) and \({\hat{S}} \subseteq {\hat{N}}\), Constraint (OA+SC\(_2\)) is dominated by the following (OA+BC) constraint:

$$\begin{aligned} w_i \le&\; f\big ({\hat{q}}_i({\hat{S}})\big ) - f'\big ({\hat{q}}_i({\hat{S}})\big ) \, {\hat{q}}_i({\hat{S}})\\&+ f'\big ({\hat{q}}_i({\hat{S}})\big ) \bigg ( d_i + \sum _{j \in N({\hat{S}}) \setminus N_1({\hat{S}})} a_{ij} + \sum _{j \in (N \setminus N({\hat{S}})) \cup N_1({\hat{S}})} a_{ij} \sum _{\ell \in {\hat{N}}(j)} y_\ell \bigg ). \qquad (\text {OA+BC}_2) \end{aligned}$$

4.2.3 Comparison between (OA+BC\(_1\)) and (OA+BC\(_2\))

One may wonder whether (OA+BC\(_1\)) and (OA+BC\(_2\)) dominate one another. We answer the question in the negative:

Remark 2

There is at least an instance of Problem (2) where (OA+BC\(_1\)) and (OA+BC\(_2\)) do not dominate one another.

To see this, consider the instance used in Remark 1 and let \({\hat{S}} = \{1,2\}\). (OA+BC\(_1\)) and (OA+BC\(_2\)) read:

$$\begin{aligned} w&\le 0.105 -0.000 (1-y_1) -0.000 (1-y_2) + 0.000 y_3 +0.099 y_4 +0.045 y_5 \\ w&\le 0.105 -0.000 (1-y_1) -\mathbf{0.044} (1-y_2) + \underline{0.045} y_3 +0.099 y_4 +\underline{0.089} y_5. \end{aligned}$$

As the latter is tighter at the coefficient of \(1-y_2\) (in boldface) and weaker at the coefficients of \(y_3\) and \(y_5\) (underlined), the two inequalities do not dominate one another. \(\square \)

4.3 Comparison of the inequalities in Ref(SC) and Ref(OA+BC)

Due to Proposition 10, Constraints (OA+SC\(_1\)) and (OA+BC\(_1\)) coincide. Thus:

Corollary 1

For each \(i \in M\) and \({\hat{S}} \subseteq {\hat{N}}\), (SC\(_1^{{\hat{h}}}\)) and (OA+BC\(_1\)) enjoy the same relationship as (SC\(_1^{{\hat{h}}}\)) and (OA+SC\(_1\)), as reported in Proposition 7.

A similar result does not hold for (SC\(_2^{{\hat{h}}}\)) and (OA+BC\(_2\)).

Corollary 2

There are instances where (SC\(_2^{{\hat{h}}}\)) and (OA+BC\(_2\)) do not dominate one another.

Proof

Consider the instance used in Remark 1 and let \({\hat{S}} = \{1,2,4\}\). (SC\(_2^{{\hat{h}}}\)) and (OA+BC\(_2\)) read:

$$\begin{aligned} w&\le 0.198 -0 (1-y_1) - 0.041 (1-y_2) +0.049 y_3 -0.093 (1-y_4) +0.148 y_5\\ w&\le 0.198 -0 (1-y_1) - \underline{0.040} (1-y_2) +\mathbf{0.041} y_3 -\underline{0.088} (1-y_4) + \mathbf{0.081} y_5. \end{aligned}$$

The latter inequality is tighter at the coefficients of \(y_3\) and \(y_5\) (in boldface) but weaker at those of \(1-y_2\) and \(1-y_4\) (underlined). Thus, the two inequalities do not dominate one another. \(\square \)

4.4 Summary of the theoretical results on the relative strength of the inequalities

A visual summary of the relative strength of the families of inequalities studied in this paper is provided in Fig. 3. An arrow pointing from A to B indicates that A is strictly dominated by B (subject to the indicated conditions). An arrow in both directions indicates that the two inequalities are identical. A connection without arrows indicates that domination between the two inequalities may or may not hold, depending on the conditions we derived. The proposition corresponding to each relationship is reported next to its arrow. Solid lines refer to the general case where no assumptions on f are made. Dashed lines refer to the case where f is an affine function, which is analyzed in the following proposition.

Proposition 12

If f is affine, (OA+BC\(_1\)) coincides with (SC\(_1^{{\hat{h}}}\)) and (OA+BC\(_2\)) dominates (SC\(_2^{{\hat{h}}}\)) for each \(i \in M, {\hat{S}} \subseteq {\hat{N}}\).

Proof

Let \(f (z) := \alpha z + \beta \) for some \(\alpha \in {\mathbb {R}}^+\) and \(\beta \in {\mathbb {R}}\). Then, \(f'({\hat{q}}_i({\hat{A}})) = \alpha \) for all \({\hat{A}} \subseteq {\hat{N}}\). Thus, \(\varrho _\ell ^{{\hat{h}}_i}({\hat{B}}) = \alpha \, \varrho _\ell ^{{\hat{q}}_i}({\hat{B}}) = f'\big ({\hat{q}}_i({\hat{A}})\big ) \, \varrho _\ell ^{{\hat{q}}_i}({\hat{B}})\) for all \(\ell \in {\hat{N}}\) and \({\hat{A}}, {\hat{B}} \subseteq {\hat{N}}\). It follows that (OA+SC\(_1\)) coincides with (SC\(_1^{{\hat{h}}}\)) and that (OA+SC\(_2\)) coincides with (SC\(_2^{{\hat{h}}}\)). Thus, the claim follows since (OA+BC\(_1\)) coincides with (OA+SC\(_1\)) due to Proposition 10 and (OA+BC\(_2\)) dominates (OA+SC\(_2\)) due to Proposition 11. \(\square \)

Fig. 3

A visual summary of the main theoretical results on the relative strength of the inequalities studied in this paper.

To summarize, the theoretical results we obtained allow us to conclude that, given a pair \(i \in M\) and \({\hat{S}} \subseteq {\hat{N}}\), there are instances where the constraints of Ref(OA+BC) strictly dominate those of Ref(OA+SC), whereas in general there is no domination between the constraints of Ref(SC) and those of Ref(OA+SC) and Ref(OA+BC). For the special case in which the utility function f is affine, the OA part of the reformulation cancels out and we have \(\text {Ref(OA+BC)} \subseteq \text {Ref(SC)} = \text {Ref(OA+SC)}\).

4.5 The case of Problem (1)

We conclude this section by comparing the relative strength of Ref(SC), Ref(OA+SC), and Ref(OA+BC) for the case where Problem (1) of Ahmed and Atamtürk [2], Yu and Ahmed [39] is solved as an instance of Problem (2).

As sketched in Sect. 1, we first reduce Problem (1) to (2) by setting \({\hat{N}} = N\) and defining the edge set E of the graph G as a matching. This induces a bijection between the sets N and \({\hat{N}}\) thanks to which each \(\ell \in {\hat{N}}\) is mapped into a single \(j = j(\ell ) \in N\) and each \(j \in N\) is mapped into a single \(\ell = \ell (j) \in {\hat{N}}\). Thus, we have \(x_j = y_{\ell (j)}\) and \(x_{j(\ell )} = y_{\ell }\) for each \(j \in N\) and \(\ell \in {\hat{N}}\). This implies \(\hat{{\mathcal {Q}}}_i(y) = \sum _{j \in N}a_{ij} x_j + d_i\). Letting \(S := \{j \in N: x_j = 1\}\), \(\hat{{\mathcal {Q}}}_i(y)\) coincides with the function \({\hat{q}}_i(S)\) of Problem (2) as well as with the function \(q_i(S)\) of Problem (1). Thus, \(\hat{{\mathcal {Q}}}_i, {\hat{q}}_i\), and \(q_i\) are linear functions and the application of either the submodular cuts (SC\(_1^{{\hat{q}}}\))–(SC\(_2^{{\hat{q}}}\)) to \({\hat{q}}_i\) or the Benders cut (BC) to \(\hat{{\mathcal {Q}}}_i(y)\) only results in restating the hypograph constraints (HYPO\(_2\)) as the inequalities \(\eta _i \le \sum _{j \in N}a_{ij} x_j + d_i\), \(i \in M\). Therefore, Ref(OA+SC) and Ref(OA+BC) both coincide with the following outer-approximation reformulation, to which we refer as Ref(OA):

$$\begin{aligned} \Big (\text {Ref(OA)}\Big ) \max _{\begin{array}{c} w \in {\mathbb {R}}^{m}\\ x \in X \cap \{0,1\}^n \end{array}} \quad&\sum _{i \in M} \pi _i \, w_i \end{aligned}$$
(9a)
$$\begin{aligned} \!\text {s.t.} \qquad \quad&w_i \le f(p) -f'(p) \, p + f'(p) \, \left( d_i + \sum _{j \in N} a_{ij} \, x_j\right) , \nonumber \\&\qquad \qquad \qquad \qquad i \in M, p \in \bigg [0, \sum _{j \in N} a_{ij} + d_i\bigg ]. \end{aligned}$$
(9b)

Given a (fractional) point \(x^* \in X \cap [0,1]^n\), the separation problem for Constraints (9b) is solved, for each \(i \in M\), in linear time O(n) by simply setting \(p := \sum _{j \in N} a_{ij} x_j^* + d_i\) and computing f(p) and \(f'(p)\), assuming that f and its derivative can be computed in constant time.
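A sketch of this separation routine for Problem (1) (our naming; `a_i` maps each \(j \in N\) to \(a_{ij}\)):

```python
def separate_9b(w_star_i, x_star, a_i, d_i, f, f_prime, eps=1e-9):
    """Return the (9b) cut w_i <= const + sum coef[j] * x_j violated at
    (w*, x*), or None; x* may be fractional."""
    p = d_i + sum(a_i[j] * x_star[j] for j in a_i)
    if w_star_i <= f(p) + eps:
        return None
    fp = f_prime(p)
    const = f(p) - fp * p + fp * d_i
    coef = {j: fp * a_i[j] for j in a_i}
    return const, coef
```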

In Yu and Ahmed [39], the authors exploit the submodularity of the objective function of Problem (1) and propose a version of the inequalities obtained in Ahmed and Atamtürk [2] that is stronger when the problem is subject to a single knapsack constraint. In particular, they propose a branch-and-cut algorithm based on the separation of the uplifted version of the submodular constraint (SC\(_2^{{\hat{h}}}\)) (see Inequality (9) on page 153 of Yu and Ahmed [39]), as well as on the separation of the standard submodular inequality (SC\(_1^{{\hat{h}}}\)).

Let us assume the same setting as Yu and Ahmed [39] where Problem (1) is subject to a knapsack constraint \(\sum _{j\in N} \beta _j x_j \le B\) and the utility function is defined as in (f). We compare the constraints featured in Ref(OA) to those of Yu and Ahmed [39] with the following two numerical results.

Remark 3

(SC\(_1^{{\hat{h}}}\)) and (9b) do not dominate one another.

To see this, consider the following instance taken from Yu and Ahmed [39], where G is the matching graph with \(n = {\hat{n}} = 6\), \(m=1\), \(f(z) = -\exp (-z)\), the knapsack budget is \(B=1\), and the weights and item values are defined as:

$$\begin{aligned} \beta&= (0.3023, 0.1892, 0.3884, 0.1047, 0.5938, 0.6699)\\ a&= (0.3008, 0.3621, 0.4233, 0.6395, 0.1164, 0.0448). \end{aligned}$$

For \(S = \{1,2\}\), the (SC\(_1^{{\hat{h}}}\)) constraint featured in the formulation of Yu and Ahmed [39] and Constraint (9b) featured in Ref(OA) read:

$$\begin{aligned} w&\le -0.515 -0.053 (1-x_1) -0.066 (1-x_2) +0.178 x_3 +0.244 x_4 +0.057 x_5 +0.023 x_6\\ w&\le -0.515 -\mathbf{0.155} (1-x_1) -\mathbf{0.186} (1-x_2) +\underline{0.218} x_3 +\underline{0.330} x_4 +\underline{0.060} x_5 +\underline{0.024} x_6. \end{aligned}$$

Since the latter is tighter at the coefficients of \(1-x_1\) and \(1-x_2\) (in boldface) and weaker at the coefficients of \(x_3, x_4, x_5, x_6\) (underlined), the two inequalities do not dominate one another. \(\square \)

Remark 4

The lifted version of (SC\(_2^{{\hat{h}}}\)) proposed by Yu and Ahmed [39] and (9b) do not dominate one another.

To see this, consider the same instance as in Remark 3. For \(S = \{1,2,3,4\}\), the inequality obtained with the lifting procedure in Yu and Ahmed [39] (Inequality (9) in the paper) and Constraint (9b) in Ref(OA) read:

$$\begin{aligned} w&\le -0.178 -0.062 (1-x_1) -0.077 (1-x_2) -0.093 (1-x_3) -0.159 (1-x_4) \\&\quad + 0.037 x_5 + 0.017 x_6\\ w&\le -0.178 -\underline{0.053} (1-x_1) -\underline{0.064} (1-x_2) -\underline{0.075} (1-x_3) -\underline{0.113} (1-x_4) \\&\quad +\mathbf{0.021} x_5 +\mathbf{0.008} x_6. \end{aligned}$$

Since the latter is tighter than the previous one at the coefficients of \(1-x_1, 1-x_2, 1-x_3, 1-x_4\) (underlined) and weaker at the coefficients of \(x_5\) and \(x_6\) (in boldface), the two inequalities do not dominate one another. \(\square \)

Computational experiments comparing the impact of these inequalities on solving instances of Problem (1) are reported in Sect. 6.

5 Generalization of Problem (2) to concave and differentiable utility functions

In this section, we discuss the extension of our methods to the case where the utility function is concave and differentiable, but not necessarily strictly increasing (as assumed up to this point). To distinguish this new utility function from the previous one, we will refer to it as g. Such a case arises in applications where, e.g., the utility function g is composed of two terms: a revenue term which increases with the number of customers (having a positive decreasing derivative due to economies of scale) and a negative component which accounts for the total amount of money spent on running the service (assuming a unit cost per customer which decreases with the number of customers). In such cases, once the system becomes overloaded and the optimal running capacity is exceeded, g switches from increasing to decreasing. Applications of this type can be found in, e.g., competitive facility location problems where the objective function is the difference between the market capture and the investment cost (see Küçükaydın et al. [22], Lin and Tian [24] and further references therein).

As an example of a utility function of this type, consider the following one:

$$\begin{aligned} g(z) = 1 - \exp \Big (-\frac{z}{\lambda }\Big ) - \alpha \, \frac{z}{\lambda }, \end{aligned}$$
(g)

where \(\lambda \) is the risk-tolerance parameter and \(\alpha >0\) is an additional parameter establishing a trade-off between the increasing and the decreasing terms. An illustration of this function for different values of \(\lambda \) is provided in Fig. 4. As shown in the figure, larger values of \(\lambda \) lead to the function achieving its maximum for larger values of z. For the case illustrated in the figure (where \(\alpha = 0.5\)), such a maximum is reached for values of z larger than 1 whenever \(\lambda \ge 2\).
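The threshold suggested by the figure can be verified with a short calculation based only on the definition of (g): setting the derivative to zero gives

$$\begin{aligned} g'(z) = \frac{1}{\lambda }\, \exp \Big (-\frac{z}{\lambda }\Big ) - \frac{\alpha }{\lambda } = 0 \quad \Longleftrightarrow \quad z^* = \lambda \, \ln \frac{1}{\alpha }. \end{aligned}$$

For \(\alpha = 0.5\), this yields \(z^* = \lambda \ln 2 \approx 0.69\, \lambda \), so \(z^* > 1\) holds whenever \(\lambda > 1/\ln 2 \approx 1.44\), and in particular for the plotted values with \(\lambda \ge 2\).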

Fig. 4: An example of a concave, differentiable, and not increasing utility function \(g(z)=1 - \exp (-\frac{z}{\lambda }) - \alpha \, \frac{z}{\lambda }\), with \(\alpha = 0.5\), graphed for different values of \(\lambda \).

When combining a not increasing function such as the one defined in (g) with the set function \({\hat{q}}_i\), the resulting function \({\hat{h}}_i := g \circ {\hat{q}}_i\) (which reads \({\hat{h}}_i({\hat{S}}) = 1-\exp (-\frac{{\hat{q}}_i({\hat{S}})}{\lambda }) - \alpha \, \frac{{\hat{q}}_i({\hat{S}})}{\lambda }\) for the case of (g)) is the difference of two submodular functions. As such a function is not submodular in the general case, Ref(SC) and the submodular constraints in Proposition 1 are not valid for Problem (2). This implies that the inequalities introduced in Ahmed and Atamtürk [2] and Yu and Ahmed [39] are not valid either.
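For the case of (g), the difference-of-submodular structure can be made explicit:

$$\begin{aligned} {\hat{h}}_i({\hat{S}}) = \underbrace{\left( 1-\exp \Big (-\frac{{\hat{q}}_i({\hat{S}})}{\lambda }\Big )\right) }_{\text {submodular}} - \underbrace{\alpha \, \frac{{\hat{q}}_i({\hat{S}})}{\lambda }}_{\text {submodular}}, \end{aligned}$$

where the first term is a nondecreasing concave transformation of the nondecreasing submodular function \({\hat{q}}_i\) (and is therefore submodular) and the second term is a nonnegative multiple of \({\hat{q}}_i\); their difference, however, is not submodular in general.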

To account for this case, we extend the double-hypograph decomposition we proposed in Sect. 3.1 as follows:

$$\begin{aligned} \max \quad&\sum _{i \in M} \pi _i \, w_i \\ \text {s.t.} \quad&w_i \le g(\eta _i),&i \in M, \qquad (\text {HYPO}_1')\\&\eta _i \le \hat{{\mathcal {Q}}}_i(y),&i \in M, \qquad (\text {HYPO}_2)\\&\eta _i \ge \hat{{\mathcal {Q}}}_i(y),&i \in M, \qquad (\text {EPI})\\&y \in Y \cap \{0,1\}^{{\hat{n}}}. \end{aligned}$$

The new epigraph constraints (EPI) are added since \(g(\eta _i)\) can be decreasing for a sufficiently large \(\eta _i\) and, thus, imposing \(\eta _i \le \hat{{\mathcal {Q}}}_i(y)\) with (HYPO\(_2\)) does not suffice to guarantee \(\eta _i = \hat{{\mathcal {Q}}}_i(y)\) in an optimal solution.

Thanks to a result by Edmonds [10], Constraints (EPI) can be restated as the polymatroid constraints

$$\begin{aligned} \eta _i \ge d_i + \sum _{\ell \in {\hat{N}}} \vartheta _{\ell } \, y_{\ell }, \qquad \qquad \vartheta \in P^{{\hat{q}}_i}, i \in M, \end{aligned}$$

where \(P^{{\hat{q}}_i} := \left\{ \vartheta \in {\mathbb {R}}^{{\hat{n}}}: \sum _{\ell \in \hat{S}} \vartheta _\ell \le {\hat{q}}_i(\hat{S}) - d_i, \; \hat{S} \subseteq \hat{N} \right\} \) is the submodular polyhedron (or polymatroid) of the submodular function \({\hat{q}}_i\), \(i \in M\). These constraints can be separated by maximizing a linear objective function over \(P^{{\hat{q}}_i}\) using Edmonds' (polynomial-time) greedy algorithm. Given a pair \((\eta _i^*, y^*)\) with \(\eta _i^* \in {\mathbb {R}}_+\) and \(y^* \in Y \cap [0,1]^{{\hat{n}}}\), the algorithm sorts the metaitems in \({\hat{N}}\) in nonincreasing order of the associated \(y^*\) values, i.e., such that \(y^*_{\mu (1)} \ge y^*_{\mu (2)} \ge \cdots \ge y^*_{\mu (\hat{n})}\) holds, where \(\mu \) is a suitable permutation of \({\hat{N}}\). Let \(\hat{S}_{\mu (\ell )} := \{\mu (1), \mu (2),\dots , \mu (\ell )\}\) be the set of the first \(\ell \) metaitems according to the new order. By setting

$$\begin{aligned} \vartheta _{\mu (\ell )} := \hat{q}_i (\hat{S}_{\mu (\ell )} ) - \hat{q}_i (\hat{S}_{\mu (\ell -1)} ), \qquad \ell =1,2,\dots , {\hat{n}}, \end{aligned}$$
(10)

where \(\hat{S}_{\mu (0)} := \emptyset \), the pair \((\eta _i^*,y^*)\) violates one of the polymatroid constraints if and only if \(\sum _{\ell \in \hat{N}} \vartheta _{\mu (\ell )} \, y_{\mu (\ell )}^* + d_i > \eta _i^*\). Since \(\vartheta \) only depends on the permutation \(\mu \), the polymatroid constraints can be restated as

$$\begin{aligned} \eta _i \ge d_i + \sum _{\ell \in {\hat{N}}} \vartheta _{\mu (\ell )} \, y_{\mu (\ell )}, \qquad i \in M, \end{aligned}$$
(POLY)

where \(\vartheta \) is defined as in (10) and \(\mu \) is any permutation of the elements of \({\hat{N}}\). Due to \(\hat{q}_i\) being submodular, Constraints (POLY) are facet defining for the epigraph of \(\hat{{\mathcal {Q}}}_i(y)\), see, e.g., Atamtürk and Narayanan [3].
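As a concrete illustration of this separation routine, here is a minimal sketch in Python, assuming \({\hat{q}}_i\) is available as a black-box set-function oracle with \({\hat{q}}_i(\emptyset ) = d_i\); all names are illustrative and not part of our C/C++ implementation:

```python
def separate_poly(q_i, d_i, y_star, eta_star, tol=1e-6):
    """Greedy separation of the polymatroid constraints (POLY) for scenario i.

    q_i      : callable mapping a frozenset of metaitems to q_i(S);
               q_i(frozenset()) = d_i is assumed here
    y_star   : list with the (possibly fractional) values y*_ell
    eta_star : current value of eta_i
    Returns theta defining a violated cut, or None if none is violated.
    """
    n_hat = len(y_star)
    # The permutation mu: sort metaitems by nonincreasing y* value.
    mu = sorted(range(n_hat), key=lambda ell: -y_star[ell])

    theta, S = [0.0] * n_hat, set()
    prev_val = q_i(frozenset())  # = d_i by assumption
    for ell in mu:
        S.add(ell)
        val = q_i(frozenset(S))
        theta[ell] = val - prev_val  # marginal value, as in (10)
        prev_val = val

    lhs = d_i + sum(t * y for t, y in zip(theta, y_star))
    # Violated iff sum_ell theta_ell * y*_ell + d_i > eta_i*.
    return theta if lhs > eta_star + tol else None
```

In a branch-and-cut implementation, the returned \(\vartheta \) defines the violated constraint \(\eta _i \ge d_i + \sum _{\ell } \vartheta _{\ell } \, y_{\ell }\), which is then added as a cut.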

As before, we apply an outer approximation to Constraints (HYPO\(_1'\)) and combine them with Constraints (HYPO\(_2\)) via Fourier-Motzkin elimination. Following the same derivations used for the case with f strictly increasing, we obtain Constraints (OA+SC\(_1\))–(OA+SC\(_2\)) and (OA+BC\(_1\))–(OA+BC\(_2\)). Since \(g'\) can be zero or negative, though, such constraints are valid only for a set \({\hat{S}} \subseteq {\hat{N}}\) satisfying \(g'({\hat{q}}_i({\hat{S}})) \ge 0\) and for a vector \({\hat{y}}\) satisfying \(g'(\hat{{\mathcal {Q}}}_i({\hat{y}})) \ge 0\), respectively (note that, at any point where \(g'\) takes value 0, these constraints boil down to \(w_i \le UB\), where UB is the maximum value of g). For every vector \({\hat{y}} \in Y\) satisfying \(g'(\hat{{\mathcal {Q}}}_i({\hat{y}})) < 0\), the combination of the outer approximation of Constraints (HYPO\(_1'\)) with Constraints (EPI) via Fourier-Motzkin elimination leads to

$$\begin{aligned} w_i \le g\big (\hat{{\mathcal {Q}}}_i({\hat{y}})\big ) + g'\big (\hat{{\mathcal {Q}}}_i({\hat{y}})\big ) \Big (d_i + \sum _{\ell \in {\hat{N}}} \vartheta _{\mu (\ell )} \, y_{\mu (\ell )} - \hat{{\mathcal {Q}}}_i({\hat{y}})\Big ), \qquad i \in M. \end{aligned}$$
(OA+POLY)
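To see how (OA+POLY) arises, write \({\hat{\eta }}_i := \hat{{\mathcal {Q}}}_i({\hat{y}})\): the outer approximation of (HYPO\(_1'\)) at \({\hat{\eta }}_i\) and the polymatroid restatement of (EPI) read

$$\begin{aligned} w_i \le g({\hat{\eta }}_i) + g'({\hat{\eta }}_i)\,(\eta _i - {\hat{\eta }}_i) \qquad \text {and} \qquad \eta _i \ge d_i + \sum _{\ell \in {\hat{N}}} \vartheta _{\mu (\ell )} \, y_{\mu (\ell )}, \end{aligned}$$

and, since \(g'({\hat{\eta }}_i) < 0\), multiplying the lower bound on \(\eta _i\) by the negative coefficient \(g'({\hat{\eta }}_i)\) turns it into an upper bound on \(g'({\hat{\eta }}_i)\,\eta _i\); substituting it into the outer-approximation cut eliminates \(\eta _i\) and yields (OA+POLY).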

We call Ref(OA+SC+POLY) and Ref(OA+BC+POLY) the reformulations featuring (OA+POLY) (imposed only for \({\hat{y}} \in Y: g'(\hat{{\mathcal {Q}}}_i({\hat{y}})) < 0\)) together with either (OA+SC\(_1\))–(OA+SC\(_2\)) or (OA+BC\(_1\))–(OA+BC\(_2\)) (the former imposed only for \({\hat{S}} \subseteq {\hat{N}}: g'({\hat{q}}_i({\hat{S}})) \ge 0\) and the latter only for \({\hat{y}} \in Y: g'(\hat{{\mathcal {Q}}}_i({\hat{y}})) \ge 0\)). Experiments carried out with these two reformulations and the utility function (g) are reported in Sect. 6.

It is worth mentioning that, for the case where the utility function is not strictly increasing and one has access to a decomposition of \({\hat{h}}_i\) (for each scenario \(i \in M\)) as a difference of two submodular functions, the recently proposed method of Atamtürk and Narayanan [3] can be an alternative to the approach proposed in this section. Even though such a decomposition always exists, it is not always easy to find. Hence, one of the advantages of our method over that of Atamtürk and Narayanan [3] is its potentially wider applicability, as it does not require finding such a decomposition.

6 Computational results

In this section, we assess the computational effectiveness of the MILP reformulations proposed in the paper. We carry out three sets of experiments, considering i) instances of Problem (2) featuring a strictly increasing function f (Sect. 6.1), ii) instances of Problem (2) featuring a not increasing function g (Sect. 6.2), and iii) instances of Problem (1), studied in Ahmed and Atamtürk [2] (Sect. 6.3).

Our experiments are performed on a single node of the IRIDIS 5 cluster at the University of Southampton, equipped with 40 dual 2.0 GHz Intel Skylake processors and 192 GB of DDR4 memory. Our code is written in C/C++ and compiled with gcc 9.2.1, using the -O3 compiler optimization flag. Our branch-and-cut algorithms are implemented in CPLEX 12.9.0 using the Callable Library framework and run in single-threaded mode. We set a time limit of 600 seconds for each run.

6.1 Experiments for Problem (2) with a strictly increasing utility function

We experiment with four branch-and-cut (B&C) algorithms: SC, OA+SC, OA+BC, and OA+BC+f. SC is based on the standard reformulation Ref(SC) that relies on Constraints (SC\(_1^{{\hat{h}}}\)) and (SC\(_2^{{\hat{h}}}\)) (see Sect. 2). OA+SC is based on the reformulation Ref(OA+SC) that relies on Constraints (OA+SC\(_1\)) and (OA+SC\(_2\)) (see Sect. 3.3). OA+BC and OA+BC+f are based on the reformulation Ref(OA+BC) that relies on Constraints (OA+BC\(_1\)) and (OA+BC\(_2\)) (see Sect. 3.4); the separation is carried out only at integer points \(y^*\) in the former and at both integer and fractional points \(y^*\) in the latter. The separation is performed with a violation tolerance of \(10^{-6}\).

6.1.1 Instances

In this set of experiments, we adopt the strictly increasing function (f) (see Sect. 1.1), also adopted in Ahmed and Atamtürk [2], Yu and Ahmed [38]. We consider five values of \(\lambda \in \{0.25,0.5,0.75,1,2\}\) and two values of \(m \in \{50,100\}\) (number of scenarios).

We consider four different sets of constraints to define the feasible region Y of Problem (2): (i) a k-cardinality constraint \(\sum _{\ell \in {\hat{N}}} y_{\ell } \le k\), used in, e.g., Nemhauser et al. [29]; (ii) a knapsack constraint \(\sum _{\ell \in {\hat{N}}} \beta _{\ell } \; y_{\ell } \le B\) with weights \(\beta _\ell \ge 0\), \(\ell \in {\hat{N}}\), and budget \(B \ge 0\), used in, e.g., Sviridenko [34], Ahmed and Atamtürk [2], Yu and Ahmed [38]; (iii) a set of partition-matroid constraints \(\sum _{\ell \in {\hat{N}}_q} y_{\ell } \le k_q\), with \(1 \le q \le p\), according to which \({\hat{N}}\) is partitioned into p groups \({\hat{N}}_1, \dots , {\hat{N}}_p\), each subject to a cardinality constraint with budget \(k_q\); these constraints are used in, e.g., Friedrich et al. [14], Sviridenko [34]; (iv) a set of conflict constraints \(y_{\ell } + y_{\ell '} \le 1\), \(\{\ell ,\ell '\} \in {\mathcal {E}}\), used to model pairwise incompatibilities between metaitems; these constraints are based on a conflict graph \({{\mathcal {G}}}=({\hat{N}},{\mathcal {E}})\), where \({\mathcal {E}}\) is the set of edges representing the conflicts.

Motivated by the fact that Problem (2) subsumes the maximal-coverage location problem with uncertain customer demands and a concave utility function, we generate our instances by extending the procedure proposed by ReVelle et al. [30] and Cordeau et al. [6] for the deterministic version of such a problem. To model the covering relationship between the sets of items N and metaitems \({\hat{N}}\), N and \({\hat{N}}\) are embedded in \({\mathbb {R}}^2\) and the covering relationship is spatially induced by associating a covering radius \({\hat{R}}\) with each metaitem (see Fig. 2). The coordinates of each item and metaitem in \(N \cup {\hat{N}}\) are chosen uniformly at random from [0, 30].

For each item \(j \in N\), the set \({\hat{N}}(j)\) contains all the metaitems \(\ell \in {\hat{N}}\) whose Euclidean distance from j is no larger than the covering radius \(\hat{R} \in \{2, 4, 5, 6\}\) (smaller values of \({\hat{R}}\) result in sparser instances). The items are sampled in such a way that each of them is covered by at least one metaitem. For each scenario \(i \in M\) and item \(j \in N\), we set \(u_{ij}\) to zero with probability \(r \in \{\frac{1}{4},\frac{1}{2}\}\) and draw its value uniformly at random from \(\{1,2,\dots , 10\}\) with probability \(1-r\). We then introduce the coefficient \({\varDelta } := \sum _{i \in M}\pi _i \sum _{j \in N} u_{ij}\), corresponding to the expected maximum return across all scenarios, and define the normalized value \(a_{ij} := u_{ij}/{\varDelta }\). For each choice of n and \({\hat{n}}\) and for each set of constraints, we generate 5 instances using a different seed for the random-number generator, 4 different values for the radius \({\hat{R}}\), and 2 different values for the scenario-item probability r.
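For concreteness, the coefficient-sampling step can be sketched as follows (a minimal illustration of the procedure just described; the uniform probabilities \(\pi _i = 1/m\) used in our instances are assumed, and all names are illustrative):

```python
import random

def generate_coefficients(m, n, r, seed=0):
    """Sample the u_ij and return the normalized a_ij = u_ij / Delta.

    Each u_ij is zero with probability r and drawn uniformly at random
    from {1, ..., 10} with probability 1 - r.
    """
    rng = random.Random(seed)
    u = [[0 if rng.random() < r else rng.randint(1, 10) for _ in range(n)]
         for _ in range(m)]
    # Delta = sum_i pi_i sum_j u_ij with pi_i = 1/m (as in our instances).
    delta = sum(map(sum, u)) / m
    return [[u_ij / delta for u_ij in row] for row in u]
```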

We consider three main classes of instances: (i) the class k-card, containing 7200 instances subject to a k-cardinality constraint; they are obtained by choosing \(n \in \{5000, 10{,}000, 20{,}000\}\), \({\hat{n}} \in \{40,50,60\}\), and \(k \in \{10,15\}\); (ii) the class kp, containing 3600 instances subject to a knapsack constraint; they are obtained by choosing \(n \in \{5000, 10{,}000, 20{,}000\}\) and \({\hat{n}} \in \{40,50,60\}\); \(\beta _\ell \) and B are defined following the procedure described by Martello et al. [26] for generating strongly correlated instances, which are among the most difficult instances for the knapsack problem; (iii) the class p-matroid, containing 1200 instances subject to partition-matroid constraints; they are obtained by choosing \(n \in \{5000, 10{,}000, 20{,}000\}\), \({\hat{n}} = 60\), \(p=3\), and \(k_q=5\), \(q = 1, \dots , p\). In all these instances, we set \(\pi _i = \frac{1}{m}\) and \(d_i=0\), \(i \in M\). For each of these three classes, we generate another class that is also subject to conflict constraints induced by an Erdős–Rényi graph \({{\mathcal {G}}}=({\hat{N}},{\mathcal {E}})\) with edge probability 0.01; in the following, we refer to these three classes as k-card & confl, kp & confl, and p-matroid & confl. Overall, the testbed contains six classes and a total of 24,000 instances.

6.1.2 Initialization of the B&C algorithms with a greedy solution

The greedy algorithm proposed in Nemhauser et al. [29] is a simple and effective way to obtain feasible solutions to the problem of maximizing a submodular function subject to a k-cardinality constraint. As shown in Nemhauser et al. [29], such solutions are \((1 - \frac{1}{e})\)-approximate (where e is the base of the natural logarithm). For the basic variant of Problem (2) with a single k-cardinality constraint, the greedy algorithm starts with an empty set of metaitems (\({\hat{S}} = \emptyset \)), and, at each iteration, adds to \({\hat{S}}\) the metaitem \(\ell \in \arg \max _{\ell \not \in {\hat{S}}} \sum _{i \in M} \pi _i \; \varrho ^{{\hat{h}}_i}_{\ell }({\hat{S}})\) (thereby selecting the metaitem with the largest marginal contribution) until \(|{\hat{S}}| = k\).
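A minimal sketch of this greedy scheme for the k-cardinality case, assuming the objective \(\sum _{i \in M} \pi _i \, {\hat{h}}_i\) is available as a black-box oracle (names are illustrative):

```python
def greedy_k_cardinality(obj, n_hat, k):
    """Greedy for max obj(S) s.t. |S| <= k, with obj nondecreasing submodular.

    obj: callable mapping a frozenset of metaitems to the objective value
         sum_i pi_i * h_i(S); under these assumptions the returned solution
         is (1 - 1/e)-approximate (Nemhauser et al. [29]).
    """
    S = set()
    while len(S) < k:
        base = obj(frozenset(S))
        # Metaitem with the largest marginal contribution w.r.t. S.
        best = max((ell for ell in range(n_hat) if ell not in S),
                   key=lambda ell: obj(frozenset(S | {ell})) - base)
        S.add(best)
    return S
```

For the other constraint sets discussed below, the candidate set in the max is simply restricted to the metaitems whose addition keeps \({\hat{S}}\) feasible.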

The algorithm can be extended to all the constraint sets we introduced by selecting, at each iteration, the metaitem with the largest marginal contribution among those that can be added to \({\hat{S}}\) without violating any constraint. In the partition-matroid case, this algorithm achieves a \(\frac{1}{2}\) approximation ratio, see Cornuejols et al. [7]. In the knapsack case, an extension that combines it with a partial enumeration scheme achieves a \(1-\frac{1}{e}\) approximation, see Sviridenko [34]. As the problem of maximizing a linear function subject to a set of conflicts is inapproximable to within \({\hat{n}}^{1-\epsilon }\) for any \(\epsilon > 0\) unless \({\mathcal {ZPP}}={\mathcal{N}\mathcal{P}}\) (Håstad [17]), no constant-factor approximation guarantee is obtained when conflict constraints are imposed.

In all of our experiments, we adopt the greedy algorithm to provide a first feasible solution to our branch-and-cut algorithms. In Appendix E, we illustrate how to construct a set of worst-case instances of Problem (2) subject to a k-cardinality constraint on which the greedy algorithm achieves the worst-case approximation factor of \(1-\frac{1}{e}\).

6.1.3 Comparison of the four B&C algorithms

In Table 1, we compare the performance of the four B&C algorithms (SC, OA+SC, OA+BC, and OA+BC+f) for solving Problem (2) with f strictly increasing. The table is vertically divided into six parts. The first part reports the instance class and the total number of instances it contains (# total). The next four parts report, for each of the four exact methods, the total number of instances solved to optimality within the time limit (denoted by # opt) and the average gap for the unsolved instances at the time limit. The gaps are calculated as \(\frac{UB - LB}{UB}\cdot 100\), where UB and LB are, respectively, the best upper bound and the incumbent solution value achieved by the corresponding method within the time limit. The last part reports the gap obtained with the greedy algorithm on the instances that are not solved to optimality by OA+BC+f.

Table 1 Computational performance of the four B&C algorithms within a time limit of 600 secs. Gaps are reported as percentages for the instances not solved to proven optimality.

When comparing OA+SC to SC, the number of instances solved to optimality by OA+SC is about 80% larger and its average gap is about three times smaller. For all classes of instances except kp, OA+SC solves three to four times as many instances as SC.

Further performance improvements are obtained with OA+BC, which solves about 1700 more instances than OA+SC and reduces the average gap of OA+SC by about 40%. This is in line with our theoretical results, showing that Constraints (OA+BC\(_2\)) strictly dominate Constraints (OA+SC\(_2\)) (see Proposition 11) and that Constraints (OA+BC\(_1\)) are equivalent to Constraints (OA+SC\(_1\)) (see Proposition 10).

OA+BC+f achieves a drastic improvement over OA+BC thanks to separating Constraints (OA+BC\(_1\))–(OA+BC\(_2\)) at fractional points as well. In particular, OA+BC+f is able to solve 94% of the instances across all classes, obtaining an average gap of only 0.3% for the remaining 1553 unsolved instances. Performance profiles that clearly demonstrate the superiority of OA+BC+f for each class of instances are reported in Appendix G.

We note that our implementation of the greedy algorithm requires, on average, about 1 second of computing time. Let us consider the gap of this algorithm on the instances that cannot be solved to optimality by OA+BC+f, as reported in the last column of Table 1 (this gap is obtained by replacing LB with the value of the greedy solution). By comparing the gaps obtained with OA+BC+f to those obtained with the greedy algorithm, we observe that the average gap on the instances that are not solved by OA+BC+f decreases from 1.5% to 0.3%. Overall, Table 1 shows that the quality of the incumbent solutions found by the B&C algorithm OA+BC+f is superior to that of the solutions found by the greedy algorithm, not only on the instances which OA+BC+f solves to optimality but also on those where it hits the time limit.

We now focus on the 7200 instances of the class k-card to identify the main instance features that affect the computing times of OA+BC+f. Figure 5 reports the computing times of OA+BC+f after aggregating the instances by the covering radius \({\hat{R}}\), the value of \(\lambda \), the number of items n, and the number of metaitems \({\hat{n}}\). The four box plots in the figure show that doubling the number of items doubles the average computing time, whereas increasing the number of metaitems by 50% increases it by a factor of 10. Increasing the covering radius (and, thus, the density of G) also results in slightly larger computing times. In spite of this, the very sparse instances obtained for \({\hat{R}}=2\) are extremely easy to solve for our B&C methods. For larger values of \(\lambda \), the instances tend to become easier. This can be due to the fact that, the larger \(\lambda \), the closer the function f is to being linear. Further analysis (not illustrated in the tables for reasons of space) indicates that increasing the number of scenarios from 50 to 100 doubles the average computing time, whereas changing the value of k from 10 to 15 does not have a significant impact on the computing time of OA+BC+f.

Fig. 5: Computing times (in logarithmic scale) of OA+BC+f for the k-card instances aggregated by: (i) the covering radius \({\hat{R}}\), (ii) the risk-tolerance \(\lambda \), (iii) the number of items n, and (iv) the number of metaitems \({\hat{n}}\). The number of instances solved within the time limit of 600 secs is reported above each boxplot.

Table 2 Detailed results obtained by using the OA+BC+f algorithm on the instances solved to proven optimality within a time limit of 600 seconds.

Table 2 illustrates the most relevant features concerning the execution of OA+BC+f (the best-performing algorithm) on the 22,447 instances it solves to optimality: LP gap (computed as \(\frac{Z^\text {LP} - Z}{Z^\text {LP}}\cdot 100\), where \(Z^\text {LP}\) is the optimal solution value of the LP relaxation at the root node and Z is the optimal solution value of the problem), number of generated inequalities, number of branching nodes, and average computing time. The column "LP time [s]" reports the average computing time necessary to solve the LP relaxation at the root node (including the time spent to generate the inequalities). The last column of the table, denoted by "time [s]", reports the total average computing time of OA+BC+f. As far as the average and maximum LP gaps are concerned, the table shows that very small LP gaps are obtained across all classes, suggesting that the tightness of the LP relaxation is one of the main drivers of the excellent performance of OA+BC+f. In particular, the gaps are never larger than about 1.25% and their average values are smaller than about 0.1%. The time taken to solve the LP relaxation is quite small and varies as a function of the instance class. As for the average number of Constraints (OA+BC\(_1\)) and (OA+BC\(_2\)) generated when separating fractional and integer points, the table shows that, as one may expect, many more cuts are generated when fractional solutions are separated. Such an increased number of inequalities is well compensated by the smaller LP gaps. The average number of nodes is small for most of the instances, further evidencing the superior performance of OA+BC+f. The only exceptions are the instances of the kp class, for which several thousand nodes are explored, albeit the time taken to solve each node is, on average, smaller than for the other instances. The average computing time is in line with the percentage of instances solved for the different classes, showing that the hardest instances are the ones subject to partition-matroid constraints. Interestingly, while the introduction of conflict constraints leads to larger LP gaps, it does not greatly affect the computing times, increasing them by only 14%, 22%, and 12% for, respectively, the k-card, kp, and p-matroid classes.

6.2 Experiments for the generalization of Problem (2) with a not increasing utility function

We report the results obtained on instances of Problem (2) where the utility function is not increasing and \({\hat{h}}_i\), \(i \in M\), is not submodular. We consider the utility function (g) introduced in Sect. 5, set \(\lambda \in \{0.25,0.5,0.75,1,2,3,4\}\) and \(\alpha =0.5\), and consider two values of \(m \in \{50,100\}\).

We consider three B&C algorithms: OA+SC+POLY, OA+BC+POLY, and OA+BC+f+POLY. In each of them, Constraints (OA+POLY) are separated at every integer point and also at every fractional point, the latter (based on preliminary experiments) only at the root node. The other constraints are separated as before, but only if \(g'\) is nonnegative at the point being separated: in OA+SC+POLY, we separate Constraints (OA+SC\(_1\))–(OA+SC\(_2\)) at integer points; in OA+BC+POLY, we separate Constraints (OA+BC\(_1\))–(OA+BC\(_2\)) at integer points; in OA+BC+f+POLY, we separate Constraints (OA+BC\(_1\))–(OA+BC\(_2\)) at both integer and fractional points.

As a testbed, we adopt a subset of 1120 instances of the subclass k-card with \({\hat{n}} = 40\), \(n \in \{5000, 10{,}000\}\), and \(k = 10\). In line with the previous experiments, each B&C algorithm is initialized with the greedy algorithm (see Sect. 6.1.2), halting it as soon as the marginal contribution becomes negative (note that, since \({\hat{h}}_i\), \(i \in M\), is not submodular, the greedy algorithm does not carry an approximation guarantee).

The computational results are summarized in Table 3. They are aggregated by the risk-tolerance parameter \(\lambda \) (for a total (# total) of 160 instances per value of \(\lambda \)). For each of the three B&C algorithms, we report the number of instances solved to optimality within the time limit of 600 seconds (# opt) and the average gap for the remaining instances (gap).

The results are in line with those obtained for the strictly increasing case, and they indicate that the outer approximation method combined with Benders cuts (OA+BC+POLY) clearly outperforms the outer approximation one combined with submodular cuts (OA+SC+POLY). This phenomenon is even more pronounced if Benders cuts are also separated at fractional points (OA+BC+f+POLY). Indeed, OA+BC+f+POLY solves the largest number of instances (616 out of 1120) and, on average, provides the smallest gaps for the unsolved ones (0.2%).

We note that the problem is harder for small values of \(\lambda \), for which the decreasing term of g becomes dominant. For \(\lambda =0.25, 0.5\), only 16 instances are solved to optimality by OA+BC+f+POLY, and the largest average gaps (between 0.5% and 0.9%) are obtained. On such instances, the performance of the three algorithms is almost comparable. For larger values of \(\lambda \), though, OA+BC+f+POLY becomes much better than the other two methods. In particular, all the instances with \(\lambda = 2,3,4\) are solved to optimality by it. On average, around 20 thousand (OA+POLY) constraints are generated at integer points (with a maximum of 50 thousand) and fewer than one thousand are generated at fractional points at the root node of the branching tree (with a maximum of three thousand). It is worth noting that the number of generated (OA+POLY) constraints tends to decrease as the value of \(\lambda \) increases. This is not surprising, since in this case the increasing part of the objective function becomes dominant.

These tests show that our decomposition method based on a double-hypograph reformulation (and, in particular, algorithm OA+BC+f+POLY) can be effectively applied to Problem (2) even when the utility function is not increasing and, thus, the objective function is not submodular and the Ref(SC) reformulation relying on the submodular cuts of Nemhauser et al. [29] cannot be adopted.

Table 3 Comparison of three B&C algorithms for the utility function (g), with \(\alpha =0.5\), aggregated w.r.t. the value of \(\lambda \).

6.3 Experiments when solving the problem of Ahmed and Atamtürk [2]

We now present the results obtained with our methods applied to Problem (1) with a single knapsack constraint, which is studied in Ahmed and Atamtürk [2] and Yu and Ahmed [39] and which Problem (2) generalizes. We adopt the reformulation Ref(OA) derived in Sect. 4.5. In these experiments, we separate its constraints at integer points only (the computational benefits of separating fractional points are negligible for these instances). We refer to the resulting B&C method as OA.

We compare OA to the state-of-the-art B&C method of Yu and Ahmed [39] (which outperforms the previous one proposed in Ahmed and Atamtürk [2]), to which we refer as BnC-LI (LI stands for lifted inequalities).

To test the performance of OA against BnC-LI, we consider the same set of instances tested in Yu and Ahmed [39], with \(n \in \{100,150,200\}\) (number of items), \(m \in \{50,100\}\) (number of scenarios), and \(\lambda \in \{0.8,1,2\}\) (parameter of the utility function \(f(z)=1 - \exp (-\frac{z}{\lambda })\)). For the way a and \(\beta \) are constructed, we refer the reader to Ahmed and Atamtürk [2]. For each value of these parameters, we consider 20 random instances as in Yu and Ahmed [39], for a total of 360 instances.

Since the source code of BnC-LI is not available, its results reported in Table 4 are taken directly from the table on page 161 of Yu and Ahmed [39], for which the authors used a Python code based on Gurobi 5.6.3, running on a 2.3 GHz x86 Linux workstation with 7 GB of RAM. While those results were obtained with Gurobi using an optimality gap of \(10^{-5}\), in our experiments we run CPLEX with a tighter optimality gap of \(10^{-9}\), thus obtaining solutions that are numerically more accurate.

In Table 4, we compare the performance of OA and BnC-LI, as well as of BARON (the latter used to solve the MINLP formulation of Problem (1) given in Sect. 1). The results are aggregated by the values of n, \(\lambda \), and m. For the two B&C methods OA and BnC-LI, we report the average number of cuts, the average number of B&C nodes, and the average computing time. For BARON, we report the average computing time.

While all three methods are able to solve every instance to optimality, they exhibit substantial differences in performance. As far as the computing time is concerned, the best approach is OA, which solves all the instances in an average time always smaller than 0.2 seconds, whereas the average computing time of BnC-LI can exceed one hundred seconds. OA scales much better than BnC-LI as the number of items n increases, while BnC-LI tends to struggle for small values of \(\lambda \). On average, the number of B&C nodes of OA is smaller (albeit only slightly) than that of BnC-LI, despite the fact that OA adopts a more stringent optimality tolerance. The difference is more substantial for the instances with a larger number of items n and a smaller value of \(\lambda \). As for BARON, its computing times, while smaller than those of BnC-LI, are up to three orders of magnitude larger than those of OA.

When inspecting the maximum values (not reported in Table 4 for reasons of space) rather than average ones, we observed that the maximum computing time of OA never exceeds 0.32 seconds, whereas for BnC-LI it is as high as 800 seconds and for BARON as high as 15 seconds. The maximum number of cuts for OA never exceeds 1600, whereas it can be as high as 12,864 for BnC-LI. The maximum number of nodes for OA never exceeds 1400, whereas it can be as high as 7031 for BnC-LI.

Overall, our analysis shows that, in spite of the experiments being run on different architectures and with different solvers, the difference in computing times between OA and BnC-LI is of up to three orders of magnitude in favor of OA. We believe that such a difference is mainly due to the different complexity of the corresponding separation algorithms. Indeed, while the separation in OA only takes linear time O(n), the one in BnC-LI requires at least quadratic time, as shown in Proposition 12 and Corollary 14 of Yu and Ahmed [39]. Moreover, the separation problem in BnC-LI can be solved in polynomial time only for integer solutions; for fractional solutions, a heuristic separation approach is adopted, differently from our implementation of OA, where fractional points are not separated.

Table 4 Average computational performance of OA, BnC-LI of Yu and Ahmed [39] and BARON when solving Problem (1) with a single knapsack constraint and employing function (f), aggregated w.r.t. n, \(\lambda \), or m.

7 Conclusions

In this article, we have studied a generalization of the problem of maximizing a concave, strictly increasing, and differentiable utility function over a discrete set studied by Ahmed and Atamtürk [2], obtained by combining the utility function with a set-union operator. We have introduced a double-hypograph reformulation for the problem that has allowed us to decompose it into its two main components, one related to the utility function and the other related to the set-union operator. We have then linearized the two components: we have replaced the utility function by its first-order outer approximation (OA) and the set-union operator by either submodular cuts (SCs) or Benders cuts (BCs). We have thus obtained two MILP reformulations: Ref(OA+SC) and Ref(OA+BC). We have studied the relationship between these two reformulations and the standard formulation Ref(SC) of Nemhauser et al. [29], and derived theoretical results comparing the strength of the associated constraints. In particular, we have shown that the constraints in Ref(OA+BC) are at least as tight as those in Ref(OA+SC) and that there are instances in which there is no dominance between these constraints and those of Ref(SC). For the special case in which the utility function is affine, we have shown that the constraints of Ref(SC) and Ref(OA+SC) are dominated by those of Ref(OA+BC). Considering Problem (1), we have shown that there is no dominance between the constraints proposed in Ahmed and Atamtürk [2], Yu and Ahmed [39] and those of our OA-based reformulation.

We have then extended our methods to the case where the utility function is not necessarily increasing and, thus, the objective function of the problem is not submodular, combining our double-hypograph reformulation with an epigraph reformulation of a submodular function obtained via polymatroid inequalities.

In our computational study, we have shown that a branch-and-cut approach based on Ref(OA+BC) drastically outperforms the alternative MILP reformulations as well as BARON, a state-of-the-art solver for mixed-integer nonlinear optimization, including the case where the utility function is not increasing. This is particularly true when tackling problems in which the number of items is many orders of magnitude larger than the number of metaitems. We remark that our approach can be straightforwardly generalized to the case in which a different utility function is considered in each scenario. The same is true if the utility function is piecewise linear. In particular, since, in such a case, only as many supporting functions as the number of pieces are needed, the overall approach is likely to be even more efficient.

Many open questions remain to be addressed for this rich optimization problem. One way to linearize the problem is to work in the (x, y) space and proceed along the lines of Ahmed and Atamtürk [2] to “project f out onto the x variables”, which is a way of deriving outer-approximation cuts for a hypograph reformulation of the problem. Once the linearization is performed in the (x, y) space, the x variables could be projected out in a Benders-like fashion. Finally, one could obtain an alternative MILP formulation in the y space by generating Generalized Benders Cuts (Geoffrion [15], Fischetti et al. [12]) derived from the associated convex continuous subproblem. The connection between these different types of decompositions and linearizations and the strength of the respective MILP reformulations (both in the (x, y) and in the y space) remains an open topic for future research.

Another line of research concerns the investigation of problems where the function \({\hat{q}}_i\) is not induced by the set-union operator. In particular, our decomposition approach and the inequalities we derived based on the outer approximation combined with submodular cuts can be applied to any problem involving a function \(h_i = f \circ {\hat{q}}_i\) where \({\hat{q}}_i\) is an increasing submodular set function. A notable example is the one where \({\hat{q}}_i\) is the rank (or weighted rank) function of a matroid, e.g., the value function of a graphic matroid whose ground set is determined by the set variable \({\hat{S}}\). In such a case, one could transform \({\hat{q}}_i\) into the value function \(\hat{{\mathcal {Q}}}_i\) of a suitable optimization problem and construct Benders cuts analogous to those we derived by leveraging the full description of the convex hull of the problem underlying \(\hat{{\mathcal {Q}}}_i\).