

Papers under Review and Research in Progress

On the Disclosure of Promotion Value in Platforms with Learning Sellers.

With Gregory Macnamara and Daniela Saban.

We consider a platform facilitating trade between sellers and buyers with the objective of maximizing consumer surplus. In many such platforms prices are set by revenue-maximizing sellers, but the platform may influence prices through its promotion policy (e.g., increasing demand for a certain product by assigning it a prominent position on the webpage), and through the information it reveals about the additional demand associated with being promoted. Identifying effective joint information design and promotion policies for the platform is a challenging dynamic problem, as sellers can sequentially learn the promotion "value" from sales observations and update prices accordingly. We introduce the notion of confounding promotion policies, which are designed to prevent a Bayesian seller from learning the promotion value (at the cost of diverting customers away from the best product offering). Leveraging this notion, we characterize the maximum long-run average consumer surplus that is achievable by the platform when the seller is myopic. We then establish that long-run average optimality can be maintained by optimizing over a class of joint information design and promotion policies under which the platform provides the seller with a (random) information signal at the beginning of the horizon, and then uses the best confounding promotion policy, which prevents the seller from further learning. Additionally, we show that myopic pricing is a best response to such a platform strategy, establishing an approximate Bayesian Nash equilibrium between the platform and the seller. Our analysis allows one to identify practical and effective platform policies in a broad range of demand models.

Sequential Procurement with Contractual and Experimental Learning.

With Gregory Macnamara and Daniela Saban.

We study the design of sequential procurement strategies that integrate stochastic and strategic information. We consider a buyer who repeatedly demands a certain good and is unable to commit to long-term contracts. In each time period, the buyer makes a price offer to a seller who has private, persistent information regarding his cost and quality of provision. If the offer is accepted, the seller provides the good with a stochastic quality that is not contractible. The buyer can therefore learn from the (strategic) acceptance decisions taken by the seller, and from the (stochastic) quality delivered whenever a purchase occurs. Hence, the buyer not only faces a tradeoff between exploration and exploitation, but also needs to decide how to explore: by facilitating quality observations, or by strategically separating seller types. We characterize the Perfect Bayesian Equilibria of this sequential interaction and show that the buyer’s equilibrium strategy consists of a dynamic sequence of thresholds on her belief about the seller’s type. In equilibrium, the buyer offers high prices that incentivize trade and quality experimentation at early stages of the interaction and, after gathering enough information, she may advance to offering low prices that partially separate seller types. Contrasting the buyer’s strategy with benchmark strategies designed to learn from each form of information in isolation, we identify the effect that strategic sellers may have on the buyer’s optimal strategy relative to more traditional learning dynamics, and establish that, paradoxically, when sellers are strategic, the ability to observe delivered quality is not always beneficial for the buyer.
  • finalist, INFORMS Revenue Management & Pricing Section 2019 student paper award (G. Macnamara)

Value Loss in Allocation Systems with Provider Guarantees.

With Dan A. Iancu and Xavier Warnes. 

Centralized planning systems routinely allocate tasks to workers or service providers in order to generate the maximum possible value. These allocations can also critically influence the service providers’ well-being, and thus the planning systems are often concerned with ensuring that their allocations satisfy particular desirable attributes. In some cases, such guarantees may induce a loss in the total value created or in the system’s share of that value. We provide a broad framework that allows quantifying the magnitude of such value losses due to provider guarantees. We consider a general class of guarantees that includes considerations of practical interest arising in the design of sustainable two-sided markets, workforce welfare and compensation, and sourcing and payments in supply chains, among other application domains. We derive tight bounds on the relative value loss, and show that this loss is limited for any restriction included in our general class. Our analysis shows that when many providers are present, the largest losses are driven by fairness considerations; but when few providers are present, the main loss driver is the heterogeneity in the providers’ ability to generate value. We study additional loss drivers such as the variation in the value of jobs and the supply-demand balance, and find that less variability in value and a more balanced supply-demand ratio may lead to larger losses. Using both real-world and synthetic data, we demonstrate the impact of loss drivers and show that the relative loss can be small in cases of practical interest.

Adaptive Sequential Experiments with Unknown Information Flows.

With Ahmadreza Momeni.

When information is limited, online recommendation services aim to strike a balance between maximizing immediate payoffs based on available information, and acquiring new information that is essential for maximizing future payoffs. This trade-off is captured by the multi-armed bandit (MAB) framework that has been studied and applied for designing sequential experiments when at each time epoch a single observation is collected on the action that was selected at that epoch. However, in many cases, additional information may become available between decision epochs. We introduce a generalized MAB formulation in which auxiliary information may appear arbitrarily over time. By obtaining matching lower and upper bounds, we characterize the minimax complexity of this family of problems as a function of the information arrival process, and study how salient characteristics of this process impact policy design and achievable performance. We establish that while Thompson sampling and upper-confidence bound policies leverage additional information naturally, other policies do not exhibit such robustness. We introduce a virtual time indexing method for dynamically controlling the exploration rate of policies, and apply it to design additional policies that, without any prior knowledge on the information arrival process, attain the best performance that is achievable. We use data from a large media site to empirically analyze the value that may be captured in practice by leveraging auxiliary information flows for content recommendations. We further apply our method to a contextual bandit framework that allows for personalized recommendations using idiosyncratic consumer characteristics.
  • finalist, INFORMS Applied Probability Society 2018 student paper award (A. Momeni)
  • early version appeared in Advances in Neural Information Processing Systems 31 (NeurIPS 2018), S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (editors), 7473-7482. Poster available here and 3-minute video available here


Published Research Papers

Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-Stationary Rewards.

With Omar Besbes and Assaf Zeevi. Stochastic Systems, forthcoming.

Multi-armed bandit problems have been used for modeling a broad variety of practical settings where uncertainty on rewards associated with different actions requires the design of sequential experimentation. This class of problems has been studied extensively when the reward distributions do not change over time, and uncertainty is purely spatial and essentially amounts to identifying the optimal action. We complement this literature by developing a flexible nonparametric model for temporal uncertainty in the rewards. The extent of temporal uncertainty is measured via the cumulative change of mean rewards over the horizon, a metric we refer to as temporal variation, and regret is measured relative to a dynamic oracle that plays the pointwise optimal action at each instant in time. Assuming that nature can choose any sequence of mean rewards such that their temporal variation does not exceed some temporal uncertainty budget, we characterize the complexity of this problem via the minimax regret, which depends on this budget (the hardness of the problem), the horizon length, and the number of arms. When the uncertainty budget is not known a priori, we develop a family of policies that adapt to the realized variation in rewards and provide numerical evidence suggesting that these policies are nearly minimax optimal.
  • early version appeared in Advances in Neural Information Processing Systems 27 (NIPS 2014), Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence and K. Q. Weinberger (editors), 199-207
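One natural way to handle a temporal uncertainty budget is to periodically restart an adversarial bandit routine such as Exp3, so that stale reward estimates cannot mislead the learner after the environment drifts. The sketch below illustrates this batching idea (constants and the batch-size formula are illustrative choices for this sketch, not tuned values):

```python
import math
import random

def rexp3(T, K, V_T, draw_reward):
    """Restarting-Exp3 sketch: split the horizon into batches whose
    length grows with T / V_T, and reset the Exp3 weights at every
    batch boundary so old information is forgotten."""
    batch = max(1, math.ceil((K * math.log(K)) ** (1 / 3)
                             * (T / max(V_T, 1e-9)) ** (2 / 3)))
    gamma = min(1.0, math.sqrt(K * math.log(K) / ((math.e - 1) * batch)))
    total = 0.0
    t = 0
    while t < T:
        w = [1.0] * K                     # restart: reset Exp3 weights
        for _ in range(min(batch, T - t)):
            sw = sum(w)
            probs = [(1 - gamma) * wi / sw + gamma / K for wi in w]
            arm = random.choices(range(K), probs)[0]
            r = draw_reward(t, arm)       # reward assumed in [0, 1]
            total += r
            # importance-weighted exponential update for the played arm
            w[arm] *= math.exp(gamma * (r / probs[arm]) / K)
            t += 1
    return total
```

Because each batch starts from uniform weights, the learner re-identifies the currently best arm even if the identity of that arm changed since the previous batch.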

Learning in Repeated Auctions with Budgets: Regret Minimization and Equilibrium.

With Santiago R. Balseiro. Management Science, 2019, 65 (9), 3952-3968. 

In online advertising markets, advertisers often purchase ad placements through bidding in repeated auctions based on realized viewer information. We study how budget-constrained advertisers may compete in such sequential auctions in the presence of uncertainty about future bidding opportunities and competition. We formulate this problem as a sequential game of incomplete information, where bidders know neither their own valuation distribution, nor the budgets and valuation distributions of their competitors. We introduce a family of practical bidding strategies we refer to as adaptive pacing strategies, in which advertisers adjust their bids according to the sample path of expenditures they exhibit, and analyze the performance of these strategies in different competitive settings. We establish the asymptotic optimality of these strategies when competitors’ bids are independent and identically distributed over auctions, but also when competing bids are arbitrary. When all the bidders adopt these strategies, we establish the convergence of the induced dynamics and characterize a regime (well motivated in the context of online advertising markets) under which these strategies constitute an approximate Nash equilibrium in dynamic strategies: the benefit from unilaterally deviating to other strategies, including ones with access to complete information, becomes negligible as the number of auctions and competitors grows large. This establishes a connection between regret minimization and market stability, by which advertisers can essentially follow approximate equilibrium bidding strategies that also ensure the best performance that can be guaranteed off equilibrium.

The Competitive Facility Location Problem for a Duopoly: Advances Beyond Trees.

With Daniela Saban and Nicolas E. Stier-Moses. Operations Research, 2018, 66 (4), 1058-1067. 

We consider a competitive facility location problem on a network where consumers located on vertices wish to connect to the nearest facility. Knowing this, each competitor locates a facility on a vertex, trying to maximize market share. We focus on the two-player case and study conditions that guarantee the existence of a pure-strategy Nash equilibrium for progressively more complicated classes of networks. For general graphs, we show that attention can be restricted to a subset of vertices referred to as the central block. By constructing trees of maximal bi-connected components, we obtain sufficient conditions for equilibrium existence. Moreover, when the central block is a vertex or a cycle (for example, in cactus graphs), this provides a complete and efficient characterization of equilibria. In that case, we show that both competitors locate their facilities in a solution to the 1-median problem, generalizing a well-known insight arising from Hotelling’s model. We further show that an equilibrium must solve the 1-median problem in other classes of graphs, including grids, which essentially capture the topology of urban networks. In addition, when both players select a 1-median, the solution must be at equilibrium for strongly chordal graphs, generalizing a previously known result for trees.
  • online companion available here
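The 1-median notion that appears in the equilibrium characterization is easy to make concrete: it is the vertex minimizing the total demand-weighted distance to all consumers. A short sketch for unweighted graphs (adjacency-list input format assumed for illustration):

```python
from collections import deque

def one_median(adj, demand):
    """Return a vertex minimizing total demand-weighted distance to all
    consumers (a 1-median). One BFS per vertex on an unweighted graph."""
    n = len(adj)
    best, best_cost = None, float("inf")
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        q = deque([s])
        while q:                          # BFS from candidate location s
            u = q.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    q.append(v)
        cost = sum(demand[v] * dist[v] for v in range(n))
        if cost < best_cost:
            best, best_cost = s, cost
    return best
```

On a path graph with uniform demand the 1-median is the middle vertex, which is exactly the point where Hotelling-style competitors co-locate.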

Framework Agreements in Procurement: An Auction Model and Design Recommendations.

With Lijian Lu and Gabriel Y. Weintraub. Manufacturing & Service Operations Management, 2017, 19 (4), 586-603. 

Framework agreements (FAs) are procurement mechanisms commonly used by buying agencies around the world to satisfy demand that arises over a certain time horizon. This paper is one of the first in the literature that provides a formal understanding of FAs, with a particular focus on the cost uncertainty bidders face over the FA time horizon. We generalize standard auction models to include this salient feature of FAs and analyze this model theoretically and numerically. First, we show that FAs are subject to a sort of winner’s curse that in equilibrium induces higher expected buying prices relative to running first-price auctions as needs arise. Then, our results provide concrete design recommendations that alleviate this issue and decrease buying prices in FAs, highlighting the importance of (i) monitoring the price charged at the open market by the FA winner to bound the buying price; (ii) implementing price indexes for the random part of suppliers’ costs; and (iii) allowing suppliers the flexibility to reduce their prices to compete with the open market throughout the selling horizon. These prescriptions are already being used by the Chilean government procurement agency that buys US$2 billion worth of contracts yearly using FAs.
  • early version appeared in Charting a Course in Public Procurement, Innovation & Knowledge Sharing, International Public Procurement Conference Publications (practitioners' outlet), 2012
  • online companion available here and additional supporting material available here

Optimization in Online Content Recommendation Services: Beyond Click-Through Rates.

With Omar Besbes and Assaf Zeevi. Manufacturing & Service Operations Management, 2016, 18 (1), 15-33. 

A new class of online services allows Internet media sites to direct users from articles they are currently reading to other content they may be interested in. This process creates a “browsing path” along which there is potential for repeated interaction between the user and the provider, giving rise to a dynamic optimization problem. A key metric that often underlies this recommendation process is the click-through rate (CTR) of candidate articles. Whereas CTR is a measure of instantaneous click likelihood, we analyze the performance improvement that one may achieve by some lookahead that accounts for the potential future path of users. To that end, using data on user path histories at major media sites, we introduce and derive a representation of content along two key dimensions: clickability, the likelihood to click to an article when it is recommended; and engageability, the likelihood to click from an article when it hosts a recommendation. We then propose a class of heuristics that leverage both clickability and engageability, and provide theoretical support for favoring such path-focused heuristics over myopic heuristics that focus only on clickability (no lookahead). We conduct a live pilot experiment that measures the performance of a practical proxy of our proposed class, when integrated into the operating system of a worldwide leading provider of content recommendations, allowing us to estimate the aggregate improvement in clicks per visit relative to the CTR-driven current practice. The documented improvement highlights the importance and the practicality of efficiently incorporating the future path of users in real time.
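The gap between myopic and path-focused scoring can be seen in a toy one-step lookahead rule (hypothetical field names and a single assumed continuation value V; the paper's heuristics are more general): an article with a slightly lower CTR can still be the better recommendation if users who click it tend to keep browsing.

```python
def myopic_pick(candidates):
    """Pick the article with the highest click-through rate (no lookahead)."""
    return max(candidates, key=lambda a: a["clickability"])

def lookahead_pick(candidates, continuation_value=1.0):
    """One-step lookahead sketch: score = P(click) * (1 + engageability * V),
    where engageability proxies the chance the user clicks onward from the
    article and V is an assumed value of the continued browsing path."""
    def score(a):
        return a["clickability"] * (1.0 + a["engageability"] * continuation_value)
    return max(candidates, key=score)
```

For example, with candidates (clickability 0.30, engageability 0.1) and (0.28, 0.9), the myopic rule picks the first while the lookahead rule picks the second, trading a small drop in immediate click likelihood for a longer expected browsing path.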

Non-Stationary Stochastic Optimization.

With Omar Besbes and Assaf Zeevi. Operations Research, 2015, 63 (5), 1227-1244. 

We consider a non-stationary variant of a sequential stochastic optimization problem, in which the underlying cost functions may change along the horizon. We propose a measure, termed variation budget, that controls the extent of said change, and study how restrictions on this budget impact achievable performance. We identify sharp conditions under which it is possible to achieve long-run average optimality and more refined performance measures such as rate optimality that fully characterize the complexity of such problems. In doing so, we also establish a strong connection between two rather disparate strands of literature: (1) adversarial online convex optimization and (2) the more traditional stochastic approximation paradigm (couched in a non-stationary setting). This connection is the key to deriving well-performing policies in the latter, by leveraging structure of optimal policies in the former. Finally, tight bounds on the minimax regret allow us to quantify the “price of non-stationarity,” which mathematically captures the added complexity embedded in a temporally changing environment versus a stationary one.
  • honorable mention, INFORMS 2013 George Nicholson student paper award
  • online companion available here
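The restarting idea behind well-performing policies in this setting can be sketched in a few lines (a simplified deterministic illustration; the batch-size exponent is an assumption for this sketch, not the paper's tuned choice): the horizon is split into batches sized to the variation budget, and gradient descent restarts at each batch boundary so information about outdated cost functions is discarded.

```python
import math

def restarting_ogd(T, V_T, grad, x0, step=0.1):
    """Restarting gradient-descent sketch for non-stationary optimization:
    restart from x0 every `batch` periods, where the batch length grows
    with T / V_T (illustrative exponent)."""
    batch = max(1, math.ceil((T / max(V_T, 1e-9)) ** (2 / 3)))
    xs = []
    x = x0
    for t in range(T):
        if t % batch == 0:
            x = x0                        # forget stale estimates and restart
        xs.append(x)
        x = x - step * grad(t, x)         # (possibly noisy) gradient step on f_t
    return xs
```

Within each batch the iterates converge toward the current minimizer; the restarts cap how far the iterates can lag behind a drifting sequence of cost functions, which is the mechanism behind the rate-optimality results.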