Publications
Selected peer-reviewed publications, proceedings, book chapters, and preprints in applied AI, mobility, computational social science, geospatial analytics, consumer behavior, and decision science.
Selected Publications
Let Me Share My Groupware: An Analysis of Groupware as a Remote Collaboration Medium
Fatih Oztank, Mohsen Bahrami, Selim Balcisoy
The increased popularity of remote work has led to wide adoption of videoconferencing for remote collaboration. However, videoconferencing is limited in conveying non-verbal cues, which can reduce the overall collaboration experience. This exploratory study investigates the potential of groupware tools as an alternative to videoconferencing, given their enhanced capability to transmit non-verbal cues. We implemented a groupware system and conducted a user study with 60 participants to assess collaboration under three conditions: in-person, videoconferencing, and groupware. Our findings show that groupware offered a better user experience and a higher task success rate, highlighting its potential as a medium for remote collaboration.
NAICS-Aware Graph Neural Networks for Large-Scale POI Co-visitation Prediction: A Multi-Modal Dataset and Methodology
Yazeed Alrubyli, Omar Alomeir, Abrar Wafa, Diána Hidvégi, Hend Alrasheed, Mohsen Bahrami
Understanding where people go after visiting one business is crucial for urban planning, retail analytics, and location-based services. However, predicting these co-visitation patterns across millions of venues remains challenging due to extreme data sparsity and the complex interplay between spatial proximity and business relationships. Traditional approaches using only geographic distance fail to capture why coffee shops attract different customer flows than fine dining restaurants, even when co-located. We introduce NAICS-aware GraphSAGE, a novel graph neural network that integrates business taxonomy knowledge through learnable embeddings to predict population-scale co-visitation patterns. Our key insight is that business semantics, captured through detailed industry codes, provide crucial signals that pure spatial models cannot explain. The approach scales to massive datasets (4.2 billion potential venue pairs) through efficient state-wise decomposition while combining spatial, temporal, and socioeconomic features in an end-to-end framework. Evaluated on our POI-Graph dataset comprising 94.9 million co-visitation records across 92,486 brands and 48 US states, our method achieves significant improvements over state-of-the-art baselines: the R-squared value increases from 0.243 to 0.625 (a 157 percent improvement), with strong gains in ranking quality (32 percent improvement in NDCG at 10).
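NDCG at 10 scores how well the top ten predicted venues are ordered compared with an ideal ordering. The sketch below shows the standard linear-gain form of the metric, which is not necessarily the exact variant used in the paper; the input format (relevance values such as observed co-visit counts, listed in predicted rank order) is a hypothetical illustration.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    # Position 0 is discounted by log2(2) = 1, position 1 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances_in_predicted_order, k=10):
    """NDCG@k: DCG of the predicted ranking divided by DCG of the ideal ranking."""
    ideal = sorted(relevances_in_predicted_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items: define the score as 0
    return dcg_at_k(relevances_in_predicted_order, k) / ideal_dcg


# Hypothetical co-visit counts for candidate venues, in the order a model ranked them.
print(ndcg_at_k([3, 0, 2, 1]))
```

A perfectly ordered list scores 1.0, and any misordering within the top ten lowers the score, which is why the metric complements R-squared as a measure of ranking quality.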
Customer Behavioral Shifts as a Result of the COVID-19 Pandemic: Are They “Sticky”?
Yilun Xu, Mohsen Bahrami, Alex Pentland
This study uses a quantitative approach to examine the COVID-19 pandemic's impact on customer preferences for shopping destinations and whether that impact persists. It uses a modified version of the Huff gravity model to quantify temporal shifts in customers' preferences when choosing a store. The study draws on large-scale mobility and place datasets and census information to analyze department stores in New York City. Using clustering techniques and statistical inference models, it estimates the immediate and long-term impact of the COVID-19 pandemic on customer behavioral shifts, highlighting the heterogeneity of responses among different socioeconomic communities. The findings show that the proposed model effectively captures the dynamics of shopping location decisions and temporal visit patterns, allowing managers and marketers to understand customer preferences and adapt their strategies accordingly. Retailers should use quantitative methods to update their assumptions based on new consumption patterns rather than expect a complete return to pre-pandemic consumer behavior.
Effect of mobile food environments on fast food visits
Bernardo Garcia Bulle Bueno, Abigail L Horn, Brooke M Bell, Mohsen Bahrami, Burcin Bozkaya, Alex Pentland, Kayla de la Haye, Esteban Moro Egido
Poor diets, including those high in fast food, are a leading cause of morbidity and mortality. Exposure to low-quality food environments, such as ‘food swamps’ saturated with fast food outlets (FFO), is hypothesized to negatively impact diet and related disease. However, research linking such exposure to diet and health outcomes has generated mixed findings and led to unsuccessful policy interventions. A major research limitation has been the predominant focus on static food environments around the home, such as food deserts and swamps, and the sparse information on the mobile food environments people are exposed to and the food outlets they visit as they move throughout the day. In this work, we leverage population-scale mobility data to examine people’s visits to food outlets and FFO in and beyond their home neighborhoods, and to evaluate how food choice is influenced by features of the food environments people are exposed to in their daily routines vs. individual preference. Using a semi-causal framework and various natural experiments, we find that 10% more FFO in an area increases the odds of people visiting an FFO by approximately 20%. This strong influence of the food environment is similar on weekends and weekdays and is largely independent of individual income. Using our results, we investigate multiple intervention strategies for food environments to reduce FFO visits. We find that optimal locations for intervention are a combination of areas where i) the prevalence of FFO is highest, ii) most decisions about food outlet visits are made, and, most importantly, iii) visitors’ food decisions are most susceptible to the environment. Multi-level interventions at the individual-behavior and food-environment levels that target areas combining these features could have 1.7x to 4x larger effects than traditional interventions that alter food swamps or food deserts.
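Because the estimate above is stated in odds, it does not translate one-to-one into probabilities. A worked sketch with a hypothetical baseline (not a figure from the study): if the baseline probability of visiting an FFO is 20%, the odds are 0.20 / 0.80 = 0.25; a 20% increase in odds gives 0.30, which corresponds to a probability of 0.30 / 1.30, or about 23.1%.

```python
def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Convert a probability to odds, scale the odds, and convert back."""
    if not 0.0 < baseline_probability < 1.0:
        raise ValueError("baseline_probability must be strictly between 0 and 1")
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical 20% baseline; 1.2 represents the ~20% increase in odds reported above.
new_p = apply_odds_ratio(0.20, 1.2)
print(f"{new_p:.3f}")  # 0.231
```

The gap between a 20% change in odds and a roughly 3-percentage-point change in probability at this baseline is one reason odds-based effects should be read against a baseline rate.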
Investigating neighborhood adaptability using mobility networks: a case study of the COVID-19 pandemic
Hasan Alp Boz, Mohsen Bahrami, Selim Balcisoy, Burcin Bozkaya, Nina Mazar, Aaron Nichols, Alex Pentland
What predicts a neighborhood's resilience and adaptability to essential public health policies and shelter-in-place regulations that prevent the harmful spread of COVID-19? To answer this question, in this paper we present a novel application of human mobility patterns and human behavior in a network setting. We analyze mobility data in New York City over two years, from January 2019 to December 2020, and create weekly mobility networks between Census Block Groups by aggregating Point of Interest level visit patterns. Our results suggest that both the socioeconomic and geographic attributes of neighborhoods significantly predict neighborhood adaptability to the shelter-in-place policies active at that time. That is, our findings and simulation results reveal that in addition to factors such as race, education, and income, geographical attributes such as access to amenities in a neighborhood that satisfy community needs were equally important factors for predicting neighborhood adaptability and the spread of COVID-19. The results of our study provide insights that can enhance urban planning strategies that contribute to pandemic alleviation efforts, which in turn may help urban areas become more resilient to exogenous shocks such as the COVID-19 pandemic.
Predicting merchant future performance using privacy-safe network-based features
Mohsen Bahrami, Hasan Alp Boz, Yoshihiko Suhara, Selim Balcisoy, Burcin Bozkaya, Alex Pentland
Small and Medium-sized Enterprises play a significant role in most economies by contributing to job creation and economic growth. A majority of such merchants rely on business financing, and thus, financial institutions and investors need to assess their performance before making decisions on business loans. However, current methods of predicting merchants’ future performance involve their private internal information, such as revenue and customer base, which cannot be shared without potentially exposing critical information. To address this problem, we first propose a novel approach to predicting merchants’ future performance using credit card transaction data. Specifically, we construct a merchant network, regarding customers as bridges between merchants, and extract features from the constructed network structure for prediction purposes. Our study results demonstrate that the performance of machine learning models with features extracted from our proposed network is comparable to those with conventional revenue- and customer-based features, while maintaining higher privacy levels when shared with third-party organizations. Our approach offers a practical solution to privacy concerns over data and information required for merchants’ performance prediction, enabling safe data-sharing among financial institutions and investors, helping them make more informed decisions on allocating their financial resources while ensuring that merchants’ sensitive information is kept confidential.
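One common way to build a network in which customers act as bridges between merchants is a one-mode projection of the customer–merchant bipartite graph: two merchants are linked, with a weight equal to the number of distinct customers they share. The sketch below assumes that construction and computes two simple structural features (degree and weighted degree); the paper's exact edge definition and feature set may differ, and the `(customer_id, merchant_id)` input format and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def merchant_network(transactions):
    """Project (customer_id, merchant_id) pairs onto a merchant-merchant graph.

    Edge weight = number of distinct customers shared by the two merchants.
    """
    merchants_by_customer = defaultdict(set)
    for customer, merchant in transactions:
        merchants_by_customer[customer].add(merchant)  # set removes repeat visits

    edges = defaultdict(int)
    for merchants in merchants_by_customer.values():
        # Every pair of merchants visited by the same customer gets +1.
        for a, b in combinations(sorted(merchants), 2):
            edges[(a, b)] += 1
    return dict(edges)


def node_features(edges):
    """Per-merchant degree and strength (sum of edge weights).

    Merchants with no shared customers have no edges and are omitted.
    """
    degree, strength = defaultdict(int), defaultdict(int)
    for (a, b), weight in edges.items():
        for merchant in (a, b):
            degree[merchant] += 1
            strength[merchant] += weight
    return {m: {"degree": degree[m], "strength": strength[m]} for m in degree}


tx = [("c1", "A"), ("c1", "B"), ("c2", "A"), ("c2", "B"), ("c2", "C"), ("c3", "C")]
edges = merchant_network(tx)
print(edges)  # {('A', 'B'): 2, ('A', 'C'): 1, ('B', 'C'): 1}
print(node_features(edges))
```

Features like these describe a merchant's position among its peers without exposing its revenue or customer list directly, which is the privacy trade-off the abstract describes.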
Disaggregating sales prediction: A gravitational approach
Carla Freitas Silveira Netto, Mohsen Bahrami, Vinicius Andrade Brei, Burcin Bozkaya, Selim Balcisoy, Alex Pentland
Whenever companies plan to enter new geographical areas, they need sales predictions disaggregated by location. Making such predictions requires sales time series or final-customer data at a disaggregated geographic level; however, for most companies, such datasets are unavailable or impractical to obtain. The manuscript has two main goals: first, to disaggregate an aggregate sales prediction without historical proportions; second, to improve spatial models using Point of Interest (POI) data. To address these problems, we combine two literature streams, spatial marketing and sales forecasting, and propose a new hybrid probabilistic approach: Gravitational Sales Prediction (GSP). Our approach uses POI data to estimate area attraction, customer stocks, and flows to predict sales proportions, which we then use to disaggregate an aggregate forecast. GSP is validated using sales data from two countries and more than ten economic segments. Compared with a strong benchmark that relies on past sales proportions, GSP exceeded expectations: it matched the benchmark's performance and outperformed it in some locations. It showed the most promising results at the middle level of aggregation. The result is a powerful and flexible approach that can be embedded in any decision support system.
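The final disaggregation step described above, turning predicted sales proportions into location-level sales, amounts to scaling an aggregate forecast by shares that sum to one. A minimal sketch, assuming each location already has a non-negative score standing in for whatever GSP estimates from POI data; the location names and scores are hypothetical.

```python
def disaggregate(aggregate_forecast, location_scores):
    """Split an aggregate sales forecast across locations in proportion to scores."""
    total = sum(location_scores.values())
    if total <= 0:
        raise ValueError("location scores must sum to a positive value")
    # Normalizing by the total turns scores into proportions that sum to 1.
    return {loc: aggregate_forecast * score / total for loc, score in location_scores.items()}


print(disaggregate(1000.0, {"north": 2.0, "south": 1.0, "east": 1.0}))
# {'north': 500.0, 'south': 250.0, 'east': 250.0}
```

Keeping proportion estimation separate from the aggregate forecast means any national or regional forecasting model can be paired with the spatial proportions.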
Social Behavior and COVID-19: Analysis of the Social Factors behind Compliance with Interventions across the United States
Morteza Maleki, Mohsen Bahrami, Monica Menendez, Jose Balsa-Barreiro
Since its emergence, COVID-19 has had a major health and social impact. Governments and health authorities have attempted to minimize this impact by enforcing different mandates. Recent studies have addressed the relationship between various socioeconomic variables and levels of compliance with these interventions. However, little attention has been paid to what shapes people’s responses and whether people behave differently when faced with different interventions. Data collected from different sources show very significant regional differences across the United States. In this paper, we attempt to shed light on how responses may differ depending on health system capacity and each individual’s social status. To that end, we analyze the correlation between different societal variables (e.g., education, income levels, population density) and healthcare capacity-related variables (e.g., hospital occupancy rates, percentage of essential workers) and people’s level of compliance with three main governmental mandates in the United States: mobility restrictions, mask adoption, and vaccine participation. Our aim was to isolate the most influential variables affecting behavior in response to these policies. We found a significant relationship between individuals’ educational levels and political preferences and their compliance with each of these mandates.
Voyage Viewer: Empowering Human Mobility at a Global Scale
Isabella Loaiza, Tobin South, Germán Sánchez, Serena Chan, Alice Yu, Felipe Montes, Mohsen Bahrami, Alex Pentland
The challenge of refugee relocation is fertile ground for a new direction in the quest for extended human intelligence: developing systems that leverage big data and the power of social learning to provide personalized visual analytics for big life decisions. To probe this new avenue, this paper presents Voyage Viewer, a novel open-access multi-stream data dashboard. It helps individuals make their own relocation and migration decisions through personalized queries and visualizations, in contrast to previous top-down approaches that use algorithms to match individuals and places, as in some refugee relocation programs. Voyage Viewer aims to foster social learning among community members to improve the match between migrants and their potential new communities so that both can reap the benefits of the move.
Using gravity model to make store closing decisions: A data driven approach
Mohsen Bahrami, Yilun Xu, Miles Tweed, Burcin Bozkaya, Alex Pentland
Many studies propose methods for finding the best location for new stores and facilities, but few address the store closing problem. As a result of the COVID-19 pandemic, many companies have been facing financial difficulties. In this situation, one of the most common ways to prevent losses is to downsize by closing one or more stores in a chain. Such decisions are usually based on single-store performance, so under-performing stores are the ones subject to closure. This study first proposes a multiplicative variation of the well-known Huff gravity model and introduces a new attractiveness factor to the model. A forward–backward approach is then used to train the model and predict customer response and revenue loss after the hypothetical closure of a particular store in a chain. The study examines department stores in New York City using large-scale spatial, mobility, and spending datasets. The case study results suggest that the stores the proposed model recommends closing may not always match those indicated by single-store performance, and they emphasize that a chain's performance results from interactions among its stores rather than being a simple sum of the stores' performances considered as isolated, independent units. The proposed approach provides managers and decision-makers with new insights into store closing decisions and will likely reduce revenue loss due to store closures.
Post-Pandemic Economic Transformations in the United States of America
Avi Chawla, Nidhi Mulay, Mohsen Bahrami, Vikas Bishnoi, Yatin Katyal, Esteban Moro, Ankur Saraswat, Alex Pentland
The COVID-19 pandemic has affected economic activity not only in the United States but across the globe. Lockdowns and travel restrictions imposed by local authorities have changed customer preferences and thus shifted economic activity from traditional areas to new regions. While most of these changes have been temporary, some have been observed to be permanent. Using large-scale aggregated and anonymized transaction data across various socio-economic groups, we analyze and discuss such temporary relocation of citizens' economic activities in metropolitan areas of 15 US states. The results of this study have extensive implications for urban planners and business owners and can provide insights into the temporary relocation of economic activities resulting from an extreme exogenous shock such as the COVID-19 pandemic.
FMSClusterFinder: A new tool for detection and identification of clusters of sequential motifs with varying characteristics inside genomic sequences
Mohammad Mahdi Hejazi, Faegheh Golabi, Mohsen Bahrami, Houman Kahroba, Mohammad Saeid Hejazi
This paper describes FMSClusterFinder, a new tool and algorithm for identifying and detecting clusters of sequential blocks inside DNA and RNA subject sequences. Gene expression and the performance of genomic groups are controlled by functional elements that cooperate with each other as clusters. These functional motifs, or blocks, are often comparatively short and degenerate, and are located at varying distances from each other. Since functional motifs mostly act in relation to each other as clusters, finding such clusters of blocks is an effective approach to identifying functional groups and their function and structure, which motivates the development of new tools for this task. The presented web application simultaneously finds clusters of sequential blocks inside the subject sequences, even when the blocks have altered sequences and are located at varying distances from each other. Additionally, the blocks can be searched with user-defined constant or varying characteristics, such as a) different levels of similarity, b) a varying minimum number of blocks required to build up the query cluster, c) different types of sequence (degenerate or standard), and d) one or multiple alternative sequences for each block.
Effects of stimulus checks on spending patterns of different economic groups
Nidhi Mulay, Vikas Bishnoi, Yatin Katyal, Mohsen Bahrami, Esteban Moro, Ankur Saraswat, Alex Pentland
This paper uses daily anonymized, aggregated transaction data to analyze the changes in consumer spending caused by receipt of stimulus payments in the United States during the COVID-19 pandemic. The first round of stimulus checks was provided as part of the CARES Act, which aimed to provide emergency assistance for individuals and businesses affected by the pandemic. We analyze the impact of receiving those payments on the aggregated daily spending of different socio-economic groups and industries. We show that, among the different spending groups, the transaction patterns of low-spending consumers were the most affected by the stimulus payments. Our results also indicate that consumer responses after the first stimulus check (April 2020) were substantial and significant in industries that sell daily essential items, whereas consumer responses after the third stimulus check (March 2021) were significant in non-essential goods (e.g., the luxury and entertainment sectors). These results are important because they could help policymakers better design stimulus payments that may be needed in future emergencies.
Investigating mobility-based fast food outlet visits as indicators of dietary intake and diet-related disease
Abigail L Horn, Brooke M Bell, Bernardo Garcia Bulle Bueno, Mohsen Bahrami, Burcin Bozkaya, Yan Cui, John P Wilson, Alex Pentland, Esteban Moro, Kayla de la Haye
Food environments, where people acquire and consume food, affect diet and related diseases (i.e., nutritional health). To date, research has focused on predefined, local, and static food environments, largely those of the home neighborhood. Their features (e.g., the availability of fast food outlets) can predict nutritional health, although findings are mixed. A growing proportion of food acquisition occurs miles from people’s homes; therefore, the limited focus on static food environments may be one cause of these mixed results. A major gap in the literature is evidence on the dynamic food environments people are exposed to in their daily routines (i.e., their “activity space”), the food outlets they visit, and how these mobile food environments affect dietary intake and health. With the availability of big data on human mobility (i.e., geolocations captured by people’s smartphones), population-level research on the food outlets that people have access to and visit given their daily movements is now possible. Some studies have begun to use GPS tracking technologies to continuously observe how people navigate their environment to acquire food over relatively brief time periods (i.e., 1 week). However, to our knowledge, large-scale mobility data have not been used to study food environments and their connection with nutritional health. This study undertakes a critical first step in this line of research: investigating whether visits to food outlets observed in population-level mobility data provide meaningful indicators of dietary intake and diet-related disease. We focus these analyses on fast food (FF) outlets specifically because FF intake is linked to disease risk and makes up 16% of Americans’ caloric intake, and because FF outlets are well distributed across food environments. We use a large mobility dataset from Los Angeles County (LAC), U.S.A., to generate neighborhood-level measures of visits to FF outlets.
The first objective was to determine whether visits to FF outlets from population mobility data are a meaningful indicator of individuals’ self-reported FF intake. The second objective was to determine whether visits to FF outlets (mobility data) are a meaningful predictor of individuals’ obesity and diabetes, and a comparable or better predictor than self-reported FF intake.
Validating Gravity-Based Market Share Models Using Large-Scale Transactional Data
Yoshihiko Suhara, Mohsen Bahrami, Burçin Bozkaya, Alex Pentland
Customer patronage behavior has been widely studied in market share modeling, which is an essential step toward estimating retail sales and finding new store locations in a competitive setting. Existing studies have conducted surveys to estimate merchants' market share and attractiveness factors for use in various proposed mathematical models. Recent trends in Big Data analysis allow us to better understand human behavior and decision making, potentially leading to location models with more realistic assumptions. In this article, we propose a novel approach for validating the Huff gravity market share model using a large-scale transactional dataset that describes customer patronage behavior at a regional level. Although the Huff model has been well studied and widely used for sales estimation, competitive facility location, and demand allocation, this article is the first to validate the Huff model with a real dataset. Our approach makes it easy to apply the model in different regions and to different merchant categories. Experimental results show that the Huff model fits well when modeling customer shopping behavior for a number of shopping categories, including grocery stores, clothing stores, gas stations, and restaurants. We also conduct regression analysis to show that certain features, such as gender diversity and marital status diversity, lead to stronger validation of the Huff model. We believe we provide strong evidence, with the help of real-world data, that gravity-based market share models are viable assumptions for retail sales estimation and competitive facility location models.
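In its textbook form, the Huff model gives the probability that customers in origin area i patronize store j as P_ij = (A_j^α / d_ij^β) / Σ_k (A_k^α / d_ik^β), where A_j is store attractiveness and d_ij is distance. The sketch below implements that standard form and turns the probabilities into expected customers per store; the parameter values, attractiveness proxy, and inputs are hypothetical, and the calibration and validation procedure described in the article is not reproduced here.

```python
def huff_probabilities(attractiveness, distances, alpha=1.0, beta=2.0):
    """Standard Huff model choice probabilities.

    attractiveness: list of A_j per store (e.g., floor area; an assumed proxy)
    distances: one row per origin area, d_ij > 0 for each store
    Returns one row of probabilities per origin; each row sums to 1.
    """
    probs = []
    for row in distances:
        utilities = [a ** alpha / d ** beta for a, d in zip(attractiveness, row)]
        total = sum(utilities)
        probs.append([u / total for u in utilities])
    return probs


def expected_customers(populations, probs):
    """Expected customers per store: sum over origins of population * P_ij."""
    n_stores = len(probs[0])
    return [sum(pop * row[j] for pop, row in zip(populations, probs)) for j in range(n_stores)]


# Hypothetical example: two stores, two origin areas.
p = huff_probabilities([100, 50], [[1, 1], [2, 1]])
print(p)  # approximately [[0.667, 0.333], [0.333, 0.667]]
print(expected_customers([300, 600], p))  # approximately [400.0, 500.0]
```

Validating the model then amounts to comparing these predicted shares with shares observed in transaction data, for example origin by origin.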
Economic outcomes predicted by diversity in cities
Shi Kai Chong, Mohsen Bahrami, Hao Chen, Selim Balcisoy, Burcin Bozkaya, Alex Pentland
Much recent work has illuminated the growth, innovation, and prosperity of entire cities, but there is relatively little evidence concerning the growth and prosperity of individual neighborhoods. In this paper, we show that the diversity of amenities within a city neighborhood, computed from openly available points of interest on digital maps, accurately predicts human mobility (“flows”) between city neighborhoods and that these flows accurately predict neighborhood economic productivity. Additionally, the diversity of consumption behavior or the diversity of flows, together with geographic centrality and population density, accurately predicts neighborhood economic growth, even after controlling for standard factors such as population. We develop our models using geo-located purchase data from Istanbul and then validate the relationships using openly available data from Beijing and several U.S. cities. Our results suggest that the diversity of goods and services within a city neighborhood is the largest single factor driving both human mobility and economic growth.
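One common way to quantify the diversity of amenities in a neighborhood is the Shannon entropy of its POI category counts: higher values mean amenities are spread more evenly across more categories. Whether the paper uses exactly this measure is an assumption here, and the category labels below are hypothetical.

```python
import math
from collections import Counter


def shannon_diversity(poi_categories):
    """Shannon entropy (natural log) of POI category frequencies."""
    counts = Counter(poi_categories)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical neighborhood POIs.
print(shannon_diversity(["cafe", "cafe", "cafe"]))             # 0.0 (single category)
print(shannon_diversity(["cafe", "gym", "grocery", "school"]))  # log(4), about 1.386
```

A neighborhood with a single amenity type scores zero, while one with four equally common categories reaches the maximum of log(4) for four categories.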
Using Behavioral Analytics to Predict Customer Invoice Payment
Mohsen Bahrami, Burcin Bozkaya, Selim Balcisoy
Experiences from various industries show that companies may have problems collecting customer invoice payments. Studies report that almost half of small- and medium-sized enterprise and business-to-business invoices in the United States and United Kingdom are paid late. In this study, our aim is to understand customer behavior regarding invoice payments and to propose an analytical approach for learning and predicting payment behavior. Our approach can then be embedded into a decision support system in which decision makers can make predictions regarding future payments and take action as necessary toward collecting potentially unpaid debt, or adjust their financial plans based on the expected invoice-to-cash amount. In our analysis, we use a large dataset with more than 1.6 million customers and their invoice and payment history, as well as various actions (e.g., e-mail, short message service, phone call) performed by the invoice-issuing company toward customers to encourage payment. We use supervised and unsupervised learning techniques to help predict whether a customer will pay the invoice or outstanding balance by the next due date, based on the actions generated by the company and the customer's response. We propose a novel behavioral scoring model used as an input variable to our predictive models. Among the three machine learning approaches tested, we report the results of logistic regression, which provides up to 97% accuracy with or without pre-clustering of customers. Such a model has high potential to help decision makers generate actions that contribute to the financial stability of the company in terms of cash flow management and avoiding unnecessary corporate lines of credit.
A time-based model and GIS framework for assessing hazardous materials transportation risk in urban areas
Ronay Ak, Mohsen Bahrami, Burcin Bozkaya
Every day, trucks carrying hazardous materials (hazmat) in a large, densely populated city pose public health risks to the city's residents as well as risks to economic assets in the area. In this paper, we introduce a new risk model that considers population exposure along a route and the duration of that exposure, which varies because of the congested nature of road transportation in urban areas. We have developed a spatial decision support system (SDSS) for quantitative risk analysis, the calculation of minimum-risk paths, and their visualization on digital maps. We illustrate the use of our proposed model and SDSS through a case study using the real road network of Istanbul, Turkey, a large metropolitan area with more than 15 million residents. We analyze Istanbul's hazmat transportation risk profile using risk analysis techniques via our interactive Geographical Information Systems (GIS)-based decision support system. We then produce a risk map of the city and run several routing scenarios between selected hazmat shipment origins and destinations. Results suggest that hazmat routes under the new model may not always match how hazmat is usually transported in an urban area, where routing follows an economic decision-making perspective. Our proposed approach provides decision makers with new insights into urban hazmat transportation and is likely to reduce the consequences of incidents with a large impact on public health.
An Exploratory Visual Analytics Tool for Multivariate Dynamic Networks
Hasan Alp Boz, Mohsen Bahrami, Yoshihiko Suhara, Burcin Bozkaya, Selim Balcisoy
Visualizing multivariate dynamic networks is a challenging task. The evolution of the dynamic network along the temporal axis must be depicted in conjunction with the associated multivariate attributes. In this paper, we propose an exploratory visual analytics tool to display multivariate dynamic networks with spatial attributes. The proposed tool displays the distribution of multivariate temporal domain and network attributes in scattered views. Moreover, to expose the evolution of a single node or a group of nodes in the dynamic network along the temporal axis, an egocentric approach is applied in which a node is represented with its neighborhood as an ego-network. This approach allows users to observe a node's surrounding environment along the temporal axis. In addition to traditional ego-network visualization methods, such as timelines, the proposed tool encodes ego-networks as feature vectors consisting of the domain and network attributes and projects them onto 2D views. As a result, the distance between projected ego-networks represents their dissimilarity across the temporal axis in a single view. The proposed tool is demonstrated with a real-world use case on merchant networks obtained from one year of credit card transactions.
Measuring fine-grained multidimensional integration using mobile phone metadata: the case of Syrian refugees in Turkey
Michiel A Bakker, Daoud Piracha, Patricia Lu, Keis Bejgo, Mohsen Bahrami, Yan Leng, Jose Balsa-Barreiro, Julie Ricard, Alfredo Morales, Vivek K Singh, Burcin Bozkaya, Selim Balcisoy, Alex Pentland
The current Syrian civil war has led to a mass migration of Syrian refugees into Turkey. As the Syrian conflict has intensified and lengthened, many refugees have faced challenges integrating into their host societies. Here we introduce and evaluate different measures extracted from mobile phone metadata to study integration of refugees along three dimensions: (1) social integration, (2) spatial integration, and (3) economic integration through signatures of employment activity. We use these measures to compare integration across different regions in Turkey and find striking differences both in the distributions of these dimensions and the relations between them. Finally, leveraging the results from two general elections in Turkey in 2015 and 2018, we confirm earlier findings concerning the impact of refugee presence on voting behavior and demonstrate that we can better explain voting behavior by incorporating integration metrics.
Twitter Reveals: Using Twitter Analytics to Predict Public Protests
Mohsen Bahrami, Yasin Findik, Burcin Bozkaya, Selim Balcisoy
The right to protest is perceived as one of the primary civil rights. Citizens participate in mass demonstrations to express themselves and exercise their democratic rights. However, because of the large number of participants, protests may lead to violence and destruction and hence can be costly. It is therefore important to predict such demonstrations in advance to safeguard against such damage. Recent research has shown that about 75 percent of protests regarded as legal are planned in advance. Twitter, the prominent micro-blogging website, has been used by protesters to plan, organize, and announce many recent protests worldwide, such as those that led to the Arab Spring, the riots in Britain, and those against Mr. Trump after the U.S. presidential election. In this paper, we aim to predict protests by means of machine learning algorithms. In particular, we consider the protests against then-president-elect Mr. Trump after the presidential election results were announced in November 2016. We first identify the hashtags calling for demonstrations from Trending Topics on Twitter and download the corresponding tweets. We then apply four machine learning algorithms to make predictions. Our findings indicate that Twitter can be used as a powerful tool for predicting future protests, with an average prediction accuracy of over 75 percent (up to 100 percent). We further validate our model by predicting the protests held at U.S. airports after President Trump's executive order banning citizens of seven Muslim-majority countries from entering the U.S. An important contribution of our study is the inclusion of event-specific features for prediction purposes, which helps achieve high levels of accuracy.