Computational Social Science

Mentor Areas

I am broadly interested in topics at the intersection of social science and computer science, with a particular interest in computational studies of media, collective intelligence, network diffusion, and common sense.

Description:

There are four main research areas to which undergraduates can contribute:

1. Media Analytics: Large-Scale, Shared Data for the Study of the Information Ecosystem

Objectives: The goal of this project is to understand the information ecosystem across media (the web and TV). In parallel, we hope to design, build, and test tools and services to improve the state of public knowledge and discourse.

Opportunities: Research opportunities range from data engineering, preprocessing, and cleaning to statistical modeling and more advanced text processing at scale. Starting from raw data, the first step of the project is transforming the text into efficient, searchable structured data in AWS databases. Data analytics and website design for various media-related research questions form another line of research.

2. Quantifying Commonsense

Objectives: Commonsense is hard to define in general, yet it is widely invoked as a persuasive framing and is also a focal concern in the development of artificial intelligence systems. This leads to assumptions and biases about what others hold to be commonsense and contributes to a potentially misplaced trust in the notion. We are measuring how common commonsense actually is through online experiments, and we are also trying to predict individuals' level of commonsense based on who they are and what else they believe.

Opportunities: Help build components of this experiment and contribute to the data processing and analysis.

3. High-Throughput Virtual Lab Experiments for Group Dynamics

Objectives: Traditional social scientific experiments measure a small set of conditions at one time. Each experiment, however, makes many (often unique) assumptions about things it is not studying. As a consequence, two experiments measuring the same conditions (but with different assumptions) might see different results. This is particularly problematic when studying team performance because team experiments are subject to more variance than individual experiments. We are studying team performance while manipulating many variables at once (including those about which others usually make assumptions). By systematically running repeated experiments with minor variations in these variables, we can better understand how individual factors contribute to team performance.

Opportunities: Help select variables, run experiments and perform analysis.

4. Network for Open Mobility Analysis and Data (NOMAD)

Objectives: NOMAD is a data platform for open mobility analysis at Penn. We build open-source tools and a secure Trusted Research Environment with a curated data catalog to process large-scale GPS traces for social science and public health. The platform provides documented, reproducible pipelines (ingestion, quality control, spatiotemporal transforms, mobility metrics, and privacy-preserving aggregation) that are accessible both as Python code and through simple web interfaces. This is a multi-faceted open-science project with the goal of increasing access to GPS human mobility datasets and facilitating their analysis. By participating in this research project, you will have the opportunity to learn how to process massive spatiotemporal data, to practice front-end development, and to contribute to an LLM-related project that automatically classifies scientific literature in the field of human mobility science.

Opportunities: Research assistants help extend the NOMAD library and website by implementing and testing modules for mobility data processing, synthetic dataset generation, and dashboards, and by supporting collaborations with local government agencies on disaster preparedness, epidemiology, sustainable development, and other applications. The software emphasizes scalable computing, so methods should run identically on a laptop and on a Spark cluster. Students will also assist in deploying a pilot study in which selected researchers receive access to sensitive data along with software and infrastructure resources.

The role offers hands-on exposure to front- and back-end development (JavaScript/React, Node, databases, LLMs, Python, PySpark), software testing with Cypress, and analysis of spatial and geometry data. Students engage with research on group behavior, human mobility, and epidemic modeling while learning practical development processes in a collaborative lab. The emphasis throughout is on shipping transparent, well-tested tools that other researchers can use and reproduce.