Large-scale data analysis has opened up opportunities across a wide range of social, scientific, and technological areas and has led to numerous innovations [1, 2]. Yet the increasing focus on purely statistical or computational approaches may fail to capture social nuances, affective relationships, or ethical, value-driven, and other human-centered concerns.
Small-scale, qualitative approaches to data collection and analysis offer researchers rich, deep insights into specific phenomena, often within a bounded or limited context. Such studies often face challenges related to generalization, extension, verification, and validation. Large-scale, quantitative approaches, on the other hand, offer researchers access to broad assemblages of data, but the insights gleaned are often much shallower and lack the rich detail associated with deep study.
But what happens as qualitative data sets grow ever larger? With the ease of collecting qualitative data such as social media text and multimedia photos and videos, such data sets are becoming an increasing challenge to analyze with the same level of detail and depth. How do we preserve the richness associated with traditional qualitative techniques in data-driven research? How can we be sure not to lose the compelling and inspiring stories of individuals in the sea of aggregated data at scale?
Each perspective has clear advantages: one can choose methods and techniques that facilitate deep but narrow analysis, or ones that are broad but shallow [5, 6, 7]. Various techniques allow researchers to track down traces of human behavior, but affective elements and social context might not be well represented. Human interpretation of data is, in the end, still necessary [8, 9, 10, 11].
Human-centered data science includes opportunities for researchers from both qualitative and quantitative traditions. Researchers have addressed this trend and attempted to integrate quantitative research methods into a qualitative research workflow [9, 12, 13]. Digital or virtual ethnography has gained widespread adoption as qualitative researchers adapt traditional ethnographic methods to online spaces. Data science tools that integrate seamlessly into the sociotechnical ecosystem of the domain they are designed for have demonstrated the greatest success. Human-centered design is particularly effective in the development of software for the analysis of large data sets.
The many unanswered questions surrounding human-centered data science include issues of sampling, selection, and privacy. What ethical questions are raised by the necessity to process vast data sets? How should we treat crowdworkers? Who owns personal medical data: the company whose machines and software collect it, or the individual who generates it? Can design be effectively crowdsourced? What policies do we need to develop to protect human rights in this new age of “big data”?
The questions are legion and we are only beginning to explore the territory of potential answers [5, 14, 15, 16, 17].
We welcome researchers interested in exploring how data-driven and qualitative research can be integrated to address complex questions in a diverse range of areas, including but not limited to social computing; urban, health, or crisis informatics; and scientific, business, policy, technical, and other fields. Researchers and practitioners who work with large data sets or qualitative data sets and want to expand their methodological toolbox are invited to participate and share their experiences while learning from the broader community.
- Julia Gluesing, Kenneth Riopelle, and James Danowski. 2014. Mixing Ethnography and Information Technology Data Mining to Visualize Innovation Networks in Global Networked Organizations. Mixed Methods Social Networks Research: Design and Applications, 36, 203.
- Markus Luczak-Roesch, Ramine Tinati, Kieron O’Hara, and Nigel Shadbolt. 2015. Socio-technical computation. In Proceedings of the 18th ACM Conference Companion on Computer Supported Cooperative Work & Social Computing (pp. 139-142). ACM.
- Kai Zheng, David Hanauer, Nadir Weibel, and Zia Agha. 2015. Computational Ethnography: Automated and Unobtrusive Means for Collecting Data In Situ for Human–Computer Interaction Evaluation Studies. In Cognitive Informatics for Biomedicine (pp. 111-140). Springer International Publishing.
- Tera Marie Green, Richard Arias-Hernandez, and Brian Fisher. 2014. Individual Differences and Translational Science in the Design of Human-Centered Visualizations. In Handbook of Human Centric Visualization (pp. 93-113). Springer New York.
- danah boyd and Kate Crawford. 2012. Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662-679.
- Michael Brooks, John Robinson, Megan Torkildson, and Cecilia Aragon. 2014. Collaborative Visual Analysis of Sentiment in Twitter Events. In Cooperative Design, Visualization, and Engineering (pp. 1-8). Springer International Publishing.
- David Lazer, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. The Parable of Google Flu: Traps in Big Data Analysis. Science, 343(6176), 1203-1205.
- David Brooks. 2013. What Data Can’t Do. The New York Times, February 18. Retrieved from http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html
- Wanli Xing, Rui Guo, Eva Petakovic, and Sean Goggins. 2015. Participation-based student final performance prediction model through interpretable Genetic Programming: Integrating learning analytics, educational data mining and theory. Computers in Human Behavior, 47, 168-181.
- Z-Q Liu and Sadaaki Miyamoto (Eds.). 2012. Soft computing and human-centered machines. Springer Science & Business Media.
- Wanli Xing, Bob Wadholm, Eva Petakovic, and Sean Goggins. 2015. Group Learning Assessment: Developing a Theory-Informed Analytics. Journal of Educational Technology & Society, 18(2), 110-128.
- Dhiraj Murthy. 2011. Emergent Digital Ethnographic Methods for Social Research. In S. Hesse-Biber (Ed.), Handbook of Emergent Technologies in Social Research (pp. 158-179). New York: Oxford University Press.
- Ashok Goel and Michael Helms. 2014. Theories, Models, Programs, and Tools of Design: Views from Artificial Intelligence, Cognitive Science, and Human-Centered Computing. In An Anthology of Theories and Models of Design (pp. 417-432). Springer London.
- France Bélanger and Robert Crossler. 2011. Privacy in the digital age: a review of information privacy research in information systems. MIS Quarterly, 35(4), 1017-1042.
- Sangita Ganesh and Rani Malhotra. 2014. Designing to scale: A human centered approach to designing applications for the Internet of Things. In Advanced Computing and Communications (ADCOM), 2014 20th Annual International Conference on (pp. 26-28). IEEE.
- Yang Wang, Yun Huang, and Claudia Louis. 2013. Towards A Framework for Privacy-Aware Mobile Crowdsourcing. In Social Computing (SocialCom), 2013 International Conference on (pp. 454-459). IEEE.
- Yang Wang, Huichuan Xia, and Yun Huang. 2016. Examining American and Chinese Internet Users’ Contextual Privacy Preferences of Behavioral Advertising. To appear in Proceedings of the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2016).