THE EMERGENCE OF "DATA ANALYSIS" AS A SCIENTIFIC DISCIPLINE

Keywords: data analysis, data science, data mining, R, SPSS, statistics

Abstract

The article traces the evolution of data analysis from traditional statistics to data science. It starts from Peter Huber's assertion about the empirical nature of data analysis; Huber emphasizes that the current stage of development cannot be defined as a new scientific paradigm, but rather as a tendency unified under the name "data science". The main focus is on the contributions of John Tukey, who first expressed the ideas that laid the foundation for data analysis. The article explores the concepts of "confirmatory" and "exploratory" data analysis, defines their goals and differences, and emphasizes the importance of alternating between the two in the research process. Tukey's principles for contemporary data analysis, such as "maximum insight into the data" and "visualization of patterns", are considered key approaches for discovering new knowledge. Tukey's works sparked significant debate among statisticians, and his views on data analysis shocked the academic community. The impact of Tukey's works on the development of data science over half a century is examined, including comments from the renowned statistician P. Huber. Particular emphasis is placed on the influence of computational environments on the development of data analysis. The role of statistical packages and software environments such as BMDP, SPSS, SAS, Minitab, S, Stata, and R in the evolution of data analysis is discussed. Their impact is assessed by analyzing word frequencies in the literature, which shows that R is currently the dominant programming environment in academic statistics and has a large community of enthusiasts. The use of scripts to codify computation steps precisely is noted; these changes are seen as changing the rules of the game, giving the expression "scientific approach to data analysis" a clearer meaning, in line with Tukey's assertion that data analysis can be studied as a science.

References

Huber P.J. Data Analysis: What Can Be Learned From the Past 50 Years. John Wiley & Sons, 2011.

Tukey J.W. The future of data analysis. Annals of Mathematical Statistics. 1962. Vol. 33. No. 1. P. 1–67.

Donoho D. 50 Years of Data Science. Journal of Computational and Graphical Statistics. 2017. Vol. 26. No. 4. P. 745–766. DOI: https://doi.org/10.1080/10618600.2017.1384734 (accessed: 08.11.2023).

Mosteller F., Tukey J.W. Data Analysis, Including Statistics. Handbook of Social Psychology / Eds. G. Lindzey, E. Aronson. Vol. 2. Reading, MA : Addison-Wesley, 1968. P. 80–203.

Chambers J.M. Greater or Lesser Statistics: A Choice for Future Research. Statistics and Computing. 1993. Vol. 3. P. 182–184.

Cleveland W.S. Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics. International Statistical Review. 2001. Vol. 69. P. 21–26.

Brillinger D.R., Fernholz L.T., Morgenthaler S. The Practice of Data Analysis: Essays in Honor of John W. Tukey. Princeton, New Jersey : Princeton University Press, 1997. 352 p.

Dempster A.P. John W. Tukey as "philosopher". The Annals of Statistics. 2002. Vol. 30. No. 6. P. 1619–1628. URL: http://surl.li/ntixf (accessed: 08.11.2023).

Kafadar K. John Tukey and Robustness. Statistical Science. 2003. Vol. 18. No. 3. P. 319–331. URL: http://surl.li/ntixn (accessed: 08.11.2023).

Kyslova O.N. Интеллектуальный анализ данных: история становления термина [Data mining: the history of the term]. Український соціологічний журнал. 2011. No. 1–2. P. 83–94. URL: http://surl.li/ntixs (accessed: 08.11.2023).

Google's N-grams viewer. URL: http://surl.li/ntiyc (accessed: 08.11.2023).

Google's N-grams viewer. URL: http://surl.li/ntiyj (accessed: 08.11.2023).

Published
2023-12-29
How to Cite
Yuskiv, B., Pliashko, O., & Khomych, S. (2023). THE EMERGENCE OF "DATA ANALYSIS" AS A SCIENTIFIC DISCIPLINE. Via Economica, (3), 114-119. https://doi.org/10.32782/2786-8559/2023-3-17
Section
Articles