What is meant by computational statistics?
Computational statistics, or statistical computing, sits at the interface between statistics and computer science: it uses computation to transform raw data into knowledge. It is the area of computational science that is specific to the mathematical science of statistics. The field is growing quickly, and this has led the statistics community to urge that a broader concept of computing be taught as part of general statistical education.
The objective of computational statistics is the same as that of traditional statistics: transforming raw data into knowledge and deriving valuable insights from it. The main difference is that computational statistics concentrates on computer-intensive statistical methods, especially for very large sample sizes and non-homogeneous datasets.
Although the terms ‘computational statistics’ and ‘statistical computing’ are often used interchangeably, Carlo Lauro, a former president of the International Association for Statistical Computing, suggested that there is a difference between them.
Lauro defined ‘statistical computing’ as "the application of computer science to statistics" and ‘computational statistics’ as "aiming at the design of an algorithm for implementing statistical methods on computers, including the ones unthinkable before the computer age (e.g. bootstrap, simulation), as well as to cope with analytically intractable problems".
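To make the bootstrap mentioned in that definition concrete, here is a minimal sketch of a percentile bootstrap confidence interval for a sample mean, using only the Python standard library. The function name `bootstrap_ci`, its parameters, and the sample values are illustrative assumptions and do not come from any particular package or dataset.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Read the interval off the empirical quantiles of the resampled statistics.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up measurements, used only to exercise the function.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
low, high = bootstrap_ci(sample)
print(f"sample mean: {statistics.mean(sample):.2f}")
print(f"approximate 95% interval: ({low:.2f}, {high:.2f})")
```

The interval is obtained purely by repeated resampling rather than from a formula, which is why a method like this was impractical before cheap computing.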
The term ‘computational statistics’ can also refer to computationally intensive statistical techniques such as Markov chain Monte Carlo methods, kernel density estimation, resampling methods, local regression, artificial neural networks, and generalized additive models.
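As an illustration of one of these techniques, the following is a minimal sketch of a random-walk Metropolis sampler, one of the simplest Markov chain Monte Carlo methods. It draws from a standard normal distribution given only its unnormalized log density. The function names, step size, and burn-in length are assumptions chosen for the example, not recommended settings.

```python
import math
import random
import statistics


def metropolis(log_density, start=0.0, n_samples=20000, step=1.0, seed=0):
    """Random-walk Metropolis sampler (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        # Propose a move by adding symmetric Gaussian noise to the current state.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


def standard_normal_log_density(x):
    # Log density up to an additive constant; the constant cancels in the ratio.
    return -0.5 * x * x


draws = metropolis(standard_normal_log_density)
kept = draws[2000:]  # discard an assumed burn-in period
print(f"mean: {statistics.mean(kept):.3f}, variance: {statistics.variance(kept):.3f}")
```

The sample mean and variance should land near 0 and 1, although successive draws are correlated, so the estimates are noisier than independent samples of the same size would give.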
What are some computational statistics journals worth reading?
Here are some peer-reviewed computational statistics journals that you should consider reading:
- Journal of Computational and Graphical Statistics
- Communications in Statistics - Simulation and Computation
- Computational Statistics
- Computational Statistics & Data Analysis
- Journal of Statistical Software
- Journal of Statistical Computation and Simulation
- The R Journal
- Statistics and Computing
In addition to the journals mentioned above, here are two other journals that deal with computational statistics that you should read:
- The Stata Journal
- Wiley Interdisciplinary Reviews: Computational Statistics
Here are some more statistical computing resources that you could refer to:
- Statistical Science Web: This is a portal that is dedicated to statistical science.
- Guide To Statistical Software: This explains how various statistical software packages work.
- Approach For Computing Statistical Power: This shows you how the statistical power of complex disease genes could be computed.
- Statistical Computing With R: This is a computing tutorial that uses a specific software package.
- Component-Based Environment: This is an article that explains what a component-based statistical computing environment is.
- How to Use The R Environment: This is a guide to using the R environment for statistical computing.
- Statistical Computing with Python: This contains information on how to carry out statistical computing in the Python programming language.
What is the difference between computational statistics and machine learning?
The most significant difference between computational statistics and machine learning is one of focus: computational statistics starts from statistical problems and uses computers to solve them, while machine learning focuses on the problem of simulating human learning on machines.
What is the difference between computational statistics and data science?
The main difference between computational statistics and data science is that computational statistics is a subarea of scientific computing and holds itself to scientific rigor, whereas data scientists tend to accept whichever method offers the best business value.
What is the role of statistics in computer science?
There are several ways in which the roles of statisticians and computer scientists converge. Take model development and data mining as an example. The traditional statistical approach tends to involve stochastic (random) models built with prior knowledge of the data. The computer science approach, by contrast, leans more towards algorithmic models built without prior knowledge of the data. The two approaches come together in attempts to solve problems.
The steps of the computer science data mining process have statistical counterparts. For example:
- Data acquisition and enrichment corresponds to experimental design for data collection and to noise reduction
- Data exploration corresponds to discerning distribution and variability
- Analysis and modeling corresponds to group differences, dimension reduction, prediction, and classification
- Representation and reporting corresponds to visualization and communication
What are the conflicts between statisticians and computer scientists?
Computer scientists and statisticians tend to have complaints about each other's disciplines. Here are some of them.
Complaints that computer scientists make regarding statisticians
- Statisticians lack programming sophistication.
- They value standard techniques rather than innovative techniques.
- They care more about theory than about solving real-world problems.
Complaints that statisticians make regarding computer scientists
- They lack the statistical foundations for data collection and analysis.
- They do not consider the objectives of an analysis as much as they should.
- They do not pay attention to whether the data are representative.
What is R?
R is a language and environment for statistical computing and graphics. It is similar to the S language and environment developed at Bell Laboratories by John Chambers and colleagues. R is a GNU project and can be considered a different implementation of S. Although there are some important differences between R and S, much of the code written for S runs unaltered under R.
R offers a wide range of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques, and it is highly extensible.
The R environment is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes:
- A data handling and storage facility.
- A suite of operators for calculations on arrays, in particular matrices.
- A large, coherent, integrated collection of intermediate tools for data analysis.
- Graphical facilities for data analysis and display, either on-screen or on hardcopy.
- A simple and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities.