Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistics seek to provide methods that emulate popular statistical methods, but which are not unduly affected by outliers or other small departures from model assumptions. In statistics, classical estimation methods rely heavily on assumptions which are often not met in practice. The median is a robust measure of central tendency, while the mean is not.
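The contrast between the mean and the median is easy to demonstrate numerically. The following is a minimal sketch with made-up data: a tight cluster of values plus one gross outlier.

```python
import numpy as np

# Hypothetical sample: nine well-behaved values plus one gross outlier.
data = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 1000.0])

mean = data.mean()        # dragged far away by the single outlier
median = np.median(data)  # essentially unaffected by it

print(mean)    # 109.01 -- nowhere near the bulk of the data
print(median)  # 10.05  -- right in the middle of the clean values
```

One bad observation moves the mean by roughly 100 units but shifts the median by almost nothing.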
Trimmed estimators and Winsorised estimators are general methods to make statistics more robust. There are various definitions of a “robust statistic”. One of the most important cases is distributional robustness.
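Both approaches can be sketched with scipy on the same hypothetical data as above; trimming discards the extreme observations, while Winsorising clamps them to the nearest retained values.

```python
import numpy as np
from scipy import stats
from scipy.stats import mstats

# Hypothetical sample: nine well-behaved values plus one gross outlier.
data = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 1000.0])

# 10% trimmed mean: drop the lowest and highest 10% of observations,
# then average what remains.
trimmed = stats.trim_mean(data, proportiontocut=0.1)

# 10% Winsorised mean: instead of dropping the extremes, replace them
# with the nearest remaining values, then take the ordinary mean.
winsorised = mstats.winsorize(data, limits=(0.1, 0.1)).mean()

print(trimmed)     # 10.05 -- the outlier (and the minimum) are dropped
print(winsorised)  # 10.05 -- the outlier is pulled in to 10.3
```

Both estimates land in the middle of the clean cluster, whereas the raw mean is near 109.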
Thus, in the context of robust statistics, distributionally robust and outlier-resistant are effectively synonymous. A related topic is that of resistant statistics, which are resistant to the effect of extreme scores. The data sets for that book can be found via the Classic data sets page, and the book’s website contains more information on the data. Although the bulk of the data look to be more or less normally distributed, there are two obvious outliers.
These outliers have a large effect on the mean, dragging it towards them, and away from the center of the bulk of the data. Thus, if the mean is intended as a measure of the location of the center of the data, it is, in a sense, biased when outliers are present. Also, the distribution of the mean is known to be asymptotically normal due to the central limit theorem. However, outliers can make the distribution of the mean non-normal even for fairly large data sets.
Besides this non-normality, the mean is also inefficient in the presence of outliers and less variable measures of location are available. The outliers are clearly visible in these plots. Also note that whereas the distribution of the trimmed mean appears to be close to normal, the distribution of the raw mean is quite skewed to the left. So, in this sample of 66 observations, only 2 outliers cause the central limit theorem to be inapplicable. Robust statistical methods, of which the trimmed mean is a simple example, seek to outperform classical statistical methods in the presence of outliers, or, more generally, when underlying parametric assumptions are not quite correct.
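The efficiency claim can be checked by simulation. The sketch below assumes a contaminated normal model (90% clean observations, 10% gross errors), not the actual speed-of-light data, and compares the sampling variability of the raw mean and a 10% trimmed mean over repeated samples of 66 observations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Contaminated normal model (an assumed stand-in for outlier-prone data):
# 90% of observations from N(0, 1), 10% gross errors from N(0, 10^2).
def contaminated_sample(n):
    clean = rng.normal(0.0, 1.0, n)
    bad = rng.normal(0.0, 10.0, n)
    return np.where(rng.random(n) < 0.1, bad, clean)

reps = 2000
means = np.empty(reps)
tmeans = np.empty(reps)
for i in range(reps):
    x = contaminated_sample(66)          # same sample size as the example
    means[i] = x.mean()
    tmeans[i] = stats.trim_mean(x, 0.1)  # trim 10% from each tail

print(means.std(), tmeans.std())  # the trimmed mean varies noticeably less
```

Under this contamination model the trimmed mean has a visibly smaller standard error, which is exactly the sense in which the raw mean is inefficient when outliers are present.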
Whilst the trimmed mean performs well relative to the mean in this example, better robust estimates are available. In fact, the mean, median and trimmed mean are all special cases of M-estimators. Details appear in the sections below. The distribution of the standard deviation is erratic and wide, a result of the outliers. The MAD is better behaved, and Qn is a little more efficient than the MAD. This simple example demonstrates that when outliers are present, the standard deviation cannot be recommended as an estimate of scale. Traditionally, statisticians would manually screen data for outliers and remove them, usually checking the source of the data to see whether the outliers were erroneously recorded.
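The fragility of the standard deviation as a scale estimate is easy to see numerically. A minimal sketch with made-up data (a tight cluster plus two outliers), comparing it against the MAD scaled to be consistent for the normal standard deviation:

```python
import numpy as np
from scipy.stats import median_abs_deviation

# Hypothetical data: a tight cluster around 10 plus two gross outliers.
x = np.array([10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, -40.0, 35.0])

sd = x.std(ddof=1)                             # blown up by the two outliers
mad = median_abs_deviation(x, scale='normal')  # scaled to estimate the
                                               # normal standard deviation

print(sd)   # roughly 18 -- dominated by the outliers
print(mad)  # roughly 0.15 -- reflects the spread of the clean cluster
```

The standard deviation is two orders of magnitude larger than the MAD here, even though eight of the ten observations lie within a few tenths of each other.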
Indeed, in the speed-of-light example above, it is easy to see and remove the two outliers prior to proceeding with any further analysis. Outliers can often interact in such a way that they mask each other. As a simple example, consider a small univariate data set containing one modest and one large outlier. The estimated standard deviation will be grossly inflated by the large outlier. The result is that the modest outlier looks relatively normal.
As soon as the large outlier is removed, the estimated standard deviation shrinks, and the modest outlier now looks unusual. This problem of masking gets worse as the complexity of the data increases. For example, in regression problems, diagnostic plots are used to identify outliers. However, it is common that once a few outliers have been removed, others become visible.
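The masking effect described above can be reproduced with a few lines of numpy, screening a hypothetical sample by z-score before and after removing the large outlier.

```python
import numpy as np

# Hypothetical sample: a tight cluster, one modest outlier (15.0)
# and one large outlier (100.0).
x = np.array([9.8, 9.9, 10.0, 10.0, 10.1, 10.2, 15.0, 100.0])

def zscores(a):
    return (a - a.mean()) / a.std(ddof=1)

# With the large outlier present, the inflated SD masks the modest one:
z_masked = zscores(x)[x == 15.0][0]
print(z_masked)  # small |z| -- 15.0 looks unremarkable

# Remove the large outlier and re-screen: 15.0 now stands out.
y = x[x != 100.0]
z_unmasked = zscores(y)[y == 15.0][0]
print(z_unmasked)  # |z| > 2 -- flagged as unusual
```

This is exactly the masking problem: each screening pass can reveal outliers that the previous pass hid, which is why one-shot manual screening is unreliable.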
The problem is even worse in higher dimensions. Famously, the satellite readings that first indicated the ozone hole over Antarctica were rejected as outliers by automated (non-human) screening. Although this article deals with general principles for univariate statistical methods, robust methods also exist for regression problems, generalized linear models, and parameter estimation of various distributions. The basic tools used to describe and measure robustness are the breakdown point, the influence function and the sensitivity curve.
The higher the breakdown point of an estimator, the more robust it is. Therefore, the maximum breakdown point is 0.5, and there are estimators which achieve such a breakdown point. Statistics with high breakdown points are sometimes called resistant statistics. In the speed-of-light example, removing the two lowest observations causes the mean to change from 26. The estimate of scale produced by the Qn method is 6. We can divide this by the square root of the sample size to get a robust standard error, and we find this quantity to be 0.
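The breakdown-point idea can be illustrated directly: corrupt a growing number of observations in a hypothetical sample by sending them to an arbitrarily large value, and watch which estimator survives.

```python
import numpy as np

# Sketch of the breakdown point on made-up data: push k observations to
# an arbitrarily large value (1e9 here) and see which estimator survives.
clean = np.arange(1.0, 22.0)   # 21 values; mean = median = 11
means, medians = {}, {}
for k in (1, 5, 10, 11):       # number of corrupted observations
    x = clean.copy()
    x[-k:] = 1e9               # corrupt the k largest values
    means[k], medians[k] = x.mean(), np.median(x)

# One corrupted value already ruins the mean (breakdown point 0), while
# the median stays at 11 until 11 of the 21 values (just over half) are bad.
print(means[1], medians[10], medians[11])
```

This is the sense in which the median attains the maximum breakdown point of 0.5: it can absorb contamination of just under half the sample before it can be driven arbitrarily far from the truth.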