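Peter John Huber, a pioneer of robust regression, a prominent American statistician, and a former science and technology adviser to the U.S. President, said in a lecture at the Institute of Mathematical Statistics of the Chinese Academy of Sciences in Beijing in November 1997: "Many statisticians with a mathematical background are used to applying the deterministic mode of thinking of mathematics to the non-deterministic problems of statistics. As a result they have made some serious mistakes and caused a great deal of confusion in ideas and methods." He also hoped that a force from outside mathematics would drive the reform of statistics and mathematics.

When I heard about this lecture and its views, my first feeling was that if such a force exists, it can only be philosophy, because philosophy is the epistemological and methodological root of all human knowledge, and therefore the final arbiter of all knowledge.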
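In fact, philosophy is nothing abstruse or mysterious. It is a kind of wisdom that guides people to discern the nature of all things, how they differ and how they resemble one another, and from there to understand what they are and how they relate to each other. A person who studies statistics without understanding philosophy, or without basic philosophical literacy, is therefore like a blind man groping in the dark. For those who feel stuck in the dark, philosophy will open their minds and hand them a bright lamp to light the way forward.

I recently tried to exchange ideas with several well-known statisticians with a mathematical background, but none of them was willing to offer anything of value; they mostly stayed silent or did not bother to respond. So I am publishing what I tried to discuss with them on my own blog, as one part of the challenge to the whole system that I have sustained for more than fourteen years. This challenge will remain here so that people can witness this tragedy in the history of science.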
Dear Dr. XXX,
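Can you resolve the following two puzzles of mine?

For nearly fourteen years I have been working on rebuilding the logic and algorithms of threshold regression analysis, also called segmented regression or piecewise regression. I have persisted because I believe that no mathematical axiom system can deduce this methodology, and that the current methodology contains serious theoretical errors. The two problems in this field that trouble me most are the following.

First, when an unknown threshold is searched for over the measurable sample space on the basis of sample measurements, the current classical methodology uses the minimum combined prediction residual, min(combined residuals), over a set of random piecewise models to make a so-called "optimal" model decision, that is, an optimization decision. I would like to ask: what is the mathematical basis for this decision? Who has proved, or can prove, from probability theory that the correspondence between the minimum combined prediction residual and the random parameter set of the so-called "optimal threshold model set" is an "expectable" or "reliable" one, or in other words, that the convergence of these two random measures has, in probability, the greatest and sufficient consistency on their respective measurable spaces?

My intuition is that this correspondence cannot be expected, because both the minimum combined prediction residual and the statistics of the random threshold models corresponding to it are random "point" measurements. The correspondence between them is like the random pointwise correspondence between the heights and weights of a homogeneous group of people observed at a given sample size. If the aim of our study is to use the random variable "height" to make a statistical decision about some attribute of the random variable "weight", we clearly cannot use min(height) or max(height) to make a stable and reliable decision about that attribute of "weight". Such an "optimization" is absolutely unacceptable in statistics, because if we could use min(X) or max(X) to make a statistical decision for Y, where both X (perhaps an optimizer) and Y (perhaps a set of parameters of a set of threshold models) are random variables, then all the fundamentals of statistics would collapse. In fact, as early as 1962, John Tukey warned of the dangers of "optimization" in statistics in his famous long paper "The Future of Data Analysis".

Second, on the application of spline techniques in threshold regression analysis. There is a premise here, the so-called enforced continuity, which is the key condition for solving for the threshold using the theory of mathematical functions. Without this assumption, the unknown threshold cannot be solved for from a system of simultaneous equations. From a statistical point of view, however, if a threshold exists in a population, then under random sampling there will necessarily be a sampling connection variability between the two threshold models near the sample threshold (if that threshold can be estimated by some other method). That this variability exists is a certainty; how large or small it is, nobody knows, so its size is an uncertainty. We therefore cannot build a methodology by forcibly presupposing that "continuity". Conversely, insisting on the enforced continuity assumption amounts to using a deterministic assumption to deny something that certainly exists, and to asserting by assumption that an uncertain quantity does not exist (uncertain connection variability = 0). This is an astonishingly elementary error. If the continuity between two adjacent threshold models is not inferred in probability, the method is not a statistical method but a mathematical game that imposes an arbitrary, deterministic assumption on an uncertainty.

I therefore think that these two problems may be two tragic errors in the history of statistical methodology. I spoke about both of them at the JSM meetings in 2007 and 2009, and I also tried to submit my views for publication, but every journal rejected them, and no one ever gave any professional explanation of the reasons for rejection. The journals, in order of submission, were: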
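Biometrics (2 revisions. Only comment: the current method is better than this one)
Statistics in Medicine (1 submission. Only comment: no novelty)
JASA (3 revisions. First comment: the idea of the paper is definitely interesting, but the mathematical presentation is non-standard and would be a burden on reviewers. Final comment: the paper is not suitable for publication)
Biometrika (1 submission. Only comment: space in this journal is limited)
Annals of Statistics (7 revisions. First substantive comment: the paper tries to challenge "the large body of Statistics and Mathematics", but at its current level of English writing it cannot convince readers. Final comment: submit to a somewhat lower-tier journal)
Computational Statistics and Data Analysis (2 revisions. Only comment: the author is making somewhat presumptuous claims)
The American Statistician (1 submission. Only comment: unable to judge whether the views and methods of the paper are correct)

I put these two questions to Xiao-Li Meng (孟晓犁), chair of the Department of Statistics at Harvard, and to Tong Cai (蔡天文), currently an associate editor of the Annals of Statistics, but neither of these two outstanding statisticians with a mathematical background was willing to respond. So these two puzzles remain unresolved for me. I believe no mathematical statistician with a mathematical background can give an affirmative proof for either of them, because they are two fallacies in statistics: errors of analytical logic and mathematical algorithm caused by missing concepts.

People can go on ignoring what I have done, because my credit in statistics, as someone with a master's-level degree from a medical school in China, is negligible, but the problems will remain. As Dr. Huber pointed out when discussing the causes of the errors he described, "some mathematicians are used to solving problems in non-deterministic fields with their deterministic mode of thinking"; this is the root of all errors and problems in statistics.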
In a mathematician's eyes, a sample is a given set in which nothing varies, so they treat the set as a certainty. However, a sample is a random set that varies with respect to the population. Nothing in it is certain.
The optimization relies on the idea of a "one-to-one correspondence" to make the model selection. It is a shame for a mathematician to work this way, since nothing in a random sample is a one-to-one correspondence. Every correspondence in a random sample is random.
All models are wrong, but some are useful
--- Statistician George E P Box, in "Science and statistics", Journal of the American Statistical Association 71:791-799, quoted in Holling, C S, Stephen R Carpenter, William A Brock, and Lance H Gunderson, "Discoveries for Sustainable Futures", Ch. 15 in Gunderson, Lance H and C S Holling, Panarchy: Understanding transformations in human and natural systems, Island Press (2002), p. 409
nightrider commented:
Reply to TNEGI//ETNI's comment:
Please refer to my response below inline between the dotted lines as such:
--------------
my response
---------------
Reply to nightrider's comment:
Thank you very much for your time and attention. I would like to take this opportunity to clarify something that I might not have expressed clearly in this blog article, though it is clearly stated in my papers in two JSM proceedings.
> The "segmented regression or piecewise regression" you mentioned refers to this http://en.wikipedia.org/wiki/Segmented_regression, right? <
To be exact, the concept of "segmented regression or piecewise regression" (I prefer the latter as the formal term in the field) does not come from that website but from several top statistics journals, such as JASA and the Annals of Statistics.
The classical method in this field was developed from 1959 to 1979 and then turned to splines as its modern form, with the enforced continuity assumption and smoothing techniques. Although the methodology for piecewise regression has been developed continuously since then, the basic assumption and the computational techniques have stayed almost the same. What has improved is only the computational techniques for estimating each threshold (or change-point, or node) and for smoothing the connections in splines in different situations. No one had ever questioned the theoretical issues behind the assumptions and the computational techniques until I began to doubt them in 2007.
-----------------
Good that you provide a little background information. But you still have not stated clearly what your objection is.
------------------
> Of course the line can be replaced with nonlinear parametric curves.<
No, sometimes we don't need a smooth nonlinear curve to describe the entire process, but a threshold at which to change something, e.g. an investment policy. A smooth curve may not help to find the critical point for making a decision.
---------------------
You misunderstood my statement. I meant that the curves between the break points or discontinuities can be smooth parametric curves, linear or not. After all, the discontinuity is what you are after, isn't it? You need only a finite number of discontinuities, don't you? So the rest of the curve has to be continuous or smooth, doesn't it?
-------------------------
> Does your first question concern the legitimacy of the least squares method for deducing the parameters? <
No, the LSM is correct for estimating model parameters over a specific whole sample. What I criticize is the computational technique that uses an optimization approach to decide on the piecewise models, and the assumption of enforced continuity used to estimate the thresholds and to smooth the connection between any two adjacent piecewise models in the whole sample space.
In the current methodology, we usually don't know where a threshold or node is, so we have to search for it in the sample space based on a real sample. This means we have to assume that each real sample point may be the threshold or node. Thus, if the sample size is n and there is only one threshold, we get n pairs of piecewise models and, correspondingly, n combined sums of squared residuals. Which pair, then, is the one we can expect? The current method takes the smallest of the n combined sums of squared residuals (this is the optimization approach) to make the model selection, and then estimates a theoretical threshold by setting Model_1 = Model_2 (this is the so-called enforced continuity) in the selected pair of piecewise models.
It sounds extremely solid from a mathematical point of view, right? However, if the connection variability at an unknown sampling threshold cannot be assumed to be zero, we cannot use the equation Model_1 = Model_2 to estimate the unknown threshold or node. This will be an ultimate obstacle for a mathematician in statistics. It means that the current methodology is at a dead end. We have to find another way.
--------------------------
You need to be more specific in explaining the present methodology of "estimating a theoretical threshold by taking Model_1 = Model_2" and your objection concerning "connection variability". Could you give a reference for a thorough, mathematically rigorous treatment of the present methodology and a link to your "papers in two JSM's proceedings"? The discussion would be much more efficient and concrete if we were looking at the mathematics.
-------------------
> Is the "enforced continuity" in your second question referring to the whole of the regression curve consisting of the segments (straight line or not) having to be continuous? <
Yes!
-------------------
Now you are confusing me. If the curve is piecewise, then discontinuities are allowed and continuity is not enforced. Judging from your comments above, your answer here should be "No".
---------------------
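TNEGI//ETNI commented:
Reply to 3722's comment:
> All models are incorrect, but some models are useful. <
In my opinion, this may be the fallacy of an ignorant person. Instead of working to find a path that is as adequate as possible and ultimately correct, he excuses himself from responsibility in a sophistical tone.
TNEGI//ETNI commented:
Reply to nightrider's comment: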
Thank you very much for your time and attention. I would like to take this opportunity to clarify something that I might not have expressed clearly in this blog article, though it is clearly stated in my papers in two JSM proceedings.
> The "segmented regression or piecewise regression" you mentioned refers to this http://en.wikipedia.org/wiki/Segmented_regression, right? <
To be exact, the concept of "segmented regression or piecewise regression" (I prefer the latter as the formal term in the field) does not come from that website but from several top statistics journals, such as JASA and the Annals of Statistics.
The classical method in this field was developed from 1959 to 1979 and then turned to splines as its modern form, with the enforced continuity assumption and smoothing techniques. Although the methodology for piecewise regression has been developed continuously since then, the basic assumption and the computational techniques have stayed almost the same. What has improved is only the computational techniques for estimating each threshold (or change-point, or node) and for smoothing the connections in splines in different situations. No one had ever questioned the theoretical issues behind the assumptions and the computational techniques until I began to doubt them in 2007.
> Of course the line can be replaced with nonlinear parametric curves.<
No, sometimes we don't need a smooth nonlinear curve to describe the entire process, but a threshold at which to change something, e.g. an investment policy. A smooth curve may not help to find the critical point for making a decision.
> Does your first question concern the legitimacy of the least squares method for deducing the parameters? <
No, the LSM is correct for estimating model parameters over a specific whole sample. What I criticize is the computational technique that uses an optimization approach to decide on the piecewise models, and the assumption of enforced continuity used to estimate the thresholds and to smooth the connection between any two adjacent piecewise models in the whole sample space.
In the current methodology, we usually don't know where a threshold or node is, so we have to search for it in the sample space based on a real sample. This means we have to assume that each real sample point may be the threshold or node. Thus, if the sample size is n and there is only one threshold, we get n pairs of piecewise models and, correspondingly, n combined sums of squared residuals. Which pair, then, is the one we can expect? The current method takes the smallest of the n combined sums of squared residuals (this is the optimization approach) to make the model selection, and then estimates a theoretical threshold by setting Model_1 = Model_2 (this is the so-called enforced continuity) in the selected pair of piecewise models.
It sounds extremely solid from a mathematical point of view, right? However, if the connection variability at an unknown sampling threshold cannot be assumed to be zero, we cannot use the equation Model_1 = Model_2 to estimate the unknown threshold or node. This will be an ultimate obstacle for a mathematician in statistics. It means that the current methodology is at a dead end. We have to find another way.
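For readers who want something concrete to look at, the search procedure described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not code from any of the papers mentioned: it assumes straight-line segments fitted by ordinary least squares, a single threshold, and a hypothetical `min_seg` parameter (not part of the discussion) that keeps a few points in each segment so that both lines can be fitted. The names `fit_line` and `search_threshold` are made up for this sketch.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, ssr)."""
    b, a = np.polyfit(x, y, 1)  # polyfit returns the slope first, then the intercept
    resid = y - (a + b * x)
    return a, b, float(resid @ resid)

def search_threshold(x, y, min_seg=3):
    """Try each admissible split, keep the pair of lines with the smallest
    combined sum of squared residuals, then set Model_1 = Model_2."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = None
    # min_seg is an assumption of this sketch: each side needs enough points to fit a line
    for k in range(min_seg, len(x) - min_seg + 1):
        a1, b1, s1 = fit_line(x[:k], y[:k])   # Model_1 on the left segment
        a2, b2, s2 = fit_line(x[k:], y[k:])   # Model_2 on the right segment
        total = s1 + s2                       # combined sum of squared residuals
        if best is None or total < best["ssr"]:
            best = {"k": k, "ssr": total, "left": (a1, b1), "right": (a2, b2)}
    (a1, b1), (a2, b2) = best["left"], best["right"]
    # Enforced continuity: solve a1 + b1*x = a2 + b2*x for x
    best["threshold"] = (a2 - a1) / (b1 - b2) if b1 != b2 else None
    return best

# Noiseless, continuous toy data with a kink at x = 5
x = np.arange(0, 10.5, 0.5)
y = np.where(x <= 5, x, 5 + 3 * (x - 5))
result = search_threshold(x, y)
print(result["k"], result["ssr"], result["threshold"])
```

On this noiseless toy example the two selected lines cross at about x = 5. The disagreement in this thread is about what happens with a random sample, where the selected split and the crossing point both vary from sample to sample; the sketch shows the mechanics only and does not settle that question.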
> Is the "enforced continuity" in your second question referring to the whole of the regression curve consisting of the segments (straight line or not) having to be continuous? <
Yes!
3722 commented:
All models are incorrect, but some models are useful. (I forget who said it.)
nightrider commented:
TNEGI//ETNI:
I am trying to understand your two questions. Since it appears that you have spent so much time and effort trying to understand and challenge what you call mistakes in statistics, would it not be helpful for you and for your audience to state the problems clearly and rigorously first? From what you have written here, it does not appear that you have done that. If what appears here is what you wrote to the journals and the experts, then at the very least it is not clear to me what you are trying to say. I have to say that some of the review comments you quoted are not that far off the mark regarding the clarity of your presentation.
As an attempt at clarification, allow me to ask you a few questions. The "segmented regression or piecewise regression" you mentioned refers to this http://en.wikipedia.org/wiki/Segmented_regression, right? Of course the line can be replaced with nonlinear parametric curves. Does your first question concern the legitimacy of the least squares method for deducing the parameters? Is the "enforced continuity" in your second question referring to the whole of the regression curve consisting of the segments (straight line or not) having to be continuous?
You are a great hero in the sports you mention below. I hope you can win at them. However, if you are a great statistician, please leave your answers to those questions, since I have said that this blog is a challenge to anyone in the field of statistics; otherwise, dream of being whatever you wish to be.
pillar commented:
I tried to challenge Federer at tennis but he did not answer; I tried to beat Kobe at basketball but he did not show up; I tried to race Bolt in the 100m dash but he ignored me. So I have decided to record this here so that mankind will witness that such a great sportsman has lived.
TNEGI//ETNI commented:
Reply to needtime's comment:
The mathematics in Statistics should not be contradictory to itself!
"I am publishing what I tried to discuss on my own blog, as a challenge to the whole system. This challenge will remain here so that people can witness this tragedy in the history of science." That's a huge statement. It's only logical that whoever makes such a statement should know that the best place to discuss these issues is the leading scientific journals, not here with laymen.