Statistical Science

2011, Vol. 26, No. 1, 1–9
DOI: 10.1214/10-STS337

© Institute of Mathematical Statistics, 2011

Statistical Inference: The Big Picture¹

Robert E. Kass

Abstract. Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labeled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mischaracterize the process of statistical inference and I propose an alternative “big picture” depiction.

Key words and phrases: Bayesian, confidence, frequentist, statistical education, statistical pragmatism, statistical significance.

1. INTRODUCTION

The protracted battle for the foundations of statistics, joined vociferously by Fisher, Jeffreys, Neyman, Savage and many disciples, has been deeply illuminating, but it has left statistics without a philosophy that matches contemporary attitudes. Because each camp took as its goal exclusive ownership of inference, each was doomed to failure. We have all, or nearly all, moved past these old debates, yet our textbook explanations have not caught up with the eclecticism of statistical practice.

The difficulties go both ways. Bayesians have denied the utility of confidence and statistical significance, attempting to sweep aside the obvious success of these concepts in applied work. Meanwhile, for their part, frequentists have ignored the possibility of inference about unique events despite their ubiquitous occurrence throughout science. Furthermore, interpretations of posterior probability in terms of subjective belief, or confidence in terms of long-run frequency, give students a limited and sometimes confusing view of the nature of statistical inference. When used to introduce the expression of uncertainty based on a random sample, these caricatures forfeit an opportunity to articulate a fundamental attitude of statistical practice.

Most modern practitioners have, I think, an open-minded view about alternative modes of inference, but are acutely aware of theoretical assumptions and the many ways they may be mistaken. I would suggest that it makes more sense to place in the center of our logical framework the match or mismatch of theoretical assumptions with the real world of data. This, it seems to me, is the common ground that Bayesian and frequentist statistics share; it is more fundamental than either paradigm taken separately; and as we strive to foster widespread understanding of statistical reasoning, it is more important for beginning students to appreciate the role of theoretical assumptions than for them to recite correctly the long-run interpretation of confidence intervals. With the hope of prodding our discipline to right a lingering imbalance, I attempt here to describe the dominant contemporary philosophy of statistics.

Robert E. Kass is Professor, Department of Statistics, Center for the Neural Basis of Cognition and Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA (e-mail: kass@stat.cmu.edu).

¹ Discussed in 10.1214/11-STS337C, 10.1214/11-STS337A, 10.1214/11-STS337D and 10.1214/11-STS337B; rejoinder at 10.1214/11-STS337REJ.

2. STATISTICAL PRAGMATISM

I propose to call this modern philosophy statistical pragmatism. I think it is based on the following attitudes:

1. Confidence, statistical significance, and posterior probability are all valuable inferential tools.

2. Simple chance situations, where counting arguments may be based on symmetries that generate equally likely outcomes (six faces on a fair die; 52 cards in a shuffled deck), supply basic intuitions about probability. Probability may be built up to important but less immediately intuitive situations using abstract mathematics, much the way real numbers are defined abstractly based on intuitions coming from fractions. Probability is usefully calibrated in terms of fair bets: another way to say the probability of rolling a 3 with a fair die is 1/6 is that 5 to 1 odds against rolling a 3 would be a fair bet.

3. Long-run frequencies are important mathematically, interpretively, and pedagogically. However, it is possible to assign probabilities to unique events, including rolling a 3 with a fair die or having a confidence interval cover the true mean, without considering long-run frequency. Long-run frequencies may be regarded as consequences of the law of large numbers rather than as part of the definition of probability or confidence.

4. Similarly, the subjective interpretation of posterior probability is important as a way of understanding Bayesian inference, but it is not fundamental to its use: in reporting a 95% posterior interval one need not make a statement such as, “My personal probability of this interval covering the mean is 0.95.”

5. Statistical inferences of all kinds use statistical models, which embody theoretical assumptions. As illustrated in Figure 1, like scientific models, statistical models exist in an abstract framework; to distinguish this framework from the real world inhabited by data we may call it a “theoretical world.” Random variables, confidence intervals, and posterior probabilities all live in this theoretical world. When we use a statistical model to make a statistical inference we implicitly assert that the variation exhibited by data is captured reasonably well by the statistical model, so that the theoretical world corresponds reasonably well to the real world. Conclusions are drawn by applying a statistical inference technique, which is a theoretical construct, to some real data. Figure 1 depicts the conclusions as straddling the theoretical and real worlds. Statistical inferences may have implications for the real world of new observable phenomena, but in scientific contexts, conclusions most often concern scientific models (or theories), so that their “real world” implications (involving new data) are somewhat indirect (the new data will involve new and different experiments).

The statistical models in Figure 1 could involve large function spaces or other relatively weak probabilistic assumptions. Careful consideration of the connection between models and data is a core component of both the art of statistical practice and the science of statistical methodology. The purpose of Figure 1 is to shift the grounds for discussion.

FIG. 1. The big picture of statistical inference. Statistical procedures are abstractly defined in terms of mathematics but are used, in conjunction with scientific models and methods, to explain observable phenomena. This picture emphasizes the hypothetical link between variation in data and its description using statistical models.

Note, in particular, that data should not be confused with random variables. Random variables live in the theoretical world. When we say things like, “Let us assume the data are normally distributed” and we proceed to make a statistical inference, we do not need to take these words literally as asserting that the data form a random sample. Instead, this kind of language is a convenient and familiar shorthand for the much weaker assertion that, for our specified purposes, the variability of the data is adequately consistent with variability that would occur in a random sample. This linguistic amenity is used routinely in both frequentist and Bayesian frameworks. Historically, the distinction between data and random variables, the match of the model to the data, was set aside, to be treated as a separate topic apart from the foundations of inference. But once the data themselves were considered random variables, the frequentist-Bayesian debate moved into the theoretical world: it became a debate about the best way to reason from random variables to inferences about parameters. This was consistent with developments elsewhere. In other parts of science, the distinction between quantities to be measured and their theoretical counterparts within a mathematical theory can be relegated to a different subject, to a theory of errors. In statistics, we do not have that luxury, and it seems to me important, from a pragmatic viewpoint, to bring to center stage the identification of models with data. The purpose of doing so is that it provides different interpretations of both frequentist and Bayesian inference, interpretations which, I believe, are closer to the attitude of modern statistical practitioners.

FIG. 2. (A) BARS fits to a pair of peri-stimulus time histograms displaying neural firing rate of a particular neuron under two alternative experimental conditions. (B) The two BARS fits are overlaid for ease of comparison.

A familiar practical situation where these issues arise is binary regression. A classic example comes from a psychophysical experiment conducted by Hecht, Schlaer and Pirenne (1942), who investigated the sensitivity of the human visual system by constructing an apparatus that would emit flashes of light at very low intensity in a darkened room. Those authors presented light of varying intensities repeatedly to several subjects and determined, for each intensity, the proportion of times each subject would respond that he or she had seen a flash of light. For each subject the resulting data are repeated binary observations (“yes” perceived versus “no” did not perceive) at each of many intensities and, these days, the standard statistical tool to analyze such data is logistic regression. We might, for instance, use maximum likelihood to find a 95% confidence interval for the intensity of light at which the subject would report perception with probability $p = 0.5$. Because the data reported by Hecht et al. involved fairly large samples, we would obtain essentially the same answer if instead we applied Bayesian methods to get an interval having 95% posterior probability. But how should such an interval be interpreted?
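To make the logistic regression calculation concrete, the following is a minimal sketch under stated assumptions. The data are simulated and are not the Hecht et al. measurements; the intensity levels, the number of trials per level and the true coefficients are hypothetical. The interval is a Wald-type interval obtained from the inverse Fisher information and the delta method, which is only one of several ways a likelihood-based interval might be formed.

```python
import numpy as np


def fit_logistic(x, y, n_iter=25):
    """Maximum likelihood fit of logit P(y = 1) = a + b * x by Newton's method.

    Returns the estimate (a, b) and the inverse Fisher information,
    used here as an approximate covariance matrix.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(info)


# Hypothetical design: 9 intensity levels, 50 binary responses at each level.
rng = np.random.default_rng(0)
levels = np.linspace(1.0, 3.0, 9)
x = np.repeat(levels, 50)
a_true, b_true = -8.0, 4.0  # assumed values; the true 50% point is -a/b = 2.0
p_true = 1.0 / (1.0 + np.exp(-(a_true + b_true * x)))
y = rng.binomial(1, p_true).astype(float)

(a_hat, b_hat), cov = fit_logistic(x, y)

# Intensity at which the response probability is 0.5: solve a + b * x = 0.
x50_hat = -a_hat / b_hat

# Delta method: gradient of -a/b with respect to (a, b).
grad = np.array([-1.0 / b_hat, a_hat / b_hat**2])
se = float(np.sqrt(grad @ cov @ grad))
lo, hi = x50_hat - 1.96 * se, x50_hat + 1.96 * se

print(f"estimated 50% intensity: {x50_hat:.3f}")
print(f"approximate 95% interval: ({lo:.3f}, {hi:.3f})")
```

The simulated responses here really are draws from random variables, so the theoretical world and the "data" coincide by construction; with experimental data that identification is the assumption under discussion.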

A more recent example comes from DiMatteo, Genovese and Kass (2001), who illustrated a new nonparametric regression method called Bayesian adaptive regression splines (BARS) by analyzing neural firing rate data from inferotemporal cortex of a macaque monkey. The data came from a study ultimately reported by Rollenhagen and Olson (2005), which investigated the differential response of individual neurons under two experimental conditions. Figure 2 displays BARS fits under the two conditions. One way to quantify the discrepancy between the fits is to estimate the drop in firing rate from peak (the maximal firing rate) to the trough immediately following the peak in each condition. Let us call these peak minus trough differences, under the two conditions, $\phi_1$ and $\phi_2$. Using BARS, DiMatteo, Genovese and Kass reported a posterior mean of $\hat{\phi}_1 - \hat{\phi}_2 = 50.0$ with posterior standard deviation (±20.8). In follow-up work, Wallstrom, Liebner and Kass (2008) reported very good frequentist coverage probability of 95% posterior probability intervals based on BARS for similar quantities under simulation conditions chosen to mimic such experimental data. Thus, a BARS-based posterior interval could be considered from either a Bayesian or frequentist point of view. Again we may ask how such an inferential interval should be interpreted.

3. INTERPRETATIONS

Statistical pragmatism involves mildly altered interpretations of frequentist and Bayesian inference. For definiteness I will discuss the paradigm case of confidence and posterior intervals for a normal mean based on a sample of size $n$, with the standard deviation being known. Suppose that we have $n = 49$ observations that have a sample mean equal to 10.2.

FREQUENTIST ASSUMPTIONS. Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. random variables from a normal distribution with mean $\mu$ and standard deviation $\sigma = 1$. In other words, suppose $X_1, X_2, \ldots, X_n$ form a random sample from a $N(\mu, 1)$ distribution.

Noting that $\bar{x} = 10.2$ and $\sqrt{49} = 7$ we define the inferential interval

$$I = \left(10.2 - \frac{2}{7},\; 10.2 + \frac{2}{7}\right).$$

The interval $I$ may be regarded as a 95% confidence interval. I now contrast the standard frequentist interpretation with the pragmatic interpretation.

FREQUENTIST INTERPRETATION OF CONFIDENCE INTERVAL. Under the assumptions above, if we were to draw infinitely many random samples from a $N(\mu, 1)$ distribution, 95% of the corresponding confidence intervals $(\bar{X} - \frac{2}{7}, \bar{X} + \frac{2}{7})$ would cover $\mu$.

PRAGMATIC INTERPRETATION OF CONFIDENCE INTERVAL. If we were to draw a random sample according to the assumptions above, the resulting confidence interval $(\bar{X} - \frac{2}{7}, \bar{X} + \frac{2}{7})$ would have probability 0.95 of covering $\mu$. Because the random sample lives in the theoretical world, this is a theoretical statement. Nonetheless, substituting

$$\bar{X} = \bar{x} \qquad (1)$$

together with

$$\bar{x} = 10.2 \qquad (2)$$

we obtain the interval $I$, and are able to draw useful conclusions as long as our theoretical world is aligned well with the real world that produced the data.

The main point here is that we do not need a long-run interpretation of probability, but we do have to be reminded that the unique-event probability of 0.95 remains a theoretical statement because it applies to random variables rather than data. Let us turn to the Bayesian case.
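As a numerical aside on the confidence interval, the sketch below computes $I$ and checks the coverage statement inside the theoretical world. It is an illustration only: the value of `mu` used in the simulation, the seed and the number of replications are arbitrary assumptions, and the simulation merely illustrates the long-run consequence that follows from the probability statement (it is not needed to define it). Because the half-width uses the multiplier 2 rather than 1.96, the exact coverage is about 0.9545, which the text rounds to 0.95.

```python
import math

import numpy as np

n = 49
sigma = 1.0
xbar = 10.2

# Inferential interval I = (10.2 - 2/7, 10.2 + 2/7).
half_width = 2.0 * sigma / math.sqrt(n)
interval = (xbar - half_width, xbar + half_width)

# Theoretical coverage: P(|Z| < 2) for a standard normal Z, about 0.9545.
coverage_exact = math.erf(2.0 / math.sqrt(2.0))

# Simulation in the theoretical world: draw many random samples from
# N(mu, 1) and record how often (Xbar - 2/7, Xbar + 2/7) covers mu.
rng = np.random.default_rng(1)
mu = 3.7       # arbitrary; coverage does not depend on this value
reps = 20_000  # arbitrary number of simulated samples
sample_means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
coverage_sim = float(np.mean(np.abs(sample_means - mu) < half_width))

print(f"I = ({interval[0]:.4f}, {interval[1]:.4f})")
print(f"exact coverage:     {coverage_exact:.4f}")
print(f"simulated coverage: {coverage_sim:.4f}")
```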

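BAYESIAN ASSUMPTIONS. Suppose $X_1, X_2, \ldots, X_n$ form a random sample from a $N(\mu, 1)$ distribution and the prior distribution of $\mu$ is $N(\mu_0, \tau^2)$ with $\tau^2 \gg \frac{1}{49}$ and $49\tau^2 \gg |\mu_0|$.

The posterior distribution of $\mu$ is normal, the posterior mean becomes

$$\bar{\mu} = \frac{\tau^2}{\frac{1}{49} + \tau^2}\, 10.2 + \frac{\frac{1}{49}}{\frac{1}{49} + \tau^2}\, \mu_0$$

and the posterior variance is

$$v = \left(49 + \frac{1}{\tau^2}\right)^{-1}$$

but because $\tau^2 \gg \frac{1}{49}$ and $49\tau^2 \gg |\mu_0|$ we have

$$\bar{\mu} \approx 10.2$$

and

$$v \approx \frac{1}{49}.$$

Therefore, the inferential interval $I$ defined above has posterior probability 0.95.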
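The approximation can be checked numerically. The sketch below evaluates the posterior mean and variance formulas for one diffuse prior and then computes the posterior probability of $I$. The prior values `mu0 = 5.0` and `tau2 = 1.0e4` are hypothetical choices made only so that both conditions $\tau^2 \gg \frac{1}{49}$ and $49\tau^2 \gg |\mu_0|$ hold; they do not come from the text. As in the frequentist case, the multiplier 2 gives a probability near 0.9545, which the text rounds to 0.95.

```python
import math


def normal_posterior(xbar, n, sigma, mu0, tau2):
    """Posterior mean and variance of mu for a N(mu, sigma^2) sample
    of size n with sample mean xbar, under a N(mu0, tau2) prior."""
    data_var = sigma**2 / n  # variance of the sample mean, 1/49 here
    weight = tau2 / (data_var + tau2)
    post_mean = weight * xbar + (1.0 - weight) * mu0
    post_var = 1.0 / (n / sigma**2 + 1.0 / tau2)
    return post_mean, post_var


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


xbar, n, sigma = 10.2, 49, 1.0
mu0, tau2 = 5.0, 1.0e4  # hypothetical diffuse prior

post_mean, post_var = normal_posterior(xbar, n, sigma, mu0, tau2)

# Posterior probability that mu lies in I = (10.2 - 2/7, 10.2 + 2/7).
lo, hi = xbar - 2.0 / 7.0, xbar + 2.0 / 7.0
post_sd = math.sqrt(post_var)
post_prob = normal_cdf((hi - post_mean) / post_sd) - normal_cdf((lo - post_mean) / post_sd)

print(f"posterior mean:     {post_mean:.5f}")
print(f"posterior variance: {post_var:.6f} (1/49 = {1 / 49:.6f})")
print(f"P(mu in I | data):  {post_prob:.4f}")
```

Making `tau2` smaller, or `mu0` much larger, pulls the posterior mean away from 10.2, which shows where the two conditions on the prior enter.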

BAYESIAN INTERPRETATION OF POSTERIOR INTERVAL. Under the assumptions above, the probability that $\mu$ is in the interval $I$ is 0.95.

PRAGMATIC INTERPRETATION OF POSTERIOR INTERVAL. If the data were a random sample for which (2) holds, that is, $\bar{x} = 10.2$, and if the assumptions above were to hold, then the probability that $\mu$ is in the interval $I$ would be 0.95. This refers to a hypothetical value $\bar{x}$ of the random variable $\bar{X}$, and because $\bar{X}$ lives in the theoretical world the statement remains theoretical. Nonetheless, we are able to draw useful conclusions from the data as long as our theoretical world is aligned well with the real world that produced the data.

Here, although the Bayesian approach escapes the indirectness of confidence within the theoretical world, it cannot escape it in the world of data analysis because there remains the additional layer of identifying data with random variables. According to the pragmatic interpretation, the posterior is not, literally, a statement about the way the observed data relate to the unknown parameter $\mu$ because those objects live in different worlds. The language of Bayesian inference, like the language of frequentist inference, takes a convenient shortcut by blurring the distinction between data and random variables.

The commonality between frequentist and Bayesian inferences is the use of theoretical assumptions, together with a subjunctive statement. In both approaches a statistical model is introduced (in the Bayesian case the prior distributions become part of what I am here calling the model) and we may say that the inference is based on what would happen if the data were to be random variables distributed according to the statistical model. This modeling assumption would be reasonable if the model were to describe accurately the variation in the data.

4. IMPLICATIONS FOR TEACHING

It is important for students in introductory statistics courses to see the subject as a coherent, principled whole. Instructors, and textbook authors, may try to help by providing some notion of a “big picture.” Often this is done literally, with an illustration such as Figure 3 (e.g., Lovett, Meyer and Thille, 2008). This kind of illustration can be extremely useful if referenced repeatedly throughout a course.

Figure 3 represents a standard story about statistical inference. Fisher introduced the idea of a random sample drawn from a hypothetical infinite population, and Neyman and Pearson’s work encouraged subsequent mathematical statisticians to drop the word “hypothetical” and instead describe statistical inference as analogous to simple random sampling from a finite population. This is the concept that Figure 3 tries to get across. My complaint is that it is not a good general description of statistical inference, and my claim is that Figure 1 is more accurate. For instance, in the psychophysical example of Hecht, Schlaer and Pirenne discussed in Section 2, there is no population of “yes” or “no” replies from which a random sample is drawn. We do not need to struggle to make an analogy with a simple random sample. Furthermore, any thoughts along these lines may draw attention away from the most important theoretical assumptions, such as independence among the responses. Figure 1 is supposed to remind students to look for the important assumptions, and ask whether they describe the variation in the data reasonably accurately.

FIG. 3. The big picture of statistical inference according to the standard conception. Here, a random sample is pictured as a sample from a finite population.

One of the reasons the population and sample picture in Figure 3 is so attractive pedagogically is that it reinforces the fundamental distinction between parameters and statistics through the terms population mean and sample mean. To my way of thinking, this terminology, inherited from Fisher, is unfortunate. Instead of “population mean” I would much prefer theoretical mean, because it captures better the notion that a theoretical distribution is being introduced, a notion that is reinforced by Figure 1.

I have found Figure 1 helpful in teaching basic statistics. For instance, when talking about random variables I like to begin with a set of data, where variation is displayed in a histogram, and then say that probability may be used to describe such variation. I then tell the students we must introduce mathematical objects called random variables, and in defining them and applying the concept to the data at hand, I immediately acknowledge that this is an abstraction, while also stating that (as the students will see repeatedly in many examples) it can be an extraordinarily useful abstraction whenever the theoretical world of random variables is aligned well with the real world of the data.

I have also used Figure 1 in my classes when describing attitudes toward data analysis that statistical training aims to instill. Specifically, I define statistical thinking, as in the article by Brown and Kass (2009), to involve two principles:

1. Statistical models of regularity and variability in data may be used to express knowledge and uncertainty about a signal in the presence of noise, via inductive reasoning.

2. Statistical methods may be analyzed to determine how well they are likely to perform.

Principle 1 identifies the source of statistical inference to be the hypothesized link between data and statistical models. In explaining, I explicitly distinguish the use of probability to describe variation and to express knowledge. A probabilistic description of variation would be “The probability of rolling a 3 with a fair die is 1/6” while an expression of knowledge would be “I’m 90% sure the capital of Wyoming is Cheyenne.” These two sorts of statements, which use probability in different ways, are sometimes considered to involve two different kinds of probability, which have been called “aleatory probability” and “epistemic probability.” Bayesians merge these, applying the laws of probability to go from quantitative description to quantified belief, but in every form of statistical inference aleatory probability is used, somehow, to make epistemic statements. This is Principle 1. Principle 2 is that the same sorts of statistical models may be used to evaluate statistical procedures, though in the classroom I also explain that performance of procedures is usually investigated under varying circumstances.

FIG. 4. A more elaborate big picture, reflecting in greater detail the process of statistical inference. As in Figure 1, there is a hypothetical link between data and statistical models but here the data are connected more specifically to their representation as random variables.

For somewhat more advanced audiences it is possible to elaborate, describing in more detail the process trained statisticians follow when reasoning from data. A big picture of the overall process is given in Figure 4. That figure indicates the hypothetical connection between data and random variables, between key features of unobserved mechanisms and parameters, and between real-world and theoretical conclusions. It further indicates that data display both regularity (which is often described in theoretical terms as a “signal,” sometimes conforming to simple mathematical descriptions or “laws”) and unexplained variability, which is usually taken to be “noise.” The figure also includes the components exploratory data analysis (EDA) and algorithms, but the main message of Figure 4, given by the labels of the two big boxes, is the same as that in Figure 1.

5. DISCUSSION

According to my understanding, laid out above, statistical pragmatism has two main features: it is eclectic and it emphasizes the assumptions that connect statistical models with observed data. The pragmatic view acknowledges that both sides of the frequentist-Bayesian debate made important points. Bayesians scoffed at the artificiality in using sampling from a finite population to motivate all of inference, and in using long-run behavior to define characteristics of procedures. Within the theoretical world, posterior probabilities are more direct, and therefore seemed to offer much stronger inferences. Frequentists bristled, pointing to the subjectivity of prior distributions. Bayesians responded by treating subjectivity as a virtue on the grounds that all inferences are subjective yet, while there is a kernel of truth in this observation (we are all human beings, making our own judgments), subjectivism was never satisfying as a logical framework: an important purpose of the scientific enterprise is to go beyond personal decision-making. Nonetheless, from a pragmatic perspective, while the selection of prior probabilities is important, their use is not so problematic as to disqualify Bayesian methods, and in looking back on history the introduction of prior distributions may not have been the central bothersome issue it was made out to be. Instead, it seems to me, the really troubling point for frequentists has been the Bayesian claim to a philosophical high ground, where compelling inferences could be delivered at negligible logical cost. Frequentists have always felt that no such thing should be possible. The difficulty begins not with the introduction of prior distributions but with the gap between models and data, which is neither frequentist nor Bayesian. Statistical pragmatism avoids this irritation by acknowledging explicitly the tenuous connection between the real and theoretical worlds. As a result, its inferences are necessarily subjunctive. We speak of what would be inferred if our assumptions were to hold. The inferential bridge is traversed, by both frequentist and Bayesian methods, when we act as if the data were generated by random variables. In the normal mean example discussed in Section 3, the key step involves the conjunction of the two equations (1) and (2). Strictly speaking, according to statistical pragmatism, equation (1) lives in the theoretical world while equation (2) lives in the real world; the bridge is built by allowing $\bar{x}$ to refer to both the theoretical value of the random variable and the observed data value.

In pondering the nature of statistical inference I am, like others, guided partly by past and present sages (for an overview see Barnett, 1999), but also by my own experience and by watching many colleagues in action. Many of the sharpest and most vicious Bayes-frequentist debates took place during the dominance of pure theory in academia. Statisticians are now more inclined to argue about the extent to which a method succeeds in solving a data analytic problem. Much statistical practice revolves around getting good estimates and standard errors in complicated settings where statistical uncertainty is smaller than the unquantified aggregate of many other uncertainties in scientific investigation. In such contexts, the distinction between frequentist and Bayesian logic becomes unimportant and contemporary practitioners move freely between frequentist and Bayesian techniques using one or the other depending on the problem. Thus, in a review of statistical methods in neurophysiology in which my colleagues and I discussed both frequentist and Bayesian methods (Kass, Ventura and Brown, 2005), not only did we not emphasize this dichotomy but we did not even mention the distinction between the approaches or their inferential interpretations.

In fact, in my first publication involving analysis of neural data (Olson et al., 2001) we reported more than a dozen different statistical analyses, some frequentist, some Bayesian. Furthermore, methods from the two approaches are sometimes glued together in a single analysis. For example, to examine several neural firing-rate intensity functions $\lambda^1(t), \ldots, \lambda^p(t)$, assumed to be smooth functions of time $t$, Behseta et al. (2007) developed a frequentist approach to testing the hypothesis $H_0: \lambda^1(t) = \cdots = \lambda^p(t)$, for all $t$, that incorporated BARS smoothing. Such hybrids are not uncommon, and they do not force a practitioner to walk around with mutually inconsistent interpretations of statistical inference. Figure 1 provides a general framework that encompasses both of the major approaches to methodology while emphasizing the inherent gap between data and modeling assumptions, a gap that is bridged through subjunctive statements. The advantage of the pragmatic framework is that it considers frequentist and Bayesian inference to be equally respectable and allows us to have a consistent interpretation, without feeling as if we must have split personalities in order to be competent statisticians. More to the point, this framework seems to me to resemble more closely what we do in practice: statisticians offer inferences couched in a cautionary attitude. Perhaps we might even say that most practitioners are subjunctivists.

I have emphasized subjunctive statements partly because, on the frequentist side, they eliminate any need for long-run interpretation. For Bayesian methods they eliminate reliance on subjectivism. The Bayesian point of view was articulated admirably by Jeffreys (see Robert, Chopin and Rousseau, 2009, and accompanying discussion) but it became clear, especially from the arguments of Savage and subsequent investigations in the 1970s, that the only solid foundation for Bayesianism is subjective (see Kass and Wasserman, 1996, and Kass, 2006). Statistical pragmatism pulls us out of that solipsistic quagmire. On the other hand, I do not mean to imply that it really does not matter what approach is taken in a particular instance. Current attention frequently focuses on challenging, high-dimensional datasets where frequentist and Bayesian methods may differ. Statistical pragmatism is agnostic on this. Instead, procedures should be judged according to their performance under theoretical conditions thought to capture relevant real-world variation in a particular applied setting. This is where our juxtaposition of the theoretical world with the real world earns its keep.

I called the story about statistical inference told by Figure 3 “standard” because it is imbedded in many introductory texts, such as the path-breaking book by Freedman, Pisani and Purves (2007) and the excellent and very popular book by Moore and McCabe (2005). My criticism is that the standard story misrepresents the way statistical inference is commonly understood by trained statisticians, portraying it as analogous to simple random sampling from a finite population. As I noted, the population versus sampling terminology comes from Fisher, but I believe the conception in Figure 1 is closer to Fisher’s conception of the relationship between theory and data. Fisher spoke pointedly of a hypothetical infinite population, but in the standard story of Figure 3 the “hypothetical” part of this notion, which is crucial to the concept, gets dropped (confer also Lenhard, 2006). I understand Fisher’s “hypothetical” to connote what I have here called “theoretical.” Fisher did not anticipate the co-option of his framework and was, in large part for this reason, horrified by subsequent developments by Neyman and Pearson. The terminology “theoretical” avoids this confusion and thus may offer a clearer representation of Fisher’s idea.¹

We now recognize Neyman and Pearson to have made permanent, important contributions to statistical inference through their introduction of hypothesis testing and confidence. From today’s vantage point, however, their behavioral interpretation seems quaint, especially when represented by their famous dictum, “We are inclined to think that as far as a particular hypothesis is concerned, no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis.” Nonetheless, that interpretation seems to have inspired the attitude behind Figure 3. In the extreme, one may be led to insist that statistical inferences are valid only when some chance mechanism has generated the data. The problem with the chance-mechanism conception is that it applies to a rather small part of the real world, where there is either actual random sampling or situations described by statistical or quantum physics. I believe the chance-mechanism conception errs in declaring that data are assumed to be random variables, rather than allowing the gap of Figure 1 to be bridged² by statements such as (2). In saying this I am trying to listen carefully to the voice in my head that comes from the late David Freedman (see Freedman and Ziesel, 1988). I imagine he might call crossing this bridge, in the absence of an explicit chance mechanism, a leap of faith. In a strict sense I am inclined to agree. It seems to me, however, that it is precisely this leap of faith that makes statistical reasoning possible in the vast majority of applications.

Statistical models that go beyond chance mechanisms have been central to statistical inference since Fisher and Jeffreys, and their role in reasoning has been considered by many authors (e.g., Cox, 1990; Lehmann, 1990). An outstanding issue is the extent to which statistical models are like the theoretical models used throughout science (see Stanford, 2006). I would argue, on the one hand, that they are similar: the most fundamental belief of any scientist is that the theoretical and real worlds are aligned. On the other hand, as observed in Section 2, statistics is unique in having to face the gap between theoretical and real worlds every time a model is applied and, it seems to me, this is a big part of what we offer our scientific collaborators. Statistical pragmatism recognizes that all forms of statistical inference make assumptions, assumptions which can only be tested very crudely (with such things as goodness-of-fit methods) and can almost never be verified. This is not only at the heart of statistical inference, it is also the great wisdom of our field.

¹ Fisher also introduced populations partly because he used long-run frequency as a foundation for probability, which statistical pragmatism considers unnecessary.

² Because probability is introduced with the goal of drawing conclusions via statistical inference, it is, in a philosophical sense, “instrumental.” See Glymour (2001).

ACKNOWLEDGMENTS

This work was supported in part by NIH Grant MH064537. The author is grateful for comments on an earlier draft by Brian Junker, Nancy Reid, Steven Stigler, Larry Wasserman and Gordon Weinberg.

REFERENCES

BARNETT, V. (1999). Comparative Statistical Inference, 3rd ed. Wiley, New York. MR0663189

BEHSETA, S., KASS, R. E., MOORMAN, D. and OLSON, C. R. (2007). Testing equality of several functions: Analysis of single-unit firing rate curves across multiple experimental conditions. Statist. Med. 26 3958–3975. MR2395881

BROWN, E. N. and KASS, R. E. (2009). What is statistics? (with discussion). Amer. Statist. 63 105–123.

COX, D. R. (1990). Role of models in statistical analysis. Statist. Sci. 5 169–174. MR1062575

DIMATTEO, I., GENOVESE, C. R. and KASS, R. E. (2001). Bayesian curve-fitting with free-knot splines. Biometrika 88 1055–1071. MR1872219

FREEDMAN, D., PISANI, R. and PURVES, R. (2007). Statistics, 4th ed. W. W. Norton, New York.

FREEDMAN, D. and ZIESEL (1988). From mouse-to-man: The quantitative assessment of cancer risks (with discussion). Statist. Sci. 3 3–56.

GLYMOUR, C. (2001). Instrumental probability. Monist 84 284–300.

HECHT, S., SCHLAER, S. and PIRENNE, M. H. (1942). Energy, quanta and vision. J. Gen. Physiol. 25 819–840.

KASS, R. E. (2006). Kinds of Bayesians (comment on articles by Berger and by Goldstein). Bayesian Anal. 1 437–440. MR2221277

KASS, R. E., VENTURA, V. and BROWN, E. N. (2005). Statistical issues in the analysis of neuronal data. J. Neurophysiol. 94 8–25.

KASS, R. E. and WASSERMAN, L. A. (1996). The selection of prior distributions by formal rules. J. Amer. Statist. Assoc. 91 1343–1370. MR1478684

LEHMANN, E. L. (1990). Model specification: The views of Fisher and Neyman, and later developments. Statist. Sci. 5 160–168. MR1062574

LENHARD, J. (2006). Models and statistical inference: The controversy between Fisher and Neyman–Pearson. British J. Philos. Sci. 57 69–91. MR2209772

LOVETT, M., MEYER, O. and THILLE, C. (2008). The open learning initiative: Measuring the effectiveness of the OLI statistics course in accelerating student learning. J. Interact. Media Educ. 14.

MOORE, D. S. and MCCABE, G. (2005). Introduction to the Practice of Statistics, 5th ed. W. H. Freeman, New York.

OLSON, C. R., GETTNER, S. N., VENTURA, V., CARTA, R. and KASS, R. E. (2001). Neuronal activity in macaque supplementary eye field during planning of saccades in response to pattern and spatial cues. J. Neurophysiol. 84 1369–1384.

ROBERT, C. P., CHOPIN, N. and ROUSSEAU, J. (2009). Harold Jeffreys’ theory of probability revisited (with discussion). Statist. Sci. 24 141–194. MR2655841

ROLLENHAGEN, J. E. and OLSON, C. R. (2005). Low-frequency oscillations arising from competitive interactions between visual stimuli in macaque inferotemporal cortex. J. Neurophysiol. 94 3368–3387.

STANFORD, P. K. (2006). Exceeding Our Grasp. Oxford Univ. Press.

WALLSTROM, G., LIEBNER, J. and KASS, R. E. (2008). An implementation of Bayesian adaptive regression splines (BARS) in C with S and R wrappers. J. Statist. Software 26 1–21.
