Statistical Science

2011, Vol. 26, No. 1, 1–9. DOI: 10.1214/10-STS337

© Institute of Mathematical Statistics, 2011

Statistical Inference: The Big Picture¹

Robert E. Kass

Abstract. Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labeled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mischaracterize the process of statistical inference and I propose an alternative “big picture” depiction.

Key words and phrases: Bayesian, confidence, frequentist, statistical education, statistical pragmatism, statistical significance.

1. INTRODUCTION

The protracted battle for the foundations of statistics, joined vociferously by Fisher, Jeffreys, Neyman, Savage and many disciples, has been deeply illuminating, but it has left statistics without a philosophy that matches contemporary attitudes. Because each camp took as its goal exclusive ownership of inference, each was doomed to failure. We have all, or nearly all, moved past these old debates, yet our textbook explanations have not caught up with the eclecticism of statistical practice.

The difficulties go both ways. Bayesians have denied the utility of confidence and statistical significance, attempting to sweep aside the obvious success of these concepts in applied work. Meanwhile, for their part, frequentists have ignored the possibility of inference about unique events despite their ubiquitous occurrence throughout science. Furthermore, interpretations of posterior probability in terms of subjective belief, or confidence in terms of long-run frequency, give students a limited and sometimes confusing view of the nature of statistical inference. When used to introduce the expression of uncertainty based on a random sample, these caricatures forfeit an opportunity to articulate a fundamental attitude of statistical practice.

Most modern practitioners have, I think, an open-minded view about alternative modes of inference, but are acutely aware of theoretical assumptions and the many ways they may be mistaken. I would suggest that it makes more sense to place in the center of our logical framework the match or mismatch of theoretical assumptions with the real world of data. This, it seems to me, is the common ground that Bayesian and frequentist statistics share; it is more fundamental than either paradigm taken separately; and as we strive to foster widespread understanding of statistical reasoning, it is more important for beginning students to appreciate the role of theoretical assumptions than for them to recite correctly the long-run interpretation of confidence intervals. With the hope of prodding our discipline to right a lingering imbalance, I attempt here to describe the dominant contemporary philosophy of statistics.

Robert E. Kass is Professor, Department of Statistics, Center for the Neural Basis of Cognition and Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA (e-mail: kass@stat.cmu.edu).

¹ Discussed in 10.1214/11-STS337C, 10.1214/11-STS337A, 10.1214/11-STS337D and 10.1214/11-STS337B; rejoinder at 10.1214/11-STS337REJ.

2. STATISTICAL PRAGMATISM

I propose to call this modern philosophy statistical pragmatism. I think it is based on the following attitudes:

1. Confidence, statistical significance, and posterior probability are all valuable inferential tools.

2. Simple chance situations, where counting arguments may be based on symmetries that generate equally likely outcomes (six faces on a fair die; 52 cards in a shuffled deck), supply basic intuitions about probability. Probability may be built up to important but less immediately intuitive situations using abstract mathematics, much the way real numbers are defined abstractly based on intuitions coming from fractions. Probability is usefully calibrated in terms of fair bets: another way to say the probability of rolling a 3 with a fair die is 1/6 is that 5 to 1 odds against rolling a 3 would be a fair bet.

3. Long-run frequencies are important mathematically, interpretively, and pedagogically. However, it is possible to assign probabilities to unique events, including rolling a 3 with a fair die or having a confidence interval cover the true mean, without considering long-run frequency. Long-run frequencies may be regarded as consequences of the law of large numbers rather than as part of the definition of probability or confidence.

4. Similarly, the subjective interpretation of posterior probability is important as a way of understanding Bayesian inference, but it is not fundamental to its use: in reporting a 95% posterior interval one need not make a statement such as, “My personal probability of this interval covering the mean is 0.95.”

5. Statistical inferences of all kinds use statistical models, which embody theoretical assumptions. As illustrated in Figure 1, like scientific models, statistical models exist in an abstract framework; to distinguish this framework from the real world inhabited by data we may call it a “theoretical world.” Random variables, confidence intervals, and posterior probabilities all live in this theoretical world. When we use a statistical model to make a statistical inference we implicitly assert that the variation exhibited by data is captured reasonably well by the statistical model, so that the theoretical world corresponds reasonably well to the real world. Conclusions are drawn by applying a statistical inference technique, which is a theoretical construct, to some real data. Figure 1 depicts the conclusions as straddling the theoretical and real worlds. Statistical inferences may have implications for the real world of new observable phenomena, but in scientific contexts, conclusions most often concern scientific models (or theories), so that their “real world” implications (involving new data) are somewhat indirect (the new data will involve new and different experiments).

The statistical models in Figure 1 could involve large function spaces or other relatively weak probabilistic assumptions. Careful consideration of the connection between models and data is a core component of both the art of statistical practice and the science of statistical methodology. The purpose of Figure 1 is to shift the grounds for discussion.

FIG. 1. The big picture of statistical inference. Statistical procedures are abstractly defined in terms of mathematics but are used, in conjunction with scientific models and methods, to explain observable phenomena. This picture emphasizes the hypothetical link between variation in data and its description using statistical models.

Note, in particular, that data should not be confused with random variables. Random variables live in the theoretical world. When we say things like, “Let us assume the data are normally distributed” and we proceed to make a statistical inference, we do not need to take these words literally as asserting that the data form a random sample. Instead, this kind of language is a convenient and familiar shorthand for the much weaker assertion that, for our specified purposes, the variability of the data is adequately consistent with variability that would occur in a random sample. This linguistic amenity is used routinely in both frequentist and Bayesian frameworks. Historically, the distinction between data and random variables, the match of the model to the data, was set aside, to be treated as a separate topic apart from the foundations of inference. But once the data themselves were considered random variables, the frequentist-Bayesian debate moved into the theoretical world: it became a debate about the best way to reason from random variables to inferences about parameters. This was consistent with developments elsewhere. In other parts of science, the distinction between quantities to be measured and their theoretical counterparts within a mathematical theory can be relegated to a different subject, to a theory of errors. In statistics, we do not have that luxury, and it seems to me important, from a pragmatic viewpoint, to bring to center stage the identification of models with data. The purpose of doing so is that it provides different interpretations of both frequentist and Bayesian inference, interpretations which, I believe, are closer to the attitude of modern statistical practitioners.

FIG. 2. (A) BARS fits to a pair of peri-stimulus time histograms displaying neural firing rate of a particular neuron under two alternative experimental conditions. (B) The two BARS fits are overlaid for ease of comparison.

A familiar practical situation where these issues arise is binary regression. A classic example comes from a psychophysical experiment conducted by Hecht, Schlaer and Pirenne (1942), who investigated the sensitivity of the human visual system by constructing an apparatus that would emit flashes of light at very low intensity in a darkened room. Those authors presented light of varying intensities repeatedly to several subjects and determined, for each intensity, the proportion of times each subject would respond that he or she had seen a flash of light. For each subject the resulting data are repeated binary observations (“yes” perceived versus “no” did not perceive) at each of many intensities and, these days, the standard statistical tool to analyze such data is logistic regression. We might, for instance, use maximum likelihood to find a 95% confidence interval for the intensity of light at which the subject would report perception with probability p = 0.5. Because the data reported by Hecht et al. involved fairly large samples, we would obtain essentially the same answer if instead we applied Bayesian methods to get an interval having 95% posterior probability. But how should such an interval be interpreted?
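To make the logistic regression calculation concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the data are simulated rather than taken from Hecht et al., the intensity grid and the "true" coefficients are hypothetical, and the names `fit_logistic` and `x50_interval` are invented. The model is fitted by maximum likelihood using Newton-Raphson, and the interval for the intensity at which p = 0.5 is a Wald interval based on the delta method, which is one common choice among several.

```python
import numpy as np

def fit_logistic(x, y, n_iter=25):
    """Maximum likelihood fit of P(y=1) = 1/(1+exp(-(b0+b1*x))) by Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    # Recompute the information at the final estimate for the covariance.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(info)

def x50_interval(beta, cov, z=1.96):
    """Estimate and Wald interval for the x at which P(y=1) = 0.5, i.e. -b0/b1."""
    b0, b1 = beta
    x50 = -b0 / b1
    grad = np.array([-1.0 / b1, b0 / b1**2])   # delta-method gradient of -b0/b1
    se = np.sqrt(grad @ cov @ grad)
    return x50, (x50 - z * se, x50 + z * se)

# Simulated data in the "theoretical world": hypothetical intensities
# (on a log scale) and a hypothetical true curve with x50 = 2.0.
rng = np.random.default_rng(0)
log_intensity = np.repeat(np.linspace(1.5, 2.5, 6), 50)
true_b0, true_b1 = -12.0, 6.0
p_true = 1.0 / (1.0 + np.exp(-(true_b0 + true_b1 * log_intensity)))
y = rng.binomial(1, p_true).astype(float)

beta_hat, cov_hat = fit_logistic(log_intensity, y)
x50_hat, ci = x50_interval(beta_hat, cov_hat)
print("estimated x50:", x50_hat, "approximate 95% interval:", ci)
```

The interval printed here is a statement about random variables generated by the assumed model. Whether it says anything useful about a real subject depends on how well that model describes the variation in the actual responses.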

A more recent example comes from DiMatteo, Genovese and Kass (2001), who illustrated a new nonparametric regression method called Bayesian adaptive regression splines (BARS) by analyzing neural firing rate data from inferotemporal cortex of a macaque monkey. The data came from a study ultimately reported by Rollenhagen and Olson (2005), which investigated the differential response of individual neurons under two experimental conditions. Figure 2 displays BARS fits under the two conditions. One way to quantify the discrepancy between the fits is to estimate the drop in firing rate from peak (the maximal firing rate) to the trough immediately following the peak in each condition. Let us call these peak minus trough differences, under the two conditions, $\phi^1$ and $\phi^2$. Using BARS, DiMatteo, Genovese and Kass reported a posterior mean of $\hat{\phi}^1 - \hat{\phi}^2 = 50.0$ with posterior standard deviation ($\pm 20.8$). In follow-up work, Wallstrom, Liebner and Kass (2008) reported very good frequentist coverage probability of 95% posterior probability intervals based on BARS for similar quantities under simulation conditions chosen to mimic such experimental data. Thus, a BARS-based posterior interval could be considered from either a Bayesian or frequentist point of view. Again we may ask how such an inferential interval should be interpreted.

3. INTERPRETATIONS

Statistical pragmatism involves mildly altered interpretations of frequentist and Bayesian inference. For definiteness I will discuss the paradigm case of confidence and posterior intervals for a normal mean based on a sample of size $n$, with the standard deviation being known. Suppose that we have $n = 49$ observations that have a sample mean equal to 10.2.

FREQUENTIST ASSUMPTIONS. Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. random variables from a normal distribution with mean $\mu$ and standard deviation $\sigma = 1$. In other words, suppose $X_1, X_2, \ldots, X_n$ form a random sample from a $N(\mu, 1)$ distribution.

Noting that $\bar{x} = 10.2$ and $\sqrt{49} = 7$ we define the inferential interval

$$I = \left(10.2 - \frac{2}{7},\; 10.2 + \frac{2}{7}\right).$$

The interval $I$ may be regarded as a 95% confidence interval. I now contrast the standard frequentist interpretation with the pragmatic interpretation.
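As a quick arithmetic check, the interval can be computed directly from the numbers given above. The short Python sketch below does so; the variable names are illustrative. It also computes the exact normal probability attached to a plus or minus two standard error interval, which is slightly above the nominal 95% because the multiplier 2 is used in place of 1.96.

```python
import math

n = 49          # sample size from the text
xbar = 10.2     # observed sample mean from the text
sigma = 1.0     # known standard deviation
z = 2.0         # multiplier used in the text (rounded from 1.96)

half_width = z * sigma / math.sqrt(n)            # 2/7
interval = (xbar - half_width, xbar + half_width)

# P(|Z| < 2) for a standard normal Z, via the error function
coverage = math.erf(z / math.sqrt(2.0))

print(interval)   # approximately (9.914, 10.486)
print(coverage)   # approximately 0.9545
```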

FREQUENTIST INTERPRETATION OF CONFIDENCE INTERVAL. Under the assumptions above, if we were to draw infinitely many random samples from a $N(\mu, 1)$ distribution, 95% of the corresponding confidence intervals $(\bar{X} - \frac{2}{7}, \bar{X} + \frac{2}{7})$ would cover $\mu$.

PRAGMATIC INTERPRETATION OF CONFIDENCE INTERVAL. If we were to draw a random sample according to the assumptions above, the resulting confidence interval $(\bar{X} - \frac{2}{7}, \bar{X} + \frac{2}{7})$ would have probability 0.95 of covering $\mu$. Because the random sample lives in the theoretical world, this is a theoretical statement. Nonetheless, substituting

$$\bar{X} = \bar{x} \qquad (1)$$

together with

$$\bar{x} = 10.2 \qquad (2)$$

we obtain the interval $I$, and are able to draw useful conclusions as long as our theoretical world is aligned well with the real world that produced the data.

The main point here is that we do not need a long-run interpretation of probability, but we do have to be reminded that the unique-event probability of 0.95 remains a theoretical statement because it applies to random variables rather than data. Let us turn to the Bayesian case.
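The long-run frequency in the frequentist interpretation can be seen as a consequence of the single-sample coverage probability, and a small simulation illustrates this. The following Python sketch is an illustration only: the value of `mu` is a hypothetical choice (any value would do), and the number of replications is arbitrary. Note that the whole simulation takes place in the theoretical world, since the samples are generated from the assumed model.

```python
import numpy as np

rng = np.random.default_rng(1)

mu = 3.0        # hypothetical true mean; unknown in any real application
sigma = 1.0     # known standard deviation, as in the text
n = 49          # sample size, as in the text
n_rep = 20000   # number of simulated random samples (arbitrary)

# Each row is one random sample X_1, ..., X_n from N(mu, 1).
samples = rng.normal(mu, sigma, size=(n_rep, n))
xbars = samples.mean(axis=1)

half_width = 2.0 * sigma / np.sqrt(n)   # 2/7
covered = (xbars - half_width < mu) & (mu < xbars + half_width)

# By the law of large numbers this proportion should be close to
# P(|Z| < 2), which is about 0.9545.
print("proportion of intervals covering mu:", covered.mean())
```

The proportion settles near 0.95 because each individual interval has that probability of covering the mean under the model, not the other way around.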

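BAYESIAN ASSUMPTIONS. Suppose $X_1, X_2, \ldots, X_n$ form a random sample from a $N(\mu, 1)$ distribution and the prior distribution of $\mu$ is $N(\mu_0, \tau^2)$, with $\tau^2 \gg \frac{1}{49}$ and $49\tau^2 \gg |\mu_0|$.

The posterior distribution of $\mu$ is normal, the posterior mean becomes

$$\bar{\mu} = \frac{\tau^2}{\frac{1}{49} + \tau^2}\, 10.2 + \frac{\frac{1}{49}}{\frac{1}{49} + \tau^2}\, \mu_0$$

and the posterior variance is

$$v = \left(49 + \frac{1}{\tau^2}\right)^{-1}$$

but because $\tau^2 \gg \frac{1}{49}$ and $49\tau^2 \gg |\mu_0|$ we have $\bar{\mu} \approx 10.2$ and $v \approx \frac{1}{49}$. Therefore, the inferential interval $I$ defined above has posterior probability 0.95.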
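The approximation can be checked numerically. The Python sketch below evaluates the posterior mean and variance formulas for one particular prior and then computes the posterior probability of the interval $I$. The prior values `mu0 = 5.0` and `tau2 = 100.0` are hypothetical, chosen only so that the two conditions on $\tau^2$ hold, and the function names are invented for this illustration.

```python
import math

def normal_posterior(xbar, n, mu0, tau2, sigma2=1.0):
    """Posterior mean and variance of mu for a N(mu, sigma2) sample
    with a N(mu0, tau2) prior (standard conjugate normal result)."""
    data_precision = n / sigma2
    prior_precision = 1.0 / tau2
    v = 1.0 / (data_precision + prior_precision)
    mean = v * (data_precision * xbar + prior_precision * mu0)
    return mean, v

def normal_cdf(x, mean, var):
    """Cumulative distribution function of N(mean, var)."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

# Hypothetical prior satisfying tau2 >> 1/49 and 49*tau2 >> |mu0|.
mu0, tau2 = 5.0, 100.0
post_mean, post_var = normal_posterior(xbar=10.2, n=49, mu0=mu0, tau2=tau2)

lower, upper = 10.2 - 2.0 / 7.0, 10.2 + 2.0 / 7.0   # the interval I
prob = normal_cdf(upper, post_mean, post_var) - normal_cdf(lower, post_mean, post_var)

print(post_mean)   # close to 10.2
print(post_var)    # close to 1/49
print(prob)        # close to 0.95
```

With a more concentrated prior (small `tau2`) the posterior mean would be pulled toward `mu0` and the interval $I$ would no longer carry posterior probability near 0.95, which is why the two conditions are part of the stated assumptions.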

BAYESIAN INTERPRETATION OF POSTERIOR INTERVAL. Under the assumptions above, the probability that $\mu$ is in the interval $I$ is 0.95.

PRAGMATIC INTERPRETATION OF POSTERIOR INTERVAL. If the data were a random sample for which (2) holds, that is, $\bar{x} = 10.2$, and if the assumptions above were to hold, then the probability that $\mu$ is in the interval $I$ would be 0.95. This refers to a hypothetical value $\bar{x}$ of the random variable $\bar{X}$, and because $\bar{X}$ lives in the theoretical world the statement remains theoretical. Nonetheless, we are able to draw useful conclusions from the data as long as our theoretical world is aligned well with the real world that produced the data.

Here, although the Bayesian approach escapes the indirectness of confidence within the theoretical world, it cannot escape it in the world of data analysis because there remains the additional layer of identifying data with random variables. According to the pragmatic interpretation, the posterior is not, literally, a statement about the way the observed data relate to the unknown parameter $\mu$ because those objects live in different worlds. The language of Bayesian inference, like the language of frequentist inference, takes a convenient shortcut by blurring the distinction between data and random variables.

The commonality between frequentist and Bayesian inferences is the use of theoretical assumptions, together with a subjunctive statement. In both approaches a statistical model is introduced (in the Bayesian case the prior distributions become part of what I am here calling the model) and we may say that the inference is based on what would happen if the data were to be random variables distributed according to the statistical model. This modeling assumption would be reasonable if the model were to describe accurately the variation in the data.

4. IMPLICATIONS FOR TEACHING

It is important for students in introductory statistics courses to see the subject as a coherent, principled whole. Instructors, and textbook authors, may try to help by providing some notion of a “big picture.” Often this is done literally, with an illustration such as Figure 3 (e.g., Lovett, Meyer and Thille, 2008). This kind of illustration can be extremely useful if referenced repeatedly throughout a course.

Figure 3 represents a standard story about statistical inference. Fisher introduced the idea of a random sample drawn from a hypothetical infinite population, and Neyman and Pearson’s work encouraged subsequent mathematical statisticians to drop the word “hypothetical” and instead describe statistical inference as analogous to simple random sampling from a finite population. This is the concept that Figure 3 tries to get across. My complaint is that it is not a good general description of statistical inference, and my claim is that Figure 1 is more accurate. For instance, in the psychophysical example of Hecht, Schlaer and Pirenne discussed in Section 2, there is no population of “yes” or “no” replies from which a random sample is drawn. We do not need to struggle to make an analogy with a simple random sample. Furthermore, any thoughts along these lines may draw attention away from the most important theoretical assumptions, such as independence among the responses. Figure 1 is supposed to remind students to look for the important assumptions, and ask whether they describe the variation in the data reasonably accurately.

FIG. 3. The big picture of statistical inference according to the standard conception. Here, a random sample is pictured as a sample from a finite population.

One of the reasons the population and sample picture in Figure 3 is so attractive pedagogically is that it reinforces the fundamental distinction between parameters and statistics through the terms population mean and sample mean. To my way of thinking, this terminology, inherited from Fisher, is unfortunate. Instead of “population mean” I would much prefer theoretical mean, because it captures better the notion that a theoretical distribution is being introduced, a notion that is reinforced by Figure 1.

I have found Figure 1 helpful in teaching basic statistics. For instance, when talking about random variables I like to begin with a set of data, where variation is displayed in a histogram, and then say that probability may be used to describe such variation. I then tell the students we must introduce mathematical objects called random variables, and in defining them and applying the concept to the data at hand, I immediately acknowledge that this is an abstraction, while also stating that, as the students will see repeatedly in many examples, it can be an extraordinarily useful abstraction whenever the theoretical world of random variables is aligned well with the real world of the data.

I have also used Figure 1 in my classes when describing attitudes toward data analysis that statistical training aims to instill. Specifically, I define statistical thinking, as in the article by Brown and Kass (2009), to involve two principles:

1. Statistical models of regularity and variability in data may be used to express knowledge and uncertainty about a signal in the presence of noise, via inductive reasoning.

2. Statistical methods may be analyzed to determine how well they are likely to perform.

Principle 1 identifies the source of statistical inference to be the hypothesized link between data and statistical models. In explaining, I explicitly distinguish the use of probability to describe variation and to express knowledge. A probabilistic description of variation would be “The probability of rolling a 3 with a fair die is 1/6” while an expression of knowledge would be “I’m 90% sure the capital of Wyoming is Cheyenne.” These two sorts of statements, which use probability in different ways, are sometimes considered to involve two different kinds of probability, which have been called “aleatory probability” and “epistemic probability.” Bayesians merge these, applying the laws of probability to go from quantitative description to quantified belief, but in every form of statistical inference aleatory probability is used, somehow, to make epistemic statements. This is Principle 1. Principle 2 is that the same sorts of statistical models may be used to evaluate statistical procedures, though in the classroom I also explain that performance of procedures is usually investigated under varying circumstances.

FIG. 4. A more elaborate big picture, reflecting in greater detail the process of statistical inference. As in Figure 1, there is a hypothetical link between data and statistical models but here the data are connected more specifically to their representation as random variables.

For somewhat more advanced audiences it is possible to elaborate, describing in more detail the process trained statisticians follow when reasoning from data. A big picture of the overall process is given in Figure 4. That figure indicates the hypothetical connection between data and random variables, between key features of unobserved mechanisms and parameters, and between real-world and theoretical conclusions. It further indicates that data display both regularity (which is often described in theoretical terms as a “signal,” sometimes conforming to simple mathematical descriptions or “laws”) and unexplained variability, which is usually taken to be “noise.” The figure also includes the components exploratory data analysis (EDA) and algorithms, but the main message of Figure 4, given by the labels of the two big boxes, is the same as that in Figure 1.

5. DISCUSSION

According to my understanding, laid out above, statistical pragmatism has two main features: it is eclectic and it emphasizes the assumptions that connect statistical models with observed data. The pragmatic view acknowledges that both sides of the frequentist-Bayesian debate made important points. Bayesians scoffed at the artificiality in using sampling from a finite population to motivate all of inference, and in using long-run behavior to define characteristics of procedures. Within the theoretical world, posterior probabilities are more direct, and therefore seemed to offer much stronger inferences. Frequentists bristled, pointing to the subjectivity of prior distributions. Bayesians responded by treating subjectivity as a virtue on the grounds that all inferences are subjective yet, while there is a kernel of truth in this observation (we are all human beings, making our own judgments), subjectivism was never satisfying as a logical framework: an important purpose of the scientific enterprise is to go beyond personal decision-making. Nonetheless, from a pragmatic perspective, while the selection of prior probabilities is important, their use is not so problematic as to disqualify Bayesian methods, and in looking back on history the introduction of prior distributions may not have been the central bothersome issue it was made out to be. Instead, it seems to me, the really troubling point for frequentists has been the Bayesian claim to a philosophical high ground, where compelling inferences could be delivered at negligible logical cost. Frequentists have always felt that no such thing should be possible. The difficulty begins not with the introduction of prior distributions but with the gap between models and data, which is neither frequentist nor Bayesian. Statistical pragmatism avoids this irritation by acknowledging explicitly the tenuous connection between the real and theoretical worlds. As a result, its inferences are necessarily subjunctive. We speak of what would be inferred if our assumptions were to hold. The inferential bridge is traversed, by both frequentist and Bayesian methods, when we act as if the data were generated by random variables. In the normal mean example discussed in Section 3, the key step involves the conjunction of the two equations (1) and (2). Strictly speaking, according to statistical pragmatism, equation (1) lives in the theoretical world while equation (2) lives in the real world; the bridge is built by allowing $\bar{x}$ to refer to both the theoretical value of the random variable and the observed data value.

In pondering the nature of statistical inference I am, like others, guided partly by past and present sages (for an overview see Barnett, 1999), but also by my own experience and by watching many colleagues in action. Many of the sharpest and most vicious Bayes-frequentist debates took place during the dominance of pure theory in academia. Statisticians are now more inclined to argue about the extent to which a method succeeds in solving a data analytic problem. Much statistical practice revolves around getting good estimates and standard errors in complicated settings where statistical uncertainty is smaller than the unquantified aggregate of many other uncertainties in scientific investigation. In such contexts, the distinction between frequentist and Bayesian logic becomes unimportant and contemporary practitioners move freely between frequentist and Bayesian techniques using one or the other depending on the problem. Thus, in a review of statistical methods in neurophysiology in which my colleagues and I discussed both frequentist and Bayesian methods (Kass, Ventura and Brown, 2005), not only did we not emphasize this dichotomy but we did not even mention the distinction between the approaches or their inferential interpretations.

In fact, in my first publication involving analysis of neural data (Olson et al., 2001) we reported more than a dozen different statistical analyses, some frequentist, some Bayesian. Furthermore, methods from the two approaches are sometimes glued together in a single analysis. For example, to examine several neural firing-rate intensity functions $\lambda^1(t), \ldots, \lambda^p(t)$, assumed to be smooth functions of time $t$, Behseta et al. (2007) developed a frequentist approach to testing the hypothesis $H_0: \lambda^1(t) = \cdots = \lambda^p(t)$, for all $t$, that incorporated BARS smoothing. Such hybrids are not uncommon, and they do not force a practitioner to walk around with mutually inconsistent interpretations of statistical inference. Figure 1 provides a general framework that encompasses both of the major approaches to methodology while emphasizing the inherent gap between data and modeling assumptions, a gap that is bridged through subjunctive statements. The advantage of the pragmatic framework is that it considers frequentist and Bayesian inference to be equally respectable and allows us to have a consistent interpretation, without feeling as if we must have split personalities in order to be competent statisticians. More to the point, this framework seems to me to resemble more closely what we do in practice: statisticians offer inferences couched in a cautionary attitude. Perhaps we might even say that most practitioners are subjunctivists.

I have emphasized subjunctive statements partly because, on the frequentist side, they eliminate any need for long-run interpretation. For Bayesian methods they eliminate reliance on subjectivism. The Bayesian point of view was articulated admirably by Jeffreys (see Robert, Chopin and Rousseau, 2009, and accompanying discussion) but it became clear, especially from the arguments of Savage and subsequent investigations in the 1970s, that the only solid foundation for Bayesianism is subjective (see Kass and Wasserman, 1996, and Kass, 2006). Statistical pragmatism pulls us out of that solipsistic quagmire. On the other hand, I do not mean to imply that it really does not matter what approach is taken in a particular instance. Current attention frequently focuses on challenging, high-dimensional data sets where frequentist and Bayesian methods may differ. Statistical pragmatism is agnostic on this. Instead, procedures should be judged according to their performance under theoretical conditions thought to capture relevant real-world variation in a particular applied setting. This is where our juxtaposition of the theoretical world with the real world earns its keep.

I called the story about statistical inference told by Figure 3 “standard” because it is imbedded in many introductory texts, such as the path-breaking book by Freedman, Pisani and Purves (2007) and the excellent and very popular book by Moore and McCabe (2005). My criticism is that the standard story misrepresents the way statistical inference is commonly understood by trained statisticians, portraying it as analogous to simple random sampling from a finite population. As I noted, the population versus sampling terminology comes from Fisher, but I believe the conception in Figure 1 is closer to Fisher’s conception of the relationship between theory and data. Fisher spoke pointedly of a hypothetical infinite population, but in the standard story of Figure 3 the “hypothetical” part of this notion, which is crucial to the concept, gets dropped (confer also Lenhard, 2006). I understand Fisher’s “hypothetical” to connote what I have here called “theoretical.” Fisher did not anticipate the co-option of his framework and was, in large part for this reason, horrified by subsequent developments by Neyman and Pearson. The terminology “theoretical” avoids this confusion and thus may offer a clearer representation of Fisher’s idea.¹

We now recognize Neyman and Pearson to have made permanent, important contributions to statistical inference through their introduction of hypothesis testing and confidence. From today’s vantage point, however, their behavioral interpretation seems quaint, especially when represented by their famous dictum, “We are inclined to think that as far as a particular hypothesis is concerned, no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis.” Nonetheless, that interpretation seems to have inspired the attitude behind Figure 3. In the extreme, one may be led to insist that statistical inferences are valid only when some chance mechanism has generated the data. The problem with the chance-mechanism conception is that it applies to a rather small part of the real world, where there is either actual random sampling or situations described by statistical or quantum physics. I believe the chance-mechanism conception errs in declaring that data are assumed to be random variables, rather than allowing the gap of Figure 1 to be bridged² by statements such as (2). In saying this I am trying to listen carefully to the voice in my head that comes from the late David Freedman (see Freedman and Ziesel, 1988). I imagine he might call crossing this bridge, in the absence of an explicit chance mechanism, a leap of faith. In a strict sense I am inclined to agree. It seems to me, however, that it is precisely this leap of faith that makes statistical reasoning possible in the vast majority of applications.

Statistical models that go beyond chance mechanisms have been central to statistical inference since Fisher and Jeffreys, and their role in reasoning has been considered by many authors (e.g., Cox, 1990; Lehmann, 1990). An outstanding issue is the extent to which statistical models are like the theoretical models used throughout science (see Stanford, 2006). I would argue, on the one hand, that they are similar: the most fundamental belief of any scientist is that the theoretical and real worlds are aligned. On the other hand, as observed in Section 2, statistics is unique in having to face the gap between theoretical and real worlds every time a model is applied and, it seems to me, this is a big part of what we offer our scientific collaborators. Statistical pragmatism recognizes that all forms of statistical inference make assumptions, assumptions which can only be tested very crudely (with such things as goodness-of-fit methods) and can almost never be verified. This is not only at the heart of statistical inference, it is also the great wisdom of our field.

¹ Fisher also introduced populations partly because he used long-run frequency as a foundation for probability, which statistical pragmatism considers unnecessary.

² Because probability is introduced with the goal of drawing conclusions via statistical inference, it is, in a philosophical sense, “instrumental.” See Glymour (2001).

ACKNOWLEDGMENTS

This work was supported in part by NIH Grant MH064537. The author is grateful for comments on an earlier draft by Brian Junker, Nancy Reid, Steven Stigler, Larry Wasserman and Gordon Weinberg.

REFERENCES

BARNETT, V. (1999). Comparative Statistical Inference, 3rd ed. Wiley, New York. MR0663189

BEHSETA, S., KASS, R. E., MOORMAN, D. and OLSON, C. R. (2007). Testing equality of several functions: Analysis of single-unit firing rate curves across multiple experimental conditions. Statist. Med. 26 3958–3975. MR2395881

BROWN, E. N. and KASS, R. E. (2009). What is statistics? (with discussion). Amer. Statist. 63 105–123.

COX, D. R. (1990). Role of models in statistical analysis. Statist. Sci. 5 169–174. MR1062575

DIMATTEO, I., GENOVESE, C. R. and KASS, R. E. (2001). Bayesian curve-fitting with free-knot splines. Biometrika 88 1055–1071. MR1872219

FREEDMAN, D., PISANI, R. and PURVES, R. (2007). Statistics, 4th ed. W. W. Norton, New York.

FREEDMAN, D. and ZIESEL (1988). From mouse-to-man: The quantitative assessment of cancer risks (with discussion). Statist. Sci. 3 3–56.

GLYMOUR, C. (2001). Instrumental probability. Monist 84 284–300.

HECHT, S., SCHLAER, S. and PIRENNE, M. H. (1942). Energy, quanta and vision. J. Gen. Physiol. 25 819–840.

KASS, R. E. (2006). Kinds of Bayesians (comment on articles by Berger and by Goldstein). Bayesian Anal. 1 437–440. MR2221277

KASS, R. E., VENTURA, V. and BROWN, E. N. (2005). Statistical issues in the analysis of neuronal data. J. Neurophysiol. 94 8–25.

KASS, R. E. and WASSERMAN, L. A. (1996). The selection of prior distributions by formal rules. J. Amer. Statist. Assoc. 91 1343–1370. MR1478684

LEHMANN, E. L. (1990). Model specification: The views of Fisher and Neyman, and later developments. Statist. Sci. 5 160–168. MR1062574

LENHARD, J. (2006). Models and statistical inference: The controversy between Fisher and Neyman–Pearson. British J. Philos. Sci. 57 69–91. MR2209772

LOVETT, M., MEYER, O. and THILLE, C. (2008). The open learning initiative: Measuring the effectiveness of the OLI statistics course in accelerating student learning. J. Interact. Media Educ. 14.

MOORE, D. S. and MCCABE, G. (2005). Introduction to the Practice of Statistics, 5th ed. W. H. Freeman, New York.

OLSON, C. R., GETTNER, S. N., VENTURA, V., CARTA, R. and KASS, R. E. (2001). Neuronal activity in macaque supplementary eye field during planning of saccades in response to pattern and spatial cues. J. Neurophysiol. 84 1369–1384.

ROBERT, C. P., CHOPIN, N. and ROUSSEAU, J. (2009). Harold Jeffreys’ theory of probability revisited (with discussion). Statist. Sci. 24 141–194. MR2655841

ROLLENHAGEN, J. E. and OLSON, C. R. (2005). Low-frequency oscillations arising from competitive interactions between visual stimuli in macaque inferotemporal cortex. J. Neurophysiol. 94 3368–3387.

STANFORD, P. K. (2006). Exceeding Our Grasp. Oxford Univ. Press.

WALLSTROM, G., LIEBNER, J. and KASS, R. E. (2008). An implementation of Bayesian adaptive regression splines (BARS) in C with S and R wrappers. J. Statist. Software 26 1–21.
