Using Image Comparison Methods
Philippe Dreuw, Daniel Keysers, Thomas Deselaers, and Hermann Ney
Lehrstuhl für Informatik VI – Computer Science Department, RWTH Aachen University – D-52056 Aachen, Germany
{dreuw,keysers,deselaers,ney}@informatik.rwth-aachen.de
Abstract. We introduce the use of appearance-based features, together with the tangent distance or the image distortion model, to account for image variability within the hidden Markov model emission probabilities to recognize gestures. No tracking, hand segmentation, or shape models have to be defined. The distance measures also perform well for template matching classifiers. We obtain promising first results on a new database with the German finger-spelling alphabet. This newly recorded database is freely available for further research.
1 Introduction
Work in the field of vision-based gesture recognition usually first segments parts of the input images, for example the hand, and then uses features calculated from this segmented input, such as shape or motion. Problems with this approach are tracking, occlusion, and lighting or clothing constraints. Results in the field of object recognition in images suggest that this intermediate segmentation step is not necessary and can even be a hindrance, since segmentation and tracking are never perfect. The questions addressed in our research are whether appearance-based features are competitive for gesture recognition and whether we can use models of image variability similar to those used in object recognition. We have integrated distance measures known from image and optical character recognition (e.g. measures that are invariant against affine transformations) into the hidden Markov model classifiers.
Most of the common systems [2,8,9,10] assume a constant environment, e.g. persons wearing non-skin-colored clothes with long sleeves and a fixed camera position under constant lighting conditions. The presented systems are often highly person-dependent, and the gestures used differ strongly from each other so that they are easy to recognize. We aim at overcoming these shortcomings with this work.
2 Appearance-Based Features for Gesture Recognition
In appearance-based approaches, the image itself and simple transformations (filtering, scaling, etc.) of the image are usually used as features. In this paper, we denote an original image $X$ in a sequence at time $t = 1, \ldots, T$ by $X_t$, and the pixel value at the position $(x, y)$ by $X_t(x, y)$.
Fig. 1. Infrared images of the gesture “Five”. (a)-(d): original and spatial derivative image features. (e)-(g): examples from the i6-Gesture database.
When working, for example, with gray-valued images (e.g. infrared images like in Fig. 1(c)), original images or their spatial derivatives can be used as features. Skin probability images have been created according to their skin probability maps [5]. Other features have been analyzed in [3].
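As an illustration of the spatial derivative features mentioned above, the following is a minimal sketch, assuming a 2-D gray-valued image stored as a NumPy array. The function name `sobel_features` and the edge-replicating border handling are assumptions made for this example and are not taken from the system described here.

```python
import numpy as np

def sobel_features(image):
    """Return horizontal, vertical and magnitude Sobel responses of a 2-D gray image."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    # Replicate the border so that the output has the same size as the input
    # (an assumption; other border treatments are equally possible).
    padded = np.pad(img, 1, mode="edge")
    kx = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
    ky = kx.T
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    # Accumulate the 3x3 cross-correlation one kernel entry at a time.
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return gx, gy, np.hypot(gx, gy)
```

Any of the three returned images can be flattened into a feature vector in the same way as the original image.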
3 Hidden Markov Models
The ability of hidden Markov models (HMM) to compensate for time and amplitude variations has been shown for speech recognition, gesture recognition, sign language recognition and human actions [4,8,9,10]. In particular, we focus on distance measures that are invariant against slight affine transformations or distortions. The idea of an HMM is to represent a signal by the states of a stochastic finite state machine. A more detailed description can be found in [4].
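To make the compensation of time variations concrete, the following is a minimal sketch of a Viterbi alignment of an observation sequence to the states of one HMM. It assumes a left-to-right topology with only loop and forward transitions and no transition penalties; these choices, as well as the function name `viterbi_left_to_right`, are assumptions for illustration and are not specified in this paper. The input is a matrix of per-frame, per-state costs, for example negative log emission probabilities.

```python
import numpy as np

def viterbi_left_to_right(costs):
    """Align T frames to S states; costs[t, s] is the cost of frame t in state s.

    Returns the accumulated cost of the best path and the state index per frame.
    The path starts in the first state and ends in the last one (requires T >= S).
    """
    costs = np.asarray(costs, dtype=float)
    T, S = costs.shape
    acc = np.full((T, S), np.inf)
    back = np.zeros((T, S), dtype=int)
    acc[0, 0] = costs[0, 0]
    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay in s (loop) or advance from s - 1 (forward).
            predecessors = [s] if s == 0 else [s, s - 1]
            best = min(predecessors, key=lambda p: acc[t - 1, p])
            acc[t, s] = acc[t - 1, best] + costs[t, s]
            back[t, s] = best
    # Trace the best path back from the last state at the last frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return float(acc[-1, -1]), path[::-1]
```

A sequence would then be assigned to the gesture class whose model yields the lowest accumulated cost.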
In each state $s$ of an HMM, a distance is calculated. We assume pooled variances over all classes and states, i.e. we use $\sigma_{sdk} = \sigma_d$. The negative logarithm of $p(X|s)$ can be interpreted as a distance $d(p(X|s))$ and is used as emission score:
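$$-\log(p(X|s)) = \frac{1}{2} \sum_{d=1}^{D} \Bigg[ \underbrace{\left(\frac{X_d - \mu_{sd}}{\sigma_d}\right)^2}_{\text{distance}} + \underbrace{\log(2\pi\sigma_d^2)}_{\text{normalization factor}} \Bigg]$$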
When working with image sequences, we calculate a distance between two images, e.g. we compare the current observation image $X_t$ (or any transformed image $\tilde{X}_t$) with the mean image $\mu_s$ at this state. Simply comparing the pixel values is quite often used in object recognition, but different methods have been proposed to do this.
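The emission score can be sketched in a few lines, assuming the observation image and the state mean are flattened to vectors of equal length and that the pooled standard deviation is given either as a scalar or per dimension. The function name `emission_score` is hypothetical.

```python
import numpy as np

def emission_score(x, mu_s, sigma):
    """Negative log-likelihood of x under a diagonal Gaussian with pooled std sigma."""
    x = np.asarray(x, dtype=float).ravel()
    mu_s = np.asarray(mu_s, dtype=float).ravel()
    # A scalar sigma is shared by all dimensions; otherwise one value per dimension.
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float).ravel(), x.shape)
    distance = ((x - mu_s) / sigma) ** 2
    normalization = np.log(2.0 * np.pi * sigma ** 2)
    return 0.5 * float(np.sum(distance + normalization))
```

With pooled variances the normalization term is the same for every state, so only the distance term influences which state or class scores best. This is why the squared Euclidean term can be exchanged for another image distance.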
Tangent Distance. Because the Euclidean distance does not account for affine transformations such as scaling, translation and rotation, the tangent distance (TD), as described in [7], is one approach to incorporate invariance with respect to certain transformations into a classification system. Here, invariant means that image transformations that do not change the class of the image should not have a large impact on the distance between the images. Patterns that all lie in the same subspace can therefore be represented by one prototype and the corresponding tangent vectors. Thus, the TD between the original image and any of its transformations within this subspace is zero, while the Euclidean distance is significantly greater than zero.
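The following is a minimal sketch of a one-sided tangent distance, assuming that only horizontal and vertical translation are modeled and that the tangent vectors are approximated by finite-difference image derivatives. The function names are hypothetical, and the full method in [7] covers more transformations as well as the two-sided variant, neither of which is shown here.

```python
import numpy as np

def translation_tangents(image):
    """Approximate tangent vectors for x- and y-translation by image derivatives."""
    img = np.asarray(image, dtype=float)
    dy, dx = np.gradient(img)
    # One column per tangent vector, shape (number of pixels, 2).
    return np.stack([dx.ravel(), dy.ravel()], axis=1)

def one_sided_tangent_distance(x, y, tangents):
    """Squared distance from y to the tangent subspace {x + tangents @ alpha}."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    # Least squares gives the subspace point closest to y.
    alpha, *_ = np.linalg.lstsq(tangents, y - x, rcond=None)
    residual = x + tangents @ alpha - y
    return float(residual @ residual)
```

Because `alpha = 0` is always admissible, this distance never exceeds the squared Euclidean distance, and it vanishes for any image that lies in the linear subspace spanned by the tangent vectors around the prototype.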
Image Distortion Model. The image distortion model [6] is a method which allows for small local deformations of an image. Each pixel is aligned to the pixel with the smallest squared distance from its neighborhood. These squared distances are summed up for the complete image to get the global distance. This method can be improved by enhancing the pixel distance to compare subimages instead of single pixels only. Further improvement is achieved by using spatial derivatives instead of the pixel values directly.
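The basic pixel-wise variant of the image distortion model can be sketched as follows, assuming two gray-valued images of equal size and a square neighborhood of radius `warp`. The function name, the parameter name and the edge-replicating border are assumptions for this example; the subimage and derivative extensions mentioned above are not included.

```python
import numpy as np

def idm_distance(a, b, warp=1):
    """Sum over all pixels of a of the smallest squared difference to any
    pixel of b inside a (2 * warp + 1) x (2 * warp + 1) neighborhood."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    h, w = a.shape
    # Replicate the border of b so that every pixel has a full neighborhood.
    padded = np.pad(b, warp, mode="edge")
    best = np.full((h, w), np.inf)
    for dy in range(2 * warp + 1):
        for dx in range(2 * warp + 1):
            shifted = padded[dy:dy + h, dx:dx + w]
            best = np.minimum(best, (a - shifted) ** 2)
    return float(best.sum())
```

With `warp=0` the function reduces to the squared Euclidean distance, which makes the effect of the allowed local deformation easy to compare.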
4 Databases
LTI-Gesture Database. The LTI-Gesture database was created at the Chair of Technical Computer Science at the RWTH Aachen [1]. It contains 14 dynamic gestures, with 140 training and 140 testing sequences. An error rate of 4.3% was achieved on this database in [1]. Fig. 1(c) shows an example of a gesture.
i6-Gesture Database. We recorded a new database of finger-spelling letters of German Sign Language. Our database is freely available on our website¹. The database contains 35 gestures and consists of 700 training and 700 test sequences. 20 different persons were recorded under non-uniform daylight lighting conditions, without any restrictions on the clothing while gesturing. The gestures were recorded by one webcam (320x240 at 25 fps) and one camcorder (352x288 at 25 fps), from two different points of view. Fig. 1(e)-(g) show some examples of different gestures. More information is available on our website.
5 Results
In [1], an error rate of 4.3% has been achieved using shape and motion features in combination with forearm segmentation. Using the centroid features as presented in [8], we have achieved an error rate of 14.2%, and we conclude that these features should only be used to describe motion patterns and not more complex hand shapes. Using original image features on the LTI-Gesture database, we have achieved an error rate of 5.7%, which has been improved to 1.4% in combination with the tangent distance [3] or the image distortion model (see Tab. 1).
On the i6-Gesture database, we have used only the webcam images to test our system. This database contains gestures of very high complexity, and additional methods are needed for feature extraction or other distance measures. Using a camshift tracker to extract position-independent features (note that we do not try to segment the hand), we could improve the error rate from 87.1% to 44.0%.
Using a two-sided tangent distance, we have improved the error rate to the currently best result of 35.7%, which shows the advantage of using distance measures that are invariant against small affine transformations and the possibility of recognizing gestures by appearance-based features (see Tab. 2).

Table 1. Error rates [%] on the LTI-Gesture database.

Features           Euclidean   Tangent   IDM
COG [8]            14.2        –         –
original           5.7         1.4       1.4
magnitude Sobel    7.1         1.4       1.4

Table 2. Error rates [%] on the i6-Gesture database.

Feature                                       Euclidean   Tangent
original thresholded by skin color prob.      87.1        –
+ camshift tracking (no segmentation)         44.0        35.7

¹ http://www-i6.informatik.rwth-aachen.de/~dreuw/database.html
6 Conclusion
At this point, some questions still remain unanswered: for example, not all distance measures and camera streams have been completely analyzed on the i6-Gesture database, and doing so is expected to improve the error rate. The best error rate achieved on the i6-Gesture database is 35.7%, which shows the high complexity of this database. Nevertheless, this result is promising because only a simple webcam without any restriction for the signer has been used, and some signs are visually very similar, for example the signs for “M”, “N”, “A”, and “S”.
The use of the tangent distance and the image distortion model as appropriate models of image variability in combination with appearance-based features has been investigated and compared to the Euclidean distance on other databases. Using these distance measures, the error rate has been reduced on all regarded databases, especially on the LTI-Gesture database. This shows the power of integrating these distance measures into the HMM emission probabilities for recognizing gestures.
References
1. S. Akyol, U. Canzler, K. Bengler, and W. Hahn. Gesture Control for Use in Automobiles. In IAPR MVA Workshop, Tokyo, Japan, pages 349–352, Nov. 2000.
2. R. Bowden, D. Windridge, T. Kadir, A. Zisserman, and M. Brady. A Linguistic Feature Vector for the Visual Interpretation of Sign Language. In J. M. Tomas Pajdla, editor, ECCV, volume 1, Prague, Czech Republic, pages 391–401, May 2004.
3. P. Dreuw. Appearance-Based Gesture Recognition. Diploma thesis, RWTH Aachen University, Aachen, Germany, Jan. 2005.
4. F. Jelinek. Statistical Methods for Speech Recognition. Cambridge, MA, Jan. 1998.
5. M. Jones and J. Rehg. Statistical Color Models with Application to Skin Color Detection. Technical Report CRL 98/11, Compaq Cambridge Research Lab, 1998.
6. D. Keysers, J. Dahmen, H. Ney, B. Wein, and T. Lehmann. Statistical Framework for Model-based Image Retrieval in Medical Applications. Journal of Electronic Imaging, 12(1):59–68, Jan. 2003.
7. D. Keysers, W. Macherey, H. Ney, and J. Dahmen. Adaptation in Statistical Pattern Recognition using Tangent Vectors. PAMI, 26(2):269–274, Feb. 2004.
8. G. Rigoll, A. Kosmala, and S. Eickeler. High Performance Real-Time Gesture Recognition using Hidden Markov Models. In Int. Gesture Workshop, volume 1371, Bielefeld, Germany, pages 69–80, Sep. 1998.
9. T. Starner, J. Weaver, and A. Pentland. Real-time ASL recognition using desk and wearable computer based video. PAMI, 20(12):1371–1375, Dec. 1998.
10. M. Zobl, R. Nieschulz, M. Geiger, M. Lang, and G. Rigoll. Gesture Components for Natural Interaction with In-Car Devices. In Int. Gesture Workshop, volume 2915 of LNAI, Gif-sur-Yvette, France, pages 448–459, Mar. 2004.