
Gesture Recognition Using Image Comparison Methods


Philippe Dreuw, Daniel Keysers, Thomas Deselaers, and Hermann Ney

Lehrstuhl für Informatik VI – Computer Science Department, RWTH Aachen University – D-52056 Aachen, Germany

{dreuw,keysers,deselaers,ney}@informatik.rwth-aachen.de

Abstract. We introduce the use of appearance-based features for gesture recognition, with the tangent distance or the image distortion model accounting for image variability within the hidden Markov model emission probabilities. No tracking, hand segmentation, or shape models have to be defined. The distance measures also perform well for template matching classifiers. We obtain promising first results on a new database with the German finger-spelling alphabet. This newly recorded database is freely available for further research.

1 Introduction

Work in the field of vision-based gesture recognition usually first segments parts of the input images, for example the hand, and then uses features calculated from this segmented input, such as shape or motion. Problems with this approach are tracking, occlusion, lighting, or clothing constraints. Results in the field of object recognition in images suggest that this intermediate segmentation step is not necessary and can even be a hindrance, since segmentation and tracking are never perfect. The question addressed in our research is whether appearance-based features are competitive for gesture recognition and whether we can use similar models of image variability as in object recognition. We have integrated distance measures known from image and optical character recognition (e.g. being invariant against affine transformations) into the hidden Markov model classifiers.

Most of the common systems [2,8,9,10] assume a constant environment, e.g. persons wearing non-skin-colored clothes with long sleeves and a fixed camera position under constant lighting conditions. The presented systems are often highly person-dependent, and the gestures used exhibit great differences in order to be easily recognizable. We aim to overcome these shortcomings with this work.

2 Appearance-Based Features for Gesture Recognition

In appearance-based approaches, the image itself and simple transformations (filtering, scaling, etc.) of the image are usually used as features. In this paper, we denote an original image X in a sequence at time t = 1, ..., T by X_t, and the pixel value at the position (x, y) by X_t(x, y).

Fig. 1. Infrared images of the gesture "Five". (a)-(d): original and spatial derivatives image features. (e)-(g): examples from the i6-Gesture database.

When working, for example, with gray-valued images (e.g. infrared images as in Fig. 1(c)), original images or their spatial derivatives can be used as features. Skin probability images have been created according to their skin probability maps [5]. Other features have been analyzed in [3].
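As a rough illustration of spatial derivative features, the following sketch computes horizontal and vertical Sobel derivatives and their magnitude for a gray-valued image. The function name `sobel_features` and the edge-replicating border handling are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def sobel_features(image):
    """Return (dx, dy, magnitude) Sobel derivative images of a 2-D gray image."""
    img = np.asarray(image, dtype=float)
    p = np.pad(img, 1, mode="edge")  # replicate border pixels (an assumption)
    h, w = img.shape

    def n(r, c):
        # View of the padded image holding the neighbor at offset (r-1, c-1).
        return p[r:r + h, c:c + w]

    # Right column minus left column, weighted 1-2-1 along the rows.
    dx = (n(0, 2) + 2 * n(1, 2) + n(2, 2)) - (n(0, 0) + 2 * n(1, 0) + n(2, 0))
    # Bottom row minus top row, weighted 1-2-1 along the columns.
    dy = (n(2, 0) + 2 * n(2, 1) + n(2, 2)) - (n(0, 0) + 2 * n(0, 1) + n(0, 2))
    return dx, dy, np.hypot(dx, dy)
```

Any of the three returned images could serve as a feature image X_t in place of the original pixel values.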

3 Hidden Markov Models

The ability of hidden Markov models (HMM) to compensate for time and amplitude variations has been proven for speech recognition, gesture recognition, sign language recognition, and human actions [4,8,9,10]. In particular, we focus on distance measures that are invariant against slight affine transformations or distortions. The idea of an HMM is to represent a signal by a state of a stochastic finite state machine. A more detailed description can be found in [4].
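To make the finite-state view more concrete, the following sketch scores one observation sequence against one HMM using the Viterbi approximation in the negative log domain. The left-to-right topology, the zero transition costs, and the names `viterbi_score` and `max_jump` are assumptions for illustration; the paper does not specify these details.

```python
import numpy as np

def viterbi_score(emission_cost, max_jump=2):
    """Cost of the best left-to-right state path through one HMM.

    emission_cost[t, s] is the cost (e.g. a negative log emission
    probability) of observing frame t in state s. Transitions from state s
    to s, s+1, ..., s+max_jump are allowed at zero cost, which is a
    simplifying assumption.
    """
    cost = np.asarray(emission_cost, dtype=float)
    T, S = cost.shape
    best = np.full(S, np.inf)
    best[0] = cost[0, 0]  # the path must start in the first state
    for t in range(1, T):
        new = np.full(S, np.inf)
        for s in range(S):
            lo = max(0, s - max_jump)
            # Cheapest allowed predecessor plus the cost of frame t in state s.
            new[s] = best[lo:s + 1].min() + cost[t, s]
        best = new
    return best[-1]  # the path must end in the last state
```

A sequence would then be assigned to the gesture class whose model yields the lowest cost.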

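In each state s of an HMM, a distance is calculated. We assume pooled variances over all classes and states, i.e. we use σ_sdk = σ_d. The negative logarithm of p(X|s) can be interpreted as a distance d(p(X|s)) and is used as emission score:

$$
-\log(p(X|s)) = \frac{1}{2} \sum_{d=1}^{D} \Bigg( \underbrace{\left( \frac{X_d - \mu_{sd}}{\sigma_d} \right)^{2}}_{\text{distance}} + \underbrace{\log\left(2\pi\sigma_d^{2}\right)}_{\text{normalization factor}} \Bigg)
$$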

When working with image sequences, we calculate a distance between two images, e.g. we compare the current observation image X_t (or any transformed image X̃_t) with the mean image µ_s at this state. Simply comparing the pixel values is quite often used in object recognition, but different methods have been proposed to do this.
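The emission score with pooled variances can be sketched as follows. This is a minimal illustration of the negative log of a Gaussian with diagonal covariance; the function name `emission_score` and the flattening of images into vectors are assumptions for this example.

```python
import numpy as np

def emission_score(x, mean, sigma):
    """Negative log-likelihood of x under a Gaussian with diagonal covariance.

    x, mean: arrays with D entries in total (e.g. images, flattened here).
    sigma: pooled standard deviations, a scalar or an array with D entries.
    """
    x = np.ravel(np.asarray(x, dtype=float))
    mean = np.ravel(np.asarray(mean, dtype=float))
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), x.shape)
    distance = ((x - mean) / sigma) ** 2              # distance term
    normalization = np.log(2.0 * np.pi * sigma ** 2)  # normalization factor
    return 0.5 * np.sum(distance + normalization)
```

Because the variances are pooled, the normalization term is the same for every state and class, so only the distance term influences the decision. This is what allows the squared Euclidean distance to be replaced by other image distances.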

Tangent Distance. Because the Euclidean distance does not account for affine transformations such as scaling, translation, and rotation, the tangent distance (TD), as described in [7], is one approach to incorporate invariance with respect to certain transformations into a classification system. Here, invariant means that image transformations that do not change the class of the image should not have a large impact on the distance between the images. Patterns that all lie in the same subspace can therefore be represented by one prototype and the corresponding tangent vectors. Thus, the TD between the original image and any of the transformations is zero, while the Euclidean distance is significantly greater than zero.
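A minimal sketch of a one-sided tangent distance is given below, assuming tangent vectors for small horizontal and vertical translations only, approximated by finite differences. The function names and the restriction to two tangent vectors are assumptions for illustration; [7] describes the full set of transformations and the two-sided variant.

```python
import numpy as np

def translation_tangents(image):
    """Tangent vectors for small horizontal and vertical shifts of an image."""
    img = np.asarray(image, dtype=float)
    dy, dx = np.gradient(img)  # derivatives along rows (y) and columns (x)
    return np.stack([dx.ravel(), dy.ravel()], axis=1)

def tangent_distance(observation, prototype, tangents):
    """One-sided squared tangent distance.

    Minimizes ||prototype + tangents @ alpha - observation||^2 over alpha,
    i.e. the squared distance from the observation to the linear subspace
    spanned by the tangent vectors at the prototype.
    """
    obs = np.ravel(np.asarray(observation, dtype=float))
    proto = np.ravel(np.asarray(prototype, dtype=float))
    diff = obs - proto
    alpha, *_ = np.linalg.lstsq(tangents, diff, rcond=None)
    residual = diff - tangents @ alpha
    return float(residual @ residual)
```

Since the subspace is only a linear approximation of the transformation manifold, the distance is exactly zero only for variations that lie in the span of the tangent vectors and small otherwise. It never exceeds the squared Euclidean distance, which corresponds to alpha = 0.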

Image Distortion Model. The image distortion model [6] is a method which allows for small local deformations of an image. Each pixel is aligned to the pixel with the smallest squared distance from its neighborhood. These squared distances are summed up for the complete image to get the global distance. This method can be improved by enhancing the pixel distance to compare sub-images instead of single pixels only. Further improvement is achieved by using spatial derivatives instead of the pixel values directly.
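The basic single-pixel variant of this distance can be sketched as follows. The function name `idm_distance`, the `warp` parameter for the neighborhood radius, and the edge-replicating border handling are assumptions for illustration; the sub-image and derivative extensions mentioned above are not included.

```python
import numpy as np

def idm_distance(observation, reference, warp=1):
    """Image distortion model distance with single-pixel comparison.

    Each pixel of `observation` is matched to the best pixel of `reference`
    within a (2*warp+1) x (2*warp+1) neighborhood. The minimal squared
    differences are summed over the whole image.
    """
    obs = np.asarray(observation, dtype=float)
    ref = np.asarray(reference, dtype=float)
    h, w = obs.shape
    padded = np.pad(ref, warp, mode="edge")  # border handling is an assumption
    best = np.full((h, w), np.inf)
    for dy in range(2 * warp + 1):
        for dx in range(2 * warp + 1):
            # Reference image displaced by (dy - warp, dx - warp).
            shifted = padded[dy:dy + h, dx:dx + w]
            best = np.minimum(best, (obs - shifted) ** 2)
    return float(best.sum())
```

With `warp=0` the function reduces to the squared Euclidean distance, which makes the effect of the local deformations easy to compare.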

4 Databases

LTI-Gesture Database. The LTI-Gesture database was created at the Chair of Technical Computer Science at the RWTH Aachen [1]. It contains 14 dynamic gestures, 140 training and 140 testing sequences. An error rate of 4.3% was achieved on this database. Fig. 1(c) shows an example of a gesture.

i6-Gesture Database. We recorded a new database of finger-spelling letters of German Sign Language. Our database is freely available on our website (see footnote 1). The database contains 35 gestures and consists of 700 training and 700 test sequences. 20 different persons were recorded under non-uniform daylight lighting conditions, without any restrictions on the clothing while gesturing. The gestures were recorded by one webcam (320x240 at 25 fps) and one camcorder (352x288 at 25 fps), from two different points of view. Fig. 1(e)-(g) show some examples of different gestures. More information is available on our website.

5 Results

In [1], an error rate of 4.3% has been achieved using shape and motion features in combination with forearm segmentation. Using the centroid features as presented in [8], we have achieved an error rate of 14.2%, and we can conclude that these features should only be used to describe motion patterns instead of more complex hand shapes. Using original image features on the LTI-Gesture database, we have achieved an error rate of 5.7%, which has been improved to 1.4% in combination with the tangent distance [3] or the image distortion model (see Tab. 1).

On the i6-Gesture database, we have used only the webcam images to test our system. The high error rates show that this database contains gestures of very high complexity, and that additional methods are needed for feature extraction or other distance measures. Using a camshift tracker to extract position-independent features (note that we do not try to segment the hand), we could improve the error rate from 87.1% to 44.0%.

Using a two-sided tangent distance, we have improved the error rate to the currently best result of 35.7%, which shows the advantage of using distance measures that are invariant against small affine transformations and the possibility of recognizing gestures by appearance-based features (see Tab. 2).

Footnote 1: http://www-i6.informatik.rwth-aachen.de/~dreuw/database.html

Table 1. Error rates [%] on the LTI-Gesture database.

Features          Euclidean   Tangent   IDM
COG [8]           14.2        –         –
original          5.7         1.4       1.4
magnitude Sobel   7.1         1.4       1.4

Table 2. Error rates [%] on the i6-Gesture database.

Feature                                     Euclidean   Tangent
original thresholded by skin color prob.    87.1        –
+ camshift tracking (no segmentation)       44.0        35.7

6 Conclusion

At this point, some questions still remain unanswered, e.g. not all distance measures and camera streams, which are expected to improve the error rate, were completely analyzed on the i6-Gesture database. The best achieved error rate on the i6-Gesture database is 35.7% and shows the high complexity of this database. Nevertheless, this result is promising because only a simple webcam without any restriction for the signer has been used and some signs are visually very similar, for example the signs for "M", "N", "A", and "S".

The use of tangent distance and image distortion models as appropriate models of image variability in combination with appearance-based features has been investigated and compared to the Euclidean distance on other databases. Using these distance measures, the error rate has been reduced on all regarded databases, especially on the LTI-Gesture database. This shows the power of integrating these distance measures into the HMM emission probabilities for recognizing gestures.

References

1. S. Akyol, U. Canzler, K. Bengler, and W. Hahn. Gesture Control for Use in Automobiles. In IAPR MVA Workshop, Tokyo, Japan, pages 349–352, Nov. 2000.

2. R. Bowden, D. Windridge, T. Kadir, A. Zisserman, and M. Brady. A Linguistic Feature Vector for the Visual Interpretation of Sign Language. In J. M. Tomas Pajdla, editor, ECCV, volume 1, Prague, Czech Republic, pages 391–401, May 2004.

3. P. Dreuw. Appearance-Based Gesture Recognition. Diploma thesis, RWTH Aachen University, Aachen, Germany, Jan. 2005.

4. F. Jelinek. Statistical Methods for Speech Recognition. Cambridge, MA, Jan. 1998.

5. M. Jones and J. Rehg. Statistical Color Models with Application to Skin Color Detection. Technical Report CRL 98/11, Compaq Cambridge Research Lab, 1998.

6. D. Keysers, J. Dahmen, H. Ney, B. Wein, and T. Lehmann. Statistical Framework for Model-based Image Retrieval in Medical Applications. Journal of Electronic Imaging, 12(1):59–68, Jan. 2003.

7. D. Keysers, W. Macherey, H. Ney, and J. Dahmen. Adaptation in Statistical Pattern Recognition using Tangent Vectors. PAMI, 26(2):269–274, Feb. 2004.

8. G. Rigoll, A. Kosmala, and S. Eickeler. High Performance Real-Time Gesture Recognition using Hidden Markov Models. In Int. Gesture Workshop, volume 1371, Bielefeld, Germany, pages 69–80, Sep. 1998.

9. T. Starner, J. Weaver, and A. Pentland. Real-time ASL recognition using desk and wearable computer based video. PAMI, 20(12):1371–1375, Dec. 1998.

10. M. Zobl, R. Nieschulz, M. Geiger, M. Lang, and G. Rigoll. Gesture Components for Natural Interaction with In-Car Devices. In Int. Gesture Workshop, volume 2915 of LNAI, Gif-sur-Yvette, France, pages 448–459, Mar. 2004.
