Artificial Consciousness for Improving Reinforcement Learning
Martin Nilsson
RWCP Novel Functions SICS Laboratory, POB 1263, S-128 Kista, Sweden
E-mail: mn@sics.se
Abstract: Reinforcement learning methods are useful for robot learning, but become slow when robots possess many degrees of freedom. We suggest equipping robots with fast on-board simulators, in order to accelerate learning. Such simulators will resemble forms of consciousness, enabling the robots to perform run-time trials in a simulated world, rather than tediously performing them in practice. We have applied this method to locomotion for a friction-propelled snake-like robot. The simulator on this robot uses an accurate non-linear model of isotropic friction that is fast enough to be executable in real time. Although our original goal was to propose a method for robot programming, the approach appears useful for reinforcement learning in a general context.
Keywords: reinforcement learning, conscious, snake-like, mobile, autonomous, robot, locomotion, friction, simulation.
1 Introduction
Programming autonomous robots is difficult for humans, especially when there are many degrees of freedom and the interaction between the environment and the degrees of freedom is complex. In order to control such a robot, some kind of learning or adaptation is necessary. One class of methods suitable for such applications is Reinforcement Learning. However, these methods have the serious drawback of being slow for large state spaces, which makes them impractical for learning by physical trials.
2 Method
An alternative to learning by physical trials is to learn in a simulator, provided that trials can be performed faster in the simulator than in reality, and that the simulation reflects the real world accurately enough. If a robot is able to use such a simulator during run-time, simulating the robot itself and its interaction with the environment, the simulator can be said to constitute a form of robot “consciousness.”
The advantage of the consciousness model is that it can both accelerate reinforcement learning and reduce memory requirements. Learning is sped up, since trials can be performed without physical execution. Less memory is required, since tentative actions can be re-evaluated in the simulator rather than having the results of previous evaluations tabulated. If the simulator is adaptive, it may also be used for problem solving in a dynamic environment. It is important to note that the simulator does not have to be perfectly accurate; it is sufficient that it is accurate enough to search the local state space of interest. Sensor feedback can serve to correct simulation drift. However, it is very important that simulated trials are at least as fast as physical trials. Here, it seems that it is often advantageous to trade simulation accuracy for speed.
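The loop described above can be illustrated with a minimal sketch. It is not the implementation used on the robot: the one-dimensional world, the action set, the function names (`simulate`, `real_world`, `plan`, `run`), and the gains 1.0 and 0.9 are all hypothetical choices made for illustration. The agent evaluates short action sequences in a crude internal model, executes only the first action physically, and then replaces its predicted state with the sensed one, so that model error does not accumulate and no table of past evaluations is stored.

```python
from itertools import product

ACTIONS = (-1.0, 0.0, 1.0)  # hypothetical action set


def simulate(state, action):
    """Internal model (assumed): a deliberately crude one-step predictor."""
    return state + 1.0 * action


def real_world(state, action):
    """Stand-in for physical execution; differs slightly from the model."""
    return state + 0.9 * action


def plan(state, goal, horizon=3):
    """Return the first action of the best simulated action sequence.

    Every sequence is re-evaluated in the simulator on each call;
    nothing is tabulated between calls.
    """
    best_first, best_cost = None, float("inf")
    for seq in product(ACTIONS, repeat=horizon):
        s, cost = state, 0.0
        for a in seq:
            s = simulate(s, a)
            cost += abs(goal - s)  # accumulated distance to the goal
        if cost < best_cost:
            best_first, best_cost = seq[0], cost
    return best_first


def run(start, goal, steps=10):
    """Alternate simulated planning with one physical step."""
    state = start
    for _ in range(steps):
        action = plan(state, goal)
        # The sensed state replaces the simulator's prediction,
        # which corrects simulation drift.
        state = real_world(state, action)
    return state
```

In this toy setting the model overestimates the effect of each action by about 10%, yet the agent still settles close to the goal, because each planning step starts from the measured state rather than from the simulator's own prediction.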
3 Results
We have tested our model on a snake-like, creeping robot. This robot is composed of a number of straight links, connected by joints. The robot moves in a plane, and propels itself by changing the angles of the joints, using friction as the only means of locomotion. Even with only two joints and three links, this is a difficult programming problem for a human programmer, due to the non-linear nature of friction. Equipped with consciousness in the form of a rudimentary simulator, the robot is able to crawl efficiently in real time. Results of these experiments have been described in [1].
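To make the non-linearity concrete, the following sketch computes the net friction force and torque on a single straight link sliding in the plane. It is an assumption-laden illustration, not the friction model of the robot's simulator, which is not specified here: it assumes Coulomb friction with the same coefficient in every direction, a uniformly distributed weight, a link lying along the x-axis and centred at the origin, and it ignores static friction. The function name `link_friction` and all parameter values are hypothetical.

```python
import math


def link_friction(vx, vy, omega, length=1.0, mu=0.3, weight=1.0, samples=100):
    """Net friction (fx, fy, torque about the centre) on a straight link.

    Assumptions (not from the paper): isotropic Coulomb friction,
    uniform weight distribution, link along the x-axis centred at the origin.
    """
    fx = fy = torque = 0.0
    dn = weight / samples  # normal force carried by each sample point
    for i in range(samples):
        r = ((i + 0.5) / samples - 0.5) * length  # position along the link
        # Velocity of the sample point: translation plus rotation about the centre.
        px, py = vx, vy + omega * r
        speed = math.hypot(px, py)
        if speed == 0.0:
            continue  # point is not sliding; static friction is not modelled
        # Coulomb friction opposes the sliding direction with fixed magnitude.
        dfx = -mu * dn * px / speed
        dfy = -mu * dn * py / speed
        fx += dfx
        fy += dfy
        torque += r * dfy  # z-component of r x f, with r along the x-axis
    return fx, fy, torque
```

Under these assumptions the friction force depends on the direction of sliding but not on its speed, so doubling the link's velocity leaves the force unchanged. This is one reason why hand-deriving joint motions that produce net forward movement is awkward.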
4 Conclusions and Discussion
This work is still at an early stage of development, but results so far seem to confirm that consciousness can be a useful concept for improving robot learning. Although our original goal was to propose a tentatively viable method for programming motion of snake-like robots and other hyper-redundant robots, the approach appears useful for learning agents in a more general context.
Our proposed idea of consciousness seems to agree fairly well with the intuitive idea of animate consciousness. Its potential use as a learning accelerator could perhaps also serve as an explanation of why consciousness developed during evolution.
References
[1] Nilsson, M. and Ojala, J.: Toward Conscious Robots: “Self-awareness” Speeds Learning. In Proc. Robotikdagarna 1995. Linköping University, Sweden. June 1995.