
Artificial Consciousness for Improving Reinforcement Learning



Martin Nilsson

RWCP Novel Functions SICS Laboratory, POB 1263, S-128 Kista, Sweden

E-mail: mn@sics.se

Abstract: Reinforcement learning methods are useful for robot learning, but become slow when robots possess many degrees of freedom. We suggest equipping robots with fast on-board simulators in order to accelerate learning. Such simulators resemble a form of consciousness, enabling the robots to perform run-time trials in a simulated world rather than tediously performing them physically. We have applied this method to locomotion for a friction-propelled snake-like robot. The simulator on this robot uses an accurate non-linear model of isotropic friction that is fast enough to run in real time. Although our original goal was to propose a method for robot programming, the approach appears useful for reinforcement learning in a general context.

Keywords: reinforcement learning, conscious, snake-like, mobile, autonomous, robot, locomotion, friction, simulation.

1 Introduction

Programming autonomous robots is difficult for humans, especially when there are many degrees of freedom and the interaction between the degrees of freedom and the environment is complex. To control such a robot, some kind of learning or adaptation is necessary. One class of methods suitable for such applications is reinforcement learning. However, these methods have the serious drawback of being slow for large state spaces, which makes them impractical for learning by physical trials.
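To make the scaling concrete, the following sketch counts the entries of a tabular action-value function for a hypothetical robot whose joint angles are each discretized into 10 levels and whose joints each choose among 3 moves. The discretization and the `table_size` helper are illustrative assumptions, not taken from this paper; the point is only that the table, and with it the number of trials needed to fill it, grows exponentially with the number of joints.

```python
def table_size(num_joints: int, levels_per_joint: int, actions_per_joint: int = 3) -> int:
    """Number of entries in a tabular Q-function over discretized joint angles.

    Hypothetical setup: each joint angle takes one of `levels_per_joint` values,
    and each joint independently chooses one of `actions_per_joint` moves
    (e.g. bend left, hold, bend right).
    """
    states = levels_per_joint ** num_joints     # joint configurations
    actions = actions_per_joint ** num_joints   # joint-move combinations
    return states * actions


for n in (2, 4, 8):
    print(n, table_size(n, levels_per_joint=10))
# 2 900
# 4 810000
# 8 656100000000
```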

2 Method

An alternative to learning by physical trials is learning in a simulator, provided that trials can be performed faster in the simulator than in reality, and that the simulation reflects the real world accurately enough. If a robot is able to use such a simulator at run-time, simulating the robot itself and its interaction with the environment, the simulator can be said to constitute a form of robot "consciousness."
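A minimal sketch of learning in a simulator rather than by physical trials, assuming a hypothetical one-dimensional task with six discrete positions and standard tabular Q-learning as the learning method (this paper does not name a specific algorithm). Every trial runs against `sim_step`, a stand-in for the on-board simulator; only the resulting greedy policy would be executed on the physical robot.

```python
import random

N_STATES = 6            # hypothetical discretized positions 0..5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # move one cell left or right


def sim_step(state, action):
    """Hypothetical on-board simulator: a cheap model of the robot's motion."""
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def q_learning_in_simulator(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(GOAL)              # start anywhere except the goal
        while s != GOAL:
            if rng.random() < epsilon:       # explore
                a = rng.choice(ACTIONS)
            else:                            # exploit, breaking ties at random
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r = sim_step(s, a)           # simulated trial, no physical motion
            future = 0.0 if s2 == GOAL else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s = s2
    return q


q = q_learning_in_simulator()
# Greedy policy learned entirely in simulation; this is what the robot would execute.
policy = {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(GOAL)}
print(policy)
```

The speed-up in this setting comes only from `sim_step` being cheaper than a physical move; the learning rule itself is unchanged.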

The advantage of the consciousness model is that it can both accelerate reinforcement learning and reduce memory requirements. Learning is sped up because trials can be performed without physical execution. Less memory is required because tentative actions can be re-evaluated in the simulator when needed, rather than tabulating the results of previous evaluations. If the simulator is adaptive, it may also be used for problem solving in a dynamic environment. Note that the simulator does not have to be perfectly accurate: it suffices that it is accurate enough to search the local state space of interest, and sensor feedback can serve to correct simulation drift. It is, however, very important that simulated trials are at least as fast as physical trials. In practice, it therefore often seems advantageous to trade simulation accuracy for speed.
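The memory argument and the role of sensor feedback can be illustrated with a receding-horizon sketch. Instead of storing a value table, the robot re-evaluates every short action sequence in its simulator at each step, executes the best first action physically, and then re-seeds the simulator from the sensed position, so model error does not accumulate. The 1-D task, the action set, the 80% slip of the "real" robot, and the cumulative distance cost are all hypothetical choices made for illustration.

```python
from itertools import product

ACTIONS = (-1.0, -0.5, 0.0, 0.5, 1.0)   # hypothetical commanded displacements
TARGET = 3.0


def simulate(pos, action):
    """Deliberately imperfect on-board model: assumes commanded moves are exact."""
    return pos + action


def real_robot(pos, action):
    """Stand-in for the physical robot (assumption): only 80% of each move is realized."""
    return pos + 0.8 * action


def choose_action(sensed_pos, depth=2):
    """Re-evaluate all action sequences of length `depth` in the simulator.

    Nothing is tabulated between calls. The cost sums the distance to the
    target after every simulated step, so reaching it sooner is preferred.
    """
    best_first, best_cost = None, float("inf")
    for seq in product(ACTIONS, repeat=depth):
        pos, total = sensed_pos, 0.0
        for a in seq:
            pos = simulate(pos, a)
            total += abs(TARGET - pos)
        if total < best_cost:
            best_first, best_cost = seq[0], total
    return best_first


pos = 0.0
for _ in range(10):
    a = choose_action(pos)     # simulated trials, seeded from the sensed state
    pos = real_robot(pos, a)   # one physical execution per step
print(round(pos, 3))
```

Although `simulate` overestimates every move by 25%, re-seeding from the sensor keeps the robot close to the target; with this coarse action set it settles where no available move lowers the predicted cost.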

3 Results

We have tested our model on a snake-like, creeping robot. This robot is composed of a number of straight links, connected by joints. The robot moves in a plane, and propels itself by changing the angles of the joints, using friction as the only means of locomotion. Even with only two joints and three links, this is a difficult programming problem for a human programmer, due to the non-linear nature of friction. Equipped with consciousness in the form of a rudimentary simulator, the robot is able to crawl efficiently in real time. Results of these experiments have been described in [1].
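The friction model used on the robot is not specified in this section. As a sketch under stated assumptions, the following implements plain isotropic Coulomb friction (the same coefficient in every direction, force opposite the sliding velocity, magnitude independent of speed) and sums it over points sampled along one straight link. Static friction is ignored, and `mu`, the even weight distribution, and the sampling scheme are hypothetical choices. The non-linearity is visible directly: the force on a link that translates and rotates at once is not the sum of the two separate cases.

```python
import math


def isotropic_coulomb_friction(velocity, normal_force, mu):
    """Planar friction force on one sliding contact point (assumed Coulomb model).

    Isotropic: the force points exactly opposite the velocity with magnitude
    mu * normal_force, whatever the direction or speed. Static friction is
    ignored, so a point at rest feels no force.
    """
    vx, vy = velocity
    speed = math.hypot(vx, vy)
    if speed == 0.0:
        return (0.0, 0.0)
    scale = -mu * normal_force / speed
    return (scale * vx, scale * vy)


def link_friction(v_center, omega, angle, length, mass, mu, g=9.81, samples=20):
    """Net friction force on one straight link, approximated by sample points.

    The link's weight is assumed to be spread evenly over `samples` points.
    """
    n_force = mass * g / samples
    ux, uy = math.cos(angle), math.sin(angle)
    fx_total = fy_total = 0.0
    for i in range(samples):
        s = ((i + 0.5) / samples - 0.5) * length     # offset from the link centre
        rx, ry = s * ux, s * uy
        # Point velocity = centre velocity + omega x r (planar cross product).
        px = v_center[0] - omega * ry
        py = v_center[1] + omega * rx
        fx, fy = isotropic_coulomb_friction((px, py), n_force, mu)
        fx_total += fx
        fy_total += fy
    return fx_total, fy_total


m, mu = 1.0, 0.5
translate = link_friction((0.1, 0.0), 0.0, 0.0, 1.0, m, mu)   # pure sliding
rotate = link_friction((0.0, 0.0), 1.0, 0.0, 1.0, m, mu)      # spin about centre
both = link_friction((0.1, 0.0), 1.0, 0.0, 1.0, m, mu)        # sliding + spinning
print(translate, rotate, both)
```

Pure sliding gives the full force `mu * m * g` against the motion and pure spinning gives no net force, yet combining them sharply reduces the resisting force, which is the kind of effect a crawling robot can exploit.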

4 Conclusions and Discussion

This work is still at an early stage of development, but results so far seem to confirm that consciousness can be a useful concept for improving robot learning. Although our original goal was to propose a tentatively viable method for programming the motion of snake-like robots and other hyper-redundant robots, the approach appears useful for learning agents in a more general context.

Our proposed idea of consciousness seems to agree fairly well with the intuitive idea of animate consciousness. Its potential use as a learning accelerator could perhaps also serve as an explanation of why consciousness developed during evolution.

References

[1] Nilsson, M. and Ojala, J.: Toward Conscious Robots: “Self-awareness” Speeds Learning. In Proc. Robotikdagarna 1995. Linköping University, Sweden. June 1995.
