您好,欢迎来到华佗小知识。
搜索
您的当前位置:首页Finding related pages in the world wide web

Finding related pages in the world wide web

来源:华佗小知识
ComputerNetworks31(1999)1467–1479

FindingrelatedpagesintheWorldWideWeb

JeffreyDeanŁ,1,MonikaR.Henzinger2

CompaqSystemsResearchCenter,130LyttonAve.,PaloAlto,CA94301,USA

Abstract

Whenusingtraditionalsearchengines,usershavetoformulatequeriestodescribetheirinformationneed.ThispaperdiscussesadifferentapproachtoWebsearchingwheretheinputtothesearchprocessisnotasetofqueryterms,butinsteadistheURLofapage,andtheoutputisasetofrelatedWebpages.ArelatedWebpageisonethataddressesthesametopicastheoriginalpage.Forexample,www.washingtonpost.comisapagerelatedtowww.nytimes.com,sincebothareonlinenewspapers.

WedescribetwoalgorithmstoidentifyrelatedWebpages.ThesealgorithmsuseonlytheconnectivityinformationintheWeb(i.e.,thelinksbetweenpages)andnotthecontentofpagesorusageinformation.Wehaveimple-mentedbothalgorithmsandmeasuredtheirruntimeperformance.Toevaluatetheeffectivenessofouralgorithms,weperformedauserstudycomparingouralgorithmswithNetscape’s‘What’sRelated’service(http://home.netscape.com/escapes/related/).Ourstudyshowedthattheprecisionat10forourtwoalgorithmsare73%betterand51%betterthanthatofNetscape,despitethefactthatNetscapeusesbothcontentandusagepatterninformationinadditiontoconnectivityinformation.©1999PublishedbyElsevierScienceB.V.Allrightsreserved.Keywords:Searchengines;Relatedpages;Searchingparadigms

1.Introduction

TraditionalWebsearchenginestakeaqueryasinputandproduceasetof(hopefully)relevantpagesthatmatchthequeryterms.Whileusefulinmanycircumstances,searchengineshavethedisadvantagethatusershavetoformulatequeriesthatspecifytheirinformationneed,whichispronetoerrors.ThispaperdiscusseshowtofindrelatedWebpages,adifferentapproachtoWebsearching.Inourapproach

author.Presentaddress:mySimon,Inc.,Santa

Clara,CA,USA.E-mail:jdean@mysimon.com

1ThisworkwasdonewhiletheauthorwasattheCompaqWesternResearchLaboratory.2E-mail:monika@pa.dec.com

ŁCorrespondingtheinputtothesearchprocessisnotasetofqueryterms,buttheURLofapage,andtheoutputisasetofrelatedWebpages.ArelatedWebpageisonethataddressesthesametopicastheoriginalpage,butisnotnecessarilysemanticallyidentical.Forexample,givenwww.nytimes.com,thetoolshouldfindothernewspapersandnewsorganizationsontheWeb.Ofcourse,incontrasttosearchengines,ourapproachrequiresthattheuserhasalreadyfoundapageofinterest.

RecentworkininformationretrievalontheWebhasrecognizedthatthehyperlinkstructurecanbeveryvaluableforlocatinginformation[1,3,4,8,12,16,17,20,22,23].Thisassumesthatifthereisalinkfrompagevandw,thentheauthorofvrecommendspagew,andlinksoftenconnect

13-1286/99/$–seefrontmatter©1999PublishedbyElsevierScienceB.V.Allrightsreserved.PII:S13-1286(99)00022-5

1468J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479

Table1

ExampleresultsproducedbytheCompanionalgorithmInput:www.nytimes.com

www.usatoday.com

USATodaynewspaperwww.washingtonpost.comWashingtonPostnewspaperwww.cnn.com

CableNewsNetwork

www.latimes.comLosAngelesTimesnewspaperwww.wsj.comWallStreetJournalnewspaperwww.msnbc.com

MSNBCcablenewsstationwww.sjmercury.com

SanJoseMercuryNewsnewspaper

www.chicago.tribune.comChicagoTribunenewspaperwww.nando.netNandoTimeson-linenews

service

www.the-times.co.ukTheTimesnewspaperrelatedpages.Inthispaper,wedescribetheCom-panionandCocitationalgorithms,twoalgorithmswhichuseonlythehyperlinkstructureoftheWebtoidentifyrelatedWebpages.Forexample,Table1showstheoutputoftheCompanionalgorithmwhengivenwww.nytimes.comasinput(inthiscase,theresultsfortheCocitationalgorithmareidenticalandtheresultsforNetscapeareverysimilar,althoughthisisnotalwaystrue).

Oneofourgoalswastodesignalgorithmswithhighprecisionthatareveryfastandthatdonotrequirealargenumberofdifferentkindsofinputdata.SincewehaveatoolthatgivesusaccesstothehyperlinkstructureoftheWeb(theConnectivityServer[2]),wefocusedonalgorithmsthatonlyuseconnectivityinformationtoidentifyrelatedpages.Ouralgorithmsuseonlytheinformationaboutthelinksthatappearoneachpageandtheorderinwhichthelinksappear.Theyneitherexaminethecontentofpages,nordotheyexaminepatternsofhowuserstendtonavigateamongpages.

OurCompanionalgorithmisderivedfromtheHITSalgorithmproposedbyKleinbergforrankingsearchenginequeries[12].KleinbergsuggestedthattheHITSalgorithmcouldbeusedforfindingrelatedpagesaswell,andprovidedanecdotalevidencethatitmightworkwell.Inthispaper,weextendthealgorithmtoexploitnotonlylinksbutalsotheirorderonapage(seeSection2.1.1)andpresenttheresultsofauser-studyshowingthattheresultingalgorithmworksverywell.

TheCocitationalgorithmfindspagesthatarefrequentlycocitedwiththeinputURLu(thatis,itfindsotherpagesthatarepointedtobymanyotherpagesthatallalsopointtou).

NetscapeCommunicatorVersion4.06introducedarelatedpagesservicethatisbuiltintothebrowser[15](seeSection2.3foramoredetaileddiscus-sion).Onthebrowserscreen,thereisa‘What’sRe-lated’button,whichpresentsamenuofrelatedWebpagesinsomecases.The‘What’sRelated’algo-rithminNetscapeisbasedontechnologydevelopedbyAlexa,Inc.,andcomputesitsanswersbasedonconnectivityinformation,contentinformation,andusageinformation[14].

Tocomparetheperformanceofourtwoalgo-rithmsandNetscape’salgorithm,weperformedauserstudyon59URLschosenby18volunteers.Ourstudyresultsshowthattheprecisionat10com-putedoverall59URLsofourtwoalgorithmsare73%betterand51%betterthanNetscape’s.NotallalgorithmsgaveanswersforallURLsinourstudy.Ifwerestrictthecomparisontoonlythe37URLsforwhichallthreealgorithmsreturnedanswers,thentheprecisionat10ofourtwoalgorithmsare40%and22%betterthanNetscape’salgorithm.Thisissurprisingsinceouralgorithmsarebasedonlyonconnectivityinformation.

Netscape’salgorithmgivesanswersforabout17millionURLs[14],whileouralgorithmscangiveanswersforamuchlargersetofURLs(wehavecon-nectivityinformationon180millionURLs).ThisisimportantbecauseitmeansthatwecangiverelatedURLinformationformoreURLs.Ouralgorithmsarealsofast:inourenvironment,bothaveragelessthan200msecofcomputationperinputURL.

TheexampleshowninTable1isforaURLwithaveryhighlevelofconnectivity(www.nytimes.comcontains47,751inlinksinourConnectivityServer),andallthreealgorithmsgenerallyperformquitewellforwell-connectedURLs.Ouralgorithmscanalsoworkwellwhenthereismuchlessconnectivity,asshownbytheexampleinTable2.ThistableshowstheanswersfortheCompanionandNetscapealgo-rithmsforlinas.org/linux/corba.html,oneoftheinputURLschosenbyauseraspartofouruserstudy.Alongsideeachansweristheuser’srat-ingforeachanswer,witha‘1’meaningthattheuserconsideredthepagerelated,‘0’meaningthatthe

J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479

Table2

ComparisonoftheresultsoftheCompanionandNetscapealgorithmsInput:linas.org/linux/corba.htmlCompanion1111111011

www.cs.wustl.edu/¾schmidt/TAO.htmldsg.harvard.edu/public/arachnejorba.castle.net.au/

www-b0.fnal.gov:8000/ROBIN

www.paragon-software.com/products/oakwww.tatanka.com/orb1.htm

www.oot.co.uk/dome-index.htmlwww.intellisoft.com/¾mark

www.dstc.edu.au/AU/staff/mart...www.inf.fu-berlin.de/¾brose/jacorb

Netscape00––000100

labserver.kuntrynet.com/¾linuxweb.singnet.com.sg/¾siuyin

www.clark.net/pub/srokicki/linuxwww.earth.demon.co.uk/linux/ukwww.emry.net/webwatcher

www.interl.net/¾jasoneng/NLL/lwrwww.jnpcs.com/mkb/linuxwww.linuxresources.com/www.liszt.com/

www.local.net/¾jgo/linuxhelp.html

1469

A‘1’meansthatthepagewasvaluable,a‘0’meansthatthepagewasnotvaluable,a‘–’meansthatthepagecouldnotbeaccessed.

userconsideredthepageunrelated,and‘–’meaningthattheuserwasunabletoaccessthepageatall.TheoriginalpagewasaboutCORBAimplementa-tionsforLinux,andtherewere123pagespointingtothispageinourConnectivityServer.NineofthetenanswersgivenbytheCompanionalgorithmweredeemedrelatedbyouruser,whileonlyonepagefromNetscape’ssetofanswerswasdeemedrelated.MostofNetscape’sanswerswereaboutthemuchbroadertopicofLinux,ratherthanspecificallyaboutCORBAimplementationsonLinux.

Section2presentsouralgorithmsindetailandde-scribesNetscape’sservice,whileSection3discussestheimplementationofouralgorithms.Section4de-scribestheuserstudyweperformedandpresentsitsresults,andalsoprovidesabriefperformanceeval-uationofouralgorithms.Finally,Section6presentsourconclusions.

2.Relatedpagealgorithms

Inthissectionwedescribeourtwoalgorithms(theCompanionalgorithmandtheCocitationalgorithm),aswellasNetscape’salgorithm.UnlikeNetscape’salgorithm,bothofouralgorithmsexploitonlythehyperlink-structure(i.e.graphconnectivity)oftheWebanddonotexaminetheinformationaboutthecontentorusageofpages.Netscape’salgorithmusesallthreekindsofinformationtoarriveatitsresults.

Inthesectionsbelow,weusethetermsparentandchild.Ifthereisahyperlinkfrompagewtopagev,wesaythatwisaparentofvandthatvisachildofw.

2.1.Companionalgorithm

TheCompanionalgorithmtakesasinputastartingURLuandconsistsoffoursteps:(1)Buildavicinitygraphforu.

(2)Contractduplicatesandnear-duplicatesinthis

graph.

(3)Computeedgeweightsbasedonhosttohost

connections.

(4)Computeahubscoreandanauthorityscorefor

eachnodeinthegraphandreturnthetoprankedauthoritynodes(ourimplementationreturnsthetop10).Thisphaseofthealgorithmusesamod-ifiedversionoftheHITSalgorithmoriginallyproposedbyKleinberg[12].

Thesestepsaredescribedinmoredetailinthesubsectionsbelow.Onlystep1exploitstheorderoflinksonapage.

2.1.1.Step1:buildingthevicinitygraph

GivenaqueryURLuwebuildadirectedgraphofnodesthatarenearbytouintheWebgraph.GraphnodescorrespondtoWebpagesandgraphedgescorrespondtohyperlinks.Thegraphconsistsofthefollowingnodes(andtheedgesbetweenthesenodes):

1470J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479

(1)u,

(2)uptoBparentsofu,andforeachparentupto

BFofitschildrendifferentfromu,and

(3)uptoFchildrenofu,andforeachchildupto

FBofitsparentsdifferentfromu.

Hereishowwechoosethesenodesindetail:ThereisastoplistSTOPofURLsthatareunrelatedtomostqueriesandthathaveveryhighindegree.Ourstoplistwasdevelopedbyexperimentation,andcurrentlycontains21URLs.Examplesofnodesthatappearonourstoplistarewww.yahoo.comandwww.microsoft.com/ie/download.html.IfthestartingURLuisnotoneofthenodesonourstoplist,thenweignorealltheURLsonthestoplistwhenformingthevicinitygraph.Ifudoesappearonthestoplist,however,thenwedisablethestoplist(i.e.setSTOPtotheemptylist)andfreelyincludeanynodesinthevicinitygraph.WedisablethestoplistwhentheinputURLappearsonthestoplistbecausemanynodesonthestoplistarepopularsearchenginesandportals,andwewanttopermitthesenodestobeconsideredwhentheinputURLisanothersuchpopularsite.

Goback(B),andback-forward(BF):IfuhasmorethanBparents,addBrandomparentsnotonSTOPtothegraph;otherwiseaddallofu’sparents.IfaparentxofuhasmorethanBFC1children,adduptoBF=2childrenpointedtobytheBF=2linksonximmediatelyprecedingthelinktouanduptoBF=2childrenpointedtobytheBF=2linksonximmediatelysucceedingthelinktou(ignoringduplicatelinks).IfpagexhasfewerthanBFchildren,weaddallofitschildrentothegraph.Notethatthisexploitstheorderoflinksonpagex.Goforward(F),andforward-back(FB):IfuhasmorethanFchildren,addthechildrenpointedtobythefirstFlinksofu;otherwise,addallofu’schil-dren.IfachildofuhasmorethanBFparents,addtheBFparentsnotonSTOPwithhighestindegree;otherwise,addallofthechild’sparents.

Ifthereisahyperlinkfromapagerepresentedbynodevinthegraphtoapagerepresentedbynodew,andvandwdonotbelongtothesamehost,thenthereisadirectededgefromvtowinthegraph(weomitedgesbetweennodesonthesamehost).

Inourexperience,wehavefoundthatusingalargevalueforB(2000)andasmallvalueforBF(8)worksbetterinpracticethanusingmoderatevalues

foreach(say,50and50).Wehaveobservedthatlinkstopagesonasimilartopictendtobeclusteredtogether,whilelinksthatarefartherapartonapagearelesslikelytobeonthesametopic(forexample,mosthotlistsaregroupedintocategories).Thishasalsobeenobservedbyotherresearchers[6].UsingalargervalueforBalsomeansthatthelikelihoodofthecomputationbeingdominatedbyasingleparentpageisreduced.

2.1.2.Step2:duplicateelimination

Afterbuildingthisgraphwecombinenear-dupli-cates.Wesaytwonodesarenear-duplicatesif(a)theyeachhavemorethan10linksand(b)theyhaveatleast95%oftheirlinksincommon.Tocombinetwonear-duplicateswereplacetheirtwonodesbyanodewhoselinksaretheunionofthelinksofthetwonear-duplicates.Thisduplicateeliminationphaseisimportantbecausemanypagesareduplicatedacrosshosts(e.g.mirrorsites,differentaliasesforthesamepage),andwehaveobservedthatallowingthemtoremainseparatecangreatlydistorttheresults.2.1.3.Step3:assignedgeweights

Inthisstage,weassignaweighttoeachedge,usingtheedgeweightingschemeofBharatandHen-zinger[3]whichwerepeathereforcompleteness.Anedgebetweentwonodesonthesamehost3hasweight0.Iftherearekedgesfromdocumentsonafirsthosttoasingledocumentonasecondhostwegiveeachedgeanauthorityweightof1=k.Thisweightisusedwhencomputingtheauthorityscoreofthedocumentonthesecondhost.Ifthereareledgesfromasingledocumentonafirsthosttoasetofdocumentsonasecondhost,wegiveeachedgeahubweightof1=l.Weperformthisscalingtopreventasinglehostfromhavingtoomuchinfluenceonthecomputation.

Wecalltheresultingweightedgraphthevicinitygraphofu.

2.1.4.Step4:computehubandauthorityscoresInthisstep,weruntheimpalgorithm[3]onthegraphtocomputehubandauthorityscores.Theimp

3

Weassumethroughoutthepaperthatthehostcanbedeter-minedfromtheURL-string.

J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–14791471

algorithmisastraightforwardextensionoftheHITSalgorithmwithedgeweights.

TheintuitionbehindtheHITSalgorithmisthatadocumentthatpointstomanyothersisagoodhub,andadocumentthatmanydocumentspointtoisagoodauthority.Transitively,adocumentthatpointstomanygoodauthoritiesisanevenbetterhub,andsimilarlyadocumentpointedtobymanygoodhubsisanevenbetterauthority.TheHITScomputationrepeatedlyupdateshubandauthorityscoressothatdocumentswithhighauthorityscoresareexpectedtohaverelevantcontent,whereasdocumentswithhighhubscoresareexpectedtocontainlinkstorelevantcontent.Thecomputationofhubandauthorityscoresisdoneasfollows:

InitializeallelementsofthehubvectorHto1.0.InitializeallelementsoftheauthorityvectorAto1.0.

WhilethevectorsHandAhavenotconverged:ForallnodesninthevicinitygraphN,A[n]:DP

.n0;n/2edges.N/H[n0]

ðauthorityweight.n0;n/

ForallninNH[n]:DP

.n;n0/2edges.N/A[n0]

ðhubweight.n;n0/

NormalizetheHandAvectors.Notethatthealgorithmdoesnotclaimtofindallrelevantpages,sincetheremaybesomethathavegoodcontentbuthavenotbeenlinkedtobymanyauthors.

TheCompanionalgorithmthenreturnsthenodeswiththetenhighestauthorityscores(excludinguitself)asthepagesthataremostrelatedtothestartpageu.

2.2.Cocitationalgorithm

AnalternativeapproachforfindingrelatedpagesistoexaminethesiblingsofastartingnodeuintheWebgraph.Twonodesarecocitediftheyhaveacommonparent.Thenumberofcommonparentsoftwonodesistheirdegreeofcocitation.AsanalternativetotheCompanionalgorithm,wehavedevelopedaverysimplealgorithmthatdeterminesrelatedpagesbylookingforsiblingnodeswiththehighestdegreeofcocitation.TheCocitationalgo-

rithmfirstchoosesuptoBarbitraryparentsofu.Foreachoftheseparentsp,itaddstoasetSuptoBFchildrenofpthatsurroundthelinkfromptou.TheelementsofSaresiblingsofu.ForeachnodesinS,wedeterminethedegreeofcocitationofswithu.Finally,thealgorithmreturnsthe10mostfrequentlycocitednodesinSastherelatedpages.

Insomecasesthereisaninsufficientlevelofcoc-itationwithutoprovidemeaningfulresults.Inourimplementation,iftherearenotatleast15nodesinSthatarecocitedwithuatleasttwice,thenwerestartthealgorithmusingthenodecorrespondingtou’sURLwithonepathelementremoved.Forexam-ple,ifu’sURLisa.com/X/Y/ZandaninsufficientnumberofcocitednodesexistforthisURL,thenwerestartthealgorithmwiththeURLa.com/X/Y(iftheresultingURLisinvalid,wecontinuetochopelementsuntilweareleftwithjustahostname,orwefindavalidURL).

Inourimplementation,wechoseBtobe2000andBFtobe8(thesameparametervaluesweusedforourimplementationoftheCompanionalgorithm).OnewayoflookingattheCocitationalgorithmisthatitfinds‘maximal’nð2bipartitesubgraphsinthevicinitygraph.2.3.Netscape’sapproach

Netscapeintroducedanew‘What’sRelated?’fea-tureinversion4.06oftheNetscapeCommunicatorbrowser.Detailsabouttheapproachusedtoidentifyrelatedpagesintheiralgorithmaresketchy.How-ever,theWhat’sRelatedFAQpageindicatesthatthealgorithmusesconnectivityinformation,usageinformation,andcontentanalysisofthepagestodeterminerelationships.Wequotefromthe‘What’sRelated’FAQpage:

TheWhat’sRelateddataiscreatedbyAlexaInternet.Alexausescrawling,archiving,categorizinganddataminingtechniquestobuildtherelatedsiteslistsformillionsofWebURLs.Forexample,Alexauseslinksonthecrawledpagestofindrelatedsites.Theday-to-dayuseofWhat’sRelatedalsohelpsbuildandrefinethedata.Astheserviceisused,therequestedURLsarelogged.Bylookingathigh-leveltrends,NetscapeandAlexacandeducerelationshipsbetweenWebsites.Forexample,ifthousandsofusersgodirectlyfromsiteAtositeB,thetwositesarelikelytoberelated.

1472J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479

Next,AlexachecksalltheURLstomakesuretheyarelivelinks.Thisprocessremoveslinksthatwouldtrytoreturnpagesthatdon’texist(404errors),aswellasanylinkstoserversthatarenotavailabletothegeneralInternetpopulation,suchasserversthatarenolongeractiveorarebehindfirewalls.Finally,oncealloftherelationshipsareestablishedandthelinksarechecked,thetoptenrelatedsitesforeachURLarechosenbylookingatthestrengthoftherelationshipbetweenthesites.

Eachmonth,AlexarecrawlstheWebandrebuildsthedatatopullinnewsitesandtorefinetherelationshipsbetweentheexistingsites.Newsiteswithstrongre-lationshipstoasitewillautomaticallyappearintheWhat’sRelatedlistforthatsitebydisplacinganysiteswithweakerrelationships.

Notethatsincetherelationshipsbetweensitesarebasedonstrength,What’sRelatedlistsarenotnec-essarilybalanced.SiteAmayappearinthelistforSiteB,butSiteBmaynotbeinthelistforSiteA.Generally,thishappenswhenthenumberofsiteswithstrongrelationshipsisgreaterthanten,orwhensitesdonothavesimilarenoughcontent.

3.Implementation

Inexperimentingwiththesealgorithms,wewerefortunatetohaveaccesstoCompaq’sConnectivityServer[2].TheConnectivityServerprovideshighspeedaccesstothegraphstructureof180millionURLs(nodes)andthelinks(edges)thatconnectthem.TheentiregraphstructureiskeptinmemoryonaCompaqAlphaServer4100with8GBofmainmemoryanddual466MHzAlphaprocessors.Therandomaccesspatternsengenderedbytheconnec-tivity-basedalgorithmsdescribedinthispapermeanthatitisimportantformostorallofthegraphtofitinmainmemorytopreventhighlevelsofpagingactivity.

Weimplementedamulti-threadedserverthatac-ceptsaURLanduseseithertheCocitationalgorithmortheCompanionalgorithmtofindpagesrelatedtothegivenURL.Ourserverimplementationconsistsofapproximately5500linesofcommentedCcode,ofwhichapproximately1000linesimplementtheCompanionalgorithm,400linesimplementtheCoc-itationalgorithm,andtheremainderaresharedcodetoperformtaskssuchasparsingHTTPqueryre-

quests,printingresults,andloggingstatusmessages.WelinkourservercodedirectlywiththeCon-nectivityServerlibrary,andaccesstheconnectivityinformationbymmappingthegraphinformationintotheaddressspaceofourserver.

OurimplementationoftheCompanionalgorithmhasbeensubjectedtoamoderateamountofper-formancetuning,mostlyindesigningtheneighbor-hoodgraphdatastructurestoimprovedata-cacheperformance.TheimplementationoftheCocitationalgorithmhasnotbeentunedextensively,althoughitdoesshareafairamountofcodewiththeCompan-ionalgorithm,andthissharedcodehasbeentunedsomewhat.4.Evaluation

Inthissectionwedescribetheevaluationweperformedforthealgorithms.Section4.1describesouruserstudy,whileSection4.2discussestheresultsofthestudy.Section4.3evaluatestheruntimeperformanceofouralgorithms.4.1.Experimentalsetup

Tocomparethedifferentapproaches,weper-formedauserstudy.Weasked18volunteerstosupplyuswithatleasttwoURLsforwhichtheywantedtofindrelatedpages.Ourvolunteersincluded14computerscienceprofessionals(mostlyourcol-leaguesatCompaq’sresearchlaboratories),aswellas4peoplewithotherprofessionalcareers.Were-ceived69URLsandusedeachofthealgorithmstodeterminethetop10answersforeachURL.Weputtheanswersinrandomorderandreturnedthemtothevolunteersforrating.Thevolunteerswereinstructedasfollows:

Table3

SummaryofallanswersforthealgorithmsAlgorithmNo.ofURLsNo.ofNo.ofwithanswersanswersdeadlinksCompanion5049842Cocitation5858062Netscape

40

3

29

J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–14791473

Wewanttomeasurehowwelleachalgorithmper-forms.TomeasureperformancewewanttoknowthepercentageofvaluableURLsreturnedbyeachalgo-rithm.TobevaluabletheURLmustbebothrelevanttothetopicyouareinterestedinandahighqualitypage.Forexample,ifyourURLwaswww.audi.comandyougetbackanewsgrouppostingwheresomebodytalksabouthisnewAudicar,thenthepagewasontopic,butprobablynothighquality.Ontheotherhand,ifyougetwww.jaguar.comasananswer,thenitisuptoyoutodecidewhetherthisanswerisontopicornot.Scoring:

0:Pagewasnotvaluable=useful1:Pagewasvaluable=useful

–:Pagecouldnotbeaccessed(i.e.didnotexist,orserverwasdown)Pleaseignoretheorderinwhichthepagesarereturned.Soifalaterpagecontainssimilarcontenttoanearlierpagepleaseratethelatterpageasifyouhadnotseentheearlierpage.Thiswillimplythatwedonotmeasurehow‘happy’youarewithasetofanswersreturnedbyanalgorithm.Insteadwemeasurehowmanyvaluableanswerseachalgorithmgives.”

4.2.Userstudyresults

WereceivedresponsesratingtheanswerURLsfor59oftheinputURLs.These59inputURLsformthebasisofourstudy.Table3showshowmanyofthesequeriesthealgorithmsansweredandhowmanyanswerURLstheyreturned.Inmanycases,thealgorithmsreturnedlinksthatourusersratedasinaccessible.ThecolumnlabeledNo.ofdeadlinksshowsthenumberofinaccessiblepagesamongalltheanswersforeachalgorithm.Forthepurposesofourevaluation,wetreataninaccessiblelinkasascoreof‘0’,sinceinaccessiblepagesarenotvaluable=useful.

TheCocitationalgorithmreturnedresultsforallbutoneoftheURLs.ThereasonwhyitreturnedresultsforalmostallinputURLsisthatwheninsuf-ficientconnectivitywasfoundsurroundinganinputURL(e.g.a.com/X/Y),theCocitationalgorithmusedachoppedURLasinput(e.g.a.com/X).Al-thoughwedidnotincludethischoppingfeatureinourimplementationoftheCompanionalgorithm,itisdirectlyapplicableandwouldenabletheCompan-ionalgorithmtoreturnanswersformoreURLs.WehaveempiricallyobservedthatNetscape’salgorithm

alsoappliesasimilarchoppingheuristicinsomecases.

Table4containsalistingofthe59URLsinourstudy.ForeachURL,thethreecolumnslabeledCp,Ct,andNshowtheURLsforwhichtheCompanion,Cocitation,andNetscapealgorithmsreturnedresults,respectively.ThetablealsoshowsthenumberofhyperlinkspointingtotheURLintheConnectivityServer(Inlinks).FortheCompanionalgorithm,itshowsthenumberofnodesandedgesinthevicinitygraph,aswellasthewallclocktimeinmillisecondstakentocomputethesetofrelatedpages(computedbysurroundingthecomputationwithgettimeof-daysystemcalls).Forthecocitationalgorithm,itshowsthenumberofsiblingsfound,thenumberofsiblingscocitedatleasttwice(Cocited),andthewallclocktimetakentocomputetheanswers.

Thethreealgorithmsreturnanswersfordiffer-entsubsetsofour59URLs.Tocomparethesealgorithms,wecansubdividetheURLsintosev-eralgroups.TheintersectiongroupconsistsofthoseURLswhereallalgorithmsreturnedatleastoneanswer.Therewere37URLsinthisgroup.Thenon-NetscapegroupconsistsoftheURLswhereNetscape’sapproachdidnotreturnanyanswers.Itconsistsof19URLs.

Toquantifytheperformanceofthethreealgo-rithms,wenowdefinetwometrics.Theprecisionatrforagivenalgorithmisthetotalnumberofanswersreceivinga‘1’scorewithinthefirstranswers,di-videdbyrtimesthenumberofqueryURLs.NoticethatwhenanalgorithmdoesnotgiveanyanswersforaURL,thisisasifitgaveallnon-relevantanswersforthatURL.

ForagivenURLu,theaverageprecisionforuofanalgorithmisthesumoftheprecisionateachrankwheretheanswerofthealgorithmforureceiveda‘1’scoredividedbythetotalnumberoftheanswersofthealgorithmforureceivinga‘1’score.Ifthealgorithmdoesnotreturnanyanswersforu,itsaverageprecisionforuis0.TheoverallaverageprecisionforanalgorithmisthesumofalltheaverageprecisionsforallthequeryURLsdividedbythetotalnumberofqueryURLs.

ForeachofthethreegroupsofURLs(all,in-tersection,andnon-Netscape),Table5showstheaverageprecisionandtheprecisionat10foreachalgorithm.Fig.1showstheprecisionatrforeachof

1474J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479

Table4

User-providedURLsforevaluation(andstatistics)URL

Cp

Ct

N

InCompanionalgorithmCocitationalgorithm1.babelfish.altavista.digital.com/cgi...2.developer.intel.com/design/strong/t...3.english-server.hss.cmu.edu/4.hack.box.sk/

5.ieor.berkeley.edu/¾hochbaum/html/bo...6.linas.org/linux/corba.html

7.members.tripod.com/¾src-fall-regatt...8.metroactive.com/music/

9.travelwithkids.miningco.com/10.www-db.stanford.edu/¾wiener11.www.acf.dhhs.gov/

12.www.adventurewest.com/pub/NASTC/13.www.amazon.com/exec/obidos/cache/br...14.www.anglia.ac.uk/¾systimk/Humour/Hi...15.www.ayso26.org/

16.www.babynames.com/17.www.braintumor.org/

18.www.bris.ac.uk/Depts/Union/BTS/Scri...19.www.carrier.com/

20.www.chesschampions.com/kasparov.htm...21.www.cl.cam.ac.uk/users/rja14/tamper...22.www.davecentral.com/

23.www.duofold.com/stepout/ski-wsa.htm24.www.ebay.com/25.www.etoys.com/26.www.fifa.com/27.www.focus.de/

28.www.geocities.com/Paris/Metro/1324/29.www.geocities.com/TheTropics/2442/d...30.www.harappa.com/har/har0.html

31.www.harmony-central.com/MIDI/Doc/tu...32.www.hotelres.com/

33.www.innovation.ch/java/HTTPClient/34.www.inquizit.com/35.www.israelidance.com/36.www.jewishmusic.com/37.www.joh.cam.ac.uk/38.www.levenger.com/

39.www.mdl.sandia.gov/micromachine/gal...40.www.midiweb.com/41.www.minimed.com/

42.www.mit.edu/people/mkgray/net/43.www.mot-sps.com/44.www.movielink.com/

45.www.netusa1.net/¾spost/bench.html46.www.nsc.com/catalog/sg708.html47.www.odci.gov/cia/publications/factb...48.www.paccc.com/

49.www.perl.com/perl/index.html50.www.pianospot.com/1700305.htm

ppppppppppppppppppppp

pppppppp

pppp

pppppppppppppppppppppppppppppppppppp

pppppppppppppppppppppppppppppppppppppppppppppppppppppppp

pp

p

links

Nodes292846382193333774419716667963512320280013201997442171822552213650000514252826952017791315059275511571197416909470300465843257922941281561057662488111924503823531184708307273341126240210399161716255725913670019675304217778258113839188312274

62051900376543521840914530

0

EdgesTime(ms)10305269479128263215140904887344222770030267105127183149673949240000169847811101763351084162567212390312562711391

26300866019961531111236034914039361138768519318862497603614863913954323931116610272320835100155493401663332124441638115

33793008322221625159240

0

SiblingsCocited66011079853966302063609914934915444138680015231685845912881630136591401521611494111032700182965512123791521837580535220452155434911481856659511314335110055214524208130139151846520237153272380821568541817340123239740911411162741436133701141573223628749920656221207944229277411496708196221391839

23

Time(ms)5731758451517487363147110042414262901676872165553772549342752252530252321196211516134378715408697254514863336137585

J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479

Table4(continued)URL

Cp

Ct

N

Inlinks

CompanionalgorithmNodes105312580296546073458490

Edges16366106201105608515650

Time(ms)6311014030420130

CocitationalgorithmSiblings8137210607209633552344961846

Cocited16126074241684118486516709

1475

Time(ms)16386216118147823010214

51.www.rei-outlet.com/

52.www.sddt.com/files/library/94headli...53.www.sultry.arts.su.edu.au/links/lin...54.www.telemarque.com/articles/andrnch...55.www.trane.com/56.www.traxxx.de/57.www.us-soccer.com/

58.www.wisdom.weizmann.ac.il/¾naor/puz...59.www.wwa.com/¾android7/pilot/index.h...

ppppppp

ppppppppp

pppp

2095409001219117580

thesegroupsofURLsingraphsa,bandc.Fig.1aandbillustratethattheCompanionandCocitationalgorithmssubstantiallyoutperformNetscape’sal-gorithmatallranks,andtheCompanionalgorithmalmostalwaysoutperformstheCocitationalgorithm.Theintersectiongroupisthemostinterestingcomparison,sinceitavoidspenalizinganalgorithmfornotreturningatleastoneanswer.Fortheinter-sectiongroup,Netscape’salgorithmachievesapreci-sionat10of0.357,whiletheCompanionalgorithmachievesaprecisionat10of0.501(40%better),andtheCocitationalgorithmachievesaprecisionat10of0.435(22%better).Theaverageprecisionintheintersectiongroupdoesnotpenalizeanalgorithmforreturningfewerthan10answers.Underthismet-ric,theCompanionalgorithmis32%betterthanNetscape’salgorithm,whiletheCocitationalgorithmis20%betterthanNetscape’salgorithm.

InthegroupthatincludesallURLs,allthreeal-gorithmshaddropsintheirprecisionat10values.Therearetworeasonsforthis.Thefirstisthatalgo-rithmsweregivenaprecisionof0foragivenURLiftheydidnotreturnanyanswers.ThismostlyaffectedtheNetscapeandCompanionalgorithms.Thesec-Table5

PrecisionmetricsforeachalgorithmforthreegroupsofURLsAlgorithm

All

Averageprecision

CompanionCocitationNetscape

0.5410.5180.343

Precisionat100.4170.3630.241

Intersection

ondreasonisthatfortheURLsinthenon-Netscapeset,boththeCompanionandCocitationalgorithmsdidnotperformaswellastheydidforURLsintheIntersectionset.Despitethesedropsinabsoluteaverageprecision,theaverageprecisionoftheCom-panionalgorithmis57%betterthanthatofNetscape,andtheaverageprecisionoftheCocitationalgorithmis51%betterthanthatofNetscape.Similarresultsholdwhenexaminingaverageprecisionratherthanprecisionat10.

Toevaluatethestatisticalsignificanceofourre-sults,wecomputedthesigntestandtheWilcoxsonsumsofrankstestforeachpairofalgorithms[18].TheseresultsareshowninTable6andshowthatthedifferencebetweentheCompanionandNetscapealgorithmsandbetweentheCocitationandNetscapealgorithmsarestatisticallysignificant.

WealsowantedtoevaluatewhetherornotthealgorithmsweregenerallyreturningthesameresultsforagivenURLorwhethertheywerereturninglargelydisjointsetsofURLs.Table7showstheamountofoverlapintheanswersreturnedbyeachpairofalgorithms.Thepercentageinparenthesesistheoverlapdividedbythetotalnumberofan-

Non-Netscape

Precisionat100.5010.4350.357

Averageprecision0.5400.434n=a

Precisionat100.4010.325n=a

Averageprecision0.6660.6050.502

1476J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479

1.00.8

CompanionCocitationNetscape

1.00.80.60.40.20.0

1.00.80.60.40.20.0

110110Precision0.60.40.20.0

110rrr

(a) All(b) Intersection

Fig.1.PrecisionatrforthethreegroupsofURLs.

(c) Non-Netscape

Table6

SigntestandWilcoxonsumofrankstestforalgorithmpairsAlgorithm

AllSign

CompanionbetterthanNetscapeCocitationbetterthanNetscapeCompanionbetterthanCocitation

<0.00010.01360.1922

Ranksum0.00260.010.38

IntersectionSign0.00410.16850.0793

Ranksum0.03400.23400.2628

Non-NetscapeSignn=an=a0.23

Ranksumn=an=a0.4180

Table7

OverlapbetweenanswersreturnedbyalgorithmsAlgorithmCompanionCocitationNetscape

Companion253(44%)55(15%)

Cocitation253(51%)56(15%)

Netscape55(11%)56(10%)

swersreturnedbythealgorithminthatrow.Asthetableshows,thereisalargeoverlapbetweentheanswersreturnedbytheCompanionandCocitationalgorithms.Thisisnotsurprising,sincethetwoalgorithmsarebothbasedonconnectivityinforma-tionsurroundingtheinputURLandsincebothusesimilarparameterstochoosethesurroundingnodes.ThereisrelativelylittleoverlapbetweentheanswersreturnedbyNetscapeandtheothertwoalgorithms.4.3.Run-timeperformance

Inthissection,wepresentdataabouttherun-timeperformanceoftheCompanionandCocitationalgorithms.SincewedonothavedirectaccesstoNetscape’salgorithmandonlyaccessitthroughthepublicWebinterface,weareunabletopresentper-

formanceinformationforNetscape’salgorithm.AllmeasurementswereperformedonaCompaqAl-phaServer4100with8GBofmainmemoryanddual466MHzAlphaprocessors.ThemeasuredrunningtimesarewallclocktimesfromthetimetheinputURLisgiventotheserveruntilthetimetheanswersarereturned.ThesetimesdonotincludethetimetakentoformattheresultsasanHTMLpage,sincethatwasdonebyaserverprocessrunningonanothermachine(andthetimetodothiswasnegligible).TheaveragerunningtimefortheCompanionalgorithmonthe50URLsforwhichitreturnedanswerswas109msec,whiletheaveragerunningtimefortheCocitationalgorithmonthe58URLsforwhichitprovidedanswerswas195msec.Theperformanceofboththesealgorithmsissufficientlyfastthateitheronecouldhandlealargeamountoftraffic(closeto800,000requestsperdayfortheCompanionalgorithm).Furthermore,theaverageperformancecouldprobablybeimprovedbycachinganswersforfrequentlyrequestedURLs.

Althoughwedidnotexplicitlyincludethisfactorinouruserstudy,wehaveinformallyobservedthatthesubjectivequalityofanswersreturnedforboththeCompanionandtheCocitationalgorithmsdoes

J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–14791477

Companion time (msec)40030020010000500010000Cocitation time (msec)600

400

200

(a)1500000200040006000(b)# of siblings

Graph edges

Fig.2.GraphsizeversusrunningtimeofCompanionandCocitationalgs.

notdecreasewhenwesomewhatdecreasetheparam-eterB(thenumberofinlinksconsidered)duringthebuildingofthevicinitygraph.Thisisimportantforon-lineservicesbecauseitmeansthatthegraphsizecouldbereducedduringtimesofhighload,therebyreducingtheamountoftimetakentoserviceeachrequest.Underconditionsoflowload,thegraphsizecouldbeincreased.

TheCompanionalgorithmgenerallyconvergesonitsanswerswithinafewiterations(typicallylessthan10iterations),butthenumberofiterationsincreaseswiththegraphsize.Eachiterationtakestimethatislinearinthenumberofedgesinthevicinitygraph.WeplottherunningtimeversusthenumberofgraphedgesinFig.2a.

TherunningtimeoftheCocitationalgorithmisO.nlogn/,wherenisthenumberofsiblingsexaminedforcocitation,sinceitsortsthesiblingsbythedegreeofcocitation.ThiseffectisillustratedinFig.2b.Inourexperience,therunningtimesforthecocitationandcompanionalgorithmsaregenerallycorrelated,sinceURLswhichhavealargenumberofsiblingstoconsiderinthecocitationalgorithmalsogenerallyproducealargeneighborhoodgraphforprocessinginthecompanionalgorithm.5.Relatedwork

Manyresearchershaveproposedschemesforus-ingthehyperlinkstructureoftheWeb[1,3,4,8,12,16,17,20,22,23].Forthemostpart,thisworkdoesnotdiscussthefindingofrelatedpages,withfourexceptionsdiscussedbelow.

Weknowofonlyonepreviousworkthatexploitstheorderoflinks:Chakrabartietal.[6]usethelinksandtheirordertocategorizeWebpagesandtheyshowthatthelinksthatarenearagivenlinkinpageorderfrequentlypointtopagesonthesametopic.PreviousauthorshavesuggestedusingcocitationandotherformsofconnectivitytoidentifyrelatedWebpages.Spertusobservedthatcocitationcanin-dicatethattwopagesarerelated[20].Thatis,ifpageApointstobothpagesBandC,thenBandCmightberelated.Variousresearchersinthefieldofbibliometricshavealsoobservedthis[9–11,19],andthisobservationformsthebasisofourCocitationalgorithm.Thenotionofcollaborativefiltering,al-thoughitisbasedonuser’srecommendationsratherthanhyperlinks,alsoreliesonthisobservation[21].PitkowandPirolli[16]clusterWebpagesbasedoncocitationanalysis.TerveenandHill[22]usetheconnectivitystructureoftheWebtofindgroupsofrelatedWebsites.

OurcompanionalgorithmdescendedfromtheHITSalgorithmdevelopedbyKleinberg[12].TheHITSalgorithmwasoriginallyproposedbyKlein-bergasawayofusingconnectivitystructuretoiden-tifythemostauthoritativesourcesofinformationonaparticulartopic,wherethetopicwasdefinedbythecombinedlinkstructureofalargenumberofstartingnodesonthetopic.Kleinbergalsoproposedthatthe

1478J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479

HITSalgorithmcouldbeusedtofindrelatedpageswhenthetopicwasdefinedbyjustasinglenode.TheCompanionalgorithmusedHITSalgorithmasastartingpointandextendedandmodifieditinfourmainways:

(1)Kleinbergsuggestedusingthefollowinggraph

tofindrelatedpages:Takeafixednumber(say200)ofparentsofthegivenURLandcallthesetconsistingoftheURLandtheseparentsthestartset.Nowbuildthegraphconsistingofallnodespointingtoanodeinthestartsetorpointedtobyanodeinthestartset.Thismeansthat‘grandparents’ofuareincludedinthegraph,whilenodesthatshareachildwithuarenotincludedinthegraph.Webelievethatthelatternodesaremorelikelytoberelatedtouthanarethe‘grandparent’nodes.Thereforeourvicinitygraphisstructuredtoexcludegrandparentnodesbuttoincludenodesthatshareachildwithu.(2)Weexploittheorderofthelinksonapage

todeterminewhich‘siblings’ofutoinclude.Whenweaddedthisfeature,theprecisionofouralgorithmimprovednoticably.

(3)TheoriginalHITSalgorithmdidnothaveedge

weights.Weuseedgeweightstoreducetheinfluenceofpagesthatallresideononehost,sinceBharatandHenzingerhaveshownthatedgeweightsimprovetheprecision[3].

(4)Wealsomergenodesinourvicinitygraphthat

havealargenumberofduplicatelinks.Dupli-catenodesarenotsuchaseriousproblemwhenusingtheHITSorimpalgorithmstorankqueryresults,sincethestartsetconsistsofalargenumberofURLs.However,whenformingthevicinitygraphstartingwithjustasingleURL,theinfluenceofduplicatenodesisincreasedbe-causeduplicatenodeswithalargenumberofoutlinkswillquicklydominatethehubandauthoritycomputation.

KleinbergalsoshowedthatHITScomputestheprincipaleigenvectorofthematrixAAT,whereAistheadjacencymatrixoftheabovegraph,andsug-gestedusingnon-principaleigenvectorsforfindingrelatedpages.Finally,hegaveanecdotalevidencethatHITSmightworkwell.

Consecutively,asequenceofpapers[5,7]pre-sentedimprovementsonHITSandusedittopopu-lateagivenhierarchyofcategories.

6.Conclusion

WehavepresentedtwodifferentalgorithmsforfindingrelatedpagesintheWorldWideWeb.TheysignificantlyoutperformNetscape’salgorithmforfindingrelatedpages.Thealgorithmscanbeimple-mentedefficientlyandaresuitableforusewithinalargescaleWebserviceprovidingarelatedpagesfeature.

OurtwoalgorithmscanbeextendedtohandlemorethanoneinputURL.Inthiscase,thealgo-rithmswouldcomputepagesthatarerelatedtoallinputURLs.Wearecurrentlyexploringtheseexten-sions.

Acknowledgements

ThisworkhasbenefitedgreatlyfromdiscussionswehavehadwithKrishnaBharat,AndreiBroder,PuneetKumar,andHannesMarais.WearealsoindebtedtoPuneetforhisworkontheConnec-tivityServer.Assomeoftheearliestusersoftheserver,Puneetansweredourmanyquestionsandimplementedmanyimprovementstotheserverinresponsetooursuggestions.WearealsogratefultoHannesMaraisfordevelopingWebL,aWebscript-inglanguage[13].UsingWebL,wewereabletoquicklydevelopaprototypeuserinterfaceforourrelatedpagesserver.KrishnaBharat,AllanHeydon,andHannesMaraisprovidedusefulfeedbackonear-lierdraftsofthispaper.Finally,wewouldliketothankalltheparticipantsinouruserstudy.

References

[1]G.O.Arocena,A.O.MendelzonandG.A.Mihaila,Appli-cationsofawebquerylanguage,in:Proc.oftheSixthInternationalWorldWideWebConference,pp.587–595,SantaClara,CA,April,1997.

[2]K.Bharat,A.Z.Broder,M.Henzinger,P.KumarandS.

Venkatasubramanian,Theconnectivityserver:fastaccesstolinkageinformationontheWeb,in:Proc.ofthe7thInternationalWorldWideWebConference,pp.469–477,Brisbane,Qld,April1998.

[3]K.BharatandM.Henzinger,Improvedalgorithmsfortopic

distillationinhyperlinkedenvironments,in:Proc.ofthe21stInternationalACMSIGIRConferenceonResearch

J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479

1479

andDevelopmentinInformationRetrieval(SIGIR’98),pp.104–111,1998.

[4]

S.BrinandL.Page,Theanatomyofalarge-scalehyper-textualWebsearchengine,in:Proc.ofthe7thInternationalWorldWideWebConference,pp.107–117,Brisbane,Qld.,April1998.

[5]

S.Chakrabarti,B.Dom,D.Gibson,S.R.Kumar,P.Ragha-van,S.RajagopalanandA.Tomkins,Experimentsintopicdistillation,in:ACM–SIGIR’98Post-ConferenceWorkshoponHypertextInformationRetrievalfortheWeb,1998.

[6]

S.Chakrabarti,B.DomandP.Indyk,Enhancedhypertextcategorizationusinghyperlinks,in:Proc.oftheACMSIG-MODInternationalConferenceonManagementofData,pp.307–318,1998.

[7]

S.Chakrabarti,B.Dom,P.Raghavan,S.Rajagopalan,D.GibsonandJ.Kleinberg,Automaticresourcecompilationbyanalyzinghyperlinkstructureandassociatedtext,in:Proc.oftheSixthInternationalWorldWideWebConfer-ence,pp.65–74,SantaClara,CA,April1997.

[8]

J.CarriereandR.Kazman,WebQuery:Searchingandvisu-alizingthewebthroughconnectivity,in:Proc.oftheSixthInternationalWorldWideWebConference,pp.701–711,SantaClara,CA,April1998.

[9]E.Garfield,Citationanalysisasatoolinjournalevaluation,Science178(1972).

[10]E.Garfield,CitationIndexing,ISIPress,Philadelphia,PA,1979.

[11]M.M.Kessler,Bibliographiccouplingbetweenscientificpapers,AmericanDocumentation14(1963).

[12]

J.Kleinberg,Authoritativesourcesinahyperlinkedenvi-ronment,in:Proc.ofthe9thAnnualACM–SIAMSympo-siumonDiscreteAlgorithms,pp.668–677,January1998.[13]

T.KistlerandH.Marais,WebL—Aprogramminglan-guagefortheWeb,in:Proc.ofthe7thInternationalWorldWideWebConference,pp.259–270,Brisbane,Qld.,April1998.

[14]

NetscapeCommunicationsCorporation,‘What’sRelatedFAQ’webpage,http://home.netscape.com/escapes/related/faq.html

[15]NetscapeCommunicationsCorporation,‘What’sRelated’webpage,http://home.netscape.com/escapes/related/

[16]

J.PitkowandP.Pirolli,Life,death,andlawfulnessontheelectronicfrontier,in:Proc.oftheConferenceonHumanFactorsinComputingSystems(CHI97),pp.383–390,March1997.

[17]

P.Pirolli,J.PitkowandR.Rao,Silkfromasow’sear:

ExtractingusablestructuresfromtheWeb,in:Proc.oftheConferenceonHumanFactorsinComputingSystems(CHI96),pp.118–125,April1996.

[18]S.M.Ross,IntroductoryStatistics,McGraw-Hill,NewYork,1996.

[19]

H.Small,Co-citationinthescientificliterature:anewmea-sureoftherelationshipbetweentwodocuments,JournalAmericanSocietyInformationScience24(1973).

[20]

E.Spertus,ParaSite:MiningstructuralinformationontheWeb,in:Proc.oftheSixthInternationalWorldWideWebConference,pp.587–595,SantaClara,CA,April1997.[21]

U.ShardanandandP.Maes,Socialinformationfiltering:Algorithmsforautomating‘WordofMouth’,in:Proc.ofthe1995ConferenceonHumanFactorsinComputingSystems(CHI’95),1995.

[22]

L.TerveenandW.Hill,Findingandvisualizinginter-siteclangraphs,in:Proc.oftheConferenceonHumanFactorsinComputingSystems(CHI-98):MakingtheImpossiblePossible,pp.448–455,ACMPress,NewYork,April18–23,1998.

[23]

L.TerveenandW.Hill,Evaluatingemergentcollabora-tionontheWeb,in:Proc.ofACMCSCW’98ConferenceonComputer-SupportedCooperativeWork,pp.355–362,SocialFiltering,SocialInfluences,1998.

JeffreyDeanreceivedhisPh.D.fromtheUniversityofWashing-tonin1996,workingonefficientimplementationtechniquesforobject-orientedlanguagesunderProfessorCraigChambers.HejoinedCompaq’sWesternResearchLaboratoryin1996,whereheworkedonprofilingtechniques,performancemonitoringhard-ware,compileralgorithmsandinformationretrieval.InFebruary,1999,hejoinedmySimon,Inc.,whereheiscurrentlyworkingonscalablecomparisonshoppingsystemsfortheWorldWideWeb.HiscurrentresearchinterestsincludeinformationretrievalandthedevelopmentofscalablesystemsfortheWorldWideWeb.Heistwocontinentsshyofhisgoalofplayingbasketballoneverycontinent.

MonikaR.HenzingerreceivedherPh.D.fromPrincetonUni-versityin1993underthesupervisionofRobertE.Tarjan.Af-terwards,shewasanassistantprofessorinComputerScienceatCornellUniversity.ShejoinedtheDigitalSystemsResearchCenter(nowCompaqComputerCorporation’sSystemsResearchCenter)in1996.HercurrentresearchinterestsareinformationretrievalontheWorldWideWebandalgorithmicproblemsarisinginthiscontext.

因篇幅问题不能全部显示,请点此查看更多更全内容

Copyright © 2019- huatuo0.cn 版权所有 湘ICP备2023017654号-2

违法及侵权请联系:TEL:199 18 7713 E-MAIL:2724546146@qq.com

本站由北京市万商天勤律师事务所王兴未律师提供法律服务