FindingrelatedpagesintheWorldWideWeb
JeffreyDeanŁ,1,MonikaR.Henzinger2
CompaqSystemsResearchCenter,130LyttonAve.,PaloAlto,CA94301,USA
Abstract
Whenusingtraditionalsearchengines,usershavetoformulatequeriestodescribetheirinformationneed.ThispaperdiscussesadifferentapproachtoWebsearchingwheretheinputtothesearchprocessisnotasetofqueryterms,butinsteadistheURLofapage,andtheoutputisasetofrelatedWebpages.ArelatedWebpageisonethataddressesthesametopicastheoriginalpage.Forexample,www.washingtonpost.comisapagerelatedtowww.nytimes.com,sincebothareonlinenewspapers.
WedescribetwoalgorithmstoidentifyrelatedWebpages.ThesealgorithmsuseonlytheconnectivityinformationintheWeb(i.e.,thelinksbetweenpages)andnotthecontentofpagesorusageinformation.Wehaveimple-mentedbothalgorithmsandmeasuredtheirruntimeperformance.Toevaluatetheeffectivenessofouralgorithms,weperformedauserstudycomparingouralgorithmswithNetscape’s‘What’sRelated’service(http://home.netscape.com/escapes/related/).Ourstudyshowedthattheprecisionat10forourtwoalgorithmsare73%betterand51%betterthanthatofNetscape,despitethefactthatNetscapeusesbothcontentandusagepatterninformationinadditiontoconnectivityinformation.©1999PublishedbyElsevierScienceB.V.Allrightsreserved.Keywords:Searchengines;Relatedpages;Searchingparadigms
1.Introduction
TraditionalWebsearchenginestakeaqueryasinputandproduceasetof(hopefully)relevantpagesthatmatchthequeryterms.Whileusefulinmanycircumstances,searchengineshavethedisadvantagethatusershavetoformulatequeriesthatspecifytheirinformationneed,whichispronetoerrors.ThispaperdiscusseshowtofindrelatedWebpages,adifferentapproachtoWebsearching.Inourapproach
author.Presentaddress:mySimon,Inc.,Santa
Clara,CA,USA.E-mail:jdean@mysimon.com
1ThisworkwasdonewhiletheauthorwasattheCompaqWesternResearchLaboratory.2E-mail:monika@pa.dec.com
ŁCorrespondingtheinputtothesearchprocessisnotasetofqueryterms,buttheURLofapage,andtheoutputisasetofrelatedWebpages.ArelatedWebpageisonethataddressesthesametopicastheoriginalpage,butisnotnecessarilysemanticallyidentical.Forexample,givenwww.nytimes.com,thetoolshouldfindothernewspapersandnewsorganizationsontheWeb.Ofcourse,incontrasttosearchengines,ourapproachrequiresthattheuserhasalreadyfoundapageofinterest.
RecentworkininformationretrievalontheWebhasrecognizedthatthehyperlinkstructurecanbeveryvaluableforlocatinginformation[1,3,4,8,12,16,17,20,22,23].Thisassumesthatifthereisalinkfrompagevandw,thentheauthorofvrecommendspagew,andlinksoftenconnect
13-1286/99/$–seefrontmatter©1999PublishedbyElsevierScienceB.V.Allrightsreserved.PII:S13-1286(99)00022-5
1468J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479
Table1
ExampleresultsproducedbytheCompanionalgorithmInput:www.nytimes.com
www.usatoday.com
USATodaynewspaperwww.washingtonpost.comWashingtonPostnewspaperwww.cnn.com
CableNewsNetwork
www.latimes.comLosAngelesTimesnewspaperwww.wsj.comWallStreetJournalnewspaperwww.msnbc.com
MSNBCcablenewsstationwww.sjmercury.com
SanJoseMercuryNewsnewspaper
www.chicago.tribune.comChicagoTribunenewspaperwww.nando.netNandoTimeson-linenews
service
www.the-times.co.ukTheTimesnewspaperrelatedpages.Inthispaper,wedescribetheCom-panionandCocitationalgorithms,twoalgorithmswhichuseonlythehyperlinkstructureoftheWebtoidentifyrelatedWebpages.Forexample,Table1showstheoutputoftheCompanionalgorithmwhengivenwww.nytimes.comasinput(inthiscase,theresultsfortheCocitationalgorithmareidenticalandtheresultsforNetscapeareverysimilar,althoughthisisnotalwaystrue).
Oneofourgoalswastodesignalgorithmswithhighprecisionthatareveryfastandthatdonotrequirealargenumberofdifferentkindsofinputdata.SincewehaveatoolthatgivesusaccesstothehyperlinkstructureoftheWeb(theConnectivityServer[2]),wefocusedonalgorithmsthatonlyuseconnectivityinformationtoidentifyrelatedpages.Ouralgorithmsuseonlytheinformationaboutthelinksthatappearoneachpageandtheorderinwhichthelinksappear.Theyneitherexaminethecontentofpages,nordotheyexaminepatternsofhowuserstendtonavigateamongpages.
OurCompanionalgorithmisderivedfromtheHITSalgorithmproposedbyKleinbergforrankingsearchenginequeries[12].KleinbergsuggestedthattheHITSalgorithmcouldbeusedforfindingrelatedpagesaswell,andprovidedanecdotalevidencethatitmightworkwell.Inthispaper,weextendthealgorithmtoexploitnotonlylinksbutalsotheirorderonapage(seeSection2.1.1)andpresenttheresultsofauser-studyshowingthattheresultingalgorithmworksverywell.
TheCocitationalgorithmfindspagesthatarefrequentlycocitedwiththeinputURLu(thatis,itfindsotherpagesthatarepointedtobymanyotherpagesthatallalsopointtou).
NetscapeCommunicatorVersion4.06introducedarelatedpagesservicethatisbuiltintothebrowser[15](seeSection2.3foramoredetaileddiscus-sion).Onthebrowserscreen,thereisa‘What’sRe-lated’button,whichpresentsamenuofrelatedWebpagesinsomecases.The‘What’sRelated’algo-rithminNetscapeisbasedontechnologydevelopedbyAlexa,Inc.,andcomputesitsanswersbasedonconnectivityinformation,contentinformation,andusageinformation[14].
Tocomparetheperformanceofourtwoalgo-rithmsandNetscape’salgorithm,weperformedauserstudyon59URLschosenby18volunteers.Ourstudyresultsshowthattheprecisionat10com-putedoverall59URLsofourtwoalgorithmsare73%betterand51%betterthanNetscape’s.NotallalgorithmsgaveanswersforallURLsinourstudy.Ifwerestrictthecomparisontoonlythe37URLsforwhichallthreealgorithmsreturnedanswers,thentheprecisionat10ofourtwoalgorithmsare40%and22%betterthanNetscape’salgorithm.Thisissurprisingsinceouralgorithmsarebasedonlyonconnectivityinformation.
Netscape’salgorithmgivesanswersforabout17millionURLs[14],whileouralgorithmscangiveanswersforamuchlargersetofURLs(wehavecon-nectivityinformationon180millionURLs).ThisisimportantbecauseitmeansthatwecangiverelatedURLinformationformoreURLs.Ouralgorithmsarealsofast:inourenvironment,bothaveragelessthan200msecofcomputationperinputURL.
TheexampleshowninTable1isforaURLwithaveryhighlevelofconnectivity(www.nytimes.comcontains47,751inlinksinourConnectivityServer),andallthreealgorithmsgenerallyperformquitewellforwell-connectedURLs.Ouralgorithmscanalsoworkwellwhenthereismuchlessconnectivity,asshownbytheexampleinTable2.ThistableshowstheanswersfortheCompanionandNetscapealgo-rithmsforlinas.org/linux/corba.html,oneoftheinputURLschosenbyauseraspartofouruserstudy.Alongsideeachansweristheuser’srat-ingforeachanswer,witha‘1’meaningthattheuserconsideredthepagerelated,‘0’meaningthatthe
J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479
Table2
ComparisonoftheresultsoftheCompanionandNetscapealgorithmsInput:linas.org/linux/corba.htmlCompanion1111111011
www.cs.wustl.edu/¾schmidt/TAO.htmldsg.harvard.edu/public/arachnejorba.castle.net.au/
www-b0.fnal.gov:8000/ROBIN
www.paragon-software.com/products/oakwww.tatanka.com/orb1.htm
www.oot.co.uk/dome-index.htmlwww.intellisoft.com/¾mark
www.dstc.edu.au/AU/staff/mart...www.inf.fu-berlin.de/¾brose/jacorb
Netscape00––000100
labserver.kuntrynet.com/¾linuxweb.singnet.com.sg/¾siuyin
www.clark.net/pub/srokicki/linuxwww.earth.demon.co.uk/linux/ukwww.emry.net/webwatcher
www.interl.net/¾jasoneng/NLL/lwrwww.jnpcs.com/mkb/linuxwww.linuxresources.com/www.liszt.com/
www.local.net/¾jgo/linuxhelp.html
1469
A‘1’meansthatthepagewasvaluable,a‘0’meansthatthepagewasnotvaluable,a‘–’meansthatthepagecouldnotbeaccessed.
userconsideredthepageunrelated,and‘–’meaningthattheuserwasunabletoaccessthepageatall.TheoriginalpagewasaboutCORBAimplementa-tionsforLinux,andtherewere123pagespointingtothispageinourConnectivityServer.NineofthetenanswersgivenbytheCompanionalgorithmweredeemedrelatedbyouruser,whileonlyonepagefromNetscape’ssetofanswerswasdeemedrelated.MostofNetscape’sanswerswereaboutthemuchbroadertopicofLinux,ratherthanspecificallyaboutCORBAimplementationsonLinux.
Section2presentsouralgorithmsindetailandde-scribesNetscape’sservice,whileSection3discussestheimplementationofouralgorithms.Section4de-scribestheuserstudyweperformedandpresentsitsresults,andalsoprovidesabriefperformanceeval-uationofouralgorithms.Finally,Section6presentsourconclusions.
2.Relatedpagealgorithms
Inthissectionwedescribeourtwoalgorithms(theCompanionalgorithmandtheCocitationalgorithm),aswellasNetscape’salgorithm.UnlikeNetscape’salgorithm,bothofouralgorithmsexploitonlythehyperlink-structure(i.e.graphconnectivity)oftheWebanddonotexaminetheinformationaboutthecontentorusageofpages.Netscape’salgorithmusesallthreekindsofinformationtoarriveatitsresults.
Inthesectionsbelow,weusethetermsparentandchild.Ifthereisahyperlinkfrompagewtopagev,wesaythatwisaparentofvandthatvisachildofw.
2.1.Companionalgorithm
TheCompanionalgorithmtakesasinputastartingURLuandconsistsoffoursteps:(1)Buildavicinitygraphforu.
(2)Contractduplicatesandnear-duplicatesinthis
graph.
(3)Computeedgeweightsbasedonhosttohost
connections.
(4)Computeahubscoreandanauthorityscorefor
eachnodeinthegraphandreturnthetoprankedauthoritynodes(ourimplementationreturnsthetop10).Thisphaseofthealgorithmusesamod-ifiedversionoftheHITSalgorithmoriginallyproposedbyKleinberg[12].
Thesestepsaredescribedinmoredetailinthesubsectionsbelow.Onlystep1exploitstheorderoflinksonapage.
2.1.1.Step1:buildingthevicinitygraph
GivenaqueryURLuwebuildadirectedgraphofnodesthatarenearbytouintheWebgraph.GraphnodescorrespondtoWebpagesandgraphedgescorrespondtohyperlinks.Thegraphconsistsofthefollowingnodes(andtheedgesbetweenthesenodes):
1470J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479
(1)u,
(2)uptoBparentsofu,andforeachparentupto
BFofitschildrendifferentfromu,and
(3)uptoFchildrenofu,andforeachchildupto
FBofitsparentsdifferentfromu.
Hereishowwechoosethesenodesindetail:ThereisastoplistSTOPofURLsthatareunrelatedtomostqueriesandthathaveveryhighindegree.Ourstoplistwasdevelopedbyexperimentation,andcurrentlycontains21URLs.Examplesofnodesthatappearonourstoplistarewww.yahoo.comandwww.microsoft.com/ie/download.html.IfthestartingURLuisnotoneofthenodesonourstoplist,thenweignorealltheURLsonthestoplistwhenformingthevicinitygraph.Ifudoesappearonthestoplist,however,thenwedisablethestoplist(i.e.setSTOPtotheemptylist)andfreelyincludeanynodesinthevicinitygraph.WedisablethestoplistwhentheinputURLappearsonthestoplistbecausemanynodesonthestoplistarepopularsearchenginesandportals,andwewanttopermitthesenodestobeconsideredwhentheinputURLisanothersuchpopularsite.
Goback(B),andback-forward(BF):IfuhasmorethanBparents,addBrandomparentsnotonSTOPtothegraph;otherwiseaddallofu’sparents.IfaparentxofuhasmorethanBFC1children,adduptoBF=2childrenpointedtobytheBF=2linksonximmediatelyprecedingthelinktouanduptoBF=2childrenpointedtobytheBF=2linksonximmediatelysucceedingthelinktou(ignoringduplicatelinks).IfpagexhasfewerthanBFchildren,weaddallofitschildrentothegraph.Notethatthisexploitstheorderoflinksonpagex.Goforward(F),andforward-back(FB):IfuhasmorethanFchildren,addthechildrenpointedtobythefirstFlinksofu;otherwise,addallofu’schil-dren.IfachildofuhasmorethanBFparents,addtheBFparentsnotonSTOPwithhighestindegree;otherwise,addallofthechild’sparents.
Ifthereisahyperlinkfromapagerepresentedbynodevinthegraphtoapagerepresentedbynodew,andvandwdonotbelongtothesamehost,thenthereisadirectededgefromvtowinthegraph(weomitedgesbetweennodesonthesamehost).
Inourexperience,wehavefoundthatusingalargevalueforB(2000)andasmallvalueforBF(8)worksbetterinpracticethanusingmoderatevalues
foreach(say,50and50).Wehaveobservedthatlinkstopagesonasimilartopictendtobeclusteredtogether,whilelinksthatarefartherapartonapagearelesslikelytobeonthesametopic(forexample,mosthotlistsaregroupedintocategories).Thishasalsobeenobservedbyotherresearchers[6].UsingalargervalueforBalsomeansthatthelikelihoodofthecomputationbeingdominatedbyasingleparentpageisreduced.
2.1.2.Step2:duplicateelimination
Afterbuildingthisgraphwecombinenear-dupli-cates.Wesaytwonodesarenear-duplicatesif(a)theyeachhavemorethan10linksand(b)theyhaveatleast95%oftheirlinksincommon.Tocombinetwonear-duplicateswereplacetheirtwonodesbyanodewhoselinksaretheunionofthelinksofthetwonear-duplicates.Thisduplicateeliminationphaseisimportantbecausemanypagesareduplicatedacrosshosts(e.g.mirrorsites,differentaliasesforthesamepage),andwehaveobservedthatallowingthemtoremainseparatecangreatlydistorttheresults.2.1.3.Step3:assignedgeweights
Inthisstage,weassignaweighttoeachedge,usingtheedgeweightingschemeofBharatandHen-zinger[3]whichwerepeathereforcompleteness.Anedgebetweentwonodesonthesamehost3hasweight0.Iftherearekedgesfromdocumentsonafirsthosttoasingledocumentonasecondhostwegiveeachedgeanauthorityweightof1=k.Thisweightisusedwhencomputingtheauthorityscoreofthedocumentonthesecondhost.Ifthereareledgesfromasingledocumentonafirsthosttoasetofdocumentsonasecondhost,wegiveeachedgeahubweightof1=l.Weperformthisscalingtopreventasinglehostfromhavingtoomuchinfluenceonthecomputation.
Wecalltheresultingweightedgraphthevicinitygraphofu.
2.1.4.Step4:computehubandauthorityscoresInthisstep,weruntheimpalgorithm[3]onthegraphtocomputehubandauthorityscores.Theimp
3
Weassumethroughoutthepaperthatthehostcanbedeter-minedfromtheURL-string.
J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–14791471
algorithmisastraightforwardextensionoftheHITSalgorithmwithedgeweights.
TheintuitionbehindtheHITSalgorithmisthatadocumentthatpointstomanyothersisagoodhub,andadocumentthatmanydocumentspointtoisagoodauthority.Transitively,adocumentthatpointstomanygoodauthoritiesisanevenbetterhub,andsimilarlyadocumentpointedtobymanygoodhubsisanevenbetterauthority.TheHITScomputationrepeatedlyupdateshubandauthorityscoressothatdocumentswithhighauthorityscoresareexpectedtohaverelevantcontent,whereasdocumentswithhighhubscoresareexpectedtocontainlinkstorelevantcontent.Thecomputationofhubandauthorityscoresisdoneasfollows:
InitializeallelementsofthehubvectorHto1.0.InitializeallelementsoftheauthorityvectorAto1.0.
WhilethevectorsHandAhavenotconverged:ForallnodesninthevicinitygraphN,A[n]:DP
.n0;n/2edges.N/H[n0]
ðauthorityweight.n0;n/
ForallninNH[n]:DP
.n;n0/2edges.N/A[n0]
ðhubweight.n;n0/
NormalizetheHandAvectors.Notethatthealgorithmdoesnotclaimtofindallrelevantpages,sincetheremaybesomethathavegoodcontentbuthavenotbeenlinkedtobymanyauthors.
TheCompanionalgorithmthenreturnsthenodeswiththetenhighestauthorityscores(excludinguitself)asthepagesthataremostrelatedtothestartpageu.
2.2.Cocitationalgorithm
AnalternativeapproachforfindingrelatedpagesistoexaminethesiblingsofastartingnodeuintheWebgraph.Twonodesarecocitediftheyhaveacommonparent.Thenumberofcommonparentsoftwonodesistheirdegreeofcocitation.AsanalternativetotheCompanionalgorithm,wehavedevelopedaverysimplealgorithmthatdeterminesrelatedpagesbylookingforsiblingnodeswiththehighestdegreeofcocitation.TheCocitationalgo-
rithmfirstchoosesuptoBarbitraryparentsofu.Foreachoftheseparentsp,itaddstoasetSuptoBFchildrenofpthatsurroundthelinkfromptou.TheelementsofSaresiblingsofu.ForeachnodesinS,wedeterminethedegreeofcocitationofswithu.Finally,thealgorithmreturnsthe10mostfrequentlycocitednodesinSastherelatedpages.
Insomecasesthereisaninsufficientlevelofcoc-itationwithutoprovidemeaningfulresults.Inourimplementation,iftherearenotatleast15nodesinSthatarecocitedwithuatleasttwice,thenwerestartthealgorithmusingthenodecorrespondingtou’sURLwithonepathelementremoved.Forexam-ple,ifu’sURLisa.com/X/Y/ZandaninsufficientnumberofcocitednodesexistforthisURL,thenwerestartthealgorithmwiththeURLa.com/X/Y(iftheresultingURLisinvalid,wecontinuetochopelementsuntilweareleftwithjustahostname,orwefindavalidURL).
Inourimplementation,wechoseBtobe2000andBFtobe8(thesameparametervaluesweusedforourimplementationoftheCompanionalgorithm).OnewayoflookingattheCocitationalgorithmisthatitfinds‘maximal’nð2bipartitesubgraphsinthevicinitygraph.2.3.Netscape’sapproach
Netscapeintroducedanew‘What’sRelated?’fea-tureinversion4.06oftheNetscapeCommunicatorbrowser.Detailsabouttheapproachusedtoidentifyrelatedpagesintheiralgorithmaresketchy.How-ever,theWhat’sRelatedFAQpageindicatesthatthealgorithmusesconnectivityinformation,usageinformation,andcontentanalysisofthepagestodeterminerelationships.Wequotefromthe‘What’sRelated’FAQpage:
TheWhat’sRelateddataiscreatedbyAlexaInternet.Alexausescrawling,archiving,categorizinganddataminingtechniquestobuildtherelatedsiteslistsformillionsofWebURLs.Forexample,Alexauseslinksonthecrawledpagestofindrelatedsites.Theday-to-dayuseofWhat’sRelatedalsohelpsbuildandrefinethedata.Astheserviceisused,therequestedURLsarelogged.Bylookingathigh-leveltrends,NetscapeandAlexacandeducerelationshipsbetweenWebsites.Forexample,ifthousandsofusersgodirectlyfromsiteAtositeB,thetwositesarelikelytoberelated.
1472J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479
Next,AlexachecksalltheURLstomakesuretheyarelivelinks.Thisprocessremoveslinksthatwouldtrytoreturnpagesthatdon’texist(404errors),aswellasanylinkstoserversthatarenotavailabletothegeneralInternetpopulation,suchasserversthatarenolongeractiveorarebehindfirewalls.Finally,oncealloftherelationshipsareestablishedandthelinksarechecked,thetoptenrelatedsitesforeachURLarechosenbylookingatthestrengthoftherelationshipbetweenthesites.
Eachmonth,AlexarecrawlstheWebandrebuildsthedatatopullinnewsitesandtorefinetherelationshipsbetweentheexistingsites.Newsiteswithstrongre-lationshipstoasitewillautomaticallyappearintheWhat’sRelatedlistforthatsitebydisplacinganysiteswithweakerrelationships.
Notethatsincetherelationshipsbetweensitesarebasedonstrength,What’sRelatedlistsarenotnec-essarilybalanced.SiteAmayappearinthelistforSiteB,butSiteBmaynotbeinthelistforSiteA.Generally,thishappenswhenthenumberofsiteswithstrongrelationshipsisgreaterthanten,orwhensitesdonothavesimilarenoughcontent.
3.Implementation
Inexperimentingwiththesealgorithms,wewerefortunatetohaveaccesstoCompaq’sConnectivityServer[2].TheConnectivityServerprovideshighspeedaccesstothegraphstructureof180millionURLs(nodes)andthelinks(edges)thatconnectthem.TheentiregraphstructureiskeptinmemoryonaCompaqAlphaServer4100with8GBofmainmemoryanddual466MHzAlphaprocessors.Therandomaccesspatternsengenderedbytheconnec-tivity-basedalgorithmsdescribedinthispapermeanthatitisimportantformostorallofthegraphtofitinmainmemorytopreventhighlevelsofpagingactivity.
Weimplementedamulti-threadedserverthatac-ceptsaURLanduseseithertheCocitationalgorithmortheCompanionalgorithmtofindpagesrelatedtothegivenURL.Ourserverimplementationconsistsofapproximately5500linesofcommentedCcode,ofwhichapproximately1000linesimplementtheCompanionalgorithm,400linesimplementtheCoc-itationalgorithm,andtheremainderaresharedcodetoperformtaskssuchasparsingHTTPqueryre-
quests,printingresults,andloggingstatusmessages.WelinkourservercodedirectlywiththeCon-nectivityServerlibrary,andaccesstheconnectivityinformationbymmappingthegraphinformationintotheaddressspaceofourserver.
OurimplementationoftheCompanionalgorithmhasbeensubjectedtoamoderateamountofper-formancetuning,mostlyindesigningtheneighbor-hoodgraphdatastructurestoimprovedata-cacheperformance.TheimplementationoftheCocitationalgorithmhasnotbeentunedextensively,althoughitdoesshareafairamountofcodewiththeCompan-ionalgorithm,andthissharedcodehasbeentunedsomewhat.4.Evaluation
Inthissectionwedescribetheevaluationweperformedforthealgorithms.Section4.1describesouruserstudy,whileSection4.2discussestheresultsofthestudy.Section4.3evaluatestheruntimeperformanceofouralgorithms.4.1.Experimentalsetup
Tocomparethedifferentapproaches,weper-formedauserstudy.Weasked18volunteerstosupplyuswithatleasttwoURLsforwhichtheywantedtofindrelatedpages.Ourvolunteersincluded14computerscienceprofessionals(mostlyourcol-leaguesatCompaq’sresearchlaboratories),aswellas4peoplewithotherprofessionalcareers.Were-ceived69URLsandusedeachofthealgorithmstodeterminethetop10answersforeachURL.Weputtheanswersinrandomorderandreturnedthemtothevolunteersforrating.Thevolunteerswereinstructedasfollows:
Table3
SummaryofallanswersforthealgorithmsAlgorithmNo.ofURLsNo.ofNo.ofwithanswersanswersdeadlinksCompanion5049842Cocitation5858062Netscape
40
3
29
J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–14791473
Wewanttomeasurehowwelleachalgorithmper-forms.TomeasureperformancewewanttoknowthepercentageofvaluableURLsreturnedbyeachalgo-rithm.TobevaluabletheURLmustbebothrelevanttothetopicyouareinterestedinandahighqualitypage.Forexample,ifyourURLwaswww.audi.comandyougetbackanewsgrouppostingwheresomebodytalksabouthisnewAudicar,thenthepagewasontopic,butprobablynothighquality.Ontheotherhand,ifyougetwww.jaguar.comasananswer,thenitisuptoyoutodecidewhetherthisanswerisontopicornot.Scoring:
0:Pagewasnotvaluable=useful1:Pagewasvaluable=useful
–:Pagecouldnotbeaccessed(i.e.didnotexist,orserverwasdown)Pleaseignoretheorderinwhichthepagesarereturned.Soifalaterpagecontainssimilarcontenttoanearlierpagepleaseratethelatterpageasifyouhadnotseentheearlierpage.Thiswillimplythatwedonotmeasurehow‘happy’youarewithasetofanswersreturnedbyanalgorithm.Insteadwemeasurehowmanyvaluableanswerseachalgorithmgives.”
4.2.Userstudyresults
WereceivedresponsesratingtheanswerURLsfor59oftheinputURLs.These59inputURLsformthebasisofourstudy.Table3showshowmanyofthesequeriesthealgorithmsansweredandhowmanyanswerURLstheyreturned.Inmanycases,thealgorithmsreturnedlinksthatourusersratedasinaccessible.ThecolumnlabeledNo.ofdeadlinksshowsthenumberofinaccessiblepagesamongalltheanswersforeachalgorithm.Forthepurposesofourevaluation,wetreataninaccessiblelinkasascoreof‘0’,sinceinaccessiblepagesarenotvaluable=useful.
TheCocitationalgorithmreturnedresultsforallbutoneoftheURLs.ThereasonwhyitreturnedresultsforalmostallinputURLsisthatwheninsuf-ficientconnectivitywasfoundsurroundinganinputURL(e.g.a.com/X/Y),theCocitationalgorithmusedachoppedURLasinput(e.g.a.com/X).Al-thoughwedidnotincludethischoppingfeatureinourimplementationoftheCompanionalgorithm,itisdirectlyapplicableandwouldenabletheCompan-ionalgorithmtoreturnanswersformoreURLs.WehaveempiricallyobservedthatNetscape’salgorithm
alsoappliesasimilarchoppingheuristicinsomecases.
Table4containsalistingofthe59URLsinourstudy.ForeachURL,thethreecolumnslabeledCp,Ct,andNshowtheURLsforwhichtheCompanion,Cocitation,andNetscapealgorithmsreturnedresults,respectively.ThetablealsoshowsthenumberofhyperlinkspointingtotheURLintheConnectivityServer(Inlinks).FortheCompanionalgorithm,itshowsthenumberofnodesandedgesinthevicinitygraph,aswellasthewallclocktimeinmillisecondstakentocomputethesetofrelatedpages(computedbysurroundingthecomputationwithgettimeof-daysystemcalls).Forthecocitationalgorithm,itshowsthenumberofsiblingsfound,thenumberofsiblingscocitedatleasttwice(Cocited),andthewallclocktimetakentocomputetheanswers.
Thethreealgorithmsreturnanswersfordiffer-entsubsetsofour59URLs.Tocomparethesealgorithms,wecansubdividetheURLsintosev-eralgroups.TheintersectiongroupconsistsofthoseURLswhereallalgorithmsreturnedatleastoneanswer.Therewere37URLsinthisgroup.Thenon-NetscapegroupconsistsoftheURLswhereNetscape’sapproachdidnotreturnanyanswers.Itconsistsof19URLs.
Toquantifytheperformanceofthethreealgo-rithms,wenowdefinetwometrics.Theprecisionatrforagivenalgorithmisthetotalnumberofanswersreceivinga‘1’scorewithinthefirstranswers,di-videdbyrtimesthenumberofqueryURLs.NoticethatwhenanalgorithmdoesnotgiveanyanswersforaURL,thisisasifitgaveallnon-relevantanswersforthatURL.
ForagivenURLu,theaverageprecisionforuofanalgorithmisthesumoftheprecisionateachrankwheretheanswerofthealgorithmforureceiveda‘1’scoredividedbythetotalnumberoftheanswersofthealgorithmforureceivinga‘1’score.Ifthealgorithmdoesnotreturnanyanswersforu,itsaverageprecisionforuis0.TheoverallaverageprecisionforanalgorithmisthesumofalltheaverageprecisionsforallthequeryURLsdividedbythetotalnumberofqueryURLs.
ForeachofthethreegroupsofURLs(all,in-tersection,andnon-Netscape),Table5showstheaverageprecisionandtheprecisionat10foreachalgorithm.Fig.1showstheprecisionatrforeachof
1474J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479
Table4
User-providedURLsforevaluation(andstatistics)URL
Cp
Ct
N
InCompanionalgorithmCocitationalgorithm1.babelfish.altavista.digital.com/cgi...2.developer.intel.com/design/strong/t...3.english-server.hss.cmu.edu/4.hack.box.sk/
5.ieor.berkeley.edu/¾hochbaum/html/bo...6.linas.org/linux/corba.html
7.members.tripod.com/¾src-fall-regatt...8.metroactive.com/music/
9.travelwithkids.miningco.com/10.www-db.stanford.edu/¾wiener11.www.acf.dhhs.gov/
12.www.adventurewest.com/pub/NASTC/13.www.amazon.com/exec/obidos/cache/br...14.www.anglia.ac.uk/¾systimk/Humour/Hi...15.www.ayso26.org/
16.www.babynames.com/17.www.braintumor.org/
18.www.bris.ac.uk/Depts/Union/BTS/Scri...19.www.carrier.com/
20.www.chesschampions.com/kasparov.htm...21.www.cl.cam.ac.uk/users/rja14/tamper...22.www.davecentral.com/
23.www.duofold.com/stepout/ski-wsa.htm24.www.ebay.com/25.www.etoys.com/26.www.fifa.com/27.www.focus.de/
28.www.geocities.com/Paris/Metro/1324/29.www.geocities.com/TheTropics/2442/d...30.www.harappa.com/har/har0.html
31.www.harmony-central.com/MIDI/Doc/tu...32.www.hotelres.com/
33.www.innovation.ch/java/HTTPClient/34.www.inquizit.com/35.www.israelidance.com/36.www.jewishmusic.com/37.www.joh.cam.ac.uk/38.www.levenger.com/
39.www.mdl.sandia.gov/micromachine/gal...40.www.midiweb.com/41.www.minimed.com/
42.www.mit.edu/people/mkgray/net/43.www.mot-sps.com/44.www.movielink.com/
45.www.netusa1.net/¾spost/bench.html46.www.nsc.com/catalog/sg708.html47.www.odci.gov/cia/publications/factb...48.www.paccc.com/
49.www.perl.com/perl/index.html50.www.pianospot.com/1700305.htm
ppppppppppppppppppppp
pppppppp
pppp
pppppppppppppppppppppppppppppppppppp
pppppppppppppppppppppppppppppppppppppppppppppppppppppppp
pp
p
links
Nodes292846382193333774419716667963512320280013201997442171822552213650000514252826952017791315059275511571197416909470300465843257922941281561057662488111924503823531184708307273341126240210399161716255725913670019675304217778258113839188312274
62051900376543521840914530
0
EdgesTime(ms)10305269479128263215140904887344222770030267105127183149673949240000169847811101763351084162567212390312562711391
26300866019961531111236034914039361138768519318862497603614863913954323931116610272320835100155493401663332124441638115
33793008322221625159240
0
SiblingsCocited66011079853966302063609914934915444138680015231685845912881630136591401521611494111032700182965512123791521837580535220452155434911481856659511314335110055214524208130139151846520237153272380821568541817340123239740911411162741436133701141573223628749920656221207944229277411496708196221391839
23
Time(ms)5731758451517487363147110042414262901676872165553772549342752252530252321196211516134378715408697254514863336137585
J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479
Table4(continued)URL
Cp
Ct
N
Inlinks
CompanionalgorithmNodes105312580296546073458490
Edges16366106201105608515650
Time(ms)6311014030420130
CocitationalgorithmSiblings8137210607209633552344961846
Cocited16126074241684118486516709
1475
Time(ms)16386216118147823010214
51.www.rei-outlet.com/
52.www.sddt.com/files/library/94headli...53.www.sultry.arts.su.edu.au/links/lin...54.www.telemarque.com/articles/andrnch...55.www.trane.com/56.www.traxxx.de/57.www.us-soccer.com/
58.www.wisdom.weizmann.ac.il/¾naor/puz...59.www.wwa.com/¾android7/pilot/index.h...
ppppppp
ppppppppp
pppp
2095409001219117580
thesegroupsofURLsingraphsa,bandc.Fig.1aandbillustratethattheCompanionandCocitationalgorithmssubstantiallyoutperformNetscape’sal-gorithmatallranks,andtheCompanionalgorithmalmostalwaysoutperformstheCocitationalgorithm.Theintersectiongroupisthemostinterestingcomparison,sinceitavoidspenalizinganalgorithmfornotreturningatleastoneanswer.Fortheinter-sectiongroup,Netscape’salgorithmachievesapreci-sionat10of0.357,whiletheCompanionalgorithmachievesaprecisionat10of0.501(40%better),andtheCocitationalgorithmachievesaprecisionat10of0.435(22%better).Theaverageprecisionintheintersectiongroupdoesnotpenalizeanalgorithmforreturningfewerthan10answers.Underthismet-ric,theCompanionalgorithmis32%betterthanNetscape’salgorithm,whiletheCocitationalgorithmis20%betterthanNetscape’salgorithm.
InthegroupthatincludesallURLs,allthreeal-gorithmshaddropsintheirprecisionat10values.Therearetworeasonsforthis.Thefirstisthatalgo-rithmsweregivenaprecisionof0foragivenURLiftheydidnotreturnanyanswers.ThismostlyaffectedtheNetscapeandCompanionalgorithms.Thesec-Table5
PrecisionmetricsforeachalgorithmforthreegroupsofURLsAlgorithm
All
Averageprecision
CompanionCocitationNetscape
0.5410.5180.343
Precisionat100.4170.3630.241
Intersection
ondreasonisthatfortheURLsinthenon-Netscapeset,boththeCompanionandCocitationalgorithmsdidnotperformaswellastheydidforURLsintheIntersectionset.Despitethesedropsinabsoluteaverageprecision,theaverageprecisionoftheCom-panionalgorithmis57%betterthanthatofNetscape,andtheaverageprecisionoftheCocitationalgorithmis51%betterthanthatofNetscape.Similarresultsholdwhenexaminingaverageprecisionratherthanprecisionat10.
Toevaluatethestatisticalsignificanceofourre-sults,wecomputedthesigntestandtheWilcoxsonsumsofrankstestforeachpairofalgorithms[18].TheseresultsareshowninTable6andshowthatthedifferencebetweentheCompanionandNetscapealgorithmsandbetweentheCocitationandNetscapealgorithmsarestatisticallysignificant.
WealsowantedtoevaluatewhetherornotthealgorithmsweregenerallyreturningthesameresultsforagivenURLorwhethertheywerereturninglargelydisjointsetsofURLs.Table7showstheamountofoverlapintheanswersreturnedbyeachpairofalgorithms.Thepercentageinparenthesesistheoverlapdividedbythetotalnumberofan-
Non-Netscape
Precisionat100.5010.4350.357
Averageprecision0.5400.434n=a
Precisionat100.4010.325n=a
Averageprecision0.6660.6050.502
1476J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479
1.00.8
CompanionCocitationNetscape
1.00.80.60.40.20.0
1.00.80.60.40.20.0
110110Precision0.60.40.20.0
110rrr
(a) All(b) Intersection
Fig.1.PrecisionatrforthethreegroupsofURLs.
(c) Non-Netscape
Table6
SigntestandWilcoxonsumofrankstestforalgorithmpairsAlgorithm
AllSign
CompanionbetterthanNetscapeCocitationbetterthanNetscapeCompanionbetterthanCocitation
<0.00010.01360.1922
Ranksum0.00260.010.38
IntersectionSign0.00410.16850.0793
Ranksum0.03400.23400.2628
Non-NetscapeSignn=an=a0.23
Ranksumn=an=a0.4180
Table7
OverlapbetweenanswersreturnedbyalgorithmsAlgorithmCompanionCocitationNetscape
Companion253(44%)55(15%)
Cocitation253(51%)56(15%)
Netscape55(11%)56(10%)
swersreturnedbythealgorithminthatrow.Asthetableshows,thereisalargeoverlapbetweentheanswersreturnedbytheCompanionandCocitationalgorithms.Thisisnotsurprising,sincethetwoalgorithmsarebothbasedonconnectivityinforma-tionsurroundingtheinputURLandsincebothusesimilarparameterstochoosethesurroundingnodes.ThereisrelativelylittleoverlapbetweentheanswersreturnedbyNetscapeandtheothertwoalgorithms.4.3.Run-timeperformance
Inthissection,wepresentdataabouttherun-timeperformanceoftheCompanionandCocitationalgorithms.SincewedonothavedirectaccesstoNetscape’salgorithmandonlyaccessitthroughthepublicWebinterface,weareunabletopresentper-
formanceinformationforNetscape’salgorithm.AllmeasurementswereperformedonaCompaqAl-phaServer4100with8GBofmainmemoryanddual466MHzAlphaprocessors.ThemeasuredrunningtimesarewallclocktimesfromthetimetheinputURLisgiventotheserveruntilthetimetheanswersarereturned.ThesetimesdonotincludethetimetakentoformattheresultsasanHTMLpage,sincethatwasdonebyaserverprocessrunningonanothermachine(andthetimetodothiswasnegligible).TheaveragerunningtimefortheCompanionalgorithmonthe50URLsforwhichitreturnedanswerswas109msec,whiletheaveragerunningtimefortheCocitationalgorithmonthe58URLsforwhichitprovidedanswerswas195msec.Theperformanceofboththesealgorithmsissufficientlyfastthateitheronecouldhandlealargeamountoftraffic(closeto800,000requestsperdayfortheCompanionalgorithm).Furthermore,theaverageperformancecouldprobablybeimprovedbycachinganswersforfrequentlyrequestedURLs.
Althoughwedidnotexplicitlyincludethisfactorinouruserstudy,wehaveinformallyobservedthatthesubjectivequalityofanswersreturnedforboththeCompanionandtheCocitationalgorithmsdoes
J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–14791477
Companion time (msec)40030020010000500010000Cocitation time (msec)600
400
200
(a)1500000200040006000(b)# of siblings
Graph edges
Fig.2.GraphsizeversusrunningtimeofCompanionandCocitationalgs.
notdecreasewhenwesomewhatdecreasetheparam-eterB(thenumberofinlinksconsidered)duringthebuildingofthevicinitygraph.Thisisimportantforon-lineservicesbecauseitmeansthatthegraphsizecouldbereducedduringtimesofhighload,therebyreducingtheamountoftimetakentoserviceeachrequest.Underconditionsoflowload,thegraphsizecouldbeincreased.
TheCompanionalgorithmgenerallyconvergesonitsanswerswithinafewiterations(typicallylessthan10iterations),butthenumberofiterationsincreaseswiththegraphsize.Eachiterationtakestimethatislinearinthenumberofedgesinthevicinitygraph.WeplottherunningtimeversusthenumberofgraphedgesinFig.2a.
TherunningtimeoftheCocitationalgorithmisO.nlogn/,wherenisthenumberofsiblingsexaminedforcocitation,sinceitsortsthesiblingsbythedegreeofcocitation.ThiseffectisillustratedinFig.2b.Inourexperience,therunningtimesforthecocitationandcompanionalgorithmsaregenerallycorrelated,sinceURLswhichhavealargenumberofsiblingstoconsiderinthecocitationalgorithmalsogenerallyproducealargeneighborhoodgraphforprocessinginthecompanionalgorithm.5.Relatedwork
Manyresearchershaveproposedschemesforus-ingthehyperlinkstructureoftheWeb[1,3,4,8,12,16,17,20,22,23].Forthemostpart,thisworkdoesnotdiscussthefindingofrelatedpages,withfourexceptionsdiscussedbelow.
Weknowofonlyonepreviousworkthatexploitstheorderoflinks:Chakrabartietal.[6]usethelinksandtheirordertocategorizeWebpagesandtheyshowthatthelinksthatarenearagivenlinkinpageorderfrequentlypointtopagesonthesametopic.PreviousauthorshavesuggestedusingcocitationandotherformsofconnectivitytoidentifyrelatedWebpages.Spertusobservedthatcocitationcanin-dicatethattwopagesarerelated[20].Thatis,ifpageApointstobothpagesBandC,thenBandCmightberelated.Variousresearchersinthefieldofbibliometricshavealsoobservedthis[9–11,19],andthisobservationformsthebasisofourCocitationalgorithm.Thenotionofcollaborativefiltering,al-thoughitisbasedonuser’srecommendationsratherthanhyperlinks,alsoreliesonthisobservation[21].PitkowandPirolli[16]clusterWebpagesbasedoncocitationanalysis.TerveenandHill[22]usetheconnectivitystructureoftheWebtofindgroupsofrelatedWebsites.
OurcompanionalgorithmdescendedfromtheHITSalgorithmdevelopedbyKleinberg[12].TheHITSalgorithmwasoriginallyproposedbyKlein-bergasawayofusingconnectivitystructuretoiden-tifythemostauthoritativesourcesofinformationonaparticulartopic,wherethetopicwasdefinedbythecombinedlinkstructureofalargenumberofstartingnodesonthetopic.Kleinbergalsoproposedthatthe
1478J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479
HITSalgorithmcouldbeusedtofindrelatedpageswhenthetopicwasdefinedbyjustasinglenode.TheCompanionalgorithmusedHITSalgorithmasastartingpointandextendedandmodifieditinfourmainways:
(1)Kleinbergsuggestedusingthefollowinggraph
tofindrelatedpages:Takeafixednumber(say200)ofparentsofthegivenURLandcallthesetconsistingoftheURLandtheseparentsthestartset.Nowbuildthegraphconsistingofallnodespointingtoanodeinthestartsetorpointedtobyanodeinthestartset.Thismeansthat‘grandparents’ofuareincludedinthegraph,whilenodesthatshareachildwithuarenotincludedinthegraph.Webelievethatthelatternodesaremorelikelytoberelatedtouthanarethe‘grandparent’nodes.Thereforeourvicinitygraphisstructuredtoexcludegrandparentnodesbuttoincludenodesthatshareachildwithu.(2)Weexploittheorderofthelinksonapage
todeterminewhich‘siblings’ofutoinclude.Whenweaddedthisfeature,theprecisionofouralgorithmimprovednoticably.
(3)TheoriginalHITSalgorithmdidnothaveedge
weights.Weuseedgeweightstoreducetheinfluenceofpagesthatallresideononehost,sinceBharatandHenzingerhaveshownthatedgeweightsimprovetheprecision[3].
(4)Wealsomergenodesinourvicinitygraphthat
havealargenumberofduplicatelinks.Dupli-catenodesarenotsuchaseriousproblemwhenusingtheHITSorimpalgorithmstorankqueryresults,sincethestartsetconsistsofalargenumberofURLs.However,whenformingthevicinitygraphstartingwithjustasingleURL,theinfluenceofduplicatenodesisincreasedbe-causeduplicatenodeswithalargenumberofoutlinkswillquicklydominatethehubandauthoritycomputation.
KleinbergalsoshowedthatHITScomputestheprincipaleigenvectorofthematrixAAT,whereAistheadjacencymatrixoftheabovegraph,andsug-gestedusingnon-principaleigenvectorsforfindingrelatedpages.Finally,hegaveanecdotalevidencethatHITSmightworkwell.
Consecutively,asequenceofpapers[5,7]pre-sentedimprovementsonHITSandusedittopopu-lateagivenhierarchyofcategories.
6.Conclusion
WehavepresentedtwodifferentalgorithmsforfindingrelatedpagesintheWorldWideWeb.TheysignificantlyoutperformNetscape’salgorithmforfindingrelatedpages.Thealgorithmscanbeimple-mentedefficientlyandaresuitableforusewithinalargescaleWebserviceprovidingarelatedpagesfeature.
OurtwoalgorithmscanbeextendedtohandlemorethanoneinputURL.Inthiscase,thealgo-rithmswouldcomputepagesthatarerelatedtoallinputURLs.Wearecurrentlyexploringtheseexten-sions.
Acknowledgements
ThisworkhasbenefitedgreatlyfromdiscussionswehavehadwithKrishnaBharat,AndreiBroder,PuneetKumar,andHannesMarais.WearealsoindebtedtoPuneetforhisworkontheConnec-tivityServer.Assomeoftheearliestusersoftheserver,Puneetansweredourmanyquestionsandimplementedmanyimprovementstotheserverinresponsetooursuggestions.WearealsogratefultoHannesMaraisfordevelopingWebL,aWebscript-inglanguage[13].UsingWebL,wewereabletoquicklydevelopaprototypeuserinterfaceforourrelatedpagesserver.KrishnaBharat,AllanHeydon,andHannesMaraisprovidedusefulfeedbackonear-lierdraftsofthispaper.Finally,wewouldliketothankalltheparticipantsinouruserstudy.
References
[1]G.O.Arocena,A.O.MendelzonandG.A.Mihaila,Appli-cationsofawebquerylanguage,in:Proc.oftheSixthInternationalWorldWideWebConference,pp.587–595,SantaClara,CA,April,1997.
[2]K.Bharat,A.Z.Broder,M.Henzinger,P.KumarandS.
Venkatasubramanian,Theconnectivityserver:fastaccesstolinkageinformationontheWeb,in:Proc.ofthe7thInternationalWorldWideWebConference,pp.469–477,Brisbane,Qld,April1998.
[3]K.BharatandM.Henzinger,Improvedalgorithmsfortopic
distillationinhyperlinkedenvironments,in:Proc.ofthe21stInternationalACMSIGIRConferenceonResearch
J.Dean,M.R.Henzinger/ComputerNetworks31(1999)1467–1479
1479
andDevelopmentinInformationRetrieval(SIGIR’98),pp.104–111,1998.
[4]
S.BrinandL.Page,Theanatomyofalarge-scalehyper-textualWebsearchengine,in:Proc.ofthe7thInternationalWorldWideWebConference,pp.107–117,Brisbane,Qld.,April1998.
[5]
S.Chakrabarti,B.Dom,D.Gibson,S.R.Kumar,P.Ragha-van,S.RajagopalanandA.Tomkins,Experimentsintopicdistillation,in:ACM–SIGIR’98Post-ConferenceWorkshoponHypertextInformationRetrievalfortheWeb,1998.
[6]
S.Chakrabarti,B.DomandP.Indyk,Enhancedhypertextcategorizationusinghyperlinks,in:Proc.oftheACMSIG-MODInternationalConferenceonManagementofData,pp.307–318,1998.
[7]
S.Chakrabarti,B.Dom,P.Raghavan,S.Rajagopalan,D.GibsonandJ.Kleinberg,Automaticresourcecompilationbyanalyzinghyperlinkstructureandassociatedtext,in:Proc.oftheSixthInternationalWorldWideWebConfer-ence,pp.65–74,SantaClara,CA,April1997.
[8]
J.CarriereandR.Kazman,WebQuery:Searchingandvisu-alizingthewebthroughconnectivity,in:Proc.oftheSixthInternationalWorldWideWebConference,pp.701–711,SantaClara,CA,April1998.
[9]E.Garfield,Citationanalysisasatoolinjournalevaluation,Science178(1972).
[10]E.Garfield,CitationIndexing,ISIPress,Philadelphia,PA,1979.
[11]M.M.Kessler,Bibliographiccouplingbetweenscientificpapers,AmericanDocumentation14(1963).
[12]
J.Kleinberg,Authoritativesourcesinahyperlinkedenvi-ronment,in:Proc.ofthe9thAnnualACM–SIAMSympo-siumonDiscreteAlgorithms,pp.668–677,January1998.[13]
T.KistlerandH.Marais,WebL—Aprogramminglan-guagefortheWeb,in:Proc.ofthe7thInternationalWorldWideWebConference,pp.259–270,Brisbane,Qld.,April1998.
[14]
NetscapeCommunicationsCorporation,‘What’sRelatedFAQ’webpage,http://home.netscape.com/escapes/related/faq.html
[15]NetscapeCommunicationsCorporation,‘What’sRelated’webpage,http://home.netscape.com/escapes/related/
[16]
J.PitkowandP.Pirolli,Life,death,andlawfulnessontheelectronicfrontier,in:Proc.oftheConferenceonHumanFactorsinComputingSystems(CHI97),pp.383–390,March1997.
[17]
P.Pirolli,J.PitkowandR.Rao,Silkfromasow’sear:
ExtractingusablestructuresfromtheWeb,in:Proc.oftheConferenceonHumanFactorsinComputingSystems(CHI96),pp.118–125,April1996.
[18]S.M.Ross,IntroductoryStatistics,McGraw-Hill,NewYork,1996.
[19]
H.Small,Co-citationinthescientificliterature:anewmea-sureoftherelationshipbetweentwodocuments,JournalAmericanSocietyInformationScience24(1973).
[20]
E.Spertus,ParaSite:MiningstructuralinformationontheWeb,in:Proc.oftheSixthInternationalWorldWideWebConference,pp.587–595,SantaClara,CA,April1997.[21]
U.ShardanandandP.Maes,Socialinformationfiltering:Algorithmsforautomating‘WordofMouth’,in:Proc.ofthe1995ConferenceonHumanFactorsinComputingSystems(CHI’95),1995.
[22]
L.TerveenandW.Hill,Findingandvisualizinginter-siteclangraphs,in:Proc.oftheConferenceonHumanFactorsinComputingSystems(CHI-98):MakingtheImpossiblePossible,pp.448–455,ACMPress,NewYork,April18–23,1998.
[23]
L.TerveenandW.Hill,Evaluatingemergentcollabora-tionontheWeb,in:Proc.ofACMCSCW’98ConferenceonComputer-SupportedCooperativeWork,pp.355–362,SocialFiltering,SocialInfluences,1998.
JeffreyDeanreceivedhisPh.D.fromtheUniversityofWashing-tonin1996,workingonefficientimplementationtechniquesforobject-orientedlanguagesunderProfessorCraigChambers.HejoinedCompaq’sWesternResearchLaboratoryin1996,whereheworkedonprofilingtechniques,performancemonitoringhard-ware,compileralgorithmsandinformationretrieval.InFebruary,1999,hejoinedmySimon,Inc.,whereheiscurrentlyworkingonscalablecomparisonshoppingsystemsfortheWorldWideWeb.HiscurrentresearchinterestsincludeinformationretrievalandthedevelopmentofscalablesystemsfortheWorldWideWeb.Heistwocontinentsshyofhisgoalofplayingbasketballoneverycontinent.
MonikaR.HenzingerreceivedherPh.D.fromPrincetonUni-versityin1993underthesupervisionofRobertE.Tarjan.Af-terwards,shewasanassistantprofessorinComputerScienceatCornellUniversity.ShejoinedtheDigitalSystemsResearchCenter(nowCompaqComputerCorporation’sSystemsResearchCenter)in1996.HercurrentresearchinterestsareinformationretrievalontheWorldWideWebandalgorithmicproblemsarisinginthiscontext.
因篇幅问题不能全部显示,请点此查看更多更全内容
Copyright © 2019- huatuo0.cn 版权所有 湘ICP备2023017654号-2
违法及侵权请联系:TEL:199 18 7713 E-MAIL:2724546146@qq.com
本站由北京市万商天勤律师事务所王兴未律师提供法律服务