Investigating the Performance of Automatic New Topic Identification Across Multiple Datasets (1)

H. Cenk Özmutlu
Industrial Engineering Department, Uludag University, Gorukle Kampusu, Bursa, TURKEY. Tel: (++90-224) 442-8176, Fax: (++90-224) 442-8021, hco@uludag.edu.tr

Fatih Cavdur
Industrial Engineering Department, Uludag University, Gorukle Kampusu, Bursa, TURKEY. Tel: (++90-224) 442-8176, Fax: (++90-224) 442-8021, fcavdur@uludag.edu.tr

Amanda Spink
School of Information Sciences, University of Pittsburgh, 610 IS Building, 135 N Bellefield Ave, Pittsburgh, PA 15260. Tel: (412) 624-9454, Fax: (412) 8-7001, aspink@sis.pitt.edu

Seda Ozmutlu (corresponding author)
Industrial Engineering Department, Uludag University, Gorukle Kampusu, Bursa, TURKEY. Tel: (++90-224) 442-8176, Fax: (++90-224) 442-8021, seda@uludag.edu.tr

Recent studies on automatic new topic identification in Web search engine user sessions demonstrated that neural networks are successful in automatic new topic identification. However, most of this work applied new topic identification algorithms to data logs from a single search engine. In this study, we investigate whether the application of neural networks for automatic new topic identification is more successful on some search engines than on others. Sample data logs from the Norwegian search engine FAST (currently owned by Overture) and from Excite are used in this study. The findings of this study suggest that query logs with more topic shifts tend to provide more successful results on shift-based performance measures, whereas logs with more topic continuations tend to provide better results on continuation-based performance measures.

Introduction and related research

An important facet of Web mining is the study of the behavior of search engine users. One dimension of search engine user profiling is content-based behavior. Currently, search engines are not designed to differentiate according to the user's profile and the content that the user is interested in. One of the main elements in following user topics is new topic identification, that is, discovering when the user has switched from one topic to another during a single search session. If the search engine is aware that the user's new query is on the same topic as the previous query, it can provide results from the document cluster relevant to the previous query; alternatively, if the user has moved to a new topic, the search engine can resort to searching other document clusters. Consequently, search engines can decrease the time and effort required to process the query. Besides providing better results to the user, custom-tailored graphical user interfaces could be offered to the Web search engine user if topic changes were estimated correctly by the search engine (Ozmutlu, et al., 2003).

Many researchers have worked on large-scale studies of search engine data logs, such as Silverstein et al. (1999), Cooley, Mobasher, & Srivastava (1999), Spink, et al. (2000, 2001, 2002a), Ozmutlu, et al. (2002b, 2003b, 2003c) and Ozmutlu and Spink (2002). There are few studies on query clustering and new topic identification, and these studies generally analyzed the queries semantically. Silverstein, et al. (1999), Jansen, et al. (2000), and Spink, et al. (2001) performed content analysis of search engine data logs at the term level, and Jansen, et al. (2000) and Spink, et al. (2001, 2002a) at the conceptual or topical level. Ozmutlu, et al. (2004b) and Beitzel, et al. (2004) performed hourly statistical and topical analyses of search engine query logs.
Besides studies analyzing search engine queries for content information, another research area is the development of query clustering models based on content information. Pu et al. (2002) developed an automatic classification methodology to classify search queries into broad subject categories using subject taxonomies. Muresan and Harper (2004) propose a topic modeling system for developing mediated queries. Beeferman and Berger (2000) and Wen, et al. (2002) applied query clustering that uses search engine query logs including clickthrough data, which records the documents that the user has selected as a result of the search query. Query similarities are proposed based on the common documents that users have selected.

Another dimension of topic-related Web searching is multitasking, which is defined, in terms of information retrieval, as "the process of searches over time in relation to more than one, possibly evolving, set of information problems (including changes or shifts in beliefs, cognitive, affective, and/or situational states)" (Spink, et al., 2002b). Spink, et al. (2002b) and Ozmutlu, et al. (2003a) found that 11.4%-31.8% of search engine users performed multitasking searches.

Most query clustering methods focus on the interpretation of keywords or on understanding the topic or content of the query, which complicates the process of query clustering and increases the potential noise in the results. One of the promising approaches is to apply content-ignorant methodologies to the problem of query clustering or new topic identification in a user search session. In such an approach, queries can be categorized into different topic groups with respect to their statistical characteristics, such as the time intervals between subsequent queries or the reformulation of queries. Ozmutlu (2006) applied multiple linear regression to automatically identify topic changes, and showed that there is a valid relationship between non-semantic characteristics of user queries and topic shifts and continuations. Ozmutlu (2006) showed that the non-semantic factors of time interval, search pattern and query position in the user session, as well as the interaction of search pattern and time interval, have a statistically significant effect on topic shifts. These results provided statistical proof that Web users demonstrate a certain pattern of behavior when they are about to make topic shifts or continue on a topic, a pattern that becomes more pronounced when a certain combination of search pattern and time interval occurs. He, et al. (2002) proposed a topic identification algorithm that uses Dempster-Shafer theory (Shafer, 1976) and genetic algorithms. Their algorithm automatically identifies topic changes using statistical data from Web search logs. He et al. (2002) used the search pattern and duration of a query for new topic identification. Their approach was replicated on Excite search engine data (Ozmutlu and Cavdur, 2005a). Ozmutlu and Cavdur (2005b) and Ozmutlu, et al. (2004a) proposed an artificial neural network to automatically identify topic changes, and showed that neural networks successfully provided new topic identification. The application of neural networks for automatic new topic identification does not involve semantic analysis and relies on the statistical characteristics of the queries.

In this study, we aim to apply the neural network to multiple datasets and investigate whether the application of neural networks for automatic new topic identification is more successful on some search engines than on others.
We also aim to see whether there are specific conditions that affect the performance of neural networks in terms of automatic new topic identification. To conduct this study, neural networks are tested on separate datasets. In the next section, we provide a detailed discussion of the experimental framework and the application of neural networks for automatic new topic identification. Finally, we provide the results of the proposed methodology, discuss them, and conclude the study.

Methodology

The datasets

The first search query log used in this study comes from the Excite search engine; it was collected on December 20, 1999, and consists of 1,025,910 search queries. The first 10,003 queries of the dataset were selected as a sample. The sample was not kept very large, since evaluation of the performance of the algorithm requires a human expert to go over all the queries.

The second dataset comes from the FAST search engine and contains a query log of 1,257,1 queries. The queries were collected on February 6, 2001. We selected a sample of 10,007 queries by using Poisson sampling (Ozmutlu, et al., 2002a) to provide a sample dataset that is both representative of the dataset and small enough to be analyzed conveniently.

Notation

The notation used in this study is as follows:

Nshift: number of queries labeled as shifts by the neural network
Ncontin: number of queries labeled as continuations by the neural network
Ntrue shift: number of queries labeled as shifts by manual examination of the human expert
Ntrue contin: number of queries labeled as continuations by manual examination of the human expert
Nshift & correct: number of queries labeled as shifts both by the neural network and by manual examination of the human expert
Ncontin & correct: number of queries labeled as continuations both by the neural network and by manual examination of the human expert
Type A error: this type of error occurs when queries on the same topic are considered as separate topic groups.
Type B error: this type of error occurs when queries on different topics are grouped together into a single topic group.

Some useful relations among the above quantities are as follows:

Ntrue shift = Nshift & correct + Type B error
Ntrue contin = Ncontin & correct + Type A error
Nshift = Nshift & correct + Type A error
Ncontin = Ncontin & correct + Type B error

The commonly used performance measures of precision (P) and recall (R) are used in this study to demonstrate the performance of the proposed neural network. The focus of P and R is on correctly estimating the number of topic shifts and continuations. These measures are formulated as follows:

Pshift = Nshift & correct / Nshift    (1)
Rshift = Nshift & correct / Ntrue shift    (2)
Pcontin = Ncontin & correct / Ncontin    (3)
Rcontin = Ncontin & correct / Ntrue contin    (4)
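To make the bookkeeping behind these measures concrete, the short Python sketch below (our own illustration, not part of the original study) computes the two error counts and the four measures from the counts defined in the notation above; the function and variable names simply mirror that notation.

    def evaluate_topic_identification(n_shift, n_contin,
                                      n_shift_correct, n_contin_correct,
                                      n_true_shift, n_true_contin):
        """Compute Type A/B errors and precision/recall for shifts and
        continuations from the counts defined in the Notation section."""
        type_a_error = n_shift - n_shift_correct      # continuations labeled as shifts
        type_b_error = n_contin - n_contin_correct    # shifts labeled as continuations
        p_shift = n_shift_correct / n_shift           # equation (1)
        r_shift = n_shift_correct / n_true_shift      # equation (2)
        p_contin = n_contin_correct / n_contin        # equation (3)
        r_contin = n_contin_correct / n_true_contin   # equation (4)
        return type_a_error, type_b_error, p_shift, r_shift, p_contin, r_contin

    # Example with the counts of Run 1 in Case 1 (reported later in Table 6):
    # evaluate_topic_identification(292, 3375, 86, 3309, 152, 3515)
    # -> (206, 66, 0.295, 0.566, 0.980, 0.941) after rounding, matching the tabulated values.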
Research Design

The main research question in this study is whether automatic new topic identification is performed equally successfully on different datasets. To answer this research question, neural networks need to be tested on several datasets. For testing, the research design in Table 1 is proposed. For the application of neural networks to automatic new topic identification, each dataset is divided into two almost equal parts, and the neural network is trained on the first half. Then, using the information from training, the neural network is used to identify topic changes in the second half of the datasets. In this study, we initially train and test the neural network on the Excite and FAST data logs. Additionally, the neural network trained on the Excite dataset is tested on the Excite and FAST datasets, and the neural network trained on the FAST dataset is tested on the FAST and Excite datasets. The reason for this cross-application of the neural networks is that we would like to observe any effects that might come from the training dataset.

Table 1: Research design

                                           | Testing dataset: Excite | Testing dataset: FAST
Training dataset Excite: Neural Network A  | Case 1                  | Case 3
Training dataset FAST: Neural Network B    | Case 4                  | Case 2

Proposed Algorithm

The steps of the application of neural networks in this paper are explained in detail in the following paragraphs.

Evaluation by human expert: A human expert goes through the 10,003-query set for Excite and the 10,007-query set for FAST and marks the actual topic changes and topic continuations. This step is necessary for training the neural network and also for testing its performance.

Divide the data into two sets: For both datasets, approximately the first half of the data is used to train the neural network and the second half is used to test its performance. The two data sections do not contain exactly the same number of queries, in order to keep intact the user session that falls in the middle of each dataset. The sizes of the datasets used to train and test the neural network are shown in Table 2.

Table 2: Size of the datasets used in the study

Search engine | Entire dataset | Sample set | 1st half of the sample set, used for training the NN | 2nd half of the sample set, used for testing the NN
Excite        | 1,025,910      | 10,003     | 5014 queries | 4989 queries
FAST          | 1,257,1        | 10,007     | 4997 queries | 5010 queries

Identify the search pattern and time interval of each query in the dataset: Each query in the dataset is categorized in terms of its search pattern and time interval. The time interval is the difference between the arrival times of two consecutive queries. The classification of the search patterns is based on the terms of consecutive queries within a session. The categorization of time interval and search pattern is selected to be similar to those of Ozmutlu and Cavdur (2005a, 2005b), Ozmutlu (2006) and Ozmutlu, et al. (2004a), to avoid any bias during comparison.

We use seven categories of time intervals for a query: 0-5 min., 5-10 min., 10-15 min., 15-20 min., 20-25 min., 25-30 min., and 30+ min. See Table 3 for the distribution of the queries with respect to time interval. It should be noted that not all of the 5014 queries in Excite and 4997 queries in FAST can be used for training: the last query of each user session cannot be processed for pattern classification and time duration, since there is no subsequent query after it. In the training dataset for Excite, excluding the last query of each session reduces the set from 5014 to 3813 queries. In the training dataset for FAST, excluding the last query of each session reduces the set from 4997 to 4560 queries. For the Excite dataset, after the human expert identified the topic shifts and continuations, 3544 topic continuations and 269 topic shifts were identified within the 3813 queries. For the FAST dataset, 4174 topic continuations and 386 topic shifts were identified within the 4560 queries.
Table 3: Distribution of the time interval of queries (training datasets)

Time interval (min) | Excite continuations | Excite shifts | FAST continuations | FAST shifts
Total               | 3544                 | 269           | 4174               | 386

We also use seven categories of search patterns in this study, which are as follows:

Unique (New): the second query has no term in common with the first query.
Next Page (Browsing): the second query requests another set of results for the first query.
Generalization: all of the terms of the second query are also included in the first query, but the first query has some additional terms.
Specialization: all of the terms of the first query are also included in the second query, but the second query has some additional terms.
Reformulation: some of the terms of the second query are also included in the first query, but the first query has some other terms that are not included in the second query; that is, the user has added and deleted some terms of the first query. If the user enters the same terms of the first query in a different order, it is also considered a reformulation.
Relevance feedback: the second query has zero terms (it is empty) and is generated by the system when the user selects "related pages".
Other: if the second query does not fit any of the above categories, it is labeled as other.

For details on the search patterns, see Ozmutlu and Cavdur (2005a, 2005b), Ozmutlu (2006) and Ozmutlu, et al. (2004a). The search patterns are automatically identified by a computer program. The pattern identification algorithm is adapted from He et al. (2002), but is considerably altered. The logic of the automatic search pattern identification can be found in Figure 1; a simplified sketch of this logic is given after Figure 1. See Table 4 for the distribution of queries with respect to search patterns in the training datasets.

Table 4: Distribution of the search pattern of queries (training datasets)

Search pattern | Excite intra-topic | Excite inter-topic | FAST intra-topic | FAST inter-topic
Total          | 3544               | 269                | 4174             | 386

Figure 1: Search pattern identification algorithm
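As an illustration of the classification logic summarized in Figure 1, the following Python sketch re-implements the seven search-pattern categories and the seven time-interval classes described above. It is our own simplified reconstruction rather than the authors' program; in particular, the numeric codes 1-7 and the way next-page and relevance-feedback requests are detected (here passed in as flags taken from the log) are illustrative assumptions.

    def search_pattern_category(prev_terms, curr_terms,
                                next_page=False, relevance_feedback=False):
        """Map a pair of consecutive queries to one of the seven search-pattern
        categories (codes are illustrative): 1 unique/new, 2 next page/browsing,
        3 generalization, 4 specialization, 5 reformulation, 6 relevance feedback.
        Category 7 ("other") is reserved for transitions that do not fit the
        term-based rules in a real log (e.g. malformed entries)."""
        prev, curr = set(prev_terms), set(curr_terms)
        if relevance_feedback or not curr:
            return 6          # empty, system-generated query ("related pages")
        if next_page:
            return 2          # request for another result page of the same query
        if not (prev & curr):
            return 1          # no common terms: a new (unique) query
        if prev == curr:
            return 5          # same terms, possibly reordered: reformulation
        if curr < prev:
            return 3          # terms only removed: generalization
        if prev < curr:
            return 4          # terms only added: specialization
        return 5              # terms both added and deleted: reformulation

    def time_interval_category(minutes):
        """Bin the gap between consecutive queries into the seven classes:
        0-5, 5-10, 10-15, 15-20, 20-25, 25-30 and 30+ minutes."""
        for code, upper in enumerate((5, 10, 15, 20, 25, 30), start=1):
            if minutes < upper:
                return code
        return 7              # 30+ minutes

These two codes per query are exactly the two inputs fed to the neural network described in the next step.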
Forming the neural network: In this study, we propose a feedforward neural network with three layers: an input layer, one hidden layer and an output layer. There are two neurons in the input layer; one neuron corresponds to the category of the search pattern and the other to the category of the time interval of the query. Each input neuron can take a value from 1 through 7 according to its search pattern or time interval (note that there are seven search pattern types and seven time intervals). The output layer has only one neuron, which can take the value 1 or 2, referring to a topic continuation or a topic shift, respectively. The hidden layer has five neurons. The number of hidden layers and the number of neurons in each hidden layer were determined after a series of pilot experiments. (An illustrative sketch of this network and the thresholding step is given after these algorithm steps.)

Figure 2: The structure of the proposed neural network

Training the neural network: We obtain two neural networks by training the network with the first halves of the Excite and FAST datasets (see Table 1). The values for the input layer (the search pattern and time interval of the query) and for the output layer (the label of each query as topic shift or continuation) are provided to the neural network so that it can train itself. The neural network adjusts its weights so that the output layer yields the correct label (topic shift or continuation) as often as possible. We used MATLAB to create and train the neural network.

Applying the neural network to the test datasets: Using the information from training, the neural network is used to identify topic changes in the second half of the datasets. To be statistically reliable, each case is repeated 50 times. The output layer of the neural network yields a result between 1 and 2, depending on the input parameters. To conform with previous studies, we use a threshold value of 1.3: any output over 1.3 is considered a 2 (shift), and any output under 1.3 is considered a 1 (continuation).

Comparison of the results from the human expert and the neural network: The results of the neural network tested on the FAST and Excite datasets are compared to the actual topic shifts and continuations determined by the human expert. Correct and incorrect estimates of topic shifts and continuations are marked, and the statistics in the notation section are calculated.

Evaluation of results: The performance of the neural network is evaluated in terms of precision (P) and recall (R). Higher P and R values mean higher success in topic identification.
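As a small illustration of the network-forming, training and thresholding steps above, the sketch below builds a comparable network in Python with scikit-learn's MLPRegressor as a stand-in for the authors' MATLAB implementation: two inputs (the search-pattern and time-interval codes, each 1-7), one hidden layer of five neurons, and a single continuous output trained toward 1 for a continuation and 2 for a shift, with the 1.3 cut-off applied afterwards. The activation function and iteration count are our assumptions; the paper does not report them.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def train_topic_network(features, labels):
        """features: array of shape (n, 2) holding the search-pattern and
        time-interval codes (1-7) of each query; labels: 1 = continuation,
        2 = topic shift, as marked by the human expert."""
        net = MLPRegressor(hidden_layer_sizes=(5,),  # one hidden layer, five neurons
                           activation='logistic',    # assumed; not reported in the paper
                           max_iter=2000)
        net.fit(np.asarray(features), np.asarray(labels))
        return net

    def label_queries(net, features, threshold=1.3):
        """Apply the 1.3 threshold used in the paper: outputs above it are
        labeled shifts (2), the rest continuations (1)."""
        raw = net.predict(np.asarray(features))      # continuous values between about 1 and 2
        return np.where(raw > threshold, 2, 1)

In the cross-design of Table 1, the network trained on the Excite half (network A) and the network trained on the FAST half (network B) would each simply be passed both test halves, giving Cases 1 through 4.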
3. Results and Discussion

In this section, we present the results of the methodology described in the previous section and provide a discussion of the results. We present the results of the cases listed in Table 1.

Case 1: Neural network A trained with the Excite dataset and tested with the Excite dataset:

When the human expert evaluated the 10,003-query dataset, 7059 topic continuations and 421 topic shifts were found. Eliminating the last query of each session leaves 7480 queries to be included in the analysis. In the subset used for training (the first half of the dataset, 5014 queries), there are 3544 topic continuations and 269 topic shifts, and in the second half of the dataset (4989 queries), there are 3515 topic continuations and 152 topic shifts. The structure of the dataset in terms of the number of sessions and queries included in the analysis can be seen in Table 5.

To be statistically reliable, 50 replications of the neural network were made. The results of the first 10 of the 50 runs of the neural network are shown in Table 6; not all results could be provided due to space considerations. The results are also seen in Figures 3a, 4a, 5a and 6a. The figures can be explained as follows. For example, in Run 1 in Table 6, we observed that the neural network marked 3375 queries as topic continuations, whereas the human expert identified 3515 queries as topic continuations. Similarly, the neural network marked 292 queries as topic shifts, whereas the human expert identified 152 queries as topic shifts. 86 out of 152 topic shifts were identified correctly, yielding an Rshift value of 0.57 for Run 1 (Fig. 4a), and 3309 out of 3515 topic continuations were identified correctly, yielding an Rcontin value of 0.94 (Fig. 6a). For the first run, these results show that topic shifts and continuations were estimated somewhat correctly by the neural network. On the other hand, the neural network yielded 292 topic shifts when there are actually 152 topic shifts, giving a value of 0.30 for Pshift (Fig. 3a). This result means that the neural network overestimates the number of topic shifts. Since the previous studies gave greater weight to identifying topic shifts, we kept the threshold value in the neural network at 1.3 and therefore increased the probability of erring on the preferred side, hence overestimating the number of topic shifts. Changing the threshold value of the neural network is subject to further study. In terms of topic continuations, Pcontin was 0.98 (Fig. 5a): 3309 out of the 3375 topic continuations marked by the neural network were estimated correctly, i.e., all but about 2% of the topic continuations marked by the neural network were correct.

Table 5: Topic shifts and continuations in the Excite and FAST datasets as evaluated by the human expert

Dataset portion              | Search engine | Total no. of queries | No. of sessions | Queries considered by the neural network | Topic shifts (human expert) | Topic continuations (human expert)
1st half, used for training  | Excite        | 5014                 | 1201            | 3813                                     | 269                         | 3544
1st half, used for training  | FAST          | 4997                 | 437             | 4560                                     | 386                         | 4174
2nd half, used for testing   | Excite        | 4989                 | 1322            | 3667                                     | 152                         | 3515
2nd half, used for testing   | FAST          | 5010                 | 526             | 4484                                     | 310                         | 4174
Entire dataset               | Excite        | 10003                | 2523            | 7480                                     | 421                         | 7059
Entire dataset               | FAST          | 10007                | 963             | 9044                                     | 696                         | 8348

Table 6: Results of training the neural network on Excite and testing it on Excite - Case 1

Origin of results | Queries in analysis | No. of topic shifts | No. of topic continuations | Correctly estimated shifts (Nshift & correct) | Correctly estimated continuations (Ncontin & correct) | Type A error | Type B error | Pshift | Rshift | Pcontin | Rcontin
Human expert      | 3667 | 152 (Ntrue shift) | 3515 (Ntrue contin) | -    | -    | -   | -  | -     | -     | -     | -
NN - Run 1        | 3667 | 292               | 3375                | 86   | 3309 | 206 | 66 | 0.295 | 0.566 | 0.980 | 0.941
NN - Run 2        | 3667 | 314               | 3353                | 92   | 3293 | 222 | 60 | 0.293 | 0.605 | 0.982 | 0.937
NN - Run 3        | 3667 | 292               | 3375                | 86   | 3309 | 206 | 66 | 0.295 | 0.566 | 0.980 | 0.941
NN - Run 4        | 3667 | 360               | 3307                | 103  | 3258 | 257 | 49 | 0.286 | 0.678 | 0.985 | 0.927
NN - Run 5        | 3667 | 444               | 3223                | 117  | 3188 | 327 | 35 | 0.264 | 0.770 | 0.989 | 0.907
NN - Run 6        | 3667 | 312               | 3355                | 92   | 3295 | 220 | 60 | 0.295 | 0.605 | 0.982 | 0.937
NN - Run 7        | 3667 | 302               | 3365                | 92   | 3305 | 210 | 60 | 0.305 | 0.605 | 0.982 | 0.940
NN - Run 8        | 3667 | 285               | 3382                | 86   | 3316 | 199 | 66 | 0.302 | 0.566 | 0.980 | 0.943
NN - Run 9        | 3667 | 446               | 3221                | 117  | 3186 | 329 | 35 | 0.262 | 0.770 | 0.989 | 0.906
NN - Run 10       | 3667 | 292               | 3375                | 86   | 3309 | 206 | 66 | 0.295 | 0.566 | 0.980 | 0.941

Figure 3a: Pshift when Excite is the training dataset
Figure 3b: Pshift when FAST is the training dataset
Figure 4a: Rshift when Excite is the training dataset
Figure 4b: Rshift when FAST is the training dataset
Figure 5a: Pcontin when Excite is the training dataset
Figure 5b: Pcontin when FAST is the training dataset
Figure 6a: Rcontin when Excite is the training dataset
Figure 6b: Rcontin when FAST is the training dataset

Case 2: Neural network B trained with the FAST dataset and tested with the FAST dataset:

Out of 9044 queries, 8348 topic continuations and 696 topic shifts were found. In the subset used for training (4997 queries), there were 437 user sessions; thus 4560 queries of the first half of the dataset are used for training the neural network. Out of these 4560 queries, 4174 are topic continuations and 386 are topic shifts. In the second half of the dataset, there were 5010 queries and 526 user sessions. Eliminating the last query of each session leaves 4484 queries to be included in the analysis. Out of the 4484 queries, 4174 were topic continuations, whereas 310 were topic shifts.
The results of the evaluation of the human expert can be seen in Table 5. After training the neural network with the first half of the FAST dataset and running it on the second half of the FAST dataset, we obtain the results in Table 7. The results of the first 10 of the 50 runs of the neural network are shown in Table 7; not all results could be provided due to space considerations. In Run 1, we observe that the neural network marked 4069 queries as topic continuations, whereas the human expert identified 4174 queries as topic continuations. Similarly, the neural network marked 415 queries as topic shifts, whereas the human expert identified 310 queries as topic shifts; 183 out of the 310 topic shifts were identified correctly, yielding an Rshift value of 0.59 (Fig. 4b). In addition, 3942 topic continuations out of 4174 continuations were estimated correctly, yielding an Rcontin value of 0.944 (Fig. 6b). These results denote a high level of estimation of topic shifts and continuations. On the other hand, the neural network yielded 415 topic shifts when there are actually 310 topic shifts, giving a value of 0.441 for Pshift (Fig. 3b). This result means that the neural network overestimates the number of topic shifts. This result could be due to the assumption stated in the previous section, i.e., giving greater weight to identifying topic shifts. In terms of topic continuations, Pcontin was 0.969 (Fig. 5b): 3942 out of the 4069 topic continuations marked by the neural network were estimated correctly, i.e., almost all topic continuations marked by the neural network were correct.

Table 7: Results of training the neural network on FAST and testing it on FAST - Case 2

Origin of results | Queries in analysis | No. of topic shifts | No. of topic continuations | Correctly estimated shifts (Nshift & correct) | Correctly estimated continuations (Ncontin & correct) | Type A error | Type B error | Pshift | Rshift | Pcontin | Rcontin
Human expert      | 4484 | 310 (Ntrue shift) | 4174 (Ntrue contin) | -    | -    | -   | -   | -     | -     | -     | -
NN - Run 1        | 4484 | 415               | 4069                | 183  | 3942 | 232 | 127 | 0.441 | 0.590 | 0.969 | 0.944
NN - Run 2        | 4484 | 401               | 4083                | 183  | 3956 | 218 | 127 | 0.456 | 0.590 | 0.969 | 0.948
NN - Run 3        | 4484 | 477               | 4007                | 207  | 3904 | 270 | 103 | 0.434 | 0.668 | 0.974 | 0.935
NN - Run 4        | 4484 | 445               | 4039                | 185  | 3914 | 260 | 125 | 0.416 | 0.597 | 0.969 | 0.938
NN - Run 5        | 4484 | 438               | 4046                | 203  | 3939 | 235 | 107 | 0.463 | 0.655 | 0.974 | 0.944
NN - Run 6        | 4484 | 403               | 4081                | 183  | 3954 | 220 | 127 | 0.454 | 0.590 | 0.969 | 0.947
NN - Run 7        | 4484 | 411               | 4073                | 183  | 3946 | 228 | 127 | 0.445 | 0.590 | 0.969 | 0.945
NN - Run 8        | 4484 | 404               | 4080                | 183  | 3953 | 221 | 127 | 0.453 | 0.590 | 0.969 | 0.947
NN - Run 9        | 4484 | 477               | 4007                | 207  | 3904 | 270 | 103 | 0.434 | 0.668 | 0.974 | 0.935
NN - Run 10       | 4484 | 400               | 4084                | 184  | 3958 | 216 | 126 | 0.460 | 0.594 | 0.969 | 0.948

Case 3: Neural network A trained with the Excite dataset and tested with the FAST dataset:

The number of topic shifts and continuations as evaluated by the human expert and the structure of the datasets are given in Table 5. After training the neural network with the first half of the Excite dataset and running it on the second half of the FAST dataset, we obtain the results in Table 8. The results of the first 10 of the 50 runs of the neural network are shown in Table 8; not all results could be provided due to space considerations. For comparison, we also include the results on the second half of the dataset as evaluated by the human expert.
In Run 1, we observe that the neural network marked 4114 queries as topic continuations, whereas the human expert identified 4174 queries as topic continuations. Similarly, the neural network marked 370 queries as topic shifts, whereas the human expert identified 310 queries as topic shifts. 3975 topic continuations out of 4174 continuations were estimated correctly, yielding an Rcontin value of 0.952 (Fig. 6a), and 171 topic shifts out of 310 were estimated correctly, giving an Rshift value of 0.552 (Fig. 4a). The neural network overestimates the number of topic shifts (370 instead of 310); the potential reason for this result was explained in the previous paragraphs. In terms of topic continuations, Pcontin was 0.966 (Fig. 5a): 3975 out of the 4114 topic continuations marked by the neural network were estimated correctly, i.e., almost all the topic continuations marked by the neural network were correct.

Case 4: Neural network B trained with the FAST dataset and tested with the Excite dataset:

The number of topic shifts and continuations as evaluated by the human expert is given in Table 5. After training the neural network with the first half of the FAST dataset and running it on the second half of the Excite dataset, we obtain the results in Table 9. The results of the first 10 of the 50 runs of the neural network are shown in Table 9; not all results could be provided due to space considerations. For comparison, we also include the results on the second half of the dataset as evaluated by the human expert. The results in Run 1 are discussed as follows. We observe that the neural network marked 3348 queries as topic continuations, whereas the human expert identified 3515 queries as topic continuations. Similarly, the neural network marked 319 queries as topic shifts, whereas the human expert identified 152 queries as topic shifts. During the topic identification process, we observed 227 Type A errors and 60 Type B errors. Using the neural network approach, in Run 1, 92 out of 152 topic shifts were identified correctly, yielding an Rshift value of 0.605 (Fig. 4b), and 3288 out of 3515 topic continuations were identified correctly, yielding an Rcontin value of 0.935 (Fig. 6b). For the first run, these results show that the topic shifts and continuations were estimated somewhat correctly by the neural network. On the other hand, the neural network yielded 319 topic shifts when there are actually 152 topic shifts, giving a value of 0.29 for Pshift (Fig. 3b). This result means that the neural network overestimates the number of topic shifts. This result could be due to the assumption stated in the previous sections, i.e., giving greater weight to identifying topic shifts by using a threshold value of 1.3 in the neural network. In terms of topic continuations, Pcontin was 0.982 (Fig. 5b): 3288 out of the 3348 topic continuations marked by the neural network were estimated correctly, i.e., all but about 2% of the topic continuations marked by the neural network were correct.

Discussion

In Figures 3, 4, 5 and 6, we see the effects of the training datasets on Pshift, Rshift, Pcontin and Rcontin, respectively. Figure 3 shows that, regardless of the training dataset, the FAST dataset seems to yield better values of Pshift. Figures 4 and 6 show that, in terms of Rshift and Rcontin, both datasets are equally successful. Figure 5 demonstrates that the Excite dataset yields better results in terms of Pcontin compared to the FAST dataset. Consequently, this paper's findings might indicate that the application of neural networks on different search engines does not provide the same results.

Generally, the FAST dataset tends to produce better results on shift-based measures, whereas the Excite dataset tends to yield more favorable results in terms of continuation-based measures. During the presentation of the results for the cases, we noted that the neural network usually overestimated the number of topic shifts. This result is probably due to keeping the threshold value of the neural network at 1.3, to be consistent with previous studies. The previous studies gave priority to identifying topic shifts, and the probability of erring on the preferred side increases when the threshold is set to 1.3; this choice causes the number of topic shifts to be overestimated. Since the neural networks overestimate the number of topic shifts, the dataset that has more topic shifts would be expected to be more successful in terms of shift-based measures. The FAST dataset has more topic shifts than the Excite dataset, as seen in Table 5. Similarly, it can be deduced that the Excite dataset is more successful in terms of continuation-based performance measures, since it has more topic continuations. Since the neural network seems to be biased towards identifying topic shifts, it would be reasonable to modify the parameters of the neural network to remove the bias. The parameter that might cause the shift-directed bias is the threshold of the neural network. Testing neural networks with different threshold values is a subject of further research.
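One simple way to carry out that further study would be to sweep the cut-off over the continuous network output and recompute the four measures at each value, which exposes the trade-off between shift-based and continuation-based performance. The sketch below (our own illustration, reusing the labeling convention introduced earlier) outlines this idea.

    import numpy as np

    def threshold_sweep(raw_outputs, expert_labels,
                        thresholds=(1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7)):
        """raw_outputs: continuous network outputs (between 1 and 2);
        expert_labels: 1 = continuation, 2 = shift, from the human expert.
        Returns one (threshold, Pshift, Rshift, Pcontin, Rcontin) tuple per setting."""
        raw = np.asarray(raw_outputs)
        truth = np.asarray(expert_labels)
        results = []
        for t in thresholds:
            pred = np.where(raw > t, 2, 1)
            shift_correct = int(np.sum((pred == 2) & (truth == 2)))
            contin_correct = int(np.sum((pred == 1) & (truth == 1)))
            n_shift, n_contin = int(np.sum(pred == 2)), int(np.sum(pred == 1))
            p_shift = shift_correct / n_shift if n_shift else 0.0
            r_shift = shift_correct / int(np.sum(truth == 2))
            p_contin = contin_correct / n_contin if n_contin else 0.0
            r_contin = contin_correct / int(np.sum(truth == 1))
            results.append((t, p_shift, r_shift, p_contin, r_contin))
        return results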
Table 8: Results of training the neural network on Excite and testing it on FAST - Case 3

Origin of results | Queries in analysis | No. of topic shifts | No. of topic continuations | Correctly estimated shifts (Nshift & correct) | Correctly estimated continuations (Ncontin & correct) | Type A error | Type B error | Pshift | Rshift | Pcontin | Rcontin
Human expert      | 4484 | 310 (Ntrue shift) | 4174 (Ntrue contin) | -    | -    | -   | -   | -     | -     | -     | -
NN - Run 1        | 4484 | 370               | 4114                | 171  | 3975 | 199 | 139 | 0.462 | 0.552 | 0.966 | 0.952
NN - Run 2        | 4484 | 415               | 4069                | 183  | 3942 | 232 | 127 | 0.441 | 0.590 | 0.969 | 0.944
NN - Run 3        | 4484 | 370               | 4114                | 171  | 3975 | 199 | 139 | 0.462 | 0.552 | 0.966 | 0.952
NN - Run 4        | 4484 | 483               | 4001                | 207  | 3898 | 276 | 103 | 0.429 | 0.668 | 0.974 | 0.934
NN - Run 5        | 4484 | 591               | 3893                | 230  | 3813 | 361 | 80  | 0.389 | 0.742 | 0.979 | 0.914
NN - Run 6        | 4484 | 404               | 4080                | 183  | 3953 | 221 | 127 | 0.453 | 0.590 | 0.969 | 0.947
NN - Run 7        | 4484 | 403               | 4081                | 183  | 3954 | 220 | 127 | 0.454 | 0.590 | 0.969 | 0.947
NN - Run 8        | 4484 | 373               | 4111                | 172  | 3973 | 201 | 138 | 0.461 | 0.555 | 0.966 | 0.952
NN - Run 9        | 4484 | 591               | 3893                | 230  | 3813 | 361 | 80  | 0.389 | 0.742 | 0.979 | 0.914
NN - Run 10       | 4484 | 370               | 4114                | 171  | 3975 | 199 | 139 | 0.462 | 0.552 | 0.966 | 0.952
Table 9: Results of training the neural network on FAST and testing it on Excite - Case 4

Origin of results | Queries in analysis | No. of topic shifts | No. of topic continuations | Correctly estimated shifts (Nshift & correct) | Correctly estimated continuations (Ncontin & correct) | Type A error | Type B error | Pshift | Rshift | Pcontin | Rcontin
Human expert      | 3667 | 152 (Ntrue shift) | 3515 (Ntrue contin) | -    | -    | -   | -  | -     | -     | -     | -
NN - Run 1        | 3667 | 319               | 3348                | 92   | 3288 | 227 | 60 | 0.288 | 0.605 | 0.982 | 0.935
NN - Run 2        | 3667 | 306               | 3361                | 92   | 3301 | 214 | 60 | 0.301 | 0.605 | 0.982 | 0.939
NN - Run 3        | 3667 | 347               | 3320                | 103  | 3271 | 244 | 49 | 0.297 | 0.678 | 0.985 | 0.931
NN - Run 4        | 3667 | 346               | 3321                | 92   | 3261 | 254 | 60 | 0.266 | 0.605 | 0.982 | 0.928
NN - Run 5        | 3667 | 320               | 3347                | 102  | 3297 | 218 | 50 | 0.319 | 0.671 | 0.985 | 0.938
NN - Run 6        | 3667 | 302               | 3365                | 92   | 3305 | 210 | 60 | 0.305 | 0.605 | 0.982 | 0.940
NN - Run 7        | 3667 | 311               | 3356                | 92   | 3296 | 219 | 60 | 0.296 | 0.605 | 0.982 | 0.938
NN - Run 8        | 3667 | 314               | 3353                | 92   | 3293 | 222 | 60 | 0.293 | 0.605 | 0.982 | 0.937
NN - Run 9        | 3667 | 347               | 3320                | 103  | 3271 | 244 | 49 | 0.297 | 0.678 | 0.985 | 0.931
NN - Run 10       | 3667 | 299               | 3368                | 92   | 3308 | 207 | 60 | 0.308 | 0.605 | 0.982 | 0.941

An additional result that the figures support is that the choice of training dataset did not affect the performance of automatic new topic identification. In all the figures, had the training dataset been effective on the performance of the neural network, the line corresponding to the training dataset would have shown superior results. Even though the training dataset is different in Figures 3a and 3b, FAST did better in terms of Pshift in both. Had the training dataset been effective, Excite should have done better when Excite was the training dataset, and FAST should have done better when FAST was the training dataset. However, this is not the case. The same comment applies to all the performance measures covered in Figures 3 through 6.

Conclusion

This study shows that neural networks can be successfully applied to automatic new topic identification. The search query logs used in this study come from two search engines, Excite and FAST. Samples of approximately 10,000 queries were selected from both datasets. Two neural networks were trained with approximately half of each dataset. The neural network trained on the Excite dataset was tested on both the Excite and FAST datasets, and the neural network trained on the FAST dataset was tested on both the Excite and FAST datasets. The results were compared to those of a human expert.

In all the cases considered, topic shifts and continuations were estimated successfully. However, the performance of the neural network changed with respect to the performance measure and the test dataset used. Shift-based performance measures tend to have better values with datasets having more shifts, and continuation-based performance measures tend to acquire better values with datasets having more continuations. To obtain a more consistent performance of automatic new topic identification with neural networks, enhancement and refinement of the neural network structure and parameters, such as changing the threshold value of the neural network, could be necessary.

The findings of this study also indicate that the estimation power of the neural network is independent of the training dataset. In conclusion, no matter which training dataset was used, the application results of the neural network were successful. Based on these indications, further studies should be performed with more datasets to validate these findings.
Notes

(1) This research has been funded by TUBITAK, Turkey, as a National Young Researchers Career Development Project 2005, Fund Number 105M320: "Application of Web Mining and Industrial Engineering Techniques in the Design of New Generation Intelligent Information Retrieval Systems".

References

Beeferman, D. & Berger, A. (2000). Agglomerative clustering of a search engine query log. Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, 407-416.
Beitzel, S.M., Jensen, E.C., Chowdhury, A., Grossman, D. & Frieder, O. (2004). Efficiency and scaling: hourly analysis of a very large topically categorized web query log. Proceedings of the 27th International Conference on Research and Development in Information Retrieval, Sheffield, UK, 321-328.
Cooley, R., Mobasher, B. & Srivastava, J. (1999). Data preparation for mining world wide web browsing patterns. Knowledge and Information Systems, 1, 5-32.
He, D., Goker, A. & Harper, D.J. (2002). Combining evidence for automatic Web session identification. Information Processing and Management, 38(5), 727-742.
Jansen, B.J., Spink, A. & Saracevic, T. (2000). Real life, real users, and real needs: a study and analysis of user queries on the web. Information Processing and Management, 36, 207-227.
Muresan, G. & Harper, D.J. (2004). Topic modeling for mediated access to very large document collections. Journal of the American Society for Information Science and Technology, 55(10), 2-910.
Ozmutlu, H.C. & Spink, A. (2002). Characteristics of question format web queries: an exploratory study. Information Processing & Management, 38, 453-471.
Ozmutlu, S. (2006). Automatic new topic identification using multiple linear regression. Information Processing and Management, 42, 934-950.
Ozmutlu, H.C. & Cavdur, F. (2005a). Application of automatic topic identification on Excite web search engine data logs. Information Processing and Management, 41(5), 1243-1262.
Ozmutlu, H.C. & Cavdur, F. (2005b). Neural network applications for automatic new topic identification. Online Information Review, 29, 35-53.
Ozmutlu, H.C., Cavdur, F., Ozmutlu, S. & Spink, A. (2004a). Neural network applications for automatic new topic identification on Excite web search engine data logs. Proceedings of ASIST 2004, Providence, RI, 310-316.
Ozmutlu, S., Ozmutlu, H.C. & Spink, A. (2002b). Multimedia Web searching. ASIST 2002: Proceedings of the 65th American Society of Information Science and Technology Annual Meeting, Philadelphia, 403-408.
Ozmutlu, S., Ozmutlu, H.C. & Spink, A. (2003a). Multitasking Web searching and implications for design. Proceedings of ASIST 2003, Long Beach, CA, 416-421.
Ozmutlu, S., Ozmutlu, H.C. & Spink, A. (2003b). Are people asking questions of general web search engines. Online Information Review, 27, 396-406.
Ozmutlu, S., Spink, A. & Ozmutlu, H.C. (2003c). Trends in multimedia web searching: 1997-2001. Information Processing and Management, 39, 611-621.
Ozmutlu, S., Ozmutlu, H.C. & Spink, A. (2004b). A day in the life of Web searching: an exploratory study. Information Processing and Management, 40, 319-345.
Ozmutlu, S., Spink, A. & Ozmutlu, H.C. (2002a). Analysis of large data logs: an application of Poisson sampling on Excite web queries. Information Processing and Management, 38, 473-490.
Pu, H.T., Chuang, S-L. & Yang, C. (2002). Subject categorization of query terms for exploring Web users' search interests. Journal of the American Society for Information Science and Technology, 53(8), 617-630.
Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ.
Silverstein, C., Henzinger, M., Marais, H. & Moricz, M. (1999). Analysis of a very large Web search engine query log. ACM SIGIR Forum, 33(1), 6-12.
Spink, A., Jansen, B.J. & Ozmutlu, H.C. (2000). Use of query reformulation and relevance feedback by Excite users. Internet Research: Electronic Networking Applications and Policy, 10, 317-328.
Spink, A., Jansen, B.J., Wolfram, D. & Saracevic, T. (2002a). From e-sex to e-commerce: Web search changes. IEEE Computer, 35(3), 133-135.
Spink, A., Ozmutlu, H.C. & Ozmutlu, S. (2002b). Multitasking information seeking and searching processes. Journal of the American Society for Information Science and Technology, 53(8), 639-652.
Spink, A., Wolfram, D., Jansen, B.J. & Saracevic, T. (2001). Searching the Web: the public and their queries. Journal of the American Society for Information Science and Technology, 53(2), 226-234.
Wen, J.R., Nie, J.Y. & Zhang, H.J. (2002). Query clustering using user logs. ACM Transactions on Information Systems, 20(1), 59-81.