
Title:
Studies in Theory of Information Retrieval
Publisher:
Foundation for Information Society
Year of publication:
October 2007
Number of pages:
293
 Title:
 On Document Populations and Measures of IR Effectiveness
 Author:
 Stephen Robertson
 Pages:
 9-22
 Work on the statistical validity of experimental results in retrieval tests has concentrated on treating the topics as a sample from a population, but regarding the collection of documents as fixed. This paper raises the argument that we should also consider the documents as having been sampled from a population. It follows that we should regard a per-topic measurement as also having a per-topic noise or error associated with it, which may depend critically on the number of relevant documents for that topic. Some of the common measures used in retrieval testing are re-examined from this point of view. The examination is essentially theoretical, supported by limited simulation experiments.

 Title:
 Effort-Precision and Gain-Recall Based on a Probabilistic User Navigation Model
 Authors:
 Gabriella Kazai, Benjamin Piwowarski, Stephen Robertson
 Pages:
 23-36
 Traditional evaluation of information retrieval (IR) systems is based on implicit assumptions about the users’ interaction with the system. It is assumed that the user is presented with a ranked list of results and examines the returned documents one after the other in the order they are listed. In this paper we argue that such a model is obsolete in the case of the Web and structured document retrieval, where navigation is an integral part of the user’s search strategy. We advocate that post-query navigation needs to be reflected in the evaluation framework. We substantiate our proposal with evidence of post-query navigation from user studies and discuss examples of systems that have been developed with the consideration of the user’s browsing behaviour. In order to capture retrieval effectiveness for query- and navigation-based search, we introduce a measure of retrieval effectiveness that comprises a probabilistic model of the users’ post-query navigation.

 Title:
 In Search of a Better BET
 Authors:
 Matt-Mouley Bouamrane, Saturnino Luz
 Pages:
 37-50
 In recent years, the study of multimodal meetings has attracted considerable interest among a variety of research communities. While there remain many challenges in terms of identifying, retrieving and representing information of interest in multimodal meeting repositories, one aspect which has often been neglected is the development of a widely adopted meeting browsing evaluation methodology and shared performance metrics. The goal of such a methodology would be to permit the evaluation of meeting browsing systems, regardless of the composition of multimodal meeting corpora or browsing modes, and to enable comparisons between various systems. Recently, a task-oriented information retrieval experiment (BET) has been proposed to tackle some of these issues. In this paper, we propose to complement the emerging BET evaluation framework with novel performance metrics. These performance metrics aim to measure how efficiently a meeting browsing system supports users during an information retrieval task. In addition, they need to be instructive of users’ browsing behaviour while remaining general enough to be applied to any type of browsing system.

 Title:
 The Effect of Modifications of Term-Frequency Based Features on Collection Properties
 Author:
 Vishwa Vinay
 Pages:
 51-64
 The Vector Space Model has a proven track record of success in IR research. Efforts have concentrated on defining suitable similarity metrics (e.g. cosine dot product) and weighting schemes (variations of tf-idf) that achieve high retrieval effectiveness (with metrics like mean average precision). The success of a collection representation scheme is measured based on this retrieval effectiveness. How these design choices affect inter-document and query-document relationships, in terms of the collection’s geometric properties, has been ignored. This paper is an empirical investigation of a number of tf-idf variations, contrasting retrieval effectiveness with collection properties.

 Title:
 Fairly Retrieving Documents of All Lengths
 Authors:
 Leif Azzopardi, David E. Losada
 Pages:
 65-76
 Normalizing document length is widely recognized as an important factor for adjusting retrieval systems. Previous studies have shown that tuning the retrieval model so that the lengths of retrieved documents are similar to the lengths of relevant documents will result in substantially better performance. However, the goal of Document Length Normalization is to “fairly” retrieve documents of all lengths. In this paper, we consider this proposition against the previous findings in the context of the Language Modeling approach for ad hoc information retrieval, and study the impact of the smoothing method and parameter setting on the length of documents retrieved. Our study confirms that tuning the system to fairly retrieve documents results in mediocre performance, whereas tuning to favor relevant (longer) documents delivers superior performance. While this reconfirms previous findings, we discover that this discrepancy appears to stem from the fact that relevant documents are drawn from a biased sample, the set of assessed documents which are substantially longer than documents in the collection.

 Title:
 Integrating Conceptual Knowledge Into Relevance Models
 Authors:
 Edgar Meij and Maarten de Rijke
 Pages:
 77-84
 We address the issue of combining explicit background knowledge with pseudo-relevance feedback from within a document collection. To this end, we use document-level annotations in tandem with generative language models to generate terms from pseudo-relevant documents and bias the probability estimates of expansion terms in a principled manner. By applying the knowledge inherent in document annotations, we aim to control query drift and reap the benefits of automatic query expansion in terms of recall without losing precision. We consider the parameters which are associated with our modeling and describe ways of estimating these automatically. We then evaluate our modeling and estimation methods on two test collections, both provided by the TREC Genomics track.

 Title:
 Construction of Compact Retrieval Models
 Authors:
 Benno Stein, Martin Potthast
 Pages:
 85-94
 In similarity search we are given a query document dq and a document collection D, and the task is to retrieve from D the most similar documents with respect to dq. For this task the vector space model, which represents a document d as a vector d, is a common starting point. Due to the high dimensionality of d the similarity search cannot be accelerated with space- or data-partitioning indexes; de facto, they are outperformed by a simple linear scan of the entire collection (Weber et al., 1998). In this paper we investigate the construction of compact, low-dimensional retrieval models and present them in a unified framework. Compact retrieval models can take two fundamentally different forms: (1) As n-gram vectors, comparable to vector space models having a small feature set. They accelerate the linear scan of a collection while maintaining the retrieval quality as far as possible. (2) As so-called document fingerprints. Fingerprinting opens the door for sublinear retrieval time, but comes at the price of reduced precision and incomplete recall. We uncover the two diametrically opposed paradigms for the construction of compact retrieval models and explain their rationale. The presented framework is comprehensive in that it integrates all well-known construction approaches for compact retrieval models developed so far. It is unifying since it identifies, quantifies, and discusses the commonalities among these approaches. Finally, based on a large-scale study, we provide for the first time a “compact retrieval model landscape”, which shows the applicability of the different kinds of compact retrieval models in terms of the rank correlation of the achieved retrieval results.

 Title:
 Clustered Multidimensional Scaling for Exploration in IR
 Authors:
 Enikő Székely, Éric Bruno, Stéphane Marchand-Maillet
 Pages:
 95-104
 The data that needs to be processed nowadays is frequently represented in high-dimensional spaces with the dimension given by the number of features selected. There is a gap between human perception of low-dimensional spaces and the behaviour of distances within high-dimensional spaces. In data analysis the “curse of dimensionality” phenomenon has consequences on the (dis)similarity matrices because the points become equidistant. In such a situation, methods for dimensionality reduction fail to reveal, in the low-dimensional projected space, structures existing in the data. We therefore propose in this article a clustered multidimensional scaling method for the discovery and understanding of data structures in view of exploration. Firstly, the data is clustered in the original space based on the closest k neighbours of each point, which results in a disconnected graph. Secondly, an MDS is performed on each of the graph components. And finally the clusters’ representatives are projected in the reduced space by means of an MDS in order to preserve the distances between clusters from the original space.

 Title:
 A Framework for Adaptive IR
 Authors:
 Axel-Cyrille Ngonga Ngomo, Hans Friedrich Witschel
 Pages:
 105-114
 This paper introduces extended Multi-Level Association Graphs (eMLAGs for short), a metamodel for adaptive information retrieval. The goals of this framework are twofold: First, it subsumes existing techniques for adaptive information retrieval, easing their comparison and merging. Second, it aims at being a tool for stimulating the discovery of new learn-worthy relations. The paper shows how prominent models can be represented as eMLAGs and presents some unexplored relations in the adaptive retrieval setting.

 Title:
 Reconsidering the Fundamentals of Measurement Discrimination Information
 Authors:
 D. Cai, C.J. van Rijsbergen
 Pages:
 115-124
 Measurement of Discrimination Information (MDI) of terms is a fundamental issue and a persistent theme for Information Retrieval (IR) and many areas of science. In this study, we reconsider MDI, and present an in-depth discussion of the concept of discrimination information conveyed by a term. The discussion is based on three information measures, which have been widely used in many IR applications. In particular, we formally interpret discrimination measures in terms of simple but important properties, and argue that the interpretations are essential for guiding the application of the discrimination measures. The intuitive notion of relatedness between terms and a given query is introduced in general, and relatedness measures are expressed according to the discrimination measures. The relatedness measures can be used to identify closely related terms for modifying queries provided by users of IR systems. Some potential problems in applying the relatedness measures are highlighted and solutions are suggested for their proper use.

 Title:
 Explicitly Considering Relevance Within The LM Framework
 Authors:
 Leif Azzopardi, Thomas Rölleke
 Pages:
 125-134
 Whilst the event of relevance is central to the Binary Independence Retrieval model, Language Modeling focuses on the estimation of the document model. In this paper, we review the different past formulations of the Language Modeling (query likelihood) approach. We find that these previous formulations largely ignore relevance by making implicit or explicit assumptions. The main contribution of this work is an alternative formulation that specifically relates relevance and language modeling in a sound probabilistic framework. This leads to valuable insights into the application of Language Modeling to Information Retrieval, including how the approach handles relevance information and how the approach can be further developed.

 Title:
 On The Holistic Cognitive Theory for IR
 Authors:
 Peter Ingwersen, Kalervo Järvelin
 Pages:
 135-148
 The paper demonstrates how the Laboratory Research Framework fits into the holistic Cognitive Framework for IR. It first discusses the Laboratory Framework with emphasis on its underlying assumptions and known limitations. This is followed by a view of interaction and relevance phenomena associated with IR evaluation and central to the understanding of IR. The ensuing section outlines how interactive IR is viewed from a Cognitive Framework, and suggests that ‘light’ interactive IR experiments be performed by drawing on the latter framework’s contextual possibilities. These include independent variables drawn from a collection, matching principles in a retrieval system, and the searcher’s situation and task context. The paper ends with concluding points summarizing the issues encountered.

 Title:
 Representing Word Semantics for IR by Continuous Functions
 Authors:
 Péter Wittek, Sándor Darányi
 Pages:
 149-156
 Information representation is an important but neglected aspect of building text information retrieval models. In order to be efficient, the mathematical objects of a formal model, like vectors, have to reasonably reproduce language-related phenomena such as word meaning inherent in index terms. On the other hand, the classical vector space model, when it comes to the representation of word meaning, is approximative only, whereas it exactly localizes term, query and document content. It can be shown that by replacing vectors by continuous functions, information retrieval in Hilbert space yields comparable or better results. This is because according to the non-classical or continuous vector space model, content cannot be exactly localized. At the same time, the model relies on a richer representation of word meaning than the VSM can offer.

 Title:
 Learning and Optimization of an Aspect Hidden Markov Model for Query LM Generation
 Authors:
 Qiang Huang, Dawei Song, Stefan Rüger, Peter Bruza
 Pages:
 157-164
 The Relevance Model (RM) incorporates pseudo relevance feedback to derive a query language model and has shown good performance. Generally, it is based on unigram models of individual feedback documents from which query terms are sampled independently. In this paper, we present a new method to build the query model with a latent state machine (LSM) which captures the inherent term dependencies within the query and the term dependencies between query and documents. Our method firstly splits the query into subsets of query terms (i.e., not only single terms, but different combinations of multiple query terms). Secondly, these query term combinations are considered as weighted latent states of a hidden Markov Model to derive a new query model from the pseudo relevant documents. Thirdly, our method integrates the Aspect Model (AM) with the EM algorithm to estimate the parameters involved in the model. Specifically, the pseudo relevant documents are segmented into chunks, and different chunks are associated with different weights in relation to a latent state. Our approach is empirically evaluated on three TREC collections, and demonstrates statistically significant improvements over a baseline language model and the Relevance Model.

 Title:
 Probabilistic Logical Modelling of the BIR Model
 Authors:
 Thomas Rölleke, Jun Wang
 Pages:
 165-176
 The binary independence retrieval (BIR) model is a main pillar of information retrieval (IR); recently, the model even attracted the attention of database research on ranking tuples for SQL queries. One of the problems with the BIR model is that though it is referred to as a probabilistic model, the retrieval status value actually lacks a probabilistic interpretation since the BIR model is based on the odds (fraction) of the relevance probabilities. This makes it hard to implement the BIR model in a probabilistic reasoning framework that aggregates and generates sound probabilities. Because of the growing impact of the BIR model for database research, and because the aggregation of the BIR term weights lacks a probabilistic meaning, we investigate in this paper the probabilistic relational implementations of the BIR model. This investigation led to the following findings: The probabilistic variants of the BIR model perform at least as well as the genuine model, where slightly refined variants outperform the genuine model, but cannot achieve the performance of tf-idf based retrieval.

 Title:
 Expressive Resource Descriptions for Ontology-Based IR
 Authors:
 Thanh Tran, Stephan Bloehdorn, Philipp Cimiano, Peter Haase
 Pages:
 177-190
 In this paper, we introduce an expressive ontology-based model for representing resources with respect to a domain ontology. Our resource model is based on semantic web standards as well as established ontologies and metadata schemas such as SUMO, MPEG-7 and Dublin Core to provide a reference model for ontology-based information retrieval. Based on this expressive resource model, the user can directly specify his information need at an enhanced level of expressiveness. In particular, it does not restrict the description of resources to keywords but allows for the description of resources in terms of factual and terminological axioms as well as events and complex situations. We show that with the proposed resource description model, a large set of different retrieval functionalities can be supported to address complex information needs.

 Title:
 A Belief Network Model For Expert Search
 Authors:
 Craig Macdonald, Iadh Ounis
 Pages:
 191-200
 Expert search is a task of growing importance in Enterprise settings. In a classical search setting, users normally require relevant documents to fulfil an information need. However, in Enterprise settings, users also need to identify the co-workers with relevant expertise in a topic area. An expert search engine assists users with their expertise need by ranking candidate experts with respect to their predicted expertise about a topic of interest. This work presents a novel model for expert search, based on a Bayesian belief network. We show how the proposed model can generate several different strategies for ranking candidates by their predicted expertise with respect to a query. The Bayesian belief network model for expert search proposed here is general, as it can be extended in the future to take into account various other types of evidence in the expert search task, such as the social aspect of expert search, where people work within groups and co-author publications.

 Title:
 POLIS: A Probabilistic Logic for Document Summarisation
 Authors:
 Jan Frederik Forst, Anastasios Tombros, Thomas Rölleke
 Pages:
 201-212
 Summarisation is an important and recurring task to be solved in manifold search applications and customer-specific scenarios. Therefore, we propose and investigate in this paper a new approach to summarisation, namely an approach to describing summarisation approaches in a new abstraction: a probabilistic logic for information summarisation (POLIS). POLIS features the usual advantages of abstraction such as robustness, flexibility, and, most importantly, reusability. The research achievement relevant to information retrieval is, on the one hand, the well-defined probabilistic semantics applying possible worlds semantics, and, on the other hand, an implementation of POLIS where we take advantage of an existing probabilistic algebraic approach to IR, and prove applicability and investigate retrieval quality in large-scale experimental settings.

 Title:
 On Using Graphical Models for Supporting Context Aware IR
 Authors:
 Lynda Tamine-Lechani, Fatiha Boubekeur, Mohand Boughanem
 Pages:
 213-222
 It is well known that with the growth of information volumes across the Web, it is increasingly difficult for search engines to deal with ambiguous queries. In order to overcome this limit, a key challenge in information retrieval nowadays consists in enhancing an information seeking process with the user’s context in order to provide accurate results in response to a user query. The underlying idea is that different users have different backgrounds, preferences and interests when seeking information, and so the same query may cover different specific information needs according to who submitted it. This paper investigates the use of graphical models to respond to the challenge of context-aware information retrieval. The first contribution consists in using PNets as a formalism for expressing qualitative queries. The approach for automatically computing the preference weights is based on the predominance property embedded within such graphs. The second contribution focuses on another aspect of context, namely the user’s interests. An influence-diagram based retrieval model is presented as a theoretical support for a personalized retrieval process. Preliminary experimental results using enhanced TREC collections show the effectiveness of our approach.

 Title:
 Utilizing Event Spaces for Distributed IR
 Authors:
 Emanuele Di Buccio, Massimo Melucci
 Pages:
 223-232
 In this paper, a probabilistic approach to modeling distributed Information Retrieval centered around the notion of event space is illustrated. This notion is the underlying issue of all probabilistic models and its structure cannot be ignored when a probabilistic model is being constructed. The importance of defining the event space is not only related to the correctness of the model, but also to describing different architectures. Three different spaces are proposed in this paper for modeling distributed IR. Each space captures different aspects and dictates a distinct function for ranking by probability of relevance.

 Title:
 Representative PageSet
 Authors:
 Pramod P, Srivatsava
 Pages:
 233-238
 In this paper we explore a new Information Retrieval paradigm, the Representative PageSet. Its aim is to maximize information in the search results. To solve this problem, we propose a new formulation called the Biased Vector Space Model. We also show how this formulation is suitable for solving various problems of conventional Information Retrieval like Vertical Search and Personalized Search.

 Title:
 What Does It Mean To Converge In Rank?
 Authors:
 Enoch Peserico, Luca Pretto
 Pages:
 239-246
 We give the first rigorous, formal definition of convergence in rank of an iterative ranking algorithm – an intuitive notion whose formalization is more challenging than it might at first appear. We then compare, in the context of the well-known PageRank algorithm, the rate of convergence in rank with the “classic” rate of convergence of the score vector. Even though both might appear to depend essentially on the same factors, subtle differences make it possible for either to be arbitrarily slow even when the other is quite fast. Thus, making predictions on one based on the other can be completely misleading.

 Title:
 Users’ Perspectives On The Usefulness of Structure For XML IR
 Authors:
 Gabriella Kazai, Andrew Trotman
 Pages:
 247-260
 The widespread use of the eXtensible Markup Language (XML) on the Web and in digital libraries has led to a drastic increase in the number of XML Information Retrieval (IR) systems being developed. XML IR approaches exploit the logical structure of documents for their querying, retrieval and presentation to the user. Despite their abundance, there remains uncertainty regarding the advantages that structural information may bring to IR. In this paper we report on a user study exploring questions around the potential benefits of structure to users, such as: Is structural information useful when searching for relevant information? Can the structure of a document help to locate relevant information when browsing inside a document? Does the role of structural information depend on the length of a document? Our investigation was conducted as part of the INEX 2006 interactive track experiment, which we supplemented with questionnaires. Our qualitative analysis of the data collected from seven participants aims to identify how users will interact with XML IR systems. We do this by drawing parallels with paper-based information searching, Web searching, and digital library searching. What we find is that XML IR users are unlike Web users – they use advanced search facilities, they prefer a list of results supplemented with branch points into the document, and they need better methods of navigation within long documents.

 Title:
 Local Identification of Web Graph Communities
 Author:
 Max Hinne
 Pages:
 261-278
 In order to use knowledge of the Web graph in Information Retrieval, we provide a consistent overview, aiming firstly at global aspects of the graph such as degree distribution, and then proceed by examining local aspects of the graph: community identification. We discuss several community models and we implement a community identification algorithm that operates without a priori knowledge of the graph. To elaborate on the algorithm we introduce a notational framework for graph clusters. We run the algorithm on the Dutch domain (.NL) and from the results of this experiment we conclude that the Web consists of several clusters that are mutually connected through a core of hubs. In addition we evaluate the clustering quality of the algorithm, which provides a reputable basis for local community identification.

 Title:
 Meta Information Retrieval by Information Fusion
 Author:
 Gábor Szűcs
 Pages:
 279-286
 The research described in this publication deals with only a slice of the higher-level Information Retrieval (IR) systems, which send queries to different search engines, Web catalogues and databases, and collect the information. The task is to combine the documents retrieved from different sources into an aggregated list of documents in order to get a solution (a Meta Information Retrieval system) with better indicators. In this publication a part of the work is shown, where the rank aggregation phase of the whole process is solved by information fusion techniques. Different rank aggregation methods are investigated, and the algorithms are surveyed to see how they can be adopted for Meta IR systems.

 Title:
 On the Theory and Practice of Sets in IR
 Author:
 Johan Eklund
 Pages:
 287-290
 In the general theoretical framework of IR it is common to use the mathematical concept of a set to formalize collections of terms, documents and queries. In many document databases multiple copies of the same document are indexed, which makes it problematic to represent the full collection of documents as a set and may cause frequency-based measures to yield misleading figures. In this paper we investigate the practical implications of using the concept of a set, and how document collections can be formalized as sets of equivalence classes to facilitate more accurate frequency calculations for term weighting and probability calculations.