Chapter 1 + Section 2.1 Introduction
attachment:InformationRetreivalProcess.jpg
Information Retrieval Process
- Three Models of Browsing
- Flat
- Structure guided
- Hypertext
Section 2.2 A taxonomy of Information Retrieval Models
Predicting which documents are relevant usually depends on a ranking algorithm.
- The three classic models in information retrieval are:
Boolean Model: In the Boolean model, documents and queries are represented as sets of index terms; thus we say the model is set theoretic.
Vector Model: In the vector model, documents and queries are represented as vectors in a t-dimensional space; thus we say the model is algebraic.
Probabilistic Model: The framework for modeling document and query representations is based on probability theory; thus we say the model is probabilistic.
Section 2.3 Retrieval: Ad hoc and Filtering
The following is the formal definition for IR from MIR p 23.
\usepackage{amsmath}%
\setcounter{MaxMatrixCols}{30}%
\usepackage{amsfonts}%
\usepackage{amssymb}%
\usepackage{graphicx}
\usepackage{geometry}
\newtheorem{theorem}{Theorem}
\newtheorem{acknowledgement}[theorem]{Acknowledgement}
\newtheorem{algorithm}[theorem]{Algorithm}
\newtheorem{axiom}[theorem]{Axiom}
\newtheorem{case}[theorem]{Case}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{conclusion}[theorem]{Conclusion}
\newtheorem{condition}[theorem]{Condition}
\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{criterion}[theorem]{Criterion}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{example}[theorem]{Example}
\newtheorem{exercise}[theorem]{Exercise}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{notation}[theorem]{Notation}
\newtheorem{problem}[theorem]{Problem}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{remark}[theorem]{Remark}
\newtheorem{solution}[theorem]{Solution}
\newtheorem{summary}[theorem]{Summary}
\newenvironment{proof}[1][Proof]{\noindent\textbf{#1.} }{\ \rule{0.5em}{0.5em}}
\geometry{left=0.5in,right=0.5in,top=0.5in,bottom=0.5in}
%%end-prologue%%
\begin{definition}
An information retrieval model is a quadruple $[D,Q,F,R(q_i,d_j)]$ where
\begin{enumerate}
\item $D$ is a set composed of logical views (or representations) for the {\bf documents} in the collection.
\item $Q$ is a set composed of logical views (or representations) for the user information needs. Such representations are called {\bf queries}.
\item $F$ is a {\bf framework} for modeling document representations, queries and their relationships.
\item $R(q_i,d_j)$ is a {\bf ranking function} which associates a real number with a query $q_i \in Q$ and a document representation $d_j \in D$. Such ranking defines an ordering among the documents with regard to the query $q_i$.
\end{enumerate}
\end{definition}
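To make the role of the ranking function concrete, here is a minimal Python sketch of how R(q_i, d_j) induces an ordering over documents. The names `rank` and `overlap`, the tuple-of-terms representation, and the sample data are all assumptions made for illustration; they do not come from MIR, and `overlap` is only a stand-in for whatever ranking function a given model defines.

```python
def overlap(query, doc):
    """Hypothetical ranking function R(q_i, d_j): the number of shared terms.

    This is an illustrative stand-in, not a ranking function from MIR.
    """
    return float(len(set(query) & set(doc)))


def rank(query, documents, R):
    """Order the documents by decreasing R(query, document)."""
    return sorted(documents, key=lambda d: R(query, d), reverse=True)


# D: logical views of three documents, here simply tuples of terms (an assumption).
D = [
    ("boolean", "model"),
    ("vector", "model", "ranking"),
    ("probability",),
]
q_1 = ("vector", "model")

ranking = rank(q_1, D, overlap)
print(ranking)
# [('vector', 'model', 'ranking'), ('boolean', 'model'), ('probability',)]
```

Any function that maps a query and a document to a real number could be passed as `R`; the ordering step itself does not depend on the model.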
Section 2.5.1 Basic Concepts of Classic IR
Each document is described by a set of representative keywords called index terms.
- Index terms are usually nouns. Why? Because verbs, adjectives, connectives, etc. have little meaning on their own.
- Index terms have weights described as follows:
\begin{definition}
Let $t$ be the number of index terms in the system and $k_i$ be a generic index term. $K=\{k_1,\ldots,k_t\}$ is the set of all index terms. A weight $w_{i,j} > 0$ is associated with each index term $k_i$ of a document $d_j$. For an index term which does not appear in the document text, $w_{i,j}=0$. With document $d_j$ is associated an index term vector $\vec{d}_{j}=(w_{1,j},w_{2,j},\ldots,w_{t,j})$. Further, let $g_{i}$ be a function that returns the weight associated with the index term $k_{i}$ in any $t$-dimensional vector (i.e., $g_{i}(\vec{d}_{j})=w_{i,j}$).
\end{definition}
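The definition above can be illustrated with a short Python sketch. The vocabulary, the sample text, and the use of raw term counts as weights are assumptions chosen for the example; the definition itself does not say how the weights are computed.

```python
# Hypothetical vocabulary K = {k_1, ..., k_t}; the terms are illustrative only.
K = ["retrieval", "model", "query", "ranking", "index"]


def term_vector(text, vocabulary):
    """Return the index term vector d_j = (w_1j, ..., w_tj).

    Raw term counts are used as weights (an assumption); a term that does
    not appear in the text gets weight 0, as in the definition.
    """
    tokens = text.lower().split()
    return tuple(tokens.count(k) for k in vocabulary)


def g(i, vector):
    """g_i: return the weight of index term k_i (1-based, as in the definition)."""
    return vector[i - 1]


d_1 = term_vector("Query ranking in a retrieval model and another model", K)
print(d_1)        # (1, 2, 1, 1, 0)
print(g(2, d_1))  # 2, the weight of k_2 = "model"
print(g(5, d_1))  # 0, since "index" does not appear in the text
```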
Section 2.5.2 Boolean Model
- index terms are weighted either 0 or 1.
Queries are expressed in DisjunctiveNormalForm. For example, the query latex2($[q=k_{a}\wedge (k_{b}\vee \lnot k_{c})]$) can be written in disjunctive
normal form as latex2($[\vec{q}_{dnf}=(1,1,1)\vee (1,1,0)\vee (1,0,0)]$).
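The disjunctive normal form can be checked mechanically by enumerating every 0/1 assignment to (k_a, k_b, k_c) and keeping the ones that satisfy the query. The Python sketch below does this; the function names `q` and `sim` are illustrative, and `sim` assumes the usual Boolean matching rule in which a document is relevant exactly when its weights for (k_a, k_b, k_c) equal one of the conjunctive components.

```python
from itertools import product


def q(ka, kb, kc):
    """The query q = k_a AND (k_b OR NOT k_c)."""
    return bool(ka and (kb or not kc))


# Enumerate all 0/1 assignments to (k_a, k_b, k_c) and keep those satisfying q.
q_dnf = {bits for bits in product((1, 0), repeat=3) if q(*bits)}
print(sorted(q_dnf, reverse=True))  # [(1, 1, 1), (1, 1, 0), (1, 0, 0)]


def sim(d, dnf):
    """Return 1 if the document's (k_a, k_b, k_c) weights match a component, else 0."""
    return 1 if tuple(d) in dnf else 0


print(sim((1, 1, 0), q_dnf))  # 1: contains k_a and k_b but not k_c
print(sim((0, 1, 0), q_dnf))  # 0: k_a is missing
```

Because the weights are binary, the result is all or nothing: a document either matches a component or it does not, with no partial score.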