Text retrieval of documents becomes an important issue
S. Peña Saldarriaga
Combining approaches to online HWR IR
Why? (2) Current trends in online handwriting retrieval
The ideal ink search algorithm could perform matching at any level of representation (Lopresti & Tomkins, 1994)
Match points
Match text
Designing such an algorithm is very difficult
Most algorithms match at a single specific level
We can combine several algorithms that perform matching at different levels into a single combined algorithm
Why? (3) Current trends in online handwriting retrieval
[Figure: Broad classification of handwriting retrieval methods. Handwriting retrieval approaches split into recognition-based and recognition-free; the diagram also names word spotting, IR on noisy texts, signal-to-signal search and text-to-signal search.]
Recognition-free and recognition-based matching involve very different methods
Retrieval of different sets of documents
Each method has its strengths and weaknesses
Why? (4) This work's hypothesis
Hypothesis
Combining the results of online handwriting retrieval algorithms working at different levels of representation can improve retrieval effectiveness.
Let us try to verify it!
Ranking fusion in information retrieval
Ranking fusion in IR (1) Brief overview
Ranking or data fusion
Active field in information retrieval research
The combination of retrieval results in such a way that retrieval performance is improved
Improvements expected
Improved precision: when relevant documents are ranked in top positions after fusion
Improved recall: when algorithms retrieve different sets of relevant documents
Ranking fusion in IR (2) Methods
Our experiments focus on two standard methods that do not require training data
CombSUM: linear combination of retrieval scores
CombMNZ: linear combination of retrieval scores, weighted by the number of non-zero scores for a given document
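The two fusion rules above can be sketched as follows. This is a minimal illustration, not the authors' implementation: each run is assumed to be a dictionary mapping a document id to its retrieval score, and the min-max normalization helper is an assumed preprocessing step that the slides do not mention. All function names and document ids are hypothetical.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one run's scores to [0, 1] (assumed step, not stated in the slides)."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_sum(runs):
    """CombSUM: sum each document's scores over all runs."""
    fused = defaultdict(float)
    for run in runs:
        for doc, score in run.items():
            fused[doc] += score
    return dict(fused)


def comb_mnz(runs):
    """CombMNZ: CombSUM score times the number of runs with a non-zero score."""
    sums = comb_sum(runs)
    hits = defaultdict(int)
    for run in runs:
        for doc, score in run.items():
            if score != 0:
                hits[doc] += 1
    return {doc: sums[doc] * hits[doc] for doc in sums}


def rank(fused):
    """Order document ids by decreasing fused score (ties broken by id)."""
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical runs from a recognition-free and a recognition-based system.
word_spotting_run = {"d1": 1.0, "d2": 0.5}
text_based_run = {"d2": 0.75, "d3": 0.5}
print(rank(comb_mnz([word_spotting_run, text_based_run])))
```

CombMNZ rewards documents retrieved by several systems, which is why a document found by both runs moves ahead of one found by a single run with a higher individual score.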
Experimental setup
Experimental setup (1) Building the test collection
We use a corpus collected for previous research on text categorization (TC)
≈2,000 documents, ≈250,000 words, 10 categories
Problem
This is a TC collection, thus it does not have a standard set of queries and corresponding relevant documents
Solution
Automatically generate queries using category codes and relevance feedback methods
Query terms (one generated query per line)
vs ct net shr loss
acquir stake acquisit complet merger
tonn wheat grain corn agricultur
stg monei dollar band bill
oil crude barrel post well
rate prime lend citibank percentag
surplu deficit narrow trade tariff
port strike vessel hr worker
sugar raw beet cargo kain
coffe bag ico registr ibc
Queries are likely to be representative of their categories
It is not clear if they make sense from a human perspective
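The slides do not detail how queries are derived from category codes. As a rough sketch only, one relevance-feedback-style option is to treat the documents of a category as relevant, the rest as non-relevant, and keep the terms whose mean frequency differs most between the two groups. The function name, the scoring rule and the toy data below are all assumptions made for illustration.

```python
from collections import Counter


def generate_query(docs, labels, category, n_terms=5):
    """Pick the n_terms most category-specific terms (hypothetical scoring rule).

    docs   -- list of tokenized (stemmed) documents
    labels -- category code of each document
    """
    relevant = [d for d, lab in zip(docs, labels) if lab == category]
    others = [d for d, lab in zip(docs, labels) if lab != category]

    def mean_tf(group):
        # Mean term frequency per document within a group.
        totals = Counter()
        for doc in group:
            totals.update(doc)
        return {t: c / len(group) for t, c in totals.items()} if group else {}

    rel_tf, other_tf = mean_tf(relevant), mean_tf(others)
    # Positive feedback from relevant documents, negative from the others.
    scored = {t: w - other_tf.get(t, 0.0) for t, w in rel_tf.items()}
    return sorted(scored, key=lambda t: (-scored[t], t))[:n_terms]


# Toy collection with hypothetical category codes.
docs = [["oil", "crude", "crude"], ["crude", "barrel"], ["wheat", "grain", "oil"]]
labels = ["crude", "crude", "grain"]
print(generate_query(docs, labels, "crude", n_terms=2))
```

The default of five terms only mirrors the length of the queries listed above; the actual selection criterion may differ.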
Experimental setup (3) Baseline systems
Three baseline methods are used
Recognition-based: Cosine / tf × idf, Okapi (BM25)
Recognition-free: InkSearch®
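For readers unfamiliar with the Okapi baseline, the following is a minimal sketch of BM25 scoring over tokenized documents. The parameter values k1 = 1.2 and b = 0.75 are common defaults assumed here, not values reported in the slides, and the idf form with the added 1 inside the logarithm is one widely used non-negative variant.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one BM25 score per tokenized document (k1 and b are assumed defaults)."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy documents standing in for recognized handwritten text.
docs = [["oil", "crude", "barrel"], ["wheat", "grain"], ["oil", "price"]]
print(bm25_scores(["crude", "oil"], docs))
```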
Experimental setup (4) Recognition
Recognition is performed using MyScript Builder®
Two recognition strategies: character-level (charac) and lexicon and language model (lex+lm)
Rec. strategy    Word error rate
lex+lm           22.19%
charac           52.47%
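The slides do not state how the word error rate was measured. Under its usual definition (word-level edit distance divided by the number of reference words), it can be sketched as below; the function name and the example strings are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


# One substitution and one deletion over four reference words.
print(word_error_rate("oil prices rose sharply", "oil price rose"))
```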
Results
Results (1) Baseline scores
[Figure: MAP by retrieval method (InkSearch, Cosine, Okapi) for the charac and lex+lm recognition strategies, with the upper bound shown]
Recognition errors result in heavy retrieval performance degradation (−17.31%)
Recognition-based methods outperform word spotting
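The scores above are reported as MAP (mean average precision). A minimal sketch of the standard computation follows, assuming each query has a ranked list of document ids and a set of relevant ids; the data structures and names are hypothetical.

```python
def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for position, doc in enumerate(ranking, 1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)


def mean_average_precision(rankings, relevants):
    """Average the per-query average precision over all queries."""
    scores = [average_precision(rankings[q], relevants[q]) for q in rankings]
    return sum(scores) / len(scores) if scores else 0.0


# Two toy queries with hypothetical rankings and relevance judgments.
rankings = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d2", "d1"]}
relevants = {"q1": {"d1", "d3"}, "q2": {"d1"}}
print(mean_average_precision(rankings, relevants))
```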
Again: significant improvements over baseline performance (+4%, +21%)
Performance of the combined runs is very close to the upper bound
Summary and conclusions
Summary
This work focuses on the fusion of handwriting retrieval strategies
The application of fusion methods is justified in this context
Simple techniques result in major improvements over baseline scores
Our initial hypothesis is verified
However... the need to generate queries due to the lack of benchmark collections is a major shortcoming
Conclusions and future work
Further experimental validation needs to be conducted
Validation against human-prepared queries ... with relevance judgments given by human assessors
Extension to retrieval of documents beyond online handwriting
Offline handwritten documents, historical manuscripts, printed documents, etc.