Learning Pairwise Classifiers Using Combinations of Features from Different Examples (異なる例からの素性の組合せを用いたペアワイズ分類器の学習)

Transactions of the Japanese Society for Artificial Intelligence 20:105-116 (2005)

Abstract

We propose a kernel method that uses combinations of features across example pairs to learn pairwise classifiers. Pairwise classifiers, which decide whether two examples belong to the same class, are important components in duplicate detection, entity matching, and other clustering applications. Existing methods for learning pairwise classifiers from labeled training data are based on string edit distance or on features common to the two examples. However, when two examples from the same class share few features, these methods have difficulty finding such pairs and achieving high recall. A typical example is checking whether two abbreviated author names in different citations refer to the same person. Because the similarity between such same-class examples is close to zero, classifiers fail to distinguish positive pairs from negative pairs. One way to avoid the zero-similarity problem is to use conjunctions of different features across examples, but a straightforward implementation makes the computational cost prohibitive for practical problems. By using a kernel on pair instances, our method can exploit feature conjunctions across examples without explicitly computing the feature mapping, which is computationally expensive. The kernel is a tensor product of two inner products on the original feature space. The corresponding feature mapping generates conjunctions of features only across the two different examples, whereas that of the conventional polynomial kernel also generates conjunctions of features from the same example, which are irrelevant to pairwise classification and degrade accuracy. Our experiments on the author matching problem show that this method achieves precision 4 to 8 times higher than that of previous methods at medium recall levels.
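
The tensor-product kernel described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pair_kernel`, `cross_conjunctions`, `poly2_kernel`), the toy binary feature vectors, and the choice of a homogeneous degree-2 polynomial kernel on the concatenated pair as the "conventional" baseline are all assumptions made for illustration. The sketch checks that the product of two inner products equals the inner product of the explicit cross-example conjunction vectors, and that the polynomial kernel adds same-example terms on top.

```python
from itertools import product


def dot(u, v):
    """Plain inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def pair_kernel(pair_a, pair_b):
    """Tensor-product kernel on pairs: <x1, y1> * <x2, y2> (hypothetical name)."""
    (x1, x2), (y1, y2) = pair_a, pair_b
    return dot(x1, y1) * dot(x2, y2)


def cross_conjunctions(pair):
    """Explicit mapping: each feature of x1 multiplied by each feature of x2.

    The vector has len(x1) * len(x2) entries, which is what makes the
    explicit mapping expensive for large feature spaces.
    """
    x1, x2 = pair
    return [a * b for a, b in product(x1, x2)]


def poly2_kernel(pair_a, pair_b):
    """Assumed baseline: homogeneous degree-2 polynomial kernel on the
    concatenation of the two examples in each pair."""
    (x1, x2), (y1, y2) = pair_a, pair_b
    return dot(x1 + x2, y1 + y2) ** 2  # list '+' concatenates the vectors


# Toy pairs of binary feature vectors (illustrative data only).
pair_a = ([1, 0, 1], [0, 1, 1])
pair_b = ([1, 1, 0], [0, 1, 0])

implicit = pair_kernel(pair_a, pair_b)
explicit = dot(cross_conjunctions(pair_a), cross_conjunctions(pair_b))
print(implicit, explicit)  # the two values agree

# The polynomial kernel expands to same-example terms plus twice the
# cross-example term: <x1,y1>^2 + 2<x1,y1><x2,y2> + <x2,y2>^2.
(x1, x2), (y1, y2) = pair_a, pair_b
same_example_terms = dot(x1, y1) ** 2 + dot(x2, y2) ** 2
print(poly2_kernel(pair_a, pair_b) == same_example_terms + 2 * implicit)
```

In this sketch the kernel value is computed from two inner products of length-3 vectors, while the explicit mapping needs a length-9 vector per pair; the gap grows quadratically with the number of original features.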
