Unsupervised approaches for measuring textual similarity between legal court case reports

Artificial Intelligence and Law 29 (3):417-451 (2021)

Abstract

In legal information retrieval, an important challenge is computing the similarity between two legal documents. Precedents play an important role in the common law system, where lawyers frequently need to refer to relevant prior cases. Measuring document similarity is one of the most crucial aspects of any document retrieval system, since it determines the speed, scalability and accuracy of the system. Text-based and network-based methods for computing similarity among case reports have been proposed in prior work, but not without pitfalls. Since legal citation networks are generally highly disconnected, network-based metrics are not well suited to them. To date, only a few text-based and predominantly embedding-based methods have been employed, for instance TF-IDF, Word2Vec and Doc2Vec based approaches. We investigate the performance of 56 different methodologies for computing textual similarity across court case statements when applied to a dataset of Indian Supreme Court cases. Of these 56 methods, 30 are adaptations of existing methods and 26 are our proposed methods. The methods studied include models such as BERT and Law2Vec. We observe that the more traditional methods, which rely on a bag-of-words representation, perform better than the more advanced context-aware methods for computing document-level similarity. Finally, we nominate, via empirical validation, five of our best-performing methods as appropriate for measuring similarity between case reports. Among these five, two are adaptations of existing methods and the other three are our proposed methods.
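
As a rough illustration of the bag-of-words family the abstract refers to, the following minimal Python sketch computes TF-IDF vectors and cosine similarity between short texts. It is not the authors' implementation: the tokenizer, the smoothed IDF formula, the function names and the three toy sentences are all assumptions made for this example, and the abstract does not specify any of these details.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy texts, invented for illustration (not from the paper's dataset).
toy_reports = [
    "The appellant was convicted of murder under the penal code.",
    "The accused was convicted of murder and sentenced under the penal code.",
    "The tenant challenged the eviction order passed by the rent controller.",
]
vectors = tfidf_vectors(toy_reports)
sim_criminal = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(sim_criminal > sim_unrelated)
```

In this toy setting the two criminal-law sentences share most of their vocabulary, so their similarity is higher than that between the first sentence and the tenancy sentence. Real case reports are far longer, and preprocessing choices such as stop-word removal would matter much more.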

Similar books and articles

The Binding Force of the Case Law of the Court of Justice of the European Union. Gundega Mikelsone - 2013 - Jurisprudencija: Mokslo darbu žurnalas 20 (2):469-495.
Textual practices in crafting bioethics cases. Brian Hurwitz - 2012 - Journal of Bioethical Inquiry 9 (4):395-401.
Law and Indirect Reports: Citation and Precedent. Brian E. Butler - 2018 - In Alessandro Capone, Una Stojnic, Ernie Lepore, Denis Delfitto, Anne Reboul, Gaetano Fiorin, Kenneth A. Taylor, Jonathan Berg, Herbert L. Colston, Sanford C. Goldberg, Edoardo Lombardi Vallauri, Cliff Goddard, Anna Wierzbicka, Magdalena Sztencel, Sarah E. Duffy, Alessandra Falzone, Paola Pennisi, Péter Furkó, András Kertész, Ágnes Abuczki, Alessandra Giorgi, Sona Haroutyunian, Marina Folescu, Hiroko Itakura, John C. Wakefield, Hung Yuk Lee, Sumiyo Nishiguchi, Brian E. Butler, Douglas Robinson, Kobie van Krieken, José Sanders, Grazia Basile, Antonino Bucca, Edoardo Lombardi Vallauri & Kobie van Krieken (eds.), Indirect Reports and Pragmatics in the World Languages. Springer Verlag. pp. 357-369.
Michael H. v. Gerald D.: A Case Study of Political Ideology Disguised in Legal Thought. Jeffrey A. Ellsworth - 2009 - International Journal for the Semiotics of Law - Revue Internationale de Sémiotique Juridique 22 (1):105-122.

Analytics

Added to PP
2021-01-05
