| United States Patent | 7,558,778 |
| Carus , et al. | July 7, 2009 |
A semantic discovery and exploration system is disclosed that provides an environment enabling a developer or user to uncover, navigate, and organize semantic patterns and structures in a document collection, with or without the aid of structured knowledge. The semantic discovery and exploration system provides techniques for searching document collections, categorizing documents, inducing lists of related concepts, and identifying clusters of related terms and documents. The system operates both with and without infusions of structured knowledge such as gazetteers, thesauruses, taxonomies, and ontologies; system performance improves when structured knowledge is incorporated. The semantic discovery and exploration system may be used as a first step in developing an information extraction system, for example to categorize or cluster documents in a particular domain or to develop gazetteers, and as part of a deployed run-time information extraction system. It may also be used as a standalone utility for searching, navigating, and organizing document collections and structured knowledge bases such as dictionaries or domain-specific reference works.
| Inventors: | Carus; Alwin B. (Waban, MA), DePlonty; Thomas J. (Melrose, MA) |
| Assignee: | Information Extraction Systems, Inc. (Waban, MA) |
| Appl. No.: | 11/820,677 |
| Filed: | June 20, 2007 |
| Application Number | Filing Date | Patent Number | Issue Date |
| 60815431 | Jun., 2006 | | |
| Current U.S. Class: | 1/1 ; 700/246; 704/257; 704/9; 707/999.001; 707/999.003; 707/999.1 |
| Current International Class: | G06F 17/30 (20060101) |
| Field of Search: | 707/1-5,100,101 700/246 704/9,257 |