Support Vector Machine
SVMLight: an implementation of Vapnik's Support Vector Machine
LIBSVM: a library for Support Vector Machines
C4.5: the "classic" decision-tree tool, developed by J. R. Quinlan. Tutorial
YASMET: Yet Another Small MaxEnt Toolkit
Conditional Random Field
CRF++: a simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data
Natural language processing
OpenNLP: an organizational center for open source projects related to natural language processing
CMU Statistical Language Modeling Toolkit: a suite of UNIX software tools to facilitate the construction and testing of statistical language models
The Dragon Toolkit: a Java-based development package for academic use in information retrieval (IR) and text mining. Includes many NLP tools
LingPipe: a suite of Java libraries for the linguistic analysis of human language. It can track mentions of entities (e.g. people or proteins); link entity mentions to database entries; uncover relations between entities and actions; classify text passages by language, character encoding, genre, topic, or sentiment; correct spelling with respect to a text collection; cluster documents by implicit topic and discover significant trends over time; and provide part-of-speech tagging and phrase chunking
Natural Language Toolkit: open source Python modules, linguistic data, and documentation for research and development in natural language processing and text analytics, with distributions for Windows, Mac OSX and Linux
Antelope: Advanced Natural Language Object-oriented Processing Environment. Includes a series of tools (in particular, a C# version of the Stanford Parser)
Chinese word segmentation
ICTCLAS: Chinese word segmenter from the Chinese Academy of Sciences
Stanford Chinese Word Segmenter: a Java implementation of a CRF-based Chinese word segmenter
Part-of-speech tagging
Brill tagger: an error-driven, transformation-based tagger implemented by Eric Brill
Stanford POS Tagger: a Java implementation of the log-linear part-of-speech taggers described by Kristina Toutanova et al.
MBT: Memory-based Tagger
TreeTagger: a decision-tree-based tagger from the University of Stuttgart
SVMTool: a POS tagger based on SVMs
QTag Part of Speech Tagger: an HMM-based Java POS tagger from Birmingham U.
Named entity recognition
Stanford Named Entity Recognizer: a Java implementation of a Conditional Random Field sequence model, together with well-engineered features for named entity recognition
LingPipe: tools include statistical named-entity recognition, a heuristic sentence boundary detector, and a heuristic within-document coreference resolution engine. Java. GPL. By Bob Carpenter, Breck Baldwin and co.
YamCha: SVM-based NP chunker, also usable for POS tagging, NER, etc. C/C++, open source. Won the CoNLL 2000 shared task. (Less automatic than a specialized POS tagger for an end user.)
Porter Stemming: a process for removing the commoner morphological and inflexional endings from words in English, by Martin Porter
Snowball: a small string processing language designed for creating stemming algorithms for use in information retrieval
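To make "removing the commoner morphological and inflexional endings" concrete, here is a minimal Python sketch of step 1a only of Porter's algorithm (plural endings). The function name porter_step_1a is made up for this illustration. The full algorithm has several more steps that this sketch does not attempt, so treat it as a taste of the idea rather than a usable stemmer.

```python
def porter_step_1a(word):
    """Apply only step 1a of the Porter stemmer (plural suffixes).

    Rules are tried longest suffix first:
      SSES -> SS, IES -> I, SS -> SS, S -> (nothing)
    """
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress (unchanged)
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

Note that stems such as "poni" are not meant to be dictionary words; the goal is only that related word forms map to the same string.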
Stanford Parser: Java implementations of probabilistic natural language parsers, both highly optimized PCFG and dependency parsers, and a lexicalized PCFG parser
Berkeley Parser
ROUGE
ROUGE configuration under Windows
OpenSSL: includes a wide range of cryptographic algorithms, such as RSA, DES, MD5, and SHA
Win32 installation version
zlib: a massively spiffy yet delicately unobtrusive compression library
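For a quick feel of what zlib does, the sketch below uses the zlib module from Python's standard library (a binding to the same library) to compress and restore a byte string. The sample text is made up for the illustration; real-world compression ratios depend heavily on the input.

```python
import zlib

# Repetitive sample data (an assumption for this demo) compresses well.
text = ("the quick brown fox jumps over the lazy dog " * 50).encode("utf-8")

compressed = zlib.compress(text, 9)    # 9 = best compression, slowest
restored = zlib.decompress(compressed)

print(len(text), "bytes ->", len(compressed), "bytes")
print("round trip ok:", restored == text)
```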
Apache Logging Services: creates and maintains open-source software related to the logging of application behavior, released at no charge to the public, including log4j for Java, log4cxx for C++, and log4net for the MS .NET framework. Note: the official version of log4cxx has a memory leak problem.
Unicode
ICU: a mature, widely used set of C/C++ and Java libraries providing Unicode and globalization support for software applications
Xerces: a validating XML parser, available in C++ and Java editions
AC in C#: Aho-Corasick string matching in C#
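The linked project is written in C#; as a language-neutral illustration of the technique itself, here is a minimal Python sketch of Aho-Corasick matching. The function names build_automaton and find_all are made up for this example and are not taken from that library. The idea is to build a trie of all patterns, add failure links with a breadth-first pass, and then scan the text once.

```python
from collections import deque


def build_automaton(patterns):
    # Parallel lists indexed by node id: trie edges, failure link,
    # and the patterns that end at (or are suffixes of) that node.
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)
    # Breadth-first pass: depth-1 nodes keep failure link 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches reachable through the failure link.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    node, matches = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            matches.append((i - len(pat) + 1, pat))
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))
```

The text is scanned only once regardless of how many patterns there are, which is why this approach suits dictionary-style matching against large keyword lists.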
Html Agility Pack: an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
Majestic-12: an open source, high-performance .NET C# module that was created to parse HTML for links, indexing and other purposes. Fast, but does not generate a DOM tree.
An annotated list of resources by the Stanford NLP Group
KDnuggets has some software related to KDD, etc.