pkimber.net

Links

  • http://lucene.apache.org/

  • http://lucenebook.com/

  • .NET

  • Search Engine Watch

Issues

  • http://issues.apache.org/jira/browse/LUCENE

Alternatives

Also see Competitors (below)…

  • MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java.

  • http://sphinxsearch.com/

    • Sphinx search introduction

  • http://xapian.org/

Articles

  • Delve inside the Lucene indexing mechanism, including improving the indexing performance.

Competitors

Also see Alternatives (above)…

  • IBM OmniFind Yahoo! Edition

  • IBM OmniFind Enterprise Edition

Did you mean

  • org.apache.lucene.search.didyoumean

  • Did You Mean: Lucene?

  • Spelling Checker using Lucene

Faceted

  • Faceted Metadata Search and Browse

History

  • The Lucene Search Engine, Doug Cutting, 16 June, 2000

  • Lucene - Doug Cutting, November 24, 2004

Index Accessor

  • lucene-index-accessor

Monitor

  • LucidGaze for Lucene: monitor and improve your Lucene search performance.

StopWords

  • Stopword List

  • Key to Effective Searches, Dealing with Stopwords

  • To Stopword or Not to Stopword?

Snowball

  • RE: [Snowball-discuss] Stop word lists

Text Extractor

  • Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

  • Aperture extracts full text and metadata from many common file formats. See Getting started (appears to use Java 1.5).

  • OpenXML4j is a complete Java framework supporting the Open Packaging Conventions and Office Open XML (WordprocessingML, SpreadsheetML, PresentationML and shared specs like DrawingML).

html

  • Parsing HTML with Java:

    • HTMLEditorKit

    • JTidy

    • HTML Parser

    • HTML Cleaner. For Maven instructions, see Maven repository notes.

Microsoft

  • wv is a library which allows access to Microsoft Word files.

  • catdoc is a program which reads one or more Microsoft Word files and outputs text.

  • catdoc, xls2csv and catppt

  • Antiword is a free MS Word reader for Linux

  • Using Java to Crack Office 2007

OpenOffice

  • JODConverter automates conversions between office document formats using OpenOffice.org

  • file2xliff4j is a set of Java classes to convert HTML, Word, Excel, OpenOffice.org Text, PowerPoint, RTF and MIF documents to XLIFF File Format.

pdf

  • http://www.jpedal.org/

Projects

  • The Compass Framework is a first class open source Java framework, enabling the power of Search Engine semantics to your application stack declaratively.

  • Enhydra Snapper - Fulltext Indexing and Search

  • Hibernate Search brings the power of full text search engines to the persistence domain model and Hibernate experience, through transparent configuration (Hibernate Annotations) and a common API. Might be here now… http://www.hibernate.org/410.html

  • Hibernate Annotations includes a package of annotations that allows you to mark any domain model object as indexable and have Hibernate maintain a Lucene index of any instances persisted via Hibernate.

  • Kowari is an Open Source, massively scalable, transaction-safe, purpose-built database for the storage, retrieval and analysis of metadata.

  • DBSight is a highly customisable full-text search platform for any relational database.

  • NetSearch - the Enterprise Search Solution from Ardentia

  • Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP APIs, caching, replication, and a web administration interface.

  • LIUS is an indexing Java framework based on the Jakarta Lucene project. The LIUS framework indexes: MsWord, MsExcel, MsPowerPoint, RTF, PDF, XML, HTML, TXT, OpenOffice suite, ZIP files, MP3, VCard, LaTeX and JavaBeans.

  • Tika, a generic document parsing framework

Sample

  • sample-lucene-did-you-mean

  • sample-lucene-count-unique-terms

Upgrade

  • Lucene 2.4 in 60 seconds

Word List

  • Kevin’s Word List Page


© Copyright 2023, Patrick Kimber.
