Whoosh: full-text search with Python

To add an efficient search function to the product I work on, I was looking for a good indexer. Elastic Search, a Java indexer that is managed through a REST API, looks good, but it requires setting up a dedicated server: it is not a library but a full piece of software. Another option was Xapian, which looks efficient but is not very well documented.

Then I discovered Whoosh, a Python library that offers indexing and search features. The documentation and the API make it really easy to use. Its performance is probably worse than that of Elastic Search or Xapian, but it should be enough for a lot of projects. The library provides a lot of search strategies and features (stemming, faceting, highlighting…). In conclusion, if you have a Python project that requires full-text search, you should definitely have a look at it.

To illustrate this article, here is a little snippet I wrote that indexes a list of blog posts stored in a MongoDB database.

import os

from whoosh.fields import Schema, ID, KEYWORD, TEXT
from whoosh.index import create_in
from whoosh.query import Term

from pymongo import MongoClient
from bson.objectid import ObjectId

# Define the schema: title and content are indexed as text, tags as keywords.
# Only titles and ids are stored inside the index.
schema = Schema(title=TEXT(stored=True), content=TEXT,
                nid=ID(stored=True), tags=KEYWORD)

# Create the index directory if it does not exist.
os.makedirs("index", exist_ok=True)

# Initialize the index.
index = create_in("index", schema)

# Open the database connection.
client = MongoClient('localhost', 27017)
db = client["cozy-home"]
posts = db.posts

# Fill the index with posts from the database.
writer = index.writer()
for post in posts.find():
    writer.update_document(title=post["title"],
                           content=post["content"],
                           nid=str(post["_id"]),
                           tags=post["tags"])
writer.commit()

# Search the index for posts containing "test", then display the first
# result (this raises IndexError if nothing matches).
with index.searcher() as searcher:
    result = searcher.search(Term("content", "test"))[0]
    post = posts.find_one(ObjectId(result["nid"]))
    print(result["title"])
    print(post["content"])

Newebe version 0.5.0 released

Newebe has finally reached version 0.5.0! This one is a little bit special because Newebe now has the main features of the distributed social network I described two years ago. As you can imagine, this is a great satisfaction for all the people who helped to build Newebe! Sharing stuff without worrying about privacy issues is a real pleasure and we are glad to have made it possible.
But this should not be limited to a small bunch of users. So for the next release, we will focus more on adoption by improving the installation process and adding popular features like file sharing or integration with other social networks. If you have any suggestions or requests, feel free to write them in the comments of this post.

Now let’s speak about the new features! Here is the list of what comes with this release:

  • All connections (with browsers and between contacts) are based on HTTPS.
  • Notes and pictures can be attached to microposts.
  • Data from posted microposts can be saved in your Newebe notes.
  • Theming: you can add your own CSS and change the way your Newebe looks.
  • Easy installation script for Debian-like distributions.

NB: For newcomers, if you want to see Newebe in action, you can try our demo (password: newebe) or install it in the way we recommend. If you need any help, refer to the installation guide or ask for assistance on our mailing list.