A search system is only as useful as the data you index into it. A reliable, flexible and configurable connection to your various data sources is therefore essential, along with a unified configuration and management interface that is simple to use.
Ravn Pipeline is a connector framework that handles the whole Extract, Transform and Load (ETL) process. It provides a single interface where the user can configure and manage multiple connections to a disparate set of data sources. Documents are retrieved by a connector and then passed into a pipeline consisting of several stages. One of the most important is the filtering stage, which extracts all the content and metadata from the original document. A set of standard or custom-made stages can then be applied to modify, refine, normalise and extend the extracted text and metadata.
Normally, the final stage is an indexer that indexes into a chosen search engine, but the final stage can instead write the items to a database, write them to disk, or post them to any other web service or Ravn Pipeline instance.
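The stage-based flow described above can be sketched as follows. This is a minimal illustration of the general pattern, not the actual Ravn Pipeline API: every class, function and field name here is a hypothetical stand-in.

```python
# Hypothetical sketch of a stage-based ETL pipeline; names are illustrative
# and do not reflect the real Ravn Pipeline API.
from typing import Callable, Dict, List

Document = Dict[str, str]               # a document as a simple field map
Stage = Callable[[Document], Document]  # each stage transforms a document

def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Pass a document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc

def filter_stage(doc: Document) -> Document:
    # Filtering stage: extract content from the raw document.
    doc["content"] = doc.pop("raw", "").strip()
    return doc

def normalise_stage(doc: Document) -> Document:
    # A custom stage that normalises the extracted text.
    doc["content"] = doc["content"].lower()
    return doc

# Final stage: an in-memory list stands in for a search engine,
# database, disk, or downstream web service.
index: List[Document] = []

def index_stage(doc: Document) -> Document:
    index.append(doc)
    return doc

result = run_pipeline(
    {"raw": "  Hello World  "},
    [filter_stage, normalise_stage, index_stage],
)
```

After running, `result["content"]` is the filtered, normalised text, and the final stage has delivered the document to its destination, mirroring the extract, transform and load steps of the real framework.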
The user interface of Ravn Pipeline is an easy-to-use web interface. It gives full control over all pipelines, their stages and their scheduling, and pipelines can easily be duplicated and adapted. It also surfaces management information about any issues that arose during the ingestion process, which makes troubleshooting much easier.
All this makes Ravn Pipeline a powerful, flexible, scalable and user-friendly tool for a range of use cases, from data normalisation projects to Enterprise search.