READ THIS FIRST: To bid on this project, you MUST have strong Drupal and SOLR skills and experience integrating them in non-standard environments. We must get a strong sense of your skills from your initial communication and have some confidence that you have thoroughly read and understood our requirements. This is not on-the-job training. Generic messages such as "We have reviewed your requirements and can help you. Please award us the project." will not be answered. Further, we will flag your company in our database and automatically disqualify you from future bidding opportunities.
We have several Drupal implementations either under development or in the planning stages. We need help architecting appropriate solutions and solving a few current issues. We have questions in the following subject areas (in decreasing order of priority):
- Drupal - SOLR Integration
- Drupal - DB2 Integration
- Drupal module selection
- Drupal implementation (General)
For your reference, we have listed our scenarios and the specific questions that are outstanding. We need you to provide detailed responses to our queries and help us solve these issues. If you are unable to solve every issue, please tell us which issues you can solve and provide a quote for those.
If you can satisfactorily answer these questions (paid, of course), we may also want to work with you on future Drupal/SOLR implementation and integration issues. With the exception of DB2, we will be doing a lot with these technologies (i.e. Drupal, SOLR) in the future.
NOTE: Our choice of Drupal or other tools in these scenarios is not up for review. If you have a better technology option, we would love to hear about it, but these sites will all have to move forward with the technologies listed.
SCENARIO ALPHA - DRUPAL <-> SOLR <-> DB2 INTEGRATION
We are preparing to implement several Drupal sites in an organization. In order to maintain consistency, the organization will use Drupal for all web sites. One of the planned Drupal sites appears to be problematic.
This site will be based around a LAMP configuration. The actual Drupal database will be a MySQL database.
This site will not implement the typical functionality included in a normal Drupal site. Rather, the site will serve as a front-end for approximately 100 "reports" whose data will come from a DB2 version 9.5 database. Approximately half (50) of these reports will be public-facing. The other half of the reports will be secured. The client does NOT have a separate reporting engine. All of the reports will be selected, executed, and displayed within Drupal.
Most of these reports will include a query filter of 1-5 fields (with the average being 3 fields). Example query filter fields might be selection of a department, date range, etc. Query filter fields may be dependent on each other. The choice of query filters will vary from report to report with some overlap.
Query filter forms will need to be developed in Drupal (perhaps using the Form API). The query parameters will then need to be transmitted to and executed against the DB2 database, and the results displayed in Drupal.
Most of the result sets returned will contain table data and will be displayed as tables within a Drupal page.
For sending query requests to DB2 from within Drupal, someone had suggested using an XML gateway.
The client's staff must be able to periodically add new reports. While this staff has some development background, their skills are limited. We are looking for the simplest, most efficient mechanism to allow them to 1) add a menu item for the report, 2) add query filters, 3) execute the query, and 4) display the report on screen. To the maximum degree possible, we would like to limit the amount of development required by the client's staff.
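To make the requirement above more concrete, the following is a minimal sketch of the kind of declarative report definition we have in mind, where adding a report means adding data rather than writing new code. It is written in Python purely for illustration and is not a Drupal implementation. Every name in it (`Filter`, `Report`, `build_query`, the table and column names) is hypothetical, and it assumes the DB2 driver accepts positional `?` parameter markers.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Filter:
    """A reusable query filter (all names here are hypothetical)."""
    name: str        # form field name shown to the user
    column: str      # column the filter applies to
    op: str = "="    # "=" for a single value, "BETWEEN" for a range


# Filters defined once and shared across reports.
DEPARTMENT = Filter("department", "DEPT_CODE")
FISCAL_YEAR = Filter("fiscal_year", "POSTED_DATE", "BETWEEN")


@dataclass(frozen=True)
class Report:
    """A report is only data: title, source table, columns, filters."""
    title: str
    table: str
    columns: tuple[str, ...]
    filters: tuple[Filter, ...] = ()


def build_query(report: Report, values: dict[str, Any]) -> tuple[str, list[Any]]:
    """Build a parameterized SELECT from a report and submitted filter values.

    User input is only ever passed as bound parameters, never
    concatenated into the SQL text.
    """
    sql = f"SELECT {', '.join(report.columns)} FROM {report.table}"
    clauses: list[str] = []
    params: list[Any] = []
    for flt in report.filters:
        if flt.name not in values:
            continue  # filter left empty by the user
        if flt.op == "BETWEEN":
            low, high = values[flt.name]
            clauses.append(f"{flt.column} BETWEEN ? AND ?")
            params += [low, high]
        else:
            clauses.append(f"{flt.column} = ?")
            params.append(values[flt.name])
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Adding a new report is a single declaration.
SPENDING = Report(
    title="Spending by department",
    table="FIN.SPENDING",
    columns=("DEPT_CODE", "AMOUNT"),
    filters=(DEPARTMENT, FISCAL_YEAR),
)
```

Under these assumptions, calling `build_query(SPENDING, {"department": "HR"})` would yield a single-clause query and a one-element parameter list. We would expect a Drupal answer to cover the equivalent of each piece: where the definitions live, how the form is generated from them, and how the query is bound and executed.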
1.1 What is the best way to develop report filters in Drupal, ideally allowing report filter elements (e.g. a date range such as a fiscal year) to be reused across reports?
1.2 What is the best way to actually send query requests to DB2 and have them processed on the DB2 Server?
1.3 What is the best way to return the query results from the remote DB2 server and process them on the local machine?
1.4 How could data from the DB2 database be indexed within Apache SOLR and then searched within Drupal?
1.5 Some of the DB2 database tables link to external documents stored on the file system. How could these documents be indexed and searched within Drupal?
1.6 What is the best way to add new reports to Drupal--while minimizing the impact on the client's staff?
1.7 If we want to index all of the documents in a specific file directory (e.g. TIFF images of scanned documents), can we do this directly with SOLR, or do we need Nutch? (I realize TIFFs are a legacy format, but in this instance conversion to PDF is not an option.)
1.8 Regarding question 1.7, is there an easy way to convert TIFF files into searchable PDFs for use with SOLR?
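As an illustration of what question 1.4 is asking, the sketch below turns rows fetched from a DB2 table into a list of search documents serialized as JSON. It is Python for illustration only and one possible shape, not a recommendation. The function names, the `source_table_s` and `content` field names, and the `db2/<table>/<key>` ID scheme are all assumptions, as is the premise that the SOLR version in use accepts JSON documents on its update handler.

```python
import json
from typing import Any


def row_to_solr_doc(
    table: str,
    key_column: str,
    row: dict[str, Any],
    text_columns: list[str],
) -> dict[str, Any]:
    """Map one DB2 row to a search document (field names are hypothetical)."""
    # Join the searchable columns into one text field, skipping NULLs.
    text = " ".join(
        str(row[col]) for col in text_columns if row.get(col) is not None
    )
    return {
        # Prefix the primary key so IDs cannot collide with other sources.
        "id": f"db2/{table}/{row[key_column]}",
        "source_table_s": table,
        "content": text,
    }


def build_update_payload(
    table: str,
    key_column: str,
    rows: list[dict[str, Any]],
    text_columns: list[str],
) -> str:
    """Serialize a batch of rows as a JSON array of documents."""
    docs = [row_to_solr_doc(table, key_column, r, text_columns) for r in rows]
    return json.dumps(docs)
```

A good answer to 1.4 would tell us whether a push model like this is appropriate at all, or whether SOLR should instead pull the data from DB2 itself.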
SCENARIO BETA - DRUPAL <-> SOLR <-> MULTIPLE LEGACY SITES
We are implementing a Drupal site in an organization that has many departments. Each department currently has its own legacy site. The organization plans to use Apache SOLR as its primary search tool. It wants to index both its Drupal site and its legacy sites and have the content from all sites appear in the search results within Drupal. However, the organization is running into some issues.
They have made some attempts to combine the schema file that comes with Apache SOLR Search Integration with the one that comes with Nutch. While it appears the documents are being indexed, they do not appear within Drupal.
We have read that, even once the schemas are combined, the lack of a true numeric ID in the Nutch documents still prevents them from appearing in search results in Drupal.
2.1 Do we need to consider the possibility that the lack of a true numeric ID in the Nutch documents is preventing them from appearing in Drupal? If so, how do we get around this problem?
2.2 Another approach I've seen is to use a nightly build of Nutch 1.1, which allows customization of the schema that Nutch uses to push documents into SOLR. Is this a better solution? What is the best way to set up this schema so that the search results show up in Drupal and the fields are mapped to the most appropriate places? Where is there a good listing of the available fields from Nutch and what they contain? Does the non-numeric ID problem mentioned above also apply here?
2.3 How would this organization be able to combine content from both legacy and Drupal sites and have that information all appear in the search results?
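To clarify what questions 2.1 and 2.2 are getting at: if the missing numeric ID does turn out to be the cause (we have not confirmed this), one workaround we can imagine is deriving a stable number from each crawled URL while remapping the Nutch fields. The sketch below is Python for illustration only. The target field names (`numeric_id`, `site`, `body`), the `ID_OFFSET` value, and both function names are hypothetical, and we do not know whether Drupal's search integration would accept IDs generated this way.

```python
import hashlib
from typing import Any

# Hypothetical offset intended to keep synthetic IDs clear of real Drupal IDs.
ID_OFFSET = 1_000_000_000


def synthetic_numeric_id(url: str) -> int:
    """Derive a stable integer from a URL.

    The same URL always yields the same number, so re-crawling a page
    updates its document instead of creating a duplicate. Because only
    32 bits of the hash are kept, collisions are possible in principle.
    """
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return ID_OFFSET + int(digest[:8], 16)


def map_nutch_doc(nutch_doc: dict[str, Any], site: str) -> dict[str, Any]:
    """Remap a crawled document onto a combined schema (hypothetical fields)."""
    numeric_id = synthetic_numeric_id(nutch_doc["url"])
    return {
        "id": f"{site}/nutch/{numeric_id}",
        "numeric_id": numeric_id,
        "site": site,
        "url": nutch_doc["url"],
        "title": nutch_doc.get("title", ""),
        "body": nutch_doc.get("content", ""),
    }
```

We would like your response to say whether something along these lines is necessary, or whether the schema customization described in 2.2 makes it redundant.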