Tuning Drupal's search index with hook_nodeapi and $node->build_mode

I've been working on getting our search index to exclude certain fields from custom content types so that hidden information isn't available by searching for it.

Modules can change what a node displays by implementing hook_nodeapi() and, when $op is 'view', adding elements to the $node object's content array.

For example, to add a simple phrase near the bottom of every node when it is viewed:

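function my_module_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  // 'view' runs whenever the node's content is built, including when it
  // is built for the search index.
  if ($op == 'view') {
    $node->content['extra_info'] = array(
      '#value' => 'The moon is made of cheese.',
      '#weight' => 100, // heavy weight sinks it toward the bottom
    );
  }
}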

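Normally, Drupal indexes content by loading each node, building and rendering it much as it would for display (which runs hook_nodeapi's 'view' operation), and then stripping out the markup and punctuation. So all this nonsense about the moon and cheese gets indexed on every node, and it's going to skew my search results when people are looking for either subject.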
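To make the problem concrete, here is a rough model of that clean-up step in plain PHP. It is not Drupal's own search_simplify(); my_index_words() is a hypothetical helper, and the three-character minimum word length is an assumption loosely based on the minimum word length option in Drupal's search settings. It shows the phrase added in hook_nodeapi turning into ordinary index words right alongside the real body text.

```php
<?php
// Hypothetical helper, NOT Drupal's search_simplify(): a simplified model
// that strips tags, lowercases, drops punctuation and splits into words.
function my_index_words($html, $min_length = 3) {
  // Rendered node content is HTML, so remove the markup first.
  $text = strip_tags($html);
  // Lowercase so "Moon" and "moon" become the same index word.
  $text = strtolower($text);
  // Replace anything that isn't a letter, digit or whitespace with a space.
  $text = preg_replace('/[^a-z0-9\s]+/', ' ', $text);
  // Split on whitespace runs, keeping only words long enough to index
  // (the minimum length is an assumption, see the lead-in).
  $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
  $kept = array();
  foreach ($words as $word) {
    if (strlen($word) >= $min_length) {
      $kept[] = $word;
    }
  }
  return $kept;
}

// Rendered body plus the phrase appended by hook_nodeapi('view').
$rendered = '<p>Our new widget ships Tuesday.</p><div>The moon is made of cheese.</div>';
print implode(' ', my_index_words($rendered)) . "\n";
// prints: our new widget ships tuesday the moon made cheese
```

Both "moon" and "cheese" come out as index words for every node that carries the phrase, which is what skews the results.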

To keep the phrase out of search results, check the $node object's build_mode property, which is set to one of these constants:

NODE_BUILD_NORMAL
NODE_BUILD_PREVIEW
NODE_BUILD_SEARCH_INDEX
NODE_BUILD_SEARCH_RESULT
NODE_BUILD_RSS

function my_module_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  // Skip the phrase when the node is being rendered for the search indexer.
  if ($op == 'view' && $node->build_mode != NODE_BUILD_SEARCH_INDEX) {
    $node->content['extra_info'] = array(
      '#value' => 'The moon is made of cheese.',
      '#weight' => 100,
    );
  }
}

Now my phrase displays on every node except while the node is being indexed, so once the existing index entries have been rebuilt, a search for "moon" or "cheese" no longer matches every node.
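If you'd also like to keep the phrase out of the snippets on the search results page, a variant of the hook can skip NODE_BUILD_SEARCH_RESULT as well. This is a sketch that assumes node search builds each result in that mode (my reading of Drupal 6's node module); the extra $a3 and $a4 arguments are the teaser and page flags hook_nodeapi receives for 'view'.

```php
<?php
/**
 * Implementation of hook_nodeapi().
 *
 * Sketch: add the phrase on normal views, but not when the node is built
 * for the search index or for a search result snippet.
 */
function my_module_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  // Build modes in which the phrase should NOT be added.
  $skip_modes = array(NODE_BUILD_SEARCH_INDEX, NODE_BUILD_SEARCH_RESULT);

  // Treat a missing build_mode (unusual callers) as a normal view.
  $mode = isset($node->build_mode) ? $node->build_mode : NODE_BUILD_NORMAL;

  if ($op == 'view' && !in_array($mode, $skip_modes)) {
    $node->content['extra_info'] = array(
      '#value' => 'The moon is made of cheese.',
      '#weight' => 100,
    );
  }
}
```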

You can test how your indexing changes behave by calling _node_index_node() to reindex a single node:

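// Run inside a bootstrapped Drupal 6 site, e.g. Devel's "Execute PHP Code" page.
$nid = 123456; // or whatever nid you want to reindex
// TRUE as the third argument resets node_load()'s static cache.
$node = node_load($nid, NULL, TRUE);
_node_index_node($node);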
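To confirm what actually landed in the index after running that snippet, you can query the search_index table. my_module_word_indexed() is a hypothetical helper, and it assumes Drupal 6's schema, where search_index holds one row per word, sid (the nid, for nodes) and type.

```php
<?php
/**
 * Hypothetical helper: is $word in the search index for node $nid?
 * Assumes the Drupal 6 {search_index} table (word, sid, type, score).
 */
function my_module_word_indexed($nid, $word) {
  // Index words are stored lowercased, so normalize the query word too.
  $word = strtolower($word);
  $count = db_result(db_query(
    "SELECT COUNT(*) FROM {search_index} WHERE word = '%s' AND sid = %d AND type = 'node'",
    $word, $nid));
  return (int) $count > 0;
}
```

After the build_mode change and a reindex, my_module_word_indexed(123456, 'cheese') should return FALSE, unless the node's own text mentions cheese.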

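You can flag nodes for re-indexing en masse by hacking your database's search_dataset table directly: set the reindex column to 1 for the nodes you want updated, and the indexer picks them up on subsequent cron runs. Replace prefix_ below with your own table prefix, or drop it if you don't use one.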

mysql> update prefix_search_dataset as s join prefix_node as n on s.sid=n.nid set s.reindex=1 where s.type='node' and n.type='my_type';
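If you'd rather not type raw SQL into the MySQL console, the same update can go through Drupal's database layer, which fills in the table prefix for you. my_module_mark_type_for_reindex() is a hypothetical helper; it uses a subquery instead of MySQL's UPDATE ... JOIN, on the assumption that this keeps it usable on PostgreSQL too.

```php
<?php
/**
 * Hypothetical helper: the same bulk "please reindex" update as the SQL
 * above, run through Drupal 6's database layer.
 */
function my_module_mark_type_for_reindex($type) {
  // {table} names get the site's table prefix automatically; the
  // type = 'node' check skips rows other search types keep in the table.
  db_query("UPDATE {search_dataset} SET reindex = 1 WHERE type = 'node' AND sid IN (SELECT nid FROM {node} WHERE type = '%s')", $type);
  // Number of dataset rows that were flagged.
  return db_affected_rows();
}
```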

When you're ready to re-index your site, the development-status-but-perfectly-fine Reindex module uses Drupal 6's Batch API to do it all at once. My caveat is that you should only really try to do a few thousand at a time, so hack that module too and add a limit to the database query.
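For a rough idea of how a capped batch run can look, here is a sketch of a Batch API job that indexes at most $limit nodes of one type, 20 per request. It is not the Reindex module's code; my_module_reindex_batch() and my_module_reindex_step() are hypothetical, and the 2000-node default and 20-node chunk size are assumptions you'd tune for your server.

```php
<?php
/**
 * Hypothetical sketch, not the Reindex module's code: reindex at most
 * $limit nodes of one content type using the Drupal 6 Batch API.
 */
function my_module_reindex_batch($type, $limit = 2000) {
  $batch = array(
    'title' => t('Reindexing @type content', array('@type' => $type)),
    'operations' => array(
      // Each operation is a callback name plus the arguments to pass it.
      array('my_module_reindex_step', array($type, $limit)),
    ),
  );
  batch_set($batch);
  // Only needed when this is not called from a form submit handler.
  batch_process();
}

/**
 * Batch operation: index the next chunk of nodes. The batch engine keeps
 * calling this until $context['finished'] reaches 1.
 */
function my_module_reindex_step($type, $limit, &$context) {
  if (!isset($context['sandbox']['max'])) {
    // First call: work out how many nodes this run will handle.
    $total = (int) db_result(db_query("SELECT COUNT(*) FROM {node} WHERE type = '%s'", $type));
    $context['sandbox']['max'] = min($total, $limit);
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['last_nid'] = 0;
  }

  // Assumption: 20 nodes per request stays inside PHP's time limit.
  $per_request = 20;
  $result = db_query_range("SELECT nid FROM {node} WHERE type = '%s' AND nid > %d ORDER BY nid ASC",
    $type, $context['sandbox']['last_nid'], 0, $per_request);

  $fetched = 0;
  while ($context['sandbox']['progress'] < $context['sandbox']['max']
    && ($row = db_fetch_object($result))) {
    $fetched++;
    // Reset the node_load() cache, then index as in the single-node snippet.
    $node = node_load($row->nid, NULL, TRUE);
    _node_index_node($node);
    $context['sandbox']['progress']++;
    $context['sandbox']['last_nid'] = $row->nid;
  }

  $context['message'] = t('Indexed @done of @max nodes', array(
    '@done' => $context['sandbox']['progress'],
    '@max' => $context['sandbox']['max'],
  ));

  // Stop at the cap, or if no rows came back (e.g. nodes deleted mid-run).
  if ($fetched == 0 || $context['sandbox']['progress'] >= $context['sandbox']['max']) {
    $context['finished'] = 1;
  }
  else {
    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }
}
```

You could kick it off with my_module_reindex_batch('my_type', 3000) from a page callback; if you call it from a form submit handler instead, drop the batch_process() call, since the Form API processes the batch itself.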
