Docu36064 Documentum XPlore 1.2 Administration and Development Guide

May 21, 2018 | Author: zepolk | Category: Search Engine Indexing, Database Index, Information Science, Data Management, Data






EMC® Documentum® xPlore
Version 1.2
Administration and Development Guide

EMC Corporation
Corporate Headquarters: Hopkinton, MA 01748–9103
1–508–435–1000
www.EMC.com

Copyright ©2010-2013 EMC Corporation. All rights reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. Adobe and Adobe PDF Library are trademarks or registered trademarks of Adobe Systems Inc. in the U.S. and other countries. All other trademarks used herein are the property of their respective owners.

Documentation Feedback

Your opinion matters. We want to hear from you regarding our product documentation. If you have feedback about how we can make our documentation better or easier to use, please send us your feedback directly at [email protected].

Table of Contents

Chapter 1 Introduction to xPlore
  Features
  Limitations
  xPlore compared to FAST
  Architectural overview
  Physical architecture
  xPlore disk areas
  xPlore instances
  xDB libraries and Lucene index
  Indexes
  Logical architecture
  Documentum domains and categories
  Mapping of domains to xDB
  How Content Server documents are indexed
  How Content Server documents are queried

Chapter 2 Managing the System
  Opening xPlore administrator
  Starting and stopping the system
  Viewing and configuring global operations (all instances)
  Managing instances
  Configuring an instance
  Using the watchdog service
  Changing the host name and URL
  Replacing a failed instance with a spare
  Replacing a failed primary instance
  Changing a failed instance into a spare
  Configuring system metrics
  Managing the status database
  Configuring the audit record
  Troubleshooting system problems
  Debugging queries with the xDB admin tool
  Modifying indexserverconfig.xml
  Tasks performed outside xPlore administrator
  Administration APIs
  Open an admin connection
  Call an admin API
  Configuration APIs

Chapter 3 Managing Security
  Changing search results security
  Manually updating security
  Configuring the security cache
  Troubleshooting security

Chapter 4 Managing the Index Agent
  Starting the index agent
  Installing index agent filters (Content Server 6.5 SPX or 6.6)
  Configuring index agent filters
  Migrating documents
  Migrating content (reindexing)
  Migrating documents by object type
  Migrating a limited set of documents
  Using ftintegrity
  Indexing documents in normal mode
  Resubmitting documents for indexing
  Removing entries from the index
  Mapping content to collections
  Sharing content storage
  Mapping Server storage areas to collections
  Indexing metadata only
  Making types non-indexable
  Configuring the index agent after installation
  Setting up index agents for ACLs and groups
  Documentum attributes that control indexing
  Injecting data and supporting joins
  Custom content filters
  Troubleshooting the index agent

Chapter 5 Document Processing (CPS)
  About CPS
  Adding a remote CPS instance
  Configuring a dedicated CPS
  Administering CPS
  Maximum document and text size
  Configuring languages and encoding
  Indexable formats
  Lemmatization
  About lemmatization
  Configuring lemmatization
  Lemmatizing specific types or attributes
  Troubleshooting lemmatization
  Saving lemmatization tokens
  Handling special characters
  Configuring stop words
  Troubleshooting content processing
  Troubleshooting slow ingestion
  Adding dictionaries to CPS
  Custom content processing
  About custom content processing
  Text extraction
  Troubleshooting custom text extraction
  Annotation
  UIMA example
  Custom content processing errors

Chapter 6 Indexing
  About indexing
  Configuring text extraction
  Defining an index
  Creating custom indexes
  Managing indexing in xPlore administrator
  Troubleshooting indexing
  Running the standalone consistency checker
  Indexing APIs
  Route a document to a collection
  Creating a custom routing class
  SimpleCollectionRouting example

Chapter 7 Index Data: Domains, Categories, and Collections
  Domain and collection menu actions
  Managing domains
  Configuring categories
  Managing collections
  About collections
  Planning collections for scalability
  Limitations of subcollections
  Adding or deleting a collection
  Changing collection properties
  Routing documents to a specific collection
  Attaching and detaching a collection
  Moving a temporary collection
  Creating a collection storage location
  Rebuilding collections
  Deleting and recreating indexes
  Querying a collection
  Troubleshooting data management

Chapter 8 Backup and Restore
  About backup
  About restore
  Handling data corruption
  Detecting data corruption
  Handling a corrupt domain
  Repairing a corrupted index
  Cleaning and rebuilding the index
  Too many open files
  Snapshot too old
  Dead object
  Recovering from a system crash
  Backup in xPlore administrator
  File- or volume-based (snapshot) backup and restore
  Offline restore
  Automated backup and restore (CLI)
  CLI properties and environment
  Using the CLI
  CLI batch file
  Scripted federation restore
  Scripted domain restore
  Scripted collection restore
  Force detach and attach CLIs
  Orphaned segments CLIs
  Domain mode CLIs
  Collection and domain state CLIs
  Activate spare instance CLI
  Troubleshooting backup and restore

Chapter 9 Search
  About searching
  Query operators
  Administering search
  Configuring query warmup
  Configuring scoring and freshness
  Adding a thesaurus
  Configuring query summaries
  Configuring query lemmatization
  Limiting search results
  Wildcards and fragment search
  Configuring full-text wildcard (fragment) support
  Wildcard search in metadata
  Wildcard behavior in xPlore and FAST
  Configuring fuzzy search
  Configuring Documentum search
  Search engine configuration (dm_ftengine_config)
  Making types and attributes searchable
  Folder descend queries
  DQL, DFC, and DFS queries
  Changing VQL queries to XQuery expressions
  Tracing Documentum queries
  Supporting subscriptions to queries
  About query subscriptions
  Installing the query subscription DAR
  Testing query subscriptions
  Subscription reports
  Subscription logging
  Troubleshooting search
  Auditing queries
  Troubleshooting slow queries
  Search APIs and customization
  Debugging queries with the xDB admin tool

Chapter 10 Facets
  About Facets
  Configuring facets in xPlore
  Creating a DFC facet definition
  Facet datatypes
  Creating a DFS facet definition
  Tuning facets
  Logging facets
  Troubleshooting facets

Chapter 11 Using Reports
  About reports
  Types of reports
  Document processing (CPS) reports
  Indexing reports
  Search reports
  Editing a report
  Report syntax
  Sample edited report
  Troubleshooting reports

Chapter 12 Logging
  Configuring logging
  CPS logging

Chapter 13 Setting up a Customization Environment
  Setting up the xPlore SDK
  Customization points
  Adding custom classes
  Tracing
  Trace log format
  Reading trace output
  Enabling logging in a client application
  Handling a NoClassDef exception

Chapter 14 Performance and Disk Space
  Planning for performance
  Improving search performance with time-based collections
  Disk space and storage
  System sizing for performance
  Measuring performance
  Tuning the system
  Documentum index agent performance
  Indexing performance
  Search performance

Appendix A Index Agent, CPS, Indexing, and Search Parameters
Appendix B Documentum DTDs
Appendix C XQuery and VQL Reference

Preface

This guide describes administration, configuration, and customization of Documentum xPlore. These tasks include system monitoring, index configuration and management, query configuration and management, auditing and security, and Documentum integration.

Intended Audience

This guide contains information for xPlore administrators who configure xPlore and Java developers who customize xPlore:
• Configuration is defined for support purposes as changing an XML file or an administration setting in the UI.
• Customization is defined for support purposes as using xPlore APIs to customize indexing and search. The xPlore SDK is a separate download that supports customization.

You must be familiar with the installation guide, which describes the initial configuration of the xPlore environment. When Documentum functionality is discussed, this guide assumes familiarity with EMC Documentum Content Server administration.

Revision history

The following changes have been made to this document.

Revision Date    Description
November 2011    Initial publication for version 1.2 release
January 2013     Updated the wildcard section: wildcards are not implicit with the Contains operator when FAST compatibility mode is enabled. Updated index agent settings related to indexing queue size. Updated arguments for the ACL replication job.

Additional documentation

This guide provides overview, administration, and development information. For information on installation, supported environments, and known issues, see:
• Documentum xPlore Release Notes
• Documentum xPlore Installation Guide
• Documentum xPlore High Availability and Disaster Recovery Guide

For additional information on Content Server installation and Documentum search client applications, see:
• Documentum Content Server Installation Guide
• Documentum System Search Development Guide

Chapter 1 Introduction to xPlore

This chapter contains the following topics:
• Features
• Limitations
• xPlore compared to FAST
• Architectural overview
• Physical architecture
• Logical architecture
• How Content Server documents are indexed
• How Content Server documents are queried

Documentum xPlore is a multi-instance, scalable, high-performance, full-text index server that can be configured for high availability and disaster recovery.

The xPlore architecture is designed with the following principles:
• Uses standards as much as possible, like XQuery
• Uses open source tools and libraries, like Lucene
• Supports enterprise readiness: High availability, backup and restore, analytics, performance tuning, reports, diagnostics and troubleshooting, administration GUI, and configuration and customization points.
• Supports virtualization, with accompanying lower total cost of ownership.

Features

Indexing features

Collection topography: xPlore supports creating collections online, and collections can span multiple file systems.

Transactional updates and purges: xPlore supports transactional updates and purges of indexes as well as transactional commit notification to the caller.

Multithreaded insertion into indexes: xPlore ingestion through multiple threads supports vertical scaling on the same host.

Dynamic allocation and deallocation of capacity: For periods of high ingestion, you can add a CPS instance and new collection. Add content to this collection, then move the collection to another instance for better search performance. You can then decommission the CPS instance.

Temporary high query load: For high query load, like a legal investigation, add an xPlore instance for the search service and bind collections to it in read-only mode.

Growing ingestion or query load: If your ingestion or query load increases due to growing business, you can add instances as needed.

Extensible indexing pipeline using the open-source UIMA framework.

Configurable stop words and special characters.

Search features

Case sensitivity: xPlore queries are lower-cased (rendered case-insensitive).

Full-text queries: To query metadata, set up a specific index on the metadata.

Faceted search: Facets in xPlore are computed over the entire result set or over a configurable number of results.

Security evaluation: When a user performs a search, permissions are evaluated for each result. Security can be evaluated in the xPlore full-text engine before results are returned to Content Server, resulting in faster query results. This feature is turned on by default and can be configured or turned off.

Native XQuery syntax: The xPlore full-text engine supports XQuery syntax.

Thesaurus search to expand query terms.

Fuzzy search finds misspelled words or letter reversals.

Boost specific metadata in search results.

Extensive testing and validation of search on supported languages.

Administration features

Multiple instance configuration and management.

Reports on ingestion metrics and errors, search performance and errors, and user activity.

Collections management: Creating, configuring, deleting, binding, routing, rebuilding, querying.

Command-line interface for automating backup and restore.

Limitations

ACLs and aspects are not searchable by default

ACLs and aspects are not searchable by default, to protect security. You can reverse the default by editing indexserverconfig.xml. Set full-text-search to true in the sub-path definition for acl_name and r_aspect_name and then reindex your content.

Ingestion of many large files can cause failures

When CPS processes two or more large files at the same time, the CPS log file (cps_daemon.log) reports one of the following errors:

    ERROR [Daemon-Core-(3400)] Exception happened, ACCESS_VIOLATION, Attempt to read data at address 1 at (connection-handler-2) ...
    FATAL [DAEMON-LP_RLP-(3440)] Not enough memory to process linguistic requests. Error message: bad allocation

Workaround using xPlore administrator: Select an instance and click Configuration. Change the following to smaller values:
• Batch size: Decrease batch size to decrease the number of documents in a failed batch. (All documents in a batch fail to be indexed if one document fails.)
• Max text threshold
• Thread pool size

CPS daemon must restart after fatal ingestion error

The CPS daemon restarts when ingestion generates a fatal error. The CPS log indicates that the connection has been reset:

    2009-02-10 12:19:55.425 INFO [DAEMON-CORE-(-1343566944)] Daemon is shutdown forcefully.
    2009-02-10 12:19:55.512 ERROR [MANAGER-CPSTransporter-(CPSWorkerThread-6)] Failed to receive the response XML.
    java.net.SocketException: Connection reset

Batch failure

Indexing requests are processed in batches. When one request in a batch fails, the entire batch fails.

Ingestion batch fails if a format is not supported

CPS cannot process certain formats as documents or email attachments, and PDF input fields. For a full list of supported formats, see Oracle Outside In 8.3.7 documentation. To see format processing errors, use the xPlore administrator report Document Processing Error Detail and choose File format unsupported.

Lemmatization

• xPlore supports lemmatization, but you cannot configure the parts of speech that are lemmatized.
• The part of speech for a word can be misidentified when there is not enough context. The likelihood of errors in part-of-speech (POS) tagging increases with sentence length. Workaround: Enable alternative lemmatization if you have disabled it (see Configuring lemmatization).
• Punctuation at the end of the sentence is included in the lemmatization of the last word. For example, the phrase Mary likes swimming and dancing is lemmatized differently depending on whether there is a period at the end. Without the period, dancing is identified as a verb with the lemma dance. With the period, it is identified as a noun with the lemma dancing. A search for the verb dance does not find the document when the word is at the end of the sentence. Workaround: Enable alternative lemmatization.

Phrase searches

The content of a phrase search must be an exact match. For example, a search on the phrase "feels happy" does not find content with the phrase "felt happy."

Search fails for parts of common phrases

A common phrase like because of, a good many, or status quo is tokenized as a phrase and not as individual words. A search for a word in the phrase, like because, fails.

Some lightweight sysobjects (LWSOs) are not fully indexed and searchable

In a Documentum repository, lightweight sysobjects such as emails inherit many attributes from the parent object. If the LWSOs are not materialized, a query on the inherited attributes fails. Some Documentum client applications, such as Webtop and DCO, materialize emails so they are fully searchable. SourceOne does not, so emails in SourceOne are sometimes not fully searchable.

DQL, not XQuery, filters unmaterialized LWSOs. If a query returns a result that is a lightweight sysobject (LWSO), the query does not find the object. No result is returned.

DFTXML attributes are not indexed

xPlore does not index attribute values on XML elements in the DFTXML representation of the input document. For example, you cannot find all documents for which the value of the dmfttype attribute of the element acl_name is dmstring.

Only one language is analyzed for indexing

Documents with multiple languages are indexed, but only one language is selected for indexing tokens. Words in other languages are sometimes not indexed properly.

Collection names cannot be duplicated

The name of an adopted collection cannot be reused, because the name still exists in the domain.

Stop words are case sensitive

Stop word lists are case sensitive. Add all case forms for words that are not indexed, for example: AND and And.
all child elements and attributes enclosed within an element.. Special characters lists are limited to ANSI characters. For example. page 99.2 Administration and Development Guide 15 . Chinese Space in query causes incorrect tokenization A space within a Chinese term is treated in DQL as white space. Administration differences xPlore has an administration console. the names of persons and places cannot be searched. The underscore in the following document name is treated as white space. A trailing wildcard (ends with. A query for run* does not return this document because the period is not treated as white space.object_name FROM dm_document SEARCH TOPIC ’tennis <IN>Zone 2<IN>Document’ xPlore does not support zone searching of attributes.9. These features were not configurable for FAST. See Changing VQL queries to XQuery expressions. page 184.. Additionally. Skipped collections are not reported in query results When a collection is unavailable or corrupted. IN zone. FAST does not. they must be added to the Chinese dictionary in xPlore. Many features in xPlore are configurable through xPlore administrator. A search for the string fails. it is skipped in a query. To be found. in advanced search) does return this document. although individual elements and their attributes can be indexed and searched. See Adding dictionaries to CPS. Dictionary must be customized for Chinese name and place search For Chinese documents.9. EMC Documentum xPlore Version 1.1. Results can be different when the collection is restored or brought back online. A query for run_ finds the document. the following information describes differences between the two indexing servers. in DQL) searches defined regions of an XML document. xPlore compared to FAST If you are migrating from FAST to xPlore. for example. administrative tasks are exposed through Java APIs. Special characters limitations Special characters are treated as white space. A search for 中国近代 fails. 
xPlore does not notify the user or administrator that the collection was skipped.1. the document name is tokenized as run and 9.Zone search not supported Zone searching (SEARCH TOPIC. the term 中国 近代 is treated as 中国 AND 近代.doc. In the example run_9. Zone searching for DFC has the following syntax: SELECT r_object_id. After a VM crash. Storage technology: xPlore supports SAN and NAS.Ports required: During xPlore instance configuration. You can configure lemmatization for specific Documentum attribute values. High availability: xPlore automatically restarts content processing after a CPS crash. including full and incremental. 64-bit address space: 64-bit systems are supported in xPlore but not in FAST. in addition to the index. xPlore requires less temporary disk space than FAST. Search differences One-box search: Searches from the Webtop client default to ANDed query terms in xPlore. Excluding from index: xPlore allows you to configure non-indexed metadata to save disk space and improve ingestion and search performance. and collections can span multiple file systems. For information on high availability. Lemmatization: FAST supports configuration for which parts of speech are lemmatized. Query a specific collection: Targeted queries are supported in xPlore but not FAST. a full-text search on "256" returns no hits in xPlore. Transactional updates and purges: xPlore supports transactional updates and purges as well as transactional commit notification to the caller. For example. Results ranking: FAST and xPlore use different ranking algorithms. the xPlore watchdog sends an email notification. the number of hits differs between FAST and xPlore queries on the non-indexed content. FAST supports only active/active. High availability: xPlore supports N+1. xPlore requires twice the index space. During index agent configuration. FAST used 4000 ports. and active/active shared data configurations. the installer prompts for the HTTP port for the JBoss instance (base port). 
xPlore supports spare indexing instances that are activated when another instance fails. In xPlore.5 times the space. FAST requires 3. 16 EMC Documentum xPlore Version 1. FAST does not support these features. active/passive with clusters. The installer validates that the next 100 consecutive ports are available. if xPlore does not index docbase_id.2 Administration and Development Guide . FAST supports only offline (cold) backup. lemmatization is enabled or disabled. the installer prompts for the HTTP port for index agent Jboss instance and validates that the next 20 consecutive ports are available. Collection topography: xPlore supports creating collections online. Virtualization: xPlore runs in VMware environments. FAST does not. The search returns all indexed documents for repository whose ID is 256. FAST supports SAN only. FAST does not. Indexing differences Back up and restore: xPlore supports warm backups. Disaster recovery: xPlore supports online backup. for merges and optimizations. Folder descend: Queries are optimized in xPlore but not in FAST. see Documentum xPlore High Availability and Disaster Recovery Guide. With this configuration. Wildcards and word fragments: FAST matches fragments of words in full-text searches. the Documentum index agent prepares an XML representation of each document. you can match fragments by reverting to FAST-type fragment search for both one-box and attribute search. resulting in many hits that the user is not able to view. Each document source is configured as a domain in xPlore. For VQL migration. You can set up domains using xPlore administrator. since the period is considered a context character for part of speech identification. FAST returns results to the Content Server. External content source clients like Webtop or CenterStage. A local or remote instance of the content processing service (CPS) fetches the content. CPS then extracts indexable content from the request stream and parses it into tokens. 
see Changing VQL queries to XQuery expressions. However. For Documentum environments. performance is slower. the Documentum index agent creates a domain for each repository and a default collection within that domain.2 Administration and Development Guide 17 . Special characters: Special character lists are configurable. In a Documentum environment. Documents are provided in an XML representation to xPlore for indexing through the indexing APIs. When an xPlore instance receives an indexing request. but xPlore supports many more hits. can send indexing requests to xPlore. For example. The document is assigned to a category. or custom Documentum DFC clients. Native XQuery syntax: Supported by xPlore. The default in xPlore differs from FAST when terms such as email addresses or contractions are tokenized. The tokens are used for building a full-text index. it uses the document category to determine what is tokenized and saved to the index. all child elements and attributes enclosed within an element. EMC Documentum xPlore Version 1. xPlore matches whole words only. Search topic: Zone searching (search topic) searches defined regions of an XML document. for example. in xPlore. an email address is split up into separate tokens with the period and @ as boundaries.Security evaluation: Security is evaluated by default in the xPlore full-text engine before results are returned to Content Server. FAST supports VQL zone searching. in FAST. xPlore does not support zone searching in this fashion. only the @ serves as the boundary. and each category corresponds to one or more collections as defined in xPlore. resulting in faster query results. Facets: Facets are limited to 350 hits in FAST. To support faceted search in Documentum repositories. xPlore instances are web application instances that reside on application servers. 
Architectural overview xPlore provides query and indexing services that can be integrated into external content sources such as the Documentum content management system. In Webtop and CenterStage simple search with xPlore. Zone searches do not span entities nor do they return the contents of the zone. but individual elements and their attributes can be indexed. you can define a special type of an index called an implicit composite index. xPlore does not index XML attribute values. Underprivileged user queries: Optimized in xPlore but not in FAST. However. page 184. XML attributes: Attribute values on XML elements are part of the xPlore binary index. CPS detects the primary language and format of a document. the request is processed on all included collections. Physical architecture The xPlore index service and search service are deployed as a WAR file to a JBoss application server that is included in the xPlore installer. Table 2 Disk areas for xPlore Area Description Use in indexing Use in search xDB data Stores DFTXML. audit. The following table describes how these areas are used during indexing and search.xPlore manages the full-text index. When an instance receives a query request. The index is stored in the storage location that was selected during configuration of xPlore. recording the status of requests and the location of indexed content. xPlore administrator and online help are installed as war files in the same JBoss application server. xPlore provides a web-based administration console. ACLs consumed by disk block for specific elements and and groups. for batch XML files summary xDB redo log Stores transaction Updates to xDB data are Provides snapshot information logged information during some retrievals Lucene index Performs query lookup Index updated through Inverted index lookup and retrieval and facet inserts and merges and facet and security and security information retrieval 18 EMC Documentum xPlore Version 1. 
then the assembled query results are returned. and index agent content staging. Next free space is Random access retrieval metrics. An external Apache Lucene full-text engine is embedded into the EMC XML database (xDB). the Lucene index. xDB tracks indexing and updates requests. a temp area. Indexes are still searchable during updates.2 Administration and Development Guide . xPlore disk areas xPlore instances xDB libraries and Lucene index Indexes xPlore disk areas xPlore creates disk areas for xDB data and redo log. xPlore configuration and utilities. xDB provides transactional updates to the Lucene index. The first instance that you install is the primary instance. Click a collection to go to the Data Management view of the collection. and state. select a different port for the new instance. and data management services) • Spare: A spare instance can be manually activated to take over for a disabled or stopped instance. page 35. use the xPlore configurator script. and number of classes loaded. status. run the xPlore configurator script. EMC Documentum xPlore Version 1. (CPS) Exports to the index service 3. (CPS) Intermediate Non-committed data is None processing stored to the log 2. An instance can have one or more of the following features enabled: • Content processing service (CPS) • Indexing service • Search service • xPlore Administrator (includes analytics. Shut down the instance before you delete it. although it is more common to have one xPlore instance per host (horizontal scaling). The primary instance must be running when you install a secondary instance. Adding or deleting an instance To add an instance to the xPlore system. Collections that are bound to the instance are listed on the right. instance. You see following instance information: • OS information: Host name. If an xPlore instance exists on the same host. instance type. • JVM information: Version. and architecture. You can have multiple instances on the same host (vertical scaling). 
because the default port is already in use. You manage instances in xPlore administrator. Instance version. To delete an instance from the xPlore system. Click Instances in the left panel to see a list of instances in the right content pane. See Replacing a failed instance with a spare. You manage an instance by selecting the instance in the left panel. active thread count. OS. Area Description Use in indexing Use in search Temp 1. You can add secondary instances after you have installed the primary instance. You create an instance by running the xPlore installer.2 Administration and Development Guide 19 . • xPlore information: xDB version. Index: Updates to the Lucene index (non-transactional) Index agent content Temporarily holds Holds content None staging area content during indexing process xPlore instances An xPlore instance is a web application instance (WAR file) that resides on an application server. This attribute is used for registering and unregistering instances.The application server instance name for each xPlore instance is recorded in indexserverconfig. • Each domain contains an xDB tracking library (database) records the content that has been indexed. xDB manages parallel dispatching of queries to more than one Lucene index when parallel queries are enabled. To query the correct index. A library corresponds to a collection in xPlore with additional metadata such as category. xDB libraries and Lucene index xDB is a Java-based XML database that enables high-speed storage and manipulation of many XML documents. you can query each collection in parallel. if you have set up multiple collections on different storage locations. These databases record metrics and audit queries by xPlore instance. change the value of the attribute appserver-instance-name on the node element for that instance. Back up the xPlore federation after you change this file. The library is a logical container for other libraries or XML documents. 
• Each domain contains one or more data libraries. xPlore can be configured to store the content along with the tokens. An xDB library has a hierarchical structure like an OS directory. xDB manages the following libraries for xPlore: • The root library contains a SystemData with metrics and audit databases. For example. and properties. xDB supports the XQuery language and XQFT query specifications. xDB tracks the location of documents. When xPlore processes an XML representation of an input document and supplies tokens to xDB. The default library is the first that is created for a domain. A tracking database in xDB manages deletes and updates to the index. If you change the name of the JBoss instance. xDB passes them to the Lucene index.2 Administration and Development Guide . Optionally. changes to the index are propagated. • Each domain contains a status library (database) that reports indexing status for the domain. xPlore manages the indexes on the collection. xDB stores them into a Lucene index. When documents are updated or deleted. usage. 20 EMC Documentum xPlore Version 1. An xDB library stores an xPlore collection as one or more Lucene indexes that can include the XML content that is indexed. When xPlore supplies XQuery expressions to xDB.xml. page 42.2 Administration and Development Guide 21 . SAN or NAS. path-value combination. Covering indexes are used for security evaluation and facet computation. For information on viewing and updating this file. If you install more than one instance of xPlore. Indexes xDB has several possible index structures that are queried using XQuery. or multiple indexes on a collection. You can configure none. The Lucene index services both value-based and full-text probes of the index. The xDB data stores and indexes can reside on a separate data store. following is a value indexed field: /dmftdoc[dmftmetadata//object_name="foo"] Following is a tokenized. they are pulled from the index and not from the data pages. 
see Modifying indexserverconfig. If you do not have heavy performance requirements. For example. the storage locations must be accessible by all instances. The Lucene index is modeled as a multi-path index (a type of composite index).xml. Covering indexes are also supported. EMC Documentum xPlore Version 1. paths within the XML document. The locations are configurable in xPlore administrator.xml. full-text field: /dmftdoc[dmftmetadata//object_name ftcontains ’foo’] Indexes are defined and configured in indexserverconfig. When the query needs values.Figure 1 xDB and Lucene An xDB library is stored on a data store. or full-text content. one. xDB and the indexes can reside on the same data store. in xDB. An explicit index is based on values of XML elements. Non-final entries list all committed transactions. 22 EMC Documentum xPlore Version 1. index_info Contains a list of all committed Lucene indexes. for returnable fields that have value compression enabled. For example. When a document is indexed. page 25 • Mapping of domains to xDB. These elements are stored with the index and returned during a query. logical grouping of collections with an xPlore deployment. a domain could contain the indexed contents of a single Documentum content repository. page 26 Domains A domain is a separate. independent. Active indexes are registered in the index_info file. Entries are final or non-final. • Documentum domains and categories. A category is logically represented as one or more collections. returnable_field_value Stores a map of a value-name to a compressed number. it is assigned to a category or class of documents and indexed into one of the category collections.2 Administration and Development Guide . Each collection contains indexes on the content and metadata. Logical architecture A domain contains indexes for one or more categories of documents.Figure 2 Lucene indexes Table 3 Lucene directories Name Function Lucene index directory Begins with LI. 
All indexes in index_info can be queried. returnable_field_path Stores the XML path for returnable elements such as faceted attributes or security. it will eventually be removed from the file system. If an LI directory is not in the list. or update and search). index only. You can specify the elements that have compression. A collection is bound to a specific instance in read-write state (index and search. all documents in a domain are assigned to a single default collection. EMC Documentum xPlore Version 1. Three collections (two hot and one cold) with their corresponding instances are shown. You can create subcollections under each collection and route documents to user-defined collections. Collections A collection is a logical group of XML documents that is physically stored in an xDB detachable library. tokenization. Categories A category defines how a class of documents is indexed. text extraction. and storage of tokens. A collection can be bound to multiple instances in read-only state (search-only). You also specify the indexes that are defined on the category and the XML elements that are not indexed. This domain receives indexing requests from the Documentum index agent. In a basic deployment. All documents submitted for ingestion must be in XML format.) The category is defined in indexserverconfig. A category definition specifies the processing and semantics that is applied to an ingested XML document. A domain can have multiple collections in addition to the default collection.xml and managed by xPlore.Domains are defined in xPlore administrator in the data management screen.2 Administration and Development Guide 23 . A collection generally contains one category of documents. The Documentum index agent creates a domain for the repository to which it connects. the Documentum index agent prepares an XML version for Documentum repository indexing. (For example. A collection belongs to one category. 
You can specify the XML elements that are used for language identification. A collection represents the most granular data management unit within xPlore. All documents submitted for indexing are assigned to a collection. each with its own indexing. Each database has a subcollection for each xPlore instance. One metrics and one audit database is defined. You can view this domain and collections in xPlore administrator.2 Administration and Development Guide .Figure 3 Read-write (index and search) and read-only (search-only) collections on two instances Use xPlore Administrator to do the following: • Define a collection and its category • Back up the collection • Change the collection state to read-only or read-write • Change the collection binding to a different instance The metrics and audit systems store information in collections in a domain named SystemData. and CPS services. 24 EMC Documentum xPlore Version 1. search. The following diagram shows the services of a simple xPlore system: Two installed instances. xml. (Each collection is bound to an instance.2 Administration and Development Guide 25 . the document is assigned to an instance based on a round-robin order. then collection routing is applied. In the following configuration in indexserverconfig.. If it is a new document. The client indexing application.. These latter two collections are used to filter results for permissions before returning them to the client application. Documentum domains and categories Repository domains An xPlore domain generally maps to a single Documentum repository. Documentum index agent.) <domain storage-location-name="default" default-document-category="dftxml" name="TechPubsGlobal"> <collection document-category="dftxml" usage="Data" name="default"/> . Within that domain. </domain> EMC Documentum xPlore Version 1. the document is assigned to a collection in round-robin order. The collections in the domain can be distributed across multiple xPlore instances. 
If collection routing is not supplied by a client routing class or the Documentum index agent. On that instance. Three collections are defined: one for metadata and content (default). and one for groups. one for ACLs. you can direct documents to one or more collections.Figure 4 Services on two instances Example A document is submitted for indexing. has not specified the target collection for the document. for example. the index service updates the document. if the instance has more than one collection. If the document exists. a repository is mapped to a domain. see Modifying indexserverconfig. All documents are sent to a specific index based on the document category. For example. For information on viewing and updating this file. page 49 for more information. xPlore pre-defines a category called DFTXML that defines the indexes.Documentum categories A document category defines the characteristics of XML documents that belong to that category and their processing. • acl: ACLs that defined in the repository are indexed so that security can be evaluated in the full-text engine. All Documentum indexable content and metadata are sent to this category. • dftxml: XML representation of object metadata and content for full text indexing. create custom categories for them. See Managing Security.xml. page 42. Mapping of domains to xDB Figure 5 Database structure for two instances 26 EMC Documentum xPlore Version 1.2 Administration and Development Guide . • group: Groups defined in the repository are indexed to evaluate security in the full-text engine. click the document in the collection view. The following Documentum categories are defined within the domain element in indexserverconfig. To view the DFTXML representation using xPlore administrator. If your custom types need special configuration and a separate index.xml. There is a subcollection in StatusDB for each xPlore instance.2 Administration and Development Guide 27 . Readonlysave. 
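The library layout that this part of the guide describes for a two-instance system can be pictured as a nested structure. The following Python sketch is only an illustration of that layout. The instance names (PrimaryDsearch, Secondary1), the domain names (RepositoryA, RepositoryB), and the lowercase acl and group collection names are hypothetical, and the exact nesting is an assumption pieced together from the descriptions in this section rather than output from xDB.

```python
# Hypothetical instance names, used only for illustration.
INSTANCES = ["PrimaryDsearch", "Secondary1"]


def domain_library(instances):
    """Libraries assumed to exist under one domain (one repository)."""
    return {
        # Data collection with its default subcollection.
        "Data": {"default": {}},
        "SystemInfo": {
            # Each TrackingDB collection matches a collection in Data.
            "TrackingDB": {"default": {}},
            # One StatusDB subcollection per instance, holding status.xml.
            "StatusDB": {name: {"status.xml": None} for name in instances},
        },
        # ACL and group collections for the repository (names assumed).
        "ApplicationInfo": {"acl": {}, "group": {}},
    }


def build_federation(domains, instances):
    """Assemble the assumed hierarchy under the xDB root library."""
    root = {
        # Metrics and audit databases shared by all domains,
        # each with one subcollection per instance.
        "SystemData": {
            "MetricsDB": {name: {} for name in instances},
            "AuditDB": {name: {} for name in instances},
        }
    }
    for domain in domains:
        root[domain] = domain_library(instances)
    return {"root-library": root}


def walk(tree, prefix=""):
    """Yield every library path in the nested structure."""
    for name, child in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        yield path
        if isinstance(child, dict):
            yield from walk(child, path)


if __name__ == "__main__":
    federation = build_federation(["RepositoryA", "RepositoryB"], INSTANCES)
    for library_path in walk(federation):
        print(library_path)
```

Running the script prints one path per library, which makes it easy to see that per-instance subcollections appear only under MetricsDB, AuditDB, and StatusDB in this model.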
• The entire xPlore federation library is stored in xDB root-library.
• All xPlore domains share the system metrics and audit databases (SystemData library in xDB with libraries MetricsDB and AuditDB). The metrics and audit databases have a subcollection for each xPlore instance.
• One content source (Documentum repository A) is mapped to a domain library. The library is stored in a defined storage area on either instance.
• A second repository, Repository B, has its own domain.
• The Data collection has a default subcollection.
• The SystemInfo library has two subcollections: TrackingDB and StatusDB. Each collection in TrackingDB matches a collection in Data and is bound to the same instance as that data collection. There is a subcollection in StatusDB for each xPlore instance. The instance-specific subcollection has a file status.xml that contains processing information for objects processed by the instance.
• The ApplicationInfo library contains Documentum ACL and group collections for a specific domain (repository).

How Content Server documents are indexed

Figure 6 xPlore indexing path

1. In a client application, a Save, Checkin, Destroy, Readonlysave, or MoveContent operation is performed on a SysObject in the repository.
2. This operation event generates a queue item (dmi_queue_item) in the repository that is sent to the full-text user work queue. (The full-text user, dm_fulltext_index_user, is a Superuser created when a repository is created or when an existing repository is upgraded.)
3. The index agent retrieves the queue item. The index agent retrieves the object associated with the queue item from the repository. The content is retrieved or staged to a temporary area. The agent then creates a DFTXML (XML) representation of the object that can be used for full-text and metadata indexing.
4. The Index Agent sends the DFTXML representation of the content and metadata to the xPlore Server.
5. The xPlore indexing service calls CPS to perform text extraction, language identification, and transformation of metadata and content into indexable tokens.
6. The xPlore indexing service performs the following steps:
• Routes documents to their target collections.
• Merges the extracted content into the DFTXML representation of the document.
• Calls xDB to store the DFTXML in xDB.
• Returns the tokens from CPS to xDB.
• Stores the document location (collection) and document ID in the TrackingDB.
• Saves indexing metrics in the MetricsDB.
• Tracks document indexing status in the StatusDB.
7. The indexing service notifies the index agent of the indexing status. The index agent then removes the queue item from the Content Server. Otherwise, the queue item is left behind with the error status and error message.

The object is now searchable. (The index service does not provide any indication that an object is searchable.) After an index request is submitted to xPlore, the client application can move on to the next task. (Indexing is asynchronous.) For information on how to troubleshoot latency between index agent submission and searchability, see Troubleshooting indexing.

Enabling indexing for an object type
Events in dmi_registry for the user dm_fulltext_index_user generate queue items for indexing. The following events are registered for dm_fulltext_index_user to generate indexing events by default:
• dm_sysobject: dm_save, dm_checkin, dm_destroy, dm_saveasnew, dm_move_content
• dm_acl: dm_save, dm_destroy, dm_saveasnew
• dm_group: dm_save, dm_destroy

Registering a type for full-text indexing
Use Documentum Administrator to change the full-text registration for an object type. Select the type, view the properties, and for the property Enable indexing check Register for indexing. To change specific events that are registered for full-text, use the DFC API registerEvent().

Reindexing
The index agent does not recreate all the queue items for reindexing. Instead, it creates a watermark queue item (type dm_ftwatermark) to indicate the progress of reindexing. It picks up all the objects for indexing in batches by running a query. The index agent updates the watermark as it completes each batch. When the reindexing is completed, the watermark queue item is updated to 'done' status.

You can submit for reindexing one or all documents that failed indexing. In Documentum Administrator, open Indexing Management > Index Queue. Choose Tools > Resubmit all failed queue items, or select a queue item and choose Tools > Resubmit queue item.

How Content Server documents are queried

Several software components control full-text search using the xPlore server:
• The Content Server queries the full-text indexes and returns query results to client applications.
• The xPlore server responds to full-text queries from Content Server.

Path of a query from a Documentum client to xPlore
1. The client application submits a DQL query or XQuery to the Documentum Content Server. (If the client application uses DFC 6.6 or higher to create the query, DFC translates the query into XQuery syntax.)
2. The Server transmits the query to xPlore (Content Server 6.6 or higher) or the query plugin (Content Server 6.5 SPx). The plugin translates the query into XQuery syntax. The query plugin transmits batches of HTTP messages containing XQuery statements to the xPlore search service.
3. CPS tokenizes the query based on the locale declared in the query.
4. xDB breaks the query into XQuery clauses for full-text (using ftcontains) and metadata (using value constraints).
5. The query is executed in the Lucene index against all collections unless a collection is specified in the query. xDB applies the security filter. If configured, the Documentum security filter applies ACL and group permissions to results.
6. The results are returned in batches, with summary and facets.

Chapter 2 Managing the System

This chapter contains the following topics:
• Opening xPlore administrator
• Starting and stopping the system
• Viewing and configuring global operations (all instances)
• Managing instances
• Configuring system metrics
• Managing the status database
• Configuring the audit record
• Troubleshooting system problems
• Debugging queries with the xDB admin tool
• Modifying indexserverconfig.xml
• Customizations in indexserverconfig.xml
• Tasks performed outside xPlore administrator
• Administration APIs

Opening xPlore administrator

Most system administration tasks are available in xPlore administrator. When you open a service page, such as indexing service, the actions apply to all indexing services in the xPlore installation. To change the indexing service configuration for a specific instance, open the instance in the navigation tree and then choose the service.

1. Open your web browser and enter one of the following:
http://host:port/dsearchadmin
https://host:port/dsearchadmin
• host: DNS name of the computer on which the xPlore primary instance is installed.
• port: xPlore primary instance port (default: 9300).
Log in as the Administrator with the password that you entered when you installed the primary instance.
When you open xPlore administrator, you see the navigation tree and the system overview page. The xPlore administrator home page displays a navigation tree in the left pane and links to the four management areas in the content pane. From the tree, you can open administration pages for system-wide services, instance-specific services, data management, and diagnostics and troubleshooting.
2. Click System Overview in the left tree to get the status of each xPlore instance. From the system overview, click the grid symbol to see the following information for each xPlore instance: component status, JVM information, instance description, instance status, and runtime information.
3. Click Global Configuration to configure system-wide settings.

Starting and stopping the system

If you run a stop script, the Windows service may report xPlore as stopped before all processes have terminated, especially when the instance is in the middle of a final merge. To avoid this, given an xPlore instance with name NodeB, modify the stop script stopNodeB.sh (Unix or Linux), or its Windows counterpart, to add the following lines to the beginning of the file:
• Windows:
set dsearchAdmin=xPlore_Home\dsearch\admin
cd %dsearchAdmin%
call xplore.bat "stopNode 'NodeB'"
• Unix/Linux
export dsearchAdmin=xPlore_Home/dsearch/admin
cd $dsearchAdmin
./xplore.sh "stopNode 'NodeB'"
When you run the modified stop script to stop the xPlore instance, it always shuts down its components gracefully before shutting down the JBoss application server.

1. If you are stopping the primary instance, stop all other instances first.
2. In xPlore administrator, navigate to the instance in the tree and choose Stop instance or Start instance.
3. Start or stop the primary instance using the start or stop script in dsearch_home/jboss5.1.0/server. (On Windows, an automatic service is installed.) (If you did not stop secondary instances, they report a failed connection to the primary instance when you restart it.)

Viewing and configuring global operations (all instances)

A single xPlore federation is a set of instances with a single primary instance and optional secondary instances.
1. In xPlore administrator, click Home > System overview.
2. Select a service to view the status of each instance of the service.
3. Choose Global Configuration to configure the following system-wide settings:
• Storage Location. See Creating a collection storage location.
select System Overview in the left panel. run as the same administrator user who started the instance.cmd (Windows) or stopNodeB. When you stop an instance. set the service to manual and modify the stop script for each instance you want to stop. Start or stop secondary instances in xPlore administrator. See Backup in xPlore administrator. page 35 • Replacing a failed primary instance. page 33 • Using the watchdog service. used to set connections from additional instances. page 34 • Changing the host name and URL. page 140. page 38.. Configure incremental backups. Managing the System • Index Service. • Logging. this value is set to the port number of the JBoss connector + 31. • Auditing. See Auditing queries. • xdb-listener-port: By default. page 35 • Changing a failed instance into a spare. See Document processing and indexing service configuration parameters.xml. see Modifying indexserverconfig. search service. page 34 • Replacing a failed instance with a spare. <node name="primary" hostname="localhost" xdb-listener-port="9330"> . Configuring the primary instance You can set the following attributes on the primary instance element (node) in indexserverconfig. Managing instances • Configuring an instance. page 37 Configuring an instance You can configure the indexing service. page 204. the xDB listener port is set during xPlore installation.2 Administration and Development Guide 33 . and Configuring the audit record. For information on viewing and updating this file. or content processing service for a secondary instance.xml. EMC Documentum xPlore Version 1. page 42.. </node> • primaryNode attribute: Set to true. See Configuring logging. Default: 9331 • url: Specify the URL of the primary instance. Select the instance in xPlore administrator and then click Stop Instance. • admin-rmi-port: Specify the port at which other instances connect to xPlore administrator. See Search service configuration parameters. Troubleshooting data management. 
Requirements All instances in an xPlore deployment must have their host clocks synchronized to the primary xPlore instance host. • Engine. page 283. page 249. page 150. page 287. By default. • Search Service. the watchdog service can monitor all instances.select param_name. One watchdog service is installed on each xPlore host. To restart the watchdog service: On Windows hosts.l. The watchdog process starts at xPlore installation and when the host is booted up. To turn off the watchdog service: On Windows hosts. 34 EMC Documentum xPlore Version 1. 1. If you run a stop script. If a process such as the indexing or search service fails. 2. Use the object ID returned by the previous step to get the parameters and values and their index positions.l. For example: ?.c. On the node element. If the port was returned as the second parameter.param_value[2] SET>new_port save.l d. start the watchdog service: Documentum Search Services Watchdog.sh in dsearch_home/watchdog. the watchdog service detects the failure and sends an email notification to the administrator. run the script startWatchdog. run as the same administrator user who started the instance.Managing the System Using the watchdog service The xPlore watchdog service is a Windows service or daemon process that monitors and checks the status of various processes in xPlore. Changing the host name and URL Stop all xPlore instances. stop the watchdog service: Documentum Search Services Watchdog. For example: retrieve.c. run the script stopWatchdog.dm_ftengine_config set.c.c.xml in dsearch_home/config. On UNIX and Linux hosts.sh in dsearch_home/watchdog. 1. a.param_value[3] SET>new_hostname save. set the index to 2 as shown in the following example: set.c.c. On UNIX and Linux hosts. change the values of the hostname to have the new host name and URL. Restart the Content Server and xPlore instances. 2. Enter your new host name at the SET command line.2 Administration and Development Guide .l 3. 
This change takes effect when you restart the repository. Enter your new port at the SET command line. Edit indexserverconfig. It runs as a standalone Java process.c.dm_ftengine_config b. If a host has multiple xPlore instances. Primary instance only: Use iAPI to change the parameters for the host name and port in the dm_ftengine_config object. Do the following iAPI command: retrieve. param_value from dm_ftengine_config where r_object_id=’080a0d6880000d0d’ c. Change the value of the node-name property to PrimaryDsearch. Click Activate Spare Instance. Use shared storage for the spare.1.0/server. page 35.) a. Note the name of the node for the next step. Replacing a failed primary instance 1.xml. PrimaryDsearch. 4. Use the previous primary instance name. for example. The old instance can no longer be used for ingestion or queries. For information on changing a failed instance to spare. Edit indexserver-bootstrap. the spare instance becomes DSearchNode3. (The status attribute is set to spare. run as the same administrator user who started the instance. The failed instance is renamed with Failed appended. Edit indexserverconfig. index. Locate the node element for the old primary instance.1. 3. xPlore recovers failed data using the transaction log. b. page 42. Use xPlore administrator to activate a spare to replace a failed or stopped secondary instance.) If you run a stop script. Shut down all xPlore instances. Choose the instance to replace. 6. see Replacing a failed primary instance. When you activate a spare to replace another instance. Locate the spare node element in indexserverconfig. Set the status to normal.0/server/DctmServer_Spare/deploy/dsearch. EMC Documentum xPlore Version 1. Validate your changes using the validation tool described in Modifying indexserverconfig. Stop the failed instance. 5. Select the spare instance. The shutdown scripts are located in dsearch_home/jboss5. Instance2Failed. 3. When xPlore administrator reports success. 2. 4. 
each instance is installed as an automatic service. if you activated DSearchSpare to replace DSearchNode3. the spare instance is renamed in the UI with the replaced instance name. see Changing a failed instance into a spare. which is located in dsearch_home/config.properties in the web application for the new primary instance. c. b. When you activate the spare to take over a failed instance. For example.2 Administration and Development Guide 35 . and log directories must all be accessible to the primary instance. the data. Managing the System Replacing a failed instance with a spare You can install a spare instance using the xPlore installer. (On Windows. You cannot change an active instance into a spare instance.war/WEB-INF/classes. Open xPlore administrator and verify that the spare instance is running. When you install a spare instance. Change the value of the name attribute to the name of your previous primary instance.xml.xml. dsearch_home/jboss5. a. the spare takes on the identity of the old instance. Note: Do not change the value of appserver-instance-name. for example. If you are replacing a primary instance. Change the value of the primaryNode attribute to true. 2. page 37. for example. 1. Delete this node element. Change the value of the isPrimary property to true. Use iAPI to change the parameters for the host name and port in the dm_ftengine_config object.properties in the directory WEB-INF/classes of the new primary instance. a.xml). Change the port to match the port for the value of the attribute xdb-listener-port on the new instance. substitute 3 for the parameter index. Update xdb. • Change ESS_PORT to match the value of the port in the url attribute of the new primary instance (in indexserverconfig. Change the host name to match your new primary instance host.bat in dsearch_home/dsearch/xhive/admin.0/server/DctmServer_Indexagent/deploy/IndexAgent. For example: XHIVE_BOOTSTRAP=xhive://NewHost:9330 d. 
Use the object ID to get the parameters and values and their index positions. • Change ESS_HOST to the new host name. • Change the path for XHIVE_HOME to the path to the new primary instance web application.dm_ftengine_config b.dm_ftengine_config 36 EMC Documentum xPlore Version 1. For example: retrieve. 9. To set the port. If the port was returned as the third parameter in step 3. 11. then start the secondary instances.105\:9432 d. Start the xPlore primary instance. param_value from dm_ftengine_config where r_object_id=’080a0d6880000d0d’ c. a. Change parameter values for parameters that are defined in the element indexer_plugin_config/generic_indexer/parameter_list/parameter.168. 8. Your new values must match the values in indexserverconfig. enter your new port at the SET command line. Find the XHIVE_BOOTSTRAP entry and edit the URL to reflect the new primary instance host name and port. 7.1.2 Administration and Development Guide . b.Managing the System c. Edit xdb. Edit xDB.xml for the new primary instance. do the following iAPI command: retrieve. Update dm_ftengine_config on the Content Server.xml in dsearch_home/jboss5. • Change the parameter_value of the parameter dsearch_qrserver_host to the new host name.war/WEB-INF/classes.c.c. This change takes effect when you restart the repository. Edit indexserver-bootstrap. For example: ?. Update the index agent. 10. • Change the parameter_value of the parameter dsearch_qrserver_port to the new port. To find the port and host parameter index values for the next step. (This bootstrap file is not the same as the indexserver bootstrap file.properties in all other xPlore instances to reference the new primary instance. Shut down the index agent instance and modify indexagent.select param_name.) b.32.properties in all other xPlore instances to reference the new primary instance. for example: xhive-connection-string=xhive\://10. Change the value of xhive-connection-string to match the host of your new primary instance. a.c. 
c. For information on viewing and updating this file. if wait-timeout is set to 10. check dsearch. Modify indexserver-bootstrap. 1. Open indexserverconfig. Changing a failed instance into a spare Because the identity of a failed instance is assigned to another instance. Log4j.1.log in log4j.dm_ftengine_config set. 2. To set the host name.war from dsearch_home/setup/dsearch into your spare instance jboss5. Configuring system metrics Configure system metrics in indexserverconfig. add the following line to the system-metrics-service element.0/deploy directory.c. For example. page 42. Update the path to dsearchadminweb.c. system metrics are saved in batches of 100 every 5 seconds.properties is located in dsearchadmin.xml.xml.l.c.war/WEB-INF/classes. To change these values. Back up the federation.propertie in the WEB-INF/classes directory of the application server instance. see Modifying indexserverconfig. Use the JBoss script to stop the instance. page 283. 3. b.param_value[4] SET>new_hostname save. 4. enter your new host name at the SET command line: retrieve.c.xml in dsearch_home/config.1. The wait-timeout unit is seconds.” Then configure the new primary instance.c. If you must use the Windows service. Change the node element status attribute to spare.log on the old primary instance to find “The DSS instance PrimaryDsearch is shut down.l.xml. see Document processing and indexing service configuration parameters.l 12.2 Administration and Development Guide 37 . Managing the System set.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.l d. extract dsearchadmin. For information on the settings for indexing and CPS metrics. for example: dsearch_home/jboss5. the latest metrics are available about 10 seconds later (average 5 EMC Documentum xPlore Version 1. Troubleshooting primary failover • If you did not configure xPlore administrator when you set up the spare.param_value[3] SET>new_port save. 
The best way to confirm instance shutdown on Windows is to set the Windows xPlore service to manual. Restart the primary and then the secondary instances. Back up the xPlore federation. By default. Change the node element name attribute of the failed instance to a new unique name.war/WEB-INF/classes Change the node-name key value to the one you set in indexserverconfig. a. the identity of the failed instance must be changed.properties of the spare instance to a path on the spare instance. • A startup error FEDERATION_ALREADY_OPEN can be encountered when the old primary instance has not fully terminated before you replace it. Default: midnight (00:00:00) – audit-file-size-limit: Size limit units: K | M | G | T (KB.2 Administration and Development Guide . If batch size is reached before timeout. 1. Configuring the audit record In most cases. – lifespan-in-days: Specifies time period before audit record is purged. page 42. The batch size determines how many metrics are accumulated before they are saved to the system metrics database in xDB. For information on viewing and updating this file. You can configure the purge schedule: • auditing/location element: Specifies a storage path for the audit record. Default: 30 – preferred-purge-time: Specifies the time of day at which the audit record is purged. GB. In the following example.xml. – audit-file-rotate-frequency: Period in hours that a file serves as the active storage for the audit records. • audit-config element: Configures auditing. page 42. Control how much data is cached before the status DB is updated. you do not need to modify the default configuration in indexserverconfig. path. MB. TB). status. the cache size is set to 1000 instead of the default 10000 bytes: <node . <persistence-service batch-size="100" wait-timeout="10"/> Managing the status database The status database records the indexing status of ingested documents (success or failure).xml. 
Conserve disk space on the primary host: Purge the status database when the xPlore primary instance starts up. Default: 2G. see Modifying indexserverconfig. MB. • properties/property element: – audit-save-batch-size: Specifies how many records are batched before a save.xml. location. Set the value of the purge-statusdb-on-startup attribute on the index-server-configuration element to true. Format: hours:minutes:seconds in 24-hour time. Records over 30 days old are automatically purged. Open indexserverconfig. Note: Changes to format and location are not supported in this release. 2. Attributes: component. see Modifying indexserverconfig.. the batch is recorded.> <properties> <property value="1000" name="statusdb-cache-size"/> </properties> For information on viewing and updating this file. Size limit units: K | M | G | T (KB. TB). GB. Attributes: name.xml and set the statusdb-cache-size property for each instance. Default: 100.Managing the System seconds). size-limit. 38 EMC Documentum xPlore Version 1.. format. Use reports to query audit records. Connection refused Indexing fails when one of the xPlore instances is down. Click AuditDB and then click auditRecords. The error in dsearch. You set the mode for collection 2 to search_only and bind the collection to instance 2 and instance 3. When you configure an index agent for a repository. Managing the System Viewing the audit record The entire audit record can be large. </event> Troubleshooting system problems For troubleshooting installation. Change the JBoss startup (script or service) so that it starts correctly. If you run a stop script. see Modifying indexserverconfig. For information on viewing and updating this file. Original message: Connection refused Check the following causes for this issue: • Instance is stopped: A query that hits a document in a collection bound to the stopped instance fails. 
Timing problems: Login ticket expired All instances in an xPlore deployment must have their host clocks synchronized to the primary xPlore instance host. EMC Documentum xPlore Version 1. run as the same administrator user who started the instance. then full-text is not enabled in the Content Server.2 Administration and Development Guide 39 .235:9330 failed. you create collection 2 and upload a file to it. do the following: 1. Shut down all xPlore instances.. Reports allow you to specify a time period and other report criteria..xml with the new value of the URL attribute on the node element.xml. To view the entire audit record. 2. Stop instance 2 and query for a term in the uploaded file.log is like the following: CONNECTION_FAILED: Connect to server at 10. An audit record has the following format in XML: <event name=event_name component=component_name timestamp=time> <element-name>value</element_name> . Which indexing engine are you using? If there is no dm_ftengine_config object. full-text is automatically enabled. and restart. page 42.32. see Documentum xPlore Installation Guide. drill down to the AuditDB collection in Data Management > SystemData. Update indexserverconfig. synchronize clocks.112. and viewing can cause an out of memory error.xml. • The xPlore host name has changed: If you have to change the xPlore host name. For example. You can customize reports to query the audit record. c.dm_ftengine_config 080012a780002900 ?.select object_name from dm_ftengine_config where r_object_id=’080012a780002900’ DSearch Fulltext Engine Configuration I/O errors. These indexes are merged into a larger final index to help query response time. page 146. Switch to the 64-bit version of xPlore and allocate 4 GB of memory or more to the JVM. The final merge stage can require large amounts of memory.dm_fulltext_index 3b0012a780000100 ?.c.xhive. 
Substitute the object ID in the following command.bootstrap in dsearch_home/config.XhiveException: IO_ERROR: Failure while merging external indexes.xml to remove binding elements from the collection that has the issue. 40 EMC Documentum xPlore Version 1.dm_ftengine_config The dm_fulltext_index object attribute is_standby must be set to false (0). • I/O error indexing a large collection: Switch to the 64-bit version of xPlore and use 4+ GB of memory when a single collection has more than 5 million documents. Edit xhivedatabase. see Repairing a corrupted index. If memory is insufficient. DSearch Fulltext Engine Configuration is returned for xPlore. 4.c. Restart xPlore instances.select is_standby from dm_fulltext_index where r_object_id=’3b0012a780000100’ 0 Verify which engine is used To verify whether FAST or xPlore is being used.error. Edit indexserverconfig. If not. For information on viewing and updating this file.c. get the object_name attribute of the dm_ftengine_config object after you have retrieved the object. The FAST configuration object name contains FAST: retrieve. page 42.2 Administration and Development Guide . Shut down all xPlore instances. 2. or no such file or directory The following causes can result in this error: • Multiple instances of xPlore: Storage areas must be accessible from all other instances.Managing the System Verify full-text enabled Use the following iAPI command to get the object ID of the indexing engine: retrieve. you see an I/O error when you try to create a collection. see Modifying indexserverconfig. 3. Use the following cleanup procedure: 1. • I/O error during index merge: Documents are added to small Lucene indexes within a single collection. Change the binding node value to primary for segments that have this problem. Substitute your object ID: retrieve. Original message: Insufficient system resources exist to complete the requested service To fix a corrupted index.c. com. the merge process fails and corrupts the index.xml. 
Stop the application server. Debugging queries with the xDB admin tool Some query optimization and debugging tasks use the xDB admin tool. 4. You can move to a less-used virtual host or add more memory or cores to the virtual host. 2. save. Do one of the following: • Restart before changing the binding. updated pages are not written to the disk. Increase the values of -Xms and -Xmx. xPlore uses UTF-8 encoding. Edit the script that launches an xPlore server or index agent. EMC Documentum xPlore Version 1.xml using a simple text editor like Notepad.1. 1. Managing the System High-volume/low memory errors • Application server out of memory: Increase the default JVM heap size. which results in unexpected text errors. and restart the application server.xml can cause startup to fail. Windows: Each instance is installed as an automatic service. • Virtual environment: Virtual environments are sometimes under powered or other applications in the environment are overusing the resources. Restart the xPlore instances. If you restart the application server without reboot. located in dsearch_home/jboss5. For example. 3. Stop the service. Cannot change binding on a stopped instance If an xPlore instance is stopped or crashed. you cannot change the binding of collections on that instance to another instance. reboot the server before changing binding. and validate your changes using the validateConfigurationFile script in dsearch_home/dsearch/xhive/admin. Windows uses ISO-8859-1. edit the launch script. • If the instance has crashed.0/server.2 Administration and Development Guide 41 . If you edit indexserverconfig. and restart the service. Use an XML editor to edit the file. non-ASCII characters such as ü are saved in native (OS) encoding. Error on startup Non-ASCII characters in indexserverconfig. All changes to the file are immediately saved into the xDB file. 1. 2. and libraries. users. After login. If you remove segments. 
This file is loaded into xPlore memory during the bootstrap process. Expand the root library to find a library and a collection of interest. and optimize the query. debug the query. you cannot save the file with the same name. you see the tree in the left pane.txt extension. when you save the file. 4. 42 EMC Documentum xPlore Version 1. Click the connection icon to log in. Note: Do not edit this file in xDB. These rarely needed tasks require manual editing of indexserverconfig. Modifying indexserverconfig.xml. The query window has tabs that show the results tree. it is given a .bat or XHAdmin.xml with a file of the same name and extension.xml.2 Administration and Development Guide . Make your changes to this file using an XML editor. then highlight a particular indexed document to see its XML rendition. A simple text editor such as Notepad can insert characters using the native OS encoding. 5. This tool is not aware of xPlore configuration settings in indexserverconfig. groups. Be sure to replace indexserverconfig. because the changes are not synchronized with xPlore. Run the script XHAdmin. Query a library or collection with the search icon at the top of the xDB admin client. and the extension is not shown. 3. The password is the same as your xPlore administrator password. CAUTION: Do not use xhadmin to rebuild an index or change files that xPlore uses. On Windows 2008. Navigate to dsearch_home/dsearch/xhive/admin. which shows segments. your backups cannot be restored. By default. and it is maintained in parallel as a versioned file in xDB. Stop all instances in the xPlore federation.sh.xml Some tasks are not available in xPlore administrator.Managing the System Figure 7 xDB admin tool 1. 2. Changes must be encoded in UTF-8. This file is located in dsearch_home/config. causing validation to fail. and then click Text: Figure 8 XML content in xDB admin Customizations in indexserverconfig. and restart xPlore. Change it to a higher value. 4. When indexserverconfig. 
reapply your changes. Set the configuration change check interval as the value of config-check-interval on the root index-server-configuration element. You can view the content of each version using the xhadmin tool (see Debugging queries with the xDB admin tool.xml" 5. Troubleshooting You might change the file without effect after xPlore is restarted. Back up the xPlore federation after you change this file.xml using a forward slash. Validate your changes using the CLI validateConfigFile. 6. The system will check for configuration changes after this interval. page 41. Syntax: xplore validateConfigFile path_to_config_file For example: xplore validateConfigFile "C:/xPlore/config/indexserverconfig.xml is malformed. xPlore overwrites your changes. xPlore then resets the revision to the current version in xDB.xml.xml • Define and configure indexes for facets.. Check the value of the revision attribute of the index-server-configuration element in indexserverconfig. Managing the System 3.2 Administration and Development Guide 43 . Drill down to the dsearchConfig library. If the file was changed through xPlore administrator while it was open on your desktop. From the command line. Substitute your path to indexserverconfig. Restart the xPlore system to see changes. type the following. xPlore does not start. EMC Documentum xPlore Version 1. click a version. index. search X metrics (see Configuring system metrics. Register custom routing class X (see Documentum System Search Development Guide) 44 EMC Documentum xPlore Version 1. • Change the xPlore host name and URL. See Tracing.) Table 4 Tasks outside xPlore administrator Action indexserverconfig. Specify the indexes that are defined on the category and the XML elements that are not indexed.2 Administration and Development Guide . • Trace specific classes. tokenization.xml xDB Define/change a category of X documents (see Configuring categories. • Conserve disk space by purging the status database on startup. 
• Add and configure categories: Specifying the XML elements that have text extraction, tokenization, and storage of tokens. Change the collection for a category.
• Configure system, indexing, and search metrics.
• Specify a custom routing-class for user-defined domains.
• Change the xDB listener port and admin RMI port.
• Turn off lemmatization.
• Lemmatize specific categories or element content.
• Configure indexing depth (leaf node).
• Boost metadata and freshness in results scores.
• Add or change special characters for CPS processing.
• Set the security filter batch size and the user and group cache size.

Tasks performed outside xPlore administrator

Some xPlore administration tasks can be performed in xPlore administrator as well as in indexserverconfig.xml or xDB. Use xPlore administrator for those tasks. The following tasks are not common. They are performed in indexserverconfig.xml or xDB:

• Change primary instance or xPlore host (see Replacing a failed primary instance)
• Configure lemmatization (see Configuring query lemmatization)
• Configure indexing depth (see Configuring text extraction)
• Boost metadata and freshness (see Configuring scoring and freshness)
• Change special characters list (see Handling special characters)
• Add a custom dictionary (see Adding dictionaries to CPS)
• Trace specific classes (see Tracing)
• Define a subpath for facets (see Configuring facets in xPlore)
• Configure Documentum security properties (see Changing search results security)
• Disable system, indexing, and search metrics
• Purge the status DB: Set the value of the purge-statusdb-on-startup attribute on index-server-configuration to true.

Index agent tasks in the Documentum environment

The following index agent tasks are performed outside the index agent UI:

• Limit content size for indexing (Maximum document and text size).
• Exclude ACL and group attributes from indexing (Configuring the index agent after installation).
• Map file stores in shared directories (Sharing content storage).
• Install an additional index agent for ACLs and groups (Setting up index agents for ACLs and groups).
• Map partitions to specific collections (Mapping Server storage areas to collections).
• Verify index agent migration (Using ftintegrity).
• Customize indexing and query routing, filter object types, and inject metadata (see Route a document to a collection, Configuring index agent filters, and Injecting data and supporting joins).

Search configuration tasks in the Documentum environment

The following search configuration tasks are performed outside xPlore administrator.

• Turn off xPlore native security (see Managing Security).
• Make types and attributes searchable (Making types and attributes searchable).
• Turn off XQuery generation to support certain DQL operations (DQL, DFC, and DFS queries).
• Configure search for fragments, wildcards, and like terms (Configuring full-text wildcard (fragment) support).
• Route a query to a specific collection (Routing a query to a specific collection).
• Turn on tracing for the Documentum query plugin (Tracing Documentum queries).
• Customize facets and queries (see Facets).

Administration APIs

The xPlore Admin API supports all xPlore administrative functions. The Admin API provides you with full control of xPlore and its components.

Note: Administration APIs are not supported in this release. The information is provided for planning purposes.

Each API is described in the javadocs. Index service APIs are available in the interface IFtAdminIndex in the package com.emc.documentum.core.fulltext.client.admin.api.interfaces. This package is in the SDK jar file dsearchadmin-api.jar.

System administration APIs are available in the interface IFtAdminSystem in the package com.emc.documentum.core.fulltext.client.admin.api.interfaces in the SDK jar file dsearchadmin-api.jar.

Administration APIs are wrapped in a command-line interface tool (CLI). The syntax and CLIs are described in Automated backup and restore (CLI).

Open an admin connection

Create a façade implementation via FTAdminFactory. Parameters are String hostname or IP address, int port, String password. The following example uses the web service transport protocol to open a connection on the local xPlore instance. The password parameter is the xPlore administrator password, for example:

IFtAdminService srv = FTAdminFactory.getAdminService("127.0.0.1", 9300, "password");

Call an admin API

Invoke the admin API FTAdminFactory via the façade implementation in the package com.emc.documentum.core.fulltext.client.admin.api. The following example starts the primary instance:

IFtAdminService srv = FTAdminFactory.getAdminService("127.0.0.1", 9300, "mypassword");
srv.startNode("primary");

Configuration APIs

Configuration APIs are available in the interface IFtAdminConfig in the package com.emc.documentum.core.fulltext.client.admin.api.interfaces. This package is in the SDK jar file dsearchadmin-api.jar.

Engine configuration keys for setEngineConfig

• xhive-database-name: Managed by xDB.
• xhive-cache-pages: Number of pages to hold temporary data for all simultaneous xDB sessions. By default, xPlore sets the value to 25% of the maximum memory configured for the JVM.
• xhive-pagesize: Size in bytes of database pages, a power of 2. Usually matches filesystem pagesize.

Indexing configuration keys for setIndexConfig

• index-requests-max-size: Maximum size of internal index queue
• index-requests-batch-size: Maximum number of index requests in a batch
• index-thread-wait-time: Maximum wait time in milliseconds to accumulate requests in a batch
• index-threadpool-core-size: Minimum number of threads used to process a single incoming request. Valid values: 1 - 100.
• index-threadpool-max-size: Number of threads used to process a single incoming request. Valid values: 1 - 100.
• index-executor-queue-size: Maximum size of index executor queue before spawning a new worker thread
• index-executor-retry-wait-time: Wait time in milliseconds after executor queue and worker thread maximums have been reached

Status update keys

• status-requests-max-size: Maximum size of internal status queue
• status-requests-batch-size: Maximum number of status update requests in a batch
• status-thread-wait-time: Maximum wait time in milliseconds to accumulate requests in a batch
• status-threadpool-core-size: Minimum number of threads used to process a single incoming request. Valid values: 1 - 100.
• status-threadpool-max-size: Number of threads used to process a single incoming request. Valid values: 1 - 100.
• status-executor-queue-size: Maximum size of index executor queue before spawning a new worker thread
• status-executor-retry-wait-time: Wait time in milliseconds after executor queue and worker thread maximums have been reached

Search configuration keys for setGlobalSearchConfig

• query-default-locale: Default locale for queries. See the release notes for supported languages in this release.
• query-default-result-batch-size: Default size of result batches. Default: 200.
• query-result-cache-size: Default size of results buffer. When this limit is reached, no more results are fetched from xDB until the client asks for more results. Default: 400.
• query-result-spool-location: Location to spool results. Default: $Documentum/dss/spool
• query-default-timeout: Interval in milliseconds for a query to time out. Default: 60000
• query-threadpool-core-size: Minimum number of threads used to process incoming requests. Threads are allocated at startup, and idle threads are removed down to this minimum number. Valid values: 1 - 100. Default: 10.
• query-threadpool-max-size: Maximum number of threads used to process incoming requests. After this limit is reached, service is denied to additional requests. Valid values: 1 - 100. Default: 100.
• query-threadpool-queue-size: Maximum size of thread pool queue before spawning a new worker thread. Default: 0.
• query-threadpool-keep-alive-time: Interval after which idle threads are terminated. Default: 60000
• query-threadpool-keep-alive-time-unit: Unit of time for query-thread-pool-keep-alive-time. Default: milliseconds
• query-executor-retry-interval: Wait time in milliseconds after executor queue and worker thread maximums have been reached. Default: 100.
• query-executor-retry-limit: Number of times to retry query execution.
• query-thread-sync-interval: Interval after which results fetching is suspended when the result cache is full. For a value of 0, the thread waits indefinitely until space is available in the cache (freed up when the client application retrieves results). Default: 100 milliseconds.
• query-thread-max-idle-interval: Query thread is freed up for reuse after this interval, because the client application has not retrieved the result. (Threads are freed immediately after a result is retrieved.) Default: 3600000 milliseconds.
• query-summary-default-highlighter: Class that determines summary. Default: com.emc.documentum.core.fulltext.indexserver.services.summary.DefaultSummary
• query-summary-analysis-window: Size in bytes of initial window in which to search for query terms for summary. Default: 65536.
• query-summary-display-length: Size in bytes of summary to display.
• query-summary-highlight-begin-tag: HTML tag to insert at beginning of summary.
• query-summary-highlight-end-tag: HTML tag to insert at end of summary.
• query-enable-dynamic-summary: If context is not important, set to false. This returns as a summary the first n chars defined by the query-summary-display-length configuration parameter. No summary calculation is performed. For summaries evaluated in context, set to true (default).
• query-index-covering-values: The values specified in the path attribute are used for aggregate queries. These values are pulled from the index and not from the data pages.
• query-facet-max-result-size: Documentum only. Sets the maximum number of results used to compute facet values. For example, if query-facet-max-result-size=12, only 12 results are used to compute facets. If a query has many facets, the number of results per facet is reduced accordingly.
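The thread pool keys listed above share a documented range of 1 - 100. The following is a minimal sketch of a pre-flight check that could be run on proposed values before they are passed to the configuration APIs. It is not part of the xPlore SDK: the function name check_threadpool_settings is hypothetical, and the rule that a core size should not exceed the matching max size is an assumption rather than something the guide states.

```python
# Hypothetical helper, not part of the xPlore SDK.
# Keys and the 1 - 100 range are taken from the configuration key lists above.
THREADPOOL_KEYS = (
    "index-threadpool-core-size",
    "index-threadpool-max-size",
    "status-threadpool-core-size",
    "status-threadpool-max-size",
    "query-threadpool-core-size",
    "query-threadpool-max-size",
)


def check_threadpool_settings(settings):
    """Return a list of problems found in a dict of key -> int value."""
    problems = []
    # Documented rule: each thread pool size must be between 1 and 100.
    for key in THREADPOOL_KEYS:
        if key in settings and not 1 <= settings[key] <= 100:
            problems.append(
                f"{key}={settings[key]} is outside the documented range 1 - 100"
            )
    # Assumed sanity rule (not stated in the guide): core size <= max size.
    for prefix in ("index", "status", "query"):
        core = settings.get(f"{prefix}-threadpool-core-size")
        maximum = settings.get(f"{prefix}-threadpool-max-size")
        if core is not None and maximum is not None and core > maximum:
            problems.append(
                f"{prefix}: core size {core} exceeds max size {maximum}"
            )
    return problems


if __name__ == "__main__":
    proposed = {"query-threadpool-core-size": 10, "query-threadpool-max-size": 100}
    for problem in check_threadpool_settings(proposed):
        print(problem)
```

Keys that are absent from the dictionary are skipped, so the check can be applied to a partial set of changes.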
Chapter 3 Managing Security

This chapter contains the following topics:
• Changing search results security
• Manually updating security
• Configuring the security cache
• Troubleshooting security

xPlore does not have a security subsystem. Anyone with access to the xPlore host port can connect to it. You must secure the xPlore environment using network security components such as a firewall and restriction of network access. Secure the xPlore administrator port and open it only to specific client hosts.

Changing search results security

Documentum repository security is managed through individual and group permissions (ACLs). By default, security is applied to results before they are returned to the Content Server (native xPlore security), providing faster search results. xPlore security minimizes the result set that is returned to the Content Server.

Content Server queues changes to ACLs and groups. The queue sometimes causes a delay between changes in the Content Server and propagation of security to the search server. If the index agent has not yet processed a document for indexing or updated changes to a permission set, users cannot find the document.

For DQL queries only, you can turn on Content Server security filtering to eliminate latency and support complete transactional consistency.

To turn on security filtering in the Content Server for DQL queries:
1. Open the iAPI tool from the Documentum Server Manager on the Content Server host or in Documentum Administrator.
2. To check your existing security mode, enter the following command:
   retrieve,c,dm_ftengine_config
   get,c,l,ftsearch_security_mode
3. Enter the following command to turn off filtering. Note lowercase L in the set and save commands:
   retrieve,c,dm_ftengine_config
   set,c,l,ftsearch_security_mode
   0
   save,c,l
   reinit,c
4. Restart all xPlore instances.

Using both security filters

You can configure xPlore and the Content Server to use both security filters. With this option, results are not returned to the Content Server unless the user has permissions. After xPlore security is applied, the results are filtered again in the Content Server for changes to permissions that took place after the security update in xPlore. (This option does not apply to the DFC search service.)
1. Make sure that xPlore security is configured (default).
2. Use the following iAPI command:
   retrieve,c,dm_ftengine_config
   append,c,l,param_name
   acl_check_db
   append,c,l,param_value
   T
   save,c,l

Manually updating security

When you set the index agent UI to migration mode, ACLs and groups are indexed. They are updated when a Save, Save as new, or Destroy event on an ACL or group takes place in the repository.

ACL (dm_acl) and group (dm_group) objects are stored in XML format in the xPlore xDB. The XML format is ACLXML for ACLs and GroupXML for groups. The XML for ACLs and groups is stored as a collection in xDB: domain_name/dsearch/ApplicationInfo/acl or /ApplicationInfo/group.

The security filter is applied in xDB to filter search results per batch. The security filter receives the following information: User credentials, minimum permit level, whether MACL is enabled, privileges, and dynamic state of the user (group membership).

You can manually populate or update the ACL and group information in xPlore for the following use cases:
• You are testing Documentum indexing before migration.
• You start using xPlore in a repository that has no full-text system.
• You deploy a Documentum application that creates ACLs and groups.

1. Locate the script aclreplication_for_repositoryname.bat or .sh in dsearch_home/setup/indexagent/tools.
2. Edit the script before you run it. Locate the line beginning with "%JAVA_HOME%\bin\java". Set the repository name, repository user, password, xPlore primary instance host, xPlore port, and xPlore domain (optional).
3. Dual mode (FAST and xPlore on two Content Servers) only: Set the parameter -ftEngineStandby to true:
   -ftEngineStandby T
   For example:
   ... com.documentum.server.impl.method.fulltext.aclreplication.FTIndexACLGroups dsearch11 batchm mypwd localhost 9300 myContentServer -ftEngineStandby F

Alternatively, you can run the ACL replication job dm_FTACLReplication in Documentum Administrator. (Do not confuse with the dm_ACLReplication job.)

Table 5 ACL replication job arguments

Argument               Description
-acl_where_clause      DQL where clause to retrieve dm_acl objects
-ftengine_standby      Dual mode (FAST and xPlore, two Content Servers). Set to T to replicate ACLs to standby (xPlore) dm_ftengine_config object (second Content Server). Set to F to replicate ACLs to the FAST Content Server.
-group_where_clause    DQL where clause to retrieve dm_group objects
-max_object_count      Number of dm_acl and dm_group objects to be replicated. If not set, all objects are replicated.
-replication_option    Valid values: dm_acl, dm_group or both (default)
-verbose               Set to true to record replication status for each object in the job report. Default: false

Note: The arguments -dsearch_host, -dsearch_port, and -dsearch_domain are no longer supported in xPlore.

Configuring the security cache

Increase the cache size for large numbers of users or large numbers of groups that users can belong to. Entries in the caches are replaced on a first-in, first-out (FIFO) basis. You set the following properties for security:
• groups-in-cache-size: Number of groups a user belongs to. Global LRU cache that is shared between search sessions. Default: 1000.
• not-in-groups-cache-size: Number of groups that a user does not belong to. Per-query LRU cache. Default: 1000.
• acl-cache-size: Number of users in the cache. Per-query LRU cache that contains ACLs and granted permissions for users. Default: 400.
• batch-size: Size of batches sent for search results security filtering. Default: 800.
• max-tail-recursion-depth: Sets the maximum number of subgroup members of a group, to prevent runaway security recursion. Default: 10000. If you encounter the error "XQUERY_ERROR_VALUE: Tail recursive function" you can edit the property to a value greater than 10000.

1. Stop all xPlore instances.
2. Edit indexserverconfig.xml. For information on viewing and updating this file, see Modifying indexserverconfig.xml.
3. Change the size of a cache in the security-filter-class element:
   <security-filter-class name="documentum" default="true" class-name="com.emc.documentum.core.fulltext.indexserver.services.security.SecurityJoin">
     <properties>
       <property name="groups-in-cache-size" value="1000"/>
       <property name="not-in-groups-cache-size" value="1000"/>
       <property name="acl-cache-size" value="400"/>
       <property name="batch-size" value="800"/>
       <property value="10000" name="max-tail-recursion-depth"/>
     </properties>
   </security-filter-class>
4. If necessary, change the Groups-in cache cleanup interval by adding a property to the security-filter-class properties. The default is 7200 sec (2 hours).
   <property name="groupcache-clean-interval" value="7200"/>
5. Validate your changes using the validation tool described in Modifying indexserverconfig.xml.

Troubleshooting security

Viewing security in the log

Check dsearch.log using xPlore administrator. Choose an instance and click Logging. Click dsearch log to view the following information:
• The XQuery expression. For example, search for the term default:
  <message><![CDATA[QueryID=PrimaryDsearch$a44343c2-2b4e-4eb7-b787-aee904339c13,query-locale=en,query-string=let $j:= for $i score $s in /dmftdoc[. ftcontains 'default'] order by $s descending return <d> {$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name } { $i/dmftmetadata//r_modifier } </d> return subsequence($j,1,200) is running]]></message>
• Security filter applied. For example:
  <message><![CDATA[Security Filter invoked]]></message>
• Security filtering statistics. For example:
  <message Minimum-permit-level="2" Total-group-probes="0" Filter-output="38" Total-values-from-index-keys="0" QueryID="PrimaryDsearch$cbad31f6-d106-4b42-9506-ba36719a7c41" Total-values-from-data-page="0" Filter-input="38" Total-not-in-groups-cache-hits="0" Total-matching-group-probes="0" Total-ACL-index-probes="0" Total-groups-in-cache-hits="0" Total-ACL-cache-hits="0"><![CDATA[]]></message>

When search auditing is turned on (default), the following information is saved in dsearch.log:
• Minimum-permit-level: Returns the user's minimum permit level for results. Levels: 0 = null | 1 = none | 2 = browse | 3 = read | 4 = relate | 5 = version | 6 = write | 7 = delete
• Filter-input: Number of results returned before security filtering.
• Filter-output: Total number of hits after security has filtered the results.
• QueryID: Generated by xPlore to uniquely identify the query.
• Total-values-from-index-keys: Number of index hits on owner_name, acl_name and acl_domain for the document.
• Total-values-from-data-page: Number of hits on owner_name, acl_name and acl_domain for the document retrieved from the data page.
• Total-ACL-cache-hits: Number of times the ACL cache contained a hit.
• Total-ACL-index-probes: How many times the query added an ACL to the cache.
• Total-groups-in-cache-hits: Number of times the group-in cache contained a hit.
• Total-not-in-groups-cache-hits: Number of times the groups-out cache contained a hit (groups the user does not belong to)
• Total-group-probes: Total number of groups checked for user
• Total-matching-group-probes: How many times the query added a group to the group-in cache.

In the following example from the log, the query returned 2200 hits to filter. Of these hits, 2000 were filtered out, returning 200 results to the client application. The cache was filled with 3 entries. The not-in-groups cache was probed 30 times for this query, for groups that the user did not belong to:
<USER_NAME>tuser4</USER_NAME>
<TOTAL_INPUT_HITS_TO_FILTER>2200</TOTAL_INPUT_HITS_TO_FILTER>
<HITS_FILTERED_OUT>2000</HITS_FILTERED_OUT>
<GROUP_IN_CACHE_HIT>0</GROUP_IN_CACHE_HIT>
<GROUP_OUT_CACHE_HIT>30</GROUP_OUT_CACHE_HIT>
<GROUP_IN_CACHE_FILL>0</GROUP_IN_CACHE_FILL>
<GROUP_OUT_CACHE_FILL>3</GROUP_OUT_CACHE_FILL>

Verifying security settings in the Content Server

Use iAPI to verify that dm_fulltext_index_user is registered to receive events for security updates (changes to dm_acl and dm_group) with the following commands. They return at least one ACL object ID and one group object ID:
?,c,select r_object_id from dm_type where name='dm_acl'
?,c,select r_object_id from dm_type where name='dm_group'

Verify that the ACL IDs are registered for the events dm_save, dm_destroy, and dm_saveasnew. Verify that the group IDs are registered for the events dm_save and dm_destroy, for example:
?,c,select registered_id,event from dmi_registry where user_name='dm_fulltext_index_user'

Changing the administrator password

Perform the following steps to reset the xPlore administrator password. All xPlore instances must share the same instance owner and password.
1. Reset the password from the xDB admin tool.
   a. Navigate to dsearch_home/dsearch/xhive/admin.
   b. Run the script XHAdmin.bat or XHAdmin.sh.
   c. Click the connection icon to log in. The password is the same as your xPlore administrator password.
   d. In the Federation menu, choose Change superuser password. Enter the old and new passwords.
   e. In the Database menu, choose Reset admin password. Use the new superuser password that you created, and set the admin password. They can be the same.
2. Stop all xPlore instances.
3. Edit indexserver-bootstrap.properties of each instance. The file is located in dsearch_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes. Enter the adminuser-password and superuser-password that you created in the xDB admin tool.
4. Restart all xPlore instances. The passwords are now encrypted.
5. If you use CLI for backup and restore, edit the password in xplore.properties. This file is located in dsearch_home/dsearch/admin. Copy the encrypted password from indexserver-bootstrap.properties.
6. Change the JBoss password. Copy the new, encrypted password from indexserverconfig.xml to the following locations:
   • dsearch_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/conf/props/jmx-console-users.properties:
     admin=new_password
   • dsearch_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/management/console-mgr.sar/web-console.war/WEB-INF/classes/web-console-users.properties:
     admin=new_password
   • dsearch_home/installinfo/instances/dsearch/PrimaryDsearch.properties:
     ess.instance.password.encrypted=new_password
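The security filtering statistics described under Viewing security in the log are written as attributes of a message element, so they can be summarized with a short script. The following is a minimal sketch that assumes a single message element has already been extracted from dsearch.log as a string. The function name summarize_security_stats and the shape of the returned dictionary are hypothetical. Only the attribute names and permit levels come from this chapter.

```python
import xml.etree.ElementTree as ET

# Permit levels as listed for the Minimum-permit-level attribute.
PERMIT_LEVELS = {
    0: "null", 1: "none", 2: "browse", 3: "read",
    4: "relate", 5: "version", 6: "write", 7: "delete",
}


def summarize_security_stats(message_xml):
    """Summarize one security-statistics <message> element (hypothetical helper)."""
    attrs = ET.fromstring(message_xml).attrib
    filter_input = int(attrs["Filter-input"])
    filter_output = int(attrs["Filter-output"])
    level = int(attrs["Minimum-permit-level"])
    return {
        "query_id": attrs.get("QueryID", "").strip(),
        "min_permit_level": PERMIT_LEVELS.get(level, str(level)),
        "filter_input": filter_input,
        "filter_output": filter_output,
        # Hits removed by the security filter for this query.
        "filtered_out": filter_input - filter_output,
    }


if __name__ == "__main__":
    sample = (
        '<message Minimum-permit-level="2" Filter-output="38" '
        'QueryID="PrimaryDsearch$cbad31f6-d106-4b42-9506-ba36719a7c41" '
        'Filter-input="38" Total-ACL-cache-hits="0"><![CDATA[]]></message>'
    )
    print(summarize_security_stats(sample))
```

A large gap between filter_input and filter_output for many queries suggests that most hits are being discarded by security filtering, which is the situation the cache settings in Configuring the security cache are meant to help with.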
The script names contain the index agent server name that you specified when you configured the index agent.6) • Configuring index agent filters • Migrating documents • Using ftintegrity • ftintegrity output • ftintegrity result files • Running the state of index job • state of index and ftintegrity arguments • Indexing documents in normal mode • Resubmitting documents for indexing • Removing entries from the index • Mapping content to collections • Indexing metadata only • Making types non-indexable • Configuring the index agent after installation • Setting up index agents for ACLs and groups • Documentum attributes that control indexing • Injecting data and supporting joins • Custom content filters • Troubleshooting the index agent Starting the index agent To start or stop the index agent service. Default: Indexagent.2 Administration and Development Guide 55 . Chapter 4 Managing the Index Agent This chapter contains the following topics: • Starting the index agent • Installing index agent filters (Content Server 6. • Windows EMC Documentum xPlore Version 1.5 SPX or 6. Start the instance. run the following commands.) 1. enter the user name and password for a valid repository user and optional xPlore domain name. c. the filters are already installed.2 Administration and Development Guide . 3. Choose one of the following: • Start Index Agent in Normal Mode: The index agent will index content that is added or modified after you start. maximum 8).5 SPX or 6.0/server/startIndexagent. Reset the CPS daemon_count to 1 and connection_pool_size to 4 after reindexing is complete.jsp. You can install index agent filters that exclude cabinets. • host is the DNS name of the machine on which you installed the index agent. Change the value of connection_pool_size to 2 e. Reindexing only: Temporarily configure multiple CPS daemons for reindexing. b.1. Edit the CPS configuration file in the CPS host directory dsearch_home/dsearch/cps/cps_daemon.sh or stopIndexagent. 
In the login page. Filters and custom routing are applied. f. d. Installing index agent filters (Content Server 6. • Start new reindexing operation: All content in the repository is indexed (migration mode) or reindexed. or indexagent_home\jboss5.1. The date and time you stopped is displayed. Note: Do not exclude folders if users make folder descend queries.Managing the Index Agent The Documentum Index_agent Windows service. Viewing index agent details Start the index agent and click Details. Choose Instances > Instance_name > Content Processing Service and click Stop CPS.0\server\startIndexagent. return to the previous screen and click Refresh. You see accumulated statistics since last index agent restart and objects in the indexing queue. Use your browser to start the index agent servlet. Restart all xPlore instances.cmd. • Continue: If you had started to index this repository but had stopped.sh. Stop the CPS instance in xPlore administrator. folders. 2. http://host:port/IndexAgent/login_dss. 4. then view Details again.jsp Every index agent URL has the same URL ending: IndexAgent/login_dss.7. Change the value of element daemon_count to 3 or more (default: 1.cmd or stopIndex_agent. Start the index agent UI. • Linux indexagent_home/jboss5. If you are connecting to Content Server version 6. or object types from indexing. 56 EMC Documentum xPlore Version 1. To refresh statistics. Proceed to the next step in this task. a.6) Documents indexed before the filters are installed are not filtered. Only the port and host differ. start indexing. 5. • port is the index agent port number that you specified during configuration (default: 9200). d.2 Administration and Development Guide 57 . Copy IndexAgentDefaultFilters. page 58. • Use the following DQL statement.0.dar. Specify the full path to IndexAgentDefaultFilters. a list of object IDs and names of the filters is returned: select r_object_id. 
This package is installed with Content Server at $DOCUMENTUM/product/version/install/composer/ComposerHeadless.dar including the file name.693 INFO FileConfigReader [http-0. 5.xml in the temporary working directory (excluding the file name) as the value of BUILDFILE. c.emc.properties is located in the subdirectory plugins/com. Find the following lines and verify that the IP address and port of the connection broker for the target repository are accurate.sh (Unix or Linux) to install the filters. Specify the repository superuser password as the value of the password attribute.fc.install dar="C:\Downloads\tempIndexAgentDefaultFilters.dar" docbase="DSS_LH1" username="Administrator" password="password" /> 3. Edit DarInstall. Specify the repository superuser name as the value of the username attribute. Managing the Index Agent 1. the FoldersToExclude filter was loaded: 2010-06-09 10:49:14. The home directory on the index agent host is referred to as indexagent_home. Specify a workspace directory for the generated Composer files. c.IDfCustomIndexFilter’ • Verify filter loading in the index agent log. In the following example. Specify your repository name as the value of the docbase attribute. (The xPlore installer installs the index agent.0/documentum.docbroker.0.sh (Linux or Unix) a. Specify the path to the file DarInstall.bat or DarInstall. for example: set ECLIPSE="C:\Documentum\product\6.0.external. Edit DarInstall. which is located in the logs subdirectory of the index agent JBoss deployment directory.port[N]=connection_broker_port EMC Documentum xPlore Version 1. DarInstall. /System/Sysadmin/Jobs.host[N]=connection_broker_ip_address dfc. and DarInstall. Specify the path to the composerheadless package as the value of ECLIPSE. dfc. Launch DarInstall.xml: a.documentum. b. as the value of the dar attribute.docbroker.config. Configure the filters in the index agent UI.bat (Windows) or DarInstall.) 2. 
For example: <emc.0-9820-1] Filter FoldersToExclude Value:/Temp/Jobs.5\install\composer\ComposerHeadless" set BUILDFILE="C:\DarInstall\temp" set WORKSPACE="C:\DarInstall\work" 4.bat (Windows) or DarInstall. The file dfc.xml from indexagent_home/setup/indexagent/filters to a temporary install directory.properties in the composerheadless package. /System/Sysadmin/Reports. See Configuring index agent filters. 6.sh.indexagent. If the filters are installed.dfc_1. Troubleshooting the index agent filters Open dfc.object_name from dmc_module where any a_interfaces=’ com. b.ide. Test whether the filters are installed. You do not have to stop the index agent JVM. page 70. security indexing in xPlore fails. 2. To remove document content from the index. In the Current Filters section. • Remove folder: List one or more folder paths to remove from the filter. filters are recorded in the index agent log. If you do not add this check. with a comma separator. For details.0/server/DctmServer_Indexagent/logs. page 59 58 EMC Documentum xPlore Version 1. with a comma separator. In the Update filters section. with a comma separator./Temp2 • Add folder(s): List one or more folder paths to filter out before indexing. Both scripts generate a file ObjectId-filtered-out.cabinet2 • Add cabinet: List one or more cabinets to filter out before indexing. You can create a custom index agent BOF filter that implements IDfCustomIndexFilter. Subtypes are not filtered out. For example: /Temp1/subfolder.2 Administration and Development Guide . with a comma separator. • Remove cabinet(s): List one or more cabinets to remove from the filter.type2. see How to make custom IA filter class with FAST compatible with xPlore. These types are indexed. 1. For example: type1. with a comma separator. folder paths.txt that records all IDs of filtered-out objects.1. Click Check or update filter settings. page 60. • Folders to exclude: Objects in the folder path that are filtered out before indexing. 
located in dsearch_home/jboss5.type3 • Add type: List one or more object types to filter out before indexing. page 65. Stop the index agent in the IA UI before you configure the filters. • Cabinets to exclude: Cabinets that are filtered out before indexing. If you are migrating a FAST index agent filter. or cabinets to exclude before indexing: • Types to exclude: Object types that are filtered out before indexing. you can modify the existing filters: • Remove type: List one or more object types to remove from the filter. Note: Do not exclude folders or cabinets if users make folder descend queries. your filter class must test for dm_sysobject. 3.Managing the Index Agent Invoking the filters in ftintegrity and state of index See Using ftintegrity. Base the filter on a date attribute. 4. see Injecting data and supporting joins. For example: cabinet1. xPlore index agent filters. with a comma separator. Does not include subtypes. you can set up filters for object types. see Removing entries from the index. Migrating documents • Migrating content (reindexing). For information on creating a BOF filter. When the index agent starts. Configuring index agent filters All index agents in a repository share the filter configuration. perform the following steps in migration mode.0/server/DctmServer_indexagent/deploy/IndexAgent.0/server or using the Windows services panel. 6. 2. Log in to the index agent UI and choose Start new reindexing operation. Migrating a limited set of documents If you wish to migrate few object types.xml. Migrating documents by object type To migrate or reindex many documents of a specific object type. 1. Update the indexagent. This indexes all indexable documents in the repository except for the documents excluded by the index agent filter. See Manually updating security . page 59 Migrating content (reindexing) Note: You cannot filter content after it has been indexed. 4. Set allow-move-existing-documents to false in index-config. 
Run the aclreplication script to update permissions for users and groups in xPlore. Start the index agent and log in to the UI. Substitute the object type name for the value of the parameter_value element.xml. page 65. 3. you can use the index agent UI. Edit indexagent. Stop and restart the index agent using the scripts in indexagent_home/jboss5. no more documents in the queue). Instead. EMC Documentum xPlore Version 1. Managing the Index Agent • Migrating documents by object type. For information on removing documents from the index. click Stop IA.war/WEB-INF/classes. which allows a restart of the indexing process. Updates are performed for existing documents. page 50. do not use the DQL or object ID file in the index agent UI. 2. 1.2 Administration and Development Guide 59 . Add a parameter_list element with the following content to the indexagent_instance element. located in indexagent_home/jboss5. Custom routing is applied. <parameter_list> <parameter> <parameter_name>type_based_migration</parameter_name> <parameter_value>TYPE_NAME_HERE</parameter_value> </parameter> </parameter_list> Note: The parameter_list element can contain only one parameter element.1. Choose Start new reindexing operation. 5. Index agent filters are applied in this migration. To skip custom routing for documents that are already indexed. see Removing entries from the index. edit indexserverconfig. page 59 • Migrating a limited set of documents. 7. When indexing has completed (on the Details page.xml file to index another type or change the parameter_value to dm_document. see Migrating documents by object type.1. page 59. To migrate a large set of objects by type. comparing the object ID and i_vstamp between the repository and xPlore.sh (Linux) and edit the script. page 65. or -EndDate. Repeat for each type that you want indexed. or collection. The value of this parameter is the full path to a file that contains sysobject IDs. Check Index selected list of objects. 
Optional: Add the option -checkfile to the script.. see Removing entries from the index. 60 EMC Documentum xPlore Version 1. This option compares the i_vstamp on the ACL and any groups in the ACL that is attached to each object in a specified list.. Substitute the repository instance owner password in the script (replace <password> with your password). -StartDate. To remove from the index documents that have already been indexed.. page 50..Managing the Index Agent 1. 5. page 61 Running the state of index job. domain. Start the index agent in normal mode. Groups that are out of sync are listed in ObjectiD-dctmOnly. -CheckType.7) are used to verify indexing after migration or normal indexing. The tool is a standalone Java program that checks index integrity against repository documents. page 62 state of index and ftintegrity arguments. page 63 ftintegrity and the state of index job (in Content Server 6.txt as the content of the elements acl_name|domain/acl i_vstamp in docbase/acl i_vstamp in xDB. Note: ftintegrity can be very slow. 3. page 61 ftintegrity result files. Use these instructions for running the index verification tool after migration or restoring a federation. For example: . Do not run ftintegrity when an index agent is migrating documents.. If this option is used with the option -checkUnmaterializeLWSO. 2. Select the type in the From dropdown list. Using ftintegrity ftintegrity output. Navigate to dsearch_home/setup/indexagent/tools. 1.FTStateOfIndex DSS_LH1 Administrator mypassword Config8518VM0 9300 -checkfile . Run ftintegrity as the same administrator user who started the instance. 2. 4. these latter options are not executed. one on each line.2 Administration and Development Guide . ACLs that are out of sync are listed in ObjectId-common-version-mismatch. Open the script ftintegrity_for_repositoryname. Replicate ACLs and groups to xPlore by running the aclreplication script. The tool automatically resolves all parameters except for the password. 
ftintegrity output

Output from the script is like the following:

    Executing stateofindex
    Connected to the docbase D65SP2M6DSS
    2011/03/14 15:41:58:069 Default network framework: http
    2011/03/14 15:41:58:163 Session Locale:en
    2011/03/14 15:41:59:913 fetched 1270 object from docbase for type dm_acl
    2011/03/14 15:41:59:913 fetched 1270 objects from xPlore for type dm_acl
    2011/03/14 15:42:08:428 fetched 30945 object from docbase for type dm_sysobject
    2011/03/14 15:42:08:428 fetched 30798 objects from xPlore for type dm_sysobject
    2011/03/14 15:42:08:756 fetched 347 object from docbase for type dm_group
    2011/03/14 15:42:08:756 fetched 347 objects from xPlore for type dm_group
    2011/03/14 15:42:09:194 **** Total objects from docbase : 32215 ****
    2011/03/14 15:42:09:194 **** Total objects from xPlore : 32068 ****
    2011/03/14 15:42:09:194 3251 objects with match ivstamp in both DCTM and Index Server
    2011/03/14 15:42:09:194 17 objects with different ivstamp in DCTM and Index Server
    2011/03/14 15:42:09:194 147 objects in DCTM only
    2011/03/14 15:42:09:194 0 objects in Index Server only
    ftintegrity is completed.

Interpreting the output:

• objects from dm_acl and dm_group: Numbers fetched from repository (docbase) and xPlore. In the example, the ACLs and groups totals were identical in the repository and xPlore, so security is updated.
• match ivstamp: Objects that have been synchronized between Content Server and xPlore.
• different ivstamp: Objects that have been updated in Content Server but not yet updated in the index.
• objects in DCTM only: These objects are in the repository but not xPlore for one or more of the following reasons:
  – Objects failed indexing.
  – New objects not yet indexed.
  – Objects filtered out by index agent filters.
  There are 147 objects in the repository that are not in the xPlore index. They were filtered out by index agent filters, or they are objects in the index agent queue that have not yet been indexed. To eliminate filtered objects from the repository count, add the usefilter argument to ftintegrity (slows performance).
• objects in Index Server only: Any objects here indicate objects that were deleted from the repository but the updates have not yet propagated to the index.

ftintegrity result files

The script generates four results files in the tools directory:

• ObjectId-common-version-match.txt: This file contains the object IDs and i_vstamp values of all objects in the index and the repository and having identical i_vstamp values in both places.
• ObjectId-common-version-mismatch.txt: This file records all objects in the index and the repository with identical object IDs but nonmatching i_vstamp values. For each object, it records the object ID, i_vstamp value in the repository, and i_vstamp value in the index. The mismatch is on objects that were modified during or after migration. You can resubmit this list after you start the index agent in normal mode. Click Object File and browse to the file.
• ObjectId-dctmOnly.txt: This report contains the object IDs and i_vstamp values of objects in the repository but not in the index. These objects could be documents that failed indexing, documents that were filtered out, or new objects generated in the repository during or after migration. You can resubmit this list after you start the index agent in normal mode. Click Object File and browse to the file.
  To check whether filters were applied during migration, run the following DQL query. If one or more rows are returned, a filter was applied.

    select r_object_id,object_name,primary_class from dmc_module
    where any a_interfaces='com.documentum.fc.indexagent.IDfCustomIndexFilter'

• ObjectId-indexOnly.txt: This report contains the object IDs and i_vstamp values of objects in the index but not in the repository. These objects were removed from the repository during or after migration, before the event has updated the index.

You can input the ObjectId-common-version-mismatch.txt file into the index agent UI to see errors for those files. After you have started the index agent, check Index selected list of objects and then check Object file. Navigate to the file and then choose Submit. Open xPlore Administrator > Reports and choose Document processing error summary. The error codes and reasons are displayed.
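The four summary counters in the ftintegrity output can be pulled out of a captured log with a small parser, which is convenient when the check is run repeatedly after a migration. The following is a minimal sketch, not part of xPlore: the class name FtIntegritySummary and its methods are hypothetical, and the line wording it matches is taken only from the sample output in this guide.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not shipped with xPlore): reads the summary counters from ftintegrity output. */
class FtIntegritySummary {
    // Matches summary lines such as "... 147 objects in DCTM only".
    private static final Pattern COUNT_LINE = Pattern.compile(
        "(\\d+) objects (with match ivstamp|with different ivstamp|in DCTM only|in Index Server only)");

    /** Returns the counters found in the log text, keyed by the phrase that follows "objects". */
    static Map<String, Integer> parse(String log) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = COUNT_LINE.matcher(log);
        while (m.find()) {
            counts.put(m.group(2), Integer.parseInt(m.group(1)));
        }
        return counts;
    }

    /** True when no object is stale, missing from the index, or orphaned in the index. */
    static boolean inSync(Map<String, Integer> counts) {
        return counts.getOrDefault("with different ivstamp", 0) == 0
            && counts.getOrDefault("in DCTM only", 0) == 0
            && counts.getOrDefault("in Index Server only", 0) == 0;
    }
}
```

For the sample output, parse reports 17 objects with a different i_vstamp and 147 objects in the repository only, so inSync returns false. Lines that report fetched or total object counts are ignored because they do not match the four summary phrases.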
Running the state of index job

Repository configuration for Content Server 6.7 SPx installs a job called state of index. This job is implemented as a Java method and runs in the Content Server Java method server. You can also use the ftintegrity tool to check the consistency between the repository and the xPlore index. The ftintegrity script calls the dm_FTStateOfIndex job.

Note: ftintegrity and the dm_FTStateOfIndex job can be very slow, because they perform a full scan of the index and content. Do not run ftintegrity or the dm_FTStateOfIndex job when an index agent is migrating documents.

The state of index job compares the index content with the repository content. Execute the state of index job from Documentum Administrator (DA). The job generates reports that provide the following information:

• Index completeness and comparison of document version stamps.
• Status of the index server: Disk space usage, instance statistics, and process status.
• Total number of objects: Content correctly indexed, content that had some failure during indexing, and objects with no content

To disable the job, view the job properties in Documentum Administrator and change the state to inactive.

state of index and ftintegrity arguments

state of index and ftintegrity arguments are similar, with some slight differences. You can set the state of index job argument values in DA using the ftintegrity form of the argument. Job arguments are case sensitive. ftintegrity arguments are not. In addition, the job is installed with the -queueperson and -windowinterval arguments set. The -queueperson and -windowinterval arguments are standard arguments for administration jobs and are explained in the Content Server Administration Guide.

Table 6 State of index and ftintegrity arguments

Job arg: -batchsize
ftintegrity option: batchsize (argument, not option)
Description: Number of objects to be retrieved from the index in each batch. The default value is 1000.

Job arg: -check_file
ftintegrity option: -CheckFile
Description: The value of this parameter is the full path to a file that contains sysobject IDs, one on each line. This option compares the i_vstamp on the ACL and any groups in the ACL that is attached to each object in a specified list. If this option is used with the option -checkUnmaterializeLWSO, -CheckType, -StartDate, or -EndDate, these latter options will not be executed. ACLs that are out of sync will be listed in ObjectId-common-version-mismatch.txt as the content of the elements acl_name|domain/acl i_vstamp in docbase/acl i_vstamp in xDB. Groups that are out of sync will be listed in ObjectId-dctmOnly.txt as the content of the elements Mismatching i_vstamp group:/Sysobject ID: id/Group ids in dctm only:/group id.

Job arg: -check_type
ftintegrity option: -checkType
Description: Specifies a specific object type to check (includes subtypes). Other types will not be checked.

Job arg: -check_unmaterialized_lwso
ftintegrity option: -checkLWSO
Description: Sets whether to check unmaterialized lightweight sysobjects during comparison. Default: false.

Job arg: -collection_name
ftintegrity option: Not available
Description: Compares index for the specified collection to data in the repository. Default: All collections. Cannot use this argument with the ftintegrity script.

Job arg: -end_date
ftintegrity option: -EndDate
Description: Local end date of sysobject r_modify_date, for range comparison. Format: MM/dd/yyyy HH:mm:ss

Job arg: -ftengine_standby
ftintegrity option: -ftEngineStandby
Description: Dual mode (FAST and xPlore on two Content Servers) only: set the parameter -ftEngineStandby to true: -ftEngineStandby T

Job arg: -fulltext_user
ftintegrity option: -fulltextUser
Description: Name of user who owns the xPlore instance. For dual mode (FAST and xPlore), the user is dm_fulltext_index_user_01.

Job arg: -get_id_in_indexing
ftintegrity option: Not available
Description: If specified, IDs that have not yet been indexed will be dumped to a file, ObjectId-in-indexing.txt. Cannot use this argument with the ftintegrity script. Default: False.

Job arg: -start_date
ftintegrity option: -StartDate
Description: Local start date of sysobject r_modify_date, for range comparison. Can be used to evaluate the time period since the last backup. Format: MM/dd/yyyy HH:mm:ss

Job arg: -timeout_in_minute
ftintegrity option: -timeout
Description: Number of minutes to time out the session. Default: 1.

Job arg: -usefilter value
ftintegrity option: -usefilter
Description: Evaluates results using the configured index agent filters. The job runs more slowly with -usefilter. Default: false. For information on filters, see Configuring index agent filters, page 58.

Indexing documents in normal mode

In normal mode, indexable documents are queued up for indexing. You can also index a small set of documents in the normal mode page of the index agent. You can select objects for indexing with a DQL statement or a list of objects.

Note: Do not use this option for many documents. Instead, use migration mode (reindexing), which allows a restart of the indexing process. Migration also reindexes all ACLs and groups, keeping security up to date.

1. Start the index agent and click Start index agent in normal mode.
2. You get a page that allows you to input a selected list of objects for indexing. Submit either a file of object IDs or DQL.

Resubmitting documents for indexing

You submit a list of objects for indexing in the index agent UI. Use ftintegrity or the state of index job to generate the list.

1. Run ftintegrity or the state of index Content Server job to get a list of objects that failed in indexing. See Using ftintegrity, page 60.
2. Edit the output file ObjectId-common-version-mismatch.txt to remove all data from the file except object IDs.
3. Start the index agent in normal mode.
4. In the index agent UI, check Index selected list of objects and then check Object file.
5. Navigate to the file ObjectId-common-version-mismatch.txt and then click Submit.
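Stripping a result file down to bare object IDs, as the resubmission procedure requires, is easy to automate. The sketch below is an illustration under stated assumptions: the class ObjectIdExtractor is hypothetical, and because this guide does not show the exact column layout of ObjectId-common-version-mismatch.txt, the code relies only on the fact that a Documentum object ID is a 16-character hexadecimal string and keeps the first such token on each line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: keeps only the object ID from each line of an ftintegrity result file. */
class ObjectIdExtractor {
    // A Documentum object ID is 16 hexadecimal characters; the lookarounds
    // reject longer hexadecimal runs so that only a standalone ID matches.
    private static final Pattern OBJECT_ID =
        Pattern.compile("(?<![0-9A-Fa-f])[0-9A-Fa-f]{16}(?![0-9A-Fa-f])");

    /** Returns the first object ID found on each line; lines without an ID are skipped. */
    static List<String> extractIds(List<String> lines) {
        List<String> ids = new ArrayList<>();
        for (String line : lines) {
            Matcher m = OBJECT_ID.matcher(line);
            if (m.find()) {
                ids.add(m.group());
            }
        }
        return ids;
    }
}
```

One way to use it is to read the result file with java.nio.file.Files.readAllLines, pass the lines to extractIds, and write the returned list to a new file with Files.write. Check a few lines of the real file first, since the assumption about where the ID sits on each line may not hold for every release.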
Removing entries from the index

You can remove certain object types, or objects that meet other criteria such as dates, from the index. You can execute a DQL query to get object IDs of the documents that you wish to delete from the index. Save the list of object IDs in a text file.

1. Navigate to dsearch_home/dsearch/xhive/admin.
2. Open deletedocs.properties in a text editor.
3. Make sure that the host and port values correspond to your environment.
4. Set the value of dss_domain to the xPlore domain from which you wish to delete indexed documents.
5. Change the value of the key file_contains_id_to_delete to the path to your object IDs. Alternatively, you can list the object IDs, separated by commas, as the value of the key ids_to_delete.

Mapping content to collections

• Sharing content storage, page 65
• Mapping Server storage areas to collections, page 66

Sharing content storage

By default, the index agent retrieves content from the content storage area to the index agent temporary content location. This temporary content is deleted after it has been indexed. For performance reasons, you can choose to share the content storage. With shared content storage, CPS has direct read access to the content. No content is streamed. The content storage area must be mountable as read-only by the Index Agent and xPlore hosts. You map the path to the file store in the index agent web application.

1. On the index agent host, open indexagent.xml, which is located in indexagent_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes. If you installed multiple index agents on this host, an integer is appended to the IndexAgent WAR file name, for example, IndexAgent1.war.
2. Set the path in the exporter element:
   • If the file system paths to content are the same on the Content Server host and xPlore host, change the value of the child element all_filestores_local to true.
   • If the file system paths are different, add a file store map within the exporter element. Specify the store name and local mapping for each file store.
3. Save indexagent.xml and restart the index agent instance.

In the following example, Content Server is on the host Dandelion and filestore_01 is on the same host at the directory /Dandelion/Documentum/data/repo1/content_storage_01. The index agent and xPlore server are on a separate host with a map to the Content Server host: /mappingtoDandelion/repo1/content_storage_01. The following map is added to the exporter element:

    <local_filestore_map>
      <local_filestore>
        <store_name>filestore_01</store_name>
        <local_mount>/mappingtoDandelion/repo1/content_storage_01</local_mount>
      </local_filestore>
      <!-- similar entry for each file store -->
    </local_filestore_map>

Example with UNC path:

    <local_filestore_map>
      <local_filestore>
        <store_name>filestore_01</store_name>
        <local_mount>\\CS\e$\Documentum\data\dss\content_storage_01</local_mount>
      </local_filestore>
      <!-- similar entry for each file store -->
    </local_filestore_map>

For better performance, you can mount the content storage to the xPlore index server host and set all_filestores_local to true. Create a local file store map as shown in the following example:

    <all_filestores_local>true</all_filestores_local>
    <local_filestore_map>
      <local_filestore>
        <store_name>filestore_01</store_name>
        <local_mount>\\192.168.195.129\DCTM\data\ftwinora\content_storage_01</local_mount>
      </local_filestore>
      <!-- similar entry for each file store -->
    </local_filestore_map>

Note: Update the file_system_path attribute of the dm_location object in the repository to match this local_mount value, and then restart the Content Server.

Mapping Server storage areas to collections

A Content Server file store can map to an xPlore collection. File store mapping to a collection allows you to keep collection indexes separate for faster ingestion and retrieval.

Figure 9 Filestore mapping to an xPlore collection

You can create multiple full-text collections for a repository for the following purposes:

• Partition data
• Scale indexes for performance
• Support storage-based routing

1. Open indexagent.xml, located in the indexing agent WAR file in the directory ($DOCUMENTUM//jboss5.1.0/server/DctmServer_IndexAgent/deploy/IndexAgent.war/WEB-INF/classes).
2. Add partition_config and its child elements to the element index-agent/indexer_plugin_config/indexer to map file stores to collections. Each repository has one default collection named default. In the following example, filestore_01 maps to collection 'coll01', and filestore_02 to 'coll02'. The rest of the repository is mapped to the default collection.

    <partition_config>
      <default_partition>
        <collection_name>default</collection_name>
      </default_partition>
      <partition>
        <storage_name>filestore_01</storage_name>
        <collection_name>coll01</collection_name>
      </partition>
      <partition>
        <storage_name>filestore_02</storage_name>
        <collection_name>coll02</collection_name>
      </partition>
    </partition_config>

Indexing metadata only

Use iAPI to set the can_index attribute of a dm_format object to F(alse). As a result, contents of that format are not full-text indexed. For example:

    retrieve,c,dm_format where name = 'tiff'
    set,c,l,can_index
    F
    save,c,l

Making types non-indexable

1. In Documentum Administrator, select the object type (Administration > Types in the left pane).
2. Right-click for the list of attributes.
3. Uncheck Enable for indexing to exclude this type from indexing. This setting turns off indexing events. If the option is checked but you have enabled an index agent filter, the filter overrides this setting.
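The routing rule that the partition_config example expresses (a named file store goes to its mapped collection, everything else goes to the default collection) can be modeled in a few lines. This is a sketch for reasoning about a planned mapping, not the index agent's implementation: the class PartitionConfig and its methods are hypothetical, and toXml assumes store and collection names are plain identifiers that need no XML escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of a partition_config block: file store name to collection name. */
class PartitionConfig {
    private final String defaultCollection;
    private final Map<String, String> storeToCollection = new LinkedHashMap<>();

    PartitionConfig(String defaultCollection) {
        this.defaultCollection = defaultCollection;
    }

    /** Registers one partition entry and returns this object so calls can be chained. */
    PartitionConfig map(String storageName, String collectionName) {
        storeToCollection.put(storageName, collectionName);
        return this;
    }

    /** Unmapped file stores fall back to the default collection. */
    String collectionFor(String storageName) {
        return storeToCollection.getOrDefault(storageName, defaultCollection);
    }

    /** Renders the element in the same layout as the example in this guide. */
    String toXml() {
        StringBuilder xml = new StringBuilder("<partition_config>\n");
        xml.append("  <default_partition>\n")
           .append("    <collection_name>").append(defaultCollection).append("</collection_name>\n")
           .append("  </default_partition>\n");
        for (Map.Entry<String, String> e : storeToCollection.entrySet()) {
            xml.append("  <partition>\n")
               .append("    <storage_name>").append(e.getKey()).append("</storage_name>\n")
               .append("    <collection_name>").append(e.getValue()).append("</collection_name>\n")
               .append("  </partition>\n");
        }
        return xml.append("</partition_config>\n").toString();
    }
}
```

With new PartitionConfig("default").map("filestore_01", "coll01").map("filestore_02", "coll02"), a lookup for filestore_01 yields coll01 and a lookup for any other store yields default, which mirrors the mapping described for the example.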
Configuring the index agent after installation

The Documentum index agent is configured for the first time through the index agent configurator, which can be run after installing xPlore. (The configurator is the file configIndexagent.bat or configIndexagent.sh in indexagent_home/setup/indexagent.) For information on the configurator, see Documentum xPlore Installation Guide.

Parameter default values have been optimized for most environments. They can be changed later using iAPI or by editing indexagent.xml, which is located in indexagent_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes. For descriptions of the settings, see Index agent configuration parameters, page 280.

Limit content size for indexing

You can set a maximum size for content that is indexed. You set the actual document size, not the size of the text within the content. To set the maximum document size, edit the contentSizeLimit parameter within the parent element exporter. The value is in bytes. Default: 20 MB.

Note: You can also limit the size of text within a file by configuring the CPS instance. Set the Max text threshold in bytes (default: 10 MB).

Exclude ACL and group attributes from indexing

By default, all attributes of ACLs and groups are indexed. You can specify that certain attributes of ACLs and groups are not indexed. Add an acl_exclusion_list and group_exclusion_list element to the parent element indexer_plugin_config/generic_indexer/parameter_list.

Change the local content storage location

When you configure the index agent, you select a local content temporary staging location. You can change this location by editing the local_content_area element in indexagent.xml. This file is located in indexagent_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes. Restart the index agent web application after editing this file.

Note: For multi-instance xPlore, the temporary staging area for the index agent must be accessible from all xPlore instances.

Setting up index agents for ACLs and groups

By default, you configure an index agent for each Documentum repository that is indexed. You can also set up multiple index agents to index dm_acl and dm_group separately from sysobjects.

1. Create a second index agent. Run the index agent configurator and give the agent instance a name and port that are different from the first agent.
2. Edit indexagent.xml for the second index agent. (This file is located in indexagent_home/jboss5.1.0/server/DctmServer_Indexagent2/deploy/IndexAgent.war/WEB-INF/classes.)
3. Add one parameter set to your new indexagent.xml file. Set the value of parameter_name to index_type_mode, and set the value of parameter_value to aclgroup as follows:

    <indexer_plugin_config>
      <generic_indexer>
        <class_name>… </class_name>
        <parameter_list>
          ...
          <parameter>
            <parameter_name>index_type_mode</parameter_name>
            <parameter_value>aclgroup</parameter_value>
          </parameter>
        </parameter_list>
      </generic_indexer>
    </indexer_plugin_config>

4. In the indexagent.xml for sysobjects (the original index agent), add a similar parameter set. Set the value of parameter_name to index_type_mode, and set the value of parameter_value to sysobject.
5. Restart both index agents. (Use the scripts in indexagent_home/jboss5.1.0/server or the Windows services.)

Supporting millions of ACLs

If you have many ACLs (users or groups), turn off facet compression. In indexserverconfig.xml, find the subpath element whose path attribute value is dmftsecurity/acl_name. Change the value of the compress attribute to false. For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 42.

Documentum attributes that control indexing

Documents are selected for indexing in the Content Server based on the following criteria:

• If a_full_text attribute is false, the content is not indexed. Metadata is indexed.
• If a_full_text attribute is true, content is indexed based on the following attributes on the dm_format associated with the document:
  – If can_index is true, the document is indexed.
  – If format_class is ft_always, the document is indexed.
  – If format_class is ft_preferred (a preferred rendition), the rendition is indexed.
  If the object has renditions that are not of the format_class ft_always or ft_preferred, renditions are examined starting with the primary rendition. The first rendition with can_index set to true is indexed.

Note: Index agent filters can override the settings of a_full_text and can_index. See Configuring index agent filters, page 58.

Sample DQL to determine these attribute values for the format bmp:

    select can_index, format_class from dm_format where name = 'bmp'

To find all formats that are indexed, use the following command from iAPI:

    ?,c,select name,can_index from dm_format

The dm_ftengine_config object has a repeating attribute ft_collection_id that references a collection object of the type dm_fulltext_collection. Each ID points to a dm_fulltext_collection object. It is reserved for use by Content Server client applications.
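The fallback rule for renditions (examine renditions starting with the primary one and index the first whose format has can_index set to true) can be expressed as a short selection routine. The sketch below is a simplified model for illustration only: RenditionSelector and Rendition are hypothetical names, the code covers just the a_full_text switch and the first-indexable-rendition fallback, and it deliberately leaves out the ft_always and ft_preferred format classes and index agent filters.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical, simplified model of the rendition fallback rule; not Content Server code. */
class RenditionSelector {
    /** One rendition of a document; canIndex mirrors can_index on the rendition's dm_format. */
    static final class Rendition {
        final String format;
        final boolean canIndex;

        Rendition(String format, boolean canIndex) {
            this.format = format;
            this.canIndex = canIndex;
        }
    }

    /**
     * Returns the rendition whose content would be indexed, or empty when only
     * metadata is indexed. The list must be ordered with the primary rendition first.
     */
    static Optional<Rendition> contentToIndex(boolean aFullText, List<Rendition> primaryFirst) {
        if (!aFullText) {
            return Optional.empty(); // a_full_text is false: metadata only
        }
        for (Rendition r : primaryFirst) {
            if (r.canIndex) {
                return Optional.of(r); // first indexable rendition wins
            }
        }
        return Optional.empty();
    }
}
```

For a document whose primary content is a tiff with can_index set to false and whose next rendition is a pdf with can_index set to true, the routine selects the pdf. With a_full_text set to false it returns empty regardless of the renditions.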
Injecting data and supporting joins

You can enhance document indexing with metadata or content from outside a Documentum repository. To inject metadata or content, create a type-based object (TBO) in DFC and deploy it to the repository. For information on creating TBOs, refer to Documentum Foundation Classes Development Guide. When you create a TBO for your type and deploy the TBO, DFC invokes your customization for objects of the custom type.

In xPlore indexing, the metadata is added to the dmftcustom node of DFTXML. Content is added to the dmftcontent node.

To support queries on multiple related objects (joins), create a type-based object (TBO) in DFC that denormalizes multiple objects into a single XML structure. The denormalized data is placed under the dmftcustom node of DFTXML. Related content such as an MS Word document or a PDF is referenced under dmftcontent. Define specific indexes for data that is added to these nodes.

The following pseudocode example adds data from an attached object into the DFTXML for the main object. The customization adds the data for the dm_note object to the dm_document object. The example triggers reindexing of the main object when the attached object is updated. It also prevents the attached object from being indexed as a standalone object. The following figure shows the object model.

Figure 10 Custom indexing object model

Sample TBO class

Extend DfPersistentObject and override the method customExportForIndexing as shown in the following example.

    // DFC imports (DfSysObject, IDfCollection, IDfDocument, IDfId, DfException,
    // IDfFtExportContext, DfFtXmlElementNames) are omitted here.
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;
    import org.xml.sax.SAXException;

    public class AnnotationAspect extends DfSysObject
    {
      protected void customExportForIndexing (IDfFtExportContext context)
        throws DfException
      {
        super.customExportForIndexing(context);
        Document document = context.getDocument(); // gets the main document

        // Find or create the DFTXML root element
        NodeList nodes = document.getElementsByTagName( DfFtXmlElementNames.ROOT );
        Element rootElem = nodes.getLength() > 0 ? (Element) nodes.item( 0 ) : null;
        if (rootElem == null)
        {
          rootElem = document.createElement( DfFtXmlElementNames.ROOT );
          document.appendChild( rootElem );
        }

        // Find or create the dmftcustom node under the root
        nodes = rootElem.getElementsByTagName( DfFtXmlElementNames.CUSTOM );
        Element dmftcustom = nodes.getLength() > 0 ? (Element) nodes.item( 0 ) : null;
        if (dmftcustom == null)
        {
          dmftcustom = document.createElement( DfFtXmlElementNames.CUSTOM );
          rootElem.appendChild( dmftcustom );
        }

        Element mediaAnnotations = document.createElement("mediaAnnotations");
        dmftcustom.appendChild(mediaAnnotations);

        IDfCollection childRelations = getChildRelatives("dm_annotation");
        while (childRelations.next())
        {
          Element annotationNode = document.createElement("annotation");
          mediaAnnotations.appendChild(annotationNode);
          try
          {
            IDfId id = childRelations.getTypedObject().getId("child_id");
            // This will get the dm_note object
            IDfDocument note = (IDfDocument) getSession().getObject(id);
            ByteArrayInputStream xmlContent = note.getContent();
            DocumentBuilder domBuilder = null;
            try
            {
              domBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            }
            catch (ParserConfigurationException e)
            {
              throw new DfException(e);
            }
            Document doc = domBuilder.parse(xmlContent);
            // Add the node content. The parsed root element belongs to another
            // document, so import it into the DFTXML document before appending.
            annotationNode.appendChild(document.importNode(doc.getDocumentElement(), true));
            // Add a node for the author of a note
            Element authorElement = document.createElement("author");
            authorElement.setTextContent(note.getString("r_modifier"));
            annotationNode.appendChild(authorElement);
          }
          catch (SAXException e)
          {
            // Log the error
          }
          catch (IOException e)
          {
            // Log the error
          }
        }
        childRelations.close();
      }
    }

Generated DFTXML

    <dmftdoc>
      ...
      <dmftcustom>
        <mediaAnnotations>
          <annotation>
            <content>
              This is my first note
            </content>
            <author>Marc</author>
          </annotation>
          <annotation>
            <content>
              This is my second note
            </content>
            <author>Marc</author>
          </annotation>
        </mediaAnnotations>
      </dmftcustom>
    </dmftdoc>

Triggering document reindexing

In the previous example, when the related object is modified, the main object is not reindexed. The following TBO for the related type (dm_note) triggers reindexing of the dm_document object. The TBO applies to dm_note objects. When the object is saved after modification, the object is queued for indexing by calling IDfDocument.queue.

    public class NoteTbo extends DfDocument implements IDfBusinessObject
    {
      protected synchronized void doSaveEx (boolean keepLock,
        String versionLabels, Object[] extendedArgs) throws DfException
      {
        super.doSaveEx(keepLock, versionLabels, extendedArgs);
        IDfCollection parentRelations = getParentRelatives("dm_annotation");
        while (parentRelations.next())
        {
          IDfId id = parentRelations.getTypedObject().getId("parent_id");
          IDfDocument annotatedObject = (IDfDocument) getSession().getObject(id);
          // Queue the annotated (main) document for forced reindexing
          annotatedObject.queue("dm_fulltext_index_user", "dm_force_ftindex",
            1, false, new DfTime(), "");
        }
        parentRelations.close();
      }
    }

Preventing indexing of attached objects

If the attached object does not need to be searchable, you can prevent indexing. Use a custom preindexing filter as described in Custom content filters, page 73.

Indexing the injected metadata

In addition to your TBO, set up an index for your injected metadata. See Creating custom indexes, page 121.

Custom content filters

You can configure index agent filters that exclude cabinets, folders, or object types from indexing. Configure these filters using the index agent UI. For more information, see Configuring index agent filters, page 58.

You can implement other kinds of filters with the DFC interface com.documentum.fc.indexagent.IDfCustomIndexFilter. You can filter content by creation date or some custom attribute value. Deploy your customization as a BOF module. The following code excludes objects of type dm_note from indexing.

TBO filtering of indexable objects

    public class NoteFTFilter extends DfSingleDocbaseModule
      implements IDfCustomIndexFilter
    {
      public DfCustomIndexFilterAction getCustomIndexAction (
        IDfPersistentObject object) throws DfException
      {
        // We don't filter any non-sysobjects such as Groups and ACLs
        if (!(object instanceof IDfSysObject))
          return DfCustomIndexFilterAction.INDEX;
        String objectTypeName = object.getString("r_object_type");
        if ("dm_note".equals(objectTypeName))
        {
          return DfCustomIndexFilterAction.SKIP;
        }
        return DfCustomIndexFilterAction.INDEX;
      }
    }
Troubleshooting the index agent

The index agent log uses log4j.properties. This file is located in the WEB-INF/classes directory of the index agent WAR file. You can change the amount of information by setting the following properties:

• log4j.category.com.documentum.server.impl=DEBUG
• log4j.category.com.emc.documentum.core.fulltext=DEBUG, F1

Indexing errors are reported in IndexAgent.log, which is located in dsearch_home/jboss5.1.0/server/DctmServer_Indexagent/logs.

Startup problems

Make sure that the index agent web application is running. On Windows, verify that the Documentum Indexagent service is running. On Linux, verify that you have instantiated the index agent using the start script in dsearch_home/jboss5.1.0/server.

Make sure the user who starts the index agent has permission in the repository to read all content that is indexed.

If the repository name is reported as null, restart the repository and the connection broker and try again.

If you see a status 500 on the index agent UI, examine the stack trace for the index agent instance. If a custom routing class cannot be resolved, this error appears in the browser:

    org.apache.jasper.JasperException: An exception occurred processing JSP page /action_dss.jsp at line 39
    ...
    root cause
    com.emc.documentum.core.fulltext.client.index.FtFeederException:
    Error while instantiating collection routing custom class...

If the index agent web application starts with port conflicts, stop the index agent with the script. If you run a stop script, run as the same administrator user who started the instance. The index agent locks several ports, and they are not released by closing the command window.

Restarting the index agent

If you stop and restart the index agent before it has finished indexing a batch of documents that you manually submitted through the index agent UI, resubmit the indexing requests that were not finished.

Firewall issues

Enable connections between the index agent host, the Content Server, and xPlore through the firewall. The following error is logged in indexagent.log:

    2009-01-30 10:11:14,078 ERROR IndexingStatus [SubmitterThread] [DM_INDEX_AGENT_PLUGIN] Document represented by key 0881540f80000322 failed to index into collection knowledgeworker, error:/indexserver/UpdateServlet
    com.emc.documentum.core.fulltext.common.IndexServerRuntimeException:
    com.xhive.error.XhiveException [TRANSACTION_STILL_ACTIVE] The operation can only be done if the transaction is closed

Cannot stop the index agent

If you have configured two index agents on the same host and port, you see the following error message when you attempt to stop the agent:

    Exception in thread "main" java.lang.SecurityException: Failed to authenticate principal=admin, securityDomain=jmx-console

You can kill the JVM process and run the index agent configurator to give the agents different ports.
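Failure lines like the one shown under Firewall issues carry the object ID and the target collection, so they can be collected from the index agent log for later resubmission. The following is a minimal sketch: the class IndexFailureLine is hypothetical, and the pattern is based solely on the single sample line in this guide, so other message variants may need their own patterns.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the object ID and collection from an indexing failure line. */
class IndexFailureLine {
    // Wording taken from the sample DM_INDEX_AGENT_PLUGIN error line in this guide.
    private static final Pattern FAILED = Pattern.compile(
        "Document represented by key ([0-9A-Fa-f]{16}) failed to index into collection (\\w+)");

    final String objectId;
    final String collection;

    private IndexFailureLine(String objectId, String collection) {
        this.objectId = objectId;
        this.collection = collection;
    }

    /** Returns the parsed failure, or empty when the line is not an indexing failure. */
    static Optional<IndexFailureLine> parse(String line) {
        Matcher m = FAILED.matcher(line);
        if (m.find()) {
            return Optional.of(new IndexFailureLine(m.group(1), m.group(2)));
        }
        return Optional.empty();
    }
}
```

Applied to the sample line, parse yields object ID 0881540f80000322 and collection knowledgeworker. The IDs gathered this way can be saved one per line and submitted as an object file in the index agent UI.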
you can find the object ID and type.message from dmi_queue_item where name= username and event=’FT re-index‘ Cleaning up the index queue You can clean up the index queue before restarting the index agent.---------------------- dm_group 0305401580000104 dm_fulltext_index_user dm_acl 0305401580000101 dm_fulltext_index_user dm_sysobject 0305401580000105 dm_fulltext_index_user EMC Documentum xPlore Version 1. get the queue item ID for the document in the details screen of the index agent UI.item_id. Using iAPI in Documentum Administrator. The drop-down list displays Indexing failed.] Caused by incorrect query plugin (Content Server hotfix for 6. Indexing in progress.. From the Indexing failed display. Managing the Index Agent Exception in thread "main" java.r_object_id.. Warning. dmi_registry i where t.delete dmi_queue_item object where name = ’dm_fulltext_index_user’ To check registered types and the full-text user name.user_name like ’%fulltext%’ You see results like the following: name r_object_id user_name --------------------------.r_object_id = i.2 Administration and Development Guide 75 . Navigate to Administration > Indexing Management > Index Queue. Awaiting indexing. t. and All. use the following iAPI command.registered_id and i. inserting the full-text user for the value of name: ?. indexed count. a status summary is displayed until indexing has completed.1. page 60. Warning Metadata indexed by not content Metadata indexed but not content Success Documents indexed by xPlore Documents indexed by xPlore Index agent timeout DM_INDEX_AGENT_ITEM_TIMEOUT errors in the index agent log indicate that the indexing requests have not finished in the specified time (runaway_item_timeout). These indexing requests are stored in the xPlore processing queue. Table 7 Comparing index agent and xPlore administrator indexing metrics Metric Index agent xPlore administrator Failed Documents not submitted to xPlore Errors in CPS processing. For these timeouts. 
you see the following statistics: • Active items: Error count.2 Administration and Development Guide . completed count. KB/sec indexed.) 76 EMC Documentum xPlore Version 1. plugin blocking max time. It does not mean that the documents were not indexed.Managing the Index Agent Checking indexing status The index agent UI displays indexing status. The following table compares the processing counts reported by the index agent and xPlore administrator. Count does not include failures of the index agent. total count.0/server/DctmServer_Indexagent/deploy/IndexAgent. When you view Details during or after an indexing process. change the following values in indexagent. • List of current internal index agent threads When you start an indexing operation.xml. Reindexing You can submit for reindexing the list of objects that ftintegrity generates (Using ftintegrity.war/WEB-INF/classes. Run the xPlore administrator report Document Processing Error Summary to see timeouts. The summary disappears when indexing has completed. • Averages: Pause time. number of indexed docs/sec. and failure count. • exporter/content_clean_interval: Increase. and warnings count. success count. last update timestamp. • Indexer plugin: Maximum call time • Migration progress (if applicable): Processed docs and total docs. • indexagent_instance/runaway_item_timeout: Set to same value as content_clean_interval. Click Refresh to update this summary. content and metadata not indexed. This file is located in dsearch_home/jboss5. On login. indexed content size. size. you can view information about the last indexing operation: Date and time. warning count. page 73. For username. Each error_config element contains the following elements: Table 8 Index agent error configuration Element name Description error_config Contains error_code. Managing the Index Agent To check on the status of queue items that have been submitted for reindexing..2 Administration and Development Guide 77 . stops indexing. 
action Valid value: stop Table 9 Error codes error_code Description UNSUPPORTED_DOCUMENT Unsupported format XML_ERROR XML parsing error for document content DATA_NOT_AVAILABLE No information available PASSWORD_PROTECTED Password protected or document encrypted MISSING_DOCUMENT RTS routing error INDEX_ENGINE_NOT_RUNNING xPlore indexing service not running EMC Documentum xPlore Version 1.) Locate the element error_configs.1." To resubmit one document for reindexingPut the object ID into a temporary text file. When the counter exceeds a configurable error_threshold. just after the closing tag of indexer_plugin_config. for example. the index agent performs the configured action. stop the index agent instance and edit the file indexagent.xml." If the task_state is failed.task_state. error_code See Troubleshooting the index agent. (This file is located in dsearch_home/jboss5. the message is "Incomplete batch. error_threshold Number of errors at which the action is executed. the action is executes. error_threshold. a counter is updated for each error message. delete dmi_queue_item object where name=username and event=’FT re-index’ Automatically stop indexing after errors When the index agent receives a response from xPlore. To edit the error thresholds.message from dmi_queue_item where name=username and event=’FT re-index’ If task_state is done. If error_threshold is exceeded.item_id. specify the user logged in to the index agent UI and start reindexing. To remove queue items from reindexingUse the following DQL. time_threshold and action elements.0/server/DctmServer_Indexagent/deploy/IndexAgent. specify the user logged in to the index agent UI and started reindexing. use the following DQL. select task_name.war/WEB-INF/classes. the message is "Successful batch.. Use the index agent UI to submit the upload: Choose Index selected list of objects >Object File option. For username.. time_threshold Time in seconds at which to check the counter.. . 
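The interplay of error_threshold, time_threshold, and action can be pictured with a small simulation. This is only an illustrative sketch, not the index agent's implementation: the class names ErrorConfig and ErrorMonitor are hypothetical, and the assumption that the counter restarts after every check is ours, not something the guide states.

```python
from dataclasses import dataclass


@dataclass
class ErrorConfig:
    """Hypothetical stand-in for one error_config element."""
    error_code: str
    error_threshold: int   # number of errors at which the action is executed
    time_threshold: int    # seconds between checks of the counter
    action: str = "stop"   # the only valid value listed for this element


class ErrorMonitor:
    """Counts matching errors and checks the counter every time_threshold seconds."""

    def __init__(self, config, start=0.0):
        self.config = config
        self.count = 0
        self.next_check = start + config.time_threshold

    def on_response(self, error_code, now):
        """Record one response; return the configured action or None."""
        action = None
        if now >= self.next_check:
            # The check interval has elapsed: compare the counter to the threshold.
            if self.count > self.config.error_threshold:
                action = self.config.action
            self.count = 0  # assumption: counting restarts after each check
            self.next_check = now + self.config.time_threshold
        if error_code == self.config.error_code:
            self.count += 1
        return action


monitor = ErrorMonitor(ErrorConfig("PASSWORD_PROTECTED", error_threshold=2, time_threshold=60))
for t in (1, 2, 3):
    monitor.on_response("PASSWORD_PROTECTED", now=t)   # three errors inside one interval
print(monitor.on_response(None, now=61))               # counter (3) exceeds threshold (2)
```

With these sample values the check at 61 seconds finds three errors against a threshold of two, so the sketch returns the configured action.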
Chapter 5 Document Processing (CPS)

This chapter contains the following topics:
• About CPS
• Adding a remote CPS instance
• Configuring a dedicated CPS
• Administering CPS
• Maximum document and text size
• Configuring languages and encoding
• Indexable formats
• Lemmatization
• Handling special characters
• Configuring stop words
• Troubleshooting content processing
• Troubleshooting slow ingestion
• Adding dictionaries to CPS
• Creating European language user dictionaries
• Custom content processing

About CPS

The content processing service (CPS) performs the following functions:
• Retrieves indexable content from content sources
• Determines the document format and primary language
• Parses the content into index tokens that xPlore can process into full-text indexes

If you test Documentum indexing before performing migration, first replicate security. See Manually updating security, page 50.

For information on customizations to the CPS pipeline, see Custom content processing, page 102.

Language identification
Some languages have been tested in xPlore. (See the release notes for this version.) Many other languages can be indexed. Some languages are identified fully including parts of speech, and others require an exact match. For a list of languages that CPS detects, see Basistech documentation. If a language is not listed as one of the tested languages in the xPlore release notes, search must be for an exact match. For tested languages, linguistic features and variations that are specific to these languages are identified, improving the quality of search experience.

White space
White space such as a space separator or line feed identifies word separation. Then, special characters are substituted with white space. See Handling special characters, page 90. For Asian languages, white space is not used. Entity recognition and logical fragments guide the tokenization of content.

Case sensitivity
All characters are stored as lowercase in the index. For example, the phrase "I’m runNiNg iN THE Rain” is lemmatized and tokenized as "I be run in the rain.” Case sensitivity is not configurable.

Adding a remote CPS instance

By default, every xPlore instance has a local CPS service. Each CPS service receives processing requests on a round-robin basis. To improve indexing or search performance, you can install CPS on a separate host. You must have low network latency. For a high-volume environment with multiple xPlore instances, you can configure a dedicated CPS for each instance. This dedicated CPS reduces network overhead. See Configuring a dedicated CPS, page 81.

Note: The remote instance must be on the same operating system as other xPlore instances.

1. Install the remote CPS instance using the xPlore installer. The installer adds a JBoss instance, CPS ear file, and CPS native daemon on the remote host.
2. Configure the instance for CPS only.
3. Register the remote CPS instance in xPlore administrator. Open Services > Content Processing Service in the tree and then click Add. Enter the URL to the remote instance using the following syntax:
   http://hostname:port/services
   For example:
   http://DR:8080/services
4. Specify whether the CPS instance processes indexing requests (the index option), search requests (the search option), or both (the all option).
5. Specify whether the CPS instance performs linguistic processing (lp) or text extraction (te). If a value is not specified, TE and LP are sent to CPS as a single request.
   a. In indexserverconfig.xml, locate the content-processing-services element. This element identifies each CPS instance. The element is added when you install and configure a new CPS instance.
   b. Add or change the capacity attribute on this element. The capacity attribute determines whether the CPS instance performs text extraction, linguistic processing, or all. In the following example, a local CPS instance analyzes linguistics, and the remote CPS instance processes text extraction.
   <content-processing-services analyzer="rlp" context-characters="!,.;?’&quot;" special-characters="@#$%^_~‘*&amp;:()-+=&lt;&gt;/\[]{}">
     <content-processing-service capacity="lp" usage="all" url="local"/>
     <content-processing-service capacity="te" usage="index" url="http://localhost:9700/services"/>
   </content-processing-services>
6. Restart the CPS instance using the start script startCPS.bat or startCPS.sh in dsearch_home/jboss5.1.0/server. (On Windows, the standalone instance is installed as an automatic service.)
7. Test the remote CPS service using the WSDL testing page, with the following syntax:
   http://hostname:port/services/cps/ContentProcessingService?wsdl

Check the CPS daemon log file cps_daemon.log for processing event messages. For a local process, the log is in dsearch_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/logs. For a remote CPS instance, cps_daemon.log is located in cps_home/jboss5.1.0/server/cps_instance_name/logs. If a CPS instance is configured to process text only, TE is logged in the message. For linguistic processing, LP is logged. Both kinds of processing log CF messages.

Configuring a dedicated CPS

By default, every xPlore instance has a local CPS service. All CPS services receive processing requests on a round-robin basis. For high-volume environments with multiple xPlore instances, you can configure one or more CPS services to handle all processing requests for a specific xPlore instance. Dedicated CPS instances reduce network overhead.

1. Stop all xPlore instances.
2. Edit indexserverconfig.xml in an XML editor. In the following example, two instances are shown in node elements. The content-processing-services element specifies the two CPS instances that are available to all xPlore instances.
   <node appserver-instance-name="PrimaryDsearch" ... primaryNode="true" ... url="http://host1:9300/dsearch/" ... hostname="host1" name="PrimaryDsearch"> ...
   <node appserver-instance-name="DsearchNode2" xdb-listener-port="9430" primaryNode="false" ... url="http://host2:9300/dsearch/" ... hostname="host2" name="DsearchNode2"> ...
   <content-processing-services analyzer="rlp" context-characters="!,.;?’&quot;" special-characters="@#$%^_~‘*&amp;:()-+=&lt;&gt;/\[]{}">
     <content-processing-service usage="all" url="http://host1:20000/services"/>
     <content-processing-service usage="all" url="http://host1:20000/services"/>
   </content-processing-services>
3. Create a content-processing-services element under each node element, after the node/properties element. For example:
   <node appserver-instance-name="PrimaryDsearch" ... primaryNode="true" ... url="http://host1:9300/dsearch/" ... hostname="host1" name="PrimaryDsearch"> ...
     <properties>
       <property value="10000" name="statusdb-cache-size"/>
     </properties>
     <content-processing-services analyzer="rlp" context-characters="!,.;?’&quot;" special-characters="@#$%^_~‘*&amp;:()-+=&lt;&gt;/\[]{}">
     </content-processing-services>
     <logging>...
4. Move the shared CPS instances to each node, where they become dedicated CPS instances for that node. Place their definitions within the content-processing-services element that you created, like the following:
   <node appserver-instance-name="PrimaryDsearch" ... primaryNode="true" ... url="http://host1:9300/dsearch/" ... hostname="host1" name="PrimaryDsearch"> ...
     <properties>
       <property value="10000" name="statusdb-cache-size"/>
     </properties>
     <content-processing-services analyzer="rlp" context-characters="!,.;?’&quot;" special-characters="@#$%^_~‘*&amp;:()-+=&lt;&gt;/\[]{}">
       <content-processing-service usage="all" url="http://host1:20000/services"/>
     </content-processing-services>
     <logging>...
   Make sure that you have a remaining global CPS instance after the last node element. (You have moved one or more of the content-processing-service elements under a node.) For example:
   <content-processing-services analyzer="rlp" context-characters="!,.;?’&quot;" special-characters="@#$%^_~‘*&amp;:()-+=&lt;&gt;/\[]{}">
     <content-processing-service usage="all" url="local"/>
   </content-processing-services>

Administering CPS

Starting and stopping CPS
You can configure CPS tasks in xPlore administrator. In the left pane, expand the instance and click Content Processing Service. Click Configuration.
1. Stop CPS: Select an instance in the xPlore administrator tree, expand it, and choose Content Processing Service. Click Stop CPS and then click Suspend.
2. Start CPS: Select an instance in the xPlore administrator tree, expand it, and choose Content Processing Service. Click Start CPS and then click Resume.
If CPS crashes or malfunctions, the CPS manager tries to restart it to continue processing.

CPS status and statistics
To view all CPS instances in the xPlore federation, expand Services > Content Processing Service. To view version information and statistics about a specific CPS instance, expand the instance and click Content Processing Service. For more information on remote instances, see Adding a remote CPS instance, page 80.

Maximum document and text size

The default values for maximum document and text size have been optimized for 32-bit environments. You can adjust these values upward for a 64-bit environment.

Document maximum size
Set the maximum document size in the index agent configuration file indexagent.xml, which is located in indexagent_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes. Edit the contentSizeLimit parameter within the parent element exporter. The value is in bytes. Default: 20 MB. Larger documents will be skipped.

Text maximum size
Set the maximum size of text within a document and the text in CPS batch in CPS configuration. Choose an instance in xPlore administrator and click Configuration. Max text threshold sets the size limit, in bytes. Default: 10485760 (10 MB), optimized for a 32-bit environment. Maximum setting: 2 GB. Larger values can slow ingestion rate and cause more instability. Above this size, only the document metadata is tokenized. Includes expanded attachments. For example, if an email has a zip attachment, the zip file is expanded to evaluate document size. If you increase this threshold, ingestion performance can degrade under heavy load. If a document content text size is larger than this value, CPS will tokenize only the metadata, not the content. Other documents in the same batch are not affected.

Maximum text size in CPS batch
Edit the CPS configuration file configuration.xml, which is located in the CPS instance directory dsearch_home/dsearch/cps/cps_daemon. Edit max_data_per_process: The upper limit in bytes for a batch of documents in CPS processing. Default: 30 MB, maximized for 32-bit environment. Maximum setting: 2 GB. Incoming requests will be put on hold if the total content size for all documents in a CPS batch exceeds this parameter. For a hypothetical example, with the default of 30 MB, 3 documents with 10 MB text, 30 documents of 1 MB text, or 300 documents of 100 KB of text would fill up the batch. If the batch text size is exceeded, all documents in the batch fail. You can then refeed them through the index agent.

Embedded content used for index rebuild
Set the maximum in xPlore administrator. Choose Indexing Service in the tree and click Configuration. Edit the value of rebuild-index-embed-content-limit. Below this value, content is embedded in DFTXML and stored in the index. It will be used for rebuilding the index. Larger content is passed in a file, not embedded. Default: 2048 bytes.

Embedded XML content maximum size
Set the maximum size of XML content in bytes that is embedded within the xDB index. Edit indexserverconfig.xml and set the value of the file-limit attribute of the xml-content element. Default: 512 KB.

Configuring languages and encoding

Some languages have been tested in xPlore. (See the release notes for this version.) Many other languages can be indexed. Some languages are identified fully including parts of speech, and others are restricted to an exact match. For the list of identified languages and encodings for each language, see the Basistech documentation.

Configuring language identification
The language of the content determines how the document is tokenized. During indexing, CPS identifies the language of the document. For a query, the session locale from one of the following is used as the language for linguistic analysis:
• Webtop login locale
• DFC API locale or dfc.properties
• iAPI and iDQL Content Server locale

Languages are identified by content or metadata:
• Content: CPS uses the first 65536 characters in the document to identify the language of the document.
• Metadata: By default, CPS uses the following Documentum object attributes to identify the language of the metadata: object_name, title, subject, and keywords.

1. Open indexserverconfig.xml in dsearch_home/config.
2. Configure a default language if one is not identified. Add a property named ‘query-default-locale’ with the desired language under the search-config element. The default language must be one of the supported languages listed in the linguistic_processor element of dsearch_home/dsearch/cps/cps_daemon/PrimaryDsearch_local_configuration.xml. For example:
   <search-config><properties>
   <property value="en" name="query-default-locale"/>...
3. Change the metadata that are used for language identification. Set an attribute as the value of the name on the element element-for-language-identification. For example:
   <linguistic-process>
     <element-for-language-identification name="object_name"/>
     <element-for-language-identification name="title"/>
     <element-for-language-identification name="subject"/>
     <element-for-language-identification name="keywords"/>
   </linguistic-process>
4. Validate your changes to indexserverconfig.xml. See Modifying indexserverconfig.xml, page 42.
5. (Optional) Check the identified language for a document: Use xPlore administrator to view the DFTXML of a document. Click the document in the collection view, under Data Management. The language is specified in the lang attribute on the dmftcontentref element. For example:
   <dmftcontentref content-type="" lang="en" encoding="utf-16le" ...>
6. (Optional) Check the session locale for a query. Look at the xPlore log event that prints the query string (in dsearch.log of the primary xPlore instance). The event includes the query-locale setting used for the query. For example:
   <event timestamp...> <message > <![CDATA[QueryID=primary$f20cc611-14bb-41e8-8b37-2a4f1e135c70,query-locale=en,...>
7. (Optional) Change the session locale of a query. The session_locale attribute on a Documentum object is automatically set based on the OS environment. To search for documents in a different language, change the locale per session in DFC or iAPI. The iAPI command to change the session_locale:
   set,c,sessionconfig,session_locale
   The DFC command to set session locale on the session config object (IDfSession.getSessionConfig):
   IDfTypedObject.setString("session_locale", locale)

Indexable formats

Some formats are fully indexed. For some formats, only the metadata is indexed. For a full list of supported formats, see Oracle Outside In 8.3.7 documentation. If a format cannot be identified, it is listed in the xPlore administrator report Document Processing Error Detail. Choose File format unsupported to see the list.

Lemmatization

About lemmatization
Configuring lemmatization
Lemmatizing specific types or attributes
Troubleshooting lemmatization
Saving lemmatization tokens

About lemmatization

Lemmatization is a normalization process that reduces a word to its canonical form. For example, a word like books is normalized into book by removing the plural marker. Am, are, and is are normalized to "be.” This behavior contrasts with stemming, a different normalization process in which stemmed words are reduced to a string that sometimes is not a valid word. For example, ponies becomes poni.

xPlore uses an indexing analyzer that performs lemmatization. Lemmatization analyzes a word for its context (part of speech), and the canonical form of a word (lemma) is indexed. The extracted lemmas are actual words. Studies have found that some form of stemming or lemmatization is almost always helpful in search. Lemmatization is applied to indexed documents and to queries.

Alternate lemmas
Alternative forms of a lemma are also saved. For example, swim is identified as a verb. The noun lemma swimming is also saved. A document that contains swimming is found on a search for swim. If you turn off alternate lemmas, you see variable results depending on the context of a word. For example, saw is lemmatized to see or to saw depending on the context. See Configuring lemmatization, page 87.

Query lemmatization
Lemmatization of queries is more prone to error because less context is available in comparison to indexing. The following queries are lemmatized:
• IDfXQuery: The with stemming option is included.
• The query from the client application contains a wildcard.
• The query is built with the DFC search service.
• The DQL query has a search document contains (SDC) clause (except phrases). For example, the query select r_object_id from dm_document search document contains ‘companies winning’ produces the following tokens: companies, company, winning, and win.

Lemmatization and index size
Lemmatization saves both the indexed term and its canonical form in the index, effectively doubling the size of the index. Multiple alternates for the lemma also increase the size of the index. Evaluate your environment for index storage needs and lemmatization expectations for your users.

Configuring lemmatization

1. To turn off lemmatization for both indexing and search, add an enable-lemmatization attribute to the domain element in indexserverconfig.xml. Set the value to false. See Modifying indexserverconfig.xml, page 42.
2. Alternate lemmas are generated by default in xPlore 1.1 and higher. To turn off alternate lemmas, modify the file cps_context.xml located in dsearch_home/dsearch/cps/cps_daemon. Set the value attribute on the following property to false (default = true):
   <property name="com.basistech.bl.alternatives" value="false"/>

Lemmatizing specific types or attributes

By default, all input is lemmatized unless you configure lemmatization for specific types or attributes.
1. Open indexserverconfig.xml in dsearch_home/config.
2. Locate the category element for your documents (not ACLs and groups). Add or edit a linguistic-process element. This element can specify elements or their attributes that are lemmatized when indexed, as shown in the following table of child elements.

Table 10 linguistic-process element
Element | Description
element-with-name | The name attribute on this element specifies the name of an element that contains lemmatizable content.
save-tokens-for-summary-processing | Child of element-with-name. If this element exists, the parent element tokens are saved. They are used in determining a summary or highlighting. Specify the maximum size of documents in bytes as the value of the attribute extract-text-size-less-than. Tokens will not be saved for larger content. Set the maximum size of tokens for the element as the value of the attribute token-size.
element-with-attribute | The name attribute on this element specifies the name of an attribute on an element. The value attribute contains a value of the attribute. When the value is matched, the element content is lemmatized.
element-for-language-identification | Specifies an input element that is used by CPS to identify the language of the document.

In the following example, the content of an element with the attribute dmfttype with a value of dmstring is lemmatized. These elements are in a DFTXML file that the index agent generates. For the DFTXML extensible DTD, see Extensible Documentum DTD, page 292. If the extracted text does not exceed 262144 bytes (extract-text-size-less-than), an element with the name dmftcustom is processed. Several elements are specified for language identification.

<linguistic-process>
  <element-with-attribute name="dmfttype" value="dmstring"/>
  <element-with-name name="dmftcustom">
    <save-tokens-for-summary-processing extract-text-size-less-than="262144" token-size="65536"/>
  </element-with-name>
  <element-for-language-identification name="object_name"/>
  ...
</linguistic-process>

Note: If you wish to apply your lemmatization changes to the existing index, reindex your documents.

Troubleshooting lemmatization

If a query does not return expected results, examine the following:
• Test the query phrase or terms for lemmatization and compare to the lemmatization in the context of the document. (You can test each sample using xPlore administrator Test Tokenization.)
• View the query tokens by setting the dsearch logger level to DEBUG using xPlore administrator. Expand Services > Logging and click Configuration. Set the log level for dsearchsearch. Tokens are saved in dsearch.log.
• Check whether some parts of the input were not tokenized because they were excluded from lemmatization: Text size exceeds the configured value of the extract-text-size-less-than attribute.
• Check whether a subpath excludes the DFTXML element from search. (The sub-path attribute full-text-search is set to false.)
• If you have configured a collection to save tokens, you can view them in the xDB admin tool. (See Debugging queries with the xDB admin tool, page 41.) Token files are generated under the Tokens library, located at the same level as the Data library. If dynamic summary processing is enabled, you can also view tokens in the stored DFTXML using xPlore administrator. The number of tokens stored in the DFTXML depends on the configured amount of tokens to save. To see the DFTXML, click a document in a collection.

Figure 11 Tokens in DFTXML

Saving lemmatization tokens

You can save the tokens of metadata and content. Tokens are used to rebuild the index.
1. Open indexserverconfig.xml in dsearch_home/config.
2. Set the property save-tokens to true for the collection. The default is false. For example:
   <collection document-category="dftxml" usage="Data" name="default">
     <properties>
       <property value="true" name="save-tokens" />
     </properties>
   </collection>
3. You can view the saved tokens in the xDB tokens database. Open the xDB admin tool in dsearch_home/dsearch/xhive/admin.
Figure 12 Tokens in xDB

The tokens database stores the following tokens:
• Original and root forms (lemmas) of the text
• Alternate lemmas
• The components of compound words
• The starting and ending offset relative to the field the text is contained in
• Whether the word was identified as a stop word

Handling special characters

Special characters include accents in some languages. Accents are removed to allow searches for words without supplying the accent in the term. You can disable diacritics removal. Other special characters are used to break text into meaningful tokens: characters that are treated as white space, and punctuation.

Handling words with accents (diacritics)
Words with accents, such as those in English, French, German, and Italian, are normalized (accents removed) to allow search for the same word without the accent. To prevent diacritics from being normalized:
1. Change diacritics removal in the CPS configuration file configuration.xml located in dsearch_home/dsearch/cps/cps_daemon.
2. Locate the element linguistic_processing/properties/property and set the value of normalize_form to false:
   <property name="normalize_form">false</property>
3. Restart the CPS instance.

Characters that are treated as white space
The default special characters are defined in indexserverconfig.xml as the value of the special-characters attribute on the content-processing-services element:

@#$%^_~‘*&:()-+=<>/\[]{}

Note: The special characters list must contain only ANSI characters.

For example, a phrase extract-text is tokenized as extract and text, and a search for either term finds the document.
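The white-space substitution can be approximated in a few lines. This is a rough sketch of the idea only, not the CPS tokenizer: the function name tokenize is hypothetical, and it ignores lemmatization, context characters, and language-specific rules.

```python
# Default special characters as listed for the special-characters attribute.
SPECIAL_CHARACTERS = "@#$%^_~‘*&:()-+=<>/\\[]{}"


def tokenize(text, special=SPECIAL_CHARACTERS):
    """Replace each special character with a space, lowercase, and split on white space."""
    table = str.maketrans({ch: " " for ch in special})
    return text.translate(table).lower().split()


print(tokenize("extract-text"))   # ['extract', 'text']
print(tokenize("Home_Base"))      # ['home', 'base']
```

Because every special character becomes a separator, either part of a hyphenated or underscored term can be matched on its own.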
Characters that are required for context (punctuation)
The default context characters are defined in indexserverconfig.xml as the value of the context-characters attribute of the content-processing-services element:

!,.;?’&quot;

Note: The context characters list must contain only ANSI characters.

White space is substituted after the parts of speech have been identified. For example, the email address john.smith@emc.com contains a special character (@) and two instances of a context special character (.). Because the context special character . is not punctuation in this example, it is not replaced as white space. The string is tokenized as two tokens:

john.smith emc.com

For the phrase "John Smith is working for EMC.” the period is filtered out because it functions as a context special character (punctuation).

Special characters in queries
When a string containing a special character is indexed, the tokens are stored next to each other in the index. A search for the string is treated as a phrase search. For example, an index of home_base stores home and base next to each other. A search for home_base finds the containing document but does not find other documents containing home or base but not both.

If a query fails, check to see whether it contains a special character.

Note: Reindex your documents after you change the special characters list.

Configuring stop words

To prevent searches on common words, you can configure stop words that are filtered out of queries. Stop words are not removed during indexing, to support phrase searches. Exceptions:
• Stop words in phrase searches are not filtered.
• Stop words that appear within the context of special characters are not filtered. For example, if the query term is stop_and_go, and is a stop word. The underscore is defined in indexserverconfig.xml as a special character. CPS does not filter the underscore.
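The stop word exceptions can be illustrated with a toy filter. This is a sketch under stated assumptions, not how xPlore evaluates queries: the function name filter_query_terms and the sample stop word set are hypothetical, and a term is treated as a phrase simply when it contains a space.

```python
SPECIAL_CHARACTERS = "@#$%^_~‘*&:()-+=<>/\\[]{}"
STOP_WORDS = {"and", "the", "of"}  # hypothetical sample list


def filter_query_terms(terms, stop_words=STOP_WORDS):
    """Drop stop words from single terms; keep phrases and special-character terms intact."""
    kept = []
    for term in terms:
        is_phrase = " " in term
        has_special = any(ch in SPECIAL_CHARACTERS for ch in term)
        if is_phrase or has_special:
            kept.append(term)          # exceptions: nothing is filtered here
        elif term.lower() not in stop_words:
            kept.append(term)
    return kept


print(filter_query_terms(["stop", "and", "go"]))   # ['stop', 'go']
print(filter_query_terms(["stop_and_go"]))         # ['stop_and_go']
print(filter_query_terms(["rock and roll"]))       # ['rock and roll']
```

The second and third calls show the two exceptions: a term joined by a special character and a phrase both keep their stop word.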
A sample stop words list in English is provided in en-stopwords.txt in the directory dsearch_home\dsearch\cps\cps_daemon\shared_libraries\rlp\etc. You can edit this file.

The following stop word lists are provided for Chinese, Korean, and Japanese in subdirectories of dsearch_home\dsearch\cps\cps_daemon\shared_libraries\rlp\. Edit them in UTF-8 encoding. Each line represents one lexeme (stop word). Blank lines and comments beginning with # are ignored.
• Chinese: cma/dicts/zh_stop.utf8
• Korean: kma/dicts/kr_stop.utf8
• Japanese: jma/dicts/JP_stop.utf8

Stop word filtering is not configurable from Documentum interfaces. Stop words are always filtered for single-term and multi-term searches, and they are not filtered in phrase searches. Stop word filtering is configurable in an XQuery expression. Add the XQFT option "using stop words default" to the query constraint.

Adding stop word lists to xPlore

To add stop word lists for other languages, register your lists in the file stop-options.xml in dsearch_home\dsearch\cps\cps_daemon\shared_libraries\rlp\etc. The stop words file must contain one word per line, in UTF-8 format. The following example adds a stop words list in Spanish after the English list:
   <dictionarypath language="eng"> <env name="root"/>/etc/en-stopwords.txt</dictionarypath>
   <dictionarypath language="es"> <env name="root"/>/etc/es-stopwords.txt</dictionarypath>
A sample Spanish stop words file:
   a
   adonde
   al
   como
   con
   conmigo
   ...

Troubleshooting content processing

CPS connection errors are recorded in the index agent log like the following:
   ERROR IndexingStatus [PollStatusThread-PrimaryDsearch] [DM_INDEX_AGENT_PLUGIN] Document represented by key 0800000c80002116 failed to index into Repo, error:Unable to connect to CPS [CPS_ERR_CONNECT].
Workaround: Install the 64-bit version of xPlore.
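When many documents fail with a connection error like the one above, it can help to collect the affected object keys from the index agent log so that they can be resubmitted later. The following Python sketch assumes that every failure line follows the layout of the single sample shown above. The regular expression and the function name find_cps_failures are illustrative, not part of xPlore, and other log layouts would need a different pattern.

```python
import re

# Pattern derived from the one sample line in this section. It is an
# assumption that other failure lines use the same layout.
FAILURE_RE = re.compile(
    r"Document represented by key (?P<key>[0-9a-fA-F]{16}) "
    r"failed to index into (?P<target>\S+?), "
    r"error:(?P<message>.*?) \[(?P<code>[A-Z_]+)\]"
)


def find_cps_failures(lines, code="CPS_ERR_CONNECT"):
    """Return the object keys of documents that failed with the given code."""
    keys = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match and match.group("code") == code:
            keys.append(match.group("key"))
    return keys


SAMPLE = (
    "ERROR IndexingStatus [PollStatusThread-PrimaryDsearch] "
    "[DM_INDEX_AGENT_PLUGIN] Document represented by key 0800000c80002116 "
    "failed to index into Repo, error:Unable to connect to CPS [CPS_ERR_CONNECT]."
)

print(find_cps_failures([SAMPLE]))
```

To scan a real log, pass an open file object (for example, open(path, encoding="utf-8")) as the lines argument.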
CPS log files

If CPS is installed as an in-process service on an xPlore instance, it shares the log4j.properties of the indexserver web application. This file is located in dsearch_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes. The log files cps.log and cps_daemon.log are located in dsearch_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/logs.

Logging in a standalone instance

The log4j.properties file is located in the CPS war file, in WEB-INF/classes. The log files cps.log and cps_daemon.log are located in cps_home/jboss5.1.0/server/cps_instance_name/logs.

Logging separate instances

Make sure that each CPS instance specifies a different log file path. The log output file is specified in the log4j.properties file of the instance. If CPS is installed as a standalone service, the log4j.properties file is located in the CPS war file, in WEB-INF/classes. If CPS is installed as an in-process service on an xPlore instance, it shares the log4j.properties of the indexserver web application. This file is located in dsearch_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes.

Log levels

In order of decreasing amount of information logged: trace, debug, info, warn, error, and fatal. Set the log level to INFO to troubleshoot CPS.

Log output

Each CPS request is logged with the prefix PERFCPSTS#. You see prefixes for the following libraries in CPS logging:
• CPS daemon: DAEMON-CORE
• Text extraction: DAEMON-TE STELLENT
• HTTP content retrieval: DAEMON-CF_HTTP
• Language identification: DAEMON-LI_RLI
• Language processing: DAEMON-LP_RLP

Following is an example from cps.log. (Remote CPS log is named cps_manager.log.)
   2008-10-21 13:35:40,402 WARN [DAEMON-CORE-(1324)] max_batch_size in configuration file is invalid.
Use default:65536 instead. Example: CPS performance by format Use the timestamp difference between PERFCPSTS9 and PERFCPSTS10 to find the processing time for a particular document. PERFCPSTS9 indicates that Content fetching of the single request is finished. PERFCPSTS10 indicates that text extraction of the single request is finished. CPS startup and file not found issues The CPS process (CPSDaemon) runs on port 64321. Make sure that the process has started. If it has not started, check the cps_daemon log in dsearch_home/jboss5.1.0/server/primary_instance/logs. EMC Documentum xPlore Version 1.2 Administration and Development Guide 93 Other applications (not xPlore) can also be using the temp space. the CPS configuration can be invalid. Check the use of memory on the CPS instance. This file is located in the CPS host directory dsearch_home/dsearch/cps/cps_daemon.1. 3.. Stop all xPlore instances again. Restart all xPlore instances.. err-msg C:/xPlore/dsearch/cps/cps_daemon/export/080004578000 0e5bdmftcontentref1671968925161715182.646 ERROR [Daemon-Core-(3520)] . Perform the following steps: 1. 2. Substitute your temp directory for MYTEMP_DIR: -Djava. startPrimaryDsearch. • Install a standalone CPS instance.0/server..txt cannot be opened. causing out of memory errors..xml in dsearch_home/dsearch/cps/cps_daemon. Stop all xPlore instances and restart. for example. Add the following Java option to the primary instance start script. which is located in the CPS instance directory 94 EMC Documentum xPlore Version 1.2 Administration and Development Guide . The system cannot find the file specified.xml to a path accessible from all xPlore instances. temporarily for high ingestion periods. Add temp space for Lucene: Configure Lucene to use a separate file system for temporary index items.. If you change this file. A message in the index agent log indicates the problem: IO_ERROR: error during add entry. 
Set the export_path location in the remote CPS configuration.Document Processing (CPS) CPS does not start on an unsupported OS. Cannot rebuild index on remote CPS If you try to rebuild an index without configuration. Go to dsearch_home/dsearch/cps/cps_daemon/bin and try to run the daemon directly to see whether it can be started.io. Add temp space for CPS: Increase the space or change the location in the CPS configuration file configuration.tmpdir=MYTEMP_DIR 4.xml. Restarts often The CPS load can be too high.sh in jboss5. restart all instances including the remote CPS. Both the Lucene index and the CPS daemon use temp space. you see an error in the remote CPS daemon log like the following: 2011-02-16 22:53:32. err-code 513. or permanently. Indexing fails: no space left Indexing fails when there is insufficient temp space. Suggested workarounds: • Change the CPS configuration: Decrease the number of worker threads.. Check to see whether you have changed the file InstanceName_local_configuration. • Resubmit any failed files using the Documentum index agent. If CPS fails to start. A restart finalizes the temporary index items in the /tmp directory. Original message: No space left on device. 7 documentation. Check whether indexing is enabled (the type is a subtype of a registered type). You can get a listing of all registered types using the following iAPI command: ?. The Start and End offsets display the position in raw input. t. After you change the temp space location. try to resubmit them using the index agent UI. Results can differ from tokenization of a full document for the following reasons: EMC Documentum xPlore Version 1. page 83.select distinct t. The results table displays the original input words. the index agent filters out content larger than 20 MB. the system displays the Enable Indexing checkbox selected but disabled. If a supertype is registered for indexing.c. Different tokenization rules are applied for each language.) 
Uppercase characters are rendered as lowercase. Some documents are not indexed Is the collection set to read-only? Documents submitted for updating fail to be indexed.3. dmi_registry i where t.---------------- dm_group 0305401580000104 dm_acl 0305401580000101 dm_sysobject 0305401580000105 You can register or unregister a type through Documentum Administrator. For other languages. Run casample. Document Processing (CPS) dsearch_home/dsearch/cps/cps_daemon. Is the format supported by CPS? Unsupported format is the most common error. reindex or resubmit the documents that failed indexing.log: Content size for XXX exceeds limit of 20000000 skipping content Testing tokenization Test the tokenization of a word or phrase to see what is indexed. page 69 for more information.name.exe or casample. Is the format indexable? Check the class attribute of the document format. Check an individual file by submitting it to a test. (Only languages that have been tested are listed. leave blank. Is indexing enabled for the object type? Documents are not indexed if the document type is not registered or is not a subtype of a registered type. If the files pass the casample test.r_object_id = i. You can check whether indexing is enabled in Documentum Administrator by viewing the type properties. Components are displayed for languages that support component decomposition.registered_id You see results like the following: name r_object_id --------------------------. Input the text and select the language. See Documentum attributes that control indexing.sh in dsearch_home/dsearch/cps/cps_daemon/bin. White space replaces special characters.r_object_id from dm_type t. The type must be dm_sysobject or a subtype of it. such as German. Expand Diagnostic and Utilities in the xPlore administrator tree and then choose Test tokenization. The root form is the token used for the index. Check the list of supported formats in Oracle Outside In 8. By default. Is the document too large? 
See Maximum document and text size. You cannot clear the checkbox. The following message is logged in indexagent.2 Administration and Development Guide 95 . it displays the following error. pre-production sizing. If the document uses an unsupported encoding. The average processing latency is the average number of seconds between the time the request is created in the indexing client and the time xPlore receives the same request. Troubleshooting slow ingestion Slow ingestion is most often seen during migration. for example. and benchmarking. For example. For supported encodings. Suggested workarounds: For migration. See the Documentum xPlore sizing tool on Powerlink. the Documents ingested per hour reports shows number of documents and text bytes ingested. If the file is empty. If the CPS analyzer cannot identify the file type. they will be displayed after the processing statistics. The XML element that contains the error is displayed: *** Error: file is empty in element_name. slow ingestion is usually not an issue. The State of repository report in Content Server also reports document size. CPS instances are used in a round-robin order. Check CPU consumption during ingestion. For day-forward (ongoing) ingestion. or change in metadata.Document Processing (CPS) • The document language that is identified during indexing does not match the language that is identified from the test. If migration is spread over days. update. see EMC Documentum xPlore Installation Guide.2 Administration and Development Guide . Use the executable CASample in dsearch_home/dsearch/cps/cps_daemon/bin to test the processing of a file. 96 EMC Documentum xPlore Version 1. Syntax: casample path_to_input_file CPS errors If there are processing errors for the file. The XML element that contains the error is displayed: *** Error: file is corrupt in element_name. Use xPlore administrator reports to see the average size of documents and indexing latency and throughput. 
the following error is displayed. These documents also contain more text to process. Insufficient CPU Content extraction and text analysis are CPU-intensive. Large documents Large documents can tie up a slow network. add temporary CPU capacity. a 1027 error code is displayed. • The context of the indexed document does not match the context of the text. tens of millions of documents ingested over two weeks. A corrupt file returns the following error. Most ingestion issues can be resolved with planning. CPU is consumed for each document creation. Divide bytes ingested by document count to get average number of bytes per document processed. The XML element that contains the error is displayed: *** Error: no filter available for this file type in element_name. add permanent CPU or new CPS instances. <parameter_name>contentSizeLimit</parameter_name> <parameter_value>20000000</parameter_value> </parameter> • CPS limits the size of text within a document that is indexed. Low CPU utilization and high I/O response time for ingestion or query indicate an I/O problem. EMC Documentum xPlore Version 1. Increase the number of drives available for the xPlore instance. You can change the contentSizeLimit parameter to a different value (in bytes). Other suggested workarounds: Add CPU. in the WEB-INF/classes/ directory of the index agent WAR file. Improve network performance. Suggested workarounds: • NAS: Verify that the network has not been set as half duplex. A document can have a much greater size (contentSizeLimit) compared to the indexable text within the document.xml. If a query is very slow the first time and much faster the second time.xml. This limit is changed in indexagent. memory. If it is slow the second time. maximized for 32-bit environment. If other measures have not resolve the problem. Maximum setting: 2 GB. Verify that the SAN has sufficient memory to handle the I/O rate. 
Edit max_data_per_process: The upper limit in bytes for a batch of documents in CPS processing. 2. 5. this performance is probably not due to insufficient I/O capacity. Increase network bandwidth and/or improved network I/O controllers on the xPlore host. which is located in the CPS instance directory dsearch_home/dsearch/cps/cps_daemon. Units are bytes and the range is 5-40 MB. Document Processing (CPS) Several configuration properties affect the size of documents that are indexed and consequently the ingestion performance. If the SAN is multiplexing a set of drives over multiple application. You can change the value of Max Text Threshold in the xPlore Administrator CPS configuration screen.2 Administration and Development Guide 97 . • CPS limits the size of text that can be processed in a batch. you could have an I/O problem. • SAN (check in the following order) 1. 4. Default: 30 MB. Stop the index agent instance to change the size limit. change underlying drives to solid state. • Indexing agent (Documentum only) limits the size of the documents submitted for indexing. Use striped mapping instead of concatenated mapping so that all drives can be used to service I/O. Disk I/O issues You can detect disk I/O issues by looking at CPU utilization. You can also measure I/O performance on Linux using the bonnie benchmark. Test the network by transferring large files or using Linux dd (disk dump). Edit the CPS configuration file configuration. 3. and possible disk I/O capacity. move the "disk space" to a less contentious set of drives. These values are optimized for 32-bit environments and must be adjusted upward in 64-bit environments for faster ingestion. the CPS log file reports one of the following errors (cps_daemon. Slow content storage area Ingestion is dependent on the speed of the content source. independent of xPlore operations. Suggested workaround: Add temporary CPU for migration or permanent CPU for ongoing load. 
Content storage issues are especially noticeable during migration. or switch to Linux platform. Interference by another guest OS In a VM environment. Workarounds: Exclude temp and xPlore working and data directories. Increase network capacity. You can determine the location of the original content by using the State of the repository report in Content Server. or CPU capacity. Check for faulty hubs or switches. Development is on a small volume of content on NAS but production content is on a higher-latency device like Centera. File transfers via FTP or network share are also slow.log): 98 EMC Documentum xPlore Version 1. Consumption is low even when the disk subsystem has a high capacity.2 Administration and Development Guide .Document Processing (CPS) Slow network A slow network between the Documentum Content Server and xPlore results in low CPU consumption on the xPlore host. Workaround: Consult with your infrastructure team to load balance the VMs appropriately. Workaround: Extend the migration time window. due to the complexity of extracting text from the spreadsheet structure. You can detect the number of Excel documents using the State of repository report in Content Server. Suggested workarounds: Verify that network is not set to half duplex. Concurrent large file ingestion When CPS processes two or more large files at the same time. I/O capacity. For example. you find that migration or ingestion takes much longer in production than in development. Large number of Excel documents Microsoft Excel documents require the most processing of all text formats. document size. the physical host can have several guest operating systems. This contention could cause intermittent slowness in indexing unrelated to format. Virus checking software Virus checking software can lead to high disk I/O because it continually checks the changes in xPlore file structures during indexing. EMC Documentum xPlore Version 1. Use these same steps for other supported languages 1. 
You can also prevent words from being decompounded. This processor does not interfere with query processing. • Windows compilation scripts: In dsearch_home/dsearch/cps/cps_daemon/shared_libraries/rlp/bin/ia32-w32-msvc71. PERSON. you specify the part of speech for ambiguous nouns or verbs. PROPER_NOUN. In your custom dictionary. Call build_cla_user_dictionary. Document Processing (CPS) ERROR [Daemon-Core-(3400)] Exception happened. ACCESS_VIOLATION. FATAL [DAEMON-LP_RLP-(3440)] Not enough memory to process linguistic requests. making queries more precise. Error message: bad allocation Use xPlore administrator to select the instance. 3. which assists CPS in determining the context for a sentence.exe.. Each entry in the file is on a single line with the following syntax: word TAB part_of_speech TAB decomposition_pattern part_of_speech: NOUN. • Linux compilation scripts: In dsearch_home/dsearch/cps/cps_daemon/shared_libraries/rlp/bin/ia32-glibc23–gcc34. The following example is decomposed into two four-character sequences: 2. GIVEN_NAME. and then choose Configuration. ORGANIZATION. Put the compiled dictionary into dsearch_home/cps/cps_daemon/shared_libraries/rlp/cma/dicts. The sum of the digits in the pattern must match the number of characters in the entry. Call chmod a+x to set permissions and then call build_cla_user_dictionary. For example. A value of 0 indicates no decomposition. decomposition_pattern: A comma-delimited list of numbers that specify the number of characters from word to include in each part of the compound. including personal names and foreign words. Change the following to smaller values: • Max text threshold • Thread pool size You can add a separate CPS instance that is dedicated to processing. or FOREIGN_PERSON. PLACE. Create a UTF-8 encoded file.2 Administration and Development Guide 99 . Repairing Lucene merges Adding dictionaries to CPS You can create user dictionaries for words specific to an industry or application. 
the following entry indicates that the word is decomposed into three two-character sequences.. Compile the dictionary. The following procedure creates a Chinese user dictionary. Attempt to read data at address 1 at (connection-handler-2) . contractions and elisions.favor_user_dictionary" value="true"/>. Each entry is a single line. The source file for a user dictionary is UTF-8 encoded.. For example: <contextconfig><properties> <property name="com. Polish. The three characters ’[’. In a few cases (such as "New York"). and words with clitics (German. The maximum size of an analysis is 128. it may contain one or more space characters. The word is a TOKEN.xml in dsearch_home/cps/cps_daemon/shared_libraries/rlp/etc. Dutch. Morphological tags are used to help identify named entities (Dutch.. modify cps_context. Creating European language user dictionaries You can create one or more user dictionaries for each of the languages supported by the European Language Analyzer: Czech. and Portuguese). Russian. English. Portuguese. Uppercase English. and Italian).bin </dictionarypath> </claconfig> 5. Tags are placed in square brackets ([ ]). ’]’.. Hungarian. enter it as ’ABC\\XYZ’.cla. For optimal performance. Special tags are used to divide compound words into their components (German. for example. To prevent a word that is also listed in a system dictionary from being decomposed.. 100 EMC Documentum xPlore Version 1. French. <dictionarypath><env name="root"/>/cma/dicts/user_dict. Greek. If your dictionary contains a word like ’hi there’.basistech. Add the property com. Dutch.2 Administration and Development Guide .. Note: European Language Analyzer user dictionary lookups occur after tokenization. .. and ’\’ must be escaped by prefixing the character with ’\’. French. POS tags and morphological tags begin with "+". German. Hungarian. Create the source file. The file may begin with a byte order mark (BOM). and Spanish. 
where normal characters count as 1 and tags count as 2.xml in dsearch_home/cps/cps_daemon. and Hungarian) and to define boundaries for multi-word baseforms.cla. Dutch.basistech. and set it to true. it will not be found because the tokenizer identifies ’hi’ and ’there’ as separate tokens. and a required POS (Part-Of-Speech) tag. Upper-Case English. If. Italian. English. The Tab character separates word from analysis: word Tab analysis In some cases (as described below) the word or analysis may be empty.bin: <claconfig> . You add a dictionarypath element to cla-options. The analysis is the LEMMA with 0 or more morphological tags and special tags. German. Italian. the word is ’ABC\XYZ’. The following example adds a user dictionary named user_dict. Edit the CLA configuration file to include the user dictionary.Document Processing (CPS) 4. Empty lines are ignored. keep the number of dictionaries you create per language to a minimum.favor_user_dictionary if it does not exist. French. Perform the following steps to create an European language user dictionary: 1. input is the pathname of the dictionary source file.bin for the big-endian dictionary. for example. Windows example with English dictionary: . • The BT_BUILD environment variable must be set to the platform identifier. If you are generating a little-endian and a big-endian dictionary.sh. To compile the dictionary into a binary format that European Language Analyzer can use. if xPlore is installed in /usr/local/xPlore. Note: The dictionary lookup is case sensitive. Note: Choose a descriptive name for user_dict.2 Administration and Development Guide 101 . Prerequisites: • Unix or Cygwin (for Windows) • Python 2.sh lang input output lang is the two-letter language code (en_uc for Upper-Case English).sh en user_dict.utf8 user_dict-LE. telephone telephone[+NOUN] telephone[+VI] dog dog[+NOUN] Dog 2. the BT_ROOT environment variable must be set to /usr/local/xPlore/dsearch/cps/cps_daemon/shared_libraries. 
output is the pathname of the binary dictionary file.bin EMC Documentum xPlore Version 1. Compile the user dictionary. include lines with empty analyses (word + Tab). For example. and two renditions of "dog" for the same analysis (noun). Document Processing (CPS) A POS tag identifying part of speech is required and is the last (right-most) tag in the analysis. The following example includes two analyses for "telephone" (noun and verb). The script for generating a binary dictionary is xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/bl1 /dicts/tools/build_user_dict.4 on your command path • The Unix sort command on your command path • The BT_ROOT environment variable must be set to BT_ROOT./build_user_dict. amd64-w64-msvc80. A line with an empty word uses the previous non-empty word. use user_dict-LE. European Language Analyzer uses the dictionary in a binary form. To avoid unnecessary repetition. A line with an empty analysis uses the previous non-empty analysis. the Basis root directory. and lines with empty words (Tab + analysis).bin for the little-endian file and user_dict-BE. issue the following command: build_user_dict. English examples: dog dog[+NOUN] Peter Peter[+Masc][+PROP] NEW YORK New[^_]York[+Place][+City][+PROP] doesn’t does[^=]not[+VDPRES] Variations: You may want to provide more than one analysis for a word or more than one version of a word for an analysis. on Windows. These processes are thoroughly tested and supported... <user-dict><env name="root"/>/bl1/dicts/en/userdict-<env name="endian"/>.. language identification.xml. You must compile separate dictionaries for each. To organize the placement of user dictionaries. Example: English user dictionary is available as both a little-endian and a big-endian binary dictionary. and they are adequate for most content processing needs. Example: German user dictionary available in only one form (little-endian or big-endian). For example.. 
customer IDs could be stored as 123–abc-456-789 but only the 456-789 digits are significant.Document Processing (CPS) Unix example with English dictionary: .bin</user-dict></bl1-options> <env-name="endian"> evaluates to LE on little-endian platforms and BE on big-endian platforms.bin</user-dict> </bl1-options> Custom content processing About custom content processing Text extraction Annotation Custom content processing errors About custom content processing CPS uses embedded content processors for text extraction. Normalization would extract this from the text so that users would find the document when they search for “456789” or “456-789”. 4. add a <user-dict> element to the appropriate language section in xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/etc/bl1-config. . .bin 3..utf8 user_dict-BE. you may want to put the binary dictionary file in a xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/bl1/dicts language directory where the directory name matches the language code... and linguistic analysis.2 Administration and Development Guide . put an English user dictionary in xplore_home/dsearch/cps/cps_daemon/shared_libraries/rlp/bl1/dicts/en. <bl1-options language="eng"> . For example. Update the configuration file. You can add custom content processors to address the following use cases: • Custom text extraction that processes certain mime-types.sh en user_dict./build_user_dict. For each user dictionary.. <bl1-options language="deu"> . <user-dict><env name="root"/>/bl1/dicts/de/userdict. Put the binary dictionary. 102 EMC Documentum xPlore Version 1. • Normalization: Normalizing varied inputs of data like phone numbers or customer IDs. xml. A few custom plugins have been tested to validate the plugin environment. public xPlore adaptor interfaces. Documents that meet criteria in your plugin or UIMA module are routed to this CPS instance. 
Custom text extractors are usually third-party modules that you configure as text extractors for certain formats (mime types). Place DLLs or jar files in CPS classpath. configure your custom component on a separate CPS instance. Text extractor for certain content formats. but these plugins themselves are not supported. language identification. 2. You can customize the CPS document processing pipeline at the following stages. and plugin issues are reported separately. extracting the locations of people or places from the full-text content. Figure 13 Custom content processing Support for plugins EMC supports the quality of service of the xPlore default text extraction. 1. both Java and C/C++ based. Specified in CPS configuration. The plugin runtime is sandboxed. and stability of your plugins and annotators. Custom annotators require software development of modules in the Apache UIMA framework. EMC Documentum xPlore Version 1. Repeat for each CPS instance. You must create adaptor code based on proprietary. and perform name indexing for faster retrieval. and linguistic analysis. Write plugin (Java or C/C++) or UIMA annotator. Customization steps 1. memory consumption.2 Administration and Development Guide 103 . annotate metadata. Supports custom file decryption before extraction. 3. Annotators: Classify elements in the text. 2. EMC is not responsible for the performance. For best troubleshooting of your plugins. Sample Java and C++ text extraction plugins and readmes are provided in dsearch_home/dsearch/cps/cps_daemon/sdk and cps/cps_manager. Document Processing (CPS) • Entity extraction and XML annotation: For example. type.xml in dsearch_home/dsearch/cps/cps_daemon. Test content processing. the xPlore default extractor Oracle Outside In (formerly Stellent) or Apache Tika. deploy a custom text extractor on a separate CPS instance. Perform a backup of your customization DLLs or jars when you back up the xPlore federation. properties 104 EMC Documentum xPlore Version 1. 
Text extraction

The text extraction phase of CPS can be customized at the following points:
• Pre-processing plugin
• Plugins for text extraction based on mime type
• Post-processing plugin

For best reliability, set up custom text extraction (TE) on a remote CPS instance. Configure the instance to do text extraction only. For instructions on configuring a remote CPS instance for text extraction, see Adding a remote CPS instance, page 80.

The following diagram shows three different mime types processed by different plugins.

Figure 14 Custom text extraction based on mime type

Configuring a text extraction plugin

Add your plugins to configuration.xml in dsearch_home/dsearch/cps/cps_daemon. Each extractor requires the following information:

Table 11 Child elements of text_extraction
Element: Description
text_extractor_preprocessor: Preprocessing specification, contains name, type, lib_path, formats, properties
text_extractor: Contains name, type, lib_path, formats, properties
text_extractor_postprocessor: Postprocessing specification, contains name, type, lib_path, formats, properties
name: Used for logging
type: Valid values: Java or native (C/C++)
lib_path: Fully qualified class name in CPS host classpath.
formats: Contains one or more format elements. Each format element corresponds to a mime type.
properties: Contains user-defined property elements.
property: The value of each named property element is read by your plugin.

The following example contains a preprocessor and an extractor plugin. The properties are passed to the plugin for processing. In this example, return_attribute_name is a property required by the Tika text extractor.

<cps_pipeline>
  ...
  <text_extraction>
    <!--preprocessor chain invoked in the configured sequence-->
    <text_extractor_preprocessor>
      <name>password_cracker</name>
      <type>java</type>
      <lib_path>
        com.emc.cma.cps.processor.textextractor.CPSPasswordCracker
      </lib_path>
      <properties>
        <property name="return_attribute_name">false</property>
      </properties>
      <formats>
        <format>application/pdf</format>
      </formats>
    </text_extractor_preprocessor>
    <text_extractor>
      <name>tika</name>
      <type>java</type>
      <lib_path>
        com.emc.cma.cps.processor.textextractor.CPSTikaTextExtractor
      </lib_path>
      <properties>
        <property name="return_attribute_name">false</property>
      </properties>
      <formats>
        <format>application/pdf</format>
      </formats>
    </text_extractor>
    ...
  </text_extraction>
  ...
</cps_pipeline>

Creating a text extractor adaptor class

xPlore provides public, proprietary interfaces that you can implement for your plugin. Implement the abstract class CPSTextExtractor in the package com.emc.cma.cps.processor.textextractor.

Sample Tika text extractor

The Apache Tika toolkit extracts metadata and structured text from documents. Sample jar files and Tika configuration file are installed in cps_home/dsearch/cps/add-ons/tika. Other samples are in cps_home/dsearch/cps/cps_daemon/sdk.
1. Download and place Tika jar files in dsearch_home/dsearch/cps/add-ons/tika.
2. Back up configuration.xml in dsearch_home/dsearch/cps/cps_daemon.
3. Copy configuration_tika.xml, located in dsearch_home/dsearch/cps/add-ons/tika, to dsearch_home/dsearch/cps/cps_daemon.
4. Rename the copied file to configuration.xml.

Troubleshooting custom text extraction

Custom text extraction plugins raise the following errors:
• Missing or misplaced custom library. The CPS daemon does not start, and cps_daemon.log reports an initialization error.
• A file with the incorrect mime type is routed to the TE plugin. Change the default log level from WARN to DEBUG and ingest the same document.
• If two text extractors are available for a given format, the first one in the configuration receives the request.
• If the text extractor hangs during daemon processing, the CPS daemon log records a timeout error and the daemon restarts. If the text extractor hangs on the manager side, processing is blocked and requires a CPS restart.
• If the text extractor crashes during daemon processing, the CPS daemon restarts. If the text extractor crashes on the manager side, processing is blocked and requires a CPS restart.

Annotation

Documents from a Content Server are submitted to CPS as DFTXML files. The CPS annotation framework analyzes the DFTXML content. Text can be extracted to customer-defined categories, and the metadata can be annotated with information.

Annotation employs the Apache UIMA framework. A UIMA annotator module extracts data from the content, optionally adds information, and puts it into a consumable XML output.

A UIMA annotator module has the following content:
• An XML annotation descriptor file.
• A common analysis structure (CAS) that holds the input and output for each stage in the sequence. The type system descriptor file, also known as a consumer descriptor, defines the type structure for the CAS. The CAS tool generates supporting code from the type descriptor file.
• Annotation implementation code that extends JCasAnnotator_ImplBase in the UIMA framework.
• An analysis engine consisting of one or more annotators in a sequence.

Sample annotation module files are provided in the xPlore SDK:
• Sample annotator module (dsearch-uima.jar) in /samples/dist.
• Annotator source files in /samples/src/com/emc/documentum/core/fulltext/indexserver/uima/ae.
• Sample indexserverconfig.xml for the example in /samples/config
• Sample type descriptor and annotation definition files in /samples/src/com/emc/documentum/core/fulltext/indexserver/uima/descriptors/

Steps in creating a UIMA annotator for xPlore

Following is a high-level description of the steps. See the example, page 110 for implementation details.
1. Download the UIMA Java framework and SDK binaries from Apache and add it to your project build path.
2. Create a CAS results type definition file that specifies a feature element for each data type that you are annotating.
3. Use the Apache UIMA tool jcasgen.bat (Windows) or jcasgen.sh (Linux) to generate type implementation classes from the type definition file.
4. Create the annotator descriptor XML file that describes the annotator detail. The descriptor describes the name of the annotator, the output type, and the runtime configuration parameters.
5. Create annotator code that extends JCasAnnotator_ImplBase in the UIMA framework.
6. Compile the Java classes and package with the XML descriptor files in a jar file.
7. Deploy to xPlore:
a. Stop all xPlore instances.
b. Copy the jar file to dsearch_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/lib.
c. Specify UIMA modules and configure their usage in indexserverconfig.xml. Change the indexserverconfig.xml pipeline element to specify the descriptor, process-class, and any properties for the annotator. Add pipeline-usage for the associated category of documents.
d. Increase the value of the revision attribute on the index-server-configuration element to ensure that xDB loads it.
e. Restart the xPlore instances.
Note: To deploy the UIMA module as a remote service, you can use the Vinci service that is included in the UIMA SDK.

Specifying UIMA modules in indexserverconfig.xml

Stop all xPlore instances and add your UIMA references to indexserverconfig.xml, located in dsearch_home/config. Add a pipeline-config element as a child of index-server-configuration between content-processing-services and category-definitions. The pipeline-config element has the following syntax. The descriptor file and process class are hypothetical.

<pipeline-config>
  <pipeline descriptor=UIMA-pipeline-descriptor.xml
    process-class=xPlore_UIMAProcessFactory_class
    name=name-of-pipeline />
</pipeline-config>

Table 12 Pipeline configuration attributes and elements
Element: Description
name: Unique identifier of the UIMA module
descriptor: UIMA annotator descriptor file path. The path is resolved using the classpath.
process-class: The xPlore factory class. This class must implement IFtProcessFactory. The default class, com.emc.documentum.core.fulltext.indexserver.uima.UIMAProcessFactory, provides for most processing needs.
properties: Contains property elements that define runtime parameters.

The following example configures a UIMA module. You specify a descriptor XML file, a processing class, and an optional name. The path to the descriptor file is from the base of the UIMA module jar file.

<pipeline-config>
  <pipeline descriptor="descriptors/PhoneNumberAnnotator.xml"
    process-class="com.emc.documentum.core.fulltext.indexserver.uima.UIMAProcessFactory"
    name="phonenumber_pipeline"/>
</pipeline-config>

The xPlore UIMAProcessFactory class is provided with xPlore. It executes the pipeline based on the definitions provided.

Configuring UIMA module usage in indexserverconfig.xml

Configure the usage of the UIMA module within a category element in indexserverconfig.xml. Most applications annotate the dftxml category. Add one pipeline-usage element after the indexes element. Properties control the level of concurrency and the data elements from the DFTXML document that are sent to the UIMA annotator.

Table 13 Pipeline usage attributes and elements
Element: Description
name: Attribute of pipeline-usage that references a pipeline module defined in pipeline-config.
root-result-element: Attribute of pipeline-usage that specifies an element in the DFTXML document that will be the root of the annotated results.
mapper-class: Optional attribute of pipeline-usage that maps the annotation result to an XML sub-tree, which can then be added to DFTXML.
input-element: One or more optional child elements of pipeline-usage that specify the DFTXML elements that are passed to the UIMA analysis engine. The name attribute must be unique within the same pipeline. The element-path attribute has an xpath value that is used to locate the XML element in the DFTXML document.
type-mapping: One or more elements that specify how to store the analytic processing results in the original DFTXML document. The name attribute is a unique identifier for logging. The element-name attribute specifies the XML element that holds the result value from the annotator. This element becomes a new child element within DFTXML. The type-name attribute specifies the fully qualified class name for the class that renders the result. This value should match the typeDescription name in the descriptor file.
feature-mapping: One or more optional child elements of type-mapping. The required feature attribute corresponds to a data member of the type. The required element-name attribute corresponds to the XML element that the feature is mapped to.
properties: Child element of pipeline-usage. Specifies concurrency and the elements that are passed to the annotator.
property name= thread-count: Sets concurrency level in the underlying pipeline engine. By default, it has the same value (100) as CPS-threadpool-max-size in xPlore administrator indexing configuration.
property name= send-content-text: Controls whether to pass the content text to the annotator. Default: true. If the annotator does not require content text, set to false.
property name= send-content-tokens: Controls whether to pass the content tokens to the annotator. Default: false. If the annotator operates on tokens, set to true.

In the following example of the Apache UIMA room number example, the annotated content is placed under dmftcustom in the DFTXML file (root-result-element). The content of the elements r_object_type (input-element element-path) and object_name is passed to the UIMA analyzer. For the object_name value, a room element is generated by the RoomNumber class. Next, the building and room-number elements are generated by a lookup of those features (data members) in the RoomNumber class.

<pipeline-usage root-result-element="dmftcustom" name="test_pipeline">
  <input-element element-path="/dmftdoc/dmftmetadata//r_object_type" name="object_type"/>
  <input-element element-path="/dmftdoc/dmftmetadata//object_name" name="object_name"/>
  <type-mapping element-name="room"
    type-name="com.emc.documentum.core.fulltext.indexserver.uima.ae.RoomNumber">
    <feature-mapping element-name="building" feature="building"/>
    <feature-mapping element-name="room-number" feature="roomnumber"/>
  </type-mapping>
  <type-mapping
    type-name="com.emc.documentum.core.fulltext.indexserver.uima.ae.DateTimeAnnot"
    element-name="datetime"/>
</pipeline-usage>

See the Apache UIMA documentation for more information on creating the annotator class and descriptor files. See a simple annotator example, page 110 for xPlore.

UIMA in the log

If there are errors in configuration or UIMA module, you see no UIMA logging. The default log level is WARN. UIMA logging is configured in log4j.properties as the value of log4j.logger.org.apache.uima, and the default log for UIMA logging is dsearch.log. If your annotator uses log4j or java.util.logging, you can configure logging in log4j.properties with the package names of the annotator.

When an annotator is initialized, you see the category in the log like the following:

<event timestamp="2011-09-06 16:09:17.451" level="INFO" thread="CPSWorkerThread-1"
  logger="com.emc.documentum.core.fulltext.common" timeInMilliSecs="1315350557451">
<message><![CDATA[Create Annotator 'Phone Number Annotator' from descriptor
descriptors/PhoneNumberAnnotator.xml for category dftxml]]></message></event>

When an annotator is applied (tracing for dsearchindex is INFO), you see the domain name and category in the log like the following:

<event timestamp="2011-09-06 16:09:18.037" level="INFO" thread="CPSWorkerThread-1"
  logger="com.emc.documentum.core.fulltext.index" timeInMilliSecs="1315350558037">
<message><![CDATA[Domain test category dftxml, Apply Annotator 'Phone Number Annotator'
to document testphonenumbers5_txt1318881952107]]></message></event>
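The type-mapping and feature-mapping rules in Table 13 can be pictured with a small stand-alone sketch. It is not xPlore code: the class FeatureMappingSketch, the method render, and the exact XML layout it produces are assumptions made for illustration. It only shows the idea that each mapped feature of an annotation becomes a child element of the type element, which in turn sits under the root-result-element.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; the real rendering is done inside xPlore.
public class FeatureMappingSketch {

    // Minimal escaping so feature values cannot break the XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // rootElement      : value of root-result-element, e.g. "dmftcustom"
    // typeElement      : type-mapping element-name, e.g. "room"
    // featureToElement : feature-mapping feature -> element-name
    // featureValues    : feature -> value taken from one annotation
    static String render(String rootElement, String typeElement,
                         Map<String, String> featureToElement,
                         Map<String, String> featureValues) {
        StringBuilder xml = new StringBuilder();
        xml.append('<').append(rootElement).append('>');
        xml.append('<').append(typeElement).append('>');
        for (Map.Entry<String, String> e : featureToElement.entrySet()) {
            String value = featureValues.get(e.getKey());
            if (value == null) {
                continue; // unmapped or empty features are skipped in this sketch
            }
            xml.append('<').append(e.getValue()).append('>')
               .append(escape(value))
               .append("</").append(e.getValue()).append('>');
        }
        xml.append("</").append(typeElement).append('>');
        xml.append("</").append(rootElement).append('>');
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("building", "building");
        mapping.put("roomnumber", "room-number");
        Map<String, String> values = new LinkedHashMap<>();
        values.put("building", "B1");
        values.put("roomnumber", "101");
        System.out.println(render("dmftcustom", "room", mapping, values));
    }
}
```

The feature names building and roomnumber and the element names room, building, and room-number come from the room number example; the sample values B1 and 101 are invented.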
UIMA example

The following example is from the UIMA software development kit. It is used to create a UIMA module that normalizes phone numbers for fast identification of results in xPlore.

Prepare the environment

Download the UIMA Java framework and SDK binaries from Apache and add it to your project build path.

Define a CAS feature structure type

Define a CAS feature structure type in an XML file called a type system descriptor. This file specifies a feature element for each data type that you are annotating. The following example PhoneAnnotationTypeDef.xml creates the following string-type features:
• phoneNumber: Phone number as it appears in a document.
• normalizedForm: Normalized phone number

Note that the type descriptor must import the xPlore type definition descriptors, which handle file access. The class that handles the types is specified as the value of types/typeDescription/name: FtPhoneNumber.

<?xml version="1.0" encoding="UTF-8" ?>
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>TutorialTypeSystem</name>
  <description>Phone number Type System Definition</description>
  <vendor>The Apache Software Foundation</vendor>
  <version>1.0</version>
  <imports>
    <import location="AnnotationTypeDef.xml" />
    <import location="FtDocumentAnnotationTypeDef.xml"/>
  </imports>
  <types>
    <typeDescription>
      <name>FtPhoneNumber</name>
      <description></description>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>phoneNumber</name>
          <description />
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
        <featureDescription>
          <name>normalizedForm</name>
          <description />
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>

Generate the type implementation class

Use the Apache UIMA tool jcasgen.bat (Windows) or jcasgen.sh (Unix) to generate type implementation classes from the type definition file. Eclipse and other programming environments have UIMA plugins that generate the classes. Two classes are generated:
• FtPhoneNumber
• FtPhoneNumber_Type

You will later reference the first class in the type-mapping element of indexserverconfig.xml.

Create an annotator descriptor

Create the annotator descriptor XML file that describes the annotator. The descriptor describes the name of the annotator, inputs and outputs as defined in the type system descriptor, and the runtime configuration parameters that the annotator accepts.

In the following example PhoneNumberAnnotator.xml, the annotator name is Phone Number Annotator. This name is used in dsearch.log to specify annotator instantiation and, when dsearchindex logging is set to INFO, annotator application to a document.

The configurationParameters element defines a parameter named Patterns. The element configurationParameterSettings defines the patterns as two regular expression patterns, each in a value/array/string element.

The typeSystemDescription element references the type system descriptor PhoneAnnotationTypeDef.xml. The outputs are the features referenced in the same type definition. In this descriptor, the class that generates the output is referenced along with the feature. For example, the class FtPhoneNumber generates the output for both the phoneNumber and normalizedForm features.

You specify this descriptor file when you configure UIMA in xPlore. Provide the filename as the value of the descriptor attribute of the pipeline element.

<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier"
  xmlns:xi="http://www.w3.org/2001/XInclude">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>true</primitive>
  <annotatorImplementationName>PhoneNumberAnnotator</annotatorImplementationName>
  <analysisEngineMetaData>
    <name>Phone Number Annotator</name>
    <description>Searches for phone numbers in document content.</description>
    <version>1.0</version>
    <vendor>The Apache Software Foundation</vendor>
    <configurationParameters>
      <configurationParameter>
        <name>Patterns</name>
        <description>Phone number regular expression patterns.</description>
        <type>String</type>
        <multiValued>true</multiValued>
        <mandatory>true</mandatory>
      </configurationParameter>
    </configurationParameters>
    <configurationParameterSettings>
      <nameValuePair>
        <name>Patterns</name>
        <value>
          <array>
            <string>\b+\d{3}-\d{3}-\d{4}</string>
            <string>\(\d{3}\)\s*\d{3}-\d{4}</string>
          </array>
        </value>
      </nameValuePair>
    </configurationParameterSettings>
    <typeSystemDescription>
      <imports>
        <import location="PhoneAnnotationTypeDef.xml"/>
      </imports>
    </typeSystemDescription>
    <capabilities>
      <capability>
        <inputs></inputs>
        <outputs>
          <type>FtPhoneNumber</type>
          <feature>FtPhoneNumber:phoneNumber</feature>
          <feature>FtPhoneNumber:normalizedForm</feature>
        </outputs>
        <languagesSupported></languagesSupported>
      </capability>
    </capabilities>
    <operationalProperties>
      <modifiesCas>true</modifiesCas>
      <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
      <outputsNewCASes>false</outputsNewCASes>
    </operationalProperties>
  </analysisEngineMetaData>
</analysisEngineDescription>

Create annotator code to generate output

The annotator is a subclass of JCasAnnotator_ImplBase in the Apache UIMA framework. The main method to implement is the following:

public void process(JCas aJCas) throws AnalysisEngineProcessException

Imports:

import org.apache.uima.UimaContext;
import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Class:

public class PhoneNumberAnnotator extends JCasAnnotator_ImplBase {
  private Pattern[] mPatterns;

  public void initialize(UimaContext aContext) throws ResourceInitializationException {
    super.initialize(aContext);
    // Get config. parameter values from PhoneNumberAnnotator.xml
    String[] patternStrings = (String[]) aContext.getConfigParameterValue("Patterns");
    // compile regular expressions
    mPatterns = new Pattern[patternStrings.length];
    for (int i = 0; i < patternStrings.length; i++) {
      mPatterns[i] = Pattern.compile(patternStrings[i]);
    }
  }

  public void process(JCas aJCas) throws AnalysisEngineProcessException {
    // get document text
    String docText = aJCas.getDocumentText();
    // loop over patterns
    for (int i = 0; i < mPatterns.length; i++) {
      Matcher matcher = mPatterns[i].matcher(docText);
      while (matcher.find()) {
        // found one - create annotation
        FtPhoneNumber annotation = new FtPhoneNumber(aJCas);
        annotation.setBegin(matcher.start());
        annotation.setEnd(matcher.end());
        annotation.addToIndexes();
        String text = annotation.getCoveredText();
        annotation.setPhoneNumber(text);
        annotation.setNormalizedForm(normalizePhoneNumber(text));
      }
    }
  }

  // keep only the digits of the matched phone number
  private String normalizePhoneNumber(String input) {
    StringBuffer buffer = new StringBuffer();
    for (int index = 0; index < input.length(); ++index) {
      char c = input.charAt(index);
      if (c >= '0' && c <= '9')
        buffer.append(c);
    }
    return buffer.toString();
  }
}
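Before packaging the annotator, it can help to try the patterns and the normalization outside UIMA. The following is a minimal sketch using only the JDK; the class name PhonePatternDemo and its methods are hypothetical and are not part of the xPlore SDK or of UIMA. As an assumption, the first pattern is written with a plain \b instead of the \b+ shown in the descriptor, and the sample text is invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test harness: same matching and normalization idea as the
// annotator, without any UIMA dependency.
public class PhonePatternDemo {

    // Patterns modeled on the Patterns parameter of the descriptor.
    // Assumption: "\b" is used here in place of the descriptor's "\b+".
    private static final Pattern[] PATTERNS = {
        Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}"),
        Pattern.compile("\\(\\d{3}\\)\\s*\\d{3}-\\d{4}")
    };

    // Keep only the digits, as normalizePhoneNumber does in the annotator.
    static String normalize(String input) {
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c >= '0' && c <= '9') {
                digits.append(c);
            }
        }
        return digits.toString();
    }

    // Apply each pattern in turn and collect the normalized matches.
    static List<String> findNormalized(String docText) {
        List<String> results = new ArrayList<>();
        for (Pattern pattern : PATTERNS) {
            Matcher matcher = pattern.matcher(docText);
            while (matcher.find()) {
                results.add(normalize(matcher.group()));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(findNormalized("Call 555-123-4567 or (555) 765-4321."));
    }
}
```

Results are grouped by pattern rather than by position in the text, because the outer loop iterates over the patterns, as it does in the process method of the annotator.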
Configure UIMA modules in indexserverconfig.xml

Stop all xPlore instances and add your UIMA references to indexserverconfig.xml, located in dsearch_home/config. Add a pipeline-config element as a child of index-server-configuration between content-processing-services and category-definitions. The following example specifies the annotator descriptor file:

</content-processing-services>
<pipeline-config>
  <pipeline descriptor="descriptors/PhoneNumberAnnotator.xml"
    process-class="com.emc.documentum.core.fulltext.indexserver.uima.UIMAProcessFactory"
    name="phonenumber_pipeline"/>
</pipeline-config>

The xPlore UIMAProcessFactory class is provided with xPlore. It executes the pipeline based on the definitions provided.

Configure the usage of the UIMA module within a category element in indexserverconfig.xml. Most applications annotate the dftxml category, so you place pipeline-usage in category name="dftxml". Add one pipeline-usage element after the indexes element.

For input-element, specify a name for logging and a path to the input element in DFTXML. (For a sample DFTXML, see Extensible Documentum DTD, page 292.) For type-mapping, specify an element name for logging (usually the same as type-name). For type-name, specify the type definition class. For feature-mapping, specify the output XML element for element-name and the feature name that is registered in the annotator descriptor PhoneNumberAnnotator.xml. For example:

</indexes>
<pipeline-usage root-result-element="ae-result" name="phonenumber_pipeline">
  <input-element name="content" element-path="/dmftdoc/dmftcontents"/>
  <type-mapping element-name="FtPhoneNumber" type-name="FtPhoneNumber">
    <feature-mapping element-name="phoneNumber" feature="phoneNumber"/>
    <feature-mapping element-name="phoneNormlizd" feature="normalizedForm"/>
  </type-mapping>
</pipeline-usage>

Custom content processing errors

The report Document processing error summary retrieves errors in custom content processing. The error type is Unknown error, Corrupt file, or Password-protected or encrypted file. Error Text examples:

Password-protected or encrypted file (native error:TIKA-198...)
Unknown error during text extraction (native error:TIKA-198...)

The name of the custom content processing pipeline is retrieved from CPS configuration.xml as the value of text_extraction/name.

You can identify the origin of the CPS processing error in the files cps.log and dsearch.log:

• CPS log examples: Failed to extract text from password-protected files:

2011-08-02 23:27:23.288 ERROR [MANAGER-CPSLinguisticProcessingRequest-(CPSWorkerThread-1)] Failed to extract text for req 0 of doc VPNwithPassword_zip1312352841145, err-code 770, err-msg: Corrupt file (native error:TIKA-198: Illegal IOException from org.apache.tika.parser.pkg.PackageParser@161022a6)

2011-08-02 23:36:27.188 ERROR [MANAGER-CPSLinguisticProcessingRequest-(CPSWorkerThread-2)] Failed to extract text for req 0 of doc tf_protected_doc1312353385777, err-code 770, err-msg: Corrupt file (native error: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@3b11d63f)

• dsearch log examples: "Corrupt file":

<event timestamp="2011-08-02 23:27:24.994" level="WARN" thread="IndexWorkerThread-1"
  logger="com.emc.documentum.core.fulltext.index" timeInMilliSecs="1312352844994">
<message><![CDATA[Document id: VPNwithPassword_zip1312352841145, message: CPS Warning
[Corrupt file (native error:TIKA-198: Illegal IOException from
org.apache.tika.parser.pkg.PackageParser@161022a6)].]]></message>
</event>

<event timestamp="2011-08-02 23:36:28.306" level="WARN" thread="IndexWorkerThread-2"
  logger="com.emc.documentum.core.fulltext.index" timeInMilliSecs="1312353388306">
<message><![CDATA[Document id: tf_protected_doc1312353385777, message: CPS Warning
[Corrupt file (native error: Unexpected RuntimeException from
org.apache.tika.parser.microsoft.OfficeParser@3b11d63f)].]]></message>
</event>
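When many documents fail, it can be convenient to pull the document name, error code, and native error out of cps.log lines programmatically. The following is a minimal sketch, assuming each failure line has the layout "Failed to extract text for req N of doc NAME, err-code CODE, err-msg: MESSAGE" with comma-separated fields; the class CpsErrorLineParser is hypothetical and is not an xPlore utility.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for scanning cps.log; the line layout is an assumption.
public class CpsErrorLineParser {

    // Groups: 1 = document name, 2 = error code, 3 = error message
    private static final Pattern FAILURE = Pattern.compile(
        "Failed to extract text for req \\d+ of doc (\\S+), err-code (\\d+), err-msg: (.*)");

    // Returns {document, code, message}, or null when the line is not a failure line.
    static String[] parse(String line) {
        Matcher m = FAILURE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "2011-08-02 23:27:23.288 ERROR "
            + "[MANAGER-CPSLinguisticProcessingRequest-(CPSWorkerThread-1)] "
            + "Failed to extract text for req 0 of doc VPNwithPassword_zip1312352841145, "
            + "err-code 770, err-msg: Corrupt file (native error:TIKA-198: Illegal "
            + "IOException from org.apache.tika.parser.pkg.PackageParser@161022a6)";
        String[] fields = parse(line);
        System.out.println(fields[0] + " | " + fields[1] + " | " + fields[2]);
    }
}
```

Lines that do not match the assumed layout return null, so the helper can be run over a whole log file and the non-matching lines skipped.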
Chapter 6 Indexing

This chapter contains the following topics:
• About indexing
• Configuring text extraction
• Defining an index
• Creating custom indexes
• Managing indexing in xPlore administrator
• Troubleshooting indexing
• Running the standalone consistency checker
• Indexing APIs

About indexing

The indexing service receives batches of requests to index from a custom indexing client like the Documentum index agent. The index requests are passed to the content processing service, which extracts tokens for indexing and returns them to the indexing service.

You can configure all indexing parameters by choosing Global Configuration from the System Overview panel in xPlore administrator. You can configure the same indexing parameters on a per-instance basis by choosing Indexing Service on an instance and then choosing Configuration.

For information on creating Documentum indexes, see Creating custom indexes, page 121. Modify indexes by editing indexserverconfig.xml. For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 42.

By default, Documentum content and metadata are indexed. You can tune the indexing configuration for specific needs. A full-text index can be created as a path-value index with the FULL_TEXT option.

For information on scalability planning, see Documentum xPlore Installation Guide.

Configuring text extraction

CPS performs text extraction by getting text and metadata from binary or XML documents. CPS then transforms the results into UTF-16 half-width encoding.

Configure indexing in dsearch_home/config/indexserverconfig.xml. You can configure compression and how XML content is handled. Excluding content from extraction shrinks the index footprint and speeds up ingestion.

Indexing depth: Only the leaf (last node) text values from subelements of an XML node with implicit composite indexes are returned. You can configure indexing to return all node values instead of the leaf node value. (This change negatively affects performance.) Set the value of index-value-leaf-node-only in the index-plugin element to false. Reindex your documents to see the other nodes in the index.

The paths in the configuration file are in XPath syntax and refer to the path within the DFTXML representation of the object. (For information on DFTXML, see Extensible Documentum DTD, page 292.) Specify an XPath value to the element whose content requires text extraction for indexing.

Table 14 Extraction configuration options
Option: Description
do-text-extraction: Contains one or more for-element-with-name elements.
for-element-with-name: Specifies elements that define content or metadata that should be extracted for indexing.
for-element-with-name/save-tokens-for-summary-processing: Sets tokenization of content in specific elements for summaries, for example, dmftcontentref (content of a Documentum document). Specify the maximum size of documents in bytes as the value of the attribute extract-text-size-less-than. Tokens will not be saved for larger content. Set the maximum size of tokens for the element as the value of the attribute token-size.
for-element-with-name/xml-content: When a document to be indexed contains embedded XML content, you must specify how that content should be handled. It can be tokenized or not (tokenize="true | false"). It can be stored within the input document or separately (store="embed | separate | none"). Separate storage is not supported for this release.
xml-content on-embed-error: You can specify how to handle parsing errors when the on-embed-error attribute is set to true. Handles errors such as syntax or external entity access. Valid values: embed_as_cdata | ignore | fail. The option embed_as_cdata stores the entire XML content as a CData sub-node of the specified node. The ignore option does not store the XML content. For the fail option, content is not searchable.
xml-content index-as-sub-path: Boolean parameter that specifies whether the path is stored with XML content when the xml-content embed attribute is set to true.
xml-content file-limit: Sets the maximum size of embedded XML content.
compress: Compresses the text value of specified elements to save storage space. Compressed content is about 30% of submitted XML content. Compression may slow the ingestion rate by 10-20%.
compress/for-element: Using XPath notation, specifies the XML node of the input document that contains text values to be compressed.
Defining an index

Indexes are configured within an indexes element in the file indexserverconfig.xml. For information on modifying this file, see Modifying indexserverconfig.xml, page 42. The path to the indexes element is category-definitions/category/indexes. Four types of indexes can be configured: fulltext-index, value-index, path-value index, and multi-path.

By default, multi-path indexes do not have all content indexed. If an element does not match a configuration option, it is not indexed. To index all element content in a multi-path index, add a sub-path element on //*. For example, to index all metadata content, use the path dmftmetadata//*.

The following child elements of node/indexes/index define an index.

Table 15 Index definition options
Index option: Description
path-value-index: The options attribute of this element specifies a comma-delimited string of xDB options: GET_ALL_TEXT (indexed by its string value including descendant nodes) | SUPPORT_PHRASES (optimizes for phrase search and increases index size) | NO_LOGGING (turns off xDB transaction logging) | INCLUDE_START_END_TOKEN_FLAGS (stores position information) | CONCURRENT (index is not locked). The path attribute specifies the path to an attribute that should be indexed. The path attribute contains an XPath notation to a path within the input document and options for the IndexServerAnalyzer. The symbols < and > must be escaped.
path-value-index/sub-path: See Subpaths, page 121. Specifies the path to an element for which the path information should be saved with the indexed value. Applies only to path-value-indexes that contain the IndexServerAnalyzer option INDEX_PATHS. Increases index size while enhancing performance. Subpath indexes must be configured to support Documentum facets. For information on facets, see Facets, page 225.
sub-path attributes:
  boost-value: Increases the score for hits on the subpath metadata by a factor. Default: 1 (no boost).
  compress: Boolean, specifies whether the content should be compressed.
  enumerate-repeating-elements: Boolean, specifies whether the position of the element in the path should be indexed. Used for correlated repeating attributes, for example, media objects with prop_name='dimension' and prop_value='800x600','blue'. Default: false.
  full-text-search: Specifies whether the subpath content should be tokenized. Set to true if the tokens will be queried. If false, you do not need to duplicate this information in the no-tokenization element. This exclusion reduces the binary index size. Also, when excluded elements in a document are modified, the full-text index does not need to be updated.
  include-descendants: Boolean. Default: false. If true, the token for this instance will have a copy of all descendant tokens, speeding up queries. Use for nodes with many small descendant nodes, such as Documentum dmftmetadata. Cost of this option: Lowers the indexing rate and increases disk space.
  leading-wildcard: Boolean, specifies whether the subpath supports leading wildcard searches. Default: false.
  path: Path to element relative to the index path. If the path value contains a namespace, specify the xpath to it as the value of the xpath attribute. The following example has a namespace:
    path="/{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF/{http://www.w3.org/2004/02/skos/core#}Concept&lt;FULL_TEXT:com.emc.documentum.core.fulltext.indexserver.core.index.xhive.IndexServerAnalyzer:GET_ALL_TEXT,SUPPORT_PHRASES,INCLUDE_START_END_TOKEN_FLAGS,INDEX_PATHS&gt;" name="Concept"
  returning-contents: Boolean, specifies whether the indexed value will be returned. Used for facets only. For example, the user may search for documents with specific characteristics in a certain repository folder path, but the folder path does not need to be returned.
  type: Type of content in subpath. Valid values: string | integer | boolean | double | date | datetime. Supports XQuery typed expressions such as date range or boolean value.
  value-comparison: Boolean, specifies that the value in this path should be indexed. Use for comparisons such as =, >, <, starts-with. Value indexing requires additional storage, so you should not index fields that will not be searched for as comparisons or starts-with.
  xpath: Value of path attribute using xpath notation without namespace. The xpath for the example in the path attribute is the following:
    xpath="/RDF/Concept"
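The relationship between a namespaced path value and its xpath value can be shown with a short sketch. It is only an illustration of the string relationship described for the path and xpath attributes: the class SubPathXpath and the method toXpath are hypothetical, and xPlore does not necessarily derive the value this way. You still supply the xpath attribute yourself in indexserverconfig.xml.

```java
// Hypothetical helper: derives the namespace-free xpath from a path value
// that uses {namespace-uri}name steps and an optional analyzer options suffix.
public class SubPathXpath {

    static String toXpath(String path) {
        // Accept both the escaped form found in the XML file and the parsed form.
        String unescaped = path.replace("&lt;", "<").replace("&gt;", ">");
        // Drop the analyzer options that start at the first '<'.
        int optionsStart = unescaped.indexOf('<');
        String steps = optionsStart >= 0 ? unescaped.substring(0, optionsStart) : unescaped;
        // Remove every {namespace-uri} prefix from the path steps.
        return steps.replaceAll("\\{[^}]*\\}", "");
    }

    public static void main(String[] args) {
        String path = "/{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF/"
            + "{http://www.w3.org/2004/02/skos/core#}Concept"
            + "&lt;FULL_TEXT:com.emc.documentum.core.fulltext.indexserver.core.index."
            + "xhive.IndexServerAnalyzer:GET_ALL_TEXT,SUPPORT_PHRASES,"
            + "INCLUDE_START_END_TOKEN_FLAGS,INDEX_PATHS&gt;";
        System.out.println(toXpath(path));
    }
}
```

A path without namespaces or options, such as dmftmetadata//object_name, passes through unchanged.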
indexed (covering) Yes No values Creating custom indexes xPlore indexes full-text content and metadata for all objects in the repository. Create a custom index for the following: • A subpath for each facet used by the client application • Custom. For example: <sub-path description="leading wildcard queries" returning-contents="false" value-comparison="true" full-text-search="true" enumerate-repeating-elements=" false" leading-wildcard="true" type="string" path="dmftmetadata//object_name"/> Table 16 Path-value index with and without subpaths Feature Without subpath With subpath Key set combinations Limited Flexible Single key query latency Low High (performs better with complex predicates) ftcontains (full-text) Single per probe Supports multiple ftcontains in a probe Updates Low overhead High overhead Returnable. A subpath increases index size while enhancing performance. the content must be reingested. Requires a TBO. EMC Documentum xPlore Version 1. For example: <sub-path description="Used by CenterStage to compute facets persons" returning-contents=" true" compress="true" value-comparison="true" full-text-search="true" enumerate-repeating-elements="false" type="string" path="dmftcustom/entities/person"/> • Add paths for dmftcustom area elements (metadata or content that a TBO injects). If you add an index on existing content. Set up indexes before content is ingested. such as supporting leading wildcard searches for certain paths. you do not need to modify the definitions of the subpath indexes. • Modify the capabilities of existing subpaths. For most Documentum applications. You can cancel any indexing batch requests in the queue. Click Enable or Disable. page 239 for more information on these reports. Indexes are defined within indexserverconfig. along with its indexable attributes. Indexing new attributes You can add path-value indexes to indexserverconfig. rebuild the index. The default values have been optimized for most environments. 
Indexes are defined within indexserverconfig.xml, in dsearch_home/config, within a category definition. Shut down all xPlore instances before changing this file.

The Documentum index agent generates metadata in a data input format called DFTXML. Each indexable object type is represented in DFTXML, along with its indexable attributes, in a predictable path within the DFTXML representation. Refer to Extensible Documentum DTD, page 292 for additional information on DFTXML. To enhance performance and reduce storage, you can specify categories and attributes that are not indexed.

Indexing new attributes

You can add path-value indexes to indexserverconfig.xml for new data types so that documents of the new type can be ingested without rebuilding the index. Shut down all xPlore instances before changing this file. If you add a new attribute for documents that have already been indexed, rebuild the index.

Managing indexing in xPlore administrator

You can perform the following administrative tasks in xPlore administrator.

• View indexing statistics: Expand an instance in the tree and choose Indexing Service. Statistics are displayed in the right panel: tasks completed, with a breakdown by document properties, and performance.
• Configure indexing across all instances: Expand Services > Indexing Service in the tree. Click Configuration. You can configure the various options described in Document processing and indexing service configuration parameters, page 283. The default values have been optimized for most environments.
• Start or stop indexing: Select an instance in the tree and choose Indexing Service. Click Enable or Disable.
• View the indexing queue: Expand an instance in the tree and choose Indexing Service. The queue is displayed. You can cancel any indexing batch requests in the queue.

Note: This queue is not the same as the index agent queue. You can view the index agent queue in the index agent UI or in Documentum administrator 6.5 SP3 or higher.

Troubleshooting indexing

You can use reports to troubleshoot indexing and content processing issues. See Using Reports, page 239 for more information on these reports.

Testing upload and indexing

If xPlore administrator is running on the same instance as an index service, you can test upload a document for indexing.

Note: If you are testing Documentum indexing before migration, run the ACL replication script to update the security index. See Manually updating security, page 50.

To do test upload in xPlore administrator, expand the Diagnostic and Utilities tree and choose Upload testing document. You can test upload in the following ways:

• Local File: Navigate to a local file and upload it.
• Remote File: Enter the path to a remote file that is accessible from the xPlore administrator host.
• Specify raw XML: Click Specify raw XML, and type raw XML such as DFTXML for testing.

For the Content Type option (local or remote file), specify a MIME type such as application/msword, application/pdf, text/plain, or text/xml.

Results are displayed. If the document has been successfully uploaded, it is available immediately for search unless the indexing load is high. The object name in xPlore is created by concatenating the file name and timestamp. When you click the object name, you see the DFTXML. This DFTXML rendition is template-based, not generated by the Documentum index agent. There can be slight differences in the DFTXML when you submit the document through the Documentum index agent.

To verify remote CPS, you start with a sample DFTXML file, then edit the dmcontentref element. Set it as a path to the shared storage, then paste the DFTXML in the text box.

Checking network bandwidth and latency

Bandwidth or latency bottlenecks can degrade performance during the transfer of content from a source file system to xPlore. CPS reports measure indexing latency. Validate that file transfers take place with the expected speed. This issue is seen more frequently in virtual environments.

Checking the indexing log

The xPlore log dsearch.log is located in the logs subdirectory of the JBoss deployment directory. In the following example, logging was set to INFO (not the default) in xPlore administrator. The following text from dsearch.log shows that a document was indexed and inserted into the tracking DB:

    <event timestamp="2009-04-03 08:40:54.536" level="INFO" thread="http-0.0.0.0-9200-2" logger="com.emc.documentum.core.fulltext.indexserver.core.index.FtIndexObject" elapsedTime="1238773254536">
    <message><![CDATA[[INSERT_STATUSDB] insert into the StatusDB with docId 0965dd8980001dce operationId primary$3cb3b293-8790-452e-af02-84c9502a45e4 status NEW (message) ]]></message>
    </event>

Checking the status of a document

Using xPlore administrator, you can issue the following XQuery to find the indexing status of a document. You must know the document ID.

    for $i in collection('/<domainname>/dsearch/SystemInfo/StatusDB')/trackinginfo/operation
    where $i[@doc-id = '<document-id>']
    return <r> <status>{$i/@status/data(.)}</status> <message>{$i/data(.)}</message></r>

The status returned is one of the following:

• NEW: The indexing service has begun to process the request.
• DONE: The document has been processed successfully.
• WARN: Only the metadata was indexed.
• ERROR: xPlore failed to process the request. Metadata and content cannot be indexed.
Troubleshooting high save-to-search latency

The following issues can cause high latency between the time a document is created or updated and the time the document is available for search:

• Index agent is down
  Detect this problem by monitoring the size of the index agent queue. Use xPlore administrator to determine whether documents were sent for ingestion. For example, the Documents ingested per hour report shows 0 for DocCount when the index agent is down.
  Workaround: Configure multiple index agents for redundancy. Monitor the index agents and restart when they fail.
• CPS restarts frequently
  Under certain conditions, CPS fails while processing a document. xPlore restarts the CPS process, but the restart causes a delay. Restart is logged in cps.log and cps_daemon.log. For information on these logs, see CPS logging, page 252.
• Large documents tie up ingestion
  A large document in the ingestion pipeline can delay smaller documents that are further back in the queue. Detect this issue using the Documents ingested per hour report in xPlore administrator. (Only document size averages are reported.) If a document is larger than the configured maximum limit for document size or text size, the document is not indexed. The document metadata are indexed but the content is not. These documents are reported in the xPlore administrator report Content too large to index.
  Workaround: Attempt to refeed a document that was too large. Increase the maximum size for document processing. See Maximum document and text size, page 83.
• Ingestion batches are large
  During periods of high ingestion load, documents can take a long time to process. Review the ingestion reports in xPlore administrator to find bytes processed and latency. Use dsearch.log to determine when a specific document was ingested.
  Workaround: Set up a dedicated index agent for the batch workload.
• Insufficient hardware resources
  If CPU, disk I/O, or memory are highly utilized, increase the capacity. Performance on a virtual server is slower than on a dedicated host. For a comparison of performance on various storage types, see Disk space and storage, page 266.

Changes to index configuration do not take effect

If you have edited indexserverconfig.xml, your changes are not applied unless the system is stopped. If you have changed configuration using xPlore administrator, and your system crashed before the changes were flushed to disk, repeat your configuration changes and restart xPlore. For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 42.

Running the standalone consistency checker

The standalone consistency checker has two functions:

• Checks data consistency for all collections in a specific domain.
• Detects tracking database corruption and rebuilds it.

For information on the database consistency checker in xPlore administrator, see Domain and collection menu actions, page 131.

Note: Do not check consistency during migration or index rebuild. Run only one instance of the tool at a time.

The consistency checker is invoked from the command line. The tool checks consistency with the following steps.

1. For each collection in the domain, the tool queries the tracking database to find the number of document IDs.
2. Document IDs are returned as a list from the tracking DB. The batch size for a list is configurable (default: 1000).
3. Document IDs are returned as a list from the corresponding collection.
4. The list sizes are compared. Differences are reported.
5. If fix-trackDB is set to true, it inserts the missing entries in the tracking database. It deletes the entries in the database that do not exist in the collection. It also validates the library-path in the database entry, then reconciles it with the corresponding collection.

Batches are processed until all documents have been checked.

To run the checker:

1. Back up your domain or collection.
2. Stop all indexing activity on the instance. Set the target collection, domain, or federation to read-only or maintenance mode. If indexing has not been turned off, inconsistencies are reported.
3. Invoke the checker using CLI. (See Using the CLI, page 154.) The default batch size is 1000. Edit the script to change the batch size.
   Syntax (on one line):
   xplore checkDataConsistency unit, domain, collection, fix-trackDB, batch-size, report-directory
   Valid values:
   • unit: collection or domain
   • domain: Domain name.
   • collection: Collection name (null for domain consistency check)
   • fix-trackDB: true or false. Set to false first and check the report.
   • batch-size: Numeric value greater than or equal to 1000. Non-numeric, negative, or null values default to 1000.
   • report-directory: Path for consistency report. Report is created in a subdirectory report-directory/time-stamp/domain_name|collection_name. Default base directory is the current working directory.
   Windows example: Checks consistency of a default collection in defaultDomain and fixes the trackingDB:
   xplore "checkDataConsistency 'collection', 'defaultDomain', 'default1', true, '2000', 'C:\\tmp' "
   Checks consistency of all collections in defaultDomain and fixes the trackingDB:
   xplore "checkDataConsistency 'domain', 'defaultDomain', 'null', 'true', '2000', 'C:\\tmp' "
   Linux example: Checks consistency of "defaultDomain":
   ./xplore.sh checkDataConsistency " 'domain', 'defaultDomain', null, true, '1000', 'export/space1/temp' "
4. View the report in the current working directory.

Indexing APIs

Access to indexing APIs is through the interface com.emc.documentum.core.fulltext.client.IDSearchClient. Each API is described in the javadocs. The following topics describe the use of indexing APIs.

Route a document to a collection

You can route a document to a collection in the following ways:

• A custom routing class. See Creating a custom routing class, page 127.
• Index agent configuration. See Mapping content to collections, page 65.
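The comparison that the standalone consistency checker performs can be pictured with a small sketch. The following Java code is illustrative only and is not the xPlore implementation: the class name ConsistencySketch and its methods are hypothetical, the two ID lists stand in for the lists returned from the tracking DB and from the collection, and treating a batch size below 1000 as 1000 is an assumption (the guide states only that non-numeric, negative, or null values default to 1000).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the tracking DB vs. collection comparison.
final class ConsistencySketch {
    static final int DEFAULT_BATCH_SIZE = 1000;

    // Non-numeric, negative, or null values fall back to 1000.
    // Assumption: values below 1000 are also raised to 1000.
    static int normalizeBatchSize(String raw) {
        if (raw == null) {
            return DEFAULT_BATCH_SIZE;
        }
        try {
            int n = Integer.parseInt(raw.trim());
            return n >= DEFAULT_BATCH_SIZE ? n : DEFAULT_BATCH_SIZE;
        } catch (NumberFormatException e) {
            return DEFAULT_BATCH_SIZE;
        }
    }

    // IDs found in the collection but absent from the tracking DB.
    // The collection list is walked in batches of batchSize.
    static Set<String> missingFromTracking(List<String> trackingIds,
                                           List<String> collectionIds,
                                           int batchSize) {
        int step = Math.max(1, batchSize);
        Set<String> tracked = new HashSet<>(trackingIds);
        Set<String> missing = new TreeSet<>();
        for (int start = 0; start < collectionIds.size(); start += step) {
            int end = Math.min(start + step, collectionIds.size());
            for (String id : collectionIds.subList(start, end)) {
                if (!tracked.contains(id)) {
                    missing.add(id);
                }
            }
        }
        return missing;
    }

    // Tracking DB entries that point to no document in the collection.
    static Set<String> orphanedInTracking(List<String> trackingIds,
                                          List<String> collectionIds) {
        Set<String> orphaned = new TreeSet<>(trackingIds);
        orphaned.removeAll(new HashSet<>(collectionIds));
        return orphaned;
    }

    public static void main(String[] args) {
        List<String> tracking = List.of("doc1", "doc2", "docX");
        List<String> collection = List.of("doc1", "doc2", "doc3");
        int batch = normalizeBatchSize("2000");
        System.out.println("missing:  " + missingFromTracking(tracking, collection, batch));
        System.out.println("orphaned: " + orphanedInTracking(tracking, collection));
    }
}
```

In this picture, a run with fix-trackDB set to false corresponds to reporting the two sets, and a run with fix-trackDB set to true corresponds to inserting the missing entries and deleting the orphaned ones.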
For a detailed example of routing to specific collections and targeting queries to that collection, see "Improving Webtop Search Performance Using xPlore Partitioning" on the EMC Community Network (ECN).

Creating a custom routing class

xPlore defines an interface for routing a document to a collection: IFtIndexCollectionRouting in the package com.emc.documentum.core.fulltext.client.index.custom. You can provide a class that implements the interface and specify this class name in the xPlore configuration file indexserverconfig.xml. Then xPlore invokes the custom class for routing. You can use custom routing for all domains or for specific domains. The collection determined by the routing class takes precedence over a collection that is specified in the indexing API.

1. Stop the xPlore instances.
2. Register your custom routing class in indexserverconfig.xml, which is located in the xPlore config directory. Add the following element between the system-metrics-service and admin-config elements. (The xPlore server must be stopped before you add this element.)
    <customization-config>
    <collection-routing class-name="SimpleCollectionRouting">
    </collection-routing>
    </customization-config>
3. Create your custom class. (See example.) Import IFtIndexRequest in the package com.emc.documentum.core.fulltext.common.index. This class encapsulates all aspects of an indexing request:
    public interface IFtIndexRequest {
        String getDocId ();
        long getRequestId ();
        FtOperation getOperation (); //returns add, update or delete
        String getDomain ();
        String getCategory ();
        String getCollection ();
        IFtDocument getDocument (); //returns doc to be indexed
        public String getClientId();
        void setCollection(String value);
        public void setClientId(String id);
    }

SimpleCollectionRouting example

This example routes a document to a specific collection based on Documentum version. The sample Java class file in the SDK/samples directory assumes that the Documentum index agent establishes a connection to the xPlore server. Place the compiled class SimpleCollectionRouting in the Documentum index agent classpath, for example, dsearch_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.

This class parses the input DFTXML representation from the index agent. The class gets a metadata value and tests it, then routes the document to a custom collection if it meets the criterion (new document).

Imports

    import com.emc.documentum.core.fulltext.client.index.custom.IFtIndexCollectionRouting;
    import com.emc.documentum.core.fulltext.client.index.FtFeederException;
    import com.emc.documentum.core.fulltext.client.common.IDSearchServerInfo;
    import com.emc.documentum.core.fulltext.common.index.IFtIndexRequest;
    import com.emc.documentum.core.fulltext.common.index.IFtDocument;
    import java.util.List;
    import javax.xml.xpath.*;
    import org.w3c.dom.*;

Required method setEssServerInfo

The index agent environment sets the xPlore server info. You can implement this method simply without creating the connection:

    public void setEssServerInfo(IDSearchServerInfo info) {
        m_serverInfo = info;
    }
    IDSearchServerInfo m_serverInfo;

Add variables for routing

    private static final String s_collection = "superhot";
    private String m_version = null;
    private boolean m_updated = false;

Parse the metadata from DFTXML

This method parses the DFTXML representation of the document and metadata, which is passed in from the Documentum index agent. The Documentum version is returned to setCustomCollection().

    private String parseDocumentVersion(Document inputdoc) throws XPathExpressionException {
        String version = null;
        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        XPathExpression expr = xpath.compile("//r_version_label/text()");
        Object result = expr.evaluate(inputdoc, XPathConstants.NODE);
        Node versionnode = (Node) result;
        version = versionnode.getNodeValue();
        System.out.println("version: " + version);
        m_version = version;
        return m_version;
    }

Add routing logic

This method calls parseDocumentVersion() to get the version for routing. It sets the custom collection if the metadata meets the criterion (new collection).

    private boolean setCustomCollection(IFtIndexRequest request) {
        assert request != null;
        IFtDocument doc = request.getDocument();
        Document xmldoc = doc.getContentToIndex();
        try {
            String version = parseDocumentVersion(xmldoc);
            if (version.equals("1.0")) {
                request.setCollection(s_collection);
                m_updated = true;
            }
        } catch (XPathExpressionException e) {
            new FtFeederException(e);
        }
        return m_updated;
    }

Set the custom collection

This method determines whether the document is being added (not updated or deleted). The method then calls setCustomCollection to get the version and route the document to the appropriate collection.

    // Return true after the collection name has been altered for any request
    // Otherwise returns false.
    public boolean updateCollection(List<IFtIndexRequest> requests) throws FtFeederException {
        assert m_serverInfo != null;
        assert requests != null;
        for ( IFtIndexRequest request : requests ) {
            if (request.getOperation().toString().equals("add")) {
                setCustomCollection(request);
            }
        }
        return m_updated;
    }
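The version lookup used by the routing example can be tried outside the index agent. The following is a minimal standalone sketch that uses only the JDK XML APIs; the class name VersionLabelDemo is hypothetical, and the sample document is a simplified stand-in for DFTXML, not a real rendition. Unlike the sample method, this sketch returns null when no r_version_label element is present instead of failing on a null node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: reads the first r_version_label from an XML string.
final class VersionLabelDemo {

    static String firstVersionLabel(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Same expression as the routing example; NODE yields the first match.
            Node node = (Node) xpath.evaluate(
                    "//r_version_label/text()", doc, XPathConstants.NODE);
            return node == null ? null : node.getNodeValue();
        } catch (Exception e) {
            throw new IllegalStateException("Could not read version label", e);
        }
    }

    public static void main(String[] args) {
        // Simplified sample, assumed for illustration only.
        String sample = "<dmftdoc><dmftmetadata><dm_document>"
                + "<r_version_label>1.0</r_version_label>"
                + "<r_version_label>CURRENT</r_version_label>"
                + "</dm_document></dmftmetadata></dmftdoc>";
        System.out.println("version: " + firstVersionLabel(sample));
    }
}
```

A routing class that follows this pattern could compare the returned value with "1.0" only after checking it for null.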
Chapter 7 Index Data: Domains, Categories, and Collections

This chapter contains the following topics:

• Domain and collection menu actions
• Managing domains
• Configuring categories
• Managing collections
• Troubleshooting data management

Domain and collection menu actions

Execute XQuery

You can query a domain or collection with Execute XQuery in xPlore administrator. Enter your XQuery expression in the input area. The options get query plan and get optimizer debug are used to provide information to EMC technical support.

Check DB consistency

Perform before backup and after restore. This check determines whether there are any corrupted or missing files such as configuration files or Lucene indexes. Lucene indexes are checked to see whether they are consistent with the xDB records: tree segments, xDB page owners, and xDB DOM nodes.

Note: You must set the domain to maintenance mode before running this check.

Select the following options to check. Some options require extensive processing time:

• Segments and admin structures: Efficient check, does not depend on index size.
• Free and used pages: Touches all pages in database.
• Pages owner: Touches all pages in database.
• Indexes: Traverses all the indexes and checks DOM nodes referenced from the index.
• Basic checks of indexes: Efficient check of the basic structure. The check verifies that the necessary Lucene index files exist and that the internal xDB configuration is consistent with files on the file system.
• DOM nodes: Expensive operation, accesses all the nodes in the database.

The basic check of indexes inspects only the external Lucene indexes. The check verifies that the necessary files exist and that the internal xDB configuration is consistent with files on the file system. This check runs much faster than the full consistency check.

Note: When the result from this check is inconsistent, run it two more times. If the check fails, run the repair-merge utility.

Use the standalone consistency checker to check data consistency for specific collections in a domain or specific domains in a federation. If inconsistencies are detected, the tool can rebuild the tracking database. If tracking entries are missing, they are re-created. If tracking entries point to nothing, they are deleted. See Running the standalone consistency checker, page 125.

View DB statistics

Displays performance statistics from xDB operations. Check to provide information to technical support.

Backup

See Backup in xPlore administrator, page 150.

Managing domains

A domain is a separate, independent, logical, or structural grouping of collections. Domains are managed through the Data Management screen in xPlore administrator. The Documentum index agent creates a domain for the repository to which it connects. This domain receives indexing requests from the repository.

New domain

Select Data Management in the left panel and then click New Domain in the right panel. Choose a default document category. (Categories are specified in indexserverconfig.xml.) Choose a storage location from the dropdown list. To create a storage location, see .

The domain name must match that of the Content Server repository. You can create a domain and select its storage location before you configure the Documentum index agent. When the storage location is configured, you can select it for the index agent.

New Collection

Create a collection and configure it. See Changing collection properties, page 137.

Configuration

The document category and storage location are displayed (read-only). You can set the runtime mode as normal (default) or maintenance (for a corrupt domain). The mode does not persist across xPlore sessions; mode reverts to runtime on xPlore restart.

Back up, detach or attach a domain

For information on backing up a domain, see Backup in xPlore administrator, page 150.

Note: Do not use detach to move a domain index. Instead, change the binding for each collection. To avoid corruption, do not detach or attach a domain during index rebuild.

Delete domain

Remove the domain with the following steps.

1. Force recovery if the domain is corrupted. Add a property force-restart-xdb=true in dsearch_home/jboss5.1.0/server/%INSTANCE_NAME%/deploy/dsearch.war/WEB-INF/classes/indexserver-bootstrap.properties
2. Log in to xPlore administrator and open the domain in Data Management. Click Detach to force detach the domain.
3. Stop all xPlore instances.
4. (Optional) Back up the dsearch_home/config and dsearch_home/data/domain_name folders.
5. Remove the domain element from indexserverconfig.xml, like the following:
    <domain ... name="my_domain"> ... </domain>
6. Delete the segment elements for the domain in XhiveDatabase.bootstrap in dsearch_home/config. Search on the domain name. For example, you delete the GlobalOps domain in every segment library-path or id attribute:
    <segment library-id="0" library-path="/GlobalOps/dsearch/ApplicationInfo/acl" ... id="GlobalOps#dsearch#ApplicationInfo#acl">
    <file id="10" path="C:\xPlore\data\GlobalOps\acl\xhivedb-GlobalOps#dsearch#ApplicationInfo#acl-0.XhiveDatabase.DB"/>
    <binding-server name="primary"/>
    </segment>
7. Delete the domain folder under dsearch_home/data, for example, dsearch_home/data/GlobalOps.
8. Restart the xPlore primary and then secondary instances.

Configuring categories

A category defines a class of documents and their XML structure. The category definition specifies the processing and semantics that are applied to the ingested XML document. You can specify the XML elements that have text extraction, tokenization, and storage of tokens. You also specify the indexes that are defined on the category and the XML elements that are not indexed. More than one collection can map to a category. xPlore manages categories.

The default categories include dftxml (Documentum content), security (ACL and group), tracking database, metrics database, audit database, and thesaurus database. You can configure the indexes, text extraction settings, and compression setting for each category. The paths in the configuration file are in XPath syntax and refer to the path within the XML representation of the document. (All documents are submitted for ingestion in an XML representation.) Specify an XPath value to the element whose content requires text extraction for indexing.

A document category definition in indexserverconfig.xml describes what is indexed within the documents that are submitted. When you create a collection, choose a category from the categories defined in indexserverconfig.xml. When you view the configuration of a collection, you see the assigned category. It cannot be changed in xPlore administrator. To change the category, edit indexserverconfig.xml.

Table 17 Category configuration options

Option: Description
category-definitions: Contains one or more category elements.
category: Contains elements that govern category indexing.
properties/property track-location: Specifies whether to track the location (index name) of the content in this category. For Documentum DFTXML representations of documents, the location is tracked in the tracking DB. Documentum ACLs and groups are not tracked because their index location is known.
Managing collections

• About collections, page 134
• Planning collections for scalability, page 135
• Limitations of subcollections, page 136
• Adding or deleting a collection, page 136
• Changing collection properties, page 137
• Routing documents to a specific collection, page 137
• Attaching and detaching a collection, page 138
• Creating a collection storage location, page 138
• Moving a temporary collection, page 138
• Rebuilding collections, page 139
• Deleting and recreating indexes, page 140
• Querying a collection, page 140

About collections

Collections

A collection is a logical group of XML documents that is physically stored in an xDB detachable library. A collection represents the most granular data management unit within xPlore. All documents submitted for indexing are assigned to a collection. A collection generally contains one category of documents. In a basic deployment, all documents in a domain are assigned to a single default collection. You can create subcollections under each collection and route documents to user-defined collections.

Viewing collections

To view the collections for a domain, choose Data Management and then choose the domain in the left pane. In the content pane, you see each collection name, category, usage, state, and instances that the collection is bound to. There is a red X next to the collection to delete it. For xPlore system collections, the X is grayed out, and the collection cannot be deleted.

Viewing collection properties

Choose Data Management and drill down to the collection in the left pane. In the right pane, you see the following information about the collection:

• Library path in xDB.
• Document category. For more information, see Documentum domains and categories, page 25.
• Usage: Type of xDB library. Valid types: data (index), ApplicationInfo and SystemInfo.
• State: Valid states: index and search, index only, search only, update_and_search, and off_line.
  Note: You cannot back up a collection in the off-line state. You must detach it or bring it online.
• xPlore instances the collection is bound to.
• Current size on disk, in KB

Viewing a document in a collection

Click Name for an individual document to view the XML representation (DFTXML) of the document.

Planning collections for scalability

A collection is a logical grouping of tokenized content and associated full-text indexes within a domain. For Documentum environments, the Documentum index agent creates a domain for each source repository, and documents are indexed to collections within that domain. There is generally a one-to-one mapping between a category and a collection. A collection contains documents of a single category. For example, you have a collection that indexes all email documents.

A collection is bound to a specific instance in read-write state (index and search, index only, or update and search). A collection can be bound to multiple instances in read-only state (search-only).

Specify the target collection for documents using one of the following methods. They have the following order of precedence in xPlore, with highest first:

• Custom routing class. See Creating a custom routing class, page 127.
• API indexing option or collection mapping in the Documentum index agent. The documents are passed to the instance with the specified collection. See Sharing content storage, page 65.
• Default collection that is created for each domain. When the target collection is not specified, documents are indexed to the collections in round-robin order. Only collections with the state index or index and search can receive documents.

To plan for high ingestion rates, you can route a collection to high-speed storage for ingestion. As the data becomes less in demand, you can bind the collection to low-cost storage. You can create a separate collection for ingestion and then merge that collection with another. A single collection hierarchy performs better for search. To change the binding of a collection, see Changing collection properties, page 137.

Limitations of subcollections

The following restrictions are enforced for subcollections, including a collection that is moved to become a subcollection:

• Subcollections cannot be detached or reattached when the parent collection is indexed. For example, a path-value index is defined with no subpaths, such as the folder-list-index.
• Subcollections cannot be backed up or restored separately from the parent.
• Subcollections must be bound to the same instance as the parent. (When you move a collection in xPlore administrator, it is detached and then bound to the same instance as the parent.)
• Subcollection state cannot contradict the state of the parent. For example, if the parent is search_only, the subcollection cannot be index_only or index_and_search. If the parent is searchable, the adopted collection cannot be search-only. Exception: When the parent collection is in update_and_search or index_and_search state, subcollections can be any state.
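The state restriction on subcollections can be expressed as a small check. The following Java sketch is illustrative and is not an xPlore API: the class SubcollectionStateRule and its State enum are hypothetical names, and the sketch encodes only the search_only example and the update_and_search / index_and_search exception stated in this section. Parent states that the text does not address are treated as allowed, which is an assumption.

```java
// Hypothetical sketch of the parent/subcollection state restriction.
final class SubcollectionStateRule {

    enum State { INDEX_AND_SEARCH, INDEX_ONLY, UPDATE_AND_SEARCH, SEARCH_ONLY, OFF_LINE }

    static boolean isAllowed(State parent, State child) {
        // Exception in the text: these parent states permit any subcollection state.
        if (parent == State.INDEX_AND_SEARCH || parent == State.UPDATE_AND_SEARCH) {
            return true;
        }
        // A search_only parent cannot hold an indexing subcollection.
        if (parent == State.SEARCH_ONLY) {
            return child != State.INDEX_ONLY && child != State.INDEX_AND_SEARCH;
        }
        // Assumption: combinations not covered by the text are not rejected here.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(State.SEARCH_ONLY, State.INDEX_ONLY));
        System.out.println(isAllowed(State.INDEX_AND_SEARCH, State.SEARCH_ONLY));
    }
}
```

A check of this kind could be run before moving a collection under a new parent, so that a conflicting state is caught before the move is attempted.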
Choose a domain and then choose New collection. The storage location is the same as the location for the parent. To create a subcollection. subcollections can be any state.xml.) • Subcollection state cannot contradict the state of the parent. • Document category: Categories are defined in indexserverconfig. The documents are passed to the instance with the specified collection. For example. If the parent is searchable. you can route a collection to high-speed storage for ingestion. When the target collection is not specified. documents are indexed to the collections in round-robin order. Limitations of subcollections The following restrictions are enforced for subcollections. the adopted collection cannot be search-only.2 Administration and Development Guide . see Creating a collection storage location. page 138. click the parent collection in the navigation pane and then click New Subcollection. you can bind the collection to low-cost storage.Index Data: Domains. it is detached and then bound to the same instance as the parent. • Subcollections must be bound to the same instance as the parent. • Subcollections cannot be backed up or restored separately from the parent. Categories. A collection can have only one binding that is index_and_search. do the following. If the collection state is index_and_search. you cannot edit the binding. • Use update and search for read-only collections that have updates to existing content or metadata. to all instances in round robin order. parent domain. To set up storage locations. Collections with the state search_only or off_line cannot be deleted in xPlore administrator. and Collections 4. index only. Select a collection and then choose Configuration. If the collection state is search_only. you can bind the collection to multiple instances for better resource allocation. and storage location. See Sharing content storage. page 65. page 127 • Documentum index agent: Map a file store to a collection before migration. state. 
• Documentum index agent: Map all documents to a specific collection. choose a domain and then click X next to the collection you wish to delete. update_and_search. with custom routing applied in highest priority. 4. page 138. Configure state: index and search. To delete a collection. Limitations: • If a binding instance is unreachable. A collection must have the state index_and_search or index_only to be deleted. see Creating a collection storage location. To map all documents to a specific collection. • index and search is the default state when a new collection is created. or search only. • You cannot change the binding of a subcollection to a different instance from the parent collection. EMC Documentum xPlore Version 1. or index_only . • Default collection of the domain. update and search. You cannot query a collection that is set to index only. 2. Change storage location. Changing collection properties 1. usage. Set the state of the collection. restore the collection to the same instance or to a spare instance. binding instance. Choose a Binding instance. Index Data: Domains. The Edit collection screen displays the collection name. To delete these collections. Routing documents to a specific collection There are four ways to route documents to a specific collection. document category.2 Administration and Development Guide 137 . • Use index only to repair the index. See Creating a custom routing class. Use for complicated routing logic before migration. • Use search only (read-only) on multiple instances for query load balancing and scalability. 3. • Custom routing class. • To change the binding on a failed instance. Routing is applied in the following order of precedence. Change binding to another xPlore instance: a. You cannot add new content to the collection. b. you can bind to only one instance. use the xDB admin tool. Change Binding instance.2 Administration and Development Guide . 2. see Limitations of subcollections. 
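The order of precedence for routing documents to a collection can be pictured with a small sketch. This is only an illustration of the precedence rule stated in this guide, not xPlore code: the RoutingConfig class, its field names, and the document dictionary are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class RoutingConfig:
    """Hypothetical container for the four routing mechanisms."""
    # Custom routing class: returns a collection name, or None to decline.
    custom_router: Optional[Callable[[dict], Optional[str]]] = None
    # Index agent mapping of file store name to collection name.
    file_store_map: Dict[str, str] = field(default_factory=dict)
    # Index agent mapping of all documents to one collection.
    all_documents_collection: Optional[str] = None
    # Default collection of the domain.
    domain_default: str = "default"


def resolve_collection(doc: dict, cfg: RoutingConfig) -> str:
    """Pick the target collection, highest-priority mechanism first."""
    if cfg.custom_router is not None:
        target = cfg.custom_router(doc)
        if target is not None:
            return target
    store = doc.get("file_store")
    if store in cfg.file_store_map:
        return cfg.file_store_map[store]
    if cfg.all_documents_collection is not None:
        return cfg.all_documents_collection
    return cfg.domain_default
```

In this sketch a custom router that returns None falls through to the next mechanism; whether a real custom routing class can decline in that way is an assumption.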
To map all documents to a specific collection, do the following.
1. Edit the file indexagent.xml. This file is located in dsearch_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes.
2. Locate the element indexer_plugin_config/generic_indexer/parameter_list.
3. Add the following parameter, substituting the name of your target collection as the value of the parameter_value element:
<parameter>
  <parameter_name>collection</parameter_name>
  <parameter_value>target_collection</parameter_value>
</parameter>
4. Save and close the configuration file, then restart the index agent.

Moving a temporary collection

If you are moving a collection to another xPlore instance, choose the collection and click Configuration. Change Binding instance.

You can create a collection for faster ingestion, and then move it to become a subcollection of another collection after ingestion has completed. If a collection meets the following requirements, you can move it to become a subcollection:
• Facet compression was disabled for all facets before the documents were indexed.
• The collection does not have subcollections.
• The collection is not itself a subcollection (does not have a parent collection).
• The collection has the same category and index definitions (in indexserverconfig.xml) as the new parent.

1. Choose the collection and click Move.
2. In the Move to dialog, select a target collection. This collection can be a subcollection or a top-level collection.

xPlore enforces additional restrictions after the collection has been moved. For information on subcollection restrictions, see Limitations of subcollections, page 136.

Creating a collection storage location

xPlore stores data and indexes in an xDB database. The index for each collection can be routed to a specific storage location. The location is not specific to the instance, so one location can be shared among instances. A collection can be stored in a location different from other collections in the domain. After you choose the location, you must back up and restore to the same location. If the disk is full, set the collection state to update_and_search and create a new collection in a new storage location.

1. In xPlore administrator, choose System Overview in the tree.
2. Click Global Configuration and then choose the Storage Management tab.
3. Click Add Storage. Enter a name and path and save the storage location. The storage location is created with unlimited size.

After you create a storage location, you can select it when you create a domain or a new collection.

Attaching and detaching a collection

Note: Do not use detach to move a collection. Instead, change the collection binding.

Attach and detach are required for a restore operation. See Offline restore, page 152. To avoid corruption, do not detach or attach a collection during index rebuild.

Rebuilding collections

Collection must be online and in the read/write state to rebuild the collection. Rebuild the index for the following use cases:
• You added a stop words list.
• You added a new facet.
• You changed the data model to add indexable attributes for an object type.
• You want to strip out accents from words indexed in xPlore 1.0, to synchronize with 1.1 search.
• You want to use index agent or DFC filters, and the objects have already been indexed.

Rebuild a collection index in the xPlore administrator UI. The index is rebuilt in the background and then put into place when the rebuild is complete. You can perform normal ingestion and search during index rebuild. Custom routing is not applied when you rebuild an index. To apply custom routing, refeed the documents. See Deleting and recreating indexes, page 140.

Note: Do not perform the following operations during rebuild: Heavy ingestion load, backup or restore, attach or detach a domain or collection, create collection, or check consistency.

1. Choose an online collection in the Data Management tree.
2. In the right pane, click Rebuild Index.
3. Reingest large documents. At the original ingestion, some documents are not embedded in DFTXML because the XML content exceeds the value of the file-limit attribute on the xml-content element in indexserverconfig.xml. The index rebuild process generates a list of object IDs for these documents. The list is located in dsearch_home/data/domain_name/collection_name/index_name/ids.txt. For example:
C:/xPlore/data/mydomain/default/dmftdoc_2er90/ids.txt
Use this list in the next step.
4. In the index agent UI, provide the path to the list of object IDs. See Indexing documents in normal mode, page 64.

After the index is rebuilt, run ftintegrity or the State of Index job in Content Server 6.7. See Using ftintegrity, page 60 or Running the state of index job, page 62.

Deleting and recreating indexes

To remove a collection, navigate to the domain in xPlore administrator left panel. Click the X next to the collection name to delete it.

1. Shut down all xPlore instances.
2. Copy to a temporary location the file dsearch_home/config/indexserverconfig.xml if you have customized it.
3. Delete everything under dsearch_home/data.
4. Delete everything under dsearch_home/config.
5. Restore your customized indexserverconfig.xml from the temporary location.
6. Start xPlore instances.
7. Refeed the documents. Custom routing is applied.

Note: If you are unable to delete the content because xDB is still running, stop the Java processes running in dsearch_home/jboss5.1.0.

Querying a collection

1. Choose a collection in the Data Management tree.
2. In the right pane, click Execute XQuery.
3. Check Get query debug to debug your query. The query optimizer is for technical support use.

To route queries to a specific collection, see Routing a query to a specific collection, page 208.

Troubleshooting data management

Auditing collection operations

You can audit operations on collections. Auditing is enabled by default. To enable or disable auditing, open Global Configuration in the xPlore administrator left pane. Select the Auditing tab and check admin to enable or disable data management auditing. For information on configuring the audit record, see Configuring the audit record, page 38.

When auditing is enabled, the following operations on a collection are recorded:
• Create and delete collection
• Add, remove, or change binding
• Create, rebuild start and end, or delete index
• Attach or detach
• Adopt collection start and end
• Backup start and end

You can view a report based on the admin audit records. Go to Diagnostic and Utilities > Reports and choose Audit records for admin component. You can filter by date.

Force restart of xDB

If xDB fails to start up, you can force a start.
1. Edit indexserver-bootstrap.properties in the WEB-INF/classes directory of the application server instance, for example, dsearch_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes.
2. Set the value of force-restart-xdb in indexserver-bootstrap.properties to true. If this property does not exist in indexserver-bootstrap.properties, add the following line:
force-restart-xdb=true
3. Restart the xPlore instance.

Note: This property will be removed after restart.

A collection that has been force-detached cannot be deleted

After a collection has been force-detached, it cannot be removed with the Delete command in xPlore administrator. (xPlore cannot determine whether a collection is corrupted after it has been detached.) This limit protects non-corrupted collections from being removed. To remove a corrupted collection, do the following.
1. Shut down all xPlore instances.
2. Remove the collection element from indexserverconfig.xml. This example has a force-detached collection with the name default in the domain defaultDomain:
<collection state="detached" document-category="dftxml" usage="Data" name="default"/>
3. Remove the collection-related segment including the data segment and its corresponding tracking DB, tokens, and xmlContent segment (if any) from XhiveDatabase.bootstrap in dsearch_home/config. In the following example, the segment ID starts with defaultDomain and ends with default:
<segment id="defaultDomain#dsearch#Data#default" temp="false" version="1" state="detach_point" usage="detachable_root" usable="false">
  <file path="c:\DSSXhive\defaultDomain\default\xhivedb-defaultDomain#dsearch#Data#default-0.XhiveDatabase.DB" id="14"/>
  <binding-server name="primary"/>
</segment>
<segment id="defaultDomain#dsearch#SystemInfo#TrackingDB#default" temp="false" version="1" state="detach_point" usage="detachable_root" usable="false">
  <file path="c:\DSSXhive\defaultDomain\default\xhivedb-defaultDomain#dsearch#SystemInfo#TrackingDB#default-0.XhiveDatabase.DB" id="15"/>
  <binding-server name="primary"/>
</segment>
Do not remove segments from xDB unless they are orphan segments. (See Orphaned segments CLIs, page 158.) If you do, your backups cannot be restored.
4. Delete the physical data folder:
dsearch_home/Data/defaultDomain/default
5. Start xPlore instances.
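For the force restart procedure, the edit to indexserver-bootstrap.properties can be scripted. The following is a minimal sketch that works on the text of the properties file and sets or appends the force-restart-xdb line; the function name is hypothetical, and reading and writing the file in your own installation path is left to the caller.

```python
def set_force_restart(properties_text: str, enabled: bool = True) -> str:
    """Return properties text with force-restart-xdb set to true or false.

    An existing force-restart-xdb line is rewritten in place; if the
    property is missing, it is appended, as the procedure describes.
    Comment lines (starting with # or !) are left untouched.
    """
    key = "force-restart-xdb"
    value = "true" if enabled else "false"
    out = []
    found = False
    for line in properties_text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

Because the guide notes that the property is removed after restart, the append branch is the one most likely to run in practice.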
If the disk is full.2 Administration and Development Guide 143 . Chapter 8 Backup and Restore This chapter contains the following topics: • About backup • About restore • Handling data corruption • Backup in xPlore administrator • File. This check determines whether there are any corrupted or missing files such as configuration files or Lucene indexes. You can back up an xPlore federation. Backup and restore consistency check Perform a consistency check before backup and after restore. Note: If you remove segments from xDB. or changing a collection binding. Lucene indexes are checked to see whether they are consistent with the xDB records: Tree EMC Documentum xPlore Version 1. If you do not back up. You cannot back up and restore to a different location. or collection using xPlore administrator or use your preferred volume-based or file-based backup technologies. then restoring the domain or xPlore federation puts the system in an inconsistent state. All backup and restore commands are available as command-line interfaces (CLI) for scripting. set the collection state to update_and_search and create a new collection in a new storage location. High availability and disaster recovery planning is described in Documentum xPlore Installation Guide. Perform all your anticipated configuration changes before performing a full federation backup.or volume-based (snapshot) backup and restore • Offline restore • Automated backup and restore (CLI) • Troubleshooting backup and restore About backup Back up a domain or xPlore federation after you make xPlore environment changes: Adding or deleting a collection. page 153. They are incremental. page 150. requiring third-party product such as EMC Timefinder. Backup technology xPlore supports the following backup approaches.Backup and Restore segments. since most files are touched when they are opened. See Backup in xPlore administrator. Backup is warm or cold. 
• File-based backups: Back up the xPlore federation directory dsearch_home/data. You can back up a domain (Documentum repository index) or collection with indexing suspended (warm) or with the system off (cold). page 60 Backup state: Hot. (These files or ACLs were added or updated after the xPlore backup. A cumulative backup has all backups since the last full backup.) See Using ftintegrity. Backup combinations Periodic full backups are recommended in addition to differential backups. Windows file-based backup software requires exclusive access to a file during backup and thus requires a cold backup. and xDB DOM nodes. xDB page owners. You can back up hot (while indexing). You can perform incremental or cumulative backups only on the xPlore federation and not on a domain or collection. Use ftintegrity after the restore to detect additional files that need indexing. and /dblog files. cumulative (differential) or full. Order of backup and restore in Documentum systems The backup and restore date for xPlore must be at a point in time earlier than the backup of repository content and metadata.2 Administration and Development Guide . Incremental file-based backups are not recommended. page 125. • Volume-based (snapshot) backups: Can be cumulative or full backup of disk blocks. Table 18 Backup scenarios Level Backup state DR technology Backup scope collection warm xPlore full only 144 EMC Documentum xPlore Version 1. warm. warm (search only). see Running the standalone consistency checker. or cold (off-line). and cold You can back up the entire xPlore federation while you continue to search and index (hot backup). For a tool that checks consistency between the index and the tracking database. dsearch_home/config. Backup is warm or cold. • Native xDB backups: These backups are performed through xPlore administrator. the state is set to search_only for domain and collection backup and reverted after the backup completes. In addition. 
When you back up using xPlore administrator. Back up immediately after upgrading xPlore to 1. page 125. Lucene indexes are checked to see whether they are consistent with the xDB records: Tree segments. (These files or ACLs were added or updated after the xPlore backup. and xDB DOM nodes. and a transaction log. Use ftintegrity after the restore to detect additional files that need indexing.0 backup. The xPlore server must be shut down to restore a collection or an xPlore federation. page 60 EMC Documentum xPlore Version 1. The system uses the transaction log to restore data on an instance. Select Data Management in xPlore Administrator and then choose Check DB Consistency. collection. Scripted restore Use the CLI for scripted restore of a federation. xDB page owners. Note: You cannot restore an xPlore 1. This check determines whether there are any corrupted or missing files such as configuration files or Lucene indexes. Backup and Restore Level Backup state DR technology Backup scope domain warm xPlore full only xPlore federation warm or hot xPlore full.2 Administration and Development Guide 145 . See Automated backup and restore (CLI). page 153.) See Using ftintegrity. because the xDB version has changed.1. If you performed a hot backup using xPlore administrator. Restore a backup to the same location. For a tool that checks consistency between the index and the tracking database. Backup and restore consistency check Perform a consistency check before backup and after restore. see Running the standalone consistency checker. the backup file is restored to the point at which backup began. xPlore supports offline restore only. domain. incremental or cumulative cold or warm volume* full or cumulative cold or warm file* full only About restore All restore operations are performed off-line. If the disk is full. Each xPlore instance owns the index for one or more domains or collections. 
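The combinations in Table 18 (Backup scenarios) lend themselves to a lookup that a backup script could consult before it runs. The sketch below encodes the table rows as data. The names are hypothetical, and it assumes that the two rows with an empty Level cell (volume and file technologies) apply at the federation level, which the flattened table does not state explicitly.

```python
# Rows of Table 18 keyed by (level, technology). The assignment of the
# "volume" and "file" rows to the federation level is an assumption.
BACKUP_SCENARIOS = {
    ("collection", "xPlore"): {"states": {"warm"}, "scopes": {"full"}},
    ("domain", "xPlore"): {"states": {"warm"}, "scopes": {"full"}},
    ("federation", "xPlore"): {
        "states": {"warm", "hot"},
        "scopes": {"full", "incremental", "cumulative"},
    },
    ("federation", "volume"): {
        "states": {"cold", "warm"},
        "scopes": {"full", "cumulative"},
    },
    ("federation", "file"): {"states": {"cold", "warm"}, "scopes": {"full"}},
}


def is_supported(level: str, technology: str, state: str, scope: str) -> bool:
    """Return True if the combination appears in the backup scenarios table."""
    row = BACKUP_SCENARIOS.get((level, technology))
    if row is None:
        return False
    return state in row["states"] and scope in row["scopes"]
```

For example, is_supported("domain", "xPlore", "warm", "incremental") returns False, which matches the statement that incremental or cumulative backups are possible only on the xPlore federation.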
Order of backup and restore in Documentum systems The backup and restore date for xPlore must be at a point in time earlier than the backup of repository content and metadata. If there are multiple instances. set the collection state to update_and_search and create a new collection in a new storage location. one instance can own part of the index for a domain or collection. page 152. page 149 Recovering from a system crash. You can also use the CLI setDomainMode. See Domain and collection menu actions. page 131. page 146 Repairing a corrupted index.properties to true. Use xPlore administrator to set the domain mode to maintenance. Restore the corrupted domain or collection from a backup. • Only queries from xPlore administrator are evaluated. page 150 Detecting data corruption You can detect data corruption in the following ways: • An XhiveDataCorruptionException is reported in the xPlore server log. the domain mode is always set to normal (maintenance mode is not persisted to disk). The corrupt collection is marked unusable and updates are ignored. When xPlore is restarted. Force the server to start up: Set the value of force-restart-xdb in indexserver-bootstrap.Backup and Restore Handling data corruption Detecting data corruption. it is silently skipped. page 148 Snapshot too old. If the xPlore primary instance does not start. page 148 Dead object. page 159. the following restrictions are applied: • The only allowed operations are repair and consistency check. • Queries from a Documentum client are tried as NOFTDQL in the Content Server. In maintenance mode. • The primary instance does not start up. The xPlore primary instance may not start up. • Run the consistency checker on the xPlore federation. page 146 Handling a corrupt domain. page 146 Too many open files. do the following: a. See Offline restore. 146 EMC Documentum xPlore Version 1.2 Administration and Development Guide . Handling a corrupt domain 1. See Domain mode CLIs. 
Restore the corrupted domain from backup. Detach the corrupted domain. 1. xPlore does not process them. 2. 3. Repairing a corrupted index A collection that is corrupted or unusable cannot be queried. The console on startup reports XhiveException: LUCENE_INDEX_CORRUPTED. For example: repair-segments –d xhivedb -p emc101! /xplore/dsearch/Data/default LI-af97431d-6c4d-41ae-882d-873e8e94fdcf 7. Set the value of force-restart-xdb in indexserver-bootstrap. rebuild the index. Stop all xPlore instances.sh. 3. usually dmftdoc. go to the next step to clean corrupted data. 2. 6. check the number of folders named LI-* in dsearch_home/data. 3. Cleaning and rebuilding the index Use this procedure for the following use cases: • Data is corrupted on disk and there is no backup • You change collections after backup and see errors when you try to restore. If the number if large.1. EMC Documentum xPlore Version 1. Run the consistency checker again./lucene-index directories. 2. If the xPlore server starts up. Restart all xPlore instances. 5. 2. Launch XHCommand.. 8. • If the consistency check passes. Edit luceneindex.properties to true. Backup and Restore b. xdb repair-segments –d database -p path target database is the name of the xDB database. Open a command window and navigate to dsearch_home/dsearch/xhive/admin. C:\xPlore\jboss5. • If the consistency check fails. go to Too many open files. Enter the following xDB command. Right-click and choose Library management > Check library consistency. click Configuration. and then set the state to off_line. 1. If the xPlore system does not restart. c. Repair or restore from backup the corrupted collection or log. If startup still reports errors. The offending collection and its index are marked as unusable and updates are ignored. choose the collection in the tree. try a force restart.bat or XHCommand. (It is set to false by the system. perform the xDB command repair-segments.bootstrap in dsearch_home/config. Change index/isUsable to true. 
Start the xDB admin tool xhadmin. page 148. Restart xPlore instances. • If the consistency check passes.) 4. (This file is located in the WEB-INF/classes directory of the application server instance. the system is usable. generally "xhivedb". • If the consistency check fails.2 Administration and Development Guide 147 .bat or xhadmin. target (optional) is the Lucene index name reported in the error. Restart the xPlore instance. 1.0\server\DctmServer_PrimaryDsearch\deploy\dsearch. Continue with the next step. Get the path in the xPlore administrator collection view. for example. path is the full path to the library (collection) containing the index.sh and drill down to the library that was reported as corrupted. You see errors if you added or deleted a collection or changed collection binding after backup.war\WEB-INF\classes. the data files for the new collection are not deleted. 3. The dsearch log reports an error like the following: com.) Too many open files Follow these instructions when you get a Java exception: Too many open files. Start xPlore instances.XhiveException: IO_ERROR.lucene.Backup and Restore 3. For example: soft nofile 65536 hard nofile 1048576 4. change the nofile parameter in /etc/security/limits. Restore it using the procedure described in Offline restore. Stop all xPlore instances. • Make the xdb.lucene. • Make the xdb. 1. 2. Choose one: • You have a backup. page 152.xhive. XhiveDatabase.error. Contact technical support for assistance if you are not able to resolve the issue. Original message: File C:\xPlore\data\doc1\444\xhivedb-doc1#dsearch#Data#444-0.luceneMergeFactor smaller. • Recreate any collection that was added after backup. • Refeed documents for the new collection. 3. This file is in the directory WEB-INF/classes of the primary instance.cleanMergeInterval smaller. Snapshot too old For the XhiveException: LUCENE_SNAP_SHOT_TOO_OLD 1. Try adjusting one or more of the following properties: • Make xdb. 
If you created a collection after backup.lucene. 4. 2. (Use migration instructions in Documentum xPlore Installation Guide. Delete everything under dsearch_home/config except the file indexserverconfig. 6. • You do not have a backup. and then restored the domain or xPlore federation.2 Administration and Development Guide .confg. Restart all xPlore instances. Locate the collection for which the error message was returned. 5. Some snapshot errors are due to inconsistencies between the Lucene index and xDB.cleaningInterval smaller. 148 EMC Documentum xPlore Version 1. Rebuild the index using xPlore administrator.properties.xml. Shut down all xPlore instances. Tune the Lucene merge interval parameter in xdb.nonFinalMaxMergeSize larger. Refeed all documents. 7. • Make the xdb.DB already exists • Delete the file at the indicated path. If the OS of the xPlore host is Linux. Delete everything under dsearch_home/data. Detect the dead object issue using the xDB command repair-blacklists. 1. Backup and Restore Dead object For the XhiveException: OBJECT_DEAD. Syntax: repair-blacklists -d <db name> <lib path> <index name> --check-dups Output is like the following: ("total docs processed = ("total checked blacklisted objects = " ("total unaccounted for blacklisted objects = " ("total duplicate entries found = ").2 Administration and Development Guide 149 . This utility checks for duplicate entries in the Lucene index. ("total intranode dups found = " EMC Documentum xPlore Version 1. See Deleting and recreating indexes.Backup and Restore Recovering from a system crash Figure 15 System crash decision tree 1. it is a hot backup. In a hot backup. page 60. Choose the instance and click Indexing Service > Operations > Disable. indexing and search continues during the backup process.2 Administration and Development Guide . For a warm backup. See Scripted federation restore. 150 EMC Documentum xPlore Version 1. See Using ftintegrity. page 140. 
Backup in xPlore administrator

When you back up using xPlore administrator, it is a hot backup. In a hot backup, indexing and search continues during the backup process.

After you change the federation or domain structure, back up the xPlore federation or domain. Events such as adding or deleting a collection or changing a collection binding require backup. If you do not back up, then restoring the domain or xPlore federation puts the system in an inconsistent state. Perform all your anticipated configuration changes before performing a full federation backup.

Back up your jars or DLLs for custom content processing plugins or annotators. Make sure that you have sufficient disk space for the backup and for temporary space (twice the present index size).

Note: Backup is not available for a subcollection. Back up the parent and all its subcollections.

1. For a warm backup, suspend indexing on each instance that the collection is bound to. Choose the instance and click Indexing Service > Operations > Disable. Search is still enabled.
2. Incremental backup: By default, log files are deleted at each backup. For incremental backups, change this setting before a full backup using xPlore administrator.
a. Choose Home > Global Configuration and then choose Engine.
b. Check true for keep-xdb-transactional-log.
When you change this setting, the log file from the full backup is not deleted at the next incremental backup.
3. In a domain or collection, click Backup.
4. Select Default location or enter the path to your preferred location and click OK.

File- or volume-based (snapshot) backup and restore

Data files in the backup must be on a single volume. This procedure assumes that no system changes (new or deleted collections, changed bindings) have occurred since backup. Perform all your anticipated environment changes before backup.

Note: Do not mix CLI commands (suspend or resume disk writes) with native xPlore backup in xPlore administrator.

1. Suspend ingestion for backup or restore. Set all domains in the xPlore federation to the read_only state using the CLI. (Do not use native xPlore to set state.) See Collection and domain state CLIs, page 159.
a. Navigate to dsearch_home/dsearch/xhive/admin.
b. Launch the command-line tool with the following command. You supply the administrator password (same as xPlore administrator).
XHCommand suspend-diskwrites
2. Use your third-party backup software to back up or restore. See the white paper on Powerlink for backing up using Networker.
3. Resume xDB with the following command:
XHCommand suspend-diskwrites --resume
4. Set all backed up domains to the reset state and then turn on indexing. (This state is not displayed in xPlore administrator and is used only for the backup and restore utilities.) Use the script in Collection and domain state CLIs, page 159.

Offline restore

xPlore supports offline restore only. The xPlore server must be shut down to restore a collection, domain, or xPlore federation. If you are restoring a full backup and an incremental backup, perform both restore procedures before restarting the xPlore instances.

This procedure assumes that no system changes (new or deleted collections, changed bindings) have occurred since backup. (Perform a full federation backup every time you make configuration changes to the xPlore environment.)

For automated (scripted) restore, see Scripted federation restore, page 156, Scripted domain restore, page 156, or Scripted collection restore, page 157. The following instructions include some non-scripted steps in xPlore administrator.

If you are restoring a federation and a collection that was added after the federation backup, do the following:
1. Restore the federation.
2. Start up and shut down xPlore.
3. Restore the collection.

Federation restore:
1. Shut down all xPlore instances.
2. Federation only: Clean up all existing data files.
• Delete everything under dsearch_home/data.
• Delete everything under dsearch_home/config.
3. Restore the federation. See Scripted federation restore, page 156. If you are restoring a full backup and an incremental backup, restore both before restarting xPlore instances.
4. Start all xPlore instances. No further steps are needed for federation restore.

Do the following steps for domain or collection restore.
1. Detach the domain or collection:
• Collection only: Set the collection state to off_line using xPlore administrator. Choose the collection and click Configuration.
• Domain or collection: Detach the domain or collection using xPlore administrator.
Note: Force-detach corrupts the domain or collection.
2. Domain only: Generate the orphaned segment list. Use the CLI purgeOrphanedSegments. See Orphaned segments CLIs, page 158.
3. Stop all xPlore instances.
4. Run the restore CLI. See Scripted domain restore, page 156, or Scripted collection restore, page 157.
5. Start all xPlore instances.
6. Force-attach the domain or collection using xPlore administrator.
7. Domain only: If orphaned segments are reported before restore, run the CLI purgeOrphanedSegments. If an orphaned segment file is not specified, the orphaned segment IDs are read from stdin.
8. Perform a consistency check and test search. Select Data Management in xPlore Administrator and then choose Check DB Consistency.
9. Run the ACL and group replication script to update any security changes since the backup. See Manually updating security, page 50.
10. Run ftintegrity. For the start date argument, use the date of the last backup, and for the end date use the current date. See Using ftintegrity, page 60.
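In the file- or volume-based (snapshot) procedure, disk writes must be resumed even if the third-party backup step fails. The sketch below wraps that step between the two XHCommand invocations named in this guide. It is an illustration only: the function name and the injected run callable are hypothetical, and it assumes that XHCommand is on the path and that password entry is handled by whatever run does.

```python
import subprocess
from typing import Callable, List, Optional

# Command lines taken from the procedure text.
SUSPEND = ["XHCommand", "suspend-diskwrites"]
RESUME = ["XHCommand", "suspend-diskwrites", "--resume"]


def snapshot_with_suspended_writes(
    take_snapshot: Callable[[], None],
    run: Optional[Callable[[List[str]], object]] = None,
) -> None:
    """Suspend xDB disk writes, take the snapshot, then always resume."""
    if run is None:
        # Default runner: execute the command and raise on a non-zero exit.
        def run(cmd: List[str]) -> object:
            return subprocess.run(cmd, check=True)
    run(SUSPEND)
    try:
        take_snapshot()
    finally:
        # Resume runs even when take_snapshot raises.
        run(RESUME)
```

Passing a recording function as run makes it possible to dry-run the sequence without touching xDB.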
page 153 Using the CLI.sh (Linux). page 60. for example. use the date of the last backup. 11. The tool wrapper is xplore. page 156 Scripted domain restore. dsearch_home/config/XhiveDatabase. page 156 Scripted collection restore. page 154 CLI batch file. page 159 Collection and domain state CLIs.2 Administration and Development Guide 153 . page 50. Perform a consistency check and test search. See . For the start date argument. page 158 Orphaned segments CLIs. If you change to the HTTPS protocol. See Using ftintegrity. page 155 Scripted federation restore. set up when you installed xPlore (same as xPlore administrator login password) bootstrap Full path to xDB bootstrap file. Run the ACL and group replication script to update any security changes since the backup. page 157 Force detach and attach CLIs. Table 19 CLI properties Property Description host Primary xPlore instance host: fully qualified hostname or IP address port Primary xPlore instance port. 12.properties to set the environment for the CLI execution. change this port.bat (Windows) or xplore. Select Data Management in xPlore Administrator and then choose Check DB Consistency. page 159 Activate spare instance CLI.Manually updating security . page 158 CLI properties and environment. page 158 Domain mode CLIs. Edit the properties file xplore. password xPlore administrator password. page 155 Scripted backup. protocol Valid values: http or https The xPlore installer updates the file dsearch-set-env.sh "<command> [parameters]" Examples: xplore “backupFederation ’c:/xPlore/dsearch/backup’.bat -f <filename> Examples: xplore./xplore.sh "dropIndex ’dftxml’.txt .Backup and Restore Property Description verbose Prints all admin API calls to console. Change the working directory to the location of dsearch_home/dsearch/admin.bat "<command> [parameters]" . The command is case-insensitive./xplore. true. (Optional) Run CLI commands from a file using the following syntax: xplore. Run a CLI command with no parameters. 
prints a success or error message./xplore. use the following syntax appropriate for your environment (Windows or Linux). ’folder-list-index’ " The command executes.bat -f file. This file contains the path to dsearch. For batch scripts. Run a CLI command with parameters using the following syntax appropriate for your environment (Windows or Linux). 5./xplore. xplore.sh help backupFederation 154 EMC Documentum xPlore Version 1.2 Administration and Development Guide . and then exits./xplore. For example: xplore help backupFederation .sh -f file. 2. The CLI uses the Groovy script engine./xplore. Default: true. Open a command-line window.sh. Use double quotes around the command. and a forward slash for all paths: xplore.sh resumeDiskWrites 4.bat <command> . null” . single quotes around parameters.txt Call the wrapper without a parameter to view a help message that lists all CLIs and their arguments. Using the CLI 1.sh <command> Examples: xplore. 3. set to false.bat resumeDiskWrites . The command is case-insensitive.war/WEB-INF.bat or dsearch-set-env. null println ’Done’ Call the batch file with a command like the following: xplore.bat -f sample. Set to false for full backup.sh f <batch_file_name> For example. set keep-xdb-transactional-log to true in xPlore administrator. Choose Home > Global Configuration > Engine. null" "backupCollection collection(’myDomain’. <collection_name>).gvy suspends index writes and performs an incremental backup us the xPlore federation. null" Scripted domain backup xplore backupDomain <domain_name>. Backup and Restore CLI batch file Use the following syntax to reference a batch file containing xPlore commands. ’default’). Use a forward slash for paths.bat f <batch_file_name> or xplore. isIncremental. ’c:/xplore/backup’ " EMC Documentum xPlore Version 1. Specify any path as the value of [backup_path]. null" xplore "backupDomain ’myDomain’. null is_incremental Boolean. [backup_path] Examples: xplore "backupDomain ’myDomain’. 
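The quoting rules in Using the CLI (double quotes around the whole command, single quotes around string parameters, bare Boolean and null values, forward slashes in paths) are easy to get wrong when commands are generated by a script. The following is a minimal sketch of a helper that renders a command string under those rules. The names cli_literal and build_command are hypothetical and are not part of xPlore; the helper only builds text and does not call the CLI.

```python
def cli_literal(value):
    """Render one argument the way the CLI text above describes it."""
    if value is None:
        return "null"  # null is written bare, never quoted
    if isinstance(value, bool):
        return "true" if value else "false"  # Booleans are written bare
    # Everything else is single-quoted; backslashes become forward slashes.
    return "'" + str(value).replace("\\", "/") + "'"


def build_command(name, *args):
    """Return the double-quoted text to pass to xplore.bat or xplore.sh."""
    rendered = ", ".join(cli_literal(arg) for arg in args)
    body = name if not rendered else name + " " + rendered
    return '"' + body + '"'


if __name__ == "__main__":
    # Incremental federation backup to an explicit folder.
    print(build_command("backupFederation", "c:\\xPlore\\dsearch\\backup", True, None))
    # Command with two string parameters.
    print(build_command("dropIndex", "dftxml", "folder-list-index"))
```

Passing Python values instead of preformatted text keeps the distinction between 'true' (a quoted string, which the CLI troubleshooting notes warn against) and true (a Boolean) out of hand-written strings.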
Scripted backup

The default backup location is specified in indexserverconfig.xml as the value of the path attribute on the element admin-config/backup-location. Specify any path as the value of [backup_path].

For incremental backups, set keep-xdb-transactional-log to true in xPlore administrator. Choose Home > Global Configuration > Engine.

Scripted federation backup

     xplore backupFederation [backup_path], <is_incremental>, null

is_incremental: Boolean. Set to false for full backup, true for incremental.

Examples:
     xplore "backupFederation null, false, null"
     xplore "backupFederation 'c:/xplore/backup', true, null"

Scripted domain backup

     xplore backupDomain <domain_name>, <is_incremental>, [backup_path]

Examples:
     xplore "backupDomain 'myDomain', true, null"
     xplore "backupDomain 'myDomain', false, 'c:/xplore/backup' "

Scripted collection backup

     xplore backupCollection collection(<domain_name>, <collection_name>), <is_incremental>, [backup_path]

Examples:
     "backupCollection collection('myDomain', 'default'), true, null"
     "backupCollection collection('myDomain', 'default'), false, 'c:/xplore/backup' "

Scripted file or volume backup

Suspend disk writes before backup, resume after backup.
     "suspendDiskWrites"
     "resumeDiskWrites"

Scripted federation restore

Back up and restore your jars or DLLs for custom content processing plugins or annotators.

Note: If you are restoring a federation and a collection that was added after the federation backup, do the following:
1. Restore the federation.
2. Start up and shut down all xPlore instances.
3. Restore the collection.

[backup_path]: Path to your backup file. If no path is specified, the default location in indexserverconfig.xml is used: The value of the path attribute on the element admin-config/backup-location.

1. Stop all xPlore instances.
2. Delete existing data files:
   • All files under dsearch_home/data
   • All files under dsearch_home/config
3. Run the restore CLI.
     "restoreFederation '[backup_path]' "
   For example:
     xplore "restoreFederation 'C:/xPlore/dsearch/backup/federation/2011-03-23-16-02-02' "
4. Restart all xPlore instances.

Scripted domain restore

[backup_path]: Path to your backup file. If not specified, the default backup location in indexserverconfig.xml is used: The value of the path attribute on the element admin-config/backup-location. Specify any path as the value of [backup_path].
[bootstrap_path]: Path to the bootstrap file in the WEB-INF classes directory of the xPlore primary instance.

1. Force-detach the domain using the CLI detachDomain. The second argument is whether to force detach. Use only for restore operations. Force detach can corrupt the data.
     "detachDomain '[domain_name]', true"
   For example:
     xplore "detachDomain 'defaultDomain', true"
2. Generate the orphaned segment list using the CLI listOrphanedSegments. If an orphaned segment file is not specified, the IDs of orphaned segments are sent to standard output. See Orphaned segments CLIs, page 158.
3. Stop all xPlore instances.
4. Run the restore CLI. If no bootstrap path is specified, the default location in the WEB-INF classes directory of the xPlore primary instance is used.
     "restoreDomain '[backup_path]', '[bootstrap_path]' "
5. Restart xPlore instances.
6. If orphaned segments are reported before restore, run the CLI purgeOrphanedSegments. If an orphaned segment file is not specified, the segment IDs are read in from stdin. See Orphaned segments CLIs, page 158.
7. Force-attach the domain using the CLI attachDomain:
     xplore "attachDomain '[domain_name]', true"
   For example:
     xplore "attachDomain 'defaultDomain', true"
8. Perform a consistency check and test search.
9. Run the ACL and group replication script aclreplication_for_repository_name in dsearch_home/setup/indexagent/tools to update any security changes since the backup. See Manually updating security, page 50.
10. Run ftintegrity. See Using ftintegrity, page 60.

Scripted collection restore

[backup_path]: Path to your backup file. If not specified, the default backup location in indexserverconfig.xml is used: The value of the path attribute on the element admin-config/backup-location. Specify any path.
[bootstrap_path]: Path to the bootstrap file in the WEB-INF classes directory of the xPlore primary instance.

1. Set the collection state to off-line.
2. Force-detach the collection using the CLI detachCollection. The last argument specifies a forced detachment. Use only for restore, because force-detach can corrupt the collection.
     xplore "detachCollection collection('[domain_name]', '[collection_name]'), true"
3. Generate the orphaned segment list using the CLI listOrphanedSegments. If an orphaned segment file is not specified, the IDs of orphaned segments are sent to standard output. See Orphaned segments CLIs, page 158.
4. Stop all xPlore instances.
5. Run the restore CLI. If no bootstrap path is specified, the default location in the WEB-INF classes directory of the xPlore primary instance is used.
     "restoreCollection '[backup_path]', '[bootstrap_path]' "
6. Restart xPlore instances.
7. If orphaned segments are reported before restore, run the CLI purgeOrphanedSegments. If an orphaned segment file is not specified, the segment IDs are read in from stdin. See Orphaned segments CLIs, page 158.
8. Force-attach the collection using the CLI attachCollection.
     xplore "attachCollection collection('[domain_name]', '[collection_name]'), true"
9. Perform a consistency check and test search.
10. Run the ACL and group replication script aclreplication_for_repository_name in dsearch_home/setup/indexagent/tools to update any security changes since the backup. See Manually updating security, page 50.
11. Run ftintegrity. See Using ftintegrity, page 60.

Force detach and attach CLIs

You detach a domain or collection before you restore it. You attach it after you restore it.

Detach syntax:
     "detachDomain '[domain_name]', true"
   or
     "detachCollection collection('[domain_name]', '[collection_name]'), true"
Examples:
     "detachDomain 'myDomain', true"
   or
     "detachCollection collection('myDomain', 'default'), true"

Attach syntax:
     "attachDomain '[domain_name]', true"
   or
     "attachCollection collection('[domain_name]', '[collection_name]'), true"
Examples:
     "attachDomain 'myDomain', true"
   or
     "attachCollection collection('myDomain', 'default'), true"

The last argument is for forced attachment.
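Because a scripted domain restore mixes CLI calls with manual operator actions, it can help to lay the whole sequence out before starting. The following is a minimal sketch that lists the steps in the order given under Scripted domain restore. The function domain_restore_plan and its tuple layout are hypothetical; the command names and argument order are taken from the syntax lines in this chapter, and nothing is executed.

```python
def quote(value):
    """Single-quote a string parameter for the CLI."""
    return "'" + value + "'"


def domain_restore_plan(domain, backup_path, bootstrap_path, orphan_file,
                        orphans_reported=True):
    """Return (kind, text) pairs; 'cli' entries go inside xplore "..."."""
    steps = [
        ("cli", "detachDomain " + quote(domain) + ", true"),
        ("cli", "listOrphanedSegments 'domain', "
                + ", ".join(quote(p) for p in (backup_path, orphan_file, bootstrap_path))),
        ("manual", "Stop all xPlore instances"),
        ("cli", "restoreDomain " + quote(backup_path) + ", " + quote(bootstrap_path)),
        ("manual", "Restart xPlore instances"),
    ]
    if orphans_reported:
        # Purge only when the earlier listing reported orphaned segments.
        steps.append(("cli", "purgeOrphanedSegments " + quote(orphan_file)))
    steps.append(("cli", "attachDomain " + quote(domain) + ", true"))
    steps.append(("manual", "Check consistency, replicate ACLs and groups, run ftintegrity"))
    return steps


if __name__ == "__main__":
    plan = domain_restore_plan("myDomain", "backup/myDomain/2009-10",
                               "c:/xplore/indexserver-bootstrap.properties",
                               "c:/temp/orphans.lst")
    for number, (kind, text) in enumerate(plan, start=1):
        print(number, kind, text)
```

The bootstrap path in the usage lines is a shortened placeholder, not a real installation path.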
Orphaned segments CLIs

Segments can be orphaned when content is added to or removed from the index after backup. The restore operation does not reflect these changes, and any new segments that were used after backup are orphaned. The xDB database in xPlore does not start up with orphaned segments unless you force a restart. See Troubleshooting data management, page 140. Federation restore does not create orphaned segments.

1. List orphaned segments before restore. If [orphan_file_path] is not specified, the IDs of orphaned segments are sent to standard output. This file is used to purge orphaned segments after restore.
   Syntax:
     "listOrphanedSegments collection|domain [backup_path] [orphan_file_path] [bootstrap_path]"
   For example:
     "listOrphanedSegments 'domain', 'backup/myDomain/2009-10', 'c:/temp/orphans.lst', 'C:/xplore/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes/indexserver-bootstrap.properties' "
   or
     "listOrphanedSegments 'collection', 'backup/myDomain/default/2009-10', null, 'C:/xplore/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes/indexserver-bootstrap.properties' "
   Arguments:
   • [backup_path]: Path to your backup file. If not specified, the default backup location in indexserverconfig.xml is used: The value of the path attribute on the element admin-config/backup-location. Specify any path as the value of [backup_path].
   • [bootstrap_path]: Path to the bootstrap file in the WEB-INF classes directory of the xPlore primary instance. For file path, use forward slashes.
2. If orphaned segments are reported before restore, run the CLI purgeOrphanedSegments. If [orphan_file_path] is not specified, the segment IDs are read in from stdin.
   Syntax:
     xplore purgeOrphanedSegments '[orphan_file_path]'
   For example:
     purgeOrphanedSegments 'c:/temp/orphans.lst'
   or
     purgeOrphanedSegments null

Domain mode CLIs

If a domain index is corrupt, use the CLI setDomainMode to set the mode to maintenance. In maintenance mode, the only allowed operations are repair and consistency check. Queries are allowed only from xPlore administrator. Queries from a Documentum client are tried as NOFTDQL in the Content Server. xPlore does not process them.

The mode does not persist across xPlore sessions; mode reverts to normal on xPlore restart.

Syntax:
     "setDomainMode [domain_name], [maintenance|normal]"
For example:
     "setDomainMode 'myDomain', 'maintenance' "

Collection and domain state CLIs

Syntax:
     "setState domain|collection, [domain_name], [collection_name], [state]"

Domain state: Set collection_name to null.

Valid states: index_and_search (read/write), update_and_search (read and update existing documents only), index_only (write only), off_line, read_only, and reset.

The update_and_search state changes a flag so that new documents cannot be added to the collection. Existing documents can be updated. The state change is not propagated to subcollections.

Example:
     "setState 'domain', 'myDomain', null, 'reset' "
   or
     "setState 'collection', 'myDomain', 'default', 'off_line' "

Activate spare instance CLI

You can install a spare instance that you activate when another instance fails. For instructions on activating the spare using xPlore administrator, see Replacing a failed instance with a spare, page 35.

Note: This CLI can be used only to replace a secondary instance. To replace a primary instance, see Replacing a failed primary instance, page 35.

Syntax:
     "activateSpareNode '[failed_instance]', '[spare_instance]' "
Example:
     "activateSpareNode 'node2', 'spare1' "

Troubleshooting backup and restore

Volume-based backup of domain or collection is not supported

Volume-based backup requires change of a domain or collection state to read-only. As a result, you can back up an xPlore federation with volume-based backup. You cannot use volume-based backups for domains or collections.

Federation and collection restore procedure not followed

Restoring a federation and then a collection that was backed up after the federation can lead to data corruption. Start and then stop xPlore after you restore the federation. Then it is safe to restore the collection. Perform the following steps:
1. Restore the federation.
2. Start all xPlore instances.
3. Stop all xPlore instances.
4. Restore the collection.

Incremental restore must come before xPlore restart

If you do a full backup and later do an incremental backup, first restore the full backup. Then restore the incremental backup before you restart the xPlore instances. If you restart the instances before restoring the incremental backup, the restore procedure fails.

CLI troubleshooting

If a CLI does not execute correctly, check the following:
• The output message may describe the source of the error.
• Check whether the host and port are set correctly in dsearch_home/dsearch/admin/xplore.properties.
• Check the CLI syntax.
  – Linux requires double quotes before the command name and after the entire command and arguments.
  – Separate each parameter by a comma.
  – Do not put Boolean or null arguments in quotation marks. For example:
       xplore "backupFederation null,false,null"
• Check whether the primary instance is running:
     http://instancename:port/dsearch
• Check whether the JMX MBean server is started. Open jconsole in dsearch_home/jdk/bin and specify Remote Process with the value service:jmx:rmi:///jndi/rmi://localhost:9331/dsearch. (9331 is the default JMX port. If you have specified a base port for the primary instance that is not 9300, add 31 to your base port.) If jconsole opens, the JMX layer is OK.
• Check the Admin web service. Open a browser with the following link. If the XML schema is shown, the web service layer is operative.
     http://instancename:port/dsearch/ESSAdminWebService?wsdl
• Check dsearch.log in dsearch_home/jboss5.1.0/server/instance_name/logs for a CLI-related message.
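The CLI troubleshooting checks depend on three addresses derived from the primary instance host and base port: the instance URL, the Admin web service WSDL, and the JMX service URL on base port plus 31. The following is a minimal sketch that computes them. The function name diagnostic_endpoints is hypothetical. It assumes that the port in the HTTP links is the base port, and it writes the JMX address in the standard JMX service URL form (three slashes after rmi:).

```python
DEFAULT_BASE_PORT = 9300  # default primary instance base port
JMX_OFFSET = 31           # the JMX port is the base port plus 31


def diagnostic_endpoints(host, base_port=DEFAULT_BASE_PORT):
    """Return the addresses used by the CLI troubleshooting checks."""
    jmx_port = base_port + JMX_OFFSET
    return {
        "primary": "http://%s:%d/dsearch" % (host, base_port),
        "admin_wsdl": "http://%s:%d/dsearch/ESSAdminWebService?wsdl" % (host, base_port),
        "jmx": "service:jmx:rmi:///jndi/rmi://%s:%d/dsearch" % (host, jmx_port),
    }


if __name__ == "__main__":
    for name, url in diagnostic_endpoints("localhost").items():
        print(name, url)
```

With a non-default base port, for example 9400, the same call yields JMX port 9431, which is the value to enter in jconsole as the remote process.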
Chapter 9 Search

This chapter contains the following topics:
• About searching
• Administering search
• Query summary highlighting
• Configuring summary security
• Configuring full-text wildcard (fragment) support
• Configuring Documentum search
• Supporting subscriptions to queries
• Troubleshooting search
• Search APIs and customization
• Routing a query to a specific collection
• Building a query with the DFC search service
• Building a query with the DFS search service
• Building a DFC XQuery
• Building a query using xPlore APIs
• Adding context to a query
• Using parallel queries
• Custom access to a thesaurus
• DQL Processing

About searching

Content Server client applications issue queries through the DFC search service or through DQL. DFC 6.6 and higher translates queries directly to XQuery for xPlore. DQL queries are handled by the Content Server query plugin, which translates DQL into XQuery unless XQuery generation is turned off.

Not all DQL operators are available through the DFC search service. In some cases, a DQL search of the Server database returns different results than a DFC/xPlore search. For more information on DQL and DFC search, see DQL, DFC, and DFS queries, page 183.

DFC generates XQuery expressions by default. If XQuery is turned off in DFC, FTDQL queries are generated. The FTDQL queries are evaluated in the xPlore server. If all or part of the query does not conform to FTDQL, that portion of the query is converted to DQL and evaluated in the Content Server database. Results from the XQuery are combined with database results. For more information on FTDQL and SDC criteria, see the EMC Documentum Content Server DQL Reference.

xPlore search is case-insensitive and ignores white space or other special characters. Special characters are configurable. For more information, see Handling special characters, page 90.

Query operators

Operators in XQuery expressions, DFC, and DQL are interpreted in the following ways:
• XQuery operators
  – The value operators = != < > specify a value comparison search. Search terms are not tokenized. Can be used for exact match or range searching on dates and IDs.
    Any subpath that can be searched with a value operator must have the value-comparison attribute set to true for the corresponding subpath configuration in indexserverconfig.xml. For example, an improper configuration of the r_modify_date attribute sets full-text-search to true. A date of '2010-04-01T06:55:29' is tokenized into 5 tokens: '2010' '04' '01T06' '55' '29'. A search for '04' returns any document modified in April. The user gets many non-relevant results. Therefore, r_modify_date must have value-comparison set to true. Then the date attribute is indexed as one token. A search for '04' would not hit all documents modified in April.
  – The ftcontains operator (XQFT syntax) specifies that the search term is tokenized before searching against index. If a subpath can be searched by ftcontains, set the full-text-search attribute to true in the corresponding subpath configuration in indexserverconfig.xml.
• DQL operators: All string attributes are searched with the ftcontains operator in XQuery. All other attribute types use value operators (= != < >). In DQL, dates are automatically normalized to UTC representation when translated to XQuery.
• DFC: When you use the DFC interface IDfXQuery, your application must specify dates in UTC to match the format in DFTXML.
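The r_modify_date example under Query operators can be reproduced with a few lines of code. The following is a minimal sketch that contrasts a tokenized (full-text) match with a value comparison. The tokenizer is an assumption: it simply splits on every character that is not a letter or digit, which happens to yield the five tokens quoted in the text. It is not the xPlore analyzer.

```python
import re


def fulltext_tokens(value):
    """Split on anything that is not a letter or digit (rough stand-in for an analyzer)."""
    return [token for token in re.split(r"[^0-9A-Za-z]+", value) if token]


def matches_fulltext(term, value):
    """Tokenized match: the term hits if it equals any token of the value."""
    return term in fulltext_tokens(value)


def matches_value(term, value):
    """Value comparison: the whole value is treated as a single token."""
    return term == value


if __name__ == "__main__":
    date = "2010-04-01T06:55:29"
    print(fulltext_tokens(date))
    print(matches_fulltext("04", date))  # tokenized date field: '04' hits
    print(matches_value("04", date))     # single-token date field: '04' does not hit
```

The same date indexed for value comparison matches only a search for the complete value, which is why a date attribute configured for full-text search produces many non-relevant hits.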
Administering search

Common search service tasks

You can configure all search service parameters by choosing Global Configuration from the System Overview panel in xPlore administrator. You can configure the same search service parameters on a per-instance basis by choosing Search Service on an instance and then choosing Configuration. The default values have been optimized for most environments.

For information on search security, see Changing search results security, page 49. For information on troubleshooting queries, see Troubleshooting slow queries, page 205. For information on search reports, see Search reports, page 242.

Enabling search

Enable or disable search by choosing an instance of the search service in the left pane of the administrator. Click Disable (or Enable).

Canceling running queries

Open an instance and choose Search Service. Click Operations. All running queries are displayed. Click a query and delete it.

Viewing search statistics

Choose Search Service and click an instance:
• Accumulated number of executed queries
• Number of failed queries
• Number of pending queries
• Number of spooled queries
• Number of execution plans
• Number of streamed results
• Maximum query result batch size
• Total result bytes
• Maximum hits returned by a query
• Average number of results per query request
• Maximum query execution time
• Average duration of query execution
• Queries per second

Use reports for additional information about queries. Search auditing must be turned on to accumulate the report data. (It is on by default.) See Search reports, page 242.

Further search configuration topics:
• Configuring query warmup
• Configuring scoring and freshness
• Configuring query summaries

Configuring query warmup

You can warm up the index and queries at xPlore startup. Queries are logged in the audit record for each xPlore instance. The warmup utility on each instance warms up queries that were run on that instance.
• Index warmup of Lucene on-disk files (with suffix .cfs) loads the stored fields and term dictionaries for all domains. To disable index warmup, set load_index_before_query to false in query.properties.
• Queries are warmed up from the audit log or from a file (file first, then user queries). Warmup is typically done through the audit log. The most recent queries from the last N users (last_N_unique_users in query.properties) are played. By default, security evaluation is done on these queries.

File-based queries

xPlore can speed up Documentum folder-specific queries. By default, security is not evaluated for these queries. If you need security evaluation, set the following properties in query.properties: security_eval, user_name, super_user.

Search auditing logs for warmup queries

Warmup activity is logged separately in the audit record and reported in the admin report Audit Records for Warmup Component. Warmup is logged in the file queryWarmer.log, located in dsearch_home/dsearch/xhive/admin/logs.

Enabling or disabling warmup

Edit indexserverconfig.xml. For information on editing this file, see Modifying indexserverconfig.xml, page 42. Locate the performance element. If it does not exist, add it with the following content:

     <performance>
       <warmup status="on">
         <properties>
           <property value="../../dsearch/xhive/admin/QueryRunner.bat" name="script"/>
           <property value="600" name="timeout-in-secs"/>
         </properties>
       </warmup>
     </performance>

Set the child element warmup status to on or off. You can set the warmup timeout in seconds. If the warmup hangs, it is canceled after this timeout period. Restart all xPlore instances to enable your changes.

Configuring warmup

Configure warmup in query.properties. This file is in dsearch_home/dsearch/xhive/admin.

Table 20 Auto-warmup configuration

Key                          Description
xplore_qrserver_host         Primary xPlore instance host. Not needed for file-based warmup.
xplore_qrserver_port         Port for primary xPlore instance. Not needed for file-based warmup.
xplore_domain                Name of domain in xPlore (usually the same as the Content Server name). You must change this to a valid domain.
security_eval                Evaluate security for queries in warmup. Default: false. This is used only for file-based warmup.
user_name                    Name of user or superuser who has permission to execute warmup queries. Required if security_eval is set to true. Used only for file-based warmup.
super_user                   Set to true if the user who executes warmup queries is a Content Server superuser. Required if security_eval is set to true. Used only for file-based warmup.
query_file                   Specify a file name that contains warmup queries. Query can be multi-line, but the file cannot contain empty lines. If no name is specified, queries are read from the audit record. (Query auditing is enabled by default.)
query_plan                   Set to true to include query plans in query warmup.
batch_size                   Set batch size for warmup queries.
timeout                      Set maximum seconds to try a warmup query.
max_retries                  Set maximum number of times to retry a failed query.
print_result                 Set to true to print results to queryWarmer.log in dsearch_home/dsearch/xhive/admin/logs.
fetch_result_byte            Maximum number of bytes to fetch in a warmup query.
load_index_before_query      Set to true to warm up the index before any queries are processed.
read_from_audit              Set to true (default) to read queries from audit record. Requires query auditing enabled (default). (See Auditing queries, page 204.) Warmup is read first from a file if query_file is specified.
number_of_unique_users       Number of users in audit log for whom to replay queries. Default: 10
number_of_queries_per_user   Number of queries for each user to replay. Default: 1
data_path                    Specifies the path to the Lucene index. Default: dsearch_home/data.
index_cache_mb               Set the maximum size of index cache in MB. Default: 100.
cache_index_components       Warm up index components. Valid values:
                             • Security warmup of stored fields: fdt, fdx, fnm
                             • Term dictionary for wildcard and full-text queries: tis, tii
                             • Term frequencies, used with term dictionary: frq
                             • Term position, used with term dictionary: prx
schedule_warmup              Enables the warmup schedule.
schedule_warmup_period       How often the warmup occurs. For example, a value of 1 in DAYS units results in daily warmup.
schedule_warmup_units        Valid values: DAYS (default) | HOURS | MINUTES | SECONDS | MILLISECONDS | MICROSECONDS
initial_delay                Start warmup after the specified initial delay, and subsequently after schedule_warmup_period. Default: 0. If any execution of the task encounters an exception, subsequent executions are suppressed.
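Several keys in the auto-warmup table depend on one another: user_name and super_user are required when security_eval is true, and schedule_warmup_units accepts only a fixed set of values. The following is a minimal sketch of a pre-flight check for those two rules. It assumes that query.properties uses the usual key=value properties layout; the function names parse_properties and warmup_problems are hypothetical and not part of xPlore.

```python
VALID_UNITS = {"DAYS", "HOURS", "MINUTES", "SECONDS", "MILLISECONDS", "MICROSECONDS"}


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def warmup_problems(props):
    """Return a list of human-readable problems; empty means no rule was violated."""
    problems = []
    if props.get("security_eval", "false").lower() == "true":
        for key in ("user_name", "super_user"):
            if not props.get(key):
                problems.append(key + " is required when security_eval is true")
    units = props.get("schedule_warmup_units", "DAYS")
    if units not in VALID_UNITS:
        problems.append("schedule_warmup_units has unsupported value " + units)
    return problems


if __name__ == "__main__":
    sample = "security_eval=true\nuser_name=\nschedule_warmup_units=WEEKS\n"
    for problem in warmup_problems(parse_properties(sample)):
        print(problem)
```

A check like this is only a convenience before restarting instances; the authoritative behavior is whatever the warmup utility does with the file.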
Sample query file (queries.xq)

Queries in the query file can be multi-line, but the file cannot contain empty lines. The empty sample queries.xq is in dsearch_home/dsearch/xhive/admin.

     declare option xhive:fts-analyzer-class 'com.emc.documentum.core.fulltext.indexserver.core.index.xhive.IndexServerAnalyzer';
     declare option xhive:ignore-empty-fulltext-clauses 'true';
     declare option xhive:index-paths-values 'dmftmetadata//owner_name,dmftsecurity/acl_name,dmftsecurity/acl_domain,/dmftinternal/r_object_id';
     for $i score $s in collection('/dm_notes/dsearch/Data')
     /dmftdoc[( ( ( dmftinternal/i_all_types = '030000018000010d') ) and ( ( dmftversions/iscurrent = 'true') ) ) and ( (. ftcontains ( ((( 'augmenting') with stemming)) using stop words ("") ) )) ]
     order by $s descending
     return <dmrow>{if ($i/dmftinternal/r_object_id) then $i/dmftinternal/r_object_id else <r_object_id/>}{if ($i/dmftsecurity/ispublic) then $i/dmftsecurity/ispublic else <ispublic/>}{if ($i/dmftinternal/r_object_type) then $i/dmftinternal/r_object_type else <r_object_type/>}{if ($i/dmftmetadata/*/owner_name) then $i/dmftmetadata/*/owner_name else <owner_name/>}{if ($i/dmftvstamp/i_vstamp) then $i/dmftvstamp/i_vstamp else <i_vstamp/>}{xhive:highlight($i/dmftcontents/dmftcontent/dmftcontentref)}</dmrow>

Warmup logging

All the queries that are replayed for warmup from a file or the audit record are tagged as a QUERY_WARMUP event in the audit records. You can see this type in the admin report Top N Slowest Queries. To view all warmup queries in the audit record, run the report Audit records for warmup component in xPlore administrator.

Configuring scoring and freshness

xPlore uses the following ranking principles to score query results:
• If a search term appears more than once in a document, the hit is ranked higher than a document in which the term occurs only once. This comparison is valid for documents of the same content length.
• If a search term appears in the metadata, the hit is ranked higher than when the term occurs in the content.
• When the search criteria are linked with OR, a document with both terms is ranked higher than a document with one term.
• If the search spans multiple instances, the results are merged based on score.
• Query term source (original term, alternative lemma, or thesaurus) can be weighted in indexserverconfig.xml.

Note: If Documentum DFC clients have a ranking configuration file, the ranking file defines how the source score (xPlore) and the DFC score are merged. The xPlore score could be totally ignored, partially used, or fully used without any other influence.

These ranking principles are applied in a complicated Lucene algorithm. The Lucene scoring details are logged when xDB logging is set to DEBUG. See Configuring logging, page 249.

Change the following settings before you index documents. To apply changes to an existing index, reindex your documents.

1. Edit indexserverconfig.xml. For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 42.
2. Add a boost-value attribute to a subpath element. The default boost-value is 1.0. In the following example, a hit in the keywords metadata increases the score for a result:
     <sub-path returnable="true" boost-value="2.0" path="dmftmetadata/keywords"/>
3. By default the Documentum attribute r_modify_date is used to boost scores in results. To remove this boost, edit indexserverconfig.xml and set the property enable-freshness-score to false on the parent category element. For example:
     <category name='dftxml'><properties>
     ...
     <property name="enable-freshness-score" value="false" />
     </properties></category>
4. Change the freshness boost factor. Only documents that are six years old or less have a freshness factor. By default, the weight for freshness is equal to the weight for the Lucene relevancy score. Set the value of the property freshness-weight in index-config/properties to a decimal between 0 (no boost) and 1.0 (override the Lucene relevancy score). For example:
     <index-config><properties>
     ...
     <property name="enable-subcollection-ftindex" value="false"/>
     <property name="freshness-weight" value="0.75" />
     </properties></index-config>
5. Configure weighting for query term source: original term, alternative lemma, or thesaurus. By default, they are equally weighted. Edit the following properties in search-config/properties. The value can range from 0 to 1000:
     <property name="query-original-term-weight" value="1.0"/>
     <property name="query-alternative-term-weight" value="1.0"/>
     <property name="query-thesaurus-term-weight" value="1.0"/>
6. Restart the xPlore instances.
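The freshness-weight property under index-config/properties accepts a decimal between 0 and 1.0, so a scripted change should reject anything outside that range before touching the file. The following is a minimal sketch using only the Python standard library. It operates on an index-config fragment passed in as a string; the function name set_freshness_weight is hypothetical, and locating and saving the element inside the real indexserverconfig.xml is left out.

```python
import xml.etree.ElementTree as ET


def set_freshness_weight(index_config_xml, weight):
    """Return the index-config fragment with freshness-weight set to weight."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("freshness-weight must be between 0 and 1.0")
    root = ET.fromstring(index_config_xml)
    props = root.find("properties")
    if props is None:
        raise ValueError("index-config has no properties element")
    for prop in props.findall("property"):
        if prop.get("name") == "freshness-weight":
            prop.set("value", str(weight))  # update the existing property
            break
    else:
        # No such property yet: add it.
        ET.SubElement(props, "property", {"name": "freshness-weight", "value": str(weight)})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    fragment = ('<index-config><properties>'
                '<property name="enable-subcollection-ftindex" value="false"/>'
                '<property name="freshness-weight" value="0.75"/>'
                '</properties></index-config>')
    print(set_freshness_weight(fragment, 0.5))
```

As the procedure notes, a changed weight takes effect only after the xPlore instances are restarted, and existing indexes must be rebuilt to reflect it.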
A thesaurus can have terms in multiple languages.0"/> <property name="query-alternative-term-weight" value="1. <property name="enable-subcollection-ftindex" value="false"/> <property name="freshness-weight" value="0. a thesaurus expands the search to documents containing auto or vehicle. Add a boost-value attribute to a subpath element.2 Administration and Development Guide 169 .xml. The thesaurus is not used for DFC metadata searches unless you use a DFC API for an EMC Documentum xPlore Version 1.0 (override the Lucene relevancy score).. 1. Edit the following properties in search-config/properties. To remove this boost.2 is released. 2.. To apply changes to an existing index. Thesaurus support is available for DFC 6. <property name="enable-freshness-score" value="false" /> </properties></category> 4. By default. Restart the xPlore instances. alternative lemma.xml and set the property enable-freshness-score to false on the parent category element. <category name=’dftxml’><properties> .0"/> <property name="query-thesaurus-term-weight" value="1. reindex your documents . see Modifying indexserverconfig...0. they are equally weighted.xml. Change the following settings before you index documents. The weight for freshness is equal to the weight for the Lucene relevancy score. Search These ranking principles are applied in a complicated Lucene algorithm. The default boost-value is 1. Adding a thesaurus A thesaurus provides results with terms that are related to the search terms. documents are returned that contain words like canals. the thesaurus is used for both search document contains (SDC) and metadata searches. When a user searches for canals.2 Administration and Development Guide .org/aos/concept#25695"> <skos:prefLabel xml:lang="fr">Charrue à socs</skos:prefLabel> <skos:prefLabel xml:lang="es">Arados de vertedera</skos:prefLabel> </skos:Concept> Importing and viewing a thesaurus You can view a list of thesauruses for each domain using xPlore administrator. 
170 EMC Documentum xPlore Version 1. See Custom access to a thesaurus. For DQL queries.com/#canals"> <skos:prefLabel>canals</skos:prefLabel> <skos:altLabel>canal bends</skos:altLabel> <skos:altLabel>canalized streams</skos:altLabel> <skos:altLabel>ditch mouths</skos:altLabel> <skos:altLabel>ditches</skos:altLabel> <skos:altLabel>drainage canals</skos:altLabel> <skos:altLabel>drainage ditches</skos:altLabel> </skos:Concept> In this example. page 219. The SKOS format supports two-way expansion.fao. </skos:Concept> </rdf:RDF> Terms from multiple languages can be added like the following example: <skos:Concept rdf:about="http://www. FAST-based thesaurus dictionaries must be converted to the SKOS format. The list shows a thesaurus URI. but it is not implemented by xPlore. The alternative labels expand the term (the related terms or synonyms).org/1999/02/22-rdf-syntax-ns#" xmlns:skos="http://www. Import your thesaurus to the file system on the primary instance host using xPlore administrator.. The thesaurus must be in SKOS format.my. Open Diagnostic and Utilities > Thesaurus. You can also provide a non-SKOS thesaurus by implementing a custom class that defines thesaurus expansion behavior. a search on ditch does not return documents with canals.w3. Choose the domain (Documentum Content Server repository) for the thesaurus. and the date it was imported..w3. and canalized streams. SKOS format The format starts with a concept (term) that includes a preferred label and a set of alternative labels.Search individual query. Here is an example of such an entry in SKOS: <skos:Concept rdf:about="http://www. indicates a default thesaurus. canal bends. a W3C specification.org/2004/02/skos/core#"> <skos:Concept . the main term is canals. An SKOS thesaurus must use the following RDF namespace declarations: <rdf:RDF xmlns:rdf="http://www. then browse to a local thesaurus or enter a URI to the thesaurus. Search To install a thesaurus. and the thesaurus is bypassed. 
When you import, choose the domain (Documentum Content Server) for the thesaurus. The thesaurus is stored in an xDB library under the domain, for example, root-library/my_domain/dsearch/SystemInfo/ThesaurusDB. xPlore registers this library in indexserverconfig.xml as the category thesaurusdb. A special multi-path index is defined in indexserverconfig.xml for the SKOS format to speed up thesaurus probes. The multi-path index allows case-insensitive and space-insensitive lookups in an SKOS dictionary.

You can import additional thesauruses for the domain and specify them in DFC or DQL queries. To modify your thesaurus, delete it in xPlore administrator and reimport it.

Use the following APIs to enable thesaurus use and to use a custom thesaurus in the DFC search service. These APIs are in IDfSimpleAttrExpression and IDfFulltextExpression. They override the default settings in dm_ftengine_config.
void setThesaurusEnabled(Boolean thesaurusSearchEnabled);
void setThesaurusLibrary(String thesaurusLibrary)
The String argument for setThesaurusLibrary is a URI to the thesaurus that you have imported. The global thesaurus setting does not enable metadata search in the thesaurus, so use these APIs to enable thesaurus expansion for metadata such as object_name.

To enable thesaurus support, set the value of the dm_ftengine_config property thesaurus_search_enable. The Content Server query plugin and the DFC search service read these values. The following iAPI commands enable thesaurus support:
retrieve,c,dm_ftengine_config
append,c,l,param_name
thesaurus_search_enable
append,c,l,param_value
true
save,c,l

Enabling phrase search in a thesaurus
You can configure the fulltext engine to match entire phrases against the thesaurus. By default, phrase queries must be an exact match, and the thesaurus is bypassed. The following iAPI commands set use_thesaurus_on_phrase to true:
retrieve,c,dm_ftengine_config
append,c,l,param_name
use_thesaurus_on_phrase
append,c,l,param_value
true
save,c,l
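Because each dm_ftengine_config parameter is added with the same retrieve, append, append, save sequence, the command text can be generated rather than typed. The following Python sketch only builds the text of such a sequence; the helper name iapi_append_param is hypothetical, and the layout (each value on the line after its append command) is an assumption based on the command sequences in this guide. It does not connect to a repository.

```python
def iapi_append_param(name, value, object_type="dm_ftengine_config"):
    """Hypothetical helper: build the iAPI lines that append one param_name/param_value pair.

    Booleans are written as lower-case true/false, as in the guide's examples.
    """
    if isinstance(value, bool):
        value_text = "true" if value else "false"
    else:
        value_text = str(value)
    lines = [
        f"retrieve,c,{object_type}",   # fetch the configuration object
        "append,c,l,param_name",       # add a new entry to the repeating attribute
        name,
        "append,c,l,param_value",      # add the matching value at the same position
        value_text,
        "save,c,l",                    # persist the change
    ]
    return "\n".join(lines)


print(iapi_append_param("thesaurus_search_enable", True))
```

The same helper produces the sequence for use_thesaurus_on_phrase or any other parameter that is not yet present on the object. Changing a parameter that already exists needs set rather than append.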
When you import a thesaurus, choose true for Default if this is the default thesaurus for the domain.

Enabling thesaurus support in a Content Server
Enable thesaurus support for Documentum repositories by editing the fulltext engine object configuration for each Content Server. An iAPI command can also instruct xPlore to match phrases in the thesaurus by appending the parameter use_thesaurus_on_phrase to dm_ftengine_config.

DFC thesaurus APIs
Update DFC in the client application to the latest patch. For example, the following code enables thesaurus expansion for a search on object_name:
IDfSimpleAttrExpression aSimpleAttrExpr = rootExpressionSet.addSimpleAttrExpression(
"object_name", IDfValue.DF_STRING, IDfSearchOperation.SEARCH_OP_CONTAINS, false, false, "IIG");
aSimpleAttrExpr.setThesaurusEnabled(true);

Setting thesaurus support in DQL queries
For all DQL queries, if thesaurus search is enabled, the thesaurus is used only for full-text (SDC) queries and string metadata queries using the like or equals operator. To override use_thesaurus_enable when it is set to false, use the DQL hint ft_thesaurus_search. For example:
select object_name from dm_document search document contains 'test' enable(ft_thesaurus_search)
To specify a specific thesaurus in a DQL query, use the hint ft_use_thesaurus_library, which takes a string URI for the thesaurus:
select object_name from dm_document search document contains 'test' enable(ft_thesaurus_search, ft_use_thesaurus_library('http://search.emc.com/testenv/skos.rdf'))

Setting thesaurus properties in xQuery expressions
xPlore supports the W3C XQuery thesaurus option in the specification. To enable a thesaurus, add using thesaurus default to the xQuery expression, for example:
object_name ftcontains "IIG" with stemming using stop words default using thesaurus default entire content
The following query enables thesaurus search with a phrase:
. ftcontains "programming language" with stemming using stop words default using thesaurus default

Logging thesaurus actions
Set the log level to DEBUG in xPlore administrator: Services > Logging > dsearchsearch. When the thesaurus is found, a message like the following is logged:
<message><![CDATA[successfully verified existence of thesaurus URI: http://search.emc.com/testenv/skos.rdf]]></message>
If use_thesaurus_enable is set to true in dm_ftengine_config, then the thesaurus is used for DQL queries. In that case, use only the ft_use_thesaurus_library hint. A query that adds the ft_thesaurus_search hint overrides the thesaurus setting in dm_ftengine_config.

The following information is logged:
• Thesaurus is found from the specified URI. For example:
<message><![CDATA[attempt to verify existence of thesaurus URI: http://search.emc.com/myDomain/myThesaurus.rdf]]></message>
• Argument values that are passed into getTermsFromThesaurus: input terms, relationship, minValueLevel, maxValueLevel.
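For example:
<message><![CDATA[calling getTermsFromThesaurus with terms [leaVe], relationship null, minLevelValue -2147483648, maxLevelValue 2147483647]]></message>
• Thesaurus XQuery statement. For example:
<![CDATA[thesaurus lookup xquery: declare option xhive:fts-analyzer-class 'com.emc.documentum.core.fulltext.indexserver.core.index.xhive.IndexServerAnalyzer';
declare namespace rdf = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
declare namespace skos = 'http://www.w3.org/2004/02/skos/core#';
declare variable $terms external;
declare variable $relation external;
doc('/testenv/dsearch/SystemInfo/ThesaurusDB')[xhive:metadata(., 'uri')='http://search.emc.com/testenv/skos.rdf']/rdf:RDF/skos:Concept[skos:prefLabel contains text {$terms} entire content]/skos:altLabel/text()]]>
• Tokens that are looked up in the thesaurus. The query term leaVe is rendered case-insensitive:
<message><![CDATA[executing the thesaurus lookup query to get related terms for [leaVe]]]></message>
... <![CDATA[Returned token: leave]]> ... <![CDATA[Total tokens count for reader: 1]]>
• Query plan for thesaurus XQuery execution. Provide the query plan to technical support if you are not able to resolve an issue. For example:
<![CDATA[thesaurus lookup execution plan:
query:6:1:Creating query plan on node /testenv/dsearch/SystemInfo/ThesaurusDB
query:6:1:for expression ..[xhive:metadata(., "uri") = "http://search.emc.com/testenv/skos.rdf"]/child::{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF/child::{http://www.w3.org/2004/02/skos/core#}Concept[child::{http://www.w3.org/2004/02/skos/core#}prefLabel[. contains text terms@0]]/child::{http://www.w3.org/2004/02/skos/core#}altLabel/child::text()
query:6:1:Using query plan:
query:6:1:index(Concept) [parent::{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF[parent::document()[xhive:metadata(., "uri") = "http://search.emc.com/testenv/skos.rdf"]]]/child::{http://www.w3.org/2004/02/skos/core#}altLabel/child::text() ]]>
• Related terms that are returned from the thesaurus. For example:
<![CDATA[related terms from thesaurus lookup query [Absence from work, Absenteeism, Annual leave, Employee vacations, Holidays from work, Leave from work, Leave of absence, Maternity leave, Sick leave]]]>

Make sure that your thesaurus is used. When a specific thesaurus is named, the query in dsearch.log identifies it by URI. For example:
for $i score $s in collection('/testenv/dsearch/Data')/dmftdoc[. ftcontains 'food products' using thesaurus at 'http://www.emc.com/skos'] order by $s descending return $i/dmftinternal/r_object_id
To see the terms that the thesaurus added, set log4j.logger.com.xhive.index.multipath.query = DEBUG in log4j.properties. This file is located in dsearch_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes. Restart xPlore and run the query again.

Unselective queries can require massive processing to produce summaries. After the summary is computed, the summary is reprocessed for highlighting, causing a second performance impact. Static summaries are computed when the summary conditions do not match the conditions configured for dynamic summaries.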
Troubleshooting a thesaurus
Make sure your thesaurus is in xPlore. You can view thesauri and their properties in the xDB admin tool. Navigate to the /xhivedb/root-library/<domain>/dsearch/SystemInfo/ThesaurusDB library. To view the default and URI settings, click the Metadata tab. View the URI in the xDB admin tool or the thesaurus list in xPlore administrator. Compare this URI to the thesaurus URI used by the XQuery, in dsearch.log.

If the default thesaurus on the file system is used, the log records a query like the following:
for $i score $s in collection('/testenv/dsearch/Data')/dmftdoc[. ftcontains 'food products' using thesaurus default] order by $s descending return $i/dmftinternal/r_object_id

You can view thesaurus terms that were added to a query by inspecting the final query. Change log4j.logger.com.xhive.index.multipath.query to DEBUG in log4j.properties. This file is located at dsearch_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes. The query is in the xDB log as "generated Lucene query clauses". xdb.log is in dsearch_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/logs. Search for generated Lucene query clauses. This query is different from the original query because it contains the expanded terms (alternate labels) from the thesaurus.

Configuring query summaries
The indexing service stores the content of each object as an XML node called dmftcontentref. For all documents in which an indexed term has been found, xPlore retrieves the content node and computes a summary. The summary is a phrase of text from the original indexed document that contains the searched word. Search terms are highlighted in the summary.

Dynamic summaries have a performance impact. Static summaries are much faster to compute but less specific than dynamic summaries. You can disable dynamic summaries for better performance. Some summary configuration requires editing of indexserverconfig.xml as noted. See Modifying indexserverconfig.xml.
When you troubleshoot a thesaurus, compare the specified thesaurus URI in the XQuery to the URI associated with the dictionary. You can also inspect the final Lucene query.

To configure query summaries:
1. Configure general summary characteristics.
a. To configure number of characters displayed in the summary, choose Services > Search Service in xPlore administrator. Set query-summary-display-length (default: 256 characters around the search terms). If no search term is found, a static summary of the specified length from the beginning of the text is displayed.
b. To configure the size of a summary fragment, edit indexserverconfig.xml. Search terms can be found in many places in a document. Add a property for fragment size to search-config/properties (default 64). The following example changes the default to 32, allowing up to 8 fragments for a display length of 256:
<property value="32" name="query-summary-fragment-size"/>
2. Configure dynamic summaries.
a. If you want to turn off dynamic summaries in xPlore administrator, choose Services > Search Service and click Configuration. Set query-enable-dynamic-summary to false. The first n characters of the document are displayed, where n is the value of the parameter query-summary-display-length.
b. Configure the maximum size of content that is evaluated for a dynamic summary. Larger documents return a static summary. Set the maximum as the value of the extract-text-size-less-than attribute on the category-definitions/category/do-text-extraction/save-tokens-for-summary-processing element. The default is -1 (all documents). For faster summary calculation, set this value to a positive value.
c. Configure the number of characters at the beginning of the document in which the query term must appear. If the query term is not found in this snippet, a static summary is returned and term hits are not highlighted. Set the value of the token-size attribute on the category-definitions/category/do-text-extraction/save-tokens-for-summary-processing element. The default value is 65536 (64K). A value of -1 indicates no maximum content size, but this value negatively impacts performance. For faster summary calculation, set this value lower.
d. Configure the maximum number of results that have a dynamic summary. Dynamic summaries require much more computation time than static summaries. Set the value of max-dynamic-summary-threshold (default: 50). Additional results have a static summary. If most users do not go beyond the first page of results, set this value to the page size.
3. Configure static summaries in indexserverconfig.xml.
a. Specify the elements (Documentum attributes) that are used for the static summary. Set category-definitions/category/elements-for-static-summary.
b. The max-size attribute sets the maximum size of the static summary. Default: 65536 (bytes).

Query summary highlighting
The search terms, including lemmatized terms, are highlighted within the summary that is returned to the client search application. Each search term in the original query is highlighted separately. Wildcard search terms are also highlighted. For example, if the search term is ran*, then the word rant in metadata is highlighted. Highlighting does not preserve query context such as phrase search, AND search, NOT search, fuzzy search, or range search.

Requires xPlore native security. With native xPlore security and the security_mode property of the dm_ftengine_config object set to BROWSE, the user must have at least READ permission.
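The arithmetic behind the fragment example (a fragment size of 32 allows up to 8 fragments in a 256-character summary) and the effect of max-dynamic-summary-threshold can be sketched as follows. This is a simplified model under stated assumptions: the function names are hypothetical, results are assumed to be ranked from 1, and the other conditions that force a static summary (content size, token-size window) are left out.

```python
def max_fragments(display_length=256, fragment_size=64):
    """Upper bound on summary fragments that fit in the displayed summary.

    Defaults mirror query-summary-display-length (256) and the default fragment size (64).
    """
    return display_length // fragment_size


def summary_kind(result_rank, dynamic_enabled=True, max_dynamic_results=50):
    """Hypothetical model: which kind of summary a result at 1-based `result_rank` gets.

    max_dynamic_results stands in for max-dynamic-summary-threshold (default 50).
    """
    if not dynamic_enabled:
        return "static"
    if result_rank > max_dynamic_results:
        return "static"   # additional results have a static summary
    return "dynamic"


print(max_fragments(256, 32))                     # fragment size 32
print(max_fragments())                            # default fragment size 64
print(summary_kind(10), summary_kind(51))
print(summary_kind(1, max_dynamic_results=10))    # threshold set to a page size of 10
```

Lowering the threshold to the page size means that only the results a typical user actually views pay the cost of a dynamic summary.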
Dynamic summaries have a performance impact. Unselective queries can require massive processing to produce summaries. After the summary is computed, the summary is reprocessed for highlighting, causing a second performance impact.

Configuring summary security
By default, users see search results for which they have BROWSE permission if SUMMARY is not selected. If SUMMARY is selected, they see only results for which they have READ permission. To modify the permissions applied to FTDQL and non-FTDQL search summaries, change the security_mode property of the dm_ftengine_config object. Use one of the following values:
• BROWSE: Displays all results for which the user has at least BROWSE permission. If the user has BROWSE permission, the summary is blank.
• READ: Displays all results for which the user has at least READ permission.
• SUMMARY_BASED (default): If SUMMARY is not in the select list, displays all results for which the user has at least BROWSE permission. If SUMMARY is in the select list, displays results for which the user has at least READ permission.
The following iAPI example sets the summary mode to READ:
retrieve,c,dm_ftengine_config
append,c,l,param_name
security_mode
append,c,l,param_value
READ
save,c,l

Configuring query lemmatization
xPlore supports search for similar or like terms, also known as lemmatization, by default. To speed indexing and search performance, you can turn off lemmatization for indexing and queries. See Configuring lemmatization.

Lemmatization in XQueries (including DFC 6.7)
The XQFT default is without stemming. DFC 6.7 produces XQuery syntax. The DFC default is with stemming except for a phrase search. You can turn on or off lemmatization of individual queries by using the XQFT modifiers with stemming or without stemming.

Lemmatization in DQL and DFC queries
The default is with stemming. To turn off stemming in DQL or DFC queries, use a phrase search.

Limiting search results
You can configure a cutoff that silently stops searching after a configurable limit is reached. By default, there is no limit. You can add a property that limits the number of results. You can also configure a frequency limit.
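The three security_mode values can be read as a small decision table. The Python sketch below restates that table under stated assumptions: the function names are hypothetical, only the BROWSE (2) and READ (3) levels of the Documentum permission scale are modeled, and the code describes the documented behavior rather than reproducing how xPlore evaluates security.

```python
BROWSE, READ = 2, 3  # Documentum basic permission levels used in this sketch


def required_permission(security_mode, summary_selected):
    """Minimum permission a user needs for a result to be displayed."""
    if security_mode == "BROWSE":
        return BROWSE
    if security_mode == "READ":
        return READ
    if security_mode == "SUMMARY_BASED":  # default mode
        return READ if summary_selected else BROWSE
    raise ValueError(f"unknown security_mode: {security_mode}")


def is_displayed(user_permission, security_mode="SUMMARY_BASED", summary_selected=False):
    """True if a result is shown to a user holding `user_permission` on the object."""
    return user_permission >= required_permission(security_mode, summary_selected)


def summary_is_blank(user_permission, security_mode):
    """In BROWSE mode a user with only BROWSE permission sees the result without a summary."""
    return security_mode == "BROWSE" and user_permission < READ


print(is_displayed(BROWSE, "SUMMARY_BASED", summary_selected=False))
print(is_displayed(BROWSE, "SUMMARY_BASED", summary_selected=True))
print(summary_is_blank(BROWSE, "BROWSE"))
```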
A frequency limit drops common terms that appear often (like stop words) from the search. Open xdb.properties in the directory WEB-INF/classes of the primary instance.
• xdb.lucene.termsExpandedNumber: Sets the cutoff. Default: 1000.
• xdb.lucene.ratioOfMaxDoc: Sets the maximum term frequency in the index. (Term frequency is recorded during indexing.) For example, if the ratio is 0.5, the query does not search for terms that occur in more than half of the documents in the index. A search for *nd would reject hits for and because it occurs in more than half of the index.

Wildcards and fragment search
The default behavior in xPlore matches that of commonly used search engines. This provides faster, more precise search results than the fragment search of the FAST index server. Fragment search support can be turned on in xPlore, but it can cause slower performance in an unselective query or advanced search for ends with. See Configuring full-text wildcard (fragment) support.

Full-text search
xPlore does not search for word fragments in the text of documents unless you configure FAST wildcard compatibility. Without FAST compatibility, a search for foot* in Webtop simple search does not return a document containing football. Only a document containing foot is returned. When you turn on FAST compatibility, a search for foot* in Webtop simple search returns a document containing football.
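The effect of the two xdb.properties limits on a wildcard expansion can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not xPlore or Lucene code: the function name, the dictionary of document frequencies, and the reading of termsExpandedNumber as a cap on the number of expanded terms are all assumptions, and 0.5 is only the example ratio used in the text.

```python
def select_expansions(doc_freq, total_docs, ratio_of_max_doc=0.5, terms_expanded_number=1000):
    """Toy model of the cutoffs applied to the terms a wildcard expands to.

    doc_freq maps each candidate term to the number of documents that contain it.
    Terms found in more than ratio_of_max_doc of all documents are dropped, and the
    remaining list is truncated at terms_expanded_number entries.
    """
    limit = ratio_of_max_doc * total_docs
    kept = [term for term, freq in doc_freq.items() if freq <= limit]
    return kept[:terms_expanded_number]


# Candidate expansions of *nd in an index of 100 documents (made-up counts).
candidates = {"and": 80, "band": 5, "hand": 10}
print(select_expansions(candidates, total_docs=100))
print(select_expansions(candidates, total_docs=100, terms_expanded_number=1))
```

With the example ratio, the very common term and is rejected while the rarer terms are kept, which is the behavior described for a search on *nd.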
Metadata (properties) search
In DFC clients like Webtop and CenterStage, a simple search searches content as well as metadata that is full-text indexed. In simple search, word fragments in metadata are not matched by the wildcard (*). In advanced search, implicit wildcards are added to constraints with the operator begins with or ends with. For example, a search for begins with foot returns a document with the name football. Explicit wildcards are supported in metadata queries when the user selects equals, begins with, ends with, or contains. For example, a search for begins with foot* returns a document with the name football. Wildcards are not added to queries for the operators does not contain, does not equal (<>), in, and not in. For those operators, a wildcard is treated as a literal (*) and not as a wildcard.

Phrase searches and diacritics
Wildcards in a phrase search are removed. For example, a search for "dogs*cats" becomes dogs cats. Diacritics are removed from the original indexed term. If the user performs a wildcard search for a term containing a diacritic, for example, olé*, the term is not found because the normalized form is ole. In this case, you may wish to turn off diacritics removal.

Wildcards in DQL
By default, DQL is transformed into XQuery by DFC. Wildcard or fragment search is not performed in a full-text search. To bypass XQuery, you must turn off XQuery generation in DFC to use DQL. Add the following setting to dfc.properties on the DFC client application:
dfc.search.xquery.generation.enable=false
The following DQL wildcards are supported. All other characters are treated as literals.
• search document contains (SDC) wildcards: * and ?.
• Where clause wildcards: %, ?, and *.
Configuring full-text wildcard (fragment) support
By default, xPlore searches for fragments and matches wildcards (*) only in metadata search and not in the text of a document. The FAST indexing server supports word fragment searches (leading and trailing wildcards) in metadata and SEARCH DOCUMENT CONTAINS (SDC) full-text queries. If your users are accustomed to wildcard searches to find a fragment of a word, you can set fast_wildcard_compatible to true in the Server object dm_ftengine_config. (This parameter replaces the parameter fds_contain_fragment, which is not supported in xPlore.) For example, a search for com* finds documents containing committee or coming. Applies to FTDQL SDC queries and DQL where clauses. In a DQL phrase search, fragments are matched. For example, "dogs*cats" in a phrase matches dogslovecats.

Note: In addition, DQL queries that contain the DQL hint FT_CONTAIN_FRAGMENT in the where clause match fragments instead of whole words.

Note: Query performance can be negatively affected when fast_wildcard_compatible is set to true. You can configure a cutoff to limit runaway queries in fragment search.

1. Turn on fragment support globally in the Content Server. Edit the dm_ftengine_config object in the Content Server.
a. Using iAPI in Documentum Administrator or Server Manager, first get the object ID:
retrieve,c,dm_ftengine_config
b. Use the object ID to get the parameters and values. In the following example, the value of r_object_id that was returned in step a is used to get the parameter:
?,c,select param_name, param_value from dm_ftengine_config where r_object_id='080a0d6880000d0d'
c. If you see the parameter fast_wildcard_compatible with a value of true, fragment support is turned on. If there is no fast_wildcard_compatible, append a param_name parameter with the name fast_wildcard_compatible. Add the param_value element and set it to true:
retrieve,c,dm_ftengine_config
append,c,l,param_name
fast_wildcard_compatible
append,c,l,param_value
true
save,c,l
d. To change an existing parameter, locate the position of the param_name attribute value of fast_wildcard_compatible, then use set as follows:
retrieve,c,dm_ftengine_config
dump,c,l //locates the position
set,c,l,param_value[i] //position of fast_wildcard_compatible
false
save,c,l

See also Wildcard search in metadata.
e. To remove an existing parameter, use remove instead of set. For example:
remove,c,l,param_name[i] //position of fast_wildcard_compatible
remove,c,l,param_value[i]
save,c,l
2. Turn on fragment support in individual queries: In a DFC application, turn off XQuery generation and use the DQL hint ft_contain_fragment. Add the following setting to dfc.properties on the DFC client application:
dfc.search.xquery.generation.enable=false

Note: Query performance can be negatively affected when fast_wildcard_compatible is set to true.

By default, * is a wildcard in metadata search but not in full-text search. In a full-text DFC or DQL search, the character * is a wildcard in fast_wildcard_compatible mode. When fast_wildcard_compatible is true, DFC full-text queries match foot* with footstep, football, and more. When it is set to false, only foot is matched.

Wildcard search in metadata
By default, a metadata search in DFC clients like Webtop and CenterStage does not search for word fragments unless you explicitly use a wildcard (*). However, you can configure wildcard support in metadata without turning on full-text fragment support. By default, the Documentum object name supports leading wildcards.

For example, your object model has a custom attribute customer that has some variable letters before the customer name, like SrcAtlanticSuppliers or ResAtlanticDredgers. You are not sure which division the customer is assigned to, so you search for *Atlantic. To support this custom metadata leading wildcard, you add a sub-path entry to indexserverconfig.xml like the following:
<sub-path leading-wildcard="true" compress="false" boost-value="1.0" include-descendants="false" returning-contents="true" value-comparison="true" full-text-search="true" enumerate-repeating-elements="false" type="string" path="dmftmetadata//customer"/>

Note: Leading wildcards perform the worst of all wildcard searches, because the entire index must be probed. Use this configuration only when performance testing has been performed.
To support a leading wildcard on a metadata attribute, configure the metadata's sub-path element to have a leading-wildcard attribute value of true. For better performance, configure a cutoff for results. The cutoff applies to search results for all wildcard queries except queries with a trailing wildcard. For cutoff configuration, see Limiting search results.

Wildcard behavior in xPlore and FAST
Wildcards like * behave differently depending on whether they are used in DFC or DQL queries. They also behave differently from wildcards in FAST queries. The fast_wildcard_compatible property was introduced in dm_ftengine_config to support search for word fragments (FAST wildcard behavior). For example, with the property set to true, DFC full-text queries match foot* with footstep and more.
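The difference between whole-word matching and fragment matching in full-text search can be shown with a toy matcher. This Python sketch is not xPlore logic: it models the documented outcomes only (foot* matches just foot by default, and matches footstep and football in fast_wildcard_compatible mode), and the function name and the simple tokenizer are assumptions made for the example.

```python
import fnmatch
import re


def fulltext_matches(pattern, text, fast_wildcard_compatible=False):
    """Toy model: return the words of `text` that a full-text wildcard term would hit.

    Default mode drops the * and matches whole words only.
    Compatible mode treats * as a real wildcard over each word.
    """
    words = re.findall(r"\w+", text.lower())
    pattern = pattern.lower()
    if fast_wildcard_compatible:
        return [w for w in words if fnmatch.fnmatchcase(w, pattern)]
    literal = pattern.replace("*", "")
    return [w for w in words if w == literal]


sample = "Foot football footstep barefoot"
print(fulltext_matches("foot*", sample))
print(fulltext_matches("foot*", sample, fast_wildcard_compatible=True))
print(fulltext_matches("*foot", sample, fast_wildcard_compatible=True))
```

The last call uses a leading wildcard, the case the guide singles out as the most expensive because no index prefix can narrow the probe.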
Use the following tables to determine how a wildcard search behaves in DFC and DQL with the fast_wildcard_compatible property set to false (default) or true. The examples use the search term car*. The documents that are returned contain the string denoted as ->.

Table 21 fast_wildcard_compatible = false
Full-text: car* -> car (match word); car*off -> car off (not carry off)
Metadata (contains, begins|ends with): car* -> car, cars, cart, ...

Table 22 fast_wildcard_compatible = true
xPlore Full-text: car* -> car, cars, cart, ...
xPlore Metadata: car* -> car, cars, cart, ...
FAST: car* -> car, cars, cart, ...

Configuring fuzzy search
Fuzzy search can find misspelled words or letter reversals. Fuzzy search is supported by calculating the similarity between terms using a Lucene algorithm. The supported distance between terms can be configured. Fuzzy search in full-text (not properties) is not enabled by default. Fuzzy search is not applied when wildcards are present, and it applies only when XQuery generation is not turned off. This feature can be disabled if necessary.
1. Check your current dm_ftengine_config parameters. Use iAPI, DQL, or DFC to check the dm_ftengine_config object. First get the object ID, returned by this API command:
retrieve,c,dm_ftengine_config
2. Use the object ID to get the parameters:
?,c,select param_name, param_value from dm_ftengine_config where r_object_id=dm_ftengine_config_object_id
3. If the fuzzy_search_enable parameter does not exist, add it. To add a parameter using iAPI in Documentum Administrator, use append like the following:
retrieve,c,dm_ftengine_config
append,c,l,param_name
fuzzy_search_enable
append,c,l,param_value
true
save,c,l
4. Change the allowed similarity between a word and similar words. Use iAPI, DQL, or DFC to modify the dm_ftengine_config object: set the parameter default_fuzzy_search_similarity in dm_ftengine_config. Set a value between 0 (terms are different by more than one letter) and 1 (default=1, terms are identical). This default also applies to custom fuzzy queries in DFC and DFS for full-text and properties.
To disable fuzzy search, set the property fuzzy_search_enable in the dm_ftengine_config object to false.

Fuzzy search in DFC and DFS
Set fuzzy search in individual full-text and property queries with APIs on IDfFulltextExpression and IDfSimpleAttrExpression. Use the operators CONTAINS, DOES_NOT_CONTAIN, and EQUALS for String object types:
• setFuzzySearchEnabled(Boolean fuzzySearchEnabled)
• setFuzzySearchSimilarity(Float similarity): Sets a similarity value between 0 and 1. Overrides the value of the parameter default_fuzzy_search_similarity in dm_ftengine_config.

Configuring Documentum search
• Search engine configuration (dm_ftengine_config)
• Making types and attributes searchable
• Folder descend queries
• DQL, DFC, and DFS queries
• Routing a query to a specific collection
• Tracing Documentum queries

Search engine configuration (dm_ftengine_config)
The Content Server query plugin settings are set during installation. Change them only if the Content Server or xPlore environment changes. If you change them, you do not need to restart the Content Server.

To check your current dm_ftengine_config settings, use iAPI, DQL, or DFC. To view existing parameters using iAPI in Documentum Administrator:
• First get the object ID:
retrieve,c,dm_ftengine_config
... <dm_ftengine_config_object_id>
• Use the object ID to get the parameters:
?,c,select param_name, param_value from dm_ftengine_config where r_object_id=dm_ftengine_config_object_id

The following settings affect query processing.
• dsearch_result_batch_size: Default: 200. Sets the number of results fetched from xPlore in each batch.
• security_mode: Sets the security when summaries are displayed.
• folder_cache_limit: Default: 2000. Specifies the maximum number of folder IDs to cache in the index probe for a folder descend query. Increase the cache size if folder descend queries are consistently slow or slow the first time. A high value combined with low memory increases I/O and slows the response time.
• fast_wildcard_compatible (replaces the parameter fds_contain_fragment): Default: false. Sets fragment search option. See Configuring full-text wildcard (fragment) support.
• ftsearch_security_mode: Default: 1. Sets security evaluation in xPlore. Value of 0 sets evaluation in the Content Server. See Changing search results security.

Add a missing parameter using iAPI append like the following:
retrieve,c,dm_ftengine_config
append,c,l,param_name
fast_wildcard_compatible
append,c,l,param_value
true
save,c,l
To change an existing parameter, use set like the following:
retrieve,c,dm_ftengine_config
set,c,l,param_name
fast_wildcard_compatible
set,c,l,param_value
false
save,c,l
To remove an existing parameter, use remove instead of set.

Making types and attributes searchable
You can create or alter types to make them searchable and configure them for full-text support. Use CREATE TYPE and ALTER TYPE FULLTEXT SUPPORT switches to specify searchable attributes. For more details, see Documentum Content Server DQL Reference Manual.

Allowing indexing
The a_full_text attribute is defined for the dm_sysobject type and subtypes (default: true). Content and properties are indexed when a Save, Saveasnew, Checkin, Destroy, Branch, or Prune operation is performed on the object. Users with Sysadmin or Superuser privileges can change the a_full_text setting.

Setting indexable formats
Properties of the format object determine which formats are indexable and which content files in indexable formats are indexed. If the value of the can_index property of a content file format object is set to true, the content file is indexable. If the primary content of an object is not in an indexable format, you can ensure indexing by creating a rendition in an indexable format.

Allowing search
Set the is_searchable attribute on an object type to allow or prevent searches for objects of that type and its subtypes (default: true). Valid values: 0 (false) and 1 (true). The client application must read this attribute. If is_searchable is false for a type or attributes, Webtop does not display them in the search UI.

Lightweight sysobjects (LWSOs)
Lightweight sysobjects group the attribute values that are identical for a large set of objects. This redundant information is shared among the LWSOs from the shared parent object. For LWSOs like dm_message_archive, the client application must configure searchable attributes. For more information on this configuration, see Documentum System Search Development Guide. For information on supporting extended search with LWSOs, see Documentum System Search Development Guide.

Aspects
When a_full_text is false, only the properties are indexed.

Properties associated with aspects are not indexed by default. If you wish to index them, use an ALTER ASPECT statement to identify the aspects you want indexed. For more information on this statement, see Documentum Content Server DQL Reference Manual.

Folder descend queries
Folder descend query performance can depend on folder hierarchy and data distribution across folders. Set the folder_cache_limit in the dm_ftengine_config object to the expected maximum number of folders in the query (default = 2000). If the folder descend condition evaluates to less than the folder_cache_limit value, then folder IDs are pushed into the index probe. If the condition exceeds the folder_cache_limit value, the folder constraint is evaluated separately for each result.

The following conditions can degrade query performance:
• Many folders, and a large portion of them are empty.
• The search predicate is unselective but the folder constraint is selective.
• Many folders and low memory capacity. This environment causes high I/O and slow response time.

Tune folder_cache_limit in the dm_ftengine_config object as follows. Increase it if folder descend queries are consistently slow, or slow the first time but faster the next time because they are cached. Decrease it when there are many folders and low memory capacity. You can also add more memory for the xDB cache and the xPlore host.

DQL, DFC, and DFS queries
DFC-based client applications use the DFC query builder package to translate a query into an XQuery statement. DFS similarly generates XQuery statements. If your application issues DQL queries, the Content Server query plugin for xPlore translates the DQL into an XQuery expression. Any part of the query that is not full-text compliant (NOFTDQL) is evaluated in the Content Server database, and the results are combined with results from the XQuery.

Table 23 Differences between DQL and DFC/DFS queries
DQL: No latency: attributes are evaluated from the database | DFC and DFS: Latency between Content Server and xPlore
DQL: No latency for security evaluation | DFC and DFS: Latency for security but faster search results
DQL: No VQL equivalent | DFC and DFS: Extended object search (VQL-type support). See Changing VQL queries to XQuery expressions.
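The folder_cache_limit rule for folder descend queries reduces to one comparison, sketched below in Python. The function name and the two strategy labels are hypothetical. The guide does not say what happens when the folder count equals the limit, so treating that case as "per result" is an assumption of this sketch.

```python
def folder_descend_strategy(folder_count, folder_cache_limit=2000):
    """Toy model of how a folder constraint is evaluated.

    folder_count is the number of folders the descend condition evaluates to.
    The default limit mirrors the documented folder_cache_limit default of 2000.
    """
    if folder_count < folder_cache_limit:
        return "index_probe"   # folder IDs are pushed into the index probe
    return "per_result"        # constraint is evaluated separately for each result


print(folder_descend_strategy(150))
print(folder_descend_strategy(5000))
print(folder_descend_strategy(5000, folder_cache_limit=10000))
```

Raising the limit moves a large hierarchy back onto the index probe path, at the cost of caching more folder IDs, which is why the guide pairs a high limit with a warning about low memory.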
see Documentum System Search Development Guide. use an ALTER ASPECT statement to identify the aspects you want indexed. and the results are combined with results from the XQuery. Folder descend queries Folder descend query performance can depend on folder hierarchy and data distribution across folders. Set the folder_cache_limit in the dm_ftengine_config object to the expected maximum number of folders in the query (default = 2000). DFS similarly generates XQuery statements. If you wish to index them. If the condition exceeds the folder_cache_limit value. the Content Server query plugin for xPlore translates the DQL into an XQuery expression. No facets Facet support No hit count Hit count No filter support unless DFC customized Configurable filters in index agent. generation. or the DFC interface IDfXQuery.xml. using XQuery or the DFC interface IDfXQuery.xquery. • Denormalize the relationship of a document to other objects or tables.xml. For information on viewing and updating this file. they must be reindexed to include the XML content. The DFC query builder adds the DQL hint TRY_FTDQL_FIRST. The hints file allows you to specify certain conditions under which a database or standard query is done in place of a full-text query. you can rewrite some VQL queries to XQuery equivalents. see VQL and XQuery Syntax Equivalents. If your documents containing XML have already been indexed. 184 EMC Documentum xPlore Version 1. Turn off XQuery generation by adding the following setting to dfc. For a table of VQL examples and their equivalents in XQuery expression. For information on using a hints file. Hints file not supported unless XQuery generation is turned off Fragment search supported Fragment search in full-text not supported unless XQuery generation is turned off. change index-as-sub-path to true on the xml-content element in indexserverconfig.search. Older Documentum applications supported zone searches with Verity Query Language (VQL). page 299. 
Enabling DQL hints (turn off XQuery) The Webtop search components use the DFC query builder package to construct a query.properties on the DFC client application: dfc. FAST indexing did not support VQL queries. disable XQuery generation by DFC or DFS. XQuery.enable=false Changing VQL queries to XQuery expressions By default. see Documentum System Search Development Guide. • Perform boolean searches using DQL. such as email attachments.2 Administration and Development Guide . see Modifying indexserverconfig. the XML content (elements and attributes) of an input document is not indexed. Fragment search in metadata can be configured. The query builder also bypasses lemmatization by using a DQL hint for wildcard and phrase searches. XQuery. This hint prevents timeouts and resource exceptions by querying the attributes portion of a query against the repository database.Search DQL DFC and DFS Hints file supported. page 42. or the DFC interface IDfXQuery. With xPlore or DFC APIs. • Perform structured searches of XML documents using XQuery or the DFC interface IDfXQuery. To support searching on XML content or attribute values. • Join different objects using DQL (NOFTDQL). To use a DQL hints file or hints in a DQL query. You can trace subsystems with one of the following values: • all Traces everything (sum of cs. • ftengine Traces back-end operations: HTTP transactions between the query plugin and xPlore.fulltext.all Tracing in the log Trace messages are written to $DOCUMENTUM/dba/log/fulltext/fttrace_<repository_name>. tracing is session-specific.VALUE.c.S.log in the logs subdirectory of the JBoss deployment directory.log.log). • ftplugin Traces the query plugin front-end operations such as DQL translation to XQuery.MODIFY_TRACE.S. • none Turning tracing on or off Use the iAPI command to turn off tracing: apply. 
so that you can find the translated query in the xPlore fulltext log ($DOCUMENTUM/dba/log/fullext/fttrace_repository_name.none Use the iAPI command to turn on tracing: apply.2 Administration and Development Guide 185 .SUBSYSTEM. • cs Traces Content Server search operations such as initializing full-text in-memory objects and the options used in a query. • The XQuery that was translated from DQL • The request and response streams.S. the result stream returned from xPlore.MODIFY_TRACE.fulltext.NULL. calls to the back end. and fetching of each result.SUBSYSTEM.NULL. The log entry contains the following information: • Request query ID. On UNIX and Linux. this command controls tracing for all sessions. to diagnose communication errors or memory stream corruption • dm_ftengine_config options • The query execution plan is recorded in the xplore log dsearch.S.c. ftplugin.VALUE. Search Tracing Documentum queries You can trace queries using the MODIFY_TRACE apply method. Supporting subscriptions to queries About query subscriptions EMC Documentum xPlore Version 1. There are four possible levels of tracing queries in Documentum environments. and the query execution plan. and ftengine). On Windows. the request stream sent to xPlore. Search Installing the query subscription DAR QuerySubscriptionAdminTool About query subscriptions • Overview, page 186 • Query subscription creation, page 186 • Query subscription execution, page 187 Overview Query subscriptions is a feature in which a user can: • Specify to automatically run a particular saved search (full-text or metadata-only) at specified intervals (once an hour, day, week, or month) and return any new results. The results can be discarded or saved. If the results are saved, they can be merged with or replace the previous results. • Unsubscribe from a query. • Retrieve a list of their query subscriptions. • Be notified of the results via a dmi_queue_item in the subscribed user Inbox and, optionally, an email. 
• Execute a workflow, for example, a business process defined in xCP.

Query subscriptions run in Content Server 6.7 SP1 with DFC 6.7 SP1. Support for query subscriptions is installed with the Content Server. A DFC client like Webtop or CenterStage must be customized using DFC 6.7 SP1 to present query subscriptions to the user. Because automatically running queries at specified intervals can negatively affect xPlore performance, tune and monitor query subscription performance.

Query subscription creation
When a user subscribes to a query, the following objects are created:
• A dm_relation object relates the dm_smart_list object to a single dm_ftquery_subscription object.
• A dm_ftquery_subscription object specifies the attributes of the subscription and the most recent query results.
A user can subscribe to a dm_smart_list object only once.
Note: Queries are saved as dm_smart_list objects through the DFC Search Service API or any user interface that exposes that API like Webtop or Taskspace Advanced Search. A dm_smart_list object contains the query.
Query Subscription Object Model, page 187 illustrates how the different objects that comprise query subscriptions are related. A query can be subscribed to multiple times. A single dm_smart_list object can have multiple dm_relation objects, and each single dm_relation object, in turn, is related to a single dm_ftquery_subscription object. For example, both Subscription1 and SubscriptionN are related to the same dm_smart_list, SmartListX, but through different dm_relation objects, SubscriptionRelation1 and SubscriptionRelationN, respectively. Furthermore, Subscription1 and SubscriptionN have different characteristics. For example, Subscription1 is executed once a week whereas SubscriptionN executes once an hour.
There is only one subscriber per subscription; that is, the subscriber of Subscription1 is user1 and the subscriber for SubscriptionN is user2.

Figure 16 Query Subscription Object Model

Query subscription execution
When one of the four pre-installed jobs runs, the following sequence of actions occurs:
1. Matching query subscriptions are executed sequentially. A matching dm_ftquery_subscription object is one that has a frequency attribute value that matches the job method's -frequency value. For example, a job method frequency value of hourly executes all matching dm_ftquery_subscription objects once an hour.
Note: All queries run under the subscriber user account.
2. One of the following conditions occurs:
a. If new results are found, then the new results are returned and one of the following occurs:
• (Default) A dmi_queue_item is created and, optionally, an email is sent to the subscribing user.
• A custom workflow is executed. If the workflow fails, then a dmi_queue_item describing the failure is created.
Note: You must create this workflow.
b. If no new results are found, then the next matching query subscription is executed.
3. Depending on the result_strategy attribute value of the dm_ftquery_subscription object, the new results:
• Replace the current results in the dm_ftquery_subscription object.
• Merge with the current results in the dm_ftquery_subscription object.
• Are discarded.
Note: The number of results returned per query as well as the total number of results saved are set in the dm_ftquery_subscription object max_results attribute.
4. The next matching query subscription is executed.
5. After all matching query subscriptions have been executed, the job stops and a job report is saved.
Note: If the stop_before_timeout value (the default is 60 seconds) is reached, then the job is stopped and any remaining query subscriptions are executed when the job runs next time.
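The result handling in step 3 can be sketched in plain Java. This is only an illustration under stated assumptions, not the code shipped in the query subscription DAR: the class name ResultStrategySketch is hypothetical, results are modeled as plain strings, merging puts the new results ahead of the saved ones (as the result_strategy property description states), no de-duplication is attempted, and the list is simply truncated at the max_results value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of result_strategy handling; not part of DFC or the QBS DAR. */
public class ResultStrategySketch {
    public static final int REPLACE = 0; // default: new results replace saved results
    public static final int MERGE = 1;   // new results first, then saved results
    public static final int DISCARD = 2; // new results are dropped, saved results kept

    /** Returns the results to keep in the subscription after one job run. */
    public static List<String> apply(int resultStrategy, List<String> saved,
                                     List<String> fresh, int maxResults) {
        List<String> out = new ArrayList<>();
        switch (resultStrategy) {
            case REPLACE:
                out.addAll(fresh);
                break;
            case MERGE:
                out.addAll(fresh);
                out.addAll(saved);
                break;
            case DISCARD:
                out.addAll(saved);
                break;
            default:
                throw new IllegalArgumentException("Unknown result_strategy: " + resultStrategy);
        }
        // Assumption: the cap on saved results is applied by truncating the list.
        if (out.size() > maxResults) {
            return new ArrayList<>(out.subList(0, maxResults));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> saved = Arrays.asList("old1", "old2");
        List<String> fresh = Arrays.asList("new1");
        // Merge with a cap of two saved results: prints [new1, old1]
        System.out.println(apply(MERGE, saved, fresh, 2));
    }
}
```

In a real deployment the strategy is set per subscription through the result_strategy attribute, so a sketch like this is useful mainly for reasoning about what a subscriber will see after each run.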
Installing the query subscription DAR
If you have installed Content Server 6.7 SP1, you do not need to install the query subscription DAR. However, if you installed or upgraded to Content Server 6.7 SP1 with FAST as your search engine, you must manually install the query subscription DAR to support subscriptions. The installation files are located in dsearch_home/setup/qbs.
The query subscription DAR file contains:
• An SBO that provides functionality to subscribe, unsubscribe, and list the current subscriptions for a user.
• A TBO that provides functionality to run the saved search query and save results.
• dm_ftquery_subscription object type
• dm_qbs_relation object (dm_relation_type)
• Jobs and an associated Java class that runs the subscribed queries.
• A query subscription test program with which the administrator can validate that query subscriptions were set up properly.
1. Copy qbs.dar, DarInstall.bat or DarInstall.sh, and DarInstall.xml from dsearch_home/setup/qbs to a temporary install directory.
2. Edit DarInstall.xml:
a. Specify the full path to qbs.dar, including the file name, as the value of the dar attribute.
b. Specify your repository name as the value of the docbase attribute.
c. Specify the repository superuser name as the value of the username attribute.
d. Specify the repository superuser password as the value of the password attribute.
For example:
<emc.install dar="C:\Downloads\qbs.dar" docbase="xPlore1" username="Administrator" password="password" />
3. Edit DarInstall.bat (Windows) or DarInstall.sh (Linux or Unix).
a. Specify the path to the ComposerHeadless package as the value of ECLIPSE. For example:
set ECLIPSE="C:\Documentum\product\6.6\install\composer\ComposerHeadless"
b. Specify the path to the file DarInstall.xml in a temporary working directory (excluding the file name) as the value of BUILDFILE. For example:
set BUILDFILE="C:\DarInstall\temp"
c. Specify a workspace directory for the generated Composer files. For example:
set WORKSPACE="C:\DarInstall\work"
4. Launch DarInstall.bat (Windows) or DarInstall.sh (Unix or Linux) to install the query subscription SBO. On Windows 2008, run the script as administrator.

Testing query subscriptions
• You must have installed the query subscription DAR file. See Installing the query subscription DAR, page 188.
• You must have installed JDK 6.
• Include these JAR files in the Java classpath:
aspectjrt.jar
qbs.jar
server-test.jar
dfc.jar
commons-lang-2.4.jar
log4j.jar
You execute the com.documentum.test.qbs.Tqbs.java program to test:
• Subscribing to a query for a specific user (-SubscribeAndVerify flag)
• Unsubscribing from a query for a specific user (-UnsubscribeAndVerify flag)
• Running a job and verifying that it completes successfully (-RunAndVerifyJob flag)
• Deleting a smartlist and verifying that the associated dm_relation and dm_ftquery_subscription objects are deleted (-VerifyDeleteCascading flag)
Note: You can also use qbsadmin.bat or qbsadmin.sh. See the qbsadmin.bat and qbsadmin.sh usage instructions, page 198.
1. If you have not done so already, create a dm_smart_list object, which is a saved query. You can use a Documentum client (such as Webtop) to save a search, which creates a dm_smart_list object.
2. Execute the Tqbs.java class. For example, executing the following command with the -h flag provides the syntax:
"%JAVA_HOME%\bin\java" -classpath "C:\documentum\config;.\lib\aspectjrt.jar;.\lib\qbs.jar;.\lib\server-test.jar;.\lib\dfc.jar;.\lib\commons-lang-2.4.jar;.\lib\log4j.jar" com.documentum.test.qbs.Tqbs -h

Subscription reports
When you support query subscriptions, monitor the usage and query characteristics of the users with subscription reports. If there are many frequent or poorly performing subscriptions, increase capacity.
Finding frequent or slow subscription queries
You can run the following reports to troubleshoot query subscription activity. Use the report to find both frequent and poorly performing queries.
• QBS activity report by user: Find the users whose subscription queries consume the most resources or perform poorly. Filter by date range. Set the number of users to display.
• QBS activity report by ID: Find the subscription queries that consume the most resources or perform poorly. Filter by date range and user name. Order by Total processing time (descending), or Frequency (ascending). Set the number of IDs (N) to display. If you order by total processing time, N subscriptions with the longest query times are displayed. If you order by job frequency, N subscriptions with the shortest job frequency are displayed.
• The Top N slowest query: Find the slowest subscription queries. Filter by Subscription query type.

Finding subscriptions that use too many resources
Subscribed queries can consume too many resources. Use the QBS activity report by ID to find poorly performing queries. Use the QBS activity report by user to find users who consume the most resources. A user can consume too many resources with the following subscriptions:
• Too many subscriptions
• Queries that are unspecific (too many results)
• Queries that perform slowly (too many criteria)

Users complain that subscriptions do not return results
Troubleshooting steps:
• Use Documentum Administrator to make sure that the job was run. Select the job report.
• Check that the user query was run using QBS activity report by user. If the query returns no results, set a lower frequency, depending on business needs.
• Check the query itself to see if it is properly formulated to return the desired results. The query may be searching on the wrong terms or attributes. In this case, reformulate the saved query.
Subscription logging
Subscribed queries are logged in dsearch.log with the event name QUERY_AUTO. The following information is logged:
<event name="QUERY_AUTO" component="search" timestamp="2011-08-23T14:45:09-0700">
…..
<application_context>
<query_type>QUERY_AUTO</query_type>
<app_name>QBS</app_name>
<app_data>
<attr name="subscriptionID" value="0800020080009561"/>
<attr name="frequency" value="DAILY"/>
<attr name="range" value="1015"/>
<attr name="jobintervalinseconds" value="86400"/>
</app_data>
</application_context>
</event>
Key:
• subscriptionID is set by the QBS application.
• frequency is the subscription frequency as set by the client. Values: HOURLY, DAILY, WEEKLY, MONTHLY.
• range reports time elapsed since last query execution. For example, if the job runs hourly but the frequency was set to 20 minutes, the range is between 0 and 40 minutes (2400 seconds). Not recorded if the frequency is greater than one day.
• jobintervalinseconds is how often the subscription is set to run, in seconds. For example, a value 86400 indicates a setting of one day in the client. Not recorded if the frequency is greater than one day.

dm_ftquery_subscription
Represents a subscribed query.

Description
Supertype: SysObject
Subtypes: None
Internal name: dm_ftquery_subscription
Object type tag: 08
A dm_ftquery_subscription object represents subscription-specific information but not the saved query itself, which is contained in a dm_smart_list object.

Properties
The table describes the object properties.

Table 24 dm_ftquery_subscription type properties
Property | Datatype | Single or repeating | Description
frequency | CHAR(32) | S | How often the subscription is to run. Valid value is represented in frequency query subscription job parameter.
last_exec_date | TIME | S | Last date and time that the subscription was executed.
result_strategy | INTEGER | S | Integer that indicates whether existing results that are saved in the dm_smart_list are to be replaced with the new results (0, the default), merged with the new results (1), or the new results are to be discarded (2). The saved query results are sorted as follows: • If the result_strategy is set to 0, then the new results are sorted from the highest to lowest score. • If the result_strategy is set to 1, then the new results are listed first and sorted from the highest to lowest score, then the existing results are listed next and sorted from the highest to lowest score.
subscriber_name | CHAR(32) | S | Name of the user who uses this subscription.
workflow_id | ID | S | Process ID of the workflow to be executed by the job. If this value is null, then the notification is executed through queue item creation when any result is returned.
zone_value | INTEGER | S | Zone to which this subscription belongs. Specify this value to run jobs when there are too many subscriptions for a single job. With this value, all subscriptions with the same frequency can be picked by different jobs. (You can customize jobs so that they run on the same interval but with different values in the job argument "zone_value".)

dm_qbs_relation object
A dm_relation_type object that relates the subscription (dm_ftquery_subscription object) to the original dm_smart_list object.

Table 25 dm_qbs_relation properties
Property | Value | Notes
object_name | dm_qbs_relation | All query subscription-created dm_relation objects have this name.
parent_type | dm_smart_list | None.
child_type | dm_ftquery_subscription | None.
security_type | CHILD | None.
direction_kind | 0 |
integrity_kind | 2 | If a dm_smart_list object is deleted, then the corresponding dm_relation and dm_ftquery_subscription objects are deleted.

Query subscription jobs
A job is executed for each query subscription at a specified interval. Results are returned via an inbox queue item and an email or a workflow.

Overview
Each of these jobs executes all query subscriptions that are specified to execute at the corresponding interval:
Job Name | Description
dm_FTQBS_HOURLY | Executes all query subscriptions that are to be executed once an hour.
dm_FTQBS_DAILY | Executes all query subscriptions that are to be executed once a day.
dm_FTQBS_WEEKLY | Executes all query subscriptions that are to be executed once a week.
dm_FTQBS_MONTHLY | Executes all query subscriptions that are to be executed once a month.
Each job executes its query subscriptions in ascending order based on each subscription last_exec_date property value. If a query subscription is not executed, it is executed when the job runs next.
Note: A job is stopped gracefully just before it is timed out.

Method arguments
Argument | Description
-frequency | (Required) Selects the corresponding subscriptions. Valid values: hourly, daily, weekly, monthly
-stop_before_timeout | (Optional) Number of seconds before which you want the job to stop before timing out. Default value: 60.
-search_timeout | (Optional) Number of milliseconds that the job runs before it times out. Default: 60000.
-max_result | (Optional) Maximum number of query results that can be returned as well as maximum number that can be saved in the subscription object. Default: 50.
-zone_value | (Optional) An integer that matches subscriptions with the same zone_value. If this argument is specified, then a dm_ftquery_subscription's zone_value and frequency attributes must match the corresponding method arguments in order for a subscription to be executed by the job.

Reports
Job reports are stored in: $DOCUMENTUM\dba\log\sessionID\sysadmin
Job Name | Report File
dm_FTQBS_HOURLY | FTQBS_HOURLYDoc.txt
dm_FTQBS_DAILY | FTQBS_DAILYDoc.txt
dm_FTQBS_WEEKLY | FTQBS_WEEKLYDoc.txt
dm_FTQBS_MONTHLY | FTQBS_MONTHLYDoc.txt

Custom jobs
The job method -zone_value parameter is meant for partitioning the execution of query subscriptions amongst multiple custom jobs that run on the same interval. A custom job executes every dm_ftquery_subscription that has the same zone_value and frequency attribute values as the custom job. You must specify a -zone_value value for every custom job that runs on the same interval and that value must be unique amongst all those custom jobs. If a job does not specify a -zone_value value, then it will execute all subscriptions on the same interval regardless of each subscription's zone_value value.
Note: None of your custom jobs should have the same interval as any of the pre-installed jobs, because the pre-installed jobs do not have a -zone_value specified and will execute all subscriptions on the same interval regardless of their zone_value value.

Query subscription workflows
A workflow can be called by the query subscription job.
Note: You can create a workflow using Documentum Process Builder.

Requirements
• Activities:
– One starting activity is required.
– Only one starting activity can be specified.
– The starting activity's name must be: QBS-Activity-1
• Packages:
– One package is required.
– Only one package can be specified.
– The package name must be: QBS-Package0
– The package type must be dm_ftquery_subscription.
– The subscription ID must be passed as the package.
Make sure that your workflow executes correctly before using it with query subscription jobs. For example, execution of the workflow will fail if the subscriber does not have at least RELATE permissions on the workflow.

IQuerySubscriptionSBO
Provides the functionality to subscribe to, unsubscribe from, and list query subscriptions.

Interface name
com.documentum.server.impl.fulltext.qbs.IQuerySubscriptionSBO

Imports
import com.documentum.server.impl.fulltext.qbs.QuerySubscriptionInfo;
import com.documentum.server.impl.fulltext.qbs.QuerySubscriptionException;

DAR
QBS.dar

Methods
• public IDfId subscribe(String docbaseName, String subscriber, IDfId smartListID, String frequency, IDfId workFlowID, int zoneValue, IDfTime lastExecDate, int resultStrategy) throws DfException, QuerySubscriptionException
Validates the dm_smart_list object ID and subscriber name in the specified repository; validates the frequency value with all query subscription jobs with the job method argument "-frequency". Creates a dm_ftquery_subscription and dm_relation objects. The object ID of dm_ftquery_subscription object is returned. The workflow template ID can be set to null, if not applicable. For zone_value, specify -1, if not applicable. For lastExecDate, specify DfTime.DF_NULLDATE, if not applicable. For resultStrategy: Integer that indicates whether existing results that are saved in the dm_smart_list are replaced with the new results (0, the default), merged with the new results (1), or the new results are discarded (2). Specify -1, if not applicable.
• public boolean unsubscribe(String docbaseName, IDfId smartListID, String subscriber) throws DfException, QuerySubscriptionException
Unsubscribe service destroys the dm_relation and dm_ftquery_subscription objects that are associated with the specified dm_smart_list and subscriber.
• public List getSubscribedSmartList(String docbaseName, String subscriber) throws DfException
Returns a list of all of the specified user subscriptions.

Notes
Extending this SBO is not supported.

For more information
• For more information about invoking SBOs, see the Documentum DFC Development Guide.
• For examples of calling this SBO, see the source code for the following classes:
– com.documentum.test.qbs.Tqbs
– com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool

IQuerySubscriptionTBO
Manages basic query subscription execution.

Interface name
com.documentum.server.impl.fulltext.qbs.IQuerySubscriptionTBO

Imports
import com.documentum.server.impl.fulltext.qbs.results.DfResultsSetSAXDeserializer;

DAR
QBS.DAR

Methods
• public void setSmartListId(IDfId smartListId)
Sets the dm_smart_list object ID associated with the dm_ftquery_subscription object. This method must be called before calling runRangeQuery().
• public IDfResultsSet runRangeQuery(String docbaseName, IDfTime from) throws DfException, IOException, InterruptedException
Executes a query saved in a dm_smart_list object from the specified date/time in the from parameter. If from is not a nulldate, a range is added to the search query with a condition like "r_modify_date >= from". If from is a nulldate, then no range condition is added to the search query.
• public void setResults(IDfResultsSet results)
Saves the results to dm_ftquery_subscription.
• public IDfResultsSet getResults() throws DfException
Gets the results that are saved in dm_ftquery_subscription.
• public void setSearchTimeOut(long timeout)
Sets the number of milliseconds that the search runs before it times out.
• public long getSearchTimeOut()
Gets the number of milliseconds that the search runs before it times out.
• public void setMaxResult(int max)
Sets the maximum number of query results that can be returned as well as maximum number that can be saved in the subscription object.
• public int getMaxResult()
Gets the maximum number of query results that can be returned as well as maximum number that can be saved in the subscription object.
• public void setResultStrategy(int resultStrategy)
Integer indicates whether existing results that are saved in the dm_smart_list are replaced with the new results (0, the default), merged with the new results (1), or the new results are discarded (2).
Note: doSave() updates the last_exec_date of the subscription based on this value.
• public int getResultStrategy()
Gets the result strategy.
• public void setQBSAttrs(Map<String,String> qbsInfo)
Sets query subscription information, such as subscription ID.
• public Map<String,String> getQBSAttrs()
Returns a hash map that contains key-value pairs for query subscription information.

Notes
Extending this TBO is not supported.

For more information
• See the Javadoc for more information about this TBO.
• For more information about invoking TBOs, see the Documentum DFC Development Guide.

QuerySubscriptionAdminTool

Class name
com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool

Usage
You use com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool to:
• Subscribe to a query for a specific user (-subscribe flag)
• Unsubscribe from a query for a specific user (-unsubscribe flag)
• List all subscribed queries for a specific user (-listsubscription flag)
Note: All parameter values are passed as string values and must be enclosed in double quotes if spaces are specified in the value.
To display the syntax, specify the -h flag.
For example:
C:\Temp\qbsadmin>"%JAVA_HOME%\bin\java" -classpath "C:\Documentum\config;.\lib\qbs.jar;.\lib\qbsAdmin.jar;.\lib\dfc.jar;.\lib\log4j.jar;.\lib\commons-lang-2.4.jar;.\lib\aspectjrt.jar" com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool -h
Note: In dsearch_home/setup/qbs/tool, qbsadmin.bat and qbsadmin.sh demonstrate how to call this class. In qbsadmin.bat and qbsadmin.sh, modify the path to the dfc.properties file. You can also change the -h flag to one of the other flags.

Required JARs
qbs.jar
qbsAdmin.jar
dfc.jar
log4j.jar
commons-lang-2.4.jar
aspectjrt.jar

-subscribe example
C:\Temp\qbsadmin>"%JAVA_HOME%\bin\java" -classpath "C:\Documentum\config;.\lib\qbs.jar;.\lib\qbsAdmin.jar;.\lib\dfc.jar;.\lib\log4j.jar;.\lib\commons-lang-2.4.jar;.\lib\aspectjrt.jar" com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool -subscribe D65SP2M6DSS user1 password password1 080000f28002ef2c daily

-subscribe output
subscribed 080000f28002ef2cfor user user1 succeeded with subscription id 080000f28002f115

-unsubscribe example
C:\Temp\qbsadmin>"%JAVA_HOME%\bin\java" -classpath "C:\Documentum\config;.\lib\qbs.jar;.\lib\qbsAdmin.jar;.\lib\dfc.jar;.\lib\log4j.jar;.\lib\commons-lang-2.4.jar;.\lib\aspectjrt.jar" com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool -unsubscribe D65SP2M6DSS user1 password password1 080000f28002ef2c

-unsubscribe output
User user1 has no subscriptions on dm_smart_list object (080000f28002ef2c)

-listsubscription example
C:\Temp\qbsadmin>"%JAVA_HOME%\bin\java" -classpath "C:\Documentum\config;.\lib\qbs.jar;.\lib\qbsAdmin.jar;.\lib\dfc.jar;.\lib\log4j.jar;.\lib\commons-lang-2.4.jar;.\lib\aspectjrt.jar" com.documentum.server.impl.fulltext.qbs.admin.QuerySubscriptionAdminTool -listsubscription D65SP2M6DSS user1 password password1

-listsubscription output
Subscriptions for user1 are:
smartList: 080000f28002ef2c frequency: DAILYworkFlowID: 0000000000000000
smartList: 080000f28002ef2f frequency: 5 MINUTESworkFlowID: 0000000000000000

Troubleshooting search
When you set the search service log level to WARN, queries are logged. See Auditing queries, page 204 for more information.
If query auditing is enabled (default), you can view or edit reports on queries. See Auditing queries, page 204.

Full-text service is disabled
Investigate the following possible causes:
• No queries are allowed when the search service has not started. You see this error message:
The search has failed: The Full-text service is disabled
• The username contains an illegal character for the xPlore host code page.

Communication error or no collection available
If an API returns a connection refused error, check the value of the URL on the instance. Make sure that it is valid and that search is turned on for the instance. If the search service is not enabled, dsearch.log records the following exception:
com.emc.documentum.core.fulltext.common.search.FtSearchException:... There is no node available to process this request type.
From Documentum DFC clients, the following exception is returned:
DfException: ..."EXEC_XQUERY failed with error: ESS_DMSearch:ExecuteSearchPassthrough. Communication Error Could not get node information using round-robin routing.
From Documentum DQL, the following error is returned:
dmFTSearchNew failed with error: ESS_DMSearch:ExecuteSearch. Communication Error Could not get node information using round-robin routing.

Error after you changed xPlore host
If you have to change the xPlore host name, do the following:
• Update indexserverconfig.xml with the new value of the URL attribute on the node element. For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 42.
• Change the JBoss startup (script or service) so that it starts correctly.
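The first step above, checking the URL attribute on each node element after a host change, could be scripted. The following is a minimal sketch using only JDK classes. It assumes, hypothetically, that node elements carry the address in an attribute named url and that the value is an ordinary HTTP URL; the class name NodeUrlCheck and the sample XML in main are illustrative and do not reproduce the real indexserverconfig.xml schema. Confirm the attribute name against your own file before relying on it.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper; the "url" attribute name is an assumption, not taken from the product schema. */
public class NodeUrlCheck {

    /** Returns the url values of node elements whose host differs from expectedHost. */
    public static List<String> staleNodeUrls(String configXml, String expectedHost) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
        List<String> stale = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("node");
        for (int i = 0; i < nodes.getLength(); i++) {
            String url = ((Element) nodes.item(i)).getAttribute("url");
            if (url.isEmpty()) {
                continue; // no address recorded on this element
            }
            String host = URI.create(url).getHost();
            if (host == null || !host.equalsIgnoreCase(expectedHost)) {
                stale.add(url);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Illustrative XML only; a real configuration file has many more elements and attributes.
        String xml = "<config>"
                + "<node name=\"primary\" url=\"http://oldhost:9300/dsearch/\"/>"
                + "<node name=\"secondary\" url=\"http://newhost:9300/dsearch/\"/>"
                + "</config>";
        System.out.println(staleNodeUrls(xml, "newhost"));
    }
}
```

A check like this only reports entries that still point at the old host. The file itself should still be edited as described in Modifying indexserverconfig.xml.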
Verifying the query plugin version

Queries fail if the wrong (FAST) query plugin is loaded in the Content Server. Check the Content Server log after you start the Content Server. The file repository_name.log is located in $DOCUMENTUM/dba/log. Look for a line like the following. It references a plugin with DSEARCH in the name.

    [DM_FULLTEXT_T_QUERY_PLUGIN_VERSION]info: "Loaded FT Query Plugin: .../DSEARCHQueryPlugin.dll...FT Engine version: X-Hive/DB 10"

The Content Server query plugin properties of the dm_ftengine_config object are set during xPlore configuration. If you have changed one of the properties, like the primary xPlore host, the plugin can fail. Verify the plugin properties, especially dsearch_qrserver_host, with the following DQL:

    1> select param_name, param_value from dm_ftengine_config
    2> go

You see specific properties like the following:

    param_name                  param_value
    --------------------------  ------------------------------------------------------
    dsearch_qrygen_mode         both
    fast_wildcard_compatible    true
    query_plugin_mapping_file   C:\Documentum\fulltext\dsearch\dm_AttributeMapping.xml
    dsearch_domain              DSSLH1
    dsearch_qrserver_host       Config8518VM0
    dsearch_qrserver_port       9300
    dsearch_qrserver_target     /dsearch/IndexServerServlet

Testing a query in xPlore administrator

You can search on a full-text string in content or metadata.

1. In xPlore administrator, expand the Diagnostic and Utilities tree and choose Test search.
2. To search for a keyword, choose Keyword and enter the search string. If you enter multiple terms in the Keyword field, the XQuery expression is generated using the AND condition.
3. To search using an XQuery expression, choose XQuery and enter the expression. Make sure that you select the correct domain for the document. If you are copying a query from the log, remove declare phrases that include xhive from the beginning of the query.

Security is not evaluated for results from a test search. As a result, the number of items returned does not reflect hits that are removed after security is applied in the index server. A status of success or fail indicates whether the query executed; success does not indicate the presence of hits.

Testing a query in Documentum iAPI or DQL

Try a query like the following:

    api>?,c,SELECT text,object_name FROM dm_document SEARCH DOCUMENT CONTAINS 'test' WHERE (a_is_hidden = FALSE)

Debugging from Webtop

If the query fails to return expected results in Webtop, perform a Ctrl-click on the Edit button in the results page. The query is displayed in the events history as a select statement like the following:

    IDfQueryEvent(INTERNAL, DEFAULT): [dm_notes] returned [Start processing] at [2010-06-30 02:31:00:176 -0700]
    IDfQueryEvent(INTERNAL, NATIVEQUERY): [dm_notes] returned [SELECT text,object_name,score,summary,r_modify_date,... SEARCH DOCUMENT CONTAINS 'ctrl-click' WHERE (...]

If there is a processing error, the stack trace is shown.

Debugging from DFC

To log XQuery and XML results, set log4j.logger.com.documentum.fc.client.search=DEBUG, stdout in dfc.properties for the DFC application. The file dfc.properties is located in the WEB-INF/classes directory of a web application like Webtop or CenterStage.

Determining the area of failure

1. Start at the lowest level, xDB. Use the xDB admin tool to execute the XQuery. (Get the XQuery from the log and then click the query icon in the admin tool.)
2. If the query runs successfully in xDB, use xPlore administrator to run the XQuery (Execute XQuery in the domain or collection view).
3. If xPlore administrator runs the query successfully, check the query plugin trace log. See Tracing Documentum queries, page 185.
4. If there are two counter.xml files in the collection domain_name/Data/ApplicationInfo/group, delete the file that contains the lower integer value.
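When you repeat the iAPI or DQL test query from this section with different search terms, it helps to generate the statement instead of editing it by hand. The following is a minimal sketch under stated assumptions: the class FullTextDql and its methods are hypothetical helpers, not DFC APIs, and the sketch assumes that a single quote inside a DQL string literal is written as two single quotes.

```java
// Hypothetical helper that builds the test query shown in this section.
final class FullTextDql {

    // Wraps a term in single quotes, doubling any embedded single quote
    // (assumed DQL string-literal escaping).
    static String quote(String term) {
        return "'" + term.replace("'", "''") + "'";
    }

    // Builds the SEARCH DOCUMENT CONTAINS test query for one search term.
    static String testQuery(String term) {
        return "SELECT text,object_name FROM dm_document SEARCH DOCUMENT CONTAINS "
                + quote(term) + " WHERE (a_is_hidden = FALSE)";
    }

    // Prefixes the query with the iAPI query shortcut used in this section.
    static String iapiLine(String term) {
        return "?,c," + testQuery(term);
    }

    public static void main(String[] args) {
        String term = args.length > 0 ? args[0] : "test";
        System.out.println(iapiLine(term));
    }
}
```

Run without arguments, the sketch prints the same statement as the example in this section, so you can paste the output at the api> prompt.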
The wrong number of results are returned

There is latency between document creation or modification and indexing. First, check whether the object has been indexed yet. You can use the following DQL. Substitute the actual object ID of the document that exists on the Content Server but is not found in search results:

    select r_object_id from dm_sysobject search document contains object_id

If the object has been indexed, check the following:

• Check user permissions. Run the query as superuser or through xPlore administrator.
• ACL and group databases can be out of synch. Run the manual update script aclreplication. See Manually updating security, page 50.
• Query tokens might not match indexed tokens (because of contextual differences). Run the tokenization test on the query terms and on the sentence containing the terms in the document. See Testing tokenization, page 95.
• Make sure that the attribute was not excluded from tokenization. Check indexserverconfig.xml for a subpath whose full-text-search attribute is set to false, for example:

    <sub-path ...full-text-search="false" ...path="dmftmetadata//acl_name"/>

• Make sure counter.xml has not been deleted from the collection domain_name/Data/ApplicationInfo/group. If it has been deleted, restart xPlore.
• Try the query with Content Server security turned on. (See Changing search results security, page 49.)
• Summary can be blank if the summary security mode is set to BROWSE. (See Configuring summary security, page 176.)

Searching for XML

Users can search for a specific element in an XML document. By default, XML content of an input document is not indexed. To support search in XML content or attributes, change this setting in indexserverconfig.xml. (For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 42.)
If your documents containing XML have already been indexed, they must be reindexed to include the XML content.

• Change the value of the store attribute on the xml-content element to embed.
• Change the value of the tokenize attribute on the xml-content element to true.
• Change the value of the index-as-sub-path attribute on the xml-content element to true.
• Verify the path value attribute on the xml-content element against the DFTXML path. (For the DFTXML DTD, see Extensible Documentum DTD, page 292.) An XPath error can cause the query to fail.

Foreign language is not identified

Queries issued from Documentum clients are searched in the language of the session_locale. The search client can set the locale through DFC or iAPI.

Document is not found after index rebuild

Reingest large documents. At the original ingestion, some documents are not embedded in DFTXML. In these documents, the XML content exceeds the value of the file-limit attribute on the xml-content element in indexserverconfig.xml. The index rebuild process generates a list of object IDs for these documents. The list is located in dsearch_home/data/domain_name/collection_name/index_name/ids.txt, for example:

    C:/xPlore/data/mydomain/default/dmftdoc_2er90/ids.txt

Document is indexed but not searchable

Try the following troubleshooting steps:

1. Make sure that the indexing status is DONE. (See Checking the index queue in Documentum Administrator, page 75, and Checking the status of a document, page 124.)
2. Verify that the document was indexed to the correct collection. The collection for each document is recorded in the tracking DB for the domain. Substitute the document ID in the following XQuery expression, and execute it in the xDB admin tool:

    for $i in collection("dsearch/SystemInfo")
    where $i//trackinginfo/document[@id="TestCustomType_txt1276106246060"]
    return $i//trackinginfo/document/collection-name

3. Set the save-tokens option to true for the target collection and restart xPlore, then reindex the document. Check the tokens in the Tokens library to see whether the search term was properly indexed.

Changes to configuration are not seen

If you have edited indexserverconfig.xml, your changes are overwritten by xPlore. For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 42.

Getting the query execution plan

The query plan can be useful to EMC tech support for evaluating slow queries. The query plan shows which indexes were probed and the order in which they were probed. Use one of the following options to save or fetch the query plan:

• Using DFC query builder API
  save: IDfXQuery.setBooleanOption(IDfXQuery.FtQueryOptions.SAVE_EXECUTION_PLAN,true)
  retrieve: IDfXQuery.getExecutionPlan(session)
• Using iAPI
  save: apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,ftengine
  retrieve: The query execution plan is written to dsearch.log, which is located in the logs subdirectory of the JBoss deployment directory.
• Using xPlore search API
  save: IDfXQuery.setSaveExecutionPlan(true)
  retrieve: IFtSearchSession.fetchExecutionPlan(requestId)

Auditing queries

Auditing is enabled by default. To enable or disable query auditing, open System Overview in the xPlore administrator left pane. Click Global Configuration and choose the Auditing tab. Click search to enable query auditing. For information on configuring the audit record, see Configuring the audit record, page 38.

Audit records are saved in an xDB collection named AuditDB. You view or create reports on the audit record. Audit records are purged on a configurable schedule (default: 30 days).

Query auditing provides the following information:

• The XQuery expression in a CDATA element.
• The user name and whether the user is a superuser.
• The application context, in an application_context element. The application context is supplied by the search client application.
• The query options in name/value pairs set by the client application, in the QUERY_OPTION element.
• The instance that processed the query, in the NODE_NAME element.
• The xDB library in which the query was executed, in the LIBRARY_PATH element.
• The number of hits, in the TOTAL_HITS element.
• The number of items returned, in the FETCH_COUNT element.
• The amount of time in msec to execute the query, in the EXEC_TIME element.
• The time in msec elapsed to fetch results, in the FETCH_TIME element.
• The following security events are recorded for user-generated queries. The audit record reports how many times these caches were hit for a query. For details on configuring the caches, see Configuring the security cache, page 51.
  – How many times the group-in cache was probed for a query, in the GROUP_IN_CACHE_HIT element.
  – How many times the group-out cache was probed for a query, in the GROUP_OUT_CACHE_HIT element.
  – GROUP_IN_CACHE_FILL: How many times the query added a group to the group-in cache.
  – GROUP_OUT_CACHE_FILL: How many times the query added a group to the group-out cache.
  – TOTAL_INPUT_HITS_TO_FILTER: How many hits a query had before security filtering.
  – Number of hits filtered out by security because the user did not have sufficient permission, in the HITS_FILTERED_OUT element.
• The status of the query, in the STATUS element.

To view a query in the audit record, go to Diagnostic and Utilities > Reports and choose Audit records for search component. You can filter by date and query type. From the results, click a query of interest to view the XML entry in the report. The XQuery expression is contained within the QUERY element.

To view warmup queries in the audit record, choose the query type Warmup Tool Query in the Audit records for search component report.

Troubleshooting slow queries

Check the audited events using xPlore administrator.

Caches are too small

xPlore uses caches that reduce disk I/O. Response times are higher until the caches are loaded with data. You see a slow query the first time and much faster when it is repeated.

Suggested workaround: Increase the size of the xDB buffer cache for higher query rates. Stop all xPlore instances. Change the value of the property xhive-cache-pages in indexserver-bootstrap.properties to at least 512 KB (maximum 1 GB). This file is located in the WEB-INF/classes directory of the application server instance, for example, dsearch_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/classes. Restart the xPlore instance.

Security caches are not tuned

If a user is very underprivileged, or the user is the member of many groups, queries can slow due to small group caches. For instructions on configuring the caches, see Configuring the security cache, page 51.

Make sure query auditing is enabled (default). For underprivileged users, examine the group_out_cache_fill element in the query audit record. If the value exceeds the not-in-groups-cache-size, then the cache is too small. For users who are members of many groups, examine the group_in_cache_fill element in the query audit record. If the value exceeds the groups-in-cache-size, then the cache is too small.

Result sets are large

By default, xPlore gets the top 12,000 most relevant results per collection to support a facet window of 10,000 results. Webtop applications consume only 350 results, so the extra result memory is costly for large user environments or multiple collections (multiple repositories). In an environment with millions of documents and multiple collections, you could see longer response times or out of memory messages. Custom clients can consume a larger result set.

To detect the problem with query auditing enabled, examine the number of results in the TopNSlowestQueries report for a specific user and day. If the number of results is more than 1000, the custom client might return all the results.

Workarounds:

• Limit query result set size: Open xdb.properties, which is located in the directory WEB-INF/classes of the primary instance. Set the value of queryResultsWindowSize to a number smaller than 12000.
• Change the client to consume a smaller number of results by closing the result collection early or by using the DQL hint ENABLE(RETURN_TOP_N).

xPlore security is disabled

When xPlore native security is disabled, Content Server security is much slower than xPlore native security, because some or many results that are passed to the Content Server are discarded. Examine the number of results in the TopNSlowestQueries report for a specific user and day. If the number of results is more than 1000, xPlore security might be disabled and the user is underprivileged. (When the user is underprivileged, many results are discarded.)

Workaround: Enable xPlore native security. See Managing Security, page 49.

FAST-compatible wildcard and fragment behavior is enabled

Many Documentum clients do not enable wildcard searches for word fragments like "car" for "careful." The FAST indexing server supported word fragment searches for leading and trailing wild cards in metadata. FAST supported word fragment searches in SEARCH DOCUMENT CONTAINS (SDC) full-text queries. If you enable FAST-compatible wildcard behavior for your Documentum application, you see slower queries when the query contains a wildcard. For information on how to change this behavior, see Configuring full-text wildcard (fragment) support, page 178.

Insufficient CPU, disk I/O, or memory

If a query is slow the first time and much faster when it is reissued, the problem is likely due to insufficient I/O capacity on the allocated drives. Concatenated drives often show much lower I/O capacity than striped drives because only a subset of drives can service I/O requests.

Workarounds:

• If the system has only one or two cores and a high query rate, add more CPUs.
• If the system is large but receives complex or unselective queries, make sure that query auditing is enabled. Examine the TopNSlowestQueries report for the specific user name and day on which the problem was experienced. Look for high query rates with slow queries. Add more capacity.

Query of too many collections

A query probes each index for a repository (domain) sequentially. Results are collected across repositories. If there are multiple repositories, or multiple collections within a domain, the query can take more time. With query auditing enabled, try the query across repositories and then target it to a specific repository.

Use one of the following solutions for this problem:

• Merge collections using xPlore administrator. See Moving a temporary collection, page 138.
• Use parallel queries in DFC-based search applications by setting the following property to true in dfc.properties:

    dfc.search.xquery.option.parallel_execution.enable = true

• Use the ENABLE(fds_query_collection_collectionname) hint or the IN COLLECTION clause in DQL. See Routing a query to a specific collection, page 208.

User is very underprivileged

If the user is very underprivileged, the security filter discards tens of thousands of results. To detect this problem with query auditing, find the query using the TopNSlowestQueries report for the specific user and day. If the number in the Documents filtered out columns is large, it is a security cache issue.

Workaround: Queries can generally be made more selective. If you cannot modify the query, organize the repository so that the user has access to documents in certain containers such as rooms or cases. Append the container IDs to the user query.

The query does not use the index

If a multi-path index is not used to service the query, then the query runs slowly. DQL and DFC Search service queries always use the index. Some IDfXQuery-based queries might not use it.

To detect this issue with query auditing, find the query using the TopNSlowestQueries report (with user name and day). Click the query ID to get the query text in XML format. Obtain the query plan to determine which indexes were probed, if any. (Provide the query plan to EMC technical support for evaluation.) Rewrite the query to use the index.

Search APIs and customization

Routing a query to a specific collection

You can route a query to a specific collection in the following ways:

• Route an individual query using the DQL in collection clause to specify the target of a SELECT statement. By default, DFC does not generate DQL, but you can turn off XQuery generation. See Turning off XQuery generation to support DQL, page 209.
• Route all queries that meet specific criteria using a DQL hint in dfcdqlhints.xml: enable(fds_query_collection_collectionname), where collectionname is the collection name. If you use a DQL hint, you do not need to change the application or DFC query builder. For more information on the hints file, refer to Documentum System Search Development Guide.
• Implement the DFC query builder API addPartitionScope.
• Implement the DFC IDfXQuery API collection()
• DFS PartitionScope object in a StructuredQuery implementation

Use DQL

You can route a DQL query to a specific collection in the following ways. You must turn off XQuery generation. (See Turning off XQuery generation to support DQL, page 209.)

• Route an individual query using the DQL in collection clause to specify the target of a SELECT statement. By default, DFC does not generate DQL, but you can turn off XQuery generation. (See Turning off XQuery generation to support DQL, page 209.) For example:

    select r_object_id from dm_document search document contains 'benchmark' in collection('custom')

  Use one of the two following syntaxes.
  – Collection names are separated by underscores:

    select attr from type SDC where … enable(fds_query_collection_collection1_collection2__...)

  – Collection names are in quotation marks, separated by commas:

    select attr from type SDC in collection ('collection1','collection2',...)
    select r_object_id from dm_document search document contains 'report' in collection ( 'default' ) enable(return_top 10)

• Route all queries that meet specific criteria using a DQL hint in dfcdqlhints.xml: enable(fds_query_collection_collectionname), where collectionname is the collection name. For example:

    select r_object_id from dm_document search document contains 'benchmark' enable(fds_query_collection_custom)

  For more information on the hints file, refer to Documentum System Search Development Guide. The following hints route queries for a specific type to a known target collection appended to FDS_QUERY_COLLECTION_.

    <RuleSet>
      <Rule>
        <Condition>
          <From condition="any">
            <Type>my_type</Type>
          </From>
        </Condition>
        <DQLHint>ENABLE(FDS_QUERY_COLLECTION_MYTYPECOLLECTION)</DQLHint>
      </Rule>
    </RuleSet>

Implement a DFC query builder API

If you have created a BOF module that routes documents to specific collections, customize query routing to the appropriate collection. Call addPartitionScope(source, collection_name) in IDfQueryBuilder. For a detailed example of query routing, see "Improving Webtop Search Performance Using xPlore Partitioning" on the EMC Community Network (ECN). See "Building a query with query builder APIs" in Documentum System Search Development Guide.

Turning off XQuery generation to support DQL

Add the following setting to dfc.properties on the DFC client application:

    dfc.search.xquery.generation.enable=false

Debugging queries with the xDB admin tool

Some query optimization and debugging tasks use the xDB admin tool.

CAUTION: Do not use xhadmin to rebuild an index or change files that xPlore uses. This tool is not aware of xPlore configuration settings in indexserverconfig.xml. If you remove segments, your backups cannot be restored.

Figure 17. xDB admin tool

1. Navigate to dsearch_home/dsearch/xhive/admin.
2. Run the script XHAdmin.bat or XHAdmin.sh.
3. Click the connection icon to log in. The password is the same as your xPlore administrator password. After login, you see the tree in the left pane, which shows segments, users, groups, and libraries.
4. Expand the root library to find a library and a collection of interest, then highlight a particular indexed document to see its XML rendition.
5. Query a library or collection with the search icon at the top of the xDB admin client. The query window has tabs that show the results tree, debug the query, and optimize the query.

Building a query with the DFC search service

In a Documentum environment, the DFC and DFS APIs encapsulate complex functionality like stemming, wildcards, fuzzy search, hit count, and facets.

The DFC interface IDfQueryBuilder provides a programmatic interface. You can change the query structure, support external sources, support asynchronous operations, change display attributes, and perform concurrent query execution in a federation.

Building a query definition with IDfQueryBuilder

With the DFC search service, you can create queries for one or more full-text indexed or non-indexed Servers. With Federated Search Services (FS2) product, you can query external sources and the client desktop as well.

IDfQueryBuilder in the package com.documentum.fc.client.search allows you to build queries that provide the following information:

• Data to build the query
• Source list (required)
• Max result count
• Container of source names
• Transient search metadata manager bound to the query
• Transient query validation flag
• Facet definition
• Specific folder location using addLocationScope()
• Target for a particular collection using addPartitionScope()
• Specify parallel execution across several collections

For examples of query building, refer to Documentum System Search Development Guide. For information on creating and configuring facets and processing facet returns, see Facets, page 225.

Building a query with the DFS search service

In a Documentum environment, the DFC and DFS APIs encapsulate complex functionality like stemming, wildcards, fuzzy search, hit count, and facets. DFS search services provide search capabilities against EMC Documentum repositories, as well as against external sources through the Documentum Federated Search Services (FS2) server.

Searches can either be blocking or non-blocking, depending on the Search Profile setting. By default, searches are blocking. Non-blocking searches display results dynamically. Multiple successive calls can be made to get new results and the query status. The query status contains the status for each source repository, indicating if it is successful, if more results are expected, or if it failed with errors.

The cache contains the search results populated in background for every search. The cache clean-up mechanism is both time-based and size-based. You can modify the cache clean-up properties by editing the dfs-runtime.properties file.

For complete information on the DFS search service and code examples, refer to EMC Documentum Enterprise Content Services Reference.

A structured query defines a query using an object-oriented model. The query is constrained by a set of criteria contained in an ExpressionSet object. An ordered list of RepositoryScope objects define the scope of the query (the sources against which it is run). PartitionScope objects target the query to specific collections.

Building a structured query

The following example sets the repository name, folder path in the repository, whether to include subfolders, targets a specific collection, creates the full-text expression, and specifies object type.

    private Query getScopedQuery ()
    {
        StructuredQuery structuredQuery = new StructuredQuery();
        structuredQuery.addRepository(m_docbase);

        // Repository scope: folder path and whether to include subfolders
        RepositoryScope scope = new RepositoryScope();
        scope.setRepositoryName(m_docbase);
        scope.setLocationPath("/SYSTEM");
        scope.setDescend(true);
        scope.setExcluded(true);
        structuredQuery.setScopes(Arrays.asList(scope));

        // Partition scope: target a specific collection
        PartitionScope pScope = new PartitionScope();
        pScope.setRepositoryName(m_docbase);
        pScope.setPartitionName("partitionName");
        structuredQuery.setPartitionScopes(Arrays.asList(pScope));

        // Expression scope: object type and property expression
        ExpressionScope eScope = new ExpressionScope();
        eScope.setRepositoryName(m_docbase);
        eScope.setObjectType("dm_document");
        final ExpressionSet expressionSet = new ExpressionSet();
        expressionSet.addExpression(new PropertyExpression(
            "object_name", Condition.CONTAINS, new SimpleValue("test")));
        eScope.setExpressionSet(expressionSet);
        structuredQuery.setExpressionScopes(Arrays.asList(eScope));

        // Set expression
        ExpressionSet expressionSet2 = new ExpressionSet();
        expressionSet2.addExpression(new FullTextExpression("EMC"));
        structuredQuery.setRootExpressionSet(expressionSet2);
        return structuredQuery;
    }

Building a DFC XQuery

You can build XQuery expressions using DFC. However, if you are familiar with search customizations using DFC, it may be easier to use the DFC search service to build queries (Building a query with the DFC search service, page 210).

Perform the following steps to use the DFC APIs for building an XQuery:

1. Get a DFC client object and an IDfXQuery object. See Get IDfXQuery object, page 212.
2. Set the query target to xPlore. See Set the query target, page 212.
3. Create an XQuery statement. See Create the XQuery statement, page 213.
4. Set query options. See Set query options, page 213.
5. Execute the query. See Execute the query, page 214.
6. Retrieve results. See Retrieve the results, page 214.

Get IDfXQuery object

Create a new DFC client and get an IDfXQuery object:

    IDfClientX clientx = new DfClientX();
    IDfXQuery xquery = clientx.getXQuery();

Set the query target

An XQuery expression can be run against the EMC Documentum XML Store or against the xPlore server. There are two implementations of IDfXQueryTargets in the package com.documentum.xml.xquery: DfFullTextXQueryTargets for xPlore and DfStoreXQueryTargets for XML Store. The following example sets the xPlore target:

    IDfClientX clientx = new DfClientX();
    IDfXQuery xquery = clientx.getXQuery();
    IDfXQueryTargets fttarget = clientx.getXQueryTargets(IDfXQueryTargets.DF_FULLTEXT);

Create the XQuery statement

For Documentum search clients, the IDfXQuery interface in the package com.documentum.xml.xquery runs user-defined XQuery expressions against xPlore indexes. XQuery expressions submitted through the IDfXQuery interface use the xPlore native full-text security evaluation. (For Documentum clients, set security evaluation of FTDQL in the ftsearch_security_mode attribute of the dm_ftengine_config object.)

Create an XQuery expression to submit. The following example creates a query for a string in the contents:

    IDfXQuery xquery = clientx.getXQuery();
    xquery.setXQueryString("unordered(for $i in collection('/docbase1/DSS/Data') where ( ( $i/dmftdoc/dmftmetadata/*/r_creation_date[. >= '2008-12-20T08:00:00'] ) and ...");

Set query options

You can set query options in the query before calling execution. For more information on these options, refer to the javadocs for IDfXQuery in the package com.documentum.xml.xquery. The following example sets timeout, batch size, turns on caching, and saves the execution plan for debugging.

    IDfXQuery xquery = clientx.getXQuery();
    xquery.setTimeout(10000);
    xquery.setBatchSize(200);
    xquery.setCaching(true);
    xquery.setSaveExecutionPlan(true);

Options:

• Debugging:
  – Get and set client application name for logging
  – Get and set save execution plan to see how the query was executed
• Query execution:
  – Get and set result batch size. For a single batch, set to 0.
  – Get and set target collection for query
  – Get and set query text locale
  – Get and set parallel execution of queries
  – Get and set timeout in ms
• Security:
  – Get and set security filter fully qualified class name
  – Get and set security options used by the security filter
  – Get and set native security (false sets security evaluation in the Content Server)
• Results:
  – Get and set results streaming
  – Get and set results returned as XML nodes
  – Get and set spooling to a file
  – Get and set synchronization (wait for results)
  – Get and set caching
• Summaries:
  – Get and set return summary
  – Get and set return of text for summary
  – Get and set summary calculation
  – Get dynamic summary maximum threshold
  – Get and set length of summary fragments
  – Get summary security mode

Execute the query

After you have set the target, options, and XQuery in the instance of IDfXQuery, you execute, passing in the DFC session identifier and the xPlore target.

    xquery.execute(session, fttarget);

Retrieve the results

You can get the results as an input stream from the instance of IDfXQuery. You pass in the DFC session identifier of the session in which you executed the query.

    InputStream results = xquery.getInputStream(session);

Get the query execution plan

The query plan can be useful to EMC tech support for evaluating slow queries. The query plan shows which indexes were probed and the order in which they were probed. Use one of the following options to save or fetch the query plan:

• DFC query builder API
  save: IDfXQuery.setBooleanOption(IDfXQuery.FtQueryOptions.SAVE_EXECUTION_PLAN,true)
  retrieve: IDfXQuery.getExecutionPlan(session)
• iAPI
  save: apply,c,NULL,MODIFY_TRACE,SUBSYSTEM,S,fulltext,VALUE,S,ftengine
  retrieve: The query execution plan is written to dsearch.log, which is located in the logs subdirectory of the JBoss deployment directory.
• xPlore search API
  save: IDfXQuery.setSaveExecutionPlan(true)
  retrieve: IFtSearchSession.fetchExecutionPlan(requestId)

Building a query using xPlore APIs

You can build a query using xPlore XQuery APIs. If you are familiar with DFC or DFS applications, it may be easier for you to create a query using their APIs. See Building a query with the DFC search service, page 210 or Building a query with the DFS search service, page 211.

Access to indexing and search APIs is through IDSearchClient in the package com.emc.documentum.core.fulltext.client. You execute a query by calling executeQuery for the interface com.emc.documentum.core.fulltext.client.IFtSearchSession. Each API is more fully described in the javadocs and in Search APIs, page 290.

Perform the following steps to use the xPlore APIs for query building:

1. Reference the jars in the SDK dist and lib directories in your classpath. Some of the classes in the jars will be used for later examples. For more information, see Setting up the xPlore SDK, page 253.
2. Get a search session using IDSearchClient. The following example connects to the search service and creates a session.

    public void connect() throws Exception
    {
        String bootStrap = BOOT_STRAP;
        DSearchServerInfo connection = new DSearchServerInfo(m_host, m_port);
        IDSearchClient client = DSearchClient.newInstance(
            "MySearchSession", connection);
        m_session = client.createFtSearchSession(m_domain);
    }
    private String m_host = "localhost";
    private int m_port = 9300;
    private String m_domain = "DSS_LH1"; //this is the xPlore domain name
    private IFtSearchSession m_session;

3. Create an XQuery statement. The following example creates a query for a string in the contents:

    public void testQuery()
    {
        String xquery = "for $doc in doc('/DSS_LH1/dsearch/Data/default')"
            + " where $doc/dmftdoc[dmftcontents ftcontains 'strange']"
            + " return string(<R> <ID>{ string($doc/dmftdoc/dmftmetadata//r_object_id)}</ID></R>)";
        executeQuery(xquery); //see "Executing a query"
    }

4. Set query options. When you use an xPlore API to set options, the settings override the global configuration settings in the xPlore administration APIs. See the javadocs for IFtQueryOptions in the package com.emc.documentum.core.fulltext.common.search and Set the query target, page 212. Add options similar to the following, and then provide the options object to the executeQuery method of IFtSearchSession. For example:

    IFtQueryOptions options = new FtQueryOptions();
    options.setSpooled(true);

5. Set query debug options. The enumeration FtQueryDebugOptions can be used to set debug options for IDfXQuery in DFC version 6.7. To set options, use the following syntax:

    public String getDebugInfo(IDfSession session,
        FtQueryDebugOptions debugOption) throws DfException;

   For example:

    xquery.getDebugInfo(m_session, FtQueryDebugOptions.QUERY_ID);

6. Execute the query. See Execute the query, page 214.
7. Retrieve results. See Retrieve the results, page 214. The method executeQuery returns an instance of IFtQueryRequest from which you can retrieve results. The following example sets the query options, executes the query by implementing the IFtSearchSession method executeQuery, and iterates through the results, printing them to the console.

    options.setSpooled(true);
    options.setWaitForResults(true);
    requestId = m_session.executeQuery(xquery, options);
    Iterator<IFtQueryResultValue> results = m_session.getResultsIterator(
        requestId);
    while (results.hasNext())
    {
        IFtQueryResultValue r = results.next();
        //printQueryResult(r);
        System.out.println();
    }
    ...
    System.out.println("Failed to execute query");
private void executeQuery (String xquery) { String requestId = null.out. requestId = m_session. options). For example: String queryid = xquery. options. Results from IFtSearchSession. Provide the query options and XQuery statement to your instance of IFtSearchSession. See next step System.2 Administration and Development Guide . while (results.next(). System. 7. options).print("results = "). Retrieve results. } } catch (FtSearchException e) { System. similar to the following: requestId = m_session.hasNext()) { 216 EMC Documentum xPlore Version 1. options.executeQuery are returned as an instance of IFtQueryRequest from which you can retrieve results.getResultsIterator( requestId). IDfXQuery.setResultBatchSize(5). options).setAreResultsStreamed(false). Get each result value as an instance of IFtQueryResultValue by iterating over the IFtQueryRequest instance. page 214. Iterator<IFtQueryResultValue> results = m_session.executeQuery(xquery. and iterates through the results. try { IFtQueryOptions options = new FtQueryOptions(). NODE) { System. Valid values: • setApplicationAttributes(Map<String. For example.print(v.2 Administration and Development Guide 217 . true).getSelectListType(). query subscription and query warmup add context to indicate the type of query. set the application name and query type. This information is used to report subscription queries. page 243. and add your custom attributes to the application context object: IDfQueryProcessor processor = m_searchService. For information on creating reports from the audit record. Set user-defined attributes in a Map object. EMC Documentum xPlore Version 1.out.getValueAsString()). Search APIs and customization IFtQueryResultValue r = results. DFC example The following example sets the query subscription application context and application name. DfApplicationContext can set the following context: • setApplicationName(String name) • setQueryType(String type). 
The context information is available in audit events and reports.next().Type.7 SP1.getType() != IFtQuerySelectListItem. for (IFtQueryResultValue child : children) { printQueryResult(child). you can add context information to a query. } private void printQueryResult(IFtQueryResultValue v) throws FtSearchException { if (v. see Editing a report.getValue(). A Documentum client sets query context using the DFC search service or IDfXQuery. Context information is not used to execute the query.String> attributesMap). } else { List<IFtQueryResultValue> children = ( List<IFtQueryResultValue>) v. Instantiate a query process from the search service. } }} Adding context to a query With DFC 6.newQueryProcessor( queryBuilder. DFC IDfQueryProcessor method setApplicationContext(DfApplicationContext context). printQueryResult(r). a report that gets failed subscribed queries has the following XQuery expression.Search APIs and customization DfApplicationContext anApplicationContext = new DfApplicationContext(). aSetOfApplicationAttributes. For example."300"). anApplicationContext.String> aSetOfApplicationAttributes = new HashMap<String."320").blockingSearch(60000). IDfResultsSet results = processor. ftcontains "xplore" with stemming using stop words default)] return string(<R>{$i//r_object_id}</R>)]]></QUERY> <USER_NAME>unknown</USER_NAME> <IS_SUPER_USER/> <application_context> <app_name>QBS</app_name> <app_data> <attr name="subscriptionid" value="080f444580029954"/> <attr name="frequency" value="300"/> <attr name="range" value="320"/> </app_data> </application_context> </event> The event data is used to create a report.setApplicationName("QBS"). anApplicationContext. 
This expression gets queries for which the app_name is QBS and the queries are not executed: let $lib :=’/SystemData/AuditDB/PrimaryDsearch/’ let $failingQueries := collection($lib)//event[name=’AUTO_QUERY’ and application_context[app_name=’QBS’ and app_data[attr[ @name=’frequency’]/@value < attr[@name=’range’]/@value]]]/QUERY_ID return $failingQueries The result of this XQuery is the following: <QUERY_ID>PrimaryDsearch$706f93fa-e382-499c-b41a-239ae800da96 </QUERY_ID> 218 EMC Documentum xPlore Version 1. anApplicationContext.put("frequency".String>().put("range". processor.2 Administration and Development Guide . Map<String.setQueryType("AUTO_QUERY"). The context is serialized to the audit record as follows: <event name="AUTO_QUERY" component="search" timestamp=" 2011-07-26T14:00:18-0700"> <QUERY_ID>PrimaryDsearch$706f93fa-e382-499c-b41a-239ae800da96 </QUERY_ID> <QUERY> <![CDATA[for $i in collection(’/yttestenv/dsearch/Data’)/dmftdoc[( dmftmetadata//a_is_hidden = "false") and (dmftversions/iscurrent = " true") and (.setApplicationAttributes( aSetOfApplicationAttributes). aSetOfApplicationAttributes.setApplicationContext(anApplicationContext). fulltext. Call setQueryType(FtQueryType queryType) with the FtQueryType enum. set the API and compare performance with a sequential query (the default).fulltext.enable = false You can also use one of the following APIs to execute a query across several collections in parallel: • DFC API: IDfXQuery FTQueryOptions.common.PARALLEL_EXECUTION • xPlore API: IFtQueryOptions.search. and minLevelValue and maxLevelValue are 2: ’using thesaurus at ’thesaurusURI’ relationship ’RT’ exactly 2 levels’ In the following example.documentum. ’using thesaurus at ’thesaurusURI’ relationship ’BT’ at most 2 levels’ EMC Documentum xPlore Version 1. Custom access to a thesaurus Thesaurus expansion of queries is supported without customization.properties: dfc.documentum. 
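The report predicate can be tried outside xPlore before it is wired into a report. The following is a minimal sketch, not the xPlore reporting mechanism: it uses only the JDK XPath API against a trimmed copy of the serialized event. The class name AuditContextCheck and its helper methods are hypothetical, and the sketch assumes that the event name is tested as an attribute (@name), because that is how the event is serialized in the audit record.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class AuditContextCheck {

    // Builds a trimmed copy of the serialized audit event; only the elements
    // that the report predicate reads are kept.
    static String event(String frequency, String range) {
        return "<event name=\"AUTO_QUERY\" component=\"search\">"
             + "<QUERY_ID>PrimaryDsearch$706f93fa-e382-499c-b41a-239ae800da96</QUERY_ID>"
             + "<application_context><app_name>QBS</app_name><app_data>"
             + "<attr name=\"frequency\" value=\"" + frequency + "\"/>"
             + "<attr name=\"range\" value=\"" + range + "\"/>"
             + "</app_data></application_context></event>";
    }

    // XPath 1.0 rendering of the report predicate. Assumption: the event name
    // is tested as an attribute (@name), matching the serialized form.
    static final String REPORT =
          "//event[@name='AUTO_QUERY' and application_context[app_name='QBS' and "
        + "app_data[attr[@name='frequency']/@value < attr[@name='range']/@value]]]"
        + "/QUERY_ID";

    // Returns the matching QUERY_ID text, or an empty string when nothing matches.
    static String failingQueryId(String xml) {
        try {
            return (String) XPathFactory.newInstance().newXPath().evaluate(
                    REPORT, new InputSource(new StringReader(xml)), XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // frequency 300 is less than range 320, so the event is selected
        System.out.println(failingQueryId(event("300", "320")));
        // frequency 400 is not less than range 320, so nothing is selected
        System.out.println(failingQueryId(event("400", "320")).isEmpty());
    }
}
```

The comparison works because XPath converts both attribute values to numbers before applying the less-than operator, so the custom attributes can stay as strings in the application context map.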
IDfXQuery

Use the API FtQueryOptions in the package com.emc.documentum.core.fulltext.common.search. Call setApplicationName(String applicationName) to log the name of the search client application, for example, webtop. Call setQueryType(FtQueryType queryType) with the FtQueryType enum.

Using parallel queries

You can enable parallel queries in DFC by setting the following property to true in dfc.properties:

    dfc.search.xquery.option.parallel_execution.enable = false

You can also use one of the following APIs to execute a query across several collections in parallel:
• DFC API: IDfXQuery FTQueryOptions.PARALLEL_EXECUTION
• xPlore API: IFtQueryOptions.setParallelExecution(true)

Parallel queries are not supported in DQL.

CAUTION: Parallel queries may not perform better than a query that probes each collection in sequence. To probe all collections in parallel, set the API and compare performance with a sequential query (the default).

Custom access to a thesaurus

Thesaurus expansion of queries is supported without customization. You can add custom access to a thesaurus that does not conform to the SKOS format. For example, you could add the Basistech Name Indexer to match people, places, or organizations.

To customize access, you must create a custom thesaurus class that implements IFtThesaurusHandler in the com.emc.documentum.core.fulltext.common.search package. Implement getTermsFromThesaurus(), which returns a collection of string terms from the thesaurus:

    public Collection<String> getTermsFromThesaurus(Collection<String> terms, String relationship, int minLevelValue, int maxLevelValue);

Use the input terms from the query to probe the thesaurus. Multi-term queries result in multiple calls to this method.

You can use the optional XQuery relationship and levels parameters of FTThesaurusOption to specify special processing. For information on these parameters, see FTThesaurusOption. In the following example, the relationship value is "RT" (related term), and minLevelValue and maxLevelValue are 2:

    'using thesaurus at 'thesaurusURI' relationship 'RT' exactly 2 levels'

In the following example, the relationship is "BT" (broader term), minLevelValue is Integer.MIN_VALUE, and maxLevelValue is 2.

    'using thesaurus at 'thesaurusURI' relationship 'BT' at most 2 levels'
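To make the relationship and level parameters concrete, the following sketch shows one way a handler might honor them. It is an illustration under stated assumptions, not the SDK implementation: the nested ThesaurusHandler interface is a hypothetical stand-in that only mirrors the getTermsFromThesaurus signature, the term hierarchy is invented, and the reading of a "level" as one step up a broader-term chain is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LevelAwareThesaurusSketch {

    // Hypothetical stand-in that mirrors the getTermsFromThesaurus signature;
    // the real IFtThesaurusHandler ships in the xPlore SDK.
    interface ThesaurusHandler {
        Collection<String> getTermsFromThesaurus(Collection<String> terms,
                String relationship, int minLevelValue, int maxLevelValue);
    }

    // Toy handler: every term has at most one broader term, so one map lookup
    // climbs exactly one level. The map must not contain cycles.
    static class BroaderTermHandler implements ThesaurusHandler {
        private final Map<String, String> broader = new HashMap<String, String>();

        BroaderTermHandler() {
            broader.put("poodle", "dog");
            broader.put("dog", "mammal");
            broader.put("mammal", "animal");
        }

        @Override
        public Collection<String> getTermsFromThesaurus(Collection<String> terms,
                String relationship, int minLevelValue, int maxLevelValue) {
            List<String> result = new ArrayList<String>();
            if (!"BT".equals(relationship)) {
                return result; // this sketch only knows broader terms
            }
            for (String term : terms) {
                String current = broader.get(term);
                for (int level = 1; current != null && level <= maxLevelValue; level++) {
                    if (level >= minLevelValue) {
                        result.add(current); // level lies inside [min, max]
                    }
                    current = broader.get(current);
                }
            }
            return result;
        }
    }

    // Convenience wrapper so the handler can be called with plain strings.
    static String expand(String relationship, int min, int max, String... terms) {
        return new BroaderTermHandler()
                .getTermsFromThesaurus(Arrays.asList(terms), relationship, min, max)
                .toString();
    }

    public static void main(String[] args) {
        // 'BT' at most 2 levels: minLevelValue is Integer.MIN_VALUE, maxLevelValue is 2
        System.out.println(expand("BT", Integer.MIN_VALUE, 2, "poodle"));
        // 'BT' exactly 2 levels: both bounds are 2
        System.out.println(expand("BT", 2, 2, "poodle"));
    }
}
```

Because a multi-term query calls the method once per term group, a handler written this way needs no state between calls, which also keeps it safe when several search threads share one instance.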
Setting up a custom thesaurus

Perform the following steps:
• Implement a class that implements the IFtThesaurusHandler interface. See FASTThesaurusHandler.java for an example. This file is provided in the SDK at the following path: /samples/src/com/emc/documentum/core/fulltext/common/search/impl. A sample FAST thesaurus is provided at /samples/thesaurus.
  Note: Only one instance of the class is created on startup, so multiple search threads share the class. Avoid thread synchronization issues or use thread local storage.
• Compile the custom class. Include dsearch-client.jar in your classpath when you compile. For example:

    javac -cp dsearch-client.jar com\emc\documentum\core\fulltext\common\search\impl\SimpleThesaurusHandler.java

• Package the class in a jar file and put it into the library dsearch_home/jboss5.1.0/server/DctmServer_PrimaryDsearch/deploy/dsearch.war/WEB-INF/lib. The path in the jar file must match the package name. For example:

    jar cvf com\emc\documentum\core\fulltext\common\search\impl\dsearch-thesaurus.jar com\emc\documentum\core\fulltext\common\search\impl\SimpleThesaurusHandler.class

• Modify indexserverconfig.xml to specify the custom thesaurus. Define a new thesaurus element under the domain that will use the custom thesaurus. The following example indicates a thesaurus URI to a custom-defined class. When a query specifies this URI, the custom class is used to retrieve related terms. Restart the xPlore instances after making this change.

    <domain storage-location-name="default" default-document-category="dftxml" name=... >
      <collection ... >
      ...
      <thesaurus uri="my_thesaurus" class-name="com.emc.documentum.core.fulltext.common.search.impl.FASTThesaurusHandler"/>
    </domain>

Accessing the custom thesaurus in a query

You can specify custom thesaurus access in a DFC or IDfXQuery:
• DFC: setThesaurusLibrary(String uri).
• IDfXQuery: Add a thesaurus option. Use the URI that you defined in indexserverconfig.xml. See FTThesaurusOption. For example:

    for $i score $s in collection('/testenv/dsearch/Data')/dmftdoc[. ftcontains 'food products' using thesaurus default] order by $s descending return $i/dmftinternal/r_object_id

• DQL: Use the ft_use_thesaurus_library hint.

You can access one thesaurus for full-text and one thesaurus for metadata. For example, you may have a metadata thesaurus that lists various forms of company names. The following example uses the default thesaurus to expand the full-text lookup and a metadata thesaurus to expand the metadata lookup:

    IDfExpressionSet rootSet = queryBuilder.getRootExpressionSet();
    //full-text expression uses default thesaurus
    IDfFullTextExpression aFullTextExpression = rootSet.addFullTextExpression(fulltextValue);
    aFullTextExpression.setThesaurusSearchEnabled(true);
    //simple attribute expression uses custom metadata thesaurus
    IDfSimpleAttrExpression aMetadataExpression = rootSet.addSimpleAttrExpression("companyname", IDfValue.DF_STRING, IDfSimpleAttrExpression.SEARCH_OP_CONTAINS, false, false, companyNameValue);
    aMetadataExpression.setThesaurusSearchEnabled(true);
    aMetadataExpression.setThesaurusLibrary("http://search.emc.com/metadatathesaurus");

Sample thesaurus handler class

The FASTThesaurusHandler class is a sample implementation of the IFtThesaurusHandler interface. (This class is included in the xPlore SDK.) When the class is instantiated by xPlore, it reads a FAST dictionary file and stores the term mappings into memory. This results in quick lookups to return related words from the FASTThesaurusHandler class during query execution. A sample FAST thesaurus is provided at /samples/thesaurus. This file is provided in the SDK at the following path: /samples/src/com/emc/documentum/core/fulltext/common/search/impl.

    package com.emc.documentum.core.fulltext.common.search.impl;

    import com.emc.documentum.core.fulltext.common.search.IFtThesaurusHandler;
    import java.io.*;
    import java.util.*;

    public class FASTThesaurusHandler implements IFtThesaurusHandler {
        public Collection<String> getTermsFromThesaurus(Collection<String> terms, String relationship, int minLevelValue, int maxLevelValue) {
            Collection<String> result = new ArrayList<String>();
            Iterator<String> termIterator = terms.iterator();
            while (termIterator.hasNext()) {
                String key = termIterator.next();
                if (s_thesaurus.containsKey(key))
                    result.addAll(s_thesaurus.get(key));
            }
            return result;
        }

        private static Map<String, Collection<String>> s_thesaurus = new HashMap<String, Collection<String>>();

        static {
            try {
                String location = System.getenv("DOCUMENTUM") + "/DocumentumThesaurusFAST.txt";
                FileInputStream fstream = new FileInputStream(location);
                DataInputStream in = new DataInputStream(fstream);
                BufferedReader br = new BufferedReader(new InputStreamReader(in));
                String line;
                while ((line = br.readLine()) != null) {
                    if (line.length() > 0) {
                        String[] mapping = line.split("=", 2);
                        String key = mapping[0];
                        String related = mapping[1];
                        // do some format checking
                        if (key.length() < 1) continue;
                        // related should have at least "[?]", where ? is any character
                        if (related.length() < 3) continue;
                        if (related.charAt(0) != '[' || related.charAt(related.length()-1) != ']') continue;
                        related = related.substring(1, related.length()-1);
                        String relatedTerms[] = related.split(",");
                        Collection<String> terms = new ArrayList<String>(relatedTerms.length);
                        for (String term : relatedTerms) {
                            terms.add(term);
                        }
                        s_thesaurus.put(key, terms);
                    }
                }
            } catch (FileNotFoundException e) {
                System.out.println("FileNotFoundException while loading FAST Thesaurus: " + e.getMessage());
            } catch (IOException e) {
                System.out.println("IOException while loading FAST Thesaurus: " + e.getMessage());
            }
        }
    }

DQL Processing

The DFC and DFS search services by default generate XQuery expressions, not DQL. You can turn off XQuery generation in dfc.properties so that DQL is generated and hints are applied. Do not turn off XQuery generation if you want xPlore capabilities. To turn off XQuery generation, add the following line to dfc.properties in Documentum_home/config:

    dfc.search.xquery.generation.disable

If query constraints conform to FTDQL, the query is evaluated in the full-text index. If all or part of the query does not conform to FTDQL, only the SDC portion is evaluated in the full-text index. All metadata constraints are evaluated in the Content Server database, and the results are combined.

Unsupported DQL

xPlore does not support the DQL SEARCH TOPIC clause or pass-through DQL.

Comparing DQL and XQuery results

DQL and XQuery results can be compared by testing a query with the DFC search service. First, run an XQuery expression (default). Next, turn off XQuery generation and generate DQL. Sometimes you see different results if you query for metadata on an object that has not yet been indexed. A query in DQL returns results directly from the repository. A query in XQuery does not return results until xPlore has updated the index.

DQL hints migration

The following table lists the DQL hint support in xPlore. Consequently, for xPlore, DQL hints in a hints file are not applied. For a full description of the referenced DQL hints, see "Including DQL hints" in Documentum Content Server DQL Reference Manual.

Table 29 DQL hint migration for xPlore

DQL | xPlore
RETURN TOP N | Not needed by xPlore. DFC search service asks for the exact number of results.
FT_CONTAIN_FRAGMENT | Default is FT_CONTAIN_WORD. Modify ft_engine_config to support fragments. See Configuring full-text wildcard (fragment) support.
ENABLE(dm_fulltext('qtf_lemmatize=0|1')) | Turn on DQL generation in dfc.properties or turn off lemmatization in xPlore indexserverconfig.xml.
FOR READ/BROWSE/DELETE/MODIFY option | Turn on DQL generation in dfc.properties. (Not a hint, but used to configure queries in DQL hint manager.)
FT_COLLECTION | Use DFC query builder addPartitionScope() or turn on DQL generation
TRY_FTDQL_FIRST, NOFTDQL | Turn on DQL generation in dfc.properties.
FTDQL | Use default XQuery generation unless other hints are added
ROW_BASED or database-specific hints | No equivalent
All other hints | Turn on DQL generation in dfc.properties.
Chapter 10 Facets

This chapter contains the following topics:
• About Facets
• Configuring facets in xPlore
• Creating a DFC facet definition
• Facet datatypes
• Creating a DFS facet definition
• Tuning facets
• Logging facets
• Troubleshooting facets

About Facets

Faceted search, also called guided navigation, allows users to explore large datasets to locate items of interest. You can define facets for the attributes that are used most commonly for search. Facets are presented in a visual interface, removing the need to write explicit queries and avoiding queries that do not return desired results. After facets are computed and the results of the initial query are presented in facets, the user can drill down to areas of interest. For drilldown, the query is reissued for the selected facets.

A facet represents one or more important characteristics of an object, represented by one or more object attributes in the Documentum object model. Multiple attributes can be used to compute a facet, for example, r_modifier or keywords. Facets are computed on discrete values, for example, authors, categories, tags, and date or numeric ranges. Facets are not computed on text fields such as content or object name. Facet results are not localized; the client application must provide localization.

Some facets are already configured by default. See Configuring facets in xPlore. Before you create facets, create indexes on the facet attributes.

Faceted navigation permits the user to explore data in a large dataset. It has several advantages over a keyword search or explicit query:
• The user can explore an unknown dataset by restricting values suggested by the search service.
• The data set is presented in a visual interface, so that the user can drill down rather than constructing a query in a complicated UI.
• Faceted navigation prevents dead-end queries by limiting the restriction values to results that are not empty.

Facets and security

Facets restrict results based on the xPlore security filter, so that users see only those documents for which they have permission. If security is evaluated in the Content Server and not in xPlore, facets are disabled.

API overview

Your search client application can define a facet using the DFC query builder API or DFS search service. For information on using the DFC query builder API, see Building a query with the DFC search service. For information on using the DFS search service, see Building a query with the DFS search service. The APIs that perform these operations are described fully in the following topics.

Facets are computed in the following process.
1. DFC or DFS search service evaluates the constraints and returns an iterator over the results.
2. Search service reads through the results iterator until the number of results specified in query-max-result-size has been read (default: 10000).
3. For each result, get the attribute values and increment the corresponding facet values. Subpath indexes speed this lookup, because the values are found in the index, not in the xDB pages.
4. Perform the following on the list of all facet values:
   a. Order the facet values.
   b. Keep only the top facet values according to setMax (DFC) or setMaxFacetValues (DFS). Default: 10.
   c. Return the facets values and top results.

Configuring facets in xPlore

Facets are configured in indexserverconfig.xml. Your DFC-based application must also define the facet using query builder APIs. See Building a query with the DFC search service.

Preconfigured facets in xPlore

The following facets are configured in indexserverconfig.xml. (They are configured in a subpath element whose returning-contents attribute is set to true.) Do not add them to the application.
• a_application_type (Documentum application)
• a_content_type (file format)
• a_gov_room_id (room ID)
• acl_domain (permission set owner)
• acl_name (permission set)
• city (CenterStage property)
• company (CenterStage property)
• continent (CenterStage property)
• country (CenterStage property)
• keywords (user-defined list of keywords)
• location (CenterStage property)
• owner_name (document owner)
• person (CenterStage property)
• r_full_content_size (document size)
• r_modifier (last person who modified the document)
• r_modify_date (last modification date)
• r_object_type (object type)
• state (CenterStage property)

Configuring your own facets

For each attribute that is used as a facet, configure a subpath in indexserverconfig.xml. This file is located in dsearch_home/config. Stop all instances in the xPlore system before modifying this file. Every subpath whose returning-contents attribute is set to true can be used to compute facets. In the following excerpt from indexserverconfig.xml, r_modify_date is available for use as a facet:

    <path-value-index path=...>
      ...
      <sub-path returning-contents="true" value-index="true" tokenized="true" position-index="false" type="string" path="dmftmetadata//r_modify_date"/>
    </path-value-index>

For more information on subpath configuration, see Subpaths.

Configuring the attribute datatype for facets

Configure a subpath for each typed Documentum attribute that is indexed. A subpath applies to the element node in DFTXML and its descendants. A subpath element has the following schema definition:

    <xs:complexType name="sub-path">
      <xs:complexContent>
        <xs:extension base="base-config">
          <xs:attribute name="path" type="xs:string" use="required"/>
          <xs:attribute name="type" use="optional" default="string">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:enumeration value="string"/>
                <xs:enumeration value="integer"/>
                <xs:enumeration value="boolean"/>
                <xs:enumeration value="double"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
          <xs:attribute name="position-index" type="xs:boolean" use="optional" default="false"/>
          <xs:attribute name="tokenized" type="xs:boolean" use="optional" default="true"/>
          <xs:attribute name="value-index" type="xs:boolean" use="optional" default="false"/>
          <xs:attribute name="returning-contents" type="xs:boolean" use="optional" default="false"/>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>

Configuring a custom attribute to support faceted search

In the following example, a custom attribute has been configured as a subpath to support facets:

    <sub-path returning-contents="true" value-index="true" tokenized="true" position-index="false" type="string" path="dmftmetadata//mycustomattribute"/>

Creating a DFC facet definition

The class DfFacetDefinition in the DFC package com.documentum.fc.client.search represents a facet. You can set the following optional parameters using class methods. The valid values for these methods are dependent on the datatype of the facet (underlying attribute). See Facet datatypes.
• setGroupBy(String): Group by
• setMax(int): Maximum number of facet values for the facet. Default: 10. A value of -1 specifies unlimited number of facet values.
• setName(String): Name of facet

The IDfQueryBuilder API setMaxResultsForFacets() limits the overall number of query results that are used to compute facets. This setting controls the maximum number of results that are used to compute all facets in a single query. (The facet definition setMax API controls the size of the output in facets computation.) For example, if a user has an average of 5 documents, a setMaxResultsForFacets value of 50 could return the top 10 users. The property query-facet-max-result-size in indexserverconfig.xml is equivalent to the API setMaxResultsForFacets.

Sample facet definition

The DFC query builder interface IDfQueryBuilder creates facets with the addFacet(FacetDefinition) method. In the following example, a facet is created for the attribute r_modifier. First, set up session variables, and substitute values appropriate for your repository and instance owner:
    private static final String DOCBASE = "DSS";
    private static final String USER = "dmadmin";
    private static final String PASSWORD = "dmadmin";
    private static final long SEARCH_TIMEOUT = 180000;

Get a session and instantiate the search service and query builder:

    IDfClient client = DfClient.getLocalClient();
    IDfSessionManager m_sessionManager = client.newSessionManager();
    DfLoginInfo identity = new DfLoginInfo(USER, PASSWORD);
    m_sessionManager.setIdentity(DOCBASE, identity);
    IDfSearchService m_searchService = client.newSearchService(m_sessionManager, DOCBASE);
    IDfQueryManager queryManager = m_searchService.newQueryMgr();
    IDfQueryBuilder queryBuilder = queryManager.newQueryBuilder("dm_sysobject");

Next, add the selected source and desired results:

    queryBuilder.addSelectedSource(DOCBASE);
    queryBuilder.addResultAttribute("r_object_id");
    queryBuilder.addResultAttribute("object_name");

Start building the root expression set by adding the result attributes:

    IDfExpressionSet exprSet = queryBuilder.getRootExpressionSet();
    final String DATE_FORMAT = "yyyy-MM-dd'T'HH:mm:ss";
    queryBuilder.setDateFormat(DATE_FORMAT);
    exprSet.addSimpleAttrExpression("r_modify_date", IDfAttr.DM_TIME, IDfSimpleAttrExpression.SEARCH_OP_GREATER_EQUAL, false, false, "1980-01-01T00:00:00");
    exprSet.addSimpleAttrExpression("r_modify_date", IDfAttr.DM_TIME, IDfSimpleAttrExpression.SEARCH_OP_LESS_EQUAL, false, false, "2010-01-01T00:00:00");

The previous code builds a query without facets. Now for the facets definition that defines a facet for person who last modified the document:

    DfFacetDefinition definitionModifier = new DfFacetDefinition("r_modifier");
    queryBuilder.addFacetDefinition(definitionModifier);

Another facet definition adds the last modification date and sets some type-specific options for the date:

    DfFacetDefinition definitionDate = new DfFacetDefinition("r_modify_date");
    definitionDate.setMax(-1);
    definitionDate.setGroupBy("year");
    queryBuilder.addFacetDefinition(definitionDate);

A subpath definition in indexserverconfig.xml defines a facet for keywords as follows:

    <sub-path ... path="dmftmetadata//keywords"/>

Keywords facet:

    DfFacetDefinition definitionKeywords = new DfFacetDefinition("keywords");
    queryBuilder.addFacetDefinition(definitionKeywords);

To submit the query and process the results, instantiate IDfQueryProcessor, which is described in the following topic.

Getting facet values from IDfQueryProcessor

The IDfQueryProcessor method getFacets() provides facets results. First, instantiate the processor and launch the search:

    IDfQueryProcessor processor = m_searchService.newQueryProcessor(queryBuilder, true);
    try {
        processor.blockingSearch(SEARCH_TIMEOUT);
    } catch (Exception e) {
        e.printStackTrace();
    }

For debugging, you can check the query status:

    System.out.println("processor.getQueryStatus() = " + processor.getQueryStatus());
    System.out.println("processor.getQueryStatus().getHistory() = " + processor.getQueryStatus().getHistory());

Get the non-facets results by calling getResults:

    IDfResultsSet results = processor.getResults();
    for (int i = 0; i < results.size(); i++) {
        IDfResultEntry result = results.getResultAt(i);
        System.out.println(result.getId("r_object_id") + " = " + (result.hasAttr("object_name") ? result.getString("object_name") : "no title"));
    }

Get the facets results by calling getFacets:

    List<IDfFacet> facets = processor.getFacets();
    for (IDfFacet facet : facets) {
        System.out.println("--- Facet: " + facet.getDefinition().getName());
        List<IDfFacetValue> values = facet.getValues();
        for (IDfFacetValue value : values) {
            System.out.println("value = " + value);
        }
    }

Note: When several repositories return facets, the facets are merged. The merged results conform to the facet definition of maximum number of results and sort order. If you call IDfQueryProcessor.getFacets() before all sources finish query execution, the results can differ from final results.
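The note about merged facets can be pictured with a small stand-alone sketch. It is not the DFC merge code: the class FacetMergeSketch, its methods, and the sample counts are hypothetical, and the sketch assumes that merging means summing the count of each value across repositories before the maximum and a frequency sort order are applied.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetMergeSketch {

    // Sums the counts reported by each repository for one facet, orders the
    // values by descending frequency, and keeps at most max values
    // (-1 means unlimited, as with setMax).
    static List<String> merge(List<Map<String, Integer>> perRepository, int max) {
        final Map<String, Integer> totals = new TreeMap<String, Integer>();
        for (Map<String, Integer> counts : perRepository) {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                Integer seen = totals.get(e.getKey());
                totals.put(e.getKey(), seen == null ? e.getValue() : seen + e.getValue());
            }
        }
        List<String> values = new ArrayList<String>(totals.keySet());
        // List.sort is stable, so ties keep the TreeMap's alphabetical order.
        values.sort((a, b) -> Integer.compare(totals.get(b), totals.get(a)));
        List<String> merged = new ArrayList<String>();
        for (String value : values) {
            if (max >= 0 && merged.size() == max) {
                break; // the facet definition's maximum has been reached
            }
            merged.add(value + "=" + totals.get(value));
        }
        return merged;
    }

    // Two hypothetical repositories reporting an r_modifier facet.
    static String demo(int max) {
        Map<String, Integer> repoA = new HashMap<String, Integer>();
        repoA.put("user1", 3);
        repoA.put("user2", 5);
        Map<String, Integer> repoB = new HashMap<String, Integer>();
        repoB.put("user2", 3);
        repoB.put("user3", 1);
        List<Map<String, Integer>> all = new ArrayList<Map<String, Integer>>();
        all.add(repoA);
        all.add(repoB);
        return merge(all, max).toString();
    }

    public static void main(String[] args) {
        System.out.println(demo(2));   // top two merged values
        System.out.println(demo(-1));  // unlimited
    }
}
```

The sketch also shows why partial facets can differ from final ones: until the second repository reports, user2 has a count of 5 instead of 8, and a value that only the late source returns is missing altogether.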
Facet datatypes
Each facet datatype requires a different grouping strategy. You can set the following parameters for each datatype in the specified DfFacetDefinition method (DFC) or FacetDefinition object (DFS). Facets are returned as IDfFacetValue. Three main facet datatypes are supported: string, date, and numeric.

String facet datatype
A string type facet accepts the following parameters:
• Set the maximum number of facet values. For example, a facet for r_modifier with a maximum of two returns only two values. Results of documents modified by the first two modifiers are returned, but results for additional modifiers are not returned. A value of -1 specifies unlimited results. Default: 10.
DFC: setMax(Integer max)
DFS: setMaxFacetValues(int maxFacetValues)
• Set the sort order. Values of the ORDER enum (DFC) and FacetSort field: FREQUENCY (default) | VALUE_ASCENDING | VALUE_DESCENDING | NONE
DFC: setOrderBy(ORDER orderby)
DFS: FacetSort object.
Following is an example of the XML representation of returned facet string values:
<facet name='r_modifier'>
<elem count='5' value='user2'/>
<elem count='3' value='user1'/>
</facet>

Date facet datatype
A date type facet accepts the following parameters:
• Set the grouping strategy. Valid values are: day, week, month, quarter, year, relativeDate (Microsoft Outlook style). Default: none (treated as string).
DFC and DFS: setGroupBy(String groupBy)
• Set the maximum number of facet values. For example, a facet for r_modify_date with a maximum of two returns only two values. Results of documents modified for the first two modification dates are returned, but results for additional dates are not returned. When no value for max is specified, the most recent facets are returned first. A value of -1 specifies unlimited results. Default: 10.
DFC: setMax(Integer max)
DFS: setMaxFacetValues(Integer maxFacetValues)
• Set the sort order. Values of the ORDER enum (DFC) and FacetSort field: FREQUENCY (default) | VALUE_ASCENDING | VALUE_DESCENDING | NONE. A relativeDate facet must set the order by parameter to NONE.
DFC: setOrderBy(ORDER orderby)
DFS: FacetSort object.
• Set the time zone. Valid values for client time zone are expressed in UTC (relative to GMT), for example, GMT+10. To set the time zone in DFC, call setProperty(String timezone) for the facet definition. To set the time zone in DFS, call setProperties(PropertySet set) for a FacetDefinition with a Property having a UTC value.
Following is an example of the XML representation of returned facet date values:
<facet name='r_modify_date'>
<elem count='5' value='2000-05-04T00:00:00'>
<prop name="lowerbound">2000-05-04T00:00:00</prop>
<prop name="upperbound">2000-05-05T00:00:00</prop>
</elem>
<elem count='3' value='2000-05-03T00:00:00'>
<prop name="lowerbound">2000-05-03T00:00:00</prop>
<prop name="upperbound">2000-05-04T00:00:00</prop>
</elem>
</facet>

Numeric facet datatype
A numeric type facet accepts the following parameters:
• Set the maximum number of facet values. For example, a facet in DFC for r_content_size with setMax(2) has only two values. A value of -1 specifies unlimited results. Default: 10.
DFC: setMax(Integer max)
DFS: setMaxFacetValues(int maxFacetValues)
• Set the sort order. Values of the ORDER enum (DFC) and FacetSort field: FREQUENCY (default) | VALUE_ASCENDING | VALUE_DESCENDING | NONE. A range must set the order by parameter to NONE.
DFC: setOrderBy(ORDER orderby)
DFS: FacetSort object.
• Group results in range order. Valid values are a comma-separated list of ranges. Separate upper and lower bounds by a colon. A range can be unbounded, for example: '0:9,10:100,100:1000,1000:10000,10000:'. To define the range, call setProperty(String range) for the facet definition (DFC) or call setProperties(PropertySet set) for a FacetDefinition with a Property (DFS).
DFC and DFS: setGroupBy(String groupBy)
The following is an example of the XML representation of returned facet numeric values for range=0:10,10:100,100:
<facet name='r_full_content_size'>
<elem count='5' value='0:10'>
<prop name='lowerbound'>0</prop>
<prop name='upperbound'>10</prop>
</elem>
<elem count='3' value='10:100'>
<prop name='lowerbound'>10</prop>
<prop name='upperbound'>100</prop>
</elem>
<elem count='0' value='100:'>
<prop name='lowerbound'>100</prop>
</elem>
</facet>
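The numeric range syntax (comma-separated ranges, bounds separated by a colon, an empty bound meaning unbounded) can be checked with a short standalone parser. This is only an illustrative sketch: FacetRanges, Range, and parse are hypothetical names, the code does not touch DFC or DFS, and it deliberately says nothing about whether a boundary value such as 10 belongs to 0:10 or to 10:100, because the text above does not specify that.

```java
import java.util.ArrayList;
import java.util.List;

public class FacetRanges {

    /** One parsed range; a null bound means the range is open on that side. */
    static final class Range {
        final Long lower;
        final Long upper;

        Range(Long lower, Long upper) {
            this.lower = lower;
            this.upper = upper;
        }

        @Override
        public String toString() {
            return "lowerbound=" + lower + ", upperbound=" + upper;
        }
    }

    /** Parses a specification such as "0:10,10:100,100:" into ranges. */
    static List<Range> parse(String spec) {
        List<Range> ranges = new ArrayList<>();
        for (String part : spec.split(",")) {
            // limit -1 keeps the trailing empty string of an unbounded range
            String[] bounds = part.trim().split(":", -1);
            if (bounds.length != 2) {
                throw new IllegalArgumentException("Bad range: " + part);
            }
            Long lower = bounds[0].isEmpty() ? null : Long.valueOf(bounds[0]);
            Long upper = bounds[1].isEmpty() ? null : Long.valueOf(bounds[1]);
            ranges.add(new Range(lower, upper));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Same specification as the XML example: three ranges, the last unbounded.
        for (Range r : parse("0:10,10:100,100:")) {
            System.out.println(r);
        }
    }
}
```

The three parsed ranges correspond to the three elem entries in the XML example, where the last entry carries only a lowerbound property.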
Creating a DFS facet definition
You can use the DFS data model to create facets in a structured query. The following topics describe facet objects and their place in a structured query.

FacetValue
A FacetValue object groups results that have attribute values in common. The FacetValue has a label and count for number of results in the group. For example, a facet on the attribute r_modifier could have these values, with count in parentheses:
Tom Terrific (3)
Mighty Mouse (5)
A FacetValue object can also contain a list of subfacet values and a set of custom properties. For example, a facet on the date attribute r_modify_date has a value of a month (November). The facet has subfacet values of weeks in the specific month (Week from 11/01 to 11/08). xPlore computes the facet, subfacet, and custom property values.

FacetDefinition
A FacetDefinition object contains the information used by xPlore to build facet values. The facet name is required. If no attributes are specified, the name is used as the attribute. A facet definition can hold a subfacet definition.
FacetSort is an enumeration that specifies the sort order for facet values. It is a field of the FacetDefinition object. The possible sort orders include the following: FREQUENCY (default) | VALUE_ASCENDING | VALUE_DESCENDING | NONE. A date facet must set the sort order to NONE.

Adding facets to a structured query
Two fields in a StructuredQuery object relate to facets:
• List of facet definitions, returned by getFacetDefinitions and set by setFacetDefinition.
• Number of query results used by xPlore to compute the facets in a query, returned by getMaxResultsForFacets and set by setMaxResultsForFacets.
See EMC Documentum Enterprise Content Services for more information on Query, StructuredQuery, QueryExecution, and SearchProfile.

Facet results
A Facet object holds a list of facet values that xPlore builds. A QueryFacet object contains a list of facets that have been computed for a query as well as the query ID and QueryStatus. This object is like a QueryResult object.
The getFacets method of the SearchService object calculates facets on the entire set of query results for a specified Query. A call to getFacets returns a QueryFacet object. The method has the following signature:
public QueryFacet getFacets(Query query, QueryExecution execution, OperationOptions options) throws SearchServiceException
This method executes synchronously by default. The OperationOptions object contains an optional SearchProfile object that specifies whether the call is blocking. For a query on several repositories that support facets, the client application can retrieve facets asynchronously by specifying a SearchProfile object as the OperationOptions parameter.
Facet definitions must be specified when the query is first executed. You can call this method after a call to execute, using the same Query and queryId. Paging information in QueryExecution has no impact on the facets calculation.
Refer to EMC Documentum Enterprise Content Services for more information on Query, StructuredQuery, QueryExecution, and SearchProfile.

Paging facet results
Results can be retrieved from a specified index, page by page, as specified in the QueryExecution object: QueryExecution(startingIndex, maxResultCount, maxResultsPerSource). Paging of results is a new feature that FAST indexing did not support. FAST returned only 350 results (configurable), whereas xPlore paging can support higher numbers of results.
The following results are obtained from a result set of 5000 with the specified paging parameters:
• (0, 25, 150): Gets the first page. Search retrieves 150 results but only the first 25 are returned. Results from 0 to 150 are cached. QueryStatus returns a hitCount of 5000.
• (25, 25, 150): Gets the next page. If the next page is no longer in the cache, search is launched to retrieve results from 25 to 50. Results from 25 to 50 are cached.
• (150, 25, 150): Gets a page. Search is launched to retrieve results from 150 to 175. Results from 150 to 300 are cached.
• (150, 150, 150): Gets a page. Results from 150 to 300 are returned. Results from 150 to 300 are cached.
• (0, 151, 150): Gets a page. One result is missing because page size is greater than maxResultsPerSource.

Example
// Get the SearchService
ISearchService service = m_facade.getService(ISearchService.class, null, m_moduleName);
// Create the query
StructuredQuery query = new StructuredQuery();
query.addRepository("your_docbase");
query.setObjectType("dm_sysobject");
ExpressionSet set = new ExpressionSet();
set.addExpression(new FullTextExpression("your_query_term"));
query.setRootExpressionSet(set);
// Add a facet definition to the query: a facet on r_modify_date
// attribute.
FacetDefinition facetDefinition = new FacetDefinition("date");
facetDefinition.addAttribute("r_modify_date");
// request all facets
facetDefinition.setMaxFacetValues(-1);
// group results by month
facetDefinition.setGroupBy("month");
// set sort order
facetDefinition.setFacetSort(FacetSort.VALUE_ASCENDING);
query.addFacetDefinition(facetDefinition);
// exec options: we don't want to retrieve results, we just want the
// facets.
QueryExecution queryExecution = new QueryExecution(0, 0, 0);
// Call getFacets method.
QueryFacet queryFacet = service.getFacets(query, queryExecution, new OperationOptions());
// Can check query status: should be SUCCESS
QueryStatus status = queryFacet.getQueryStatus();
System.out.println(status.getRepositoryStatusInfos().get(0).getStatus());
// Display facet values
List<Facet> facets = queryFacet.getFacets();
for (Facet facet : facets)
{
  for (FacetValue facetValue : facet.getValues())
  {
    System.out.println(facetValue.getValue() + "/" + facetValue.getCount());
  }
}

Tuning facets
Limiting the number of facets to save index space and computation time
Every facet requires a special index, and every query that contains facets requires computation time for the facet. As the number of facets increases, the disk space required for the index increases. Disk space depends on how frequently the facet attributes are found in indexed documents. As the number of facets in an individual query increases, the computation time increases, depending on whether the indexes are spread out on disk.

Limiting the number of results used to compute a facet
You can limit the number of results that are used to compute an individual facet. This setting varies the specificity of a facet. The setting depends on how many possible values a facet attribute can have. For example, for the Documentum attribute r_modifier, you can have 10,000 users but wish to return only the top 10. If each user has an average of 5 documents, a setMaxResultsForFacets value of 50 could return the top 10 users. The computation will stop after 50 results are obtained. The 50 documents can belong to only five users, or they can belong to one user who contributes many documents. Set this property in the client application that issues the query.
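The trade-off behind setMaxResultsForFacets can be made concrete with a small simulation. The sketch below does not call DFS: FacetCap and countFirst are hypothetical names, and the model (count the modifier of only the first N results, in result order) is an assumption drawn from the statement that computation stops after the configured number of results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FacetCap {

    /** Counts facet values using only the first maxResultsForFacets results. */
    static Map<String, Integer> countFirst(List<String> modifiers, int maxResultsForFacets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int limit = Math.min(maxResultsForFacets, modifiers.size());
        for (int i = 0; i < limit; i++) {
            counts.merge(modifiers.get(i), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Twelve users with five documents each, listed user by user.
        List<String> even = new ArrayList<>();
        for (int user = 1; user <= 12; user++) {
            for (int doc = 0; doc < 5; doc++) {
                even.add("user" + user);
            }
        }
        // Fifty results cover exactly ten users at five documents each.
        System.out.println(countFirst(even, 50).size());

        // One prolific user ahead of everyone else absorbs the whole budget.
        List<String> skewed = new ArrayList<>(Collections.nCopies(60, "user1"));
        skewed.addAll(even);
        System.out.println(countFirst(skewed, 50));
    }
}
```

The first call yields ten distinct users, while the second yields a single facet value, which is the situation the text describes when the 50 documents belong to one user who contributes many documents.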
Logging facets
To turn on logging for facets, add the following line to log4j.properties in the index server WEB-INF/classes directory:
log4j.logger.com.emc.documentum.core.fulltext.indexserver.services.facets=DEBUG
Output is like the following:
<event timestamp="2009-08-05 14:37:18.953" level="DEBUG" thread="pool-3-thread-10" logger="com.emc.documentum.core.fulltext.indexserver.services.facets.impl.CompositeFacetsProcessor" timeInMilliSecs="1249475838953">
<message><![CDATA[Begin facet computation]]></message>
</event>
<event timestamp="2009-08-05 14:37:18.953" level="DEBUG" thread="pool-3-thread-10" logger="com.emc.documentum.core.fulltext.indexserver.services.facets.impl.CompositeFacetsProcessor" timeInMilliSecs="1249475838953">
<message><![CDATA[Facets computed using 13 results.]]></message>
</event>
<event timestamp="2009-08-05 14:37:18.953" level="DEBUG" thread="pool-3-thread-10" logger="com.emc.documentum.core.fulltext.indexserver.services.facets.impl.CompositeFacetsProcessor" timeInMilliSecs="1249475838953">
<message><![CDATA[Facet handler string(r_modifier) returned 11 values.]]></message>
</event>
<event timestamp="2009-08-05 14:37:18.953" level="DEBUG" thread="pool-3-thread-10" logger="com.emc.documentum.core.fulltext.indexserver.services.facets.impl.CompositeFacetsProcessor" timeInMilliSecs="1249475838953">
<message><![CDATA[Facet handler string(r_modify_date) returned 4 values.]]></message>
</event>
<event timestamp="2009-08-05 14:37:18.953" level="DEBUG" thread="pool-3-thread-10" logger="com.emc.documentum.core.fulltext.indexserver.services.facets.impl.CompositeFacetsProcessor" timeInMilliSecs="1249475838953">
<message><![CDATA[Sort facets]]></message>
</event>
<event timestamp="2009-08-05 14:37:18.953" level="DEBUG" thread="pool-3-thread-10" logger="com.emc.documentum.core.fulltext.indexserver.services.facets.impl.CompositeFacetsProcessor" timeInMilliSecs="1249475838953">
<message><![CDATA[End facet computation]]></message>
</event>

Troubleshooting facets
A query returns no facets
Check the security mode of the repository. Use the following IAPI command:
retrieve,c,dm_ftengine_config
get,c,l,ftsearch_security_mode
For example:
API> retrieve,c,dm_ftengine_config
...
0800007580000916
API> get,c,l,ftsearch_security_mode
...
0
If the command returns a 0, as in the example, set the security mode to evaluation in xPlore, not the Content Server. Use the following IAPI command:
retrieve,c,dm_ftengine_config
set,c,l,ftsearch_security_mode
1
save,c,l
reinit,c

Chapter 11
Using Reports
This chapter contains the following topics:
• About reports
• Types of reports
• Document processing (CPS) reports
• Indexing reports
• Search reports
• Editing a report
• Report syntax
• Sample edited report
• Troubleshooting reports

About reports
Reports provide indexing and query statistics, and they are also a troubleshooting tool. See the troubleshooting sections for CPS, indexing, and search, which describe how to use reports for troubleshooting.
Statistics on content processing and indexing are stored in the audit database. Use xPlore administrator Data Management > Reports to query these statistics. Auditing supplies information to reports on administrative tasks or queries (enabled by default). For information on enabling and configuring auditing, see Auditing collection operations.
To generate Documentum reports that compare a repository to the index, see Using ftintegrity.

Types of reports
The following types of reports are available in xPlore administrator. To run reports, choose Diagnostic and Utilities and then click Reports.

Table 30 List of reports
Report title: Description
Audit records for search component: If search auditing is enabled in System > Global configuration, you can view and create audit reports. Filter for query type: interactive, subscription, warmup, test search, report, metrics, ftintegrity, consistency checker, or all.
Audit records for admin component: If admin auditing is enabled in System > Global configuration, you can view and create audit reports.
Audit records for warmup component: If search auditing is enabled in System > Global configuration, you can view and create audit reports on index and query warmup.
Average time of an object in each indexing stage: Reports the average bytes and time for the following processing stages: CPS Queue, CPS Executor, CPS processing, Index Queue, Index Executor, Index processing, StatusDB Queue, StatusDB Executor, StatusDB Update.
Content too large to index: Displays format, count, average size, maximum size, and minimum size.
Document processing error summary: Use first to determine the most common problems. Displays error code, count, and error text.
Document processing error detail: Drill down for error codes. Report for each code displays the request ID, domain, date and time, format, and error text.
Documents ingested per month/day/hour: Per month: totals for current year. Per day: totals for current month. Per hour: hourly totals for current day. Totals include document count, bytes ingested, average processing latency, and CPS error count. Summarized by domain (Documentum repository).
Get query text: Click the query ID from the report Top N slowest queries to get the XQuery expression.
QBS activity report by ID: Find the subscribed queries that take the longest processing time or are run the most frequently.
QBS activity report by user: Find users whose subscribed queries take the longest processing time.
Query counts by user: For each user, displays domain, number of queries, average response time, maximum and minimum response times, and last result count (sortable columns). Number of users sets last N users to display. To get slowest queries for a user, run the report Top N slowest queries. Filter for query type: interactive, subscription, warmup, test search, report, metrics (query of metrics database for reports or indexing statistics), ftintegrity, consistency checker, or all.
Top N slowest queries: Displays the slowest queries. Specify the date and time range. Optionally, specify a user to get the slowest queries for that user. Select Number of results to display. Sort by time to first result, processing time, number of results fetched, number of hits, number filtered out by security, or most recent queries. Filter for query type: interactive, subscription, warmup, test search, report, metrics, ftintegrity, consistency checker, or all.
Top query terms: Displays most common query terms including number of queries and average number of hits.
User activity: Displays query and ingestion activity and errors for the specified user and specified period of time. Data can be exported to Microsoft Excel. Query links display the xQuery.

Document processing (CPS) reports
Run the Document processing error summary report to find the count for each type of problem. The error count for each type is listed in descending order. The following types of processing errors are reported: request and fetch timeout, invalid path, fetching errors, password protection or encryption, file damage, unsupported format, language and parts of speech detection, or document size.
View detailed reports for each type of processing error. For example, the Document processing error detail report for Error code 770 (File corrupt) displays object ID, domain, date, time, format, and error text. You can then locate the document in xPlore administrator by navigating to the domain and filtering the default collection for the object ID. Using the object ID, you can view the metadata in Content Server to determine the document owner or other relevant properties.
Run the report Content too large to index to see how many documents are being rejected for size. If your indexing throughput is acceptable, you can increase the size of documents being indexed. For more information about indexing performance, see Indexing performance.
Run the report User activity to see ingestion activity and error messages for ingestion by a specific user and time period.

Indexing reports
To view indexing rate, run the report Documents ingested per month/day/hour. The report shows Average processing latency. The monthly report covers the current 12 months. The daily report covers the current month. The hourly report covers the current day. From the hourly report, you can determine your period of highest usage.
You can divide the document count into bytes processed to find out the average size of content ingested. For example, 2,822,469 bytes for 909 documents yields an average size of 3105 bytes. This size does not include non-indexable content.

Search reports
Enable auditing in xPlore administrator to view query reports (enabled by default).

Top N slowest queries
Find the slowest queries by selecting Top N slowest queries. To determine how many queries are unselective, sort by Number of results fetched. Results are limited by default in Webtop to 350.
Sort Top N slowest queries by Number of hits denied access by security filter to see how many underprivileged users are experiencing slow queries due to security filtering. For information on changing the security cache, see Configuring the security cache.

Get query text
To examine a slow or failed query by a user, get the query ID from Top N slowest queries and then enter the query ID into Get query text. Examine the query text for possible problems. The following example is a query with a slow response time. The user searched in Webtop for the string "xplore" (line breaks added here):
declare option xhive:fts-analyzer-class 'com.emc.documentum.core.fulltext.indexserver.core.index.xhive.IndexServerAnalyzer';
for $i score $s in collection('/DSS_LH1/dsearch/Data')
/dmftdoc[( ( ( (dmftmetadata//a_is_hidden = 'false') ) )
and ( (dmftinternal/i_all_types = '030a0d6880000105') )
and ( (dmftversions/iscurrent = 'true') ) )
and ( (. ftcontains ( (('xplore') with stemming) ) )) ]
order by $s descending
return <dmrow>{if ($i/dmftinternal/r_object_id) then $i/dmftinternal/r_object_id else <r_object_id/>}
{if ($i/dmftsecurity/ispublic) then $i/dmftsecurity/ispublic else <ispublic/>}
{if ($i/dmftinternal/r_object_type) then $i/dmftinternal/r_object_type else <r_object_type/>}
{if ($i/dmftmetadata/*/owner_name) then $i/dmftmetadata/*/owner_name else <owner_name/>}
{if ($i/dmftvstamp/i_vstamp) then $i/dmftvstamp/i_vstamp else <i_vstamp/>}
{if ($i/dmftsecurity/acl_name) then $i/dmftsecurity/acl_name else <acl_name/>}
{if ($i/dmftsecurity/acl_domain) then $i/dmftsecurity/acl_domain else <acl_domain/>}
<score dmfttype='dmdouble'>{$s}</score>
{xhive:highlight($i/dmftcontents/dmftcontent/dmftcontentref)}</dmrow>
Use the xDB admin tool to debug the query. For instructions on using xhadmin, see Debugging queries with the xDB admin tool.

Query counts by user
Use Query counts by user to determine which users are experiencing the slowest query response times or to see queries by a specific user. You can filter by date and domain.

User activity
Use User activity to display queries by the specified user for the specified time. Data can be exported to Microsoft Excel. Click a query link to see the xQuery.
Select a report in xPlore administrator and click Save as. Reports are based on the W3C XForms standard.0/server/primary_instance/deploy/dsearchadmin. You can filter the audit record by date using xPlore administrator. Data can be exported to Microsoft Excel. The XForms processor substitutes the input value for the variable in the query. To see the new report in xPlore administrator. To view the entire audit record. page 209. page 244. Editing a report You can edit any of the xPlore reports. see Debugging queries with the xDB admin tool. Search APIs and customization else <r_object_id/>}{if ($i/dmftsecurity/ispublic) then $i/dmftsecurity/ispublic else <ispublic/>}{if ($i/dmftinternal/r_object_type) then $i/dmftinternal/r_object_type else <r_object_type/>}{if ($i/dmftmetadata/*/owner_name) then $i/dmftmetadata/*/owner_name else <owner_name/>}{if ($i/dmftvstamp/i_vstamp) then $i/dmftvstamp/i_vstamp else <i_vstamp/>}{if ($i/dmftsecurity/acl_name) then $i/dmftsecurity/acl_name else <acl_name/>}{if ($i/dmftsecurity/acl_domain) then $i/dmftsecurity/acl_domain else <acl_domain/>}<score dmfttype=’dmdouble’>{$s}</score>{xhive:highlight( $i/dmftcontents/dmftcontent/dmftcontentref)}</dmrow> Use the xDB admin tool to debug the query.war/reports. For instructions on using xhadmin. you can write a new copy of the report and save it to dsearch_home/jboss5. click somewhere else in xPlore administrator and then click Reports. >= $startTime]. xforms:bind Sets constraints for an input field.. xhtml:body Contains the xhtml markup that is rendered in a browser (the report UI) The following example highlights the user the input field startTime in the report Query Counts By User (rpt_QueryByUser. 5 for $d in distinct-values(collection(’/SystemData..war/reports. 2..<rowset>. The original report XForms are located in dsearch_home/jboss5. 3.xml)... 4 return <report ...1. The nodeset attribute specifies a path within the current XForms document to the input field.. 
xforms:setvalue Sets a default value for an input field.. Report syntax. Reference it within the body of the query.. • 2 Model data: Declares the input fields for start and end date and time variables. These are the key elements that you can change in a report: Table 31 Report elements Element Description xhtml:head/input Contains an element for each input field xhtml:head/query Contains the XQuery that returns report results xforms:action Contains xforms:setvalue elements... These steps are highlighted in the syntax description.0/server/DctmServer_PrimaryDsearch/deploy/dsearchadmin. model (data definition). and ess_report. 2 let $u1 := distinct-values(collection(’/SystemData/AuditDB’)// event[@component = "search". The ref attribute specifies a path within the current XForms document to the input field.>. Define the UI control and bind it to the data.. 244 EMC Documentum xPlore Version 1. you can copy the XForms file and edit it in an XML editor of your choice. Report syntax xPlore reports conform to the W3C XForms specification.. The full report is line-numbered for reference in the example (some lines deleted for readability): 1 ....Search APIs and customization 1. 3 and START_TIME[ . You can edit a report in xPlore administrator and save it with a new name.2 Administration and Development Guide . 1 <query><![CDATA[ . Alternatively. page 244. 4 </input> • 1 Defines an XForms instance.<xforms:model><xforms:instance><ess_report xmlns=""> 2 <input> 3 <startTime/><endTime/>. Declare it.. located in the same directory as the report. 3 <xforms:bind nodeset="input/startTime" constraint="seconds-from-dateTime( ..... <= $endRange]]. although it could evaluate a subset of results. <= $endRange] 7 .</xhtml:head> 5 <xhtml:body>. • 3 References the start time and end time variables and sets criteria for them in the query: as greater than or equal to the input start time and less than or equal to the input end time: and START_TIME[ ... >= $startTime] and START_TIME[. 
processes the returned XML elements..... The first line specifies the collection for the report: let $u1 := distinct-values(collection(’/SystemData/AuditDB’). • 6 This expression evaluates various computations such as average. • 5 This expression iterates over the rows returned by the query./endTime)"/> 4 <xforms:bind nodeset="input/startTime" type="xsd:dateTime"/>.... 6 return let $k := collection(’AuditDB’).) <= seconds-from-dateTime(... </xforms:message> 9<xforms:action ev:event="xforms-invalid"> <xforms:setvalue ref=". and minimum query response times. The syntax conforms to the XQuery specification. } </rowset></report> ]]></query></ess_report></xforms:instance> • 1 Specifies the XQuery for the report.. >= $startTime] and START_TIME[ . Search APIs and customization and START_TIME[ .2 Administration and Development Guide 245 .. 1 <xforms:action ev:event="xforms-ready"> 2 <xforms:setvalue ref="input/startTime" value="seconds-to-dateTime( seconds-from-dateTime(local-dateTime()) ... return . • 7 The response times are returned as row elements (evaluated by the XSL transform).xsl./endTime" ev:event="xforms-invalid" value=". The transform plain_table. This particular expression evaluates all results..and START_TIME[ . >= $startTime] and START_TIME[ ./startTime"/><xforms:rebuild/> </xforms:action> </xforms:input></xforms:group></xhtml:td></xhtml:tr>.24*3600)"/>....<xhtml:tr class=""> 6 <xhtml:td>Start from:</xhtml:td> <xhtml:td><xforms:group> 7 <xforms:input ref="input/startTime" width="100px" ev:event="DOMActivate"> 8<xforms:message ev:event="xforms-invalid" level="ephemeral"> The "Start from" date should be no later than the "to" date... </xforms:action>. EMC Documentum xPlore Version 1. xhtml:head/xforms:model/xforms:instance/ess_report/query • 2 let: Part of an XQuery FLWOR expression that defines variables. <= $endRange]]/USER_NAME) • 4 return report/rowset: The return is an XQuery FLWOR expression that specifies what is returned from the query. </xforms:model>. 
maximum. The body contains elements that conform to XForms syntax. Change the report title in metadata/title.. CDATA.. Find the nodes in a QUERY element whose TOTAL_HITS value is equal to zero to get the failed queries: let $z := collection(’AuditDB’)//event[@component = "search" and @name = "QUERY" and START_TIME[ .. xforms:message contains the message when the entry does not conform to the constraint. 4.): let $failedCnt := count($z) 7. Attributes on this element define the width and event that is fired. 5. Entries after the end date are invalid. 2. >= $startTime and .) and add your new query. using the XPath constraint seconds-from-dateTime.): <cell> { $failedCnt } </cell> 8. add the following column: <column type="integer">Failed Queries</column> 5...xml located in dsearch_home/jboss5. 6 The first table cell in this row contains the label Start From: 7. Add this line after <rowset. <= $endRange] and USER_NAME = $j and TOTAL_HITS = 0] 6. Locate the variable definition for successful queries (for $j . >= $startTime and .>let $k.0/server/DctmServer_PrimaryDsearch/deploy/dsearchadmin. Redefine the failed query variable to get a count for all users.Search APIs and customization 1. xforms:input contain elements that define the UI for this input control. Binds the form control to the XML schema datatype dateTime. after the query count cell (<cell> { $queryCnt } . After the column element whose value is Query Cnt. <= $endRange] and USER_NAME 246 EMC Documentum xPlore Version 1. Return the failed query count cell. Define a variable for the count of failed queries and add it after the variable for successful query count (let $queryCnt.. Using an XML editor...war/reports. 3.. 4.let $k . 3.1. 1.. Sets the data mode using XPath expressions. open the report rpt_QueryByUser.: let $z := collection(’AuditDB’)//event[@component = "search" and @name = " QUERY" and START_TIME[ . 9.. 6. xforms:action ev:event="xforms-invalid" defines the invalid state for the input control.. 2. 
This step finds failed queries. 8.2 Administration and Development Guide . to Failed Query Counts By User. xhtml:body: Defines the UI presentation in xhtml. Sample edited report This example edits the Query counts by user report to add a column for number of failed queries. Save the report with a new file name. xforms:action completes the xforms model data. The browser renders these elements. Binds the form control to the startTime variable and constrains the data that can be entered. it will time out after about one minute. Open Tools > Internet Options and choose the Security tab. for debugging. you get a stack trace that identifies the line number of the error. Troubleshooting reports If you update Internet Explorer or turn on enforced security. The result is like the following: Figure 18 Customized report for query count If your query has a syntax error. after <cell> { $queryCnt } </cell>: <cell> { $failedCnt } </cell> 10. Click Trusted sites and then click Sites. You can run the same query in the xDB admin tool. Search APIs and customization and TOTAL_HITS = 0] 9. You can copy the text of your report into an XML editor that displays line numbers. If the query runs slowly. Add EMC Documentum xPlore Version 1. reports no longer contain content. ave and run the report.2 Administration and Development Guide 247 . Add the total count cell to this second rowset. 2 Administration and Development Guide . Set the security level for the Trusted sites zone by clicking Custom level. 248 EMC Documentum xPlore Version 1. Reset the level to Medium-Low.Search APIs and customization the xPlore administrator URL to the Trusted sites list. index.emc. Chapter 12 Logging This chapter contains the following topics: • Configuring logging • CPS logging Configuring logging Note: Logging can slow the system and consume disk space.emc.properties.documentum. You cannot configure logging for these loggers in log4j.xml.core.xml.common. 
In a production environment, the system must run with minimal logging.

Logging can be configured for each service in xPlore administrator. Log levels can be set for indexing, search, CPS, xDB, and xPlore administrator. To set logging for a service, choose System Overview in the left panel. Choose Global Configuration and then choose the Logging Configuration tab to configure logging. For CPS logging, see CPS logging, page 252.

To configure logging for other classes or packages, edit log4j.properties, which is in the following directory of the xPlore primary instance: deploy/dsearch.war/WEB-INF/classes. You must restart the xPlore instances to get logging changes from log4j.properties.

Levels set in xPlore administrator have precedence over log levels in log4j.properties. Logging levels for the four loggers com.emc.documentum.core.fulltext.{admin|index|search|common} are written to indexserverconfig.xml. You cannot configure logging for these loggers in log4j.properties. If you define more precise logger paths in log4j.properties, they have precedence over settings in xPlore administrator and indexserverconfig.xml.

Choose a log and set the tracing level for dsearch.log:
• dsearchadmin
Logs xPlore administrator operations from the package com.emc.documentum.core.fulltext.admin.
• dsearchdefault
Sets the default log level from the package com.emc.documentum.core.fulltext.common.
• dsearchindex
Logs indexing operations from the package com.emc.documentum.core.fulltext.index.
• dsearchsearch
Logs search operations from the package com.emc.documentum.core.fulltext.search.

xPlore uses Apache log4j, an open source module for logging. log4j has a set of logging configuration options based on severity level. The following log levels are available. Levels are shown in increasing severity and decreasing amounts of information, so that TRACE displays more than DEBUG, which displays more than INFO. FATAL logs only the most severe errors.
• TRACE
• DEBUG
• INFO
Note: Some xPlore APIs logged at the INFO level may appear in the logs even when a more restrictive log level is configured.
• WARN
• ERROR
• FATAL

Log locations
Indexing and search messages are logged to dsearch.log, which is located in the primary instance dsearch_home/server/DCTMServer_PrimaryDsearch/logs. You can configure the appenders for Dsearch (xPlore APIs), xDB, CPS, and CPS daemon in log4j.properties. You can log the activities of specific packages in log4j.properties. The following configuration logs messages to an XML file. Line breaks are shown here for readability but do not exist in the properties file:
log4j.appender.DSEARCH=org.apache.log4j.RollingFileAppender
log4j.appender.DSEARCH.File=$H(SERVER_INSTANCE_DIR)/logs/dsearch.log
log4j.appender.DSEARCH.MaxFileSize=10MB
log4j.appender.DSEARCH.MaxBackupIndex=100
log4j.appender.DSEARCH.layout=com.emc.documentum.core.fulltext.utils.log.ESSXMLLayout
log4j.appender.DSEARCH.Encoding=UTF-8

xDB and Lucene logging
xDB and Lucene are logged in xdb.log. Logging for xDB and Lucene operations is configured in log4j.properties, which is in the following directory of the xPlore primary instance: deploy/dsearch.war/WEB-INF/classes. You can configure the log4j.logger.com.xhive (for xDB) and log4j.logger.org.apache.lucene loggers. To log information on Lucene merges, set the log level of log4j.logger.com.xhive and log4j.logger.com.xhive.index.multipath.merging to DEBUG. Dirty, clean, and final merges are logged with their respective scheduling intervals.

Viewing logs
You can view indexing, search, CPS, and xDB logs in xPlore administrator. Choose an instance in the tree and click Logging. Click the tab for dsearch, cps, cps_daemon, or xdb to view the last part of the log. Click Download All Log Files to get links for each log file.

Query logging
The xPlore search service logs queries. For each query, the search service logs the following information for all log levels:
• Start of query execution including the query statement
• Total results processed
• Total query time including query execution and result fetching
Note: More query information is logged when native xPlore security (not Content Server security) is enabled.

Set the log level in xPlore administrator. Open Services in the tree, expand and select Logging, and click Configuration. You can set the log level independently for administration, indexing, search, and default. Levels in decreasing amount of verbosity: TRACE, DEBUG, INFO, WARN (default), ERROR, and FATAL.

The log message has the following form:
<date-time><Tracing Level><Class Name><Thread ID><Query ID>[ <main query options in concise form>]<total hits><execution time in milliseconds>
A single line is logged for each batch of query results returned by the xPlore server.

The following examples from dsearch.log show a query, total results processed, and total query time:
<event timestamp="2010-06-07 21:54:26.090" ...>
<message><![CDATA[QueryID=PrimaryDsearch$d95fd870-9639-42ad-8da2-167958017f4d,query-locale=en,query-string=let $j:= for $i score $s in /dmftdoc [. ftcontains 'strange'] order by $s descending return <d> {$i/dmftmetadata//r_object_id} { $i/dmftmetadata//object_name } { $i/dmftmetadata//r_modifier } </d> return subsequence($j,1,200) is running]]></message></event>
<event timestamp="2010-06-07 21:54:26.090" ...>
<message><![CDATA[QueryID=PrimaryDsearch$d95fd870-9639-42ad-8da2-167958017f4d, query thread started on xhive library=DSS_LH1/dsearch/Data//default]]></message></event>
<event timestamp="2010-06-07 21:54:26.324" ...>
<message><![CDATA[QueryID=PrimaryDsearch$d95fd870-9639-42ad-8da2-167958017f4d execution time=234 Milliseconds]]></message></event>

Log layout
Two formats are supported for logs:
• XML layout is used for dsearch.log and xdb.log.
• Text layout is used for cps.log and cps_daemon.log.
The default layout is XML. The appender that generates an XML log is com.emc.documentum.core.fulltext.utils.log.ESSXMLLayout.

Use org.apache.log4j.PatternLayout as the value of log4j.appender.<appenderName>.layout. Specify your preferred text layout as the value of log4j.appender.<appenderName>.layout.ConversionPattern in the log4j.properties file. For conversion pattern information see the Apache log4j documentation. The log4j configuration is like the following. Substitute <appenderName> with your appender. Substitute your preferred values for file size and log location:
log4j.appender.<appenderName>=org.apache.log4j.RollingFileAppender
log4j.appender.<appenderName>.File=$H(SERVER_INSTANCE_DIR)/logs/cps.log
log4j.appender.<appenderName>.MaxFileSize=10MB
log4j.appender.<appenderName>.MaxBackupIndex=100
log4j.appender.<appenderName>.layout=org.apache.log4j.PatternLayout
log4j.appender.<appenderName>.layout.ConversionPattern=%d{ISO8601} %5p [MANAGER-%c{1}-(%t)] %m%n
log4j.appender.<appenderName>.Encoding=UTF-8

Sample XML log entry (line breaks inserted for readability):
<event timestamp="2009-01-06 18:39:41.094" level="WARN" thread="main"
logger="com.emc.documentum.core.fulltext.indexserver.core.config.impl.IndexCollectionConfig"
elapsedTime="1231295981094">
<message><![CDATA[[CONF_NO_DEFAULT_LIBRARY] There is no default library found for collection, [knowledgeworker]. The first library in the list, [library1], is assumed as default.]]></message>
</event>

CPS logging
CPS does not use the xPlore logging framework. A CPS instance that is embedded in an xPlore instance (installed with xPlore, not separately) uses the log4j.properties file in WEB-INF/classes of the dsearch web application. (This file is located in the indexserver war file.) A standalone CPS instance uses log4j.properties in the CPS web application, in the WEB-INF/classes directory. If you have installed more than one CPS instance on the same host, each instance has its own web application and log4j.properties file. To avoid one instance log overwriting another, make sure each file appender in log4j.properties points to a unique file path.
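The requirement above, that file appenders of several CPS instances on one host must not share a log path, can be checked mechanically. The following is only a minimal sketch and not an xPlore tool: the function names, the instance labels, and the example paths are hypothetical, and it assumes only the log4j.appender.<appenderName>.File key shown in this chapter. Paths are compared as literal strings, so variables such as $H(SERVER_INSTANCE_DIR) are not expanded.

```python
import re
from collections import defaultdict

# Matches lines such as: log4j.appender.CPS.File=/opt/cps1/logs/cps.log
FILE_KEY = re.compile(r"^\s*log4j\.appender\.([^.=\s]+)\.File\s*=\s*(.+?)\s*$")


def appender_files(properties_text):
    """Return {appender_name: file_path} for each file appender in the text."""
    files = {}
    for line in properties_text.splitlines():
        if line.lstrip().startswith(("#", "!")):  # properties-file comments
            continue
        match = FILE_KEY.match(line)
        if match:
            files[match.group(1)] = match.group(2)
    return files


def shared_log_paths(instances):
    """instances maps a label to the text of that instance's log4j.properties.

    Returns {path: [(label, appender), ...]} for every path that more than
    one appender writes to.
    """
    users = defaultdict(list)
    for label, text in instances.items():
        for appender, path in appender_files(text).items():
            users[path].append((label, appender))
    return {path: who for path, who in users.items() if len(who) > 1}


if __name__ == "__main__":
    # Hypothetical contents of two CPS instances' log4j.properties files.
    cps1 = "log4j.appender.CPS.File=/opt/cps/logs/cps.log\n"
    cps2 = "log4j.appender.CPS.File=/opt/cps/logs/cps.log\n"
    for path, who in shared_log_paths({"cps1": cps1, "cps2": cps2}).items():
        print("shared log file:", path, who)
```

An empty result means that no two appenders in the supplied files point at the same literal path.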
Chapter 13 Setting up a Customization Environment
This chapter contains the following topics:
• Setting up the xPlore SDK
• Customization points
• Adding custom classes
• Tracing
• Enabling logging in a client application
• Handling a NoClassDef exception

Setting up the xPlore SDK
1. Download the xPlore software development kit (SDK) from the Powerlink website (http://powerlink.EMC.com).
2. Expand the download file to your development host. The SDK contains the following content:
• README: Information about the SDK file.
• LICENSE: The redistribution license for the Eclipse online help framework used by xPlore.
• conf: Sample configuration files. The configuration parameters are described within the file dsearchclientfull.properties. For information on sample-log4j-for-essclient.properties, see Enabling logging in a client application, page 260 and Tracing, page 256.
• dist: xPlore distribution APIs and classes needed for customizations.
• doc: This guide and javadocs.
• lib: Java libraries that your application needs.
• samples: Examples of a FAST thesaurus API and a natural language processing UIMA module.
3. Add the lib directory to your project or system classpath.
4. Add the distribution jar files to your build path.

Customization points
You can customize indexing and searching at several points in the xPlore stack. The following information refers to customizations that are supported in a Documentum environment. The following diagram shows indexing customization points.
Figure 19 Indexing customization points
1. Using DFC, create a BOF module that pre-filters content before indexing. See Custom content filters, page 70.
2. Create a TBO that injects data from outside a Documentum repository, either metadata or content. You can use a similar TBO to join two or more Documentum objects that are related. See Injecting data and supporting joins, page 73.
3. Create a custom routing class that routes content to a specific collection based on your enterprise criteria. See Creating a custom routing class, page 127.

The following diagram shows query customization points.
Figure 20 Query customization points
1. Using WDK, modify Webtop search and results UI. See Documentum System Search Development Guide.
2. Using DFC or DFS, create NOFTDQL queries or apply DQL hints (not recommended except for special cases). Implement the DFC interface IDfQuery and the DFS query service. Queries with the NOFTDQL hint or which do not conform to FTDQL criteria are not passed to xPlore. DQL is evaluated in the Content Server. FTDQL queries are passed to xPlore. See DQL Processing, page 208.
3. Using DFC, modify Webtop queries. Implement the DFC search service, which generates XQuery expressions. xPlore processes the expression directly. See Documentum System Search Development Guide.
4. Using DFS, implement StructuredQuery, which generates an XQuery expression. xPlore processes the expression directly. See Building a query with the DFS search service, page 212.
5. Using DFC, create XQueries using IDfXQuery. See Building a DFC XQuery, page 211.
6. Create and customize facets to organize search results in xPlore. See Facets, page 225.
7. Target a specific collection in a query using DFC or DFS APIs. See Routing a query to a specific collection, page 223.
8. Use xPlore APIs to create an XQuery for an XQuery client. See Building a query using xPlore APIs, page 215.

Adding custom classes
Custom classes are registered in indexserverconfig.xml. Custom routing classes are supported in this version of xPlore. To route documents from all domains using a custom class, edit this configuration file.
1. Stop all instances in the xPlore system.
2. Edit indexserverconfig.xml in dsearch_home/config using an XML-compliant editor.
3. Add the following element to the root element index-server-configuration, between the elements system-metrics-service and admin-config, substituting your fully qualified class name.
<customization-config>
<collection-routing class-name="custom_routing_class_name"/>
</customization-config>
4. Place your class in the indexagent.war WEB-INF/classes directory. Your subdirectory path under WEB-INF/classes must match the fully qualified routing class name.
5. Restart the xPlore instances, starting with the primary instance.

Tracing
xPlore tracing provides configuration settings for various formats of tracing information. You can trace individual threads or methods. Use the file dsearchclientfull.properties, which is located in the conf directory of the SDK. The configuration parameters are described within the file dsearchclientfull.properties.

The xPlore classes are instrumented using AspectJ (tracing aspect). When tracing is enabled and initialized, the tracing facility uses log4j API to log the tracing information.

Enabling tracing
Enable or disable tracing in xPlore administrator: Expand an instance and choose Tracing. Tracing does not require a restart. Tracing files named ESSTrace.XXX.log are written to the Java IO temp directory (where XXX is a timestamp generated by the tracing mechanism).

The tracing facility checks for the existence of a log4j logger and appender in the log4j.properties file. When a logger and appender are not found, xPlore creates a logger named com.emc.documentum.core.fulltext.utils.trace.IndexServerTrace.

When you enable tracing, a detailed Java method call stack is logged in one file. From that file, you can identify the methods that are called, with parameters and return values.

Configuring tracing
You can configure the name, location, and format of the log file for the logger and its appender in indexserverconfig.xml or in the log4j.properties file. The log4j configuration takes precedence. You can configure tracing for specific classes and methods. A sample log4j.properties file is in the SDK conf directory. The following example in log4j.properties debugs a specific package:
log4j.logger.com.emc.documentum.core.fulltext.client.common = DEBUG

The following table describes the child elements of <tracing-config> in indexserverconfig.xml.

Table 32 Tracing configuration elements in tracing-config/tracing
(Each row: element and attribute; description; values.)
tracing enable: Turns tracing on or off. Values: true | false, default: false.
tracing mode: Determines whether the tracing records method entry and exit on separate lines as they occur (standard) or whether everything (the method arguments and return value) is recorded on a single line (compact). In compact mode, the trace entries will appear in the order of method entrance. Values: standard | compact. Default: standard.
tracing verbosity: Amount of information saved. Values: standard | verbose.
output dir: Directory in which the trace file should be placed. Value should be writable. If not specified, then tracing will use the directory specified by the System property java.io.tmpdir. Default is C:\TEMP (Windows) or "/tmp" or "/var/tmp" (UNIX). Values: Any valid directory, default: current working directory.
output file-prefix: In standard file creation mode, the tracing infrastructure names log files <file_prefix>.timestamp.log. Values: Any legal file name. Default: xPloreTrace.
output max-file-size: Maximum size that the log file can reach before it rolls over. Values: Any string that log4j accepts for MaxFileSize, default: 100MB.
output max-backup-index: Specifies the number of backups that log4j will keep. The oldest backup files are deleted first. Values: Positive integer. Default: 1.
output file-creation-mode: Specifies whether to create one single tracing file or one file per thread. Values: single-file | file-per-thread.
print-exception-stack: Specifies whether tracing should print the exception stack when a method call results in an exception. If true, records the entire stack trace after the method exit. If false, logs only the name and message of the exception. Values: true | false, default: false.
max-stack-depth: Limits the depth of calls that are traced. Values: Integer, default: -1 (unlimited).
tracing-filters/thread-name: Case-sensitive repeating element that filters the trace output to only those threads whose names match the filter. The filter is a regular expression (see the Javadoc for the class java.util.regex.Pattern for syntax). For example, Thread-[4-6] matches the threads named Thread-4, Thread-5 or Thread-6. Values: Regular expression conforming to the regular expression language. Default: Not set, all threads are traced.
tracing-filters/method-name*: Repeating element that specifies methods to trace. Values: Default not set, all methods are traced.
date-output timing-style: Specifies the units for time recording of call entrance or exit (standard mode) or method entrance (compact mode). In compact mode, the second column displays the duration of the method call. Values: nanoseconds | milliseconds | milliseconds_from_start | seconds | date, default: milliseconds.
date-output format: Specifies date format if timing-style is set to date. Values: Format string conforms to the syntax supported by the Java class java.text.SimpleDateFormat.
date-output column-width: If date format is specified, a value for date column width is required. Values: Positive integer.

* The method-name filter identifies any combinations of packages, classes, and methods to trace. The property value is one or more string expressions that identify what is traced. When a thread enters a method which matches one of the filters, tracing will be turned on for that thread. All calls made within the context of that method will be traced. Tracing continues for that thread until the method that was matched is exited. Syntax with asterisk as wild card:
([qualified_classname_segment][*]|*).[.[method_name_segment][*]0]

Trace log format
The format of each trace item entry is the following, with line breaks inserted for readability:
time-stamp [method-duration] [thread-name] [entry_exit_designation] stack-depth-indicator
qualified-class-name@object-identity-hash-code.
method(method-arguments) [==>return-value|exception]
Key:
• [method-duration] Appears only if tracing-config/tracing[@mode="compact"].
• [entry_exit_designation] One of the following:
– ENTER: The entry represents a method entry.
– EXIT: The entry represents a method exit.
– !EXC!: The entry represents a call that results in an exception.
• [==>return-value|exception] Recorded if the mode is compact or the entry records a method exit. If the entry records an exception thrown by a method, the return value is the exception name and message.
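The entry format described above lends itself to simple scripted analysis. The following is a minimal sketch, not part of xPlore, that splits a standard-mode trace entry into its fields. It assumes that the thread name and the entry/exit designation appear in literal square brackets, that the stack depth indicator is a run of leading dots, and that no [method-duration] column is present (standard mode). The names TraceEntry and parse_trace_entry and the class com.example.Demo are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class TraceEntry(NamedTuple):
    timestamp: int
    thread: str
    kind: str              # ENTER, EXIT or !EXC!
    depth: int             # number of leading dots (stack depth indicator)
    call: str              # qualified class, hash code, method and arguments
    result: Optional[str]  # text after "==>", if any


_ENTRY = re.compile(
    r"^(\d+)\s*\[([^\]]+)\]\s*\[(ENTER|EXIT|!EXC!)\]\s*(\.*)(.*)$",
    re.DOTALL,
)


def parse_trace_entry(line):
    """Parse one standard-mode trace entry; return None if it does not match."""
    match = _ENTRY.match(line.strip())
    if match is None:
        return None
    timestamp, thread, kind, dots, rest = match.groups()
    # Naive split: an argument that itself contains " ==> " would be misread.
    call, sep, result = rest.partition(" ==> ")
    return TraceEntry(int(timestamp), thread, kind, len(dots),
                      call.strip(), result.strip() if sep else None)


if __name__ == "__main__":
    sample = "1218670851046 [http-8080-1] [EXIT] ..com.example.Demo@1a2b3c.<init> ==> <void>"
    print(parse_trace_entry(sample))
```

Parsed entries can then be grouped by thread or filtered by designation, for example to list only the !EXC! entries of one thread.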
The trace data are aligned vertically, and spaces separate the fields. Because trace output is vertically aligned, the trace data can be sorted by tools like awk or Microsoft Excel.

Output for tracing-config/tracing[@enable="true"]
1218670851046 [http-8080-1] [ENTER] ..com.emc.documentum.core.fulltext.indexserver.core.Node$ContainerInfo@12bf560.<init>("http://localhost:8080",true, ... @82efed)
1218670851046 [http-8080-1] [EXIT] ..com.emc.documentum.core.fulltext.indexserver.core.Node$ContainerInfo@12bf560.<init> ==> <void>

Reading trace output
To troubleshoot problems, inspect the trace file ESSTrace.XXX.log in the Java IO temp directory (where XXX is a number generated by the tracing mechanism). For example, the file ESSTrace.1319574712514.log is found on Windows 2008 at C:\TEMP.

The following sample from a trace file shows the XML generated by xPlore for a test index document. The final line shows the repository or source, the collection name, and category name.
1263340384815 [http-0.0.0.0-9300-1] [ENTER] ..com.emc.documentum.core.fulltext.indexserver.admin.controller.ESSAdminWebService@124502c.testIndexDocument("
<dmftdoc dmftkey="In this VM_txt1263340384408" ess_tokens=":In this VM_txt1263340384408:dmftdoc:1">
<dmftkey>In this VM_txt1263340384408</dmftkey>
<dmftmetadata>
<dm_document>
<r_object_id dmfttype="dmid">In this VM_txt1263340384408</r_object_id>
<object_name dmfttype="dmstring">In this VM.txt</object_name>
<r_object_type dmfttype="dmstring">dm_document</r_object_type>...
</dmftdoc>","DSS_LH1","superhot",null)

In the following snippet, a CPS worker thread processes the same document:
1263340387580[CPSWorkerThread-1] [ENTER] ..com.emc.documentum.core.fulltext.indexserver.cps.CPSOperation.<init>("dmftcontentref","file:///C:/DOCUME~1/ADMINI~1.EMC/LOCALS~1/Temp/3/In this VM.txt",true,[Lcom.emc.documentum.core.fulltext.indexserver.cps.CPSElement@1f1df6b)

Now the document is indexed:
1263340395783 [IndexWorkerThread-1] [ENTER] com.emc.documentum.core.fulltext.indexserver.core.index.xhive.IndexServerAnalyzer.addElement(<object_name dmfttype="dmstring">In this VM.txt</object_name>,{ []={[],[]},[child::dmftkey[0]]={[child::dmftkey[0]], ...

A search on a string "FileZilla" in the document renders this XQuery expression, source repository, language, and collection in the executeQuery method:
1263340474627 [http-0.0.0.0-9300-1] [ENTER] ... executeQuery("for $i in /dmftdoc[. ftcontains 'FileZilla'] return <d> {$i/dmftmetadata//r_object_id} { $i//object_name } { $i//r_modifier } </d>","DSS_LH1","en","superhot")

A query for a string in the file is executed:
1263340474643 [RMI TCP Connection(116)-10.8.0.53] [EXIT] .com.emc.documentum.core.fulltext.indexserver.search.SearchMessages.getFullMessage ==> "QueryID=PrimaryDsearch$ba06863d-7713-4e0e-8569-2071cff78f71,query-locale=en,query-string=for $i in /dmftdoc[. ftcontains 'FileZilla'] return <d> {$i/dmftmetadata//r_object_id} { $i//object_name } { $i//r_modifier } </d> is running"

The execution results are then logged in dsearch.log:
1263340475190[pool-10-thread-10] [EXIT] .com.emc.documentum.core.fulltext.utils.log.ESSXMLLayout.format ==> "
<event timestamp="2010-01-12 23:54:35.190" level="INFO" thread="pool-10-thread-10" logger="com.emc.documentum.core.fulltext.search" timeInMilliSecs="1263340475190">
<message><![CDATA[QueryID=PrimaryDsearch$ba06863d-7713-4e0e-8569-2071cff78f71, execution time=547 Milliseconds]]></message>
</event>"

You can find all trace statements for the document being indexed by searching on the dmftkey value. In this example, you search for "In this VM_txt1263340384408" in the trace log. You can find all trace statements for the query ID. In this example, you search for the query ID "PrimaryDsearch$ba06863d-7713-4e0e-8569-2071cff78f71" in the trace log.

Enabling logging in a client application
Logging for xPlore operations is described in Logging, page 249. You can enable logging for xPlore client APIs and set the logging parameters in your client application. Use the file sample-log4j-for-essclient.properties, which is located in the conf directory of the SDK. Save the file as log4j.properties and put it in your client application classpath. Refer to the Apache log4j site for more information on log4j.

Handling a NoClassDef exception
If you see the following Java exception, you have not included all of the libraries (jar files) in the SDK dist and lib directories:
...java.lang.NoClassDefFoundError: com/emc/documentum/fs/rt/ServiceException
at com.emc.documentum.core.fulltext.client.admin.api.FtAdminFactory.getAdminService(FtAdminFactory.java:41)

Chapter 14 Performance and Disk Space
This chapter contains the following topics:
• Planning for performance
• Improving search performance with time-based collections
• Disk space and storage
• System sizing for performance
• Measuring performance
• Tuning the system
• Documentum index agent performance
• Indexing performance
• Search performance

Planning for performance
Plan your system sizing to match your performance and availability requirements. See the system planning topic in Documentum xPlore Installation Guide and the xPlore Sizing Guide. This information helps you plan for the number of hosts and storage.

The following diagram shows ingestion scaling. As you increase the number of documents in your system, or the rate at which documents are added, do the following:
1. Add memory, disk, or CPU
2. Add remote CPS or more JVM memory
3. Increase the number of collections for ingestion specificity.
4. Add xPlore instances on the same or different hosts to handle your last scaling needs.
Figure 21 Scaling ingestion throughput

Use the rough guidelines in the following diagram to help you plan scaling of search. The order of adding resources is the same as for ingestion scaling.
Figure 23 Scaling number of users or query complexity in search

Improving search performance with time-based collections
You can plan for time-based collections, so that only recent documents are indexed. If most of your documents are not changed after a specific time period, you can migrate data to collections. Base your migration on creation date, modification date, or a custom date attribute. Route using a custom routing class, index agent configuration, or customizing the DFC query builder. You can migrate a limited set of documents using time-based DQL in the index agent UI.

To determine whether a high percentage of your documents is not touched after a specific time period, use two DQL queries to compare results:
1. Use the following DQL query to determine the number of documents modified and accessed in the past two years (change DQL to meet your requirements):
select count(*) from dm_sysobject where datediff(year,r_creation_date,r_modify_date)<2 and datediff(year,r_creation_date,r_access_date)<2
2. Use the following DQL query to determine the number of documents in the repository:
select count(*) from dm_sysobject
3. Divide the results of step 1 by the results of step 2. If the number is high, for example, 0.8, most documents were modified and accessed in the past two years (80%, in this example).

Disk space and storage

Managing xPlore disk space
xPlore provides a sizing calculator for disk space. xDB and Lucene require most of the xPlore space. As the index grows, the disk space must grow correspondingly. xPlore requires disk space for the following components.

Table 33 How xPlore uses disk space
(Each row: component; space use; indexing; search.)
xDB: DFTXML representation of document content and metadata, metrics, audit, and Document ACLs and groups. Indexing: Next free space consumed by disk blocks for batches of XML files. Search: Random access retrieval of particular elements and summary.
Lucene: Stores an index of content and metadata. Indexing: Information is updated through inserts and merges. Search: Inverted index lookup, facet and security lookup.
xDB transaction (redo) log: Stores transaction information. Indexing: Updates areas in xDB from log. Search: Sometimes provides snapshot during retrieval.
Lucene temporary working area: Used for Lucene updates of non-transactional data. Indexing: Uncommitted data is stored to the log. Allocate twice the final index size for merges. Search: None.

As a rule of thumb, you need twice the final index space, in addition to the index space itself, for temporary Lucene merges. For example, if the index size after migration is 60 GB, you need an additional 120 GB for merges and optimizations.

The following procedures limit the space consumed by xPlore:
• Status database
Purge the status database when the xPlore primary instance starts up. By default, the status DB is not purged. See Managing the status database, page 38.
• Saved tokens
If you have specified save-tokens for summary processing, edit indexserverconfig.xml to limit the size of tokens that are saved. Set the maximum size of the element content in bytes as the value of the attribute extract-text-size-less-than. Tokens are not saved for larger documents. Set the maximum size of tokens for the document as the value of the attribute token-size. For details on extraction settings, see Configuring text extraction, page 117. For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 42.
• Lemmas
You can turn off alternative lemmatization support, or turn off lemmatization entirely, to save space. See Configuring lemmatization, page 87.

Estimating index size (Documentum environments)
The average size of indexable content within a document varies from one document type to another and from one enterprise to another. You must calculate the average size for your environment. The easiest estimate is to use the disk space that was required for a Documentum indexing server with FAST. If you have not installed a Documentum indexing server, use the following procedure to estimate index size.
1. Perform a query to find the average size of documents, grouped by a_content_type, for example:
select avg(r_full_content_size),a_content_type from dm_sysobject group by a_content_type order by 1 desc
2. Perform a query to return 1000 documents in each format. Specify the average size range, that is, r_full_content_size greater than (average less some value) and less than (average plus some value). Make the plus/minus value a small percentage of the average size. For example:
select r_object_id,r_full_content_size from dm_sysobject where r_full_content_size >(1792855 -1000) and r_full_content_size <(1792855 +1000) and a_content_type = 'zip' enable (return_top 1000)
3. Export these documents and index them into new, clean xPlore install.
4. Determine the size on disk of the dbfile and lucene-index directories in dsearch_home/data.
5. Extrapolate to your production size.
For example, you have ten indexable formats with an average size of 270 KB from a repository containing 50000 documents. The Content Server footprint is approximately 12 GB. You get a sample of 1000 documents of each format in the range of 190 to 210 KB. After export and indexing, these 10000 documents have an indexed footprint of 286 MB. Your representative sample was 20% of the indexable content. Thus your calculated index footprint is 5 x sample_footprint=1.43 GB (dbfile 873 MB, lucene-index 593 MB).
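The extrapolation in step 5 and the merge rule of thumb from Managing xPlore disk space are plain arithmetic, so they are easy to script when comparing several samples. The following is a minimal sketch using the figures from the example above; the function names are hypothetical and the calculation is a back-of-the-envelope aid, not a substitute for the xPlore sizing calculator.

```python
def extrapolate_index_mb(sample_index_mb, sample_docs, total_docs):
    """Scale the measured index footprint of a sample to the whole repository."""
    if sample_docs <= 0 or total_docs < sample_docs:
        raise ValueError("sample_docs must be positive and no larger than total_docs")
    return sample_index_mb * total_docs / sample_docs


def merge_headroom_gb(final_index_gb):
    """Rule of thumb: twice the final index size, in addition to the index itself."""
    return 2 * final_index_gb


def total_disk_gb(final_index_gb):
    """Index space plus the temporary space for Lucene merges."""
    return final_index_gb + merge_headroom_gb(final_index_gb)


if __name__ == "__main__":
    # 10000 sampled documents (20% of 50000) produced a 286 MB index.
    print(extrapolate_index_mb(286, 10000, 50000), "MB estimated index footprint")
    # A 60 GB index needs an additional 120 GB for merges and optimizations.
    print(merge_headroom_gb(60), "GB additional,", total_disk_gb(60), "GB in total")
```

The sample only predicts the production footprint as well as it represents the production mix of formats and sizes, so the result is best treated as a planning estimate.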
Disk space vs. indexing rebuild performance
If you save indexing tokens for faster index rebuilding, they consume disk space. By default they are not saved. Edit indexserverconfig.xml and set domain.collection.properties.property "save-tokens" to true for a collection. For information on viewing and updating this file, see Modifying indexserverconfig.xml, page 42.

Tuning xDB properties for disk space
You can set the following property in xdb.properties, which is located in the directory WEB-INF/classes of the primary instance. If this property is not listed, you can add it.
• TEMP_PATH
Temporary path for Lucene index. If not specified, the current system property java.io.tmpdir is used.

Adding storage
The data store locations for xDB libraries are configurable. The xDB data stores and indexes can reside on a separate data store, SAN or NAS. Configure the storage location for a collection in xPlore administrator. You can also add new storage locations through xPlore administrator. See Changing collection properties, page 137.

Storage types and locations
Table 34 Comparison of storage types performance
(Each row: function; then the values for SAN, NAS, local disk, iSCSI, and CFS.)
Used for Content Server (content): SAN: Common. NAS: Common. Local disk: Common. iSCSI: Rare. CFS: Rare.
Network: SAN: Fiber. NAS: Ethernet. Local disk: Local. iSCSI: Ethernet. CFS: Fiber.
Performance: SAN: Best. NAS: Slower than SAN, improved with 10GE. Local disk: Good until I/O limit reached. iSCSI: Slower than SAN, improved with 10GE. CFS: Almost as fast as SAN.
High availability: SAN: Requires cluster technology. NAS: Provides shared drives for server takeover. Local disk: Requires complete dual system. iSCSI: Requires cluster technology. CFS: Provides shared drives for server takeover.
xPlore multi-instance: SAN: Requires network shared drives. NAS: Drives already shared. Local disk: Requires network shared drives. iSCSI: Requires network shared drives. CFS: Drives already shared.

System sizing for performance
You can size several components of an xPlore system for performance requirements:
• CPU capacity
• I/O capacity (the number of disks that can write data simultaneously)
• Memory for temporary indexing usage

Sizing for migration from FAST
When you compare sizing of the FAST indexing system to xPlore, use the following guidelines:
• Size with the same allocations used for FAST, unless the FAST installation was very undersized or you expect usage to change.
• Use VMware-based deployments, which were not supported for FAST.
• Include sizing for changes to existing documents:
– A modification to a document requires the same CPU for processing as a new document.
– A versioned document requires the same (additional) space as the original version.
• Size for high availability and disaster recovery requirements.

Add processing instances
CPS processing of documents is typically the bottleneck in ingestion. CPS also processes queries. You can add CPS instances either on the same host as the primary instance or on additional hosts (vertical and horizontal scaling, respectively). A remote CPS instance does not perform as well as a CPS instance on an indexing instance. The remote instance adds overhead for the xPlore system. To add CPS instances, run the xPlore configuration script and choose Create Content Processing Service Only. You should have at least one CPS instance for each Documentum repository. If you process documents for multiple collections, add an instance for each collection.

Sizing for search performance
You can size several components of an xPlore system for search performance requirements:
• CPU capacity
• Memory for query caches
Using xPlore administrator, change the value of query-result-cache-size in search service configuration and restart the search service.

Measuring performance
The following metrics are recorded in the metrics database. View statistics in xPlore administrator to help identify specific performance problems. Select an xPlore instance and then choose Indexing Service or Search Service to see the metric. Some metrics are available through reports, such as document processing errors, content too large, and ingestion rate.

Table 35 Metrics mapped to performance problems
(Each row: metric; service; problem.)
Ingestion throughput: Bytes and docs indexed per second, response time, latency; Indexing Service; Slow document indexing throughputs
Total number of documents indexed (or bytes); Indexing Service; Indexing runs out of disk space
Formats; Content Processing Service; Some formats are not indexable
Languages; Content Processing Service; A language was not properly identified
Error count per collection; Content Processing Service; Finding collection where errors occurred
Number of new documents and updates; Content Processing Service
Search response time; Search Service; Query timeouts or slow query response

To get a detailed message and count of errors, use the following XQuery in xPlore administrator:
for $i in collection('/SystemData/MetricsDB/PrimaryDsearch')
/metrics/record/Ingest[TypeOfRec='Ingest']/Errors/ErrorItem

You can turn off diacritics indexing to improve ingestion and query performance. See Configuring lemmatization, page 87.

Excluding xPlore files from virus scanners
Performance of both indexing and search can be degraded during virus scanning. Exclude xPlore directories.

• Lucene working memory
Used to process queries. Has largest impact on Lucene index performance. Increasing the JVM memory usually does not affect performance.
see Modifying indexserverconfig. A word like "swim" is indexed as more than one part of speech ("swim" and "swimming") is more likely to be found on search. 270 EMC Documentum xPlore Version 1. especially the dsearch_home/data directory.xml. The last three are part of the xPlore instance memory and have a fixed size: • OS buffer cache Holds temporary files. See Handling special characters. Back up the xPlore federation after you change this file. page 42. and then restart all instances. a word changes meaning depending on a diacritic.Performance and Disk Space return string(<R><Error>>{$i/Error}</Error><Space>" " </Space> <Count>>{$i/ErrorCnt}</Count></R> To get the total number of errors. Lucene working memory is consumed from the host JVM process. For information on viewing and updating this file. Tuning memory pools xPlore uses four memory caches. Increase for higher query rates: Change the value of the property xhive-cache-pages in indexserver-bootstrap. In some languages. Excluding diacritics and alternative lemmas Diacritics are not removed during indexing and queries. Alternate lemmas are also indexed. use the following XQuery in xPlore administrator: sum(for $i in collection(’/SystemData/MetricsDB/PrimaryDsearch’)/metrics/record/Ingest [TypeOfRec=’Ingest’]/Errors/ErrorItem/ErrorCnt return $i) Tuning the system System tuning requires editing of indexserverconfig. • xDB buffer cache Stores XML file blocks for ingestion and query.xml.2 Administration and Development Guide . and Lucene index structures. located in dsearch_home/config. page 90. You can turn off alternative lemmas to improve ingestion and query performance. Compressed content is about 30% of submitted XML content. set compression to false for subpath indexes in indexserverconfig. For example. These storage options do not have equal performance. See Configuring text extraction. The compress element in indexserverconfig. SAN.xml. – More memory is available to index large documents. 
Tuning virtual environments VMware deployments require more instances than physical deployments. page 42. a 32–bit VM performs better. 64–bit vs. VMware is limited to eight cores. – 64–bit supports higher ingestion and query rates. Sizing the disk I/O subsystem xPlore supports local disk. Compression can slow the ingestion rate by 10-20% when I/O capacity is constrained. EMC Documentum xPlore Version 1. Using compression Indexes can be compressed to enhance performance. change the value of query-result-cache-size in search service configuration and restart the search service. If ingestion starts fast and gets progressively slower. For example.2 Administration and Development Guide 271 . For information on viewing and updating this file. 32–bit64-bit operating systems have advantages and disadvantages in an xPlore installation: • Advantages – More memory is used to cache index structures for faster query access. Jumbo frame support is helpful as is higher bandwidth. see Modifying indexserverconfig. • Disadvantages – Per-object memory space is higher. Compression uses more I/O memory. NAS devices send more data and packets between the host and subsystem. Performance and Disk Space • xPlore caches Temporary cache to buffer results. – Garbage collection activity limits the size of the 64–bit VM .xml.xml specifies which elements in the ingested document have content compression to save storage space. Using xPlore administrator. and NAS storage. If memory is low. page 117. Performance and Disk Space Documentum index agent performance Index agent settings The parameters described in this section can affect index agent performance. Factors in indexing rate The following factors affect indexing rate: • The complexity of documents 272 EMC Documentum xPlore Version 1. 
• exporter.xml) /exporter_queue_threshold (dm_ftindex_agent_config) Internal queue of objects submitted for indexing • indexer.xml) / indexer_queue_threshold (dm_ftindex_agent_config) Queue of objects submitted for indexing • indexer.xml located in index_agent_WAR/WEB-INF/classes/.xml) / connectors_batch_size (dm_ftindex_agent_config) Number of items picked up for indexing when the index agent queries the repository for queue items. also set the corresponding parameters in the dm_ftindex_agent_config object. If there is a conflict. the settings in the config object override the settings in indexagent. used for both migration and normal mode) Size of queue to hold requests sent to xPlore for indexing. All Averages up to Last Activity measures the time between index agent startup and last indexing activity.xml.xml and from the dm_ftindex_agent_config object.queue_size (indexagent. index agent configuration is loaded from indexagent.callback_queue_size (only in indexagent. set the parameters in the indexagent. Do not change these values unless you are directed to change them by EMC technical support. You can tune some indexing and xDB parameters and adjust allowable document size. the index agent waits until the callback queue has reached 100% less the callback_queue_low_percent. In normal mode.xml) Number of threads that extract metadata into DFTXML using DFC • connectors. In migration mode.xml.file_connector. Indexing performance Various factors affect the rate of indexing.queue_size (indexagent. In normal mode. Find the details for Indexed content KB/sec and Indexed documents/sec.2 Administration and Development Guide . • exporter.batch_size (indexagent.thread_count (indexagent. Measuring index agent performance Verify index agent performance using the index agent UI details page. All Averages measures the average time between index agent startup and current run time. When the queue reaches this size. 
see Document processing and indexing service configuration parameters. more users. • The indexing server I/O subsystem capabilities • The number of CPS instances For heavy ingestion loads or high availability requirements. you can set up an active/active high availability system so that failure in a single system does not disrupt business. See Routing a query to a specific collection. Performance and Disk Space For example. (Documents can be indexed into specific target collections. To scale up for large ingestion requirements or for high availability.1.2 Administration and Development Guide 273 . and higher ingestion rates than a 32-bit processor. Document maximum size Set the maximum document size in the index agent configuration file indexagent. Alternatively. more collections. Avoid this bottleneck with frequent incremental backups. After ingestion has completed. add CPS instances to increase content processing bandwidth. You can configure CPS and indexing settings using xPlore administrator. Document size and performance The default values for maximum document and text size have been optimized for 32-bit environments. EMC Documentum xPlore Version 1. page 283. For a list of these properties. MS Excel files take much longer to index due to their complex cell structure. You can adjust up these values for a 64-bit environment. which is located in indexagent_home/jboss5. The biggest impact on ingestion rate is with threadpool size and processing buffer size. page 208 • Recovery during heavy ingestion If the system crashes during a period of heavy ingestion.war/WEB-INF/classes. transactional recovery could take a long time as it replays the log.0/server/DctmServer_Indexagent/deploy/IndexAgent. page 138. See Moving a temporary collection. Tunable indexing properties The number of threads. a simple text document containing thousands of words can take longer to index than a much larger Microsoft Word document full of pictures. 
Creating temporary collections for ingestion You can create a collection and ingest documents to that collection. batch size.xml. • Processor version A 64-bit processor supports more domains. • The number of collections Create multiple collections spread over multiple xPlore instances to scale xPlore. move the collection to become a subcollection of existing collection. The recovery process is single-threaded. and queue size at each stage of indexing impacts ingestion performance. route queries to specific collections. tracking DB cache size. add more CPS instances. which shorten the restore period. For best search performance. thread wait time. Text maximum size Set the maximum size of text within a document and the text in CPS batch in CPS configuration. use as large a RAM buffer as possible for the host.2 Administration and Development Guide . Maximum setting: 2 GB. which is located in the CPS instance directory dsearch_home/dsearch/cps/cps_daemon. For faster indexing. For example. Restart the index agent. Choose an instance in xPlore administrator and click Configuration. dirty merge. • mergeFactor 274 EMC Documentum xPlore Version 1. which you can address by increasing I/O capabilities. Max text threshold sets the size limit. ingestion performance can degrade under heavy load. With the guidance of Documentum technical support.Performance and Disk Space Edit the contentSizeLimit parameter within the parent element exporter. which is located in the directory WEB-INF/classes of the primary instance. you can add them. Tunable xDB properties Most applications do not need to modify xDB properties. Default: 3. During the xDB checkpoint process. Above this size. the entire batch of submitted documents fails. Default: 10485760 (10 MB). Higher values use more memory and support faster indexing.xml. • ramBufferSizeMB Size in megabytes of the RAM buffer for document additions. Edit max_data_per_process: The upper limit in bytes for a batch of documents in CPS processing. 
the zip file is expanded to evaluate document size. the index and black list (change log) are dirty because part of them can be in the application cache or OS system cache. new index. Default: 20 MB. optimized for a 32-bit environment. Default: 300. Maximum text size in CPS batch Edit the CPS configuration file configuration. updates. • cleanMergeInterval Interval in seconds before a non-final merge into a fresh. maximized for 32-bit environment. • maxRamDirectorySize Maximum RAM in bytes for in-memory Lucene index. Note: Increasing the maximum text size can negatively affect CPS memory consumption under heavy load. In this case. the data can be flushed to disk and they are clean. • cleaningInterval Interval in seconds between LRU-based cache cleanup. Larger values can slow ingestion rate and cause more instability. you can set the following properties in xdb. • dirtyMergeInterval Interval in seconds before a non-final. and deletions. If you increase this threshold. Default: 30 MB. After committing. only the document metadata is tokenized. if an email has a zip attachment. Default: 3000000. Default: 30. in bytes. If these properties are not listed. Default: 120. The value is in bytes. The bottleneck for indexing is usually the process of writing index files to disk. Includes expanded attachments.properties. Maximum setting: 2 GB. • maxMergeDoc Sets the maximum size of a segment that can be merged with other segments.ftengine The query execution plan recorded is in dsearch. Examine the query load to see if the system is overloaded.log. If RAM buffer size is exceeded before max merge doc. memory consumption. see Getting the query execution plan. choose Tracing. Performance and Disk Space Number of index entries to keep in memory before storing to disk and how often segments are merged. Note: High values can cause a "too many open files" exception.fulltext. A low value uses less memory and causes more frequent index updates. (For more information on the query plan. 
Examine the Start time column to see whether slow queries occur at a certain period during the day or certain days of the month. Search performance Measuring query performance Make sure that search auditing is enabled. in bytes.SUBSYSTEM.2 Administration and Development Guide 275 . You can increase the maximum number of open files allowed on a UNIX or Linux host by increasing the nofile setting to greater than 65000. Non-final merge is executed frequently to reduce the number of file descriptors. Small batches slow indexing.S. You can also turn on tracing information for query execution.S. and subindex creation.c. Default: 14400 (4 hours). Run the report Top N slowest queries . Save the query execution plan to find out whether you need an additional index on a metadata element. which is located in the logs subdirectory of the JBoss deployment directory. Units are seconds. Low values are better for interactive indexing because this limits the length of merging pauses during indexing. For example.MODIFY_TRACE. then flush is triggered. the segments are merged. page 204.NULL. and then choose Enable. but searches on unoptimized indexes are faster. Default: 1000000 • nonFinalMaxMergeSize Maximum size of internal Lucene index that is eligible for merging. Factors in query performance The following features of full-text search can affect search performance: • Single-box search in Webtop EMC Documentum xPlore Version 1. When the tenth segment has been added.VALUE. Select an instance. Default: 300000000 • finalMergingInterval Interval after which final subindexes are merged. a factor of 10 creates a segment for every 10 XML documents added to the index. Default: 10. High values are better for batch indexing and faster searches.) Documentum clients can save the plan with the following iAPI command: apply. A high value improves batch indexing and optimized search performance and uses more RAM. The first page of results loads while the remaining results are fetched.xquery. 
• Size of query result set Consume results in a paged display for good performance.parallel_execution. as well as use of 64-bit hosts. Webtop limits results to 350. To set parallel mode for DFC-based search applications.Performance and Disk Space The default operator for multiple terms is AND. you can set the target collection or domain to index-only during ingestion. especially when the first term is unselective. so response times are slower. • Response times slower during heavy ingestion Slow queries during ingestion are usually an issue only during migration from FAST. Scaling to more instances on the same or multiple hosts. response time rises as the number of collections rises. xPlore does not match parts of words. When queries are routed to a collection. CenterStage limits results to 150.enable = true • Caches empty on system startup At startup.properties to true: dfc. 276 EMC Documentum xPlore Version 1. Changing the security cache sizes Monitor the query audit record to determine security performance. Support for fragment matches (leading and trailing wildcards) can be enabled.2 Administration and Development Guide . For information on targeted queries. but this impacts performance. Paging is especially important to limit result sets for underprivileged users.option. try to limit the number of collections in your xPlore federation. The value of <TOTAL_INPUT_HITS_TO_FILTER> records how many hits a query had before security filtering. see Routing a query to a specific collection. A more limited support for leading wildcards in metadata search can also be enabled.search. you can schedule ingestion during an off-peak time. If you do not use targeted queries. • Flexible metadata search (FTDQL) Searches on multiple object attributes can affect performance. page 208. Alternatively. performance is much better. Make sure that you have allocated sufficient memory for the file system buffer cache and good response time from the I/O subsystem. 
but performance is much slower. WHERE object_name LIKE ’foo%’ matches foo bar but not football. If your environment has large batch migrations once a month or quarterly. • Security Native xPlore security performs faster than security applied to results in the Content Server. set the following property in dfc. • Leading or trailing wildcards By default. • Number of collections If queries are not run in parallel mode (across several collections at once). Queries can be targeted to specific collections to avoid this problem. the query and security caches have not been filled. with a smaller page size (from 10 to 100). The default can be configured to OR (the old Webtop default). but this impacts performance. can also improve search performance. For example. • Number of documents Documents can be routed to specific collections based on age or other criteria. The latter option can be enabled. Lower values can trigger re-collection operations and increase query response time. A high value improves batch indexing and optimized search performance and uses more RAM. GROUP_OUT_CACHE_FILL). If you have many ACLs. For underprivileged users. You can increase the maximum number of open files allowed on a UNIX or Linux host by increasing the nofile setting to greater than 65000.properties. Suggested size: 350. A low hit ratio indicates an underprivileged user.properties on the search client to increase the value of dfc. For information on how to change these configuration settings. Note: High values can cause a "too many open files” exception.xml. page 51. the segments are merged. but searches on unoptimized indexes are faster. The audit record reports how many times these caches were hit for a query (GROUP_IN_CACHE_HIT. For information on viewing and updating this file. especially for unselective queries. Tuning xDB properties for search You can set the following properties in xdb. increase the value of acl-cache-size (number of permission sets in the cache). 
If these properties are not listed. Higher values can consume more memory. • mergeFactor Number of index entries to keep in memory before storing to disk and how often segments are merged. Edit dfc. GROUP_OUT_CACHE_HIT). For example.xml. the window size is expanded twice for the next collecting round. who often has slower query response times than other users. Performance and Disk Space The value of <HITS_FILTERED_OUT> shows how many hits were discarded because the user did not have permissions for the results. a factor of 10 creates a segment for every 10 XML nodes added to the index.batch_hint_size. see Configuring the security cache. Some of these properties affect indexing performance as well as search performance. which is located in the directory WEB-INF/classes of the primary instance. If the total result number is larger than the window. For highly privileged users (members of many groups). Increasing query batch size In a Documentum client application based on DFC. Default: 10. The hits filtered out divided by the total number of hits is the hit ratio. Default: 12000 Troubleshooting slow queries Measuring performance EMC Documentum xPlore Version 1. increase the not-in-groups cache size to reduce the number of times this cache must be checked. Indexing is slower. There are two caches that affect security performance: Groups that a user belongs to. see Modifying indexserverconfig.2 Administration and Development Guide 277 . Cache sizes are configured in indexserverconfig. The record reports how many times the query added a group to the cache (GROUP_IN_CACHE_FILL. Default: 50. • queryResultsWindowSize Result window for a single query. When the tenth segment has been added. and groups that a user does not belong to. increase the groups-in-cache size to reduce the number of times this cache must be checked. you can set the query batch size. page 42. A low value uses less memory and causes more frequent index updates. you can add them. . 
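When troubleshooting slow queries, the security counters in the query audit record can be compared numerically. The following is a minimal Python sketch and not part of xPlore: it assumes that the values of TOTAL_INPUT_HITS_TO_FILTER and HITS_FILTERED_OUT have already been read from an audit record as integers, and the function name security_filter_ratio is hypothetical. It computes the ratio exactly as this guide defines it, that is, hits filtered out divided by the total number of hits before security filtering.

```python
def security_filter_ratio(total_input_hits: int, hits_filtered_out: int) -> float:
    """Return hits_filtered_out / total_input_hits for one audited query.

    Both arguments are assumed to come from the query audit record
    (TOTAL_INPUT_HITS_TO_FILTER and HITS_FILTERED_OUT). The function name
    and signature are illustrative only.
    """
    if total_input_hits < 0 or hits_filtered_out < 0:
        raise ValueError("hit counts must be non-negative")
    if hits_filtered_out > total_input_hits:
        raise ValueError("cannot filter out more hits than the query returned")
    if total_input_hits == 0:
        # No hits reached the security filter, so nothing was discarded.
        return 0.0
    return hits_filtered_out / total_input_hits


if __name__ == "__main__":
    # Hypothetical audit values: 2000 hits before filtering, 1500 discarded.
    print(f"{security_filter_ratio(2000, 1500):.2f}")
```

Tracking this ratio per user over several audit records can help decide which of the security cache sizes described under Changing the security cache sizes is worth adjusting.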
Appendix A

Index Agent, CPS, Indexing, and Search Parameters

This appendix covers the following topics:
∙ Index agent configuration parameters
∙ Document processing and indexing service configuration parameters
∙ Search service configuration parameters
∙ API Reference

Index agent configuration parameters

General index agent parameters

The index agent configuration file indexagent.xml is located in dsearch_home/jboss5.1.0/server/DctmServer_Indexagent/deploy/IndexAgent.war/WEB-INF/classes. Most of these parameters are set at optimal settings for all environments.

Generic index agent parameters

Table A.36 Indexagent configuration parameters in generic_indexer.parameter_list

Parameter name | Description
acl_exclusion_list | Add this parameter to exclude specific ACL attributes from indexing. Contains an acl_attributes_exclude_list element. Check with technical support before you add or modify this list.
acl_attributes_exclude_list | Specifies a space delimited list of ACL attributes that will not be indexed.
collection | Specifies the name of a collection to which all documents will be routed. Default: The domain default collection.
dsearch_qrserver_host | Fully qualified host name or IP address of host for xPlore server
dsearch_qrserver_port | Port used by xPlore server. Default is 9200
dsearch_domain | Repository name
group_exclusion_list | Add this parameter to exclude specific group attributes from indexing. Contains a group_attributes_exclude_list element. Check with technical support before you add or modify this list.
group_attributes_exclude_list | Specifies a space delimited list of group attributes that will not be indexed.
index_type_mode | Object types to be indexed. Values: both (default) | aclgroup | sysobject. If you use two index agents, each can index either ACLs or sysobjects.
max_requests_in_batch | Maximum number of objects to be indexed in a batch. Default: 5
max_batch_wait_msec | Maximum wait time in milliseconds for a batch to reach the max_requests_in_batch size. When this timeout is reached, the batch is submitted to xPlore. The default setting (1) is for high indexing throughput. If your index agent has a low ingestion rate of documents and you want to have low latency, reduce both max_requests_in_batch and max_submission_timeout_sec.
max_pending_requests | Maximum number of indexing requests in the queue. Default: 10000
max_tries | Maximum number of tries to add the request to the internal queue when the queue is full. Default: 2
group_attributes_exclude_list | Attributes of a group to exclude from indexing

General index agent runtime parameters

Requests for indexing pass from the exporter queue to the indexer queue to the callback queue.

Table A.37 Index agent runtime configuration in indexer element

Parameter | Description
queue_size | Size of queue for indexing requests. When the queue reaches this size, the index agent will wait for the queue to be lower than queue_size less (queue_size * queue_low_percent). For example, if the queue_size is 500 and queue_low_percent is 10%, then the agent will resume indexing when the queue is lower than 500 - (500 * .1) = 450.
queue_low_percent | Percent of queue size at which the index agent will resume processing the queue.
callback_queue_size | Size of queue to hold requests sent to xPlore for indexing. When the queue reaches this limit, the index agent will wait until the callback queue has reached 100% less the callback_queue_low_percent.
callback_queue_low_percent | Percent of callback queue size at which the index agent will resume sending requests to xPlore.
wait_time | Time in seconds that the indexing thread waits before reading the next item in the indexing queue.
thread_count | Number of threads to be used by index agent.
shutdown_timeout | Time the index agent should wait for thread termination and cleanup before shutdown.
runaway_timeout | Timeout for runaway requests.
content_clean_interval | Timeout to clean local content area. Set to the same value as runaway timeout. After the local content area is cleaned, documents may still remain in the queue for xPlore processing.

Other index agent parameters

Table A.38 Other index agent parameters

Parameter | Description
contentSizeLimit | In exporter.parameter_list. Sets the maximum size for documents to be sent for indexing. The value is in bytes. Default: 20MB.
partition_config | You can add this element and its contents to map partitions to specific collections. See Mapping Server storage areas to collections, page 66.

Document processing and indexing service configuration parameters

You can configure the following settings for the CPS and indexing services in xPlore administrator. For CPS and indexing processing settings, choose Indexing Service in the tree and click Configuration. The default values have been optimized for most environments.

Document processing (CPS) global parameters

The per-instance CPS settings relate to the instance and do not overlap with the CPS settings in Indexing Service configuration.

• CPS-requests-max-size: Maximum size of CPS queue. Default: 1000.
• CPS-requests-batch-size: Maximum number of CPS requests in a batch. Default: 5.
• CPS-thread-wait-time: Time in milliseconds to accumulate requests in a batch. Range: 1-2147483647. Default: 1000.
• CPS-executor-queue-size: Maximum size of CPS queue before spawning a new worker thread. Default: 10.
• CPS-executor-retry-wait-time: Wait time in milliseconds after queue and worker thread maximums have been reached. Default: 1000.
• CPS-threadpool-core-size: Minimum number of threads used to process a single incoming request. Valid values: 1 - 100. Default: 10.
• CPS-threadpool-max-size: Maximum number of threads used to process a single incoming request. Valid values: 1 - 100. Default: 100.
• commit-option: Default: -1.
• rebuild-index-batch-size: Number of documents to add to rebuild of index in a batch. Default: 1000.
• rebuild-index-embed-content-limit: Maximum embedded content for index rebuilding. Larger content is passed in a file, not embedded. Default: 2048.

Note: If you decrease the threadpool size, ingestion rate can slow down.

Indexing global parameters

• index-requests-max-size: Maximum size of internal index queue. Default: 1000.
• index-requests-batch-size: Maximum number of index requests in a batch. Default: 10.
• index-thread-wait-time: Maximum wait time in milliseconds to accumulate requests in a batch. Range: 1-2147483647. Default: 1000.
• index-executor-queue-size: Maximum size of index queue before spawning a new worker thread. Default: 10.
• index-executor-retry-wait-time: Wait time in milliseconds after index queue and worker thread maximums have been reached. Default: 1000.
• index-threadpool-core-size: Minimum number of threads used to process a single incoming request. Valid values: 1 - 100. Default: 10.
• index-threadpool-max-size: Maximum number of threads used to process a single incoming request. Valid values: 1 - 100. Default: 100.
• status-requests-batch-size: Maximum number of status update requests in a batch. Default: 1000.
• status-thread-wait-time: Maximum wait time in milliseconds to accumulate requests in a batch. Range: 1-2147483647. Default: 1000.
• index-check-duplicate-at-ingestion: Set to true to check for duplicate documents. May slow ingestion. Default: true.
• enable-subcollection-ftindex: Set to true to create a multi-path index to search on specific subcollections. Ingestion is slower, especially when you have multiple layers of subcollections. If false, subcollection indexes are not rebuilt when you rebuild a collection index. Default: false.
• rebuild-index-batch-size: Sets the number of documents to be reindexed. Default: 1000.
• rebuild-index-embed-content-limit: Sets the maximum size of embedded content for language detection in index rebuild. Larger content is streamed. Default: 2048.

CPS instance configuration parameters

You can configure the following CPS settings for each instance in xPlore administrator. The values are recorded in dsearch_home/dsearch/cps/cps_daemon/configuration.xml. The default values have been optimized for most environments.

• Connection pool size: Maximum number of concurrent connections. Valid values: 1-100. Default: 4. Increasing the number of connections consumes more memory. Decreasing can slow ingestion.
• Port number: Listener port for CPS daemon, used by the CPS manager. Default: 64321. This value is set during xPlore configuration.
• Daemon path: Specifies the path to the installed CPS daemon (read-only). This value is set during xPlore configuration.
• Keep intermediate temp file: Keep content in a temporary CPS folder for debugging. Enabling temp file has a large impact on performance. Disable (default) to remove temporary files after the specified time in seconds. Time range in seconds: 1-604800 (1 week).
• Restart threshold: Check After processed... and specify the number of requests after which to restart the CPS daemon. Disable if you do not want the daemon restarted. Decreasing the number can affect performance.
• Heartbeat: Interval in seconds between the CPS manager and daemon. Range: 1-600. Default: 60.
• Embedded return: Check Yes (default) to return embedded results to the buffer. Check No to return results to a file, and specify the file path for export. Embedded return increases communication time and, consequently, impacts ingestion.
• Export file path: Valid URI at which to store CPS processing results, for example, file:///c:/. If the results are larger than Result buffer threshold, they are saved in this path. This setting does not apply to remote CPS instances, because the processing results are always embedded in the return to xPlore.
• Result buffer size threshold: Number of bytes at which the result buffer returns results to file. Valid values: 8 - 16 MB. Default: 1 MB (1048576 bytes). Larger value can accelerate process but can cause more instability.
• Processing buffer size threshold: Specifies the number of bytes of the internal memory chunk used to process small documents. If this threshold is exceeded, a temporary file is created for processing. Valid values: 100 KB-10 MB. Default: 2 MB (2097152 bytes). Increase the value to speed processing. Consumes more memory.
• Load file to memory: Check to load the submitted file into memory for processing. Uncheck to pass the file to a plug-in analyzer for processing (for example, the Documentum index agent).
• Batch in batch count: Average number of batch requests in a batch request. Range: 1-100. Default: 5. CPS assigns the number of Connection pool threads for each batch_in_batch count. For example, defaults of batch_in_batch of 5 and connection_pool_size of 5 result in 25 threads.
• Thread pool size: Number of threads used to process a single incoming request such as text extraction and linguistic processing. Range: 1-100. Default: 10. Larger size can speed ingestion when CPU is not under heavy load. Causes instability at heavy CPU load.
• System language: ISO 639-1 language code that specifies the language for CPS.
• Max text threshold: Sets the size limit, in bytes, for the text within documents. Range: 5MB - 2GB expressed in bytes. Maximum possible setting: 2 GB. Default: 10485760 (10 MB), optimized for a 32-bit environment. Larger values can slow ingestion rate and cause more instability. Above this size, only the document metadata is tokenized. Includes expanded attachments. For example, if an email has a zip attachment, the zip file is expanded to evaluate document size. If you increase this threshold, ingestion performance can degrade under heavy load.
• Illegal char file: Specifies the URI of a file that defines illegal characters. To create a token separator, xPlore replaces illegal characters with white space. This list is configurable.

You can configure the following additional parameters in the CPS configuration file configuration.xml:

• normalize_form: Set to true to remove accents in the index, which allows search for the same word without the accent.
• max_data_per_process: The upper limit in bytes for a batch of documents in CPS processing. Maximized for 32-bit environment.
• temp_directory: Directory for CPS temporary files. Default: dsearch_home/dsearch/cps/cps_daemon/temp.

Valid values: 2 . Default: unchecked. A smaller number increases the risk of language misidentification. Default: 600. Default: dsearch_home/dsearch/cps/cps_daemon/temp. a warning is logged Note: The index agent also has batch size parameters. Set the maximum number of requests in the queue. Queries are processed for language identification. • The regular queue processes indexing requests. Default: 65536. lemmatization. Set the maximum number of requests in the queue.
CPS instance configuration parameters • Request time out: Number of seconds before a single request times out.65538 (default: 1024). • Use express queue: This queue processes admin requests and query requests. Increase to 16384 or larger for CenterStage or other client applications that have a high volume of metadata. and tokenization. The express queue has priority over the regular queue. These settings apply to all CPS instances. 286 EMC Documentum xPlore Version 1. • temp_file_folder: Directory for temporary format and language identification. • IP version: Internet Protocol version of the host machine. Default: 128. Default: 30 MB. • slim_buffer_size_threshold: Sets memory buffer for CPS temporary files. A larger number slows the ingestion process.xml. • max_batch_size: Limit for the number of requests in a batch. • detect_data_len: The number of bytes used for language identification. • Daemon standalone: Check to stop daemon if no manager connects to it. Dual stack is not supported. which is located in the CPS instance directory dsearch_home/dsearch/cps/cps_daemon.2 Administration and Development Guide . The bytes are analyzed from the beginning of the file. Range: 60-3600. Maximum setting: 2 GB. • When the token count is zero and the extracted text is larger than the configured threshold. Values: IPv4 or IPv6. Default: 1024. Negative values default to no timeout (not recommended). Default: 0. and idle threads are removed down to this minimum number. Default: 100 units. • query-summary-default-highlighter: Class that determines summary and highlighting. Interval after which results fetching is suspended when the result cache is full.100. Default: milliseconds.indexserver. The default values have been optimized for most environments. • query-threadpool-max-size: Maximum number of threads used to process incoming requests. Default: 200. Default: 100. See your release notes for supported languages in this release. 
the thread waits indefinitely until space is available in the cache (freed up when the client application retrieves results). • query-default-result-batch-size: Default size of result batches that are sent to the client. Default: 600000. search performance can decrease. EMC Documentum xPlore Version 1.DefaultSummary. because the client application has not retrieved the result. service is denied to additional requests. Default: com. Default: en (English). no more results are fetched from xDB until the client asks for more results. For a value of 0.100.fulltext. CPS. Default: empty string. • query-threadpool-keep-alive-time-unit: Unit of time for query-thread-pool-keep-alive-time. Default: 64.services. • query-default-locale: Default locale for queries. • query-summary-display-length: Number of characters to return as a dynamic summary. Default: 10. • query-threadpool-keepalive-time: Interval after which idle threads are terminated. see Configuring query summaries.) Default: 3600000. page 174. • query-result-spool-location: Path to location at which to spool results. In a Documentum environment. Default: 100. Default: 200.batch_hint_size in dfc. • query-executor-retry-interval: Wait time in milliseconds after search queue and worker thread maximums have been reached. dfc. • query-summary-highlight-begin-tag: HTML tag to insert at beginning of summary.core. • query-threadpool-core-size: Minimum number of threads used to process incoming requests. Default: 60000. When this limit is reached. (Threads are freed immediately after a result is retrieved.Note: If you decrease the threadpool size.documentum. Default: 3. • query-threadpool-queue-size: Maximum number in threadpool queue before spawning a new worker thread.emc. Indexing. Negative values default to a single batch. Valid values: 1 . Default: empty string.2 Administration and Development Guide 287 . After this limit is reached.properties overrides this setting. 
Default: dsearch_home/dsearch/spool • query-default-timeout: Interval in milliseconds for a query to time out. For query summary configuration. and Search Parameters Search service configuration parameters You can configure the following settings for the search service in xPlore administrator. See Basistech documentation for identified language codes. • query-result-cache-size: Default size of results buffer. • query-thread-sync-interval: Used for xPlore internal synchronization. • query-executor-retry-limit: Number of times to retry query.summary. Valid values: 1 . • query-summary-highlight-end-tag: HTML tag to insert at end of summary. • query-thread-max-idle-interval: Query thread is freed up for reuse after this interval. Threads are allocated at startup. Index Agent. For example. if query-facet-max-result-size=12. set to false to return as a summary the first n chars defined by the query-summary-display-length configuration parameter. • query-facet-max-result-size: Documentum only. the number of results per facet is reduced accordingly. If a query has many facets. For summaries evaluated in context. Search service configuration parameters • query-enable-dynamic-summary: If context is not important. Sets the maximum number of results used to compute facet values. only 12 results for all facets in a query are returned. • query-index-covering-values: Supports Documentum DQL evaluation. set to true (default). 288 EMC Documentum xPlore Version 1. Default: 10000.2 Administration and Development Guide . Do not change unless tech support directs you to. String usage).client. For example: addCPS("primary".fulltext.emc. String coll = cfg.documentum. Valid values: 1 . String eng = cfg.3. or unknown • getEngineConfig(). System.out. use index. This package is in the SDK jar file dsearchadmin-api. Valid values: 1 .import com.100.api. URL url.core.client. 
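The Batch in batch count setting under CPS instance configuration parameters derives a thread count by multiplying two settings. As a minimal sketch of that arithmetic, assuming a 1-100 range for both settings, the following Java helper reproduces the 5 x 5 = 25 example from the text. The class and method names are hypothetical and are not part of any xPlore API.

```java
// Hypothetical helper; names are illustrative and not part of any xPlore API.
final class CpsThreadEstimate {

    // Assumed bounds, taken from the 1-100 ranges given for these settings.
    static final int MIN_SETTING = 1;
    static final int MAX_SETTING = 100;

    /** Threads CPS would assign: one connection pool's worth per batch_in_batch count. */
    static int workerThreads(int batchInBatchCount, int connectionPoolSize) {
        check("batch_in_batch", batchInBatchCount);
        check("connection_pool_size", connectionPoolSize);
        return batchInBatchCount * connectionPoolSize;
    }

    /** Rejects values outside the assumed 1-100 range. */
    private static void check(String name, int value) {
        if (value < MIN_SETTING || value > MAX_SETTING) {
            throw new IllegalArgumentException(
                name + " must be between 1 and 100, got " + value);
        }
    }

    public static void main(String[] args) {
        // Mirrors the example in the text: 5 and 5 result in 25 threads.
        System.out.println(workerThreads(5, 5));
    }
}
```

Because the count is a product, raising either setting scales the total thread count linearly, so it can be worth running the numbers before changing both at once.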
API Reference

CPS APIs

Content processing service APIs are available in the interface IFtAdminCPS in the package com.emc.documentum.core.fulltext.client.admin.api.interfaces. This package is in the SDK jar file dsearchadmin-api.jar.

To add a CPS instance using the API addCPS(String instanceName, URL url, String usage), the following values are valid for usage: all, index, or search. If the instance is used for CPS alone, use index. For example:

    addCPS("primary", "http://1.2.3.4/services", "index")

CPS configuration keys for setCPSConfig()

• CPS-requests-max-size: Maximum size of CPS queue
• CPS-requests-batch-size: Maximum number of CPS requests in a batch
• CPS-thread-wait-time: Maximum wait time in milliseconds to accumulate requests in a batch
• CPS-threadpool-core-size: Minimum number of threads used to process a single incoming request such as text extraction and linguistic processing. Valid values: 1-100.
• CPS-threadpool-max-size: Number of threads used to process a single incoming request such as text extraction and linguistic processing. Valid values: 1-100.
• CPS-executor-queue-size: Maximum size of CPS executor queue before spawning a new worker thread
• CPS-executor-retry-wait-time: Wait time in milliseconds after executor queue and worker thread maximums have been reached

Indexing engine APIs (DFC)

DFC exposes APIs to get information about the index engine that a repository uses. Get the IDfFtConfig interface from a DFC session. Use IDfFtConfig to get the following information:

• getCollectionPath(). Returns the complete path of the root collection, useful for constructing some queries.
• getEngine(). Returns DSEARCH, FAST, Lucene, or unknown
• getEngineConfig().
• isCapabilitySupported(String capability). Supported inputs are scope_search, security_eval_in_fulltext, xquery, relevance_ranking, hit_count, and search_topic. Returns 1 for supported, 0 for unsupported, and -1 when it cannot be determined. Supported returns are as follows:
  – scope_search: XML element data can be searched, like the <IN> operator in previous versions of Content Server. xPlore and FAST support this feature.
  – security_eval_in_fulltext: Security is evaluated in the full-text engine before results are returned to Content Server, resulting in faster query results. This feature is available only in xPlore.
  – xquery: xPlore supports XQuery syntax.
  – relevance_ranking: The full-text engine scores results using configurable criteria. xPlore and FAST support this feature.
  – hit_count: The full-text engine returns the total number of hits before returning results. FAST supports this feature. xPlore does not support it.
    Note: The count that is returned for DQL queries does not reflect the application of security, which reduces the actual count returned for the query.
  – search_topic: The full-text engine indexes all XML elements and their attributes. FAST supports zone searching (search topic) for backward compatibility. The xPlore server returns false.

The following example assumes that a connection to the repository has been established and saved in the class variable m_session (instance of IDfSession).

    import com.documentum.fc.client.fulltext.IDfFtConfig;

    public void getEngineConfig() throws Exception
    {
        // Get the full-text configuration from the established DFC session
        IDfFtConfig cfg = m_session.getFtConfig();
        String coll = cfg.getCollectionPath();
        System.out.println("Collection path: " + coll);
        String eng = cfg.getEngine();
        System.out.println("Full-text engine: " + eng);
    }

Search APIs

Search service APIs are available in the following packages of the SDK jar file dsearchadmin-api.jar:

• IFtAdminSearch in the package com.emc.documentum.core.fulltext.client.admin.api.interfaces
• IFtSearchSession in com.emc.documentum.core.fulltext.client.search
• IFtQueryOptions in com.emc.documentum.core.fulltext.common.search

For information on creating and retrieving facets, see Facets.

Auditing APIs

Auditing APIs are available in the interface IFtAdminAudit in the package com.emc.documentum.core.fulltext.client.admin.api.interfaces. This package is in the SDK jar file dsearchadmin-api.jar.

xDB data management APIs

The data management APIs are available in the interface IFtAdminDataManagement in the package com.emc.documentum.core.fulltext.client.admin.api.interfaces. This package is in the SDK jar file dsearchadmin-api.jar.

Appendix B Documentum DTDs

This appendix covers the following topics:
∙ Extensible Documentum DTD

Extensible Documentum DTD

Viewing the DFTXML representation of a document

Documentum repository content is stored in XML format. Customer-defined elements and attributes can be added to the DTD as children of dmftcustom. Each element specifies an attribute of the object type. The object type is the element in the path dmftdoc/dmftmetadata/type_name, for example, dmftdoc/dmftmetadata/dm_document.

To view the DFTXML representation using xPlore administrator, click the document in the collection view.

To find the path of a specific attribute in DFTXML, use a Documentum client to look up the object ID of a custom object. Using xPlore administrator, open the target collection and paste the object ID into the Filter word box. Click the resulting document to see the DFTXML representation.

DTD

This DTD is subject to change. Following are the top-level elements under dmftdoc.

Table B.39 DFTXML top-level elements

Element               Description
dmftkey               Contains Documentum object ID (r_object_id)
dmftmetadata          Contains elements for all indexable attributes from the standard Documentum object model, including custom object types. Each attribute is modeled as an element and value. Repeating attributes repeat the element name and contain a unique value. Some metadata, such as r_object_id, are repeated in other elements as noted.
dmftvstamp            Contains the internal version stamp (i_vstamp) attribute.
dmftsecurity          Contains security attributes from the object model plus computed attributes: acl_name, acl_domain, and ispublic.
dmftinternal          Contains attributes used internally for query processing.
dmftversions          Contains version labels and iscurrent for sysobjects.
dmftfolders           Contains the folder ID and folder parents.
dmftcontents          Contains content-related attributes and one or more pointers to content files. The actual content can be stored within the child element dmftcontent as a CDATA section.
dmftcustom            Contains searchable information supplied by custom applications. Requires a TBO. See Injecting data and supporting joins.
dmftsearchinternals   Contains tokens used by static and dynamic summaries.
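To make the element layout concrete, the following Java sketch reads a DFTXML document with the JDK's built-in DOM parser. It reports the top-level elements under dmftdoc, the object type element under dmftmetadata, and the values of a repeating attribute. The DftxmlPeek class, its method names, and the abbreviated sample document are illustrative assumptions; none of them is an xPlore or DFC API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for inspecting DFTXML; not an xPlore or DFC class.
final class DftxmlPeek {

    // Abbreviated sample laid out like a DFTXML document (assumption for illustration).
    static final String SAMPLE =
        "<dmftdoc dmftkey=\"090a0d6880008848\">"
        + "<dmftkey>090a0d6880008848</dmftkey>"
        + "<dmftmetadata><dm_sysobject>"
        + "<r_object_id dmfttype=\"dmid\">090a0d6880008848</r_object_id>"
        + "<r_version_label dmfttype=\"dmstring\">1.0</r_version_label>"
        + "<r_version_label dmfttype=\"dmstring\">CURRENT</r_version_label>"
        + "</dm_sysobject></dmftmetadata>"
        + "<dmftvstamp><i_vstamp dmfttype=\"dmint\">0</i_vstamp></dmftvstamp>"
        + "</dmftdoc>";

    /** Parses the DFTXML text and returns its root element (dmftdoc). */
    static Element root(String dftxml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(dftxml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed DFTXML", e);
        }
    }

    /** Returns the element children of a node, skipping text and CDATA nodes. */
    static List<Element> children(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i) instanceof Element) {
                result.add((Element) nodes.item(i));
            }
        }
        return result;
    }

    /** Names of the top-level elements under dmftdoc, in document order. */
    static List<String> topLevelElements(String dftxml) {
        List<String> names = new ArrayList<>();
        for (Element e : children(root(dftxml))) {
            names.add(e.getTagName());
        }
        return names;
    }

    /** The type element under dmftmetadata, or null if it is absent. */
    static Element typeElement(String dftxml) {
        for (Element top : children(root(dftxml))) {
            if (top.getTagName().equals("dmftmetadata")) {
                List<Element> types = children(top);
                return types.isEmpty() ? null : types.get(0);
            }
        }
        return null;
    }

    /** Values of an attribute under the type element; repeating attributes yield several. */
    static List<String> attributeValues(String dftxml, String attribute) {
        List<String> values = new ArrayList<>();
        Element type = typeElement(dftxml);
        if (type != null) {
            for (Element e : children(type)) {
                if (e.getTagName().equals(attribute)) {
                    values.add(e.getTextContent());
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(topLevelElements(SAMPLE));
        System.out.println(typeElement(SAMPLE).getTagName());
        System.out.println(attributeValues(SAMPLE, "r_version_label"));
    }
}
```

The attribute lookup is deliberately restricted to the type element under dmftmetadata. Elements such as r_version_label also appear under other top-level elements, so a document-wide search by tag name would return duplicates.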
Example DFTXML of a custom object type

    <?xml version="1.0"?>
    <dmftdoc dmftkey="090a0d6880008848" dss_tokens=":dftxml:1">
      <dmftkey>090a0d6880008848</dmftkey>
      <dmftmetadata>
        <dm_sysobject>
          <r_object_id dmfttype="dmid">090a0d6880008848</r_object_id>
          <object_name dmfttype="dmstring">mylog.txt</object_name>
          <r_object_type dmfttype="dmstring">techpubs</r_object_type>
          <r_creation_date dmfttype="dmdate">2010-04-09T21:40:47</r_creation_date>
          <r_modify_date dmfttype="dmdate">2010-04-09T21:40:47</r_modify_date>
          <r_modifier dmfttype="dmstring">Administrator</r_modifier>
          <r_access_date dmfttype="dmdate"/>
          <a_is_hidden dmfttype="dmbool">false</a_is_hidden>
          <i_is_deleted dmfttype="dmbool">false</i_is_deleted>
          <a_retention_date dmfttype="dmdate"/>
          <a_archive dmfttype="dmbool">false</a_archive>
          <a_link_resolved dmfttype="dmbool">false</a_link_resolved>
          <i_reference_cnt dmfttype="dmint">1</i_reference_cnt>
          <i_has_folder dmfttype="dmbool">true</i_has_folder>
          <i_folder_id dmfttype="dmid">0c0a0d6880000105</i_folder_id>
          <r_link_cnt dmfttype="dmint">0</r_link_cnt>
          <r_link_high_cnt dmfttype="dmint">0</r_link_high_cnt>
          <r_assembled_from_id dmfttype="dmid">0000000000000000</r_assembled_from_id>
          <r_frzn_assembly_cnt dmfttype="dmint">0</r_frzn_assembly_cnt>
          <r_has_frzn_assembly dmfttype="dmbool">false</r_has_frzn_assembly>
          <r_is_virtual_doc dmfttype="dmint">0</r_is_virtual_doc>
          <i_contents_id dmfttype="dmid">060a0d688000ec61</i_contents_id>
          <a_content_type dmfttype="dmstring">crtext</a_content_type>
          <r_page_cnt dmfttype="dmint">1</r_page_cnt>
          <r_content_size dmfttype="dmint">130524</r_content_size>
          <a_full_text dmfttype="dmbool">true</a_full_text>
          <a_storage_type dmfttype="dmstring">filestore_01</a_storage_type>
          <i_cabinet_id dmfttype="dmid">0c0a0d6880000105</i_cabinet_id>
          <owner_name dmfttype="dmstring">Administrator</owner_name>
          <owner_permit dmfttype="dmint">7</owner_permit>
          <group_name dmfttype="dmstring">docu</group_name>
          <group_permit dmfttype="dmint">5</group_permit>
          <world_permit dmfttype="dmint">3</world_permit>
          <i_antecedent_id dmfttype="dmid">0000000000000000</i_antecedent_id>
          <i_chronicle_id dmfttype="dmid">090a0d6880008848</i_chronicle_id>
          <i_latest_flag dmfttype="dmbool">true</i_latest_flag>
          <r_lock_date dmfttype="dmdate"/>
          <r_version_label dmfttype="dmstring">1.0</r_version_label>
          <r_version_label dmfttype="dmstring">CURRENT</r_version_label>
          <i_branch_cnt dmfttype="dmint">0</i_branch_cnt>
          <i_direct_dsc dmfttype="dmbool">false</i_direct_dsc>
          <r_immutable_flag dmfttype="dmbool">false</r_immutable_flag>
          <r_frozen_flag dmfttype="dmbool">false</r_frozen_flag>
          <r_has_events dmfttype="dmbool">false</r_has_events>
          <acl_domain dmfttype="dmstring">Administrator</acl_domain>
          <acl_name dmfttype="dmstring">dm_450a0d6880000101</acl_name>
          <i_is_reference dmfttype="dmbool">false</i_is_reference>
          <r_creator_name dmfttype="dmstring">Administrator</r_creator_name>
          <r_is_public dmfttype="dmbool">true</r_is_public>
          <r_policy_id dmfttype="dmid">0000000000000000</r_policy_id>
          <r_resume_state dmfttype="dmint">0</r_resume_state>
          <r_current_state dmfttype="dmint">0</r_current_state>
          <r_alias_set_id dmfttype="dmid">0000000000000000</r_alias_set_id>
          <a_is_template dmfttype="dmbool">false</a_is_template>
          <r_full_content_size dmfttype="dmdouble">130524</r_full_content_size>
          <a_is_signed dmfttype="dmbool">false</a_is_signed>
          <a_last_review_date dmfttype="dmdate"/>
          <i_retain_until dmfttype="dmdate"/>
          <i_partition dmfttype="dmint">0</i_partition>
          <i_is_replica dmfttype="dmbool">false</i_is_replica>
          <i_vstamp dmfttype="dmint">0</i_vstamp>
          <webpublish dmfttype="dmbool">false</webpublish>
        </dm_sysobject>
      </dmftmetadata>
      <dmftvstamp>
        <i_vstamp dmfttype="dmint">0</i_vstamp>
      </dmftvstamp>
      <dmftsecurity>
        <acl_name dmfttype="dmstring">dm_450a0d6880000101</acl_name>
        <acl_domain dmfttype="dmstring">Administrator</acl_domain>
        <ispublic dmfttype="dmbool">true</ispublic>
      </dmftsecurity>
      <dmftinternal>
        <docbase_id dmfttype="dmstring">658792</docbase_id>
        <server_config_name dmfttype="dmstring">DSS_LH1</server_config_name>
        <contentid dmfttype="dmid">060a0d688000ec61</contentid>
        <r_object_id dmfttype="dmid">090a0d6880008848</r_object_id>
        <r_object_type dmfttype="dmstring">techpubs</r_object_type>
        <i_all_types dmfttype="dmid">030a0d68800001d7</i_all_types>
        <i_all_types dmfttype="dmid">030a0d6880000129</i_all_types>
        <i_all_types dmfttype="dmid">030a0d6880000105</i_all_types>
        <i_dftxml_schema_version dmfttype="dmstring">5.3</i_dftxml_schema_version>
      </dmftinternal>
      <dmftversions>
        <r_version_label dmfttype="dmstring">1.0</r_version_label>
        <r_version_label dmfttype="dmstring">CURRENT</r_version_label>
        <iscurrent dmfttype="dmbool">true</iscurrent>
      </dmftversions>
      <dmftfolders>
        <i_folder_id dmfttype="dmid">0c0a0d6880000105</i_folder_id>
      </dmftfolders>
      <dmftcontents>
        <dmftcontent>
          <dmftcontentattrs>
            <r_object_id dmfttype="dmid">060a0d688000ec61</r_object_id>
            <page dmfttype="dmint">0</page>
            <i_full_format dmfttype="dmstring">crtext</i_full_format>
          </dmftcontentattrs>
          <dmftcontentref content-type="text/plain" islocalcopy="true" lang="en"
              encoding="US-ASCII" summary_tokens="dmftsummarytokens_0">
            <![CDATA[...]]>
          </dmftcontentref>
        </dmftcontent>
      </dmftcontents>
      <dmftdsearchinternals dss_tokens="excluded">
        <dmftstaticsummarytext dss_tokens="excluded"><![CDATA[mylog.txt ]]>
        </dmftstaticsummarytext>
        <dmftsummarytokens_0 dss_tokens="excluded"><![CDATA[1Tkns ...]]>
        </dmftsummarytokens_0>
      </dmftdsearchinternals>
    </dmftdoc>

Appendix C XQuery and VQL Reference

This appendix covers the following topics:
∙ Tracking and status XQueries
∙ VQL and XQuery Syntax Equivalents

Tracking and status XQueries

You can
issue the following XQuery expressions against the tracking database for each domain. Many of these expressions are available in xPlore administrator or as audit reports. These XQuery expressions can be submitted in the xDB console.

Object count from tracking DB

Get object count in a collection
    count(//trackinginfo/document[collection-name="<Collection_name>"])

Get object count in all collections (all indexed objects)
    count(//trackinginfo/document)
For example:
    for $i in collection("dsearch/SystemInfo")
    return count($i//trackinginfo/document)

Get object count in library
    count(//trackinginfo/document[library-path="<LibraryPath>"])

Find documents

Find collection in which a document is indexed
    //trackinginfo/document[@id="<DocumentId>"]/collection-name/string(.)
For example:
    for $i in collection("dsearch/SystemInfo")
    where $i//trackinginfo/document[@id="TestCustomType_txt1276106246060"]
    return $i//trackinginfo/document/collection-name

Find library in which a document is indexed
    //trackinginfo/document[@id="<DocumentId>"]/library-path/string(.)

Get tracking information for a document
    //trackinginfo/document[@id="<DocumentId>"]

Status information

Get operations and status information for a document
    //trackinginfo/operation[@doc-id="<DocumentId>"]

VQL and XQuery Syntax Equivalents

xPlore does not support the Verity Query Language (VQL). The following table maps VQL syntax examples to their equivalents in XQuery.
Table C.40 DQL and XQuery mapping

DQL       XQuery
IN        for $i in collection('/XX/dsearch/Data')/dmftdoc[
            (dmftcontents/dmftcontent ftcontains ('test1')) ]
NEAR/N    for $i in collection('/XX/dsearch/Data')/dmftdoc[
            (dmftcontents/dmftcontent ftcontains ('test1' ftand 'test2'
            distance exactly N words)) ]
ORDERED   for $i in collection('/XX/dsearch/Data')/dmftdoc[
            (dmftcontents/dmftcontent ftcontains ('test1' ftand 'test2') ordered) ]
ENDS      let $result := ( for $i in collection('/XX/dsearch/Data')/dmftdoc[
            (dmftcontents/dmftcontent ftcontains ('test1')) and
            (ends-with(dmftmetadata/dm_sysobject/object_name, 'test2'))]
STARTS    for $i in collection('/XX/dsearch/Data')/dmftdoc[
            (dmftcontents/dmftcontent ftcontains ('test1')) and
            starts-with(dmftinternal/r_object_type, 'dm_docu')]

xPlore Glossary

Term                               Description
category                           A category defines a class of documents and their XML structure.
collection                         A collection is a logical group of XML documents that is physically stored in an xDB library. A collection represents the most granular data management unit within xPlore.
content processing service (CPS)   The content processing service (CPS) retrieves indexable content from content sources and determines the document format and primary language. CPS parses the content into index tokens that xPlore can process into full-text indexes.
domain                             A domain is a separate, independent group of collections within an xPlore deployment.
DQL                                Documentum Query Language, used by many Content Server clients.
FTDQL                              Full-text Documentum Query Language.
ftintegrity                        A standalone Java program that checks index integrity against Content Server repository documents. The ftintegrity script calls the state of index job in the Content Server.
full-text index                    Index structure that tracks terms and their occurrence in a document.
index agent                        Documentum application that receives indexing requests from the Content Server. The agent prepares and submits an XML representation of the document to xPlore for indexing.
ingestion                          Process in which xPlore receives an XML representation of a document and processes it into an index.
instance                           An xPlore instance is one deployment of the xPlore WAR file to an application server container. You can have multiple instances on the same host (vertical scaling), although it is more common to have one xPlore instance per host (horizontal scaling). The following processes can run in an xPlore instance: CPS, indexing, search, xPlore administrator.
lemmatization                      Lemmatization is a normalization process in which the lemmatizer finds a canonical or dictionary form for a word, called a lemma. Content that is indexed is also lemmatized unless lemmatization is turned off. Terms in search queries are also lemmatized unless lemmatization is turned off.
Lucene                             Apache open-source, Java-based full-text indexing and search engine.
node                               In xPlore and xDB, node is sometimes used to denote instance. It does not denote host. xPlore can have multiple instances installed on the same host.
persistence library                Saves CPS, indexing, and search metrics. Configurable in indexserverconfig.xml.
state of index job                 Content Server configuration installs the state of index job dm_FTStateOfIndex. This job is run from Documentum Administrator. The ftintegrity script calls this job, which reports on index completeness, status, and indexing failures.
status library                     A status library reports on indexing status for a domain. There is one status library for each domain.
stop words                         Stop words are words that are filtered out before indexing, to save the size of the index and to prevent searches on common words.
text extraction                    Identification of terms in a content file.
token                              Piece of an input string defined by semantic processing rules.
tracking library                   An xDB library that records the object IDs and location of content that has been indexed. There is one tracking database for each domain.
transactional support              Small in-memory indexes are created in rapid transactional updates, then merged into larger indexes. When an index is written to disk, it is considered clean. Committed and uncommitted data before the merge is searchable along with the on-disk index.
watchdog service                   Installed by the xPlore installer, the watchdog service pings all xPlore instances and sends an email notification when an instance does not respond.
xDB                                xDB is a database that enables high-speed storage and manipulation of many XML documents. In xPlore, an xDB library stores a collection as a Lucene index and manages the indexes on the collection. The XML content of indexed documents can optionally be stored.
XQFT                               W3C full-text XQuery and XPath extensions described in XQuery and XPath Full Text 1.0. Support for XQFT includes logical full-text operators, wildcard option, anyall option, positional filters, and score variables.
XQuery                             W3C standard query language that is designed to query XML data. xPlore receives XQuery expressions that are compliant with the XQuery standard and returns results.