QS-AVI Address Cleansing as a Web Service for IBM InfoSphere Identity Insight

Author: Bhaveshkumar R Patel ([email protected])

Address cleansing, sometimes referred to as address “hygiene” or “standardization”, is a process used with the Identity Insight pipeline to help you correct and standardize address information for optimal entity resolution processing. This new IBM® InfoSphere™ Identity Insight feature enables the use of an industry standard address data standardization solution that includes:

• AddressDoctor®
• IBM InfoSphere Information Server
• IBM InfoSphere DataStage®
• IBM InfoSphere WebSphere QualityStage™

Enabling support for an address standardization module provided by AddressDoctor eliminates the dependencies and limitations often associated with other standardization databases such as Worldwide Address Verification and Enhancement System (WAVES). The AddressDoctor address standardization module can be used for Identity Insight entity resolution by using the DataStage and QualityStage Address Verification interface. This process is generally referred to as QS-AVI in this document.

This techdoc describes how to create and apply an address data-cleansing job that standardizes address data for use by IBM Identity Insight. The job is defined in DataStage and uses QS-AVI Data Quality stages. Note that the steps are described and illustrated in a Windows client environment.

The basic steps for implementing this address cleansing job as a Web service are:

STEP 1: Verify prerequisite software
STEP 2: Define a QS-AVI DataStage job to cleanse the address data
STEP 3: Enable the DataStage job for Information Services
STEP 4: Use the Information Server Console to define a DataStage job as a service
STEP 5: Use the Information Server Console to deploy this new job as a service
STEP 6: Examine the WSDL file
STEP 7: Test the service

STEP 1: Verify prerequisite software

You should have the following software installed:

• DataStage InfoServer Version 8.0.1
• QS-AVI Data Quality stages
• AddressDoctor Database (Required Country Database)

STEP 2: Define a QS-AVI DataStage job to cleanse the address data

1. Open DataStage Designer: Start -> All Programs -> IBM Information Server -> IBM WebSphere DataStage and QualityStage Designer
   a. In the Attach to Project window, enter ibmpassw0rd as the password to connect to the Project. Click OK.
      Figure 1 - Attach to Project window
   b. Close the New window.
2. In the Palette pane, open the Data Quality folder to browse through the available stages. Make sure you are able to find the QS-AVI “Address Verification” stage as shown in figure 2.
   Figure 2 - Address Verification in the Data Quality folder
3. Copy the AddressValidateWS.dsx file from the QS-AVI package to your local hard drive (C:\).
4. Import the AddressValidateWS.dsx file to DataStage. This is a predefined address cleansing job and has been designed for IBM Identity Insight and QS-AVI integration.
5. In the Repository pane, select the job AddressValidateWS in the Jobs folder, and open it by selecting Edit.
   Figure 3 - DataStage Designer Repository pane
6. Open the Address_Verification_8 stage by selecting Properties. In this window you can examine and modify stage->properties. (See figure 4.)
   Figure 4 - AddressVerification stage window
7. Update the stage->properties as follows:
   a. Update Reference database path with the AddressDoctor Database installation location.
   b. Update Full Preload with the required country database.

STEP 3: Enable the DataStage job for Information Services

One more step must be performed before the new job is enabled for Information Services. You must change the properties of the job and specify that multiple instances of the job can be run, and that the job can be made available as a Web service.

1. In the Repository pane, open the job properties (by selecting the Edit menu and then Job Properties).
2. On the General page of the job properties, check the following 3 boxes:
   • Enable hashed file cache sharing
   • Allow Multiple Instances
   • Enabled for Information Services
   Figure 5 - Job Properties window
3. Click OK to save the job properties.
4. Save the job by selecting Save from the File menu.
5. Compile the job by selecting Compile from the File menu, or press F7.

STEP 4: Define a DataStage job as a service using the Information Server Console

The Console for IBM Information Server allows you to define a data transformation or cleansing job (DataStage job) as a service. The tool includes a wizard to guide you through the task. The wizard walks you through the following task steps:

1. Name and describe the new service.
2. Choose one service interface binding (such as SOAP over HTTP, EJB).
3. Select the DataStage job to expose as the first operation of the new service. The job must have been set in DataStage with the property “Enabled for Information Services”.
4. Set up the request and response messages for the operation.
5. Set run time parameters.

Before you begin

Before starting the IBM Information Server Console, you must have an application server up and running. The example configuration in this document uses WebSphere Application Server, which was installed as part of the IBM Information Server install. Verify that the IBM WebSphere Application Server service is already started. The service name is IBM WebSphere Application Server V6 – bhapatelNode02. If it is not started, start the service.

1. Open the Console for IBM Information Server.
   a. From the Windows Start button, select All Programs -> IBM Information Server -> IBM Information Server Console. When you are prompted for user name and password, enter:
      user name : IBM_XXXX
      password  : XXXXXX
2. Click New Project to create a new project.
   a. Name : AddressValidateProject
   b. Type (Select) : Information Services
   Figure 6 - New Project window
3. Open the AddressValidateProject by selecting File -> Open Project.
   a. In the Open Project window, select AddressValidateProject, and click Open.
4. Open the Information Services Application window.
   a. Click the Develop icon, and click Tasks->New (right side panel).
   Figure 7 - Select Information Services Application
5. Open and customize the Information Services Application, AddressValidateApp. You can use the wizard to help you create and deploy the new service. The wizard lets you fill in information about the general properties of the new service, the binding used by the service, and the operation that the service invokes.
   a. On the Overview page, for Application name, enter AddressValidateApp.
      Figure 8 - Open AddressValidateApp
   b. On the Overview page, for Service Name, enter AddressValidateService.
   c. In the Description field, describe the service (for example, “QS-AVI-WISD Web Service”), which function the service performs, and who to call for help. This information is useful when users look at the Services Directory to find out what services exist.
   d. On the bindings page (under “NewService1” in the Services folder), click the Attach Bindings menu button (bottom right), and select SOAP over HTTP as binding. Note that currently the system offers you a choice between SOAP over HTTP and EJB as binding.
   e. Specify the operation performed by the service. At the bottom of the “Select a View” portion of the window, click New, and select Operation from the menu. A new window is displayed, which lets you specify the operation to invoke.
      Figure 9 - a new operation
   f. Change the name of the operation to addressValidateOps. The name of the operation must start with a lower case letter or you will not be able to successfully save your service definition.
   g. Click Select to choose the information provider for this operation.
   h. In the Information Provider window, select DataStage and QualityStage as type of information provider.
   i. Navigate through the folders to find the job named AddressValidateWS, which was enabled for Information Services earlier when you set up the job in DataStage and QualityStage Designer.
   j. Select the job located in the Job folder: IaaS_Proj. If the job name is not listed, it is likely that you did not compile the DataStage job. If so, go back to the DataStage and QualityStage Designer and compile the job.
      Figure 10 - Select the IaaS_Proj job in the Job folder
   k. Click OK. You are now returned to the Application window, which should look like this:
      Figure 11 - new operation detail pane
   l. You can browse through the Inputs, Outputs and Provider Properties tabs to review input and output parameters for the service. Remember that this DataStage job is enabled for Information Services and includes a WISD_Input and a WISD_Output stage. During the definition of these stages, you should have identified the columns that would be used as input and output.
   m. The Provider Properties tab contains important runtime parameter settings. These parameters control the number of job instances allowed, the load balancing delay, and how requests will be handled in the pipeline.
   n. Click Save Application to complete the definition of the service.
   o. Click Close Application.
      Figure 12 - a defined service

You could now deploy the service. However, the service will not return multiple rows of data. To return multiple rows, you must go back to DataStage Designer to slightly modify the original job, as explained in step m.

You have now completed the registration of the service and can deploy the job as a service.

STEP 5: Use the Information Server Console to deploy this new job as a service

Deploying an application will install an Enterprise Application on an application server. This enables the services to be invoked by other applications or services. The deployment is also performed using the Console for IBM Information Server.

1. In the Information Services Application window, select the AddressValidateApp.
2. Click Deploy (located at the bottom of the window). The window with the Service Objects to deploy is displayed.
3. Deploy the service object named AddressValidateService. For this example, keep all of the default options.
4. Click Deploy.
   Figure 13 - Deploying the application
5. The bottom of the screen has an activity status window, which you can expand by selecting Details.
6. Once the deployment completes, the deployment status window shows a change in status from “Executing” to “Completed”. Note that deploying an application can take a very long time, especially if your system does not have 3GB or more of system memory.
7. Close the Activity Status window. The application is now successfully deployed.
8. You can browse the Manage Providers section.

STEP 6: Examine the WSDL file

Verify the deployment by generating the Web service definition language (WSDL) document for the new service. WSDL contains all the necessary descriptions (meta data) that a client application would need to invoke the service. WISD generates the WSDL “on the fly”. If your application was not deployed successfully, you will not be able to generate the definition.

1. Open the “Deployed Information Services Application” window. On the WISD Navigation bar, click the OPERATE icon and select Deployed Information Services Application from the menu.
2. In the Deployed Applications window, you should see the name of the application AddressValidateApp that you just deployed.
   Figure 14 - Deployed Information Services Application window
3. Expand the AddressValidateApp folder. The AddressValidateApp folder contains the name of the service(s) defined in the application; for each service, you also can display the operation being called by the service.
4. Select the name of the service: AddressValidateService.
   Figure 15 - AddressValidateService details
5. Select View Service in Catalog to open the Information Server Administrator Web Client with the Information Services Catalog view displayed.
   Figure 16 - View Service in Catalog results
   The above window contains the general properties of the service. You can browse through the various pages to see the information related to bindings, attributes and operations.
6. To find the WSDL document, open the Bindings page, and expand the SOAP over HTTP box.
   Figure 17 - Bindings view
7. Click the link Open WSDL Document to generate the WSDL file for the AddressValidateService service. The file is displayed in a new browser window.
   Figure 18 - Generated WSDL file
8. Save the WSDL file in the folder C:\SOADEMO\Results.
9. Keep the name of the URL associated to the WSDL file:
   http://bhapatel:9080/wisd/AddressValidateApp/AddressValidateService/wsdl/AddressValidateService.wsdl
10. Close the window displaying the WSDL file and the window labeled Header Microsoft Internet Explorer.
11. Exit the Console for IBM Information Server.

You can now test the service.

STEP 7: Test the service

You can use the WebSphere Integration Developer environment to easily verify that a service is working properly, without having to write an application.

1. Open WebSphere Integration Developer: Start -> All Programs -> IBM WebSphere -> Integration Developer v6 -> WebSphere Integration Developer v6
2. Accept the workspace as displayed: SOAiis.
3. Select Run then Launch to open the Web services Explorer.
4. In the Web Browser pane, select the icon representing WSDL Page (upper righthand corner).
5. Click WSDL Main in the Navigator.
6. Enter the WSDL URL:
   http://bhapatel:9080/wisd/AddressValidateApp/AddressValidateService/wsdl/AddressValidateService.wsdl
   Figure 19 - Open the WSDL URL
   This is the address that is associated with the service. You generated that address by opening the WSDL document from the View Service in Catalog in the Information Server Administrator Web Client at the end of the previous section.
   a. Click Go to get the operation name associated with the service.
7. The next screen displays the operation name(s) associated with the service. Click the operation named addressValidateOps.
   Figure 20 - displaying the operation names associated with a service
8. You must specify the input values that this operation requires. Let’s assume that you want to standardize the address of this customer:
      addr1   : 4100 bohanon Dr
      city    : Menlo Park
      state   : CA
      country : USA
   Figure 21 - Invoking a WSDL operation
   Click Go. The service is invoked.
9. In the Status window, you should see the result containing the standardized address for the customer entered as input. In the Status window, switch from a Source view to a Form view to get a nicely formatted response document.
   Figure 22 - a form view of a response document

Figure 22 shows that the service has been successfully invoked and that you have successfully enabled a data-cleansing job as a service.
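
Once the test in the Web Services Explorer succeeds, the same operation can also be invoked from a client program. The following Python sketch builds a SOAP 1.1 request for the addressValidateOps operation using the sample address from the test above. It is an illustration under stated assumptions, not the interface generated by WISD: the endpoint URL, the target namespace, and the use of unqualified input elements named addr1, city, state and country are all hypothetical. Take the real values from the WSDL document saved in Step 6 before using it.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# ASSUMPTIONS (hypothetical values): the real target namespace and service
# endpoint must be read from the WSDL generated for AddressValidateService.
SERVICE_NS = "http://example.invalid/AddressValidateService"
ENDPOINT = "http://bhapatel:9080/wisd/AddressValidateApp/AddressValidateService"


def build_request(addr1, city, state, country, service_ns=SERVICE_NS):
    """Return a SOAP 1.1 envelope (bytes) that calls addressValidateOps."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{service_ns}}}addressValidateOps")
    # ASSUMPTION: input columns are unqualified child elements with the same
    # names as the form fields shown in the Web Services Explorer.
    fields = (("addr1", addr1), ("city", city),
              ("state", state), ("country", country))
    for name, value in fields:
        ET.SubElement(operation, name).text = value
    return ET.tostring(envelope, encoding="utf-8")


def call_service(payload, endpoint=ENDPOINT, timeout=30):
    """POST the envelope and return the parsed response document."""
    request = urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": '""'},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return ET.fromstring(response.read())


if __name__ == "__main__":
    # Only builds and prints the request; call_service() needs a live server.
    payload = build_request("4100 bohanon Dr", "Menlo Park", "CA", "USA")
    print(payload.decode("utf-8"))
```

Running the script prints the request document without contacting a server. To send it, pass the payload to call_service() once the endpoint and namespace have been replaced with the values from the WSDL; the standardized address fields can then be read from the returned element tree.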