IS Architecture – Informatica


Integration Service Architecture, as the name itself shows …

The Integration Service moves data from sources to targets based on workflow and mapping metadata stored in a repository. When a workflow starts, the Integration Service retrieves mapping, workflow, and session metadata from the repository. It extracts data from the mapping sources and stores the data in memory while it applies the transformation rules configured in the mapping. The Integration Service loads the transformed data into one or more targets.

The following figure shows the processing path between the Integration Service, repository, source, and target:

[Figure: processing path between the Integration Service, repository, source, and target]

To move data from sources to targets, the Integration Service uses the following components:

*    Integration Service process. The Integration Service starts one or more Integration Service processes to run and monitor workflows. When you run a workflow, the Integration Service process starts and locks the workflow, runs the workflow tasks, and starts the process to run sessions. For more information, see Integration Service Process.

*    Load Balancer. The Integration Service uses the Load Balancer to dispatch tasks. The Load Balancer dispatches tasks to achieve optimal performance. It may dispatch tasks to a single node or across the nodes in a grid. For more information, see Load Balancer.

*    Data Transformation Manager (DTM) process. The Integration Service starts a DTM process to run each Session and Command task within a workflow. The DTM process performs session validations, creates threads to initialize the session, read, write, and transform data, and handles pre- and post-session operations. For more information, see Data Transformation Manager (DTM) Process.

The Integration Service can achieve high performance using symmetric multi-processing systems. It can start and run multiple tasks concurrently. It can also concurrently process partitions within a single session. When you create multiple partitions within a session, the Integration Service creates multiple database connections to a single source and extracts a separate range of data for each connection. It also transforms and loads the data in parallel. For more information, see Processing Threads.
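
A minimal sketch of the partitioning idea, not PowerCenter code: each partition gets its own simulated connection and a separate key range, and the partitions are extracted and transformed in parallel. The names `SOURCE`, `split_ranges`, `run_partition`, and `run_session` are hypothetical, and the doubling "transformation" is only a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "source table": rows keyed by an integer id.
SOURCE = [{"id": i, "amount": i * 10} for i in range(1, 101)]


def split_ranges(low, high, partitions):
    """Split the inclusive key range [low, high] into contiguous sub-ranges."""
    total = high - low + 1
    size, extra = divmod(total, partitions)
    ranges, start = [], low
    for p in range(partitions):
        # The first `extra` partitions take one additional key each.
        end = start + size + (1 if p < extra else 0) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def run_partition(key_range):
    """Extract one key range, transform it, and return the rows to load."""
    low, high = key_range
    # Stand-in for a per-partition database connection reading its own range.
    extracted = [row for row in SOURCE if low <= row["id"] <= high]
    # Placeholder transformation rule.
    return [{"id": r["id"], "doubled": r["amount"] * 2} for r in extracted]


def run_session(partitions):
    """Run all partitions in parallel and collect the rows for the target."""
    ranges = split_ranges(1, 100, partitions)
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(run_partition, ranges))
    # Flatten the per-partition results in partition order.
    return [row for part in results for row in part]
```

Calling `run_session(4)` processes the same 100 rows as a single partition would; only the way the key range is divided among workers changes.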

You can create an Integration Service on any machine where you installed the PowerCenter Services. You can configure the Integration Service using the Administration Console or the pmcmd command line program.

Integration Service Connectivity

The Integration Service connects to the following PowerCenter components:

*    PowerCenter Client

*    Repository Service

*    Source and target databases

The Integration Service is a repository client. It connects to the Repository Service to retrieve workflow and mapping metadata from the repository database. When the Integration Service process requests a repository connection, the request is routed through the master gateway, which sends back Repository Service information to the Integration Service process. The Integration Service process connects to the Repository Service. The Repository Service connects to the repository and performs repository metadata transactions for the client application.

The Workflow Manager communicates with the Integration Service process over a TCP/IP connection. The Workflow Manager communicates with the Integration Service process each time you schedule or edit a workflow, display workflow details, and request workflow and session logs. Use the connection information defined for the domain to access the Integration Service from the Workflow Manager.

The Integration Service process connects to the source or target database using ODBC or native drivers. The Integration Service process maintains a database connection pool for stored procedures or lookup databases in a workflow. The Integration Service process allows an unlimited number of connections to lookup or stored procedure databases. If a database user does not have permission for the number of connections a session requires, the session fails. You can optionally set a parameter to limit the database connections. For a session, the Integration Service process holds the connection as long as it needs to read data from source tables or write data to target tables.

The following table summarizes the software you need to connect the Integration Service to the platform components, source databases, and target databases:

Integration Service Connection         Connectivity Requirement
PowerCenter Client                     TCP/IP
Other Integration Service Processes    TCP/IP
Repository Service                     TCP/IP
Source and target databases            Native database drivers or ODBC

Note: Both the Windows and UNIX versions of the Integration Service can use ODBC drivers to connect to databases. Use native drivers to improve performance.

Integration Service Process

The Integration Service starts an Integration Service process to run and monitor workflows. The Integration Service process is also known as the pmserver process. The Integration Service process accepts requests from the PowerCenter Client and from pmcmd. It performs the following tasks:

*  Manages workflow scheduling.

*  Locks and reads the workflow.

*  Reads the parameter file.

*  Creates the workflow log.

*  Runs workflow tasks and evaluates the conditional links connecting tasks.

*  Starts the DTM process or processes to run the session.

*  Writes historical run information to the repository.

*  Sends post-session email in the event of a DTM failure.

Managing Workflow Scheduling

The Integration Service process manages workflow scheduling in the following situations:

*  When you start the Integration Service. When you start the Integration Service, it queries the repository for a list of workflows configured to run on it.

*  When you save a workflow. When you save a workflow assigned to an Integration Service to the repository, the Integration Service process adds the workflow to or removes the workflow from the schedule queue.

Locking and Reading the Workflow

When the Integration Service process starts a workflow, it requests an execute lock on the workflow from the repository. The execute lock allows the Integration Service process to run the workflow and prevents you from starting the workflow again until it completes. If the workflow is already locked, the Integration Service process cannot start the workflow. A workflow may be locked if it is already running.
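
The execute lock behaves like a per-workflow mutual-exclusion flag. The sketch below illustrates that behavior under simplifying assumptions; the `Repository` class, `WorkflowLockedError`, and `run_workflow` are hypothetical names and do not correspond to any PowerCenter API.

```python
class WorkflowLockedError(Exception):
    """Raised when a workflow already holds an execute lock."""


class Repository:
    """Toy stand-in for the repository's execute-lock bookkeeping."""

    def __init__(self):
        self._locked = set()

    def acquire_execute_lock(self, workflow):
        # A workflow that is already running cannot be started again.
        if workflow in self._locked:
            raise WorkflowLockedError(f"{workflow} is already locked")
        self._locked.add(workflow)

    def release_execute_lock(self, workflow):
        self._locked.discard(workflow)


def run_workflow(repo, workflow, body):
    """Hold the execute lock for the duration of the run, then release it."""
    repo.acquire_execute_lock(workflow)
    try:
        return body()
    finally:
        # Release on success or failure so the workflow can be started again.
        repo.release_execute_lock(workflow)
```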

The Integration Service process also reads the workflow from the repository at workflow run time. The Integration Service process reads all links and tasks in the workflow except sessions and worklet instances. The Integration Service process reads session instance information from the repository. The DTM retrieves the session and mapping from the repository at session run time. The Integration Service process reads worklets from the repository when the worklet starts.

Reading the Parameter File

When the workflow starts, the Integration Service process checks the workflow properties for use of a parameter file. If the workflow uses a parameter file, the Integration Service process reads the parameter file and expands the variable values for the workflow and any worklets invoked by the workflow.

The parameter file can also contain mapping parameters and variables and session parameters for sessions in the workflow, as well as service and service process variables for the service process that runs the workflow. When starting the DTM, the Integration Service process passes the parameter file name to the DTM.
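
To make "reads the parameter file and expands the variable values" concrete, here is a rough sketch. It assumes a simplified file of `name=value` lines and skips bracketed heading lines, which is an assumption and not the full parameter file grammar. The functions `parse_parameters` and `expand` and the sample names such as `$$LoadDate` are hypothetical.

```python
import re


def parse_parameters(text):
    """Collect name=value pairs; bracketed headings and blank lines are skipped."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("[") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


# Matches names that start with one or two dollar signs, e.g. $InputFile or $$LoadDate.
_VARIABLE = re.compile(r"\$\$?\w+")


def expand(text, params):
    """Replace known variable references; unknown references are left as-is."""
    return _VARIABLE.sub(lambda m: params.get(m.group(0), m.group(0)), text)
```

With parameters parsed from a line such as `$$LoadDate=2020-01-01`, a call like `expand("order_date = '$$LoadDate'", params)` substitutes the value, while a reference that is not defined in the file stays untouched.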

Creating the Workflow Log

The Integration Service process creates a log for the workflow. The workflow log contains a history of the workflow run, including initialization, workflow task status, and error messages. You can use information in the workflow log in conjunction with the Integration Service log and session log to troubleshoot system, workflow, or session problems.

Running Workflow Tasks

The Integration Service process runs workflow tasks according to the conditional links connecting the tasks. Links define the order of execution for workflow tasks. When a task in the workflow completes, the Integration Service process evaluates the completed task according to specified conditions, such as success or failure. Based on the result of the evaluation, the Integration Service process runs successive links and tasks.
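
Link evaluation can be pictured as a walk over a small graph in which each link carries a condition on the status of the task it leaves. The sketch below is an illustration only: `run_tasks`, the status strings, and the tuple layout of `links` are hypothetical, and it ignores details such as tasks with several incoming links.

```python
from collections import deque


def run_tasks(tasks, links, start):
    """Run tasks in link order.

    tasks: maps a task name to a callable returning True on success.
    links: list of (source, target, condition) where condition(status) -> bool.
    Returns a dict of task name -> status for every task that ran.
    """
    statuses = {}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        if name in statuses:
            continue  # each task runs at most once in this sketch
        statuses[name] = "SUCCEEDED" if tasks[name]() else "FAILED"
        # Evaluate every outgoing link against the completed task's status.
        for source, target, condition in links:
            if source == name and condition(statuses[name]):
                queue.append(target)
    return statuses
```

A link with the condition `lambda status: status == "FAILED"` would, for example, route the workflow to a notification task only when the preceding task fails.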

Running Workflows Across the Nodes in a Grid

When you run an Integration Service on a grid, the service processes run workflow tasks across the nodes of the grid. The domain designates one service process as the master service process. The master service process monitors the worker service processes running on separate nodes. The worker service processes run workflows across the nodes in a grid.

Starting the DTM Process

When the workflow reaches a session, the Integration Service process starts the DTM process. The Integration Service process provides the DTM process with session and parameter file information that allows the DTM to retrieve the session and mapping metadata from the repository. When you run a session on a grid, the worker service process starts multiple DTM processes that run groups of session threads.

When you use operating system profiles, the Integration Service starts the DTM process with the system user account you specify in the operating system profile.

Load Balancer

The Load Balancer is a component of the Integration Service that dispatches tasks to achieve optimal performance and scalability. When you run a workflow, the Load Balancer dispatches the Session, Command, and predefined Event-Wait tasks within the workflow. The Load Balancer matches task requirements with resource availability to identify the best node to run a task. It dispatches the task to an Integration Service process running on the node. It may dispatch tasks to a single node or across nodes.

The Load Balancer dispatches tasks in the order it receives them. When the Load Balancer needs to dispatch more Session and Command tasks than the Integration Service can run, it places the tasks it cannot run in a queue. When nodes become available, the Load Balancer dispatches tasks from the queue in the order determined by the workflow service level.

The following concepts describe Load Balancer functionality:

*  Dispatch process. The Load Balancer performs several steps to dispatch tasks. For more information, see Dispatch Process.

*  Resources. The Load Balancer can use PowerCenter resources to determine if it can dispatch a task to a node. For more information, see Resources.

*  Resource provision thresholds. The Load Balancer uses resource provision thresholds to determine whether it can start additional tasks on a node. For more information, see Resource Provision Thresholds.

*  Dispatch mode. The dispatch mode determines how the Load Balancer selects nodes for dispatch. For more information, see Dispatch Mode.

*  Service levels. When multiple tasks are waiting in the dispatch queue, the Load Balancer uses service levels to determine the order in which to dispatch tasks from the queue. For more information, see Service Levels.

Dispatch Process

The Load Balancer uses different criteria to dispatch tasks depending on whether the Integration Service runs on a node or a grid.

Dispatching Tasks on a Node

When the Integration Service runs on a node, the Load Balancer performs the following steps to dispatch a task:

1.  The Load Balancer checks resource provision thresholds on the node. If dispatching the task causes any threshold to be exceeded, the Load Balancer places the task in the dispatch queue, and it dispatches the task later.

The Load Balancer checks different thresholds depending on the dispatch mode.

2.  The Load Balancer dispatches all tasks to the node that runs the master Integration Service process.

Dispatching Tasks Across a Grid

When the Integration Service runs on a grid, the Load Balancer performs the following steps to determine on which node to run a task:

1.  The Load Balancer verifies which nodes are currently running and enabled.

2.  If you configure the Integration Service to check resource requirements, the Load Balancer identifies nodes that have the PowerCenter resources required by the tasks in the workflow.

3.  The Load Balancer verifies that the resource provision thresholds on each candidate node are not exceeded. If dispatching the task causes a threshold to be exceeded, the Load Balancer places the task in the dispatch queue, and it dispatches the task later.

The Load Balancer checks thresholds based on the dispatch mode.

4.  The Load Balancer selects a node based on the dispatch mode.
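
The four grid steps can be read as a chain of filters followed by a selection. The sketch below illustrates that reading and is not the Load Balancer's actual implementation: the `Node` fields, the single process-count threshold, and the `select` callback standing in for the dispatch mode are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    running: bool = True
    enabled: bool = True
    resources: set = field(default_factory=set)
    processes: int = 0        # tasks currently running on the node
    max_processes: int = 10   # stand-in for one resource provision threshold


def choose_node(nodes, required_resources, check_resources, select):
    """Return the node to run a task on, or None if the task must be queued."""
    # Step 1: keep only nodes that are running and enabled.
    candidates = [n for n in nodes if n.running and n.enabled]
    # Step 2: optionally keep only nodes that offer the required resources.
    if check_resources:
        candidates = [n for n in candidates if required_resources <= n.resources]
        if not candidates:
            # No available node has the required resources: the task fails.
            raise LookupError("no available node has the required resources")
    # Step 3: drop nodes where one more task would exceed the threshold.
    candidates = [n for n in candidates if n.processes + 1 <= n.max_processes]
    if not candidates:
        return None  # the caller places the task in the dispatch queue
    # Step 4: let the dispatch mode pick among the remaining candidates.
    return select(candidates)
```

Passing the dispatch mode in as a function keeps the filtering steps identical for every mode, which matches the way the steps are described: only the final selection differs.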

Resources

You can configure the Integration Service to check the resources available on each node and match them with the resources required to run the task. If you configure the Integration Service to run on a grid and to check resources, the Load Balancer dispatches a task to a node where the required PowerCenter resources are available. For example, if a session uses an SAP source, the Load Balancer dispatches the session only to nodes where the SAP client is installed. If no available node has the required resources, the Integration Service fails the task.

You configure the Integration Service to check resources in the Administration Console.

You define resources available to a node in the Administration Console. You assign resources required by a task in the task properties.

The Integration Service writes resource requirements and availability information in the workflow log.

Resource Provision Thresholds

The Load Balancer uses resource provision thresholds to determine the maximum load acceptable for a node. The Load Balancer can dispatch a task to a node when dispatching the task does not cause the resource provision thresholds to be exceeded.

The Load Balancer checks the following thresholds:

*  Maximum CPU Run Queue Length. The maximum number of runnable threads waiting for CPU resources on the node. The Load Balancer excludes the node if the maximum number of waiting threads is exceeded. The Load Balancer checks this threshold in metric-based and adaptive dispatch modes.

*  Maximum Memory %. The maximum percentage of virtual memory allocated on the node relative to the total physical memory size. The Load Balancer excludes the node if dispatching the task causes this threshold to be exceeded. The Load Balancer checks this threshold in metric-based and adaptive dispatch modes.

*  Maximum Processes. The maximum number of running processes allowed for each Integration Service process that runs on the node. The Load Balancer excludes the node if dispatching the task causes this threshold to be exceeded. The Load Balancer checks this threshold in all dispatch modes.

If all nodes in the grid have reached the resource provision thresholds before any PowerCenter task has been dispatched, the Load Balancer dispatches tasks one at a time to ensure that PowerCenter tasks are still executed.

You define resource provision thresholds in the node properties in the Administration Console.
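
Which thresholds apply depends on the dispatch mode, and a small predicate makes the rule easier to see. This is a sketch under assumptions: the `Thresholds` and `NodeLoad` records, the `task_memory_pct` estimate, and the mode strings are hypothetical, and the sketch does not cover the one-at-a-time fallback described above.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    max_cpu_run_queue: float
    max_memory_pct: float
    max_processes: int


@dataclass
class NodeLoad:
    cpu_run_queue: float  # runnable threads waiting for CPU
    memory_pct: float     # allocated virtual memory as % of physical memory
    processes: int        # running processes for the service process


def can_dispatch(load, task_memory_pct, limits, mode):
    """Return True if one more task fits under the thresholds checked in `mode`."""
    # Maximum Processes is checked in every dispatch mode.
    if load.processes + 1 > limits.max_processes:
        return False
    if mode in ("metric-based", "adaptive"):
        # The CPU and memory thresholds are checked only in these two modes.
        if load.cpu_run_queue > limits.max_cpu_run_queue:
            return False
        if load.memory_pct + task_memory_pct > limits.max_memory_pct:
            return False
    return True
```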


Dispatch Mode

The dispatch mode determines how the Load Balancer selects nodes to distribute workflow tasks. The Load Balancer uses the following dispatch modes:

*  Round-robin. The Load Balancer dispatches tasks to available nodes in a round-robin fashion. It checks the Maximum Processes threshold on each available node and excludes a node if dispatching a task causes the threshold to be exceeded. This mode is the least compute-intensive and is useful when the load on the grid is even and the tasks to dispatch have similar computing requirements.

*  Metric-based. The Load Balancer evaluates nodes in a round-robin fashion. It checks all resource provision thresholds on each available node and excludes a node if dispatching a task causes the thresholds to be exceeded. The Load Balancer continues to evaluate nodes until it finds a node that can accept the task. This mode prevents overloading nodes when tasks have uneven computing requirements.

*  Adaptive. The Load Balancer ranks nodes according to current CPU availability. It checks all resource provision thresholds on each available node and excludes a node if dispatching a task causes the thresholds to be exceeded. This mode prevents overloading nodes and ensures the best performance on a grid that is not heavily loaded.

When the Load Balancer runs in metric-based or adaptive mode, it uses task statistics to determine whether a task can run on a node. The Load Balancer averages statistics from the last three runs of the task to estimate the computing resources required to run the task. If no statistics exist in the repository, the Load Balancer uses default values.

In adaptive dispatch mode, the Load Balancer can use the CPU profile for the node to identify the node with the most computing resources.

You configure the dispatch mode in the domain properties in the Administration Console.
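
The difference between the modes comes down to the order in which nodes are considered. The sketch below is an illustration under assumptions, not the product's algorithm: `RoundRobin`, `pick_adaptive`, `estimate_cpu`, the `fits` predicate (which stands in for the threshold checks of the active mode), and the default value of 1.0 are all hypothetical.

```python
from statistics import mean


def estimate_cpu(history, default=1.0):
    """Average the last three recorded runs; fall back to a default value."""
    recent = history[-3:]
    return mean(recent) if recent else default


class RoundRobin:
    """Round-robin and metric-based modes: try nodes in turn, take the first that fits."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._next = 0  # index of the node to evaluate first on the next pick

    def pick(self, fits):
        for offset in range(len(self._nodes)):
            index = (self._next + offset) % len(self._nodes)
            node = self._nodes[index]
            if fits(node):
                # Continue after the chosen node on the next call.
                self._next = (index + 1) % len(self._nodes)
                return node
        return None  # no node can accept the task right now


def pick_adaptive(nodes, cpu_available, fits):
    """Adaptive mode: rank nodes by CPU availability, take the best one that fits."""
    ranked = sorted(nodes, key=lambda n: cpu_available[n], reverse=True)
    return next((n for n in ranked if fits(n)), None)
```

In this reading, round-robin and metric-based modes share the same traversal and differ only in which thresholds the `fits` predicate checks, while adaptive mode changes the traversal order itself.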

Service Levels

Service levels establish priority among tasks that are waiting to be dispatched. When the Load Balancer has more Session and Command tasks to dispatch than the Integration Service can run at the time, the Load Balancer places the tasks in the dispatch queue. When nodes become available, the Load Balancer dispatches tasks from the queue. The Load Balancer uses service levels to determine the order in which to dispatch tasks from the queue.

Each service level has the following properties:

*  Name. Name of the service level.

*  Dispatch priority. A number that establishes the task priority in the dispatch queue. The Load Balancer dispatches tasks with high priority before it dispatches tasks with low priority. When multiple tasks in the queue have the same dispatch priority, the Load Balancer dispatches the tasks in the order it receives them.

*  Maximum dispatch wait time. The amount of time a task can wait in the dispatch queue before the Load Balancer changes its dispatch priority to the maximum priority. This ensures that no task waits forever in the dispatch queue.

You create and edit service levels in the domain properties in the Administration Console. You assign service levels to workflows in the workflow properties in the Workflow Manager.
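
The interaction of dispatch priority and maximum dispatch wait time can be sketched as a priority queue with aging. The text does not say whether a larger or smaller number means higher priority, so this sketch assumes that a lower number is a higher priority and that 1 is the maximum. `QueuedTask`, `effective_priority`, and `next_task` are hypothetical names.

```python
from dataclasses import dataclass

HIGHEST_PRIORITY = 1  # assumption: 1 is the maximum priority


@dataclass
class QueuedTask:
    name: str
    priority: int       # assumption: a lower number means a higher priority
    max_wait: float     # maximum dispatch wait time, in seconds
    enqueued_at: float  # time the task entered the dispatch queue
    sequence: int       # arrival order, used to break ties


def effective_priority(task, now):
    """Promote a task to the maximum priority once it has waited too long."""
    if now - task.enqueued_at >= task.max_wait:
        return HIGHEST_PRIORITY
    return task.priority


def next_task(queue, now):
    """Remove and return the task to dispatch next, or None if the queue is empty."""
    if not queue:
        return None
    # Highest effective priority first; equal priorities go in arrival order.
    best = min(queue, key=lambda t: (effective_priority(t, now), t.sequence))
    queue.remove(best)
    return best
```

Promoting a long-waiting task instead of dropping it is what guarantees that low-priority work eventually runs even when higher-priority tasks keep arriving.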


Data Transformation Manager (DTM) Process

The Integration Service process starts the DTM process to run a session. The DTM process is also known as the pmdtm process. The DTM is the process associated with the session task. The DTM process performs the following tasks:

*  Retrieves and validates session information from the repository.

*  Performs pushdown optimization when the session is configured for pushdown optimization.

*  Adds partitions to the session when the session is configured for dynamic partitioning.

*  Forms partition groups when the session is configured to run on a grid.

*  Expands the service process variables, session parameters, and mapping variables and parameters.

*  Creates the session log.

*  Validates source and target code pages.

*  Verifies connection object permissions.

*  Runs pre-session shell commands, stored procedures, and SQL.

*  Sends a request to start worker DTM processes on other nodes when the session is configured to run on a grid.

*  Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.

*  Runs post-session stored procedures, SQL, and shell commands.

*  Sends post-session email.

Note: If you use operating system profiles, the Integration Service runs the DTM process as the operating system user you specify in the operating system profile.

Mehboob

MCTS – BI
