What is the use of BusinessObjects Data Services?
BusinessObjects Data Services provides a graphical interface that allows you to easily create jobs that extract data from heterogeneous sources, transform that data to meet the business requirements of your organization, and load the data into a single location.
What is the difference between OLTP and a Data warehouse?
Indexes − OLTP system has only few indexes while in an OLAP system there are many indexes for performance optimization.
Joins − In an OLTP system, large number of joins and data is normalized however in an OLAP system there are less joins and de-normalized.
Aggregation − In an OLTP system data is not aggregated while in an OLAP database more aggregations are used.
Why do we need a staging area in an ETL process?
There is a staging area that is required during ETL load. There are various reasons why a staging area is required −
As source systems are only available for specific period of time to extract data and this time is less than total data load time so Staging area allows you to extract the data from source system and keep it in staging area before time slot is ended.
Staging area is required when you want to get data from multiple data sources together. If you want to join two or more systems together Example- You will not be able to perform a SQL query joining two tables from two physically different databases.
Data extractions time slot for different systems vary as per the time zone and operational hours.
Data extracted from source systems can be used in multiple data warehouse system, Operation Data stores, etc.
During ETL you can perform complex transformations that allow you to perform complex transformations and require extra area to store the data.
What are SAP Data services?
SAP BO Data Services is an ETL tool used for Data integration, data quality, data profiling and data processing and allows you to integrate, transform trusted data to data warehouse system for analytical reporting.
BO Data Services consists of a UI development interface, metadata repository, data connectivity to source and target system and management console for scheduling of jobs.
Explain the architecture of BODS with Job Server, Repository Manager, and Management Console?
You can also divide BODS architecture in below layers −
Web Application Layer, Database Server Layer, Data Services Service Layer.
Define Data Services components.
Data Services includes the following standard components:
- Job Server
- Access Server
- Real-time Services
- Address Server
- Cleansing Packages, Dictionaries, and Directories
- Management Console [sociallocker]
What is a repository in BODS? What are the different types of Repositories in BODS?
Repository is used to store meta-data of objects used in BO Data Services. Each Repository should be registered in Central Management Console CMC and is linked with single or many job servers which is responsible to execute jobs that are created by you.
There are three types of Repositories −
Local Repository −
It is used to store the metadata of all objects created in Data Services Designer like project, jobs, data flow, work flow, etc.
Central Repository −
It is used to control the version management of the objects and is used for multiuse development. Central Repository stores all the versions of an application object so it allows you to move to previous versions.
Profiler Repository −
This is used to manage all the metadata related to profiler tasks performed in SAP BODS designer. CMS Repository stores metadata of all the tasks performed in CMC on BI platform. Information Steward Repository stores all the metadata of profiling tasks and objects created in information steward.
What is single object and reusable objects in Data services?
Reusable Objects −
Most of the objects that are stored in repository can be reused. When a reusable objects is defined and save in the local repository, you can reuse the object by creating calls to the definition. Each reusable object has only one definition and all the calls to that object refer to that definition. Now if definition of an object is changed at one place you are changing object definition at all the places where that object appears.
An object library is used to contain object definition and when an object is drag and drop from library, it means a new reference to an existing object is created.
Single Use Objects −
All the objects that are defined specifically to a job or data flow, they are called single use objects. Example-specific transformation used in any data load.
What is a Data Store in Data services designer and what are different types of Data Stores?
Datastore are used to setup connection between an application and database. You can directly create Datastore or can be created with help of adapters. Datastore allows an application/software to read or write metadata from an application or database and to write to that database or application.
You want to set up a new repository in BODS. How do you create it?
To create BODS Repository you need a database installed. You can use SQL Server, Oracle database, My SQL, SAP HANA, Sybase, etc. You have to create below users in database while installing BODS and to create Repositories. These users are required to login to different servers CMS Server, Audit Server. To create a new repository, you have to login to Repository manager.
What is a real time Job?
Real-time jobs “extract” data from the body of the real time message received and from any secondary sources used in the job.
How do you manage object versions in BODS?
Central repository is used to control the version management of the objects and is used for multiuse development. Central Repository stores all the versions of an application object so it allows you to move to previous versions.
You want to generate the quality reports in DS system, data validation, and documentation. Where you can see this?
Data Services Management Console
What is the template table?
In Data Services, you can create a template table to move to target system that has same structure and data type as source table.
How do you check the execution history of a job or a data flow?
DS Management Console → Job Execution History
What is SAP Data Services Designer? What are main ETL functions that can be performed in Designer tool?
It is a developer tool which is used to create objects consist of data mapping, transformation, and logic. It is GUI based and work as designer for Data Services.
You can create various objects using Data Services Designer like Project, Jobs, Work Flow, Data Flow, mapping, transformations, etc.
How do you check existing objects in DS repository?
In Object library in DS Designer
How do you improve the performance of data flows using memory datastore?
You can create Datastore using memory as database type. Memory Datastore are used to improve the performance of data flows in real time jobs as it stores the data in memory to facilitate quick access and doesn’t require to go to original data source.
A memory Datastore is used to store memory table schemas in the repository. These memory tables get data from tables in Relational database or using hierarchical data files like XML message and IDocs.
The memory tables remain alive till job executes and data in memory tables can’t be shared between different real time jobs.
What is linked data store? Explain with an example?
There are various database vendors which only provide one way communication path from one database to another database. These paths are known as database links. In SQL Server, Linked server allows one way communication path from one database to other.
Consider a local database Server name “Product” stores database link to access information on remote database server called Customer. Now users that are connected to remote database server Customer can’t use the same link to access data in database server Product. User that are connected to “Customer” should have a separate link in data dictionary of the server to access the data in Product database server.
This communication path between two databases are called database link and Datastores which are created between these linked database relationships is known as linked Datastores.
There is a possibility to connect Datastore to another Datastore and importing an external database link as option of Datastore.
You want to import application metadata into repository. How you can perform this?
Adapter Datastore allows you to import application metadata into repository. You can also access application metadata and you can also move batch and real time data between different applications and software.
What are the different types of files can be used as source and target file format?
- SAP Transport
- Unstructured Text
- Unstructured Binary
- Fixed Width
You want to extract data from an Excel work book. How you can do this?
You can use Microsoft Excel workbook as data source using file formats in Data Services. Excel work book should be available on Windows file system or Unix File system.
What is the use of data flow in DS?
Data flow is used to extract, transform and load data from source to target system. All the transformations, loading and formatting occurs in dataflow.
What are different objects that you can add to a dataflow?
What are the different properties that you can set for a data flow?
- Execute once
- Database links
What are the steps included in Data integration process?
- Stage data in an operational datastore, data warehouse, or data mart.
- Update staged data in batch or real-time modes.
- Create a single environment for developing, testing, and deploying the entire data integration platform.
- Manage a single metadata repository to capture the relationships between different extraction and access methods and provide integrated lineage and impact analysis.
Define the terms Job, Workflow, and Dataflow
- A job is the smallest unit of work that you can schedule independently for execution.
- A work flow defines the decision-making process for executing data flows.
- Data flows extract, transform, and load data. Everything having to do with data, including reading sources, transforming data, and loading targets, occurs inside a data flow.
Arrange these objects in order by their hierarchy: Dataflow, Job, Project, and Workflow.
Project, Job, Workflow, Dataflow.
Why do you use work flow in DS?
Workflows are used to determine the process for executing the workflows. Main purpose of workflow is to prepare for executing the data flows and to set the state of system once data flow execution is completed.
What are the different objects that you can add to work flow?
- Work flow
- Data flow
Try or Catch Blocks
Is it possible that a workflow call itself in Daa services job?
Give an example of work flow in production?
There is a fact table that you want to update and you have created a data flow with the transformation. Now If you want to move the data from source system, you have to check last modification for fact table so that you extract only rows that has been added after last update.
In order to achieve this, you have to create one script which determines last update date and then pass this as input parameter to data flow.
You also have to check if data connection to a particular fact table is active or not. If it is not active, you need to setup a catch block which automatically sends an email to administrator to notify about this problem.
What is the use of conditionals?
You can also add Conditionals to workflow. This allows you to implement If/Else/Then logic on the workflows.
What is a transformation in Data Services?
Transforms are used to manipulate data sets as inputs and creating one or multiple outputs. There are various transforms that can be used in Data Services.
What are the common transformations that are available in Data Services?
- Data Integration
- Data Quality
- Text data processing
What are different transformations under data integration?
- Table_Comparision, etc.
What is the use of query transformation?
This is most common transformation used in Data Services and you can perform below functions −
- Data filtering from sources
- Joining data from multiple sources
- Perform functions and transformations on data
- Column mapping from input to output schemas
- Assigning Primary keys
- Add new columns, schemas and functions resulted to output schemas
- As Query transformation is most commonly used transformation, so a shortcut is provided for this query in tool palette.
What is text data processing transformation?
This allows you to extract the specific information from large volume of text. You can search for facts and entities like customer, product, and financial facts specific to an organization.
This transform also checks the relationship between entities and allows the extraction.
The data extracted using text data processing can be used in Business Intelligence, Reporting, query, and analytics.
What is difference between text data processing and data cleansing?
Text data processing is used for finding relevant information from unstructured text data however data cleansing is used for standardization and cleansing structured data.
What is a real time job in Data Services?
You can create real time jobs to process real time messages in Data Services designer. Like a batch job, real time job extracts the data, transform and load it.
Each real time job can extract data from a single message or you can also extract data from other sources like tables or files.
Explain the different between real time and batch job in Data Services?
Transform like branches and control logic are used more often in real time job unlike the batch jobs in designer.
Real time jobs are not executed in response of a schedule or internal trigger unlike the batch jobs.
What is an embedded data flow?
Embedded data flow is known as data flows which are called from another data flow in the design. The embedded data flow can contain multiple number of source and targets but only one input or output pass data to main data flow.
What are the different types of embedded data flow?
One Input − Embedded data flow is added at the end of dataflow.
One Output − Embedded data flow is added at the beginning of a data flow.
No input or output − Replicate an existing data flow.
What are local and global variables in Data services job?
Local variables in data services are restricted to object in which they are created.
Global variables are restricted to jobs in which they are created. Using global variables, you can change values for default global variables at run time.
Expressions that are used in work flow and data flow they are called parameters.
All the variables and parameters in work flow and data flows are shown in variable and parameters window.
What are the different recovery mechanism that can be used in failed jobs?
Automatic Recovery – This allows you to run unsuccessful jobs in recovery mode.
Manually Recovery – This allows you to rerun the jobs without considering partial rerun previous time.
What is the use of Data Profiling?
Data Services Designer provides a feature of Data Profiling to ensure and improve the quality and structure of source data. Data Profiler allows you to −
Find anomalies in source data, validation and corrective action and quality of source data.
The structure and relationship of source data for better execution of jobs, work flows and data flows.
The content of source and target system to determine that your job returns the result as expected.
Explain the different performance optimization techniques in BODS?
The performance of an ETL job depends on the system on which you are using Data Services software, number of moves, etc. There are various other factors that contribute to the performance in an ETL task −
- Source Data Base
- Source Operating System
- Target Database
- Target Operating System
- Job Server OS
- BODs Repository Database
What do you understand by multiuser development in BODS? How do you manage multiuser development?
SAP BO Data Services support multi user development where each user can work on application in their own local repository. Each team uses central repository to save main copy of an application and all the versions of objects in the application.
You want to perform multiuser migration in SAP BODS. How you can perform this?
In SAP Data Services, job migration can be applied at different levels- Application Level, Repository Level, Upgrade level.
To copy the content of one central repository to other central repository, you can’t do it directly and you need to make use of local repository.
First is to get the latest version of all objects from central repository to local repository. Activate the central repository in which you want to copy the contents.
Add all the objects you want to copy from local repository to central repository.
Suppose you have updated the version of Data Services software? Is it required to update the repository version?
If you update version of SAP Data Services, there is a need to update version of Repository. Below points should be considered when migrating a central repository to upgrade version −
Take the backup of central repository all tables and objects.
To maintain version of objects in data services, maintain a central repository for each version. Create a new central history with new version of Data Services software and copy all objects to this repository.
It is always recommended if you install new version of Data Services, you should upgrade your central repository to new version of objects.
Also upgrade your local repository to same version as different version of central and local repository may not work at the same time.
Before migrating the central repository, check in all the objects. As you don’t upgrade central and local repository simultaneously, so there is a need to check in all the objects. As once you have your central repository upgraded to new version, you will not be able to check in objects from local repository which is having older version of Data Services.
What is slowly changing dimension?
SCDs are dimensions that have data that changes over time.
How do you manage slowly changing dimensions? What are the fields required in managing different types if SCD?
SCD Type 1 No history preservation
Natural consequence of normalization
SCD Type 2 Preserving all history and new rows
There are new rows generated for significant changes
You need to use of a unique key
There are new fields are generated to store history data
You need to manage an Effective_Date field.
SCD Type 3 Limited history preservation
In this only two states of data are preserved – current and old
Is file format in Data Services type of a data store?
No, File format is not a datastore type.