1. What is DataStage?
Ans: DataStage is an ETL tool and part of the IBM Information Platforms Solutions suite and IBM InfoSphere. It uses a graphical notation to construct data integration solutions and is available in various versions such as the Server Edition, the Enterprise Edition, and the MVS Edition.
2. What are the types of stages?
- General Objects
- Data Quality stages
- Development and Debug stages
- Database connectors
- Restructure stages
- Real-time stages
- Sequence activities
3. What are the components of DataStage?
Ans: DataStage has a number of client and server components. It has four main components, namely:
- Datastage Designer
- Datastage Director
- Datastage Manager
- Datastage Administrator
4. Explain a few features of DataStage.
- Extracts data from any number and type of databases.
- Handles all the metadata definitions required to define your data warehouse.
- You can view and modify the table definitions at any point during the design of your application.
- Aggregates data
- You can modify SQL SELECT statements used to extract data
- DataStage transforms data easily. It has a set of predefined transforms and functions.
- You can use it to convert your data and you can easily extend the functionality by defining your own transforms to use.
- Loads the data warehouse
5. What are the jobs available in DataStage?
- Server job
- Parallel job
- Sequencer job
- Container job
6. Describe the architecture of DataStage.
Ans: DataStage follows the client-server model. It has different types of client-server architecture for different versions of DataStage.
DataStage architecture contains the following components.
- Client Components
7. How do you read multiple files using a single DataStage job if the files have the same metadata?
- Check whether the files share the same metadata; if they do, specify the file names in the sequential stage.
- Attach the metadata to the sequential stage in its properties.
- Set the Read Method to ‘Specific File(s)’, then add each file by selecting the ‘File’ property from ‘Available properties to add’.
It will look like:
- File= /home/myFile1.txt
- File= /home/myFile2.txt
- File= /home/myFile3.txt
- Read Method= Specific File(s)
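The idea behind these settings can be sketched outside DataStage in plain Python: one reader, one shared column layout, several input files. This is only an illustration of the concept, not DataStage code; the column names, the `read_files` and `demo` helpers, and the temporary file locations are all assumptions made for the example.

```python
import csv
import os
import tempfile

# Hypothetical shared column layout (the "metadata") for every input file.
COLUMNS = ["id", "name"]


def read_files(paths, columns=COLUMNS):
    """Read several delimited files that share one layout into a single list of rows."""
    rows = []
    for path in paths:
        with open(path, newline="") as handle:
            for record in csv.reader(handle):
                # Reject any record that does not match the shared layout.
                if len(record) != len(columns):
                    raise ValueError(
                        f"{path}: expected {len(columns)} fields, got {len(record)}"
                    )
                rows.append(dict(zip(columns, record)))
    return rows


def demo():
    """Write three small sample files to a temporary folder and read them back."""
    with tempfile.TemporaryDirectory() as folder:
        paths = []
        for index in range(1, 4):
            path = os.path.join(folder, f"myFile{index}.txt")
            with open(path, "w", newline="") as handle:
                csv.writer(handle).writerow([index, f"name{index}"])
            paths.append(path)
        return read_files(paths)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Because every file is checked against the same column list, a file with different metadata fails fast instead of silently producing misaligned rows.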
8. Explain IBM InfoSphere Information Server and highlight its main features.
IBM InfoSphere Information Server is a data integration platform made up of a group of products that let you understand, filter, monitor, transform, and deliver data. It is a scalable solution with massively parallel processing capabilities that help you manage both small and very large data volumes. It helps you deliver reliable information to key business initiatives such as big data and analytics, data warehouse modernization, and master data management.
Features of IBM InfoSphere information server
- IBM InfoSphere can connect with multiple source systems as well as write to various target systems. It acts as a single platform for data integration.
- It is based on centralized layers. All the modules of the suite can share the baseline architecture of the suite.
- It has additional layers for the unified repository, for integrated metadata services, and for sharing a parallel engine.
- It has tools for analyzing, monitoring, cleansing, transforming, and delivering data.
- It has massively parallel processing capabilities that provide high-speed processing.
9. What is IBM DataStage Flow Designer?
Ans: IBM DataStage Flow Designer (DFD) allows you to create, edit, load, and run jobs in DataStage. DFD is a thin-client, web-based UI for DataStage, in contrast to DataStage Designer, which is a Windows-based thick client.
10. How is a DataStage source file filled?
Ans: We can fill the source file in DataStage either by developing a SQL query or by using a row generator extract tool.
11. How is merging done in DataStage?
Ans: In DataStage, merging is done when two or more tables are expected to be combined based on their primary key column.
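As a rough illustration of combining two tables on a shared key, here is a minimal Python sketch. It performs a plain inner join on the key column and is not meant to reproduce the exact behavior of any DataStage stage; the function name `merge_on_key` and the sample `customers` and `orders` tables are hypothetical.

```python
def merge_on_key(master, update, key):
    """Combine each master row with the update row that has the same key value."""
    # Index the update rows by key so each lookup is a single dictionary access.
    update_by_key = {row[key]: row for row in update}
    merged = []
    for row in master:
        match = update_by_key.get(row[key])
        if match is not None:
            # Columns from the update row are added to the master row.
            merged.append({**row, **match})
    return merged


# Hypothetical sample tables sharing the key column "cust_id".
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
orders = [{"cust_id": 1, "total": 250}, {"cust_id": 3, "total": 90}]

result = merge_on_key(customers, orders, "cust_id")
print(result)
```

Only the customer with `cust_id` 1 appears in both tables, so the result holds one combined row; rows without a partner on the other side are dropped in this sketch.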
12. What is a routine in DataStage?
Ans: DataStage Manager defines a collection of functions within a routine. There are basically three types of routines in DataStage, namely, job control routine, before/after subroutine, and transform function.
13. What is the process for removing duplicates in DataStage?
Ans: Duplicates in DataStage can be removed using the sort function. When running the sort, set the option that allows duplicates to false.
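The sort-then-drop-duplicates idea can be sketched in Python as follows. This is a conceptual illustration under stated assumptions, not DataStage code: the helper name `sort_and_dedupe` and the sample rows are hypothetical, and the sketch keeps the first row seen for each key.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_dedupe(rows, key):
    """Sort rows by the key column and keep one row per key value."""
    # sorted() is stable, so rows with equal keys keep their input order.
    ordered = sorted(rows, key=itemgetter(key))
    # After sorting, duplicates are adjacent; keep the first row of each group.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


# Hypothetical sample input with a duplicate key value (id 2).
rows = [
    {"id": 2, "value": "b"},
    {"id": 1, "value": "a"},
    {"id": 2, "value": "c"},
]

print(sort_and_dedupe(rows, "id"))
```

Sorting first is what makes the duplicate check cheap: once equal keys sit next to each other, a single pass is enough to discard the repeats.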
14. How do you start developing a DataStage project?
Ans: The very first step is to create a Datastage job on the Datastage server. All the Datastage objects that we create are stored in the Datastage project. A Datastage project is a separate environment on the server for jobs, tables, definitions, and routines.
15. What is a DataStage job?
Ans: A DataStage job is simply the DataStage code that we create as developers. It contains different stages linked together to define the data and process flow.
Stages are nothing but the functionalities that get implemented.
For Example: Let’s assume that I want to do a sum of the sales amount. This can be a ‘group by’ operation that will be performed by one stage.
Now, I want to write the result to a target file. This operation will be performed by another stage. Once I have defined both stages, I need to define the data flow from my ‘group by’ stage to the target file stage. This data flow is defined by DataStage links.
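The same example can be mimicked in Python to make the stage-and-link idea concrete. In this sketch each function plays the role of a stage, and passing one function's output to the next plays the role of a link. It is an analogy only, not how DataStage jobs are written; the function names, the `region` and `amount` columns, and the sample data are assumptions.

```python
import csv
import io
from collections import defaultdict


def aggregate_sales(rows):
    """'Group by' stage: sum the sales amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    for region in sorted(totals):
        yield {"region": region, "total": totals[region]}


def write_target(rows, handle):
    """Target file stage: write each incoming row as a CSV line."""
    writer = csv.DictWriter(handle, fieldnames=["region", "total"])
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def run_job(source_rows):
    """The 'job': link the group-by stage to the target stage and run it."""
    target = io.StringIO()  # stands in for the target file
    write_target(aggregate_sales(source_rows), target)
    return target.getvalue()


# Hypothetical sales records used as the job's source data.
sales = [
    {"region": "East", "amount": 100.0},
    {"region": "West", "amount": 70.0},
    {"region": "East", "amount": 50.0},
]

if __name__ == "__main__":
    print(run_job(sales))
```

Keeping each stage as its own function mirrors the way a job separates what each stage does from how the data flows between them.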
16. Name the different sorting methods in DataStage.
Ans: There are two methods available:
- Link sort
- Inbuilt Datastage Sort
17. How do you import and export DataStage jobs?
Ans: The following command-line utilities are used for this:
- Import: dsimport.exe
- Export: dsexport.exe
18. What are routines in Datastage? Enlist various types of routines.
Ans: Routine is a set of functions that are defined by the DS manager. It is run via the transformer stage.
There are 3 kinds of routines:
- Parallel routines
- Mainframe routines
- Server routines
19. What is the advantage of using modular development in DataStage?
Ans: Use modular development techniques in your job designs to maximize the reuse of parallel jobs and components and to save time.
20. What is Link buffering?
Ans: InfoSphere DataStage automatically performs buffering on the links of certain stages. This is primarily intended to prevent deadlock situations arising (where one stage is unable to read its input because a previous stage in the job is blocked from writing to its output).
21. How do you import and export data into Datastage?
Ans: Data is imported into and exported from DataStage with the import/export utility, which consists of these operators:
- The import operator: imports one or more data files into a single data set.
- The export operator: exports a data set to one or more data files.
22. What is QualityStage in DataStage?
Ans: QualityStage is used for cleansing data alongside the DataStage tool. It is a client-server software tool that is provided as part of the IBM Information Server.
23. What is a repository table in DataStage?
Ans: The term ‘repository’ is another name for a data warehouse. It can be centralized or distributed. The repository table is used for answering ad-hoc, historical, analytical, or complex queries.
24. How do we compare the Validated OK with the Compiled Process in DataStage?
Ans: The Compiled Process ensures that the important stage parameters are mapped and these are correct such that it creates an executable job. Whereas in the Validated OK, we make sure that the connections are valid.
25. Explain the feature of data type conversion in DataStage.
Ans: If we want to do data conversion in DataStage, then we can use the data conversion function. For this to be successfully executed, we need to ensure that the input or the output to and from the operator is the same, and the record schema needs to be compatible with the operator.
26. What are the different kinds of views available in DataStage Director?
Ans: There are 3 kinds of views available in the Datastage director. They are:
- Log view
- Status view
- Job view
27. What is the difference between a passive stage and an active stage?
Ans: Passive stages are utilized for extraction and loading whereas active stages are utilized for transformation.
28. What are the various kinds of containers available in DataStage?
Ans: There are two kinds of containers in DataStage:
- Local container
- Shared container
29. What are the different types of jobs in DataStage?
Ans: We have two types of jobs in Datastage:
- Server jobs (They run in a sequential manner)
- Parallel jobs (They get executed in a parallel way)
30. What is the use of Datastage director?
Ans: Through Datastage director, we can schedule a job, validate the job, execute the job and monitor the job.