- What is the difference between ETL and ELT?
Ans: ETL:
Extract, Transform, and Load (ETL) is a process that extracts data from external sources, transforms it to fit operational needs (sometimes using staging tables), and then loads it into the target database or data warehouse. This approach is reasonable when many different databases are involved in your data warehouse landscape. In that scenario the data has to be transported from one place to another anyway, so doing the transformation work in a separate, specialized engine is legitimate.
ELT:
Extract, Load, Transform (ELT) is a process where data is extracted, loaded into a staging table in the database, transformed where it sits in the database, and then loaded into the target database or data warehouse.
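The difference in ordering can be sketched in a few lines of Python. This is only an illustration, not how Talend generates its jobs: the in-memory SQLite database, the table names (`sales_etl`, `staging_sales`, `sales_elt`) and the sample rows are all assumptions made for the example.

```python
import sqlite3

# Hypothetical source rows: (customer name, amount as text).
SOURCE_ROWS = [(" alice ", "10.50"), ("BOB", "3"), ("Carol ", "7.25")]


def run_etl(conn, rows):
    """ETL: transform outside the database, then load the cleaned rows."""
    cleaned = [(name.strip().upper(), float(amount)) for name, amount in rows]
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def run_elt(conn, rows):
    """ELT: load raw rows into a staging table, then transform with SQL."""
    conn.execute("CREATE TABLE staging_sales (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", rows)
    # The transformation runs inside the database engine.
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT UPPER(TRIM(customer)) AS customer, "
        "CAST(amount AS REAL) AS amount "
        "FROM staging_sales"
    )


conn = sqlite3.connect(":memory:")
run_etl(conn, SOURCE_ROWS)
run_elt(conn, SOURCE_ROWS)
etl_rows = conn.execute(
    "SELECT customer, amount FROM sales_etl ORDER BY customer").fetchall()
elt_rows = conn.execute(
    "SELECT customer, amount FROM sales_elt ORDER BY customer").fetchall()
```

Both paths end with the same target rows. What changes is where the transformation work is executed: in the client program for ETL, in the database for ELT.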
- What is the use of the tLoqateAddressRow component in Talend?
Ans: This component corrects mailing addresses associated with customer data, to ensure a single customer view and better delivery of customer mailings.
- What do you understand by MDM in Talend?
Ans: Master Data Management (MDM), through which an organization builds and manages a single, consistent, accurate view of key enterprise data, has demonstrated substantial business value, including improvements to operational efficiency, marketing effectiveness, strategic planning and regulatory compliance. To date, however, MDM has been the privilege of a relatively small number of large, resource-rich organizations. Thwarted by the prohibitive costs of proprietary MDM software and the great difficulty of building and maintaining an in-house MDM solution, most organizations have had to forego MDM despite its clear value.
- What’s new in v5.6?
Ans: This technical note highlights the important new features and capabilities of version 5.6 of Talend’s comprehensive suite of Platform, Enterprise and Open Studio solutions.
With version 5.6 Talend:
- Extends its big data leadership position, enabling firms to move beyond batch processing and into real-time big data by providing technical previews of the Apache Spark, Apache Spark Streaming and Apache Storm frameworks.
- Enhances its support for the Internet of Things (IoT) by introducing support for key IoT protocols (MQTT, AMQP) to gather and collect information from machines, sensors, or other devices.
- Improves Big Data performance: MapReduce executes on average 24% faster in v5.6 and 53% faster than in v5.4, while Big Data profiling performance is typically 20 times faster in v5.6 compared to v5.5.
- Enables faster updates to MDM data models and provides deeper control of data lineage, more visibility and control.
- Offers further enterprise application connectivity and support by continuing to add to its extensive list of over 800 connectors and components, with enhanced support for enterprise applications such as SAP BAPI and Tables, Oracle 12 GoldenGate CDC, Microsoft HDInsight, Marketo and Salesforce.com.
- What is the advantage of Talend?
Ans: Talend is cost-effective, easy to use, readily adaptable and extremely versatile. With the help of the graphical user interface we can easily and quickly link up a large number of source systems using the standard connectors.
- Describe the ETL process?
Ans: Extraction, Transformation and Loading (ETL) processes are critical components for feeding a data warehouse, a business intelligence system, or a big data platform. While mostly invisible to users of a business intelligence platform, an ETL process retrieves data from operational systems and pre-processes it for further analysis by reporting and analytics tools. The accuracy and timeliness of the entire business intelligence platform rely on ETL processes, specifically:
- Extraction of the data from production applications and databases (ERP, CRM, RDBMS, files, etc.)
- Transformation of this data to reconcile it across source systems, perform calculations or string parsing, enrich it with external lookup information, and also match the format required by the target system (third normal form, star schema, slowly changing dimensions, etc.)
- Loading of the resulting data into the business intelligence (BI) applications: Data Warehouse or Enterprise Data Warehouse, Data Marts, Online Analytical Processing (OLAP) applications or “cubes”, etc.
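The three steps above can be sketched as a small Python pipeline. It is a minimal illustration under stated assumptions, not a Talend job: the CSV content, the `CUSTOMER_REGION` lookup and the `fact_orders` table are hypothetical names invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical extract: a CSV export from an operational system.
ORDERS_CSV = """order_id,customer_code,quantity,unit_price
1,C001,2,9.99
2,C002,1,24.50
3,C999,5,1.00
"""

# Hypothetical external lookup used to enrich the rows.
CUSTOMER_REGION = {"C001": "EMEA", "C002": "APAC"}


def extract(text):
    """Extraction: read the source rows as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: parse strings, compute a total, enrich via lookup."""
    out = []
    for row in rows:
        quantity = int(row["quantity"])
        unit_price = float(row["unit_price"])
        out.append((
            int(row["order_id"]),
            row["customer_code"],
            CUSTOMER_REGION.get(row["customer_code"], "UNKNOWN"),
            round(quantity * unit_price, 2),
        ))
    return out


def load(conn, rows):
    """Loading: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE fact_orders "
        "(order_id INTEGER, customer_code TEXT, region TEXT, total REAL)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(ORDERS_CSV)))
loaded = conn.execute(
    "SELECT order_id, region, total FROM fact_orders ORDER BY order_id"
).fetchall()
```

Keeping the three steps as separate functions mirrors the way the process is usually described, and makes it easy to swap the source or the target without touching the transformation.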
- What is tJoin?
Ans: tJoin joins two tables by doing an exact match on several columns. It compares columns from the main flow with reference columns from the lookup flows and outputs the main flow data and/or the rejected data.
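The behaviour described above, an exact match on several columns with a separate output for rejected rows, can be sketched in Python as follows. This is only a sketch of the idea, not Talend's implementation: the function name `exact_match_join`, the sample rows, and the choice to keep the first lookup row when several share a key are assumptions of the example.

```python
def exact_match_join(main_rows, lookup_rows, keys):
    """Return (matched, rejected) for an exact match on the key columns."""
    # Index the lookup flow once by its key columns.
    index = {}
    for row in lookup_rows:
        # Assumption of this sketch: the first lookup row per key wins.
        index.setdefault(tuple(row[k] for k in keys), row)

    matched, rejected = [], []
    for row in main_rows:
        hit = index.get(tuple(row[k] for k in keys))
        if hit is None:
            rejected.append(row)  # no lookup row with the same key values
        else:
            # Main-flow columns take precedence over lookup columns.
            matched.append({**hit, **row})
    return matched, rejected


main_flow = [
    {"first": "Ann", "last": "Lee", "amount": 10},
    {"first": "Bo", "last": "Ng", "amount": 5},
]
lookup_flow = [{"first": "Ann", "last": "Lee", "city": "Oslo"}]

matched, rejected = exact_match_join(main_flow, lookup_flow, ["first", "last"])
```

Here the row for Ann Lee is enriched with the `city` column from the lookup flow, while the row for Bo Ng has no exact match and ends up in the rejected list.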
- What is tDenormalizeSortedRow?
Ans: tDenormalizeSortedRow combines all sorted input rows in a group. Distinct values of the denormalized sorted row are joined with item separators. tDenormalizeSortedRow helps synthesize a sorted input flow to save memory.
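A rough Python sketch of the same idea is shown below. It is not the component's actual code: the function name `denormalize_sorted`, the column names and the comma separator are assumptions for illustration. Because the input is already sorted by the grouping key, only one group has to be held in memory at a time.

```python
from itertools import groupby
from operator import itemgetter


def denormalize_sorted(rows, key, column, separator=","):
    """Yield one row per run of consecutive rows that share the same key."""
    for group_key, group in groupby(rows, key=itemgetter(key)):
        distinct = []
        for row in group:
            # Keep each value once, in order of first appearance.
            if row[column] not in distinct:
                distinct.append(row[column])
        yield {key: group_key, column: separator.join(distinct)}


# Hypothetical input flow, already sorted by "customer".
rows = [
    {"customer": "Ann", "product": "pen"},
    {"customer": "Ann", "product": "ink"},
    {"customer": "Ann", "product": "pen"},
    {"customer": "Bo", "product": "cap"},
]
result = list(denormalize_sorted(rows, "customer", "product"))
```

If the input were not sorted, the same key could appear in several separate runs and would produce several output rows, which is why the sorted input is a precondition here.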
- Which Talend component is used for data transformation using built-in .NET classes?
Ans: tDotNETRow facilitates data transformation by using custom or built-in .NET classes.