Q1. What Is Rdbms?
Relational Data Base Management Systems (RDBMS) are database management systems that maintain data records and indices in tables. Relationships may be created and maintained across and among the data and tables. In a relational database, relationships between data items are expressed by means of tables. Interdependencies among these tables are expressed by data values rather than by pointers. This allows a high degree of data independence. An RDBMS has the capability to recombine the data items from different files, providing powerful tools for data usage.
Q2. What is a table?
The data in an RDBMS is stored in database objects which are called as tables. This table is basically a collection of related data entries and it consists of numerous columns and rows.
Remember, a table is the most common and simplest form of data storage in a relational database. The following program is an example of a CUSTOMERS table −
| ID | NAME | AGE | ADDRESS | SALARY |
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
Q3. What Is Normalization?
Database normalization is a data design and organization process applied to data structures based on rules that help build relational databases. In relational database design, the process of organizing data to minimize redundancy. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.
Q4. What are E-R diagrams?
E-R diagram also termed as Entity-Relationship diagram shows relationship between various tables in the database.
Q5. What is a field?
Every table is broken up into smaller entities called fields. The fields in the CUSTOMERS table consist of ID, NAME, AGE, ADDRESS and SALARY.
A field is a column in a table that is designed to maintain specific information about every record in the table.
Q6. What Are Different Normalization Forms?
1NF: Eliminate Repeating Groups Make a separate table for each set of related attributes, and give each table a primary key. Each field contains at most one value from its attribute domain.
2NF: Eliminate Redundant Data If an attribute depends on only part of a multi-valued key, remove it to a separate table.
3NF: Eliminate Columns Not Dependent On Key If attributes do not contribute to a description of the key, remove them to a separate table. All attributes must be directly dependent on the primary key.
BCNF: Boyce-Codd Normal Form If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables.
4NF: Isolate Independent Multiple Relationships No table may contain two or more 1:n or n:m relationships that are not directly related.
5NF: Isolate Semantically Related Multiple Relationships There may be practical constrains on information that justify separating logically related many-to-many relationships.
ONF: Optimal Normal Form A model limited to only simple (elemental) facts, as expressed in Object Role Model notation.
DKNF: Domain-Key Normal Form A model free from all modification anomalies. Remember, these normalization guidelines are cumulative. For a database to be in 3NF, it must first fulfill all the criteria of a 2NF and 1NF database.
Q7. What is a Record or a Row?
A record is also called as a row of data is each individual entry that exists in a table. For example, there are 7 records in the above CUSTOMERS table. Following is a single row of data or record in the CUSTOMERS table −
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
A record is a horizontal entity in a table.
Q8. What Is Stored Procedure?
A stored procedure is a named group of SQL statements that have been previously created and stored in the server database. Stored procedures accept input parameters so that a single procedure can be used over the network by several clients using different input data. And when the procedure is modified, all clients automatically get the new version. Stored procedures reduce network traffic and improve performance. Stored procedures can be used to help ensure the integrity of the database.
e.g. sp_helpdb, sp_renamedb, sp_depends etc.
Q9. What is a column?
A column is a vertical entity in a table that contains all information associated with a specific field in a table.
For example, a column in the CUSTOMERS table is ADDRESS, which represents location description and would be as shown below −
| ADDRESS |
| Ahmedabad |
| Delhi |
| Kota |
| Mumbai |
| Bhopal |
| MP |
| Indore |
Q10. What Is Trigger?
A trigger is a SQL procedure that initiates an action when an event (INSERT, DELETE or UPDATE) occurs. Triggers are stored in and managed by the DBMS. Triggers are used to maintain the referential integrity of data by changing the data in a systematic fashion. A trigger cannot be called or executed; the DBMS automatically fires the trigger as a result of a data modification to the associated table. Triggers can be viewed as similar to stored procedures in that both consist of procedural logic that is stored at the database level. Stored procedures, however, are not event-drive and are not attached to a specific table as triggers are. Stored procedures are explicitly executed by invoking a CALL to the procedure while triggers are implicitly executed. In addition, triggers can also execute stored procedures.
Q11. What is a NULL value?
A NULL value in a table is a value in a field that appears to be blank, which means a field with a NULL value is a field with no value.
It is very important to understand that a NULL value is different than a zero value or a field that contains spaces. A field with a NULL value is the one that has been left blank during a record creation.
Q12. What Is View?
A simple view can be thought of as a subset of a table. It can be used for retrieving data, as well as updating or deleting rows. Rows updated or deleted in the view are updated or deleted in the table the view was created with. It should also be noted that as data in the original table changes, so does data in the view, as views are the way to look at part of the original table. The results of using a view are not permanently stored in the database. The data accessed through a view is actually constructed using standard T-SQL select command and can come from one to many different base tables or even other views.
Q13. What are SQL Constraints?
Constraints are the rules enforced on data columns on a table. These are used to limit the type of data that can go into a table. This ensures the accuracy and reliability of the data in the database.
Constraints can either be column level or table level. Column level constraints are applied only to one column whereas, table level constraints are applied to the entire table.
Following are some of the most commonly used constraints available in SQL −
- NOT NULL Constraint − Ensures that a column cannot have a NULL value.
- DEFAULT Constraint − Provides a default value for a column when none is specified.
- UNIQUE Constraint − Ensures that all the values in a column are different.
- PRIMARY Key − Uniquely identifies each row/record in a database table.
- FOREIGN Key − Uniquely identifies a row/record in any another database table.
- CHECK Constraint − The CHECK constraint ensures that all values in a column satisfy certain conditions.
- INDEX − Used to create and retrieve data from the database very quickly.
Q14. What Is Index?
An index is a physical structure containing pointers to the data. Indices are created in an existing table to locate rows more quickly and efficiently. It is possible to create an index on one or more columns of a table, and each index is given a name. The users cannot see the indexes, they are just used to speed up queries. Effective indexes are one of the best ways to improve performance in a database application. A table scan happens when there is no index available to help a query. In a table scan SQL Server examines every row in the table to satisfy the query results. Table scans are sometimes unavoidable, but on large tables, scans have a terrific impact on performance. Clustered indexes define the physical sorting of a database table’s rows in the storage media. For this reason, each database table may have only one clustered index. Non-clustered indexes are created outside of the database table and contain a sorted list of references to the table itself.
Q15. List the categories of Data Integrity
The following categories of data integrity exist with each RDBMS −
- Entity Integrity −There are no duplicate rows in a table.
- Domain Integrity −Enforces valid entries for a given column by restricting the type, the format, or the range of values.
- Referential integrity −Rows cannot be deleted, which are used by other records.
- User-Defined Integrity −Enforces some specific business rules that do not fall into entity, domain or referential integrity.
Q16. What Is Database?
A database is a logically coherent collection of data with some inherent meaning, representing some aspect of real world and which is designed, built and populated with data for a specific purpose.
Q17. Explain Database Normalization
Database normalization is the process of efficiently organizing data in a database. There are two reasons of this normalization process −
- Eliminating redundant data, for example, storing the same data in more than one table.
- Ensuring data dependencies make sense.
Both these reasons are worthy goals as they reduce the amount of space a database consumes and ensures that data is logically stored. Normalization consists of a series of guidelines that help guide you in creating a good database structure.
Normalization guidelines are divided into normal forms; think of a form as the format or the way a database structure is laid out. The aim of normal forms is to organize the database structure, so that it complies with the rules of first normal form, then second normal form and finally the third normal form.
It is your choice to take it further and go to the fourth normal form, fifth normal form and so on, but in general, the third normal form is more than enough.
- First Normal Form (1NF)
- Second Normal Form (2NF)
- Third Normal Form (3NF)
Q18. What Is Dbms?
It is a collection of programs that enables user to create and maintain a database. In other words it is general-purpose software that provides the users with the processes of defining, constructing and manipulating the database for various applications.
Q19. What’s difference between DBMS and RDBMS ?
DBMS provides a systematic and organized way of storing, managing and retrieving from collection of logically related information. RDBMS also provides what DBMS provides but above that it provides relationship integrity. So in short we can say
RDBMS = DBMS + REFERENTIAL INTEGRITY
These relations are defined by using “Foreign Keys” in any RDBMS.Many DBMS companies claimed there DBMS product was a RDBMS compliant, but according to industry rules and regulations if the DBMS fulfills the twelve CODD rules it’s truly a RDBMS. Almost all DBMS (SQL SERVER, ORACLE etc) fulfills all the twelve CODD rules and are considered as truly RDBMS.
Q20. What Is A Database System?
The database and DBMS software together is called as Database system.
Q21. What are CODD rules?
In 1969 Dr. E. F. Codd laid down some 12 rules which a DBMS should adhere in order to get the logo of a true RDBMS.
Rule 1:Information Rule- “All information in a relational data base is represented explicitly at the logical level and in exactly one way – by values in tables.”
Rule 2:Guaranteed access Rule- “Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.”In flat files we have to parse and know exact location of field values. But if a DBMS is truly RDBMS you can access the value by specifying the table name, field name, for instance Customers.Fields [‘Customer Name’].
Rule 3: Systematic treatment of null values- “Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.”.
Rule 4: Dynamic on-line catalog based on the relational model- “The data base description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.”The Data Dictionary is held within the RDBMS, thus there is no-need for off-line volumes to tell you the structure of the database.
Rule 5: Comprehensive data sub-language Rule- “A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all the following items
Data Manipulation (Interactive and by program).
Transaction boundaries ( Begin , commit and rollback)
Rule 6: View updating Rule- “All views that are theoretically updatable are also updatable by the system.”
Rule 7: High-level insert, update and delete- “The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update and deletion of data.”
Rule 8: Physical data independence- “Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.”
Rule 9: Logical data independence- “Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit un-impairment are made to the base tables.”
Rule 10: Integrity independence- “Integrity constraints specific to a particular relational data base must be definable in the relational data sub-language and storable in the catalog, not in the application programs.”
Rule 11: Distribution independence- “A relational DBMS has distribution independence.”
Rule 12: Non-subversion Rule- “If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity Rules and constraints expressed in the higher level relational language (multiple-records-at-a-time).”
Q22. You are an experienced Data Warehouse architect, designer and developer. What are the main lessons you have learned in your career?
I find designers sometimes deliver over-complex, generic solutions that could (in theory) do anything, but in reality are remarkably difficult to operate, and often misunderstood. I believe this stems from a lack of understanding of the requirement – the second most important lesson.
Everyone from the senior stakeholders to architects, designers and developers need to fully understand the goal. Not the solution, but the “problem we’re trying to solve”. End users never ask for what they need (the requirement), but instead, they describe a potential solution. IT professionals are by nature delivery focused, and, get frustrated when it appears “the user changed their mind”. I find the user seldom changes their mind. In reality, the requirement was never fully understood.
To summarise. Focus on the problem not the solution. Then (once understood), suggest a dozen solutions and pick the best one. But keep it simple.
Q23. How many types of relationship exist in database designing?
There are three major relationship models:-
Q24. RDBMS vs. NoSQL vs. NewSQL: How do you compare Database Technologies?
The traditional RDBMS solutions from Oracle, IBM and Microsoft implement the relational model on a 1970s hardware architecture, and typically provide a good general purpose database platform which can be applied to most OLTP and Data Warehouse use cases.
However, as Dr. Michael Stonebraker indicated in this 2007 paper, The End of an Architectural Era (It’s Time for a Complete Rewrite), these are no longer fit for purpose, as both the hardware technology, and processing demands have moved on. In particular, the need for real time (millisecond) performance, greater scalability to handle web-scale applications, and the need to handle unstructured and semi-structured data.
Q25. Whereas the legacy RDBMS is a general purpose (will do anything) database, the NoSQL and NewSQL solutions are dedicated to a single problem, for example, short lived OLTP operations.
The Key-Value NoSQL databases were developed to handle the massive transaction volume, and low latency needed to handle web commerce at Amazon and LinkedIn. Others (eg. MongoDB) where developed to handle semi-structured data, while still others (eg. Neo4J) were built to efficiently model data networks of the type found at Facebook or LinkedIn.
The common thread with NoSQL databases is they tend to use an API interface rather than industry standard SQL, although increasingly that’s changing. They do however, entirely reject the relational model and ACID compliance. They typically don’t support cross-table join operations, and are entirely focused on low latency, trading consistency for scalability.
The so-called NewSQL databases include VoltDB , MemSQL and CockroachDB are a return to the relational model, but re-architected for modern hardware and web scale use cases. Like NoSQL solutions, they tend to run on a shared nothing architecture, and scale to millions of transactions per second, but they also have full transaction support and ACID compliance that are critical for financial operations.
Q26. What is denormalization ?
Denormalization is the process of putting one fact in numerous places (its vice-versa of normalization).Only one valid reason exists for denormalizing a relational design – to enhance performance.The sacrifice to performance is that you increase redundancy in database.
Q27. What are the typical trade-off of performance and consistency when using NoSQL and NewSQL databases to support high velocity OLTP and real time analytics?
The shared nothing architecture is built to support horizontal scalability, and when combined with data replication, can provide high availability and good performance. If one node in the cluster fails, the system continues, as the data is available on other nodes. The NoSQL database is built upon this architecture, and to maximize throughput, ACID compliance is relaxed in favor of Eventual Consistency, and in some cases (eg. Cassandra), it supports tunable consistency, allowing the developer to trade performance for consistency, and durability.
For example, after a write operation, the change cannot be considered durable (the D in ACID) until the change is replicated to at least one, ideally two other nodes , but this would increase latency, and reduce performance. It’s possible however, to relax this constraint, and return immediately, with the risk the change may be lost if the node crashes before the data is successfully replicated. This becomes even more of a potential issue if the node is temporarily disconnected from the network, but is allowed to continue accepting transactions until the connection is restored. In practice, consistency will be eventually be achieved when the connection is reestablished – hence the term Eventual Consistency.
A NewSQL database on the other hand accepts no such compromise, and some databases (eg. VoltDB), even support full serializability, executing transactions as if they were executed serially. Impressively, they manage this impressive feat at a rate of millions of transactions per second, potentially on commodity hardware.
Q28. Can you explain Fourth Normal Form and Fifth Normal Form ?
In fourth normal form it should not contain two or more independent multi-v about an entity and it should satisfy “Third Normal form”.
Fifth normal form deals with reconstructing information from smaller pieces of information. These smaller pieces of information can be maintained with less redundancy.
Q29. One of the main challenges for real time systems architects is the potentially massive throughput required which could exceed a million transactions per second. How do handle such a challenge?
The short answer is – with care! The longer answer is described in my article, Big Data – Velocity. I’d break the problem into three components, Data Ingestion, Transformation and Storage.
Data ingestion requires message based middleware (eg. Apache Kafka), with a range of adapters and interfaces, and the ability to smooth out the potentially massive spikes in velocity, with the ability to stream data to multiple targets.
Transformation, typically requires an in-memory data streaming solution to restructure and transform data in near-real time. Options include Spark Streaming, Storm or Flink.
Storage and Analytics is sometimes handled by a NoSQL database, but for application simplicity (avoiding the need to implement transactions or handle eventual consistency problems in the application), I’d recommend a NewSQL database.
All the low-latency, high throughput of the NoSQL solutions, but with the flexibility and ease of a full relational database, and full SQL support.
In conclusion, the solution needs to abandon the traditional batch oriented solution in favour of an always-on streaming solution with all processing in memory.
Q30. Have you heard about sixth normal form?
If we want relational system in conjunction with time we use sixth normal form. At this moment SQL Server does not supports it directly.
Q31. Michael Stonebraker introduced the so called “One Size no longer fits all”-concept. Has this concept come true on the database market?
First stated in the paper One Size Fits All – An Idea Whose Time Has Come And Gone Dr. Michael Stonebraker argued that the legacy RDBMS dominance was at an end, and would be replaced by specialized database technology including stream processing, OLTP and Data Warehouse solutions.
Certainly disruption in the Data Warehouse database market has been accelerated with the move towards the cloud, and as this Gigaom Report illustrates, there are at least nine major players in the market, with new specialized tools including Google Big Query, Amazon Redshift and Snowflake, and the column store (in memory or on secondary storage) dominates.
Finally, the explosion of specialized NoSQL and NewSQL databases, each with its own specialty including Key-Value, Document Stores, Text Search and Graph databases lend credence to the statement “One Size no longer fits all”.
I do think however, we’re still in a transformation stage, and the shake-out is not yet complete. I think a lot of large corporations (especially Financial Services) are wary of change, but it’s already happening.