What are confirmed dimensions, junk dimension and degenerated dimensions?
Confirm dimension: It is the dimension which is sharable across the multiple facts or data model. This is also called as Role Playing Dimensions.
junk dimension: A number of very small dimensions might be lumped (a small irregularly shaped) together to form a single dimension, a junk dimension – the attributes are not closely related. Grouping of Random flags and text Attributes in a dimension and moving them to a separate sub dimension is known as junk dimension.
Degenerated dimension: In this degenerate dimension contains their values in fact table and the dimension id not available in dimension table. Degenerated Dimension is a dimension key without corresponding dimension.
Example: In the PointOfSale Transaction Fact table, we have:
Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FP), and POS Transaction Number
Date Dimension corresponds to Date Key, Production Dimension corresponds to Production Key. In a traditional parent-child database, POS Transactional Number would be the key to the transaction header record that contains all the info valid for the transaction as a whole, such as the transaction date and store identifier. But in this dimensional model, we have already extracted this info into other dimension. Therefore, POS Transaction Number looks like a dimension key in the fact table but does not have the corresponding dimension table.
What are the types of database schema?
They are 3 types of database schema they are
Star
Snowflake
Starflake
What are the difference between data mart and data warehouse?
Datawarehouse is complete data where as Data mart is Subset of the same.
Ex:
All the organisation data may related to finance department, HR, banking dept are stored in data warehouse where as in data mart only finance data or HR department data will be stored. So data warehouse is a collection of different data marts.
Splunk Technology Frequently Asked SSAS Interview Questions Answers |
Have you ever worked on performance tuning, if yes what are the steps involved in it?
We need to identify the bottlenecks to tune the performance, to overcome the bottleneck we need to following the following.
Avoid named queries
Unnecessary relationships between tables
Proper attribute relationships to be given
Proper aggregation design
Proper partitioning of data
Proper dimension usage design
Avoid unnecessary many to many relationships
Avoid unnecessary measures
Set AttributeHierarchyEnabled = FALSE to Attributes that is not required
Won’t take even single measure which is not necessary.
What are the difficulties faced in cube development?
This question is either to test whether you are really experienced or when he doesnot have any questions to ask ..
You can tell any area where you feel difficult to work. But always the best answers will be the following.
Giving attribute relationships
Calculations
Giving dimension usage (many to many relationship)
Analyzing the requirements
Explain the flow of creating a cube?
Steps to create a cube in ssas
Create a data source.
Create a datasource view.
Create Dimensions
Create a cube.
Deploy and Process the cube.
What is a datasource or DS?
The data source is the Physical Connection information that analysis service uses to connect to the database that host the data. The data source contains the connection string which specifies the server and the database hosting the data as well as any necessary authentication credentials.
What is datasourceview or DSV?
A data source view is a persistent set of tables from a data source that supply the data for a particular cube. BIDS also includes a wizard for creating data source views, which you can invoke by right-clicking on the Data Source Views folder in Solution Explorer.
Datasource view is the logical view of the data in the data source.
Data source view is the only thing a cube can see.
What is named calculation?
A named calculation is a SQL expression represented as a calculated column. This expression appears and behaves as a column in the table. A named calculation lets you extend the relational schema of existing tables or views in a data source view without modifying the tables or views in the underlying data source.
Named calculation is used to create a new column in the DSV using hard coded values or by using existing columns or even with both.
What is named query?
Named query in DSV is similar to View in Database. This is used to create Virtual table in DSV which will not impact the underlying database. Named query is mainly used to merge the two or more table in the datasource view or to filter columns of a table.
How you provide security to cube?
By defining roles we provide security to cubes. Using roles we can restrict users from accessing restricted data. Procedure as follows –
Define Role
Set Permission
Add appropriate Users to the role
How much time it take to Process the Cube?
This is Very very important question. This again depends on the SIZE of database,Complexity of the database and your server settings. For database with 50 cr transaction records, it generally takes 3.5 hrs.
How many Calculation you done in Your Project?
I answer more than 5000 and if you tell the same then you are caught unless you are super good in MDX. Best answer for you is “Worked on 50 calculations”.
How will you hide an attribute?
We can hide the attribute by selecting “AttributeHierarchyVisible = False” in properties of the attribute.
How will you make an attribute not process?
By selecting “ AttributeHierarchyEnabled = False”, we can make an attribute not in process.
Why we need named queries?
A named query is used to join multiple tables, to remove unnecessary columns from a table of a database. You can achieve the same in database using Views but this Named Queries will be the best bet whe you don’t have access to create Views in database.
How will you add a new column to an existing table in data source view?
By using named calculations we can add a new column to an existing table in the data source view. Named Calculation is explained above.
What is dimension table?
A dimension table contains hierarchical data by which you’d like to summarize. A dimension table contains specific business information, a dimension table that contains the specific name of each member of the dimension. The name of the dimension member is called an “attribute”
The key attribute in the dimension must contain a unique value for each member of the dimension. This key attribute is called “primary key column”
The primary key column of each dimension table corresponding to the one of the key column in any related fact table.
What is fact table?
A fact table contains the basic information that you wish to summarize. The table that stores the detailed value for measure is called fact table. In simple and best we can define as “The table which contains METRICS” that are used to analyse the business.
It consists of 2 sections
1) Foregine key to the dimesion
2) measures/facts(a numerical value that used to monitor business activity)
What is Factless fact table?
This is very important interview question. The “Factless Fact Table” is a table which is similar to Fact Table except for having any measure; I mean that this table just has the links to the dimensions. These tables enable you to track events; indeed they are for recording events.
Factless fact tables are used for tracking a process or collecting stats. They are called so because, the fact table does not have aggregatable numeric values or information. They are mere key values with reference to the dimensions from which the stats can be collected
What is star, snowflake and star flake schema?
Star schema: In star schema fact table will be directly linked with all dimension tables. The star schema’s dimensions are denormalized with each dimension being represented by a single table. In a star schema a central fact table connects a number of individual dimension tables.
Snowflake: The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. In a star schema, each dimension is represented by a single dimensional table, whereas in a snowflake schema, that dimensional table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy. In snow flake schema fact table will be linked directly as well as there will be some intermediate dimension tables between fact and dimension tables.
Star flake: A hybrid structure that contains a mixture of star(denormalized) and snowflake(normalized) schema’s.
What is use of AttributeHierarchyEnabled?
AttributeHierarchyEnabled: Determines whether an attribute hierarchy is generated by Analysis Services for the attribute. If the attribute hierarchy is not enabled, the attribute cannot be used in a user-defined hierarchy and the attribute hierarchy cannot be referenced in Multidimensional Expressions (MDX) statements.
What is use of AttributeHierarchyOptimizedState?
AttributeHierarchyOptimizedState: Determines the level of optimization applied to the attribute hierarchy. By default, an attribute hierarchy is FullyOptimized, which means that Analysis Services builds indexes for the attribute hierarchy to improve query performance. The other option, NotOptimized, means that no indexes are built for the attribute hierarchy. Using NotOptimized is useful if the attribute hierarchy is used for purposes other than querying, because no additional indexes are built for the attribute. Other uses for an attribute hierarchy can be helping to order another attribute.
What is use of AttributeHierarchyOrdered ?
AttributeHierarchyOrdered: Determines whether the associated attribute hierarchy is ordered. The default value is True. However, if an attribute hierarchy will not be used for querying, you can save processing time by changing the value of this property to False.
What is the use of AttributeHierarchyVisible ?
AttributeHierarchyVisible : Determines whether the attribute hierarchy is visible to client applications. The default value is True. However, if an attribute hierarchy will not be used for querying, you can save processing time by changing the value of this property to False.
What are types of storage modes?
There are three standard storage modes in OLAP applications
MOLAP
ROLAP
HOLAP
What is MOLAP and its advantage?
MOLAP (Multi dimensional Online Analytical Processing) : MOLAP is the most used storage type. Its designed to offer maximum query performance to the users. the data and aggregations are stored in a multidimensional format, compressed and optimized for performance. This is both good and bad. When a cube with MOLAP storage is processed, the data is pulled from the relational database, the aggregations are performed, and the data is stored in the AS database. The data inside the cube will refresh only when the cube is processed, so latency is high.
Advantages:
Since the data is stored on the OLAP server in optimized format, queries (even complex calculations) are faster than ROLAP.
The data is compressed so it takes up less space.
And because the data is stored on the OLAP server, you don’t need to keep the connection to the relational database.
Cube browsing is fastest using MOLAP.
What is ROLAP and its advantage?
ROLAP (Relational Online Analytical Processing) : ROLAP does not have the high latency disadvantage of MOLAP. With ROLAP, the data and aggregations are stored in relational format. This means that there will be zero latency between the relational source database and the cube.
Disadvantage of this mode is the performance, this type gives the poorest query performance because no objects benefit from multi dimensional storage.
Advantages:
Since the data is kept in the relational database instead of on the OLAP server, you can view the data in almost real time.
Also, since the data is kept in the relational database, it allows for much larger amounts of data, which can mean better scalability.
Low latency.
What is HOLAP and its advantage?
Hybrid Online Analytical Processing (HOLAP): HOLAP is a combination of MOLAP and ROLAP. HOLAP stores the detail data in the relational database but stores the aggregations in multidimensional format. Because of this, the aggregations will need to be processed when changes are occur. With HOLAP you kind of have medium query performance: not as slow as ROLAP, but not as fast as MOLAP. If, however, you were only querying aggregated data or using a cached query, query performance would be similar to MOLAP. But when you need to get that detail data, performance is closer to ROLAP.
Advantages:
HOLAP is best used when large amounts of aggregations are queried often with little detail data, offering high performance and lower storage requirements.
Cubes are smaller than MOLAP since the detail data is kept in the relational database.
Processing time is less than MOLAP since only aggregations are stored in multidimensional format.
Low latency since processing takes place when changes occur and detail data is kept in the relational database.
What are Translations and its use?
Translation: The translation feature in analysis service allows you to display caption and attributes names that correspond to a specific language. It helps in providing GLOBALIZATION to the Cube.
What is Database dimension?
All the dimensions that are created using NEW DIMENSION Wizard are database dimensions. In other words, the dimensions which are at Database level are called Database Dimensions.
What is Cube dimension?
A cube dimension is an instance of a database dimension within a cube is called as cube dimension. A database dimension can be used in multiple cubes, and multiple cube dimensions can be based on a single database dimension
Difference between Database dimension and Cube dimension?
The Database dimension has only Name and ID properties, whereas a Cube dimension has several more properties.
Database dimension is created one where as Cube dimension is referenced from database dimension.
Database dimension exists only once.where as Cube dimensions can be created more than one using ROLE PLAYING Dimensions concept.
I really appreciate the kind of topics you post here. Thanks for sharing us a great information that is actually helpful. Good day
ReplyDeleteDevOps Training in Chennai
DevOps Online Training in Chennai
DevOps Training in Bangalore
DevOps Training in Hyderabad
DevOps Training in Coimbatore
DevOps Training
DevOps Online Training
Good post and informative. Thank you very much for sharing this good article, it was so good to read and useful to improve my knowledge as updated, keep blogging.
ReplyDeleteThanks for one marvelous posting! I enjoyed reading it; you are a great author.
oracle training in chennai
oracle training institute in chennai
oracle training in bangalore
oracle training in hyderabad
oracle training
oracle online training
hadoop training in chennai
hadoop training in bangalore
Post a Comment