Assignment Questions

Section A: (2 marks each)

  1. Explain Business Intelligence Architecture.

    Business Intelligence (BI) architecture refers to the overall framework and structure that is used to design, develop, and implement business intelligence systems. The architecture outlines the technical components and tools required to process and analyze data in order to generate insights that can inform business decisions.

    There are typically five key layers in a BI architecture:

    1. Data Source Layer: This layer involves the collection of data from various sources such as databases, files, web services, and other applications.
    2. ETL (Extract-Transform-Load) & Integration Layer: This layer extracts data from the data source layer, consolidates datasets from multiple sources into a unified view, transforms the data into a standardized format, and loads it into the data warehouse layer.
    3. Data Warehouse Layer: This layer stores the integrated and standardized data in a centralized repository for analysis. It provides a comprehensive view of the organization's data and supports complex queries and analysis.
    4. Metadata Layer: This layer provides information about the data in the data warehouse, such as its structure, relationships, and meanings. It helps users to understand the data and enables them to search, retrieve, and analyze it.
    5. End User Layer: This layer combines data cleaning, data analysis, and data visualization. It delivers the results of analysis to end users in formats such as dashboards, reports, ad hoc queries, and data visualization tools, so that they can interpret and understand the data easily.
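
    The flow through these layers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sources (CSV_SOURCE, API_SOURCE), the sales table, and all field names are hypothetical, an in-memory SQLite database stands in for the data warehouse, and the metadata layer is left out.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from a sales application.
CSV_SOURCE = """order_id,region,amount
1,north,100.50
2,South,200.00
"""

# Hypothetical source 2: records returned by a web service.
API_SOURCE = [
    {"id": 3, "area": "NORTH", "total": "50.25"},
]


def extract():
    """Data source layer: read raw rows from both sources."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    return csv_rows, API_SOURCE


def transform(csv_rows, api_rows):
    """ETL and integration layer: map both sources onto one standard format."""
    unified = []
    for row in csv_rows:
        unified.append((int(row["order_id"]), row["region"].lower(), float(row["amount"])))
    for row in api_rows:
        unified.append((int(row["id"]), row["area"].lower(), float(row["total"])))
    return unified


def load(rows):
    """Data warehouse layer: store the unified rows in one central table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def report(conn):
    """End user layer: total sales per region, as a report would show it."""
    cur = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
    return cur.fetchall()


warehouse = load(transform(*extract()))
print(report(warehouse))
```

    Region names arrive in different spellings ("South", "NORTH") and under different field names, so the transform step is what makes the per-region totals in the report meaningful.
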
  2. Compare

    (i) Data mart and data warehouse: Data mart and data warehouse are two types of data storage solutions used in business intelligence. Data marts are smaller subsets of data warehouses that are designed to support a specific business function or department. Data warehouses, on the other hand, are larger storage solutions that store data from multiple sources across the organization. Data warehouses are designed to support enterprise-wide reporting and analysis.

    (ii) Star schema and Snowflake schema: Star schema and snowflake schema are two common data modeling techniques used in data warehousing. Star schema is a simple, denormalized design where each dimension is represented by a single table, and the fact table contains the measures. Snowflake schema is a normalized design where each dimension table is broken down into smaller tables. Star schema is easier to understand and query, while snowflake schema offers better data integrity.
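
    The trade-off can be seen in a small sketch using an in-memory SQLite database. All table and column names here (fact_sales, dim_product, dim_product_sf, dim_category) are hypothetical and chosen only for illustration. The same question, total sales per category, needs one join in the star layout and two in the snowflake layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: one denormalized table per dimension.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT,
    category   TEXT            -- category stored inline, repeated per product
);
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL            -- the measure
);

-- Snowflake variant of the same dimension: category moved to its own table.
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY,
    category    TEXT
);
CREATE TABLE dim_product_sf (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);

INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_category VALUES (10, 'Stationery'), (20, 'Home');
INSERT INTO dim_product_sf VALUES (1, 'Pen', 10), (2, 'Lamp', 20);
INSERT INTO fact_sales VALUES (1, 1, 2.0), (2, 1, 3.0), (3, 2, 40.0);
""")

# Star schema: the fact table joins directly to the dimension.
STAR_QUERY = """
SELECT p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
GROUP BY p.category ORDER BY p.category
"""

# Snowflake schema: an extra join is needed to reach the category name.
SNOWFLAKE_QUERY = """
SELECT c.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_product_sf p ON p.product_id = f.product_id
JOIN dim_category c ON c.category_id = p.category_id
GROUP BY c.category ORDER BY c.category
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(star_result)
print(snowflake_result)
```

    Both queries return the same totals. In the snowflake layout each category name is stored once, which is where the integrity benefit comes from, at the cost of the additional join.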

    [Figure: star schema]

    [Figure: snowflake schema]

    (iii) OLAP and OLTP: OLAP (Online Analytical Processing) and OLTP (Online Transaction Processing) are two types of data processing used in business intelligence. OLTP is used for transactional processing, where data is updated in real-time, while OLAP is used for analytical processing, where data is analyzed and reported on for decision-making purposes.
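
    As a rough illustration of the two workloads, the sketch below runs both against the same in-memory SQLite table. The accounts table and the transfer and balance_by_region helpers are hypothetical. In practice OLAP queries usually run against a separate warehouse rather than the transactional database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, region TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "east", 100.0), (2, "east", 50.0), (3, "west", 75.0)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """OLTP-style work: a short transaction that updates a few rows."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def balance_by_region(conn):
    """OLAP-style work: a read-only aggregate over many rows."""
    return conn.execute(
        "SELECT region, SUM(balance) FROM accounts GROUP BY region ORDER BY region"
    ).fetchall()


transfer(conn, 1, 3, 25.0)
print(balance_by_region(conn))
```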

    [Figure: OLAP]

    [Figure: OLTP]

    (iv) Top-down and Bottom-up design approach: The top-down approach designs the system as a whole first, starting with the highest-level components and working down to the lower-level ones. The bottom-up approach starts with the lower-level components and builds them up into a larger system. Top-down design is best suited to large and complex systems, where a clear understanding of the overall architecture is necessary before individual components are developed, but it may take longer for the same reason. Bottom-up design can be faster because individual components can be developed in parallel. It is also more flexible, since those components can be reused in other systems, whereas top-down design may be less flexible because it focuses on the overall architecture of the system.

  3. Explain the functions of Data Transformation in Data Analytics.

    Data Transformation is a critical component of Data Analytics. It involves the conversion of raw data into a structured format suitable for analysis.
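
    A minimal sketch of such a conversion is shown below. The raw records, the field names, and the two accepted date formats are assumptions made for the example. It trims and standardizes names, normalizes dates to ISO format, converts amounts to numbers, marks unusable values as None, and drops exact duplicates.

```python
from datetime import datetime

# Hypothetical raw records with inconsistent formatting.
RAW = [
    {"name": "  Alice ", "joined": "2023-01-15", "spend": "1,200.50"},
    {"name": "BOB", "joined": "15/02/2023", "spend": "300"},
    {"name": "  Alice ", "joined": "2023-01-15", "spend": "1,200.50"},  # duplicate
    {"name": "Carol", "joined": "unknown", "spend": ""},                # bad values
]

# Assumed input formats: ISO dates and day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def transform(records):
    """Clean, standardize, and de-duplicate raw records."""
    seen, clean = set(), []
    for rec in records:
        name = rec["name"].strip().title()
        joined = parse_date(rec["joined"])
        spend_text = rec["spend"].replace(",", "")
        spend = float(spend_text) if spend_text else None
        row = (name, joined, spend)
        if row not in seen:  # keep only the first copy of each record
            seen.add(row)
            clean.append({"name": name, "joined": joined, "spend": spend})
    return clean


for row in transform(RAW):
    print(row)
```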

Section B: (3 marks)

  1. What are the benefits and challenges of Data Transformation?

    Data transformation is an essential part of the data preparation process in data analytics. It involves cleaning, formatting, and restructuring raw data to make it more suitable for analysis. The benefits of data transformation are as follows:

    However, there are also some challenges associated with data transformation, including:

  2. Explain OLAP in Data Analytics.

    OLAP stands for Online Analytical Processing, a technology used to analyze data along multiple dimensions, such as time, geography, and product. It allows users to perform complex analyses of large data sets quickly and easily. The key features of OLAP are:

    OLAP is commonly used in business intelligence and data warehousing to analyze large amounts of data from different sources.
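
    Multidimensional analysis of this kind is usually described in terms of operations such as roll-up, slice, and dice. The sketch below applies them to a tiny in-memory "cube" with the dimensions year, region, and product. The fact rows and function names are hypothetical, and a real OLAP engine would precompute and index these aggregates rather than scan a list.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales).
FACTS = [
    (2023, "north", "pen", 10),
    (2023, "north", "lamp", 20),
    (2023, "south", "pen", 5),
    (2024, "north", "pen", 7),
    (2024, "south", "lamp", 30),
]
DIMENSIONS = ("year", "region", "product")


def roll_up(facts, keep):
    """Aggregate sales over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


def slice_(facts, dimension, value):
    """Fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]


def dice(facts, **allowed):
    """Keep rows whose values fall in the allowed set for each named dimension."""
    return [
        row for row in facts
        if all(row[DIMENSIONS.index(d)] in vals for d, vals in allowed.items())
    ]


print(roll_up(FACTS, ("year",)))                                # sales per year
print(roll_up(slice_(FACTS, "region", "north"), ("product",)))  # north only, per product
print(dice(FACTS, year={2023}, product={"pen"}))                # 2023 pen sales, all regions
```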

  3. Explain the issues in critical planning of Data Warehouse projects in Data Analytics.

    Data warehouse projects are complex and require careful planning to ensure their success. There are several issues that need to be considered in the critical planning phase, including:

    Addressing these issues in the critical planning phase can help ensure the success of the data warehouse project.

Section C: (7 marks)

  1. Explain Cumulative Distribution Function and its properties.

    The Cumulative Distribution Function (CDF) of a random variable X gives, for each input value x, the probability of observing a value less than or equal to x, that is, F(x) = P(X <= x). CDF has several properties that are important to understand, which include:
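
    The standard properties of any CDF can be checked numerically: its values lie between 0 and 1, it never decreases, it tends to 0 on the far left and to 1 on the far right, and P(a < X <= b) = F(b) - F(a). The sketch below does this for a standard normal variable, chosen here only as a convenient example, using the closed form based on math.erf.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = P(X <= x) for a normal random variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_probability(a, b):
    """P(a < X <= b) = F(b) - F(a) for the standard normal."""
    return normal_cdf(b) - normal_cdf(a)


xs = [-8.0 + 0.5 * i for i in range(33)]  # grid from -8 to 8
values = [normal_cdf(x) for x in xs]

in_unit_interval = all(0.0 <= v <= 1.0 for v in values)
non_decreasing = all(a <= b for a, b in zip(values, values[1:]))

print(in_unit_interval, non_decreasing)
print(values[0], values[-1])             # close to 0 and close to 1
print(interval_probability(-1.0, 1.0))   # about 0.68 for one standard deviation
```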

  2. Explain Bayesian logistic regression method and its importance.

    Bayesian logistic regression is a statistical method used to model the relationship between a binary response variable and one or more predictor variables. It places prior distributions on the regression coefficients and uses Bayes' theorem to update them into a posterior distribution given the data, from which the probability of the response variable is estimated based on the predictor variables. Bayesian logistic regression has several advantages over traditional logistic regression, including:

    Bayesian logistic regression is important in data analytics because it allows for the modeling of complex relationships between variables, and it provides a way to quantify the uncertainty associated with the estimated parameters.
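
    To make the idea of quantifying uncertainty concrete, here is a minimal sketch with a single predictor and no intercept, so the posterior over the one coefficient w can be computed on a grid. The toy data, the Normal(0, 2^2) prior, and the grid range are all assumptions chosen for illustration. Models with more coefficients typically need sampling or approximation methods instead of a grid.

```python
import math

# Hypothetical toy data: one predictor x and a binary response y.
X = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
Y = [0, 0, 1, 0, 1, 1]

PRIOR_SD = 2.0                                  # assumed prior: w ~ Normal(0, 2^2)
GRID = [-6.0 + 0.01 * i for i in range(1201)]   # candidate slopes w in [-6, 6]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(w):
    """log p(y | x, w) for the model P(y = 1 | x) = sigmoid(w * x)."""
    total = 0.0
    for x, y in zip(X, Y):
        p = sigmoid(w * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def log_prior(w):
    """Log density of the Normal(0, PRIOR_SD^2) prior, up to a constant."""
    return -0.5 * (w / PRIOR_SD) ** 2


# Unnormalized posterior on the grid, then normalize so the weights sum to 1.
log_post = [log_likelihood(w) + log_prior(w) for w in GRID]
peak = max(log_post)
weights = [math.exp(lp - peak) for lp in log_post]
norm = sum(weights)
posterior = [wt / norm for wt in weights]

posterior_mean = sum(w * p for w, p in zip(GRID, posterior))


def credible_interval(level=0.95):
    """Equal-tailed interval for w read off the cumulative posterior."""
    lo_tail = (1.0 - level) / 2.0
    hi_tail = 1.0 - lo_tail
    cumulative, lower, upper = 0.0, GRID[0], GRID[-1]
    for w, p in zip(GRID, posterior):
        previous = cumulative
        cumulative += p
        if previous < lo_tail <= cumulative:
            lower = w
        if previous < hi_tail <= cumulative:
            upper = w
    return lower, upper


def predict(x_new):
    """Posterior predictive P(y = 1 | x_new), averaged over the grid."""
    return sum(sigmoid(w * x_new) * p for w, p in zip(GRID, posterior))


print(posterior_mean, credible_interval())
print(predict(1.0))
```

    Instead of a single fitted coefficient, the result is a full distribution over w, and the width of the credible interval shows how little six observations pin the coefficient down.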

  3. Explain likelihood, the prior distributions, and posterior distributions.

    In Bayesian statistics, likelihood is a function that describes the probability of observing the data given the model parameters. Prior distributions represent the prior beliefs about the model parameters before observing the data, and posterior distributions represent the updated beliefs about the model parameters after observing the data. The relationship between the likelihood, prior distributions, and posterior distributions can be expressed using Bayes' theorem:

    posterior distribution = likelihood * prior distribution / evidence

    where the evidence is the normalizing constant that ensures the posterior distribution integrates to 1.

    The likelihood function is important in Bayesian statistics because it represents the information contained in the data about the model parameters. The prior distributions are important because they allow for the incorporation of prior knowledge or beliefs about the model parameters. The posterior distributions are important because they represent the updated beliefs about the model parameters after observing the data, and they are used for inference and prediction.
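
    The formula can be worked through numerically. The sketch below assumes a simple setting that is not part of the text above: the parameter theta is a success probability, the data are 7 successes in 10 trials, and the prior is Beta(2, 2). The posterior is computed on a grid exactly as in the formula, with the evidence approximated by a sum over the grid.

```python
import math

# Hypothetical data: 7 successes in 10 Bernoulli trials.
SUCCESSES, TRIALS = 7, 10

# Candidate values of theta = P(success): midpoints of 1000 bins in (0, 1).
THETAS = [(i + 0.5) / 1000 for i in range(1000)]


def prior(theta):
    """Assumed prior: Beta(2, 2) density, 6 * theta * (1 - theta)."""
    return 6.0 * theta * (1.0 - theta)


def likelihood(theta):
    """Binomial probability of the observed data given theta."""
    return (
        math.comb(TRIALS, SUCCESSES)
        * theta**SUCCESSES
        * (1.0 - theta) ** (TRIALS - SUCCESSES)
    )


width = 1.0 / len(THETAS)
unnormalized = [likelihood(t) * prior(t) for t in THETAS]
evidence = sum(unnormalized) * width               # approximates the integral over theta
posterior = [u / evidence for u in unnormalized]   # posterior density on the grid

prior_mean = sum(t * prior(t) for t in THETAS) * width
posterior_mean = sum(t * p for t, p in zip(THETAS, posterior)) * width
print(prior_mean, posterior_mean)
```

    With this prior the exact posterior is known to be Beta(9, 5), whose mean is 9/14, so the grid result can be checked against it. The prior mean of 0.5 is pulled toward the observed success rate of 0.7 but does not reach it, which is the prior and the data being combined.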

  4. Explain Bayesian Machine Learning.

  5. Explain Classification with Bayesian Logistic Regression.

  6. Explain Data warehousing components and Implementation options.

  7. Explain Fact constellation and Slowly Changing Dimensions.

  8. Explain physical design process, deployment, and ongoing maintenance in DW.

Unit 1

List of Important Topics: