Course Code: MCS 226

Assignment Code: MCS 226 ASST/TMA/2024-25

Marks: 100

NOTE: All questions are compulsory

Last date of submission for July 2023 session is 31st October 2023 and for January 2024 session is 30th April 2024.

Q.1 Describe data science. What uses does it have? In the context of data analysis, define the terms descriptive, exploratory, and predictive.

Data science is a multidisciplinary field that extracts insights and knowledge from data using techniques such as statistics, machine learning, data mining, and visualization. It covers the full process of data collection, cleaning, analysis, interpretation, and communication of findings to inform decision-making. It is used in fields such as finance, healthcare, marketing, and manufacturing to support those decisions.

Descriptive analysis involves summarizing and presenting data in a meaningful and understandable way. It focuses on describing the main features of a dataset, such as central tendencies (mean, median, mode), dispersion (range, standard deviation), distribution (histograms, frequency tables), and other summary statistics. Descriptive analysis provides insights into the basic characteristics of the data, allowing researchers to understand its structure and patterns.
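As a minimal sketch of the summary statistics named above, the following Python snippet computes the mean, median, mode, range, and sample standard deviation using only the standard library. The marks list and the helper name `describe` are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical marks (out of 50) for ten students; illustrative data only.
marks = [32, 45, 28, 45, 39, 41, 22, 36, 45, 30]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = describe(marks)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The central-tendency figures describe a typical mark, while the range and standard deviation describe how widely the marks are spread around it.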

Exploratory analysis involves investigating the data to discover relationships, trends, and patterns within it. Unlike descriptive analysis, which summarizes the data as it stands, exploratory analysis aims to uncover hidden insights and generate hypotheses. Techniques such as data visualization, clustering, and dimensionality reduction are commonly used to gain a deeper understanding of the data and to identify areas that merit further investigation.
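One simple exploratory step is to check whether two variables move together. The sketch below computes the Pearson correlation coefficient by hand; the study-hours and marks data, and the function name `pearson`, are assumptions made for illustration and do not come from the assignment.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: weekly study hours and marks out of 50.
hours = [2, 4, 5, 7, 8, 10]
marks = [18, 25, 27, 35, 38, 46]

r = pearson(hours, marks)
print(f"correlation between hours and marks: {r:.3f}")
```

A value close to +1 or -1 suggests a strong linear relationship worth investigating further; a value near 0 suggests none. Correlation alone does not establish causation, so the result is a hypothesis to test rather than a conclusion.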

Predictive analysis involves using historical data to make predictions about future outcomes or trends. It employs statistical and machine learning algorithms to build models that can forecast future events based on past observations. Predictive analysis is used in various fields such as finance, healthcare, marketing, and manufacturing to anticipate customer behavior, optimize business processes, detect anomalies, and mitigate risks. Common techniques used in predictive analysis include regression analysis, classification algorithms, time series forecasting, and ensemble methods.
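Regression, the first technique listed above, can be sketched as an ordinary least-squares line fitted to past observations and then used to forecast the next value. The monthly sales figures and the names `fit_line` and `predict` are hypothetical, and a real forecast would normally use a tested library and be validated on held-out data.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: sales (in units) for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 22, 24]

model = fit_line(months, sales)
forecast = predict(model, 6)  # forecast for the next, unseen month
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, month 6 forecast={forecast:.2f}")
```

The model is built only from historical data and then applied to a future input, which is the pattern that classification, time series forecasting, and ensemble methods also follow.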

In summary, data science encompasses a range of activities aimed at extracting knowledge and insights from data. Descriptive analysis focuses on summarizing data, exploratory analysis aims to discover patterns and relationships, and predictive analysis involves making predictions about future outcomes. These techniques play a crucial role in helping organizations make informed decisions and drive innovation.

Q.2 A class has 25 students. Create a data set of the students' marks in Mathematics out of a maximum of 50 marks. Discuss which chart is best suited for visualization and interpretation of this data, and draw it. Justify your answer.

Q.3 What is the purpose of using Apache Spark, Hive and HBase? Explain with supporting examples.

Q.4 Create sample data of the marks of 20 students in five different subjects using MS Excel. Discuss the different chart and graphing library packages supported by the R programming language. Write programs in R to create four different plots using this data.

Q.5 What is PageRank? Discuss the basic principle of the flow model in PageRank. Explain the different mechanisms for computing PageRank.

Q.6 Discuss the different data structures in R. Write programs in R for the following tasks: (i) Computation of income tax for a vector of size 10, consisting of the total annual income of 10 different persons. The tax should be 10% if the annual income is below 5 lakhs and 20% if it is above 5 lakhs. (ii) Matrix addition, subtraction and multiplication. (iii) Finding the inverse of a matrix.

Q.7 Discuss the need for statistical hypothesis testing with the help of an example. Explain the types of errors in hypothesis testing.

Q.8 Discuss classification, clustering and association rules with different examples. Explain where the Random Forest algorithm can be used. Use the R programming language to discuss the Random Forest algorithm.

Q.9 What is a NoSQL database? Discuss how a column database and a document database work. List and briefly discuss examples of graph databases.

Q.10 Explain the process and issues of each of the following: advertising on the web, recommendation systems, and mining of social networks.

