data analytics assignment example

DATA 275 Introduction to Data Analytics

Getting Started with SPSS
Variable View
Option Suggestions
SPSS Viewer
Entering Data
Cleaning & Checking Your SPSS Database
Recoding Data: Collapsing Continuous Data
Constructing Scales and Checking Their Reliability
Formatting Tables in APA style
Creating a syntax
Public Data Sources

Data Analytics Project Assignment

Literature Review This link opens in a new window

For your research project you will conduct data analysis and right a report summarizing your analysis and the findings from your analysis. You will accomplish this by completing a series of assignments.

Data 275 Research Project Assignment

In this week’s assignment, you are required accomplish the following tasks:

1. Propose a topic for you project

The topic you select for your capstone depends on your interest and the data problem you want to address. Try to pick a topic that you would enjoy researching and writing about.

Your topic selection will also be influenced by data availability. Because, this is a data analytics project, you will need to have access to data. If you have access to your organization’s data, you are free to use it. If you choose to do so, all information presented must be in secure form because Davenport University does not assume any responsibility for the security of corporate data. Otherwise, you can select a topic that is amenable to publicly available data.

Click the link for some useful suggestions: Project Proposal Suggestions

2. Find a data set of your interest and download it

There are many publicly available data sets that you can use for your project. The library has compiled a list of many possible sources of data. Click on the link below to explore these sources.

Public Data Sources

The data set you select must have:

At least 50 observations (50 rows) and at least 4 variables (columns) excluding identification variables At least one dependent variable

You must provide:

A proper citation of the data source using APA style format A discussion on how the data was collected and by whom The number of variables in the data set The number of observations/subjects in the data set A description of each variable together with an explanation of how it is measured (e.g. the unit of measurement).

Deliverable

A minimum of one page description of your data analytics project which must include the following:

A title for your project A brief description of the project Major stakeholders who would use the information that would be generated from your analysis and how they would use/benefit from that information A description of the dataset you will use for your project

<< Previous: Public Data Sources
Next: Literature Review >>
Last Updated: Mar 15, 2024 10:33 AM
URL: https://davenport.libguides.com/data275

6.894 : Interactive Data Visualization

Assignment 2: exploratory data analysis.

In this assignment, you will identify a dataset of interest and perform an exploratory analysis to better understand the shape & structure of the data, investigate initial questions, and develop preliminary insights & hypotheses. Your final submission will take the form of a report consisting of captioned visualizations that convey key insights gained during your analysis.

Step 1: Data Selection

First, you will pick a topic area of interest to you and find a dataset that can provide insights into that topic. To streamline the assignment, we've pre-selected a number of datasets for you to choose from.

However, if you would like to investigate a different topic and dataset, you are free to do so. If working with a self-selected dataset, please check with the course staff to ensure it is appropriate for the course. Be advised that data collection and preparation (also known as data wrangling ) can be a very tedious and time-consuming process. Be sure you have sufficient time to conduct exploratory analysis, after preparing the data.

After selecting a topic and dataset – but prior to analysis – you should write down an initial set of at least three questions you'd like to investigate.

Part 2: Exploratory Visual Analysis

Next, you will perform an exploratory analysis of your dataset using a visualization tool such as Tableau. You should consider two different phases of exploration.

In the first phase, you should seek to gain an overview of the shape & stucture of your dataset. What variables does the dataset contain? How are they distributed? Are there any notable data quality issues? Are there any surprising relationships among the variables? Be sure to also perform "sanity checks" for patterns you expect to see!

In the second phase, you should investigate your initial questions, as well as any new questions that arise during your exploration. For each question, start by creating a visualization that might provide a useful answer. Then refine the visualization (by adding additional variables, changing sorting or axis scales, filtering or subsetting data, etc. ) to develop better perspectives, explore unexpected observations, or sanity check your assumptions. You should repeat this process for each of your questions, but feel free to revise your questions or branch off to explore new questions if the data warrants.

Final Deliverable

Your final submission should take the form of a Google Docs report – similar to a slide show or comic book – that consists of 10 or more captioned visualizations detailing your most important insights. Your "insights" can include important surprises or issues (such as data quality problems affecting your analysis) as well as responses to your analysis questions. To help you gauge the scope of this assignment, see this example report analyzing data about motion pictures . We've annotated and graded this example to help you calibrate for the breadth and depth of exploration we're looking for.

Each visualization image should be a screenshot exported from a visualization tool, accompanied with a title and descriptive caption (1-4 sentences long) describing the insight(s) learned from that view. Provide sufficient detail for each caption such that anyone could read through your report and understand what you've learned. You are free, but not required, to annotate your images to draw attention to specific features of the data. You may perform highlighting within the visualization tool itself, or draw annotations on the exported image. To easily export images from Tableau, use the Worksheet > Export > Image... menu item.

The end of your report should include a brief summary of main lessons learned.

Recommended Data Sources

To get up and running quickly with this assignment, we recommend exploring one of the following provided datasets:

World Bank Indicators, 1960–2017 . The World Bank has tracked global human developed by indicators such as climate change, economy, education, environment, gender equality, health, and science and technology since 1960. The linked repository contains indicators that have been formatted to facilitate use with Tableau and other data visualization tools. However, you're also welcome to browse and use the original data by indicator or by country . Click on an indicator category or country to download the CSV file.

Chicago Crimes, 2001–present (click Export to download a CSV file). This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system.

Daily Weather in the U.S., 2017 . This dataset contains daily U.S. weather measurements in 2017, provided by the NOAA Daily Global Historical Climatology Network . This data has been transformed: some weather stations with only sparse measurements have been filtered out. See the accompanying weather.txt for descriptions of each column .

Social mobility in the U.S. . Raj Chetty's group at Harvard studies the factors that contribute to (or hinder) upward mobility in the United States (i.e., will our children earn more than we will). Their work has been extensively featured in The New York Times. This page lists data from all of their papers, broken down by geographic level or by topic. We recommend downloading data in the CSV/Excel format, and encourage you to consider joining multiple datasets from the same paper (under the same heading on the page) for a sufficiently rich exploratory process.

The Yelp Open Dataset provides information about businesses, user reviews, and more from Yelp's database. The data is split into separate files ( business , checkin , photos , review , tip , and user ), and is available in either JSON or SQL format. You might use this to investigate the distributions of scores on Yelp, look at how many reviews users typically leave, or look for regional trends about restaurants. Note that this is a large, structured dataset and you don't need to look at all of the data to answer interesting questions. In order to download the data you will need to enter your email and agree to Yelp's Dataset License .

Additional Data Sources

If you want to investigate datasets other than those recommended above, here are some possible sources to consider. You are also free to use data from a source different from those included here. If you have any questions on whether your dataset is appropriate, please ask the course staff ASAP!

data.boston.gov - City of Boston Open Data
MassData - State of Masachussets Open Data
data.gov - U.S. Government Open Datasets
U.S. Census Bureau - Census Datasets
IPUMS.org - Integrated Census & Survey Data from around the World
Federal Elections Commission - Campaign Finance & Expenditures
Federal Aviation Administration - FAA Data & Research
fivethirtyeight.com - Data and Code behind the Stories and Interactives
Buzzfeed News
Socrata Open Data
17 places to find datasets for data science projects

Visualization Tools

You are free to use one or more visualization tools in this assignment. However, in the interest of time and for a friendlier learning curve, we strongly encourage you to use Tableau . Tableau provides a graphical interface focused on the task of visual data exploration. You will (with rare exceptions) be able to complete an initial data exploration more quickly and comprehensively than with a programming-based tool.

Tableau - Desktop visual analysis software . Available for both Windows and MacOS; register for a free student license.
Data Transforms in Vega-Lite . A tutorial on the various built-in data transformation operators available in Vega-Lite.
Data Voyager , a research prototype from the UW Interactive Data Lab, combines a Tableau-style interface with visualization recommendations. Use at your own risk!
R , using the ggplot2 library or with R's built-in plotting functions.
Jupyter Notebooks (Python) , using libraries such as Altair or Matplotlib .

Data Wrangling Tools

The data you choose may require reformatting, transformation or cleaning prior to visualization. Here are tools you can use for data preparation. We recommend first trying to import and process your data in the same tool you intend to use for visualization. If that fails, pick the most appropriate option among the tools below. Contact the course staff if you are unsure what might be the best option for your data!

Graphical Tools

Tableau Prep - Tableau provides basic facilities for data import, transformation & blending. Tableau prep is a more sophisticated data preparation tool
Trifacta Wrangler - Interactive tool for data transformation & visual profiling.
OpenRefine - A free, open source tool for working with messy data.

Programming Tools

JavaScript data utilities and/or the Datalib JS library .
Pandas - Data table and manipulation utilites for Python.
dplyr - A library for data manipulation in R.
Or, the programming language and tools of your choice...

The assignment score is out of a maximum of 10 points. Submissions that squarely meet the requirements will receive a score of 8. We will determine scores by judging the breadth and depth of your analysis, whether visualizations meet the expressivenes and effectiveness principles, and how well-written and synthesized your insights are.

We will use the following rubric to grade your assignment. Note, rubric cells may not map exactly to specific point scores.

Submission Details

This is an individual assignment. You may not work in groups.

Your completed exploratory analysis report is due by noon on Wednesday 2/19 . Submit a link to your Google Doc report using this submission form . Please double check your link to ensure it is viewable by others (e.g., try it in an incognito window).

Resubmissions. Resubmissions will be regraded by teaching staff, and you may earn back up to 50% of the points lost in the original submission. To resubmit this assignment, please use this form and follow the same submission process described above. Include a short 1 paragraph description summarizing the changes from the initial submission. Resubmissions without this summary will not be regraded. Resubmissions will be due by 11:59pm on Saturday, 3/14. Slack days may not be applied to extend the resubmission deadline. The teaching staff will only begin to regrade assignments once the Final Project phase begins, so please be patient.

Due: 12pm, Wed 2/19
Recommended Datasets
Example Report
Visualization & Data Wrangling Tools
Submission form

CSE 163, Summer 2020: Homework 3: Data Analysis

In this assignment, you will apply what you've learned so far in a more extensive "real-world" dataset using more powerful features of the Pandas library. As in HW2, this dataset is provided in CSV format. We have cleaned up the data some, but you will need to handle more edge cases common to real-world datasets, including null cells to represent unknown information.

Note that there is no graded testing portion of this assignment. We still recommend writing tests to verify the correctness of the methods that you write in Part 0, but it will be difficult to write tests for Part 1 and 2. We've provided tips in those sections to help you gain confidence about the correctness of your solutions without writing formal test functions!

This assignment is supposed to introduce you to various parts of the data science process involving being able to answer questions about your data, how to visualize your data, and how to use your data to make predictions for new data. To help prepare for your final project, this assignment has been designed to be wide in scope so you can get practice with many different aspects of data analysis. While this assignment might look large because there are many parts, each individual part is relatively small.

Learning Objectives

After this homework, students will be able to:

Work with basic Python data structures.
Handle edge cases appropriately, including addressing missing values/data.
Practice user-friendly error-handling.
Read plotting library documentation and use example plotting code to figure out how to create more complex Seaborn plots.
Train a machine learning model and use it to make a prediction about the future using the scikit-learn library.

Expectations

Here are some baseline expectations we expect you to meet:

Follow the course collaboration policies

If you are developing on Ed, all the files are there. The files included are:

hw3-nces-ed-attainment.csv : A CSV file that contains data from the National Center for Education Statistics. This is described in more detail below.
hw3.py : The file for you to put solutions to Part 0, Part 1, and Part 2. You are required to add a main method that parses the provided dataset and calls all of the functions you are to write for this homework.
hw3-written.txt : The file for you to put your answers to the questions in Part 3.
cse163_utils.py : Provides utility functions for this assignment. You probably don't need to use anything inside this file except importing it if you have a Mac (see comment in hw3.py )

If you are developing locally, you should navigate to Ed and in the assignment view open the file explorer (on the left). Once there, you can right-click to select the option to "Download All" to download a zip and open it as the project in Visual Studio Code.

The dataset you will be processing comes from the National Center for Education Statistics. You can find the original dataset here . We have cleaned it a bit to make it easier to process in the context of this assignment. You must use our provided CSV file in this assignment.

The original dataset is titled: Percentage of persons 25 to 29 years old with selected levels of educational attainment, by race/ethnicity and sex: Selected years, 1920 through 2018 . The cleaned version you will be working with has columns for Year, Sex, Educational Attainment, and race/ethnicity categories considered in the dataset. Note that not all columns will have data starting at 1920.

Our provided hw3-nces-ed-attainment.csv looks like: (⋮ represents omitted rows):

Column Descriptions

Year: The year this row represents. Note there may be more than one row for the same year to show the percent breakdowns by sex.
Sex: The sex of the students this row pertains to, one of "F" for female, "M" for male, or "A" for all students.
Min degree: The degree this row pertains to. One of "high school", "associate's", "bachelor's", or "master's".
Total: The total percent of students of the specified gender to reach at least the minimum level of educational attainment in this year.
White / Black / Hispanic / Asian / Pacific Islander / American Indian or Alaska Native / Two or more races: The percent of students of this race and the specified gender to reach at least the minimum level of educational attainment in this year.

Interactive Development

When using data science libraries like pandas , seaborn , or scikit-learn it's extremely helpful to actually interact with the tools your using so you can have a better idea about the shape of your data. The preferred practice by people in industry is to use a Jupyter Notebook, like we have been in lecture, to play around with the dataset to help figure out how to answer the questions you want to answer. This is incredibly helpful when you're first learning a tool as you can actually experiment and get real-time feedback if the code you wrote does what you want.

We recommend that you try figuring out how to solve these problems in a Jupyter Notebook so you can actually interact with the data. We have made a Playground Jupyter Notebook for you that has the data uploaded. At the top-right of this page in Ed is a "Fork" button (looks like a fork in the road). This will make your own copy of this Notebook so you can run the code and experiment with anything there! When you open the Workspace, you should see a list of notebooks and CSV files. You can always access this launch page by clicking the Jupyter logo.

Part 0: Statistical Functions with Pandas

In this part of the homework, you will write code to perform various analytical operations on data parsed from a file.

Part 0 Expectations

All functions for this part of the assignment should be written in hw3.py .
For this part of the assignment, you may import and use the math and pandas modules, but you may not use any other imports to solve these problems.
For all of the problems below, you should not use ANY loops or list/dictionary comprehensions. The goal of this part of the assignment is to use pandas as a tool to help answer questions about your dataset.

Problem 0: Parse data

In your main method, parse the data from the CSV file using pandas. Note that the file uses '---' as the entry to represent missing data. You do NOT need to anything fancy like set a datetime index.

The function to read a CSV file in pandas takes a parameter called na_values that takes a str to specify which values are NaN values in the file. It will replace all occurrences of those characters with NaN. You should specify this parameter to make sure the data parses correctly.

Problem 1: compare_bachelors_1980

What were the percentages for women vs. men having earned a Bachelor's Degree in 1980? Call this method compare_bachelors_1980 and return the result as a DataFrame with a row for men and a row for women with the columns "Sex" and "Total".

The index of the DataFrame is shown as the left-most column above.

Problem 2: top_2_2000s

What were the two most commonly awarded levels of educational attainment awarded between 2000-2010 (inclusive)? Use the mean percent over the years to compare the education levels in order to find the two largest. For this computation, you should use the rows for the 'A' sex. Call this method top_2_2000s and return a Series with the top two values (the index should be the degree names and the values should be the percent).

For example, assuming we have parsed hw3-nces-ed-attainment.csv and stored it in a variable called data , then top_2_2000s(data) will return the following Series (shows the index on the left, then the value on the right)

Hint: The Series class also has a method nlargest that behaves similarly to the one for the DataFrame , but does not take a column parameter (as Series objects don't have columns).

Our assert_equals only checks that floating point numbers are within 0.001 of each other, so your floats do not have to match exactly.

Optional: Why 0.001?

Whenever you work with floating point numbers, it is very likely you will run into imprecision of floating point arithmetic . You have probably run into this with your every day calculator! If you take 1, divide by 3, and then multiply by 3 again you could get something like 0.99999999 instead of 1 like you would expect.

This is due to the fact that there is only a finite number of bits to represent floats so we will at some point lose some precision. Below, we show some example Python expressions that give imprecise results.

Because of this, you can never safely check if one float is == to another. Instead, we only check that the numbers match within some small delta that is permissible by the application. We kind of arbitrarily chose 0.001, and if you need really high accuracy you would want to only allow for smaller deviations, but equality is never guaranteed.

Problem 3: percent_change_bachelors_2000s

What is the difference between total percent of bachelor's degrees received in 2000 as compared to 2010? Take a sex parameter so the client can specify 'M', 'F', or 'A' for evaluating. If a call does not specify the sex to evaluate, you should evaluate the percent change for all students (sex = ‘A’). Call this method percent_change_bachelors_2000s and return the difference (the percent in 2010 minus the percent in 2000) as a float.

For example, assuming we have parsed hw3-nces-ed-attainment.csv and stored it in a variable called data , then the call percent_change_bachelors_2000s(data) will return 2.599999999999998 . Our assert_equals only checks that floating point numbers are within 0.001 of each other, so your floats do not have to match exactly.

Hint: For this problem you will need to use the squeeze() function on a Series to get a single value from a Series of length 1.

Part 1: Plotting with Seaborn

Next, you will write functions to generate data visualizations using the Seaborn library. For each of the functions save the generated graph with the specified name. These methods should only take the pandas DataFrame as a parameter. For each problem, only drop rows that have missing data in the columns that are necessary for plotting that problem ( do not drop any additional rows ).

Part 1 Expectations

When submitting on Ed, you DO NOT need to specify the absolute path (e.g. /home/FILE_NAME ) for the output file name. If you specify absolute paths for this assignment your code will not pass the tests!
You will want to pass the parameter value bbox_inches='tight' to the call to savefig to make sure edges of the image look correct!
For this part of the assignment, you may import the math , pandas , seaborn , and matplotlib modules, but you may not use any other imports to solve these problems.
For all of the problems below, you should not use ANY loops or list/dictionary comprehensions.
Do not use any of the other seaborn plotting functions for this assignment besides the ones we showed in the reference box below. For example, even though the documentation for relplot links to another method called scatterplot , you should not call scatterplot . Instead use relplot(..., kind='scatter') like we showed in class. This is not an issue of stylistic preference, but these functions behave slightly differently. If you use these other functions, your output might look different than the expected picture. You don't yet have the tools necessary to use scatterplot correctly! We will see these extra tools later in the quarter.

Part 1 Development Strategy

Print your filtered DataFrame before creating the graph to ensure you’re selecting the correct data.
Call the DataFrame describe() method to see some statistical information about the data you've selected. This can sometimes help you determine what to expect in your generated graph.
Re-read the problem statement to make sure your generated graph is answering the correct question.
Compare the data on your graph to the values in hw3-nces-ed-attainment.csv. For example, for problem 0 you could check that the generated line goes through the point (2005, 28.8) because of this row in the dataset: 2005,A,bachelor's,28.8,34.5,17.6,11.2,62.1,17.0,16.4,28.0

Seaborn Reference

Of all the libraries we will learn this quarter, Seaborn is by far the best documented. We want to give you experience reading real world documentation to learn how to use a library so we will not be providing a specialized cheat-sheet for this assignment. What we will do to make sure you don't have to look through pages and pages of documentation is link you to some key pages you might find helpful for this assignment; you do not have to use every page we link, so part of the challenge here is figuring out which of these pages you need. As a data scientist, a huge part of solving a problem is learning how to skim lots of documentation for a tool that you might be able to leverage to solve your problem.

We recommend to read the documentation in the following order:

Start by skimming the examples to see the possible things the function can do. Don't spend too much time trying to figure out what the code is doing yet, but you can quickly look at it to see how much work is involved.
Then read the top paragraph(s) that give a general overview of what the function does.
Now that you have a better idea of what the function is doing, go look back at the examples and look at the code much more carefully. When you see an example like the one you want to generate, look carefully at the parameters it passes and go check the parameter list near the top for documentation on those parameters.
It sometimes (but not always), helps to skim the other parameters in the list just so you have an idea what this function is capable of doing

As a reminder, you will want to refer to the lecture/section material to see the additional matplotlib calls you might need in order to display/save the plots. You'll also need to call the set function on seaborn to get everything set up initially.

Here are the seaborn functions you might need for this assignment:

Bar/Violin Plot ( catplot )
Plot a Discrete Distribution ( distplot ) or Continuous Distribution ( kdeplot )
Scatter/Line Plot ( relplot )
Linear Regression Plot ( regplot )
Compare Two Variables ( jointplot )
Heatmap ( heatmap )

Make sure you read the bullet point at the top of the page warning you to only use these functions!

Problem 0: Line Chart

Plot the total percentages of all people of bachelor's degree as minimal completion with a line chart over years. To select all people, you should filter to rows where sex is 'A'. Label the x-axis "Year", the y-axis "Percentage", and title the plot "Percentage Earning Bachelor's over Time". Name your method line_plot_bachelors and save your generated graph as line_plot_bachelors.png .

Problem 1: Bar Chart

Plot the total percentages of women, men, and total people with a minimum education of high school degrees in the year 2009. Label the x-axis "Sex", the y-axis "Percentage", and title the plot "Percentage Completed High School by Sex". Name your method bar_chart_high_school and save your generated graph as bar_chart_high_school.png .

Do you think this bar chart is an effective data visualization? Include your reasoning in hw3-written.txt as described in Part 3.

Problem 2: Custom Plot

Plot the results of how the percent of Hispanic individuals with degrees has changed between 1990 and 2010 (inclusive) for high school and bachelor's degrees with a chart of your choice. Make sure you label your axes with descriptive names and give a title to the graph. Name your method plot_hispanic_min_degree and save your visualization as plot_hispanic_min_degree.png .

Include a justification of your choice of data visualization in hw3-written.txt , as described in Part 3.

Part 2: Machine Learning using scikit-learn

Now you will be making a simple machine learning model for the provided education data using scikit-learn . Complete this in a function called fit_and_predict_degrees that takes the data as a parameter and returns the test mean squared error as a float. This may sound like a lot, so we've broken it down into steps for you:

Filter the DataFrame to only include the columns for year, degree type, sex, and total.
Do the following pre-processing: Drop rows that have missing data for just the columns we are using; do not drop any additional rows . Convert string values to their one-hot encoding. Split the columns as needed into input features and labels.
Randomly split the dataset into 80% for training and 20% for testing.
Train a decision tree regressor model to take in year, degree type, and sex to predict the percent of individuals of the specified sex to achieve that degree type in the specified year.
Use your model to predict on the test set. Calculate the accuracy of your predictions using the mean squared error of the test dataset.

You do not need to anything fancy like find the optimal settings for parameters to maximize performance. We just want you to start simple and train a model from scratch! The reference below has all the methods you will need for this section!

scikit-learn Reference

You can find our reference sheet for machine learning with scikit-learn ScikitLearnReference . This reference sheet has information about general scikit-learn calls that are helpful, as well as how to train the tree models we talked about in class. At the top-right of this page in Ed is a "Fork" button (looks like a fork in the road). This will make your own copy of this Notebook so you can run the code and experiment with anything there! When you open the Workspace, you should see a list of notebooks and CSV files. You can always access this launch page by clikcing the Jupyter logo.

Part 2 Development Strategy

Like in Part 1, it can be difficult to write tests for this section. Machine Learning is all about uncertainty, and it's often difficult to write tests to know what is right. This requires diligence and making sure you are very careful with the method calls you make. To help you with this, we've provided some alternative ways to gain confidence in your result:

Print your test y values and your predictions to compare them manually. They won't be exactly the same, but you should notice that they have some correlation. For example, I might be concerned if my test y values were [2, 755, …] and my predicted values were [1022, 5...] because they seem to not correlate at all.
Calculate your mean squared error on your training data as well as your test data. The error should be lower on your training data than on your testing data.

Optional: ML for Time Series

Since this is technically time series data, we should point out that our method for assessing the model's accuracy is slightly wrong (but we will keep it simple for our HW). When working with time series, it is common to use the last rows for your test set rather than random sampling (assuming your data is sorted chronologically). The reason is when working with time series data in machine learning, it's common that our goal is to make a model to help predict the future. By randomly sampling a test set, we are assessing the model on its ability to predict in the past! This is because it might have trained on rows that came after some rows in the test set chronologically. However, this is not a task we particularly care that the model does well at. Instead, by using the last section of the dataset (the most recent in terms of time), we are now assessing its ability to predict into the future from the perspective of its training set.

Even though it's not the best approach to randomly sample here, we ask you to do it anyways. This is because random sampling is the most common method for all other data types.

Part 3: Written Responses

Review the source of the dataset here . For the following reflection questions consider the accuracy of data collected, and how it's used as a public dataset (e.g. presentation of data, publishing in media, etc.). All of your answers should be complete sentences and show thoughtful responses. "No" or "I don't know" or any response like that are not valid responses for any questions. There is not one particularly right answer to these questions, instead, we are looking to see you use your critical thinking and justify your answers!

Do you think the bar chart from part 1b is an effective data visualization? Explain in 1-2 sentences why or why not.
Why did you choose the type of plot that you did in part 1c? Explain in a few sentences why you chose this type of plot.
Datasets can be biased. Bias in data means it might be skewed away from or portray a wrong picture of reality. The data might contain inaccuracies or the methods used to collect the data may have been flawed. Describe a possible bias present in this dataset and why it might have occurred. Your answer should be about 2 or 3 sentences long.

Context : Later in the quarter we will talk about ethics and data science. This question is supposed to be a warm-up to get you thinking about our responsibilities having this power to process data. We are not trying to train to misuse your powers for evil here! Most misuses of data analysis that result in ethical concerns happen unintentionally. As preparation to understand these unintentional consequences, we thought it would be a good exercise to think about a theoretical world where you would willingly try to misuse data.

Congrats! You just got an internship at Evil Corp! Your first task is to come up with an application or analysis that uses this dataset to do something unethical or nefarious. Describe a way that this dataset could be misused in some application or an analysis (potentially using the bias you identified for the last question). Regardless of what nefarious act you choose, evil still has rules: You need to justify why using the data in this is a misuse and why a regular person who is not evil (like you in the real world outside of this problem) would think using the data in this way would be wrong. There are no right answers here of what defines something as unethical, this is why you need to justify your answer! Your response should be 2 to 4 sentences long.

Turn your answers to these question in by writing them in hw3-written.txt and submitting them on Ed

Your submission will be evaluated on the following dimensions:

Your solution correctly implements the described behaviors. You will have access to some tests when you turn in your assignment, but we will withhold other tests to test your solution when grading. All behavior we test is completely described by the problem specification or shown in an example.
No method should modify its input parameters.
Your main method in hw3.py must call every one of the methods you implemented in this assignment. There are no requirements on the format of the output, besides that it should save the files for Part 1 with the proper names specified in Part 1.
We can run your hw3.py without it crashing or causing any errors or warnings.
When we run your code, it should produce no errors or warnings.
All files submitted pass flake8
All program files should be written with good programming style. This means your code should satisfy the requirements within the CSE 163 Code Quality Guide .
Any expectations on this page or the sub-pages for the assignment are met as well as all requirements for each of the problems are met.

Make sure you carefully read the bullets above as they may or may not change from assignment to assignment!

A note on allowed material

A lot of students have been asking questions like "Can I use this method or can I use this language feature in this class?". The general answer to this question is it depends on what you want to use, what the problem is asking you to do and if there are any restrictions that problem places on your solution.

There is no automatic deduction for using some advanced feature or using material that we have not covered in class yet, but if it violates the restrictions of the assignment, it is possible you will lose points. It's not possible for us to list out every possible thing you can't use on the assignment, but we can say for sure that you are safe to use anything we have covered in class so far as long as it meets what the specification asks and you are appropriately using it as we showed in class.

For example, some things that are probably okay to use even though we didn't cover them:

Using the update method on the set class even though I didn't show it in lecture. It was clear we talked about sets and that you are allowed to use them on future assignments and if you found a method on them that does what you need, it's probably fine as long as it isn't violating some explicit restriction on that assignment.
Using something like a ternary operator in Python. This doesn't make a problem any easier, it's just syntax.

For example, some things that are probably not okay to use:

Importing some random library that can solve the problem we ask you to solve in one line.
If the problem says "don't use a loop" to solve it, it would not be appropriate to use some advanced programming concept like recursion to "get around" that restriction.

These are not allowed because they might make the problem trivially easy or violate what the learning objective of the problem is.

You should think about what the spec is asking you to do and as long as you are meeting those requirements, we will award credit. If you are concerned that an advanced feature you want to use falls in that second category above and might cost you points, then you should just not use it! These problems are designed to be solvable with the material we have learned so far so it's entirely not necessary to go look up a bunch of advanced material to solve them.

tl;dr; We will not be answering every question of "Can I use X" or "Will I lose points if I use Y" because the general answer is "You are not forbidden from using anything as long as it meets the spec requirements. If you're unsure if it violates a spec restriction, don't use it and just stick to what we learned before the assignment was released."

This assignment is due by Thursday, July 23 at 23:59 (PDT) .

You should submit your finished hw3.py , and hw3-written.txt on Ed .

You may submit your assignment as many times as you want before the late cutoff (remember submitting after the due date will cost late days). Recall on Ed, you submit by pressing the "Mark" button. You are welcome to develop the assignment on Ed or develop locally and then upload to Ed before marking.

Big Data Analytics

Assignment 1
Assignment 2
Assignment 3
Final Team Project
Topic 3: Spark sparkDemo_py.py

Business Essentials
Leadership & Management
Credential of Leadership, Impact, and Management in Business (CLIMB)
Entrepreneurship & Innovation
Digital Transformation
Finance & Accounting
Business in Society
For Organizations
Support Portal
Media Coverage
Founding Donors
Leadership Team

Harvard Business School →
HBS Online →
Business Insights →

Business Insights

Harvard Business School Online's Business Insights Blog provides the career insights you need to achieve your goals and gain confidence in your business skills.

Career Development
Communication
Decision-Making
Earning Your MBA
Negotiation
News & Events
Productivity
Staff Spotlight
Student Profiles
Work-Life Balance
AI Essentials for Business
Alternative Investments
Business Analytics
Business Strategy
Business and Climate Change
Design Thinking and Innovation
Digital Marketing Strategy
Disruptive Strategy
Economics for Managers
Entrepreneurship Essentials
Financial Accounting
Global Business
Launching Tech Ventures
Leadership Principles
Leadership, Ethics, and Corporate Accountability
Leading with Finance
Management Essentials
Negotiation Mastery
Organizational Leadership
Power and Influence for Positive Impact
Strategy Execution
Sustainable Business Strategy
Sustainable Investing
Winning with Digital Platforms

4 Examples of Business Analytics in Action

15 Jan 2019

Data is a valuable resource in today’s ever-changing marketplace. For business professionals, knowing how to interpret and communicate data is an indispensable skill that can inform sound decision-making.

“The ability to bring data-driven insights into decision-making is extremely powerful—all the more so given all the companies that can’t hire enough people who have these capabilities,” says Harvard Business School Professor Jan Hammond , who teaches the online course Business Analytics . “It’s the way the world is going.”

Before taking a look at how some companies are harnessing the power of data, it’s important to have a baseline understanding of what the term “business analytics” means.

Access your free e-book today.

What Is Business Analytics?

Business analytics is the use of math and statistics to collect, analyze, and interpret data to make better business decisions.

There are four key types of business analytics: descriptive, predictive, diagnostic, and prescriptive. Descriptive analytics is the interpretation of historical data to identify trends and patterns, while predictive analytics centers on taking that information and using it to forecast future outcomes. Diagnostic analytics can be used to identify the root cause of a problem. In the case of prescriptive analytics , testing and other techniques are employed to determine which outcome will yield the best result in a given scenario.

Related : 4 Types of Data Analytics to Improve Decision-Making

Across industries, these data-driven approaches have been employed by professionals to make informed business decisions and attain organizational success.

Check out the video below to learn more about business analytics, and subscribe to our YouTube channel for more explainer content!

Business Analytics vs. Data Science

It’s important to highlight the difference between business analytics and data science . While both processes use big data to solve business problems they’re separate fields.

The main goal of business analytics is to extract meaningful insights from data to guide organizational decisions, while data science is focused on turning raw data into meaningful conclusions through using algorithms and statistical models. Business analysts participate in tasks such as budgeting, forecasting, and product development, while data scientists focus on data wrangling , programming, and statistical modeling.

While they consist of different functions and processes, business analytics and data science are both vital to today’s organizations. Here are four examples of how organizations are using business analytics to their benefit.

Business Analytics | Become a data-driven leader | Learn More

Business Analytics Examples

According to a recent survey by McKinsey , an increasing share of organizations report using analytics to generate growth. Here’s a look at how four companies are aligning with that trend and applying data insights to their decision-making processes.

1. Improving Productivity and Collaboration at Microsoft

At technology giant Microsoft , collaboration is key to a productive, innovative work environment. Following a 2015 move of its engineering group's offices, the company sought to understand how fostering face-to-face interactions among staff could boost employee performance and save money.

Microsoft’s Workplace Analytics team hypothesized that moving the 1,200-person group from five buildings to four could improve collaboration by increasing the number of employees per building and reducing the distance that staff needed to travel for meetings. This assumption was partially based on an earlier study by Microsoft , which found that people are more likely to collaborate when they’re more closely located to one another.

In an article for the Harvard Business Review , the company’s analytics team shared the outcomes they observed as a result of the relocation. Through looking at metadata attached to employee calendars, the team found that the move resulted in a 46 percent decrease in meeting travel time. This translated into a combined 100 hours saved per week across all relocated staff members and an estimated savings of $520,000 per year in employee time.

The results also showed that teams were meeting more often due to being in closer proximity, with the average number of weekly meetings per person increasing from 14 to 18. In addition, the average duration of meetings slightly declined, from 0.85 hours to 0.77 hours. These findings signaled that the relocation both improved collaboration among employees and increased operational efficiency.

For Microsoft, the insights gleaned from this analysis underscored the importance of in-person interactions and helped the company understand how thoughtful planning of employee workspaces could lead to significant time and cost savings.

2. Enhancing Customer Support at Uber

Ensuring a quality user experience is a top priority for ride-hailing company Uber. To streamline its customer service capabilities, the company developed a Customer Obsession Ticket Assistant (COTA) in early 2018—a tool that uses machine learning and natural language processing to help agents improve their speed and accuracy when responding to support tickets.

COTA’s implementation delivered positive results. The tool reduced ticket resolution time by 10 percent, and its success prompted the Uber Engineering team to explore how it could be improved.

For the second iteration of the product, COTA v2, the team focused on integrating a deep learning architecture that could scale as the company grew. Before rolling out the update, Uber turned to A/B testing —a method of comparing the outcomes of two different choices (in this case, COTA v1 and COTA v2)—to validate the upgraded tool’s performance.

Preceding the A/B test was an A/A test, during which both a control group and a treatment group used the first version of COTA for one week. The treatment group was then given access to COTA v2 to kick off the A/B testing phase, which lasted for one month.

At the conclusion of testing, it was found that there was a nearly seven percent relative reduction in average handle time per ticket for the treatment group during the A/B phase, indicating that the use of COTA v2 led to faster service and more accurate resolution recommendations. The results also showed that customer satisfaction scores slightly improved as a result of using COTA v2.

With the use of A/B testing, Uber determined that implementing COTA v2 would not only improve customer service, but save millions of dollars by streamlining its ticket resolution process.

Related : How to Analyze a Dataset: 6 Steps

3. Forecasting Orders and Recipes at Blue Apron

For meal kit delivery service Blue Apron, understanding customer behavior and preferences is vitally important to its success. Each week, the company presents subscribers with a fixed menu of meals available for purchase and employs predictive analytics to forecast demand , with the aim of using data to avoid product spoilage and fulfill orders.

To arrive at these predictions, Blue Apron uses algorithms that take several variables into account, which typically fall into three categories: customer-related features, recipe-related features, and seasonality features. Customer-related features describe historical data that depicts a given user’s order frequency, while recipe-related features focus on a subscriber’s past recipe preferences, allowing the company to infer which upcoming meals they’re likely to order. In the case of seasonality features, purchasing patterns are examined to determine when order rates may be higher or lower, depending on the time of year.

Through regression analysis—a statistical method used to examine the relationship between variables—Blue Apron’s engineering team has successfully measured the precision of its forecasting models. The team reports that, overall, the root-mean-square error—the difference between predicted and observed values—of their projection of future orders is consistently less than six percent, indicating a high level of forecasting accuracy.

By employing predictive analytics to better understand customers, Blue Apron has improved its user experience, identified how subscriber tastes change over time, and recognized how shifting preferences are impacted by recipe offerings.

Related : 5 Business Analytics Skills for Professionals

4. Targeting Consumers at PepsiCo

Consumers are crucial to the success of multinational food and beverage company PepsiCo. The company supplies retailers in more than 200 countries worldwide , serving a billion customers every day. To ensure the right quantities and types of products are available to consumers in certain locations, PepsiCo uses big data and predictive analytics.

PepsiCo created a cloud-based data and analytics platform called Pep Worx to make more informed decisions regarding product merchandising. With Pep Worx, the company identifies shoppers in the United States who are likely to be highly interested in a specific PepsiCo brand or product.

For example, Pep Worx enabled PepsiCo to distinguish 24 million households from its dataset of 110 million US households that would be most likely to be interested in Quaker Overnight Oats. The company then identified specific retailers that these households might shop at and targeted their unique audiences. Ultimately, these customers drove 80 percent of the product’s sales growth in its first 12 months after launch.

PepsiCo’s analysis of consumer data is a prime example of how data-driven decision-making can help today’s organizations maximize profits.

Which HBS Online Business Essentials Course is Right for You? | Download Your Free Flowchart

Developing a Data Mindset

As these companies illustrate, analytics can be a powerful tool for organizations seeking to grow and improve their services and operations. At the individual level, a deep understanding of data can not only lead to better decision-making, but career advancement and recognition in the workplace.

“Using data analytics is a very effective way to have influence in an organization,” Hammond says . “If you’re able to go into a meeting, and other people have opinions, but you have data to support your arguments and your recommendations, you’re going to be influential.”

Do you want to leverage the power of data within your organization? Explore Business Analytics —one of our online business essentials courses —to learn how to use data analysis to solve business problems.

This post was updated on March 24, 2023. It was originally published on January 15, 2019.

About the Author

Python Data Analysis Example: A Step-by-Step Guide for Beginners

data analysis

Doing real data analysis exercises is a great way to learn. But data analysis is a broad topic, and knowing how to proceed can be half the battle. In this step-by-step guide, we’ll show you a Python data analysis example and demonstrate how to analyze a dataset.

A great way to get practical experience in Python and accelerate your learning is by doing data analysis challenges. This will expose you to several key Python concepts, such as working with different file types, manipulating various data types (e.g. integers and strings), looping, and data visualization. Furthermore, you’ll also learn important data analysis techniques like cleaning data, smoothing noisy data, performing statistical tests and correlation analyses, and more. Along the way, you’ll also learn many built-in functions and Python libraries which make your work easier.

Knowing what steps to take in the data analysis process requires a bit of experience. For those wanting to explore data analysis, this article will show you a step-by-step guide to data analysis using Python. We’ll download a dataset, read it in, and start some exploratory data analysis to understand what we’re working with. Then we’ll be able to choose the best analysis technique to answer some interesting questions about the data.

This article is aimed at budding data analysts who already have a little experience in programming and analysis. If you’re looking for some learning material to get up-to-speed, consider our Introduction to Python for Data Science course, which contains 141 interactive exercises. For more in-depth material, our Python for Data Science track includes 5 interactive courses.

Python for Data Analysis

The process of examining, cleansing, transforming, and modeling data to discover useful information plays a crucial role in business, finance, academia, and other fields. Whether it's understanding customer behavior, optimizing business processes, or making informed decisions, data analysis provides you with the tools to unlock valuable insights from data.

Python has emerged as a preferred tool for data analysis due to its simplicity, versatility, and many o pen-source libraries . With its intuitive syntax and large online community, Python enables both beginners and experts to perform complex data analysis tasks efficiently. Libraries such as pandas, NumPy, and Matplotlib make this possible by providing essential functionalities for all aspects of the data analysis process.

The pandas library simplifies the process of working with structured data (e.g. tabular data, time series). NumPy , which is used for scientific computing in Python, provides powerful array objects and functions for numerical operations. It is essential for the mathematical computations involved in data analysis. It’s particularly useful for working with B ig D ata , as it is very efficient. Matplotlib is a comprehensive library for creating visualizations in Python ; it facilitates the exploration and communication of data insights.

In the following sections, we’ll leverage these libraries to analyze a real-world dataset and demonstrate the process of going from raw data to useful conclusions.

The Sunspots Dataset

For this Python data analysis example, we’ll be working with the Sunspots dataset, which can be downloaded from Kaggle . The data includes a row number, a date, and an observation of the total number of sunspots for each month from 1749 to 2021.

Sunspots are regions of the sun's photosphere that are temporarily cooler than the surrounding material due to a reduction in convective transport of energy. As such, they appear darker and can be relatively easily observed – which accounts for the impressively long time period of the dataset. Sunspots can last anywhere from a few days to a few months, and have diameters ranging from around 16 km to 160,000 km. They can also be associated with solar flares and coronal mass ejections, which makes understanding them important for life on Earth.

Some interesting questions that could be investigated are:

What is the period of sunspot activity?
When can we expect the next peak in solar activity?

Python Data Analysis Example

Step 1: import data.

Once you have downloaded the Sunspots dataset, the next step is to import the data into Python. There are several ways to do this; the one you choose depends on the format of your data.

If you have data in a text file, you may need to read the data in line-by-line using a for loop . As an example, take a look at how we imported the atmospheric sounding dataset in the article 7 Datasets to Practice Data Analysis in Python .

Alternatively, the data could be in the JSON format . In this case, you can use Python’s json library . This is covered in the How to Read and Write JSON Files in Python course.

A common way to store data is in either Excel (.xlsx) or comma-separated-value (.csv) files. In both of these cases, you can read the data directly into a pandas DataFrame. This is a useful way to parse data, since you can directly use many helpful pandas functions to manipulate and process the data. The How to Read and Write CSV Files in Python and How to Read and Write Excel Files in Python courses include interactive exercises to demonstrate this functionality.

Since the Sunspots dataset is in the CSV format, we can read it in using pandas. If you haven’t installed pandas yet, you can do so with a quick command:

Now, you can import the data into a DataFrame:

The read_csv() function automatically parses the data. It comes with many arguments to customize how the data is imported. For example, the index_col argument defines which column to use as the row label. The parse_dates argument defines which column holds dates. Our DataFrame, called df, holds our sunspots data with the variable name Monthly Mean Total Sunspot Number and the date of observation with the variable name Date .

Step 2: Data Cleaning and Preparation

Cleaning the data involves handling missing values, converting variables into the correct data types, and applying any filters.

If your data has missing values, there are a number of possible ways to handle them. You could simply just convert them to NaN (not a number). Alternatively, you could do a forward (backward) fill, which copies the previous (next) value into the missing position. Or you could also interpolate by using neighboring values to extrapolate a value into the missing position. The method you choose depends on your use case.

You should also check to see that numerical data is stored as a float or integer; if not, you need to convert it to the correct data type. If there are outliers in your data, you may consider removing them so as not to bias your results.

Or maybe you’re working with text data and you need to remove punctuation and numbers from your text and convert everything to lowercase. All these considerations fall under the umbrella of data cleaning. For some concrete examples, see our article Python Data Cleaning: A How-to Guide for Beginners .

Let’s start by getting an overview of our dataset:

The df.head() function prints the first 5 rows of data. You can see the row number (starting from zero), the date (in yyyy-mm-dd format), and the observation of the number of sunspots for the month. To check the datatypes of the variables, execute the following command:

The date has the datatype datetime64 , which is used to store dates in pandas, and the number of sunspots variable is a float.

Next, here's how to check if there are any missing data points in the Monthly Mean Total Sunspot Number variable:

This takes advantage of the built-in isna() function, which checks to see if there are any missing values. It returns a series of booleans – True if a value is missing, False if not. Then, we use the built-in function any() to check if any of the booleans are True. This returns False , which indicates there are no missing values in our data. You can find more details about this important step in The Most Helpful Python Data Cleaning Modules .

Step 3: Exploratory Data Analysis

The next stage is to start analyzing your data by calculating summary statistics, plotting histograms and scatter plots, or performing statistical tests. The goal is to gain a better understanding of the variables, and then use this understanding to guide the rest of the analysis. After performing exploratory data analysis, you will have a better understanding of what your data looks like and how to use it to answer questions. Our article Python Exploratory Data Analysis Cheat Sheet contains many more details, examples, and ideas about how to proceed.

A good starting point is to do a basic statistical analysis to determine the mean, median, standard deviation, etc. This can easily be achieved by using the df.describe() function:

We have a total of 3,265 observations and a mean of over 81 sunspots per month. The minimum is zero and the maximum is 398. This gives us an idea of the range of typical values. The standard deviation is about 67, which gives us an idea about how much the number of sunspots varies.

Notice the 50% percentile is less than the mean. This implies the data is skewed to lower values. This is very useful information if we want to do more advanced statistics since some tests assume a normal distribution.

We can confirm this by plotting a histogram of the number of sunspots per month. Visualization is an important skill in Python data analysis. Check out our article The Top 5 Python Libraries for Data Visualization . For our purposes, we’ll use matplotlib . This too can easily be installed with a quick pip install command. The code to plot a histogram looks like this:

Now we can see the most common value is less than 20 sunspots for the month, and numbers above 200 are quite rare. Finally, let’s plot the time series to see the full dataset:

We can see from the above plot there is a periodic increase and decrease in the number of sunspots. It looks like the maximum occurs roughly every 9 – 12 years. A natural question arises as to exactly how long that period is.

Signal processing is a detailed topic, so we’ll skim over some of the hairy details. To keep it simple, we need to decompose the above signal into a frequency spectrum, then find the dominant frequency. From this we can then compute the period. To compute the frequency spectrum, the Fourier Transform can be used, which is implemented in NumPy :

Try plotting the frequency spectrum and you’ll notice many peaks. One of those hairy details of signal processing is the presence of peaks at the start and end of the array np.abs(fft_result) . We can see from the time series we plotted above the period should be somewhere between 9 – 12 years, so we can safely exclude these peaks by slicing the magnitude array to filter out unwanted frequencies:

The output is as follows:

We used NumPy’s argmax() function to find the index of the maximum frequency, used this to find the frequency, and then converted this to a period. We finally print the results as a period of years.

This is a great example of using the understanding gained from exploratory data analysis to inform our data processing so we get a result that makes sense.

Step 4: Drawing Conclusions from Data

We were able to learn that the average number of sunspots per month is around 81, but the distribution is highly skewed to lower numbers. Indeed, the most common number of sunspots per month is less than 20, but in a period of high solar activity (75 th percentile), there could be over 120.

By plotting the time series, we could see the signal is periodic and get an idea that there is a regular maximum and minimum in the number of sunspots. By doing some signal processing, we determined the maximum number of sunspots is about every 11 years. From the timeseries plot we can see the last maximum was around 2014, meaning the next should be around 2025.

Further Python Data Analysis Examples

Working with the Sunspots dataset presents some unique advantages – e.g. it’s not a common dataset. We discuss this in our article 11 Tips for Building a Strong Data Science Portfolio with Python . This example of Python data analysis can also teach us a lot about programming in Python. We learnt how to read data into a pandas DataFrame and summarize our data using built-in functions. We did some plotting with Matplotlib and got a taste of signal processing with NumPy. We also did a little array slicing to get results that make sense. You’ll learn many of these important topics in the Introduction to Python for Data Science course and the Python for Data Science track.

We just scratched the surface of this analysis of sunspot data in Python. There are many more interesting questions which could be answered. For example, is there a trend in the number of sunspots over the 272 years of data? How long does the maximum last? How many sunspots should there be during our predicted next maximum? These questions can all be answered with Python.

There’s always more to learn on your Python data analysis journey, and books are a great resource. Our article The Best Python Books for Data Science has some great suggestions for your next trip to a bookstore. All the suggestions there will give you the tools to delve deeper into Python and data analysis techniques. Then, it’s a matter of practicing what you learn by starting a new data science project. Here are some Python Data Science Project Ideas . Happy coding!

Top 10 Data Analysis Templates with Samples and Examples

Mohammed Sameer

If people could eat data instead of food, we could end world hunger with enough spare data left over to tackle 3 famines.

This startling but obvious statement underscores the abundance of data available to the human race today and the humungous rate at which it has grown in our digital age. Just as sustenance nourishes our bodies, data fuels our intellect, satiating the hunger for insights and understanding.

Data is the foundation upon which the structure of information stands tall. Imagine gazing at a puzzle's scattered pieces – each is important, might be beautiful and vital, but the true picture emerges only when the pieces interlock. Similarly, data is the root of knowledge for today’s businesses. Our new Data Analysis Templates are the masterful hands that bring all that scattered knowledge and wisdom together.

These PPT Presentations emerge as essential companions in a landscape where accurate decision-making means the difference between thriving and surviving. Understanding data is pivotal in the symphony of business strategies, marketing endeavors, and research pursuits.

The 100% customizable nature of the templates provides you with the desired flexibility to edit your presentations. The content-ready slides give you the much-needed structure.

Let’s explore!

Template 1: Data Analysis Process PPT Set

Use this PPT Set to help stakeholders understand difficulties that mar the data analysis process and gain valuable insights. Explore the crucial stages of data analysis, from establishing data requirements and efficient data collection to thorough data processing and cleaning. This PPT Design highlights the often underestimated yet pivotal phase of data cleaning. With this template, you'll understand how data lays the foundation for seamless analysis, leading to more accurate results and impactful communication. Download now!

Download this template

Template 2: Data Analysis Business Evaluation Process for Visualization and Presentation

This holistic PPT Bundle guides you through the complex stages of visualization and presentation while offering a profound understanding of each crucial phase. Use this presentation template to understand the essence of successful data analysis, as it breaks down the process into digestible segments. From the initial steps of business issue comprehension and data understanding to data preparation, exploratory analysis, monitoring, validation, and finally, captivating visualization and presentation – every facet is covered. This PPT Preset goes beyond mere process explanation, offering a robust framework for the holistic development of data conceptualization, collection, analysis, and cleaning procedures. Get it today!

Get this template

Template 3: Data Requirement Analysis PPT Bundle

Navigating challenges of problem-solving, prioritization, and data insight, this PPT Presentation presents a strategic roadmap that transforms raw information into actionable intelligence. It starts with a deep dive into the heart of your business challenges. Focusing on defining the core problems, this presentation template guides you through the process of setting priorities, ensuring every move is a step closer to your objectives. Data collection, a crucial cornerstone, is explained through insightful visual aids and organized segments. Witness the transformation of disparate data points into a coherent narrative, empowering you to decipher trends, anomalies, and opportunities.

This PPT Template equips you with the tools to not only gather data but also comprehend its implications, turning information into true knowledge. Navigating the challenges of data requirement analysis is no longer a daunting task. From security gaps that demand attention to complex data systems that require expertise, our template ensures you're prepared to overcome these hurdles with confidence. The high costs that often come with data analysis are confronted head-on, unraveling budget-friendly strategies that don't compromise on quality. Get this template today!

Grab this template

Template 4: Big Data Analysis PPT Set

This comprehensive PPT Deck presents a pre-made Big Data Analysis funnel that guides you through the rather complex process of turning data into gold. Gain a competitive edge by understanding effective data analysis techniques of association rule learning, classification tree analysis, genetic algorithm, regression analysis, and sentiment analysis. It's more than a run-of-the-mill PPT Presentation; it's a transformative tool. Invest in a big data analysis PPT like resource that's not just about graphs and numbers; get it now. Download now!

Template 5: Data Management Analysis PPT Framework

For achieving business excellence, the quest for efficient and time-saving solutions is a universal endeavor. Recognizing your aspirations, we present the Data Management Analysis PowerPoint Presentation — an invaluable asset for seamless change management and effective data analysis. It incorporates PPT Slides designed to provide an effortless avenue for embracing change management and conducting incisive data analysis. It offers a cohesive platform for centralizing your objectives, ready to be shared with your team. The judicious use of text boxes empowers you to articulate your perspectives with precision on each pertinent subject. Download today!

Template 6: Predictive Data Analysis PPT Layout

Get this PPT Preset to consolidate your stakeholder's grasp on predictive analytics, a discipline that uses statistical methodologies, cutting-edge machine learning algorithms, and a suite of tools to dissect historical data. This PPT Layout guides you through a well-structured journey, unfolding the essentials of predictive analytics, its foundational framework, and a suite of models that constitute its core. The significance of predictive analytics takes center stage, underscored by its multifaceted applications. Additionally, this resource has an Estimation Model PPT Slide, which explains the key tenets of diverse predictive analytics tools and their closely-knit workflows. The demarcation between the four pivotal categories of advanced analytics in this PPT deck receives careful attention. It sheds light on predictive analytics models – from classification to clustering models and beyond. Download now!

Template 7: Dashboard For IT Operations Data Analysis

This PPT Template Dashboard is a dynamic representation of your operational landscape. This PPT Set helps track the total number of cases from inception to resolution. Visualize trends with a graph showcasing the weekly ebb and flow of opened and closed cases. Prioritize effectively, allocating resources where they matter most, as the presentation template depicts it across departments. Efficiency meets clarity as you explore the time distribution of tickets on a day-by-day basis. Gain a better understanding of workflow patterns and resource utilization. Analyze open case statuses, fostering an environment of proactive response and swift action. Download now!

Template 8: Quarterly Sales Data Analysis Report

Visualize your progress with ease using this PPT Template's intuitive presentation of monthly sales data. Get a clear view of team-wise statistics that showcase individual contributions, fostering a culture of recognition and growth. Uncover finer details through the nuanced comparison of total versus actual sales values, empowering you to identify trends and opportunities. Engage stakeholders in strategy evaluation as you assess team goals versus actual achievements. Pinpoint areas of excellence and those warranting attention, refining your approach. Download now!

Template 9: Real-Time Marketing Data Analysis

Here's a dynamic marketing analysis tool blending insights and aesthetics. It presents a pie chart comparing planned vs. actual budgets while diving deep into sections showcasing real-time marketing benefits: Elevated customer experiences, surging conversions, enhanced retention, and refined brand perception. Navigate budget allocation through intuitive bar graphs. Improve your strategy with data symphony, moving a step closer to success through informed choices. Download now!

Template 10: Data Analysis Process for Visualization and Presentation

Embark on a data-driven journey with this PPT Set. Learn the process of Data Analysis, Visualization, and Presentation to address complex business challenges. This PPT Design walks you through these stages, from issue identification and data preparation to exploratory analysis modeling. Witness raw data transform into insights through rigorous validation. Culminate in captivating visualizations and masterful presentations, setting new standards for impactful communication. Download now!

Bridging Numbers and Narratives: Your Journey Through Data Analysis

In a world where data weaves the fabric of progress, our journey through this blog comes to an inspiring end. As you venture into data analysis armed with our templates, remember that each graph, each layout, and each piece of information is a brushstroke on the canvas of understanding. With every mouse click, you’re not just navigating slides; you're charting the course for informed decisions, breakthrough discoveries, and transformative strategies.

FAQs on Data Analysis

What is data analysis.

Data analysis involves inspecting, cleansing, transforming, and modeling data to derive meaningful insights, draw conclusions, and support decision-making. It encompasses various techniques, including statistical methods, machine learning, and visualization, to uncover patterns, trends, and relationships within datasets.

What are the four types of data analysis?

There are four main types of data analysis:

Descriptive Analysis: This type of analysis focuses on summarizing and describing the main features of a dataset. It involves statistical measures such as mean, median, mode, range, and standard deviation. Descriptive analysis aims to clearly understand the data's characteristics but doesn't involve drawing conclusions or making predictions.
Diagnostic Analysis: Diagnostic analysis involves digging deeper into data to understand why certain patterns or outcomes occurred. It aims to identify the root causes of specific events or trends. Techniques used in diagnostic analysis often include data visualization, exploratory data analysis, and statistical tests to uncover relationships and correlations.
Predictive Analysis: Predictive analysis involves using historical data to predict future events or outcomes. This type of analysis uses statistical models, machine learning algorithms, and data mining techniques to identify patterns and trends that can be used to forecast future trends. It's widely used in finance, marketing, and healthcare for making informed decisions.
Prescriptive Analysis: Prescriptive analysis goes beyond predicting future outcomes. It provides recommendations or solutions for specific situations based on historical and current data analysis. This type of analysis considers different possible actions and their potential outcomes to guide decision-making. Prescriptive analysis is often used in complex scenarios involving multiple variables and options.

Where is data analysis used?

Data analysis is used in a wide range of fields and industries, including but not limited to:

Business: Analyzing customer behavior, market trends, and financial performance.
Healthcare: Analyzing patient records, medical research data, and disease trends.
Science: Analyzing experimental results, simulations, and observations.
Finance: Analyzing investment trends, risk assessment, and portfolio management.
Marketing: Analyzing campaign effectiveness, consumer preferences, and market segmentation.
Social Sciences: Analyzing survey data, demographic trends, and human behavior.
Sports: Analyzing player performance, game statistics, and strategy optimization.

What is the main tool for data analysis?

There isn't a single "main" tool for data analysis, as the choice of tools depends on the specific tasks and the preferences of the analyst. However, some widely used tools for data analysis include:

Spreadsheet Software: Like Microsoft Excel or Google Sheets, used for basic data manipulation and visualization.
Statistical Software: Such as R and Python's libraries (e.g., pandas, numpy, scipy), used for in-depth statistical analysis and modeling.
Data Visualization Tools: Like Tableau, Power BI, or matplotlib/seaborn in Python, used to create visual representations of data.
Database Management Systems (DBMS): Such as SQL-based systems for querying and managing large datasets.
Machine Learning Libraries: Such as scikit-learn, TensorFlow, and PyTorch for building predictive models.

Why is data analysis important?

Data analysis is crucial for several reasons:

Informed Decision-Making: It provides insights that help individuals and organizations make informed decisions based on evidence rather than intuition.
Identifying Patterns and Trends: It helps to uncover hidden patterns, trends, and correlations in large datasets that might not be apparent on the surface.
Problem Solving: Data analysis aids in solving complex problems by providing a structured approach to understanding and addressing issues.
Improving Efficiency and Performance: It allows businesses to optimize processes, improve efficiency, and enhance performance based on data-driven insights.
Innovation and Research: Data analysis is essential in scientific research and innovation, helping to validate hypotheses and drive discoveries.
Competitive Advantage: Organizations that effectively use data analysis gain a competitive edge by better understanding their customers, markets, and internal operations.
Risk Management: Data analysis enables better risk assessment and management by identifying potential issues or anomalies early on.
Resource Allocation: It helps allocate resources effectively by understanding where investments are most likely to yield positive outcomes.

How Financial Management Templates Can Make a Money Master Out of You
How to Design the Perfect Service Launch Presentation [Custom Launch Deck Included]
Quarterly Business Review Presentation: All the Essential Slides You Need in Your Deck
[Updated 2023] How to Design The Perfect Product Launch Presentation [Best Templates Included]

Liked this blog? Please recommend us

Top 20 Big Data and Analytics Templates for Machine Learning, Cloud Computing and Artificial Intelligence PPT Presentations

Top 10 Data Security Management Templates to Safeguard Your Business (Free PDF Attached)

This form is protected by reCAPTCHA - the Google Privacy Policy and Terms of Service apply.

Digital revolution powerpoint presentation slides

Sales funnel results presentation layouts

3d men joinning circular jigsaw puzzles ppt graphics icons

Business Strategic Planning Template For Organizations Powerpoint Presentation Slides

Future plan powerpoint template slide

Project Management Team Powerpoint Presentation Slides

Brand marketing powerpoint presentation slides

Launching a new service powerpoint presentation with slides go to market

Agenda powerpoint slide show

Four key metrics donut chart with percentage

Engineering and technology ppt inspiration example introduction continuous process improvement

Meet our team representing in circular format

What is Big Data Analytics?

Uncover the realm of big data analytics: its impact, tools, challenges, and real-world applications across industries. Dive into data-driven insights now.

What is big data analytics?

How does big data analytics work, the importance of big data analytics, types of big data analytics, the benefits of big data analytics, the challenges of big data analytics, big data analytics examples, common big data analytics techniques, harness your big data with amplitude analytics.

Big data analytics examines and analyzes large and complex data sets known as “big data.”

Through this analysis, you can uncover valuable insights, patterns, and trends to make more informed decisions. It uses several techniques, tools, and technologies to process, manage, and examine meaningful information from massive datasets.

We typically apply big data analytics when data is too large or complicated for traditional data processing methods to handle efficiently. The more information there is, the greater the need for diverse analytical approaches, quicker handling times, and a more extensive data capacity.

Predictive Analytics - Prescriptive Analytics - Descriptive Analytics - Diagnostic Analytics

Big data analytics combines several stages and processes to extract insights.

Here’s a quick overview of what this could look like:

Data collection : Gather data from various sources, such as surveys, social media, websites, databases, and transaction records. This data can be structured, unstructured, or semi-structured.
Data storage : Store data in distributed systems or cloud-based solutions. These types of storage can handle a large volume of data and provide fault tolerance.
Data preprocessing : It’s best to clean and preprocess the raw data before performing analysis. This process could involve handling missing values, standardizing formats, addressing outliers, and structuring the data into a more suitable format.
Data integration : Data usually comes from various sources in different formats. Data integration combines the data into a unified format.
Data processing : Most organizations benefit from using distributed frameworks to process big data. These break down the tasks into smaller chunks and distribute them across multiple machines for parallel processing.
Data analysis techniques : Depending on the goal of the analysis, you’ll likely apply several data analysis techniques. These could include descriptive , predictive , and prescriptive analytics using machine learning, text mining, exploratory analysis, and other methods.
Data visualization : After analysis, communicate the results visually, like charts, graphs, dashboards, or other visual tools. Visualization helps you communicate complex insights in an understandable and accessible way.
I nterpretation and decision making : Interpret the insights gained from your analysis to draw conclusions and make data-backed decisions. These decisions impact business strategies, processes, and operations.
Feedback and scale : One of the main advantages of big data analytics frameworks is their ability to scale horizontally. This scalability enables you to handle increasing data volumes and maintain performance, so you have a sustainable method for analyzing large datasets.

It’s important to remember that big data analytics isn’t a linear process, but a cycle.

You’ll continually gather new data, analyze it, and refine business strategies based on the results. The whole process is iterative, which means adapting to changes and making adjustments is key.

Big data analytics has the potential to transform the way you operate, make decisions, and innovate. It’s an ideal solution if you’re dealing with massive datasets and are having difficulty choosing a suitable analytical approach.

By tapping into the finer details of your information, using techniques and specific tools, you can use your data as a strategic asset.

Big data analytics enables you to benefit from:

Informed decision-making : You can make informed decisions based on actual data, which reduces uncertainty and improves outcomes.
Business insights : Analyzing large datasets uncovers hidden patterns and trends, providing a deeper understanding of customer behavior and market dynamics.
Customer understanding : Get insight into customer preferences and needs so you can personalize experiences and create more impactful marketing strategies.
Operational efficiency : By analyzing operational data, you can optimize processes, identify bottlenecks, and streamline operations to reduce costs and improve productivity.
Innovation : Big data analytics can help you uncover new opportunities and niches within industries. You can identify unmet needs and emerging trends to develop more innovative products and services to stay ahead of the competition.

There are four main types of big data analytics— descriptive , diagnostic , predictive , and prescriptive . Each serves a different purpose and offers varying levels of insight.

Collectively, they enable businesses to comprehensively understand their big data and make decisions to drive improved performance.

Let’s take a closer look at each one.

Descriptive analytics

This type focuses on summarizing historical data to tell youwhat’s happened in the past. It uses aggregation, data mining, and visualization techniques to understand trends, patterns, and key performance indicators (KPIs).

Descriptive analytics helps you understand your current situation and make informed decisions based on historical information.

Diagnostic analytics

Diagnostic analytics goes beyond describing past events and aims to understand why they occurred. It separates data to identify the root causes of specific outcomes or issues.

By analyzing relationships and correlations within the data, diagnostic analytics helps you gain insights into factors influencing your results.

Predictive analytics

This type of analytics uses historical data and statistical algorithms to predict future events. It spots patterns and trends and forecasts what might happen next.

You can use predictive analytics to anticipate customer behavior, product demand, market trends, and more to plan and make strategic decisions proactively.

Prescriptive analytics

Prescriptive analytics builds on predictive analytics by recommending actions to optimize future outcomes. It considers various possible actions and their potential impact on the predicted event or outcome.

Prescriptive analytics help you make data-driven decisions by suggesting the best course of action based on your desired goals and any constraints.

Descriptive Analytics - Diagnostic Analytics - Predictive Analytics - Prescriptive Analytics

Big data analytics has become a clear business game changer by unlocking insights and opportunities.

Below we’ve highlighted some of the tangible benefits of this transformative approach.

Improved risk management

Big data encompasses massive data volumes from diverse sources, including real-time streams. Rapid analysis helps detect anomalies or unusual patterns quickly, preventing risks like fraud or security breaches that can have widespread and immediate consequences.

Example : Banks use big data analytics to spot unusual spending patterns in real-time, helping prevent fraudulent credit card transactions and safeguarding customer accounts.

Cost-efficiency

Big data analytics can process and analyze extensive datasets, including handling large-scale data streams from sources like IoT devices or social media in real time.

This comprehensive analysis enables you to optimize your operations, identify inefficiencies, and reduce costs at a level that might not be achievable with smaller datasets.

Example : Big data analytics optimizes production in manufacturing by analyzing data from sensors on the factory floor, reducing downtime and minimizing maintenance costs.

Better decision making

Applying big data analytics provides deeper insights, as it can analyze diverse and unstructured data types.

You can use it to analyze everything from structured databases to unstructured text and multimedia content. This variety of data sources enables richer insights into customer behavior, market trends, and other critical factors, helping you make more informed and strategic decisions.

Example : An ecommerce platform uses big data analytics to analyze customer browsing behavior and purchase history. This enables personalized recommendations to help improve customer satisfaction and drive sales.

Deeper insights

Big data analytics extracts insights from vast and diverse datasets. This includes structured and unstructured data, making it better at revealing nuanced patterns and hidden relationships.

By delving into massive datasets, big data analytics can uncover insights that have a transformative impact on business strategies and operations.

Example : A healthcare provider uses big data analytics to explore patient data, clinical research, and external sources to find personalized treatment options for complex medical conditions.

Big data analytics has immense potential, but has its share of hurdles.

You may encounter some of these challenges, so it’s important to recognize and understand how to overcome them.

Here are a few to look out for.

Making data accessible and organized

Handling large and diverse datasets can make organizing and accessing information challenging.

We recommend a cohesive data infrastructure that enables easy retrieval and integration for practical analysis.

Maintaining quality

The sheer volume and variety of data can lead to inconsistencies and inaccuracies.

Ensuring data quality through cleaning, validation, and proper data governance helps prevent incorrect analysis and decision-making.

Keeping data secure

Maintaining data security is a major concern given the large volume of sensitive information collected and analyzed.

Safeguarding data against breaches, unauthorized access, and cyber threats protects customer privacy and business integrity.

Finding the right tools

The rapidly evolving landscape of big data tools and technologies can be overwhelming.

We recommend using a buying committee of internal stakeholders to evaluate tools that integrate well together and match your business needs and goals.

Real-world applications of big data analytics have ignited shifts and shaped approaches across several industries.

We’ll explore some examples and highlight how this methodology helps decision-making and innovation in many business sectors.

In healthcare, big data analytics processes vast volumes of patient records, medical images, and genomic data.

It identifies intricate patterns in large datasets to predict disease trends, enhance personalized treatments, and even anticipate potential outbreaks by analyzing global health data.

Product development

Big data analytics facilitates product development by analyzing structured data like sales records and unstructured data like customer reviews and social media interactions.

This enables companies to uncover hidden insights about customer preferences to produce more innovative and targeted products.

Media and entertainment

Big data analytics helps the media and entertainment industry by dissecting streams of viewership data and social media interactions.

These techniques unravel real-time trends, helping media companies rapidly adapt their content offerings, optimize ad placement, and personalize recommendations for diverse audiences.

Marketing companies can benefit from big data analytics in several ways. Unlike smaller-scale analytical approaches, it can analyze intricate customer behavior across various channels and dissect complex patterns in real time.

Marketers can offer highly personalized experiences, detect shifting trends faster, and responsively adjust their strategies.

Big data analytics in ecommerce is more than simple sales analysis. It dives into vast and diverse datasets, including clickstream data, purchase histories, and online interactions.

It enables real-time recommendations, dynamic pricing adjustments, and enhanced supply chain management for a seamless customer experience.

In the banking sector, big data analytics doesn’t only focus on transaction monitoring.

It processes enormous amounts of transaction data in real time, using advanced algorithms and machine learning to find unusual patterns and behavior. In doing so, big data analytics helps banks reduce false positives and provide more accurate fraud signals than other methods.

There are many techniques in the big data analytics toolbox and you'll likely come across many as you dissect and analyze your information.

If you’re looking for somewhere to start, these are foundational techniques for handling big data.

Association Rule Learning: Used to find relationships or patterns in large datasets. It’s primarily applied in market basket analysis, where the goal is to discover associations between items frequently purchased together.
Classification Tree Analysis : Used for predictive modeling and classification tasks. They partition the dataset into subsets based on input features and then assign a class label to each one. Decision trees are one type of classification tree.
Genetic Algorithms : An optimization technique inspired by natural selection. This involves creating a population of potential solutions and evolving them over generations to find the best one. You can use genetic algorithms for various optimization problems, including feature selection, parameter tuning, etc.
Machine Learning : This covers various techniques in which algorithms learn patterns from data and make predictions or decisions. It includes supervised learning where models are trained on labeled data, unsupervised learning where patterns are inferred from unlabeled data, and reinforcement learning where they learn to make decisions based on rewards or punishments.
Clustering : An unsupervised learning technique in which data points are grouped into clusters based on similarity. It’s mostly used for customer segmentation and anomaly detection.
Regression Analysis : Models the relationship between dependent and independent variables. It’s commonly used for predicting numerical values, such as sales based on advertising costs.
Neural Networks : A class of machine learning models inspired by the brain’s structure. They consist of interconnected nodes—known as neurons—organized into layers. Deep learning is a subset of neural networks involving multiple hidden layers. Convolutional Neural Networks (CNNs) are used to analyze images, while Recurrent Neural Networks (RNNs) are used for sequence data.
Text Mining and Natural Language Processing (NLP): Focused on processing and understanding human language, these are used for sentiment analysis, topic modeling, language generation, and more.
Dimensionality Reduction : These techniques reduce the number of input features while preserving essential information. They help with visualization, noise reduction, and speeding up training.
Time Series Analysis : Used to analyze data points collected over time for forecasting, anomaly detection, and trend analysis.

From healthcare to marketing, big data analytics offers a lens into the intricate workings of our interconnected world.

It empowers you to navigate complexities, spot trends that elude the naked eye, and transform data into actionable strategies that drive growth.

As the business landscape evolves, so does the scope and impact of big data analytics—this is where Amplitude Analytics can help.

Amplitude Analytics bridges the gap between raw data and meaningful insights, guiding you toward a deeper understanding of your user’s journey.

As an all-in-one data analytics platform, it applies all four types of big data analytics—predictive, prescriptive, descriptive, and diagnostic—-to help you garner insights across all areas of your business. You’ll be able to analyze your data and truly decipher the stories and potential it holds.

Enhance your product, engage your customers, and make data-backed decisions that resonate.

Get started with Amplitude Analytics today .

Other Analytics Guides

What is Enterprise Analytics?
What is Predictive Analytics?
What is Diagnostic Analytics?
What is Descriptive Analytics?
What is Prescriptive Analytics?

Data Analytics Case Study: Complete Guide in 2024

What are data analytics case study interviews.

When you’re trying to land a data analyst job, the last thing to stand in your way is the data analytics case study interview.

One reason they’re so challenging is that case studies don’t typically have a right or wrong answer.

Instead, case study interviews require you to come up with a hypothesis for an analytics question and then produce data to support or validate your hypothesis. In other words, it’s not just about your technical skills; you’re also being tested on creative problem-solving and your ability to communicate with stakeholders.

This article provides an overview of how to answer data analytics case study interview questions. You can find an in-depth course in the data analytics learning path .

How to Solve Data Analytics Case Questions

Check out our video below on How to solve a Data Analytics case study problem:

With data analyst case questions, you will need to answer two key questions:

What metrics should I propose?
How do I write a SQL query to get the metrics I need?

In short, to ace a data analytics case interview, you not only need to brush up on case questions, but you also should be adept at writing all types of SQL queries and have strong data sense.

These questions are especially challenging to answer if you don’t have a framework or know how to answer them. To help you prepare, we created this step-by-step guide to answering data analytics case questions.

We show you how to use a framework to answer case questions, provide example analytics questions, and help you understand the difference between analytics case studies and product metrics case studies .

Data Analytics Cases vs Product Metrics Questions

Product case questions sometimes get lumped in with data analytics cases.

Ultimately, the type of case question you are asked will depend on the role. For example, product analysts will likely face more product-oriented questions.

Product metrics cases tend to focus on a hypothetical situation. You might be asked to:

Investigate Metrics - One of the most common types will ask you to investigate a metric, usually one that’s going up or down. For example, “Why are Facebook friend requests falling by 10 percent?”

Measure Product/Feature Success - A lot of analytics cases revolve around the measurement of product success and feature changes. For example, “We want to add X feature to product Y. What metrics would you track to make sure that’s a good idea?”

With product data cases, the key difference is that you may or may not be required to write the SQL query to find the metric.

Instead, these interviews are more theoretical and are designed to assess your product sense and ability to think about analytics problems from a product perspective. Product metrics questions may also show up in the data analyst interview , but likely only for product data analyst roles.

TRY CHECKING: Marketing Analytics Case Study Guide

Data Analytics Case Study Question: Sample Solution

Data Analytics Case Study Sample Solution

Let’s start with an example data analytics case question :

You’re given a table that represents search results from searches on Facebook. The query column is the search term, the position column represents each position the search result came in, and the rating column represents the human rating from 1 to 5, where 5 is high relevance, and 1 is low relevance.

Each row in the search_events table represents a single search, with the has_clicked column representing if a user clicked on a result or not. We have a hypothesis that the CTR is dependent on the search result rating.

Write a query to return data to support or disprove this hypothesis.

search_results table:

search_events table

Step 1: With Data Analytics Case Studies, Start by Making Assumptions

Hint: Start by making assumptions and thinking out loud. With this question, focus on coming up with a metric to support the hypothesis. If the question is unclear or if you think you need more information, be sure to ask.

Answer. The hypothesis is that CTR is dependent on search result rating. Therefore, we want to focus on the CTR metric, and we can assume:

If CTR is high when search result ratings are high, and CTR is low when the search result ratings are low, then the hypothesis is correct.
If CTR is low when the search ratings are high, or there is no proven correlation between the two, then our hypothesis is not proven.

Step 2: Provide a Solution for the Case Question

Hint: Walk the interviewer through your reasoning. Talking about the decisions you make and why you’re making them shows off your problem-solving approach.

Answer. One way we can investigate the hypothesis is to look at the results split into different search rating buckets. For example, if we measure the CTR for results rated at 1, then those rated at 2, and so on, we can identify if an increase in rating is correlated with an increase in CTR.

First, I’d write a query to get the number of results for each query in each bucket. We want to look at the distribution of results that are less than a rating threshold, which will help us see the relationship between search rating and CTR.

This CTE aggregates the number of results that are less than a certain rating threshold. Later, we can use this to see the percentage that are in each bucket. If we re-join to the search_events table, we can calculate the CTR by then grouping by each bucket.

Step 3: Use Analysis to Backup Your Solution

Hint: Be prepared to justify your solution. Interviewers will follow up with questions about your reasoning, and ask why you make certain assumptions.

Answer. By using the CASE WHEN statement, I calculated each ratings bucket by checking to see if all the search results were less than 1, 2, or 3 by subtracting the total from the number within the bucket and seeing if it equates to 0.

I did that to get away from averages in our bucketing system. Outliers would make it more difficult to measure the effect of bad ratings. For example, if a query had a 1 rating and another had a 5 rating, that would equate to an average of 3. Whereas in my solution, a query with all of the results under 1, 2, or 3 lets us know that it actually has bad ratings.

Product Data Case Question: Sample Solution

In product metrics interviews, you’ll likely be asked about analytics, but the discussion will be more theoretical. You’ll propose a solution to a problem, and supply the metrics you’ll use to investigate or solve it. You may or may not be required to write a SQL query to get those metrics.

We’ll start with an example product metrics case study question :

Let’s say you work for a social media company that has just done a launch in a new city. Looking at weekly metrics, you see a slow decrease in the average number of comments per user from January to March in this city.

The company has been consistently growing new users in the city from January to March.

What are some reasons why the average number of comments per user would be decreasing and what metrics would you look into?

Step 1: Ask Clarifying Questions Specific to the Case

Hint: This question is very vague. It’s all hypothetical, so we don’t know very much about users, what the product is, and how people might be interacting. Be sure you ask questions upfront about the product.

Answer: Before I jump into an answer, I’d like to ask a few questions:

Who uses this social network? How do they interact with each other?
Has there been any performance issues that might be causing the problem?
What are the goals of this particular launch?
Has there been any changes to the comment features in recent weeks?

For the sake of this example, let’s say we learn that it’s a social network similar to Facebook with a young audience, and the goals of the launch are to grow the user base. Also, there have been no performance issues and the commenting feature hasn’t been changed since launch.

Step 2: Use the Case Question to Make Assumptions

Hint: Look for clues in the question. For example, this case gives you a metric, “average number of comments per user.” Consider if the clue might be helpful in your solution. But be careful, sometimes questions are designed to throw you off track.

Answer: From the question, we can hypothesize a little bit. For example, we know that user count is increasing linearly. That means two things:

The decreasing comments issue isn’t a result of a declining user base.
The cause isn’t loss of platform.

We can also model out the data to help us get a better picture of the average number of comments per user metric:

January: 10000 users, 30000 comments, 3 comments/user
February: 20000 users, 50000 comments, 2.5 comments/user
March: 30000 users, 60000 comments, 2 comments/user

One thing to note: Although this is an interesting metric, I’m not sure if it will help us solve this question. For one, average comments per user doesn’t account for churn. We might assume that during the three-month period users are churning off the platform. Let’s say the churn rate is 25% in January, 20% in February and 15% in March.

Step 3: Make a Hypothesis About the Data

Hint: Don’t worry too much about making a correct hypothesis. Instead, interviewers want to get a sense of your product initiation and that you’re on the right track. Also, be prepared to measure your hypothesis.

Answer. I would say that average comments per user isn’t a great metric to use, because it doesn’t reveal insights into what’s really causing this issue.

That’s because it doesn’t account for active users, which are the users who are actually commenting. A better metric to investigate would be retained users and monthly active users.

What I suspect is causing the issue is that active users are commenting frequently and are responsible for the increase in comments month-to-month. New users, on the other hand, aren’t as engaged and aren’t commenting as often.

Step 4: Provide Metrics and Data Analysis

Hint: Within your solution, include key metrics that you’d like to investigate that will help you measure success.

Answer: I’d say there are a few ways we could investigate the cause of this problem, but the one I’d be most interested in would be the engagement of monthly active users.

If the growth in comments is coming from active users, that would help us understand how we’re doing at retaining users. Plus, it will also show if new users are less engaged and commenting less frequently.

One way that we could dig into this would be to segment users by their onboarding date, which would help us to visualize engagement and see how engaged some of our longest-retained users are.

If engagement of new users is the issue, that will give us some options in terms of strategies for addressing the problem. For example, we could test new onboarding or commenting features designed to generate engagement.

Step 5: Propose a Solution for the Case Question

Hint: In the majority of cases, your initial assumptions might be incorrect, or the interviewer might throw you a curveball. Be prepared to make new hypotheses or discuss the pitfalls of your analysis.

Answer. If the cause wasn’t due to a lack of engagement among new users, then I’d want to investigate active users. One potential cause would be active users commenting less. In that case, we’d know that our earliest users were churning out, and that engagement among new users was potentially growing.

Again, I think we’d want to focus on user engagement since the onboarding date. That would help us understand if we were seeing higher levels of churn among active users, and we could start to identify some solutions there.

Tip: Use a Framework to Solve Data Analytics Case Questions

Analytics case questions can be challenging, but they’re much more challenging if you don’t use a framework. Without a framework, it’s easier to get lost in your answer, to get stuck, and really lose the confidence of your interviewer. Find helpful frameworks for data analytics questions in our data analytics learning path and our product metrics learning path .

Once you have the framework down, what’s the best way to practice? Mock interviews with our coaches are very effective, as you’ll get feedback and helpful tips as you answer. You can also learn a lot by practicing P2P mock interviews with other Interview Query students. No data analytics background? Check out how to become a data analyst without a degree .

Finally, if you’re looking for sample data analytics case questions and other types of interview questions, see our guide on the top data analyst interview questions .

Your browser is not supported. Please download another browser to be able to use all of the Maven features.

Data Playground

Explore and download sample datasets hand-picked by Maven instructors. Practice applying your data analysis and visualization skills to real-world data, from flight delays and movie ratings to shark attacks and UFO sightings.

Recently updated
# of records - (low to high)
# of records - (high to low)

CRM Sales Opportunities

B2B sales pipeline data from a fictitious company that sells computer hardware.

CSV ( 627 KB )
Preview data
View full details

Global Electronics Retailer

Transactional data for a fictitious global electronics retailer.

CSV ( 4.7 MB )

World Economic Indicators

Economic development data from the World Bank and the United Nations.

CSV ( 2.9 MB )

CrossFit Games

Scraped data on athletes and scores from the 2019 CrossFit Games.

Excel ( 278 KB )

Great American Coffee Taste Test

Survey responses from ~4,000 Americans after a blind coffee taste test.

CSV ( 2.8 MB )

Ready to become a

Data rockstar.

Start learning for free, no credit card required!

tableau.com is not available in your region.

Saved searches

Use saved searches to filter your results more quickly.

To see all available qualifiers, see our documentation .

Notifications

I have completed Data Analyst assignments in Data Analyst Course during my journey in ExcelR.

Vaishnavi-N24/Data-Analyst-Assignments

Folders and files, repository files navigation, data-analyst-assignments-excelr-solution-.

I have completed all the Data Analyst Assignmnets.

Jupyter Notebook 100.0%

Purdue Online Writing Lab Purdue OWL® College of Liberal Arts

Welcome to the Purdue Online Writing Lab

Welcome to the Purdue OWL

This page is brought to you by the OWL at Purdue University. When printing this page, you must include the entire legal notice.

Copyright ©1995-2018 by The Writing Lab & The OWL at Purdue and Purdue University. All rights reserved. This material may not be published, reproduced, broadcast, rewritten, or redistributed without permission. Use of this site constitutes acceptance of our terms and conditions of fair use.

The Online Writing Lab at Purdue University houses writing resources and instructional material, and we provide these as a free service of the Writing Lab at Purdue. Students, members of the community, and users worldwide will find information to assist with many writing projects. Teachers and trainers may use this material for in-class and out-of-class instruction.

The Purdue On-Campus Writing Lab and Purdue Online Writing Lab assist clients in their development as writers—no matter what their skill level—with on-campus consultations, online participation, and community engagement. The Purdue Writing Lab serves the Purdue, West Lafayette, campus and coordinates with local literacy initiatives. The Purdue OWL offers global support through online reference materials and services.

A Message From the Assistant Director of Content Development

The Purdue OWL® is committed to supporting students, instructors, and writers by offering a wide range of resources that are developed and revised with them in mind. To do this, the OWL team is always exploring possibilties for a better design, allowing accessibility and user experience to guide our process. As the OWL undergoes some changes, we welcome your feedback and suggestions by email at any time.

Please don't hesitate to contact us via our contact page if you have any questions or comments.

All the best,

Social Media

Facebook twitter.

IMAGES

Data Analytics Assignment
ITECH1103 Data Analytics Assignment 2 Semester 1 2021 1 .pdf
Data Analytics Assignment Help
DataAnalytics Assignment
DATA ANALYTICS ASSIGNMENT.pptx
Assignment 3 BDA

VIDEO

Analysis of Data? Some Examples to Explore
Data Analytics Assignment Description
Big data Analytics Assignment Ashish Mishra
A university dashboard (M1 Big Data Analytics assignment)
ITECH 1103 (Big Data Analytics) Assignment
Data Analytics Assignment Support Session

COMMENTS

20 Data Analytics Projects for All Levels
Data Analytics Projects for Beginners. As a beginner, you need to focus on importing, cleaning, manipulating, and visualizing the data. Data Importing: learn to import the data using SQL, Python, R, or web scraping. Data Cleaning: use various Python and R libraries to clean and process the data.
Data Analytics Project Assignment
For your research project you will conduct data analysis and right a report summarizing your analysis and the findings from your analysis. You will accomplish this by completing a series of assignments. Data 275 Research Project Assignment. In this week's assignment, you are required accomplish the following tasks: 1. Propose a topic for you ...
What is Data Analysis? An Expert Guide With Examples
Data analysis is a comprehensive method of inspecting, cleansing, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making. It is a multifaceted process involving various techniques and methodologies to interpret data from various sources in different formats, both structured and unstructured.
What Is Data Analysis? (With Examples)
Explore four types of data analysis with examples. Data analysis is the practice of working with data to glean useful information, which can then be used to make informed decisions. "It is a capital mistake to theorize before one has data. Insensibly, one begins to twist facts to suit theories, instead of theories to suit facts," Sherlock ...
Assignment 2: Exploratory Data Analysis
Assignment 2: Exploratory Data Analysis. In this assignment, you will identify a dataset of interest and perform an exploratory analysis to better understand the shape & structure of the data, investigate initial questions, and develop preliminary insights & hypotheses. Your final submission will take the form of a report consisting of ...
Homework 3: Data Analysis
Call this method percent_change_bachelors_2000s and return the difference (the percent in 2010 minus the percent in 2000) as a float. For example, assuming we have parsed hw3-nces-ed-attainment.csv and stored it in a variable called data, then the call percent_change_bachelors_2000s(data) will return 2.599999999999998.
Data Analytics: Definition, Uses, Examples, and More
Data analytics is the collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision making. Data analytics is often confused with data analysis. While these are related terms, they aren't exactly the same. In fact, data analysis is a subcategory of data analytics that deals ...
Big Data Analytics
Big Data Analytics - Assignments. Big Data Analytics. CSE545 - Spring 2019. Assignments. Assignment 1. Assignment 2. Assignment 3. Final Team Project.
Examples of Business Analytics in Action
Business Analytics Examples. According to a recent survey by McKinsey, an increasing share of organizations report using analytics to generate growth. Here's a look at how four companies are aligning with that trend and applying data insights to their decision-making processes. 1. Improving Productivity and Collaboration at Microsoft.
Python Data Analysis Example: A Step-by-Step Guide for Beginners
Step 1: Import Data. Once you have downloaded the Sunspots dataset, the next step is to import the data into Python. There are several ways to do this; the one you choose depends on the format of your data. If you have data in a text file, you may need to read the data in line-by-line using a for loop.
Top 10 Data Analysis Templates with Samples and Examples
However, some widely used tools for data analysis include: Spreadsheet Software: Like Microsoft Excel or Google Sheets, used for basic data manipulation and visualization. Statistical Software: Such as R and Python's libraries (e.g., pandas, numpy, scipy), used for in-depth statistical analysis and modeling.
The Beginner's Guide to Statistical Analysis
Table of contents. Step 1: Write your hypotheses and plan your research design. Step 2: Collect data from a sample. Step 3: Summarize your data with descriptive statistics. Step 4: Test hypotheses or make estimates with inferential statistics.
PDF Data Analysis: Statistical Assignment
data. 2. Develop a research question that you would like to consider in this area. Consider what you hope to prove/disprove by carrying out a statistical analysis and consider whether a statistical analysis is achievable and/or appropriate to the data. 3. Think about where you will get data to carry out research in this topic e.g.:
What is Big Data Analytics? Full Guide and Examples
Big data analytics examines and analyzes large and complex data sets known as "big data.". Through this analysis, you can uncover valuable insights, patterns, and trends to make more informed decisions. It uses several techniques, tools, and technologies to process, manage, and examine meaningful information from massive datasets.
5 Data Analytics Projects for Beginners
These data analytics project ideas reflect the tasks often fundamental to many data analyst roles. 1. Web scraping. While you'll find no shortage of excellent (and free) public data sets on the internet, you might want to show prospective employers that you're able to find and scrape your own data as well.
Data Analytics Case Study Guide (Updated for 2024)
Step 1: With Data Analytics Case Studies, Start by Making Assumptions. Hint: Start by making assumptions and thinking out loud. With this question, focus on coming up with a metric to support the hypothesis. If the question is unclear or if you think you need more information, be sure to ask.
Free Data Sets & Dataset Samples
Data Playground. Explore and download sample datasets hand-picked by Maven instructors. Practice applying your data analysis and visualization skills to real-world data, from flight delays and movie ratings to shark attacks and UFO sightings. Explore data sets.
PDF Assignment 2: Sampling & Data Analysis (Group Assignment) Objectives
Assignment 2, FYC 6800 - Page 1 . Assignment 2: Sampling & Data Analysis (Group Assignment) Objectives: After completing this assignment, you will be able to • Explain the sampling and data analysis procedures used in research reports • Determine whether the researcher wants to generalize his/her specific findingsand/or
(PDF) Practical Data Analysis: An Example
18 2 Practical Data Analysis: An Example. Fig. 2.1 A histogram for the distribution of the value of attribute age using 8 bins. Fig. 2.2 A histogram for the distribution of the value of attribute ...
What Is Data Analysis? (With Examples)
What Is Data Analysis? (With Examples) Data analysis is the practice of working with data to glean useful information, which can then be used to make informed decisions. "It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts," Sherlock Holme's proclaims ...
Free Public Data Sets For Analysis
Public data sets are ideal resources to tap into to create data visualizations. With the information provided below, you can explore a number of free, accessible data sets and begin to create your own analyses. The following COVID-19 data visualization is representative of the the types of visualizations that can be created using free public ...
Vaishnavi-N24/Data-Analyst-Assignments
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window.
Google Data Analytics Professional Certificate
Professional Certificate - 8 course series. Prepare for a new career in the high-growth field of data analytics, no experience or degree required. Get professional training designed by Google and have the opportunity to connect with top employers. There are 483,000 open jobs in data analytics with a median entry-level salary of $92,000.¹.
BUSI 650
Individual Assignment business analytics tableau 12. Practical 100% (8) 6. Busi 650 - Assignment 1. Mandatory assignments 80% (5) 18. Individual Assignment. Mandatory assignments 82% (11) 14.
Welcome to the Purdue Online Writing Lab
Mission. The Purdue On-Campus Writing Lab and Purdue Online Writing Lab assist clients in their development as writers—no matter what their skill level—with on-campus consultations, online participation, and community engagement. The Purdue Writing Lab serves the Purdue, West Lafayette, campus and coordinates with local literacy initiatives.