Descriptive Statistics Project

Samuel Dominic Chukwuemeka (SamDom For Peace)
Data Analysis is a part of Statistics.
Data Analysis is also a part of Computer Science.
Hence, it is expected that any Statistics and/or Computer Science student should know the descriptive statistics/analysis of data.
This project is designed to measure your knowledge and understanding of data analysis by describing a raw dataset, presenting the data, calculating the descriptive statistics of the data, and interpreting your results.

General Project Requirements

(1.) This is an individual project. It is not a group project.
Students may work together. However, each student must submit their own individual project.


(2.) Each student will work with real-world raw data of a variable.
Please identify the variable and the type of variable.
Specify the year when the raw data was obtained.
The sample size of the real-world data should be at least ten ($10$).
All information used for this project should be verifiable on the direct website of the company/organization/government.
Textbook examples/exercises are NOT allowed.
You can browse/search for real-world data/datasets from the United States Government website on Datasets
(https://catalog.data.gov/dataset)
You may also use any other real-world dataset of raw data that you find from any company or any academic institution that gives you the right to use their raw dataset.
You may also conduct your own research and collect raw data.
You have probably covered Data Collection in your previous class.
If you decide to conduct your own research, please email me for pre-approval.
If you cannot find any raw data after trying the aforementioned three ways, please let me know.
I can use some of the Blackboard Collaborate Live Sessions to teach you how to search for data.


(3.) Write the complete address of the direct page of the website where you found the data.
For traditional (onsite) students only: if the "direct" web address is too long, please shorten it by pasting the complete web address into www.tinyurl.com. Generate a short address and write that address as is.
For online students, please copy and paste the link as is.
Please set the link to open in a new window.


(4.) I understand some of you do not want to type directly in the Blackboard editor using the Math Editor.
You may:
(a.) Show all your work on paper, write all math terms appropriately, take clear screenshots of your entire work, and insert those screenshots as images directly on the Blackboard editor. OR
(b.) Show all your work in Microsoft Word, write all math terms appropriately, take clear screenshots of your entire work, and insert those screenshots as images directly on the Blackboard editor.
Please DO NOT submit any attachment on Blackboard. It will not be clicked. It will not be opened. It will not be graded.


(5.) If you need feedback on your project before submission, please submit your project as a draft in the Project Drafts forum on Blackboard.
You may also send it to me as an attachment via email. I shall review and provide feedback.
Draft projects are to be submitted in the Project Drafts forum. Draft projects are not graded.
Actual projects that need to be graded should be submitted in the actual Project forum.


(6.) Research Skills: Cite your source properly. Use APA, MLA, or Chicago Manual of Style. Indicate the style you used.


(7.) Writing Skills: Write or type the "raw" data entirely.


(8.) Mathematical Skills (For Only Statistics Students):
(a.) Data Presentation: Use an appropriate data presentation tool to present your data. You can use any appropriate tool in Pearson StatCrunch or Microsoft Excel, among others.
Describe your data based on the presentation (skewness, etc.).

Mathematical Skills (For Statistics and Computer Science/Information Technology Students):
(b.) Measures of Center: Calculate the mean, median, mode, and midrange of the data.
Interpret your results with reference to the dataset.

(c.) Measures of Spread: Calculate the range, variance, and standard deviation of the data.
Interpret your results with reference to the dataset.

(d.) Measures of Position: Calculate the five-number summary of the data.
Interpret your results with reference to the dataset.

I shall use my calculators to check your work and your answers.
Please ensure you get the same results as the results on my calculators.


(9.) Programming Skills (For Only Computer Science/Information Technology Students):
(a.) You may use the raw data as an array or an ArrayList or a Vector or as a file as applicable.

Develop a program that computes the:
(b.) Measures of Center: Calculate the mean, median, mode, and midrange of the data.
Interpret your results with reference to the dataset.

(c.) Measures of Spread: Calculate the range, variance, and standard deviation of the data.
Interpret your results with reference to the dataset.

(d.) Measures of Position: Calculate the five-number summary of the data.
Interpret your results with reference to the dataset.

I shall use my calculators to check your work and your answers.
Please ensure you get the same results as the results on my calculators.


(e.) Write comments accordingly.

(f.) Upload all project files (the entire project folder) in the appropriate area in the Blackboard gradebook.

Please NOTE:
For Beginning C++, Beginning VB.NET, Beginning C#, Beginning Java, JavaScript, ASP.NET: you may use Functional Programming and/or Object-oriented Programming.
For Advanced C++, Advanced VB.NET, Advanced C#, Advanced Java: Object-oriented Programming is required.

(g.) Submit a Reflection Journal. Include your challenges, and how you overcame those challenges.
Please review the rubric for the criteria to be assessed.


(10.) All work must be turned in by the final due date to receive credit.
Any work beyond the final due date will not be accepted.
It is highly recommended that you test your calculations and program with my calculators:
Horizontal Data Entry: Descriptive Statistics Calculators
and/or
Vertical Data Entry: Descriptive Statistics Calculators
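
Before comparing a five-number summary against any calculator, it helps to know that quartile conventions differ between tools. The sketch below is only one possible self-check, written in Python. It assumes the "median of each half" convention (the lower quartile is the median of the lower half of the sorted data, and the upper quartile is the median of the upper half); this page does not state which convention the calculators use, so please treat that choice as an assumption. The function name `five_number_summary` is hypothetical, and the data values are the $2019$ shipments used in the Example Guide on this page.

```python
import statistics

# 2019 manufactured housing shipments for the 50 states (Example Guide data).
shipments = [
    1331, 1566, 3890, 553, 810, 314, 143, 342, 2402, 885,
    1406, 298, 238, 261, 860, 1981, 15866, 847, 581, 1291,
    1565, 4360, 657, 1313, 4203, 2180, 2792, 2716, 3478, 4546,
    1828, 1074, 1101, 4871, 4079, 3649, 7819, 1862, 1610, 144,
    394, 635, 190, 26, 100, 596, 345, 128, 89, 14,
]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention.

    Assumption: for an odd sample size the middle value is left out of both
    halves. Other tools may use a different quartile rule.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + (n % 2):]  # skip the middle value when n is odd
    return (
        ordered[0],
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
        ordered[-1],
    )


summary = five_number_summary(shipments)
print(summary)  # (14, 342, 1087.5, 2402, 15866)
```

If your result differs from a calculator only in the quartiles, the quartile convention is the first thing to check.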

Example Guide (Descriptive Statistics of Data)

Name: Your name
Date: The date
Instructor: Samuel Chukwuemeka
Project: Descriptive Statistics of the United States Manufactured Housing Shipments $2019$ Data for the $50$ States
Company/Government: Census.gov
(https://www2.census.gov/programs-surveys/mhs/visualizations/2019/2019usmapbystate.pdf?#)
Objectives: (1.) Present a dataset using a histogram.
(2.) Describe the dataset.
(3.) Determine the measures of central tendency of a dataset.
(4.) Determine the measures of variation of a dataset.
(5.) Determine the measures of location of a dataset.
Variable/Type: Manufactured Housing Shipments / Discrete variable
Year Obtained: $2019$
Citation: Indicate the type of citation format. Cite your source accordingly.

Please Note:
This is part of the Reflection (not the entire Reflection) of the Final Project.
Please review the Final Project Reflection samples provided for you in your course.
The teacher should guide each student to the successful completion of the project.
Let students know you are willing to help.

U.S Manufactured Housing Shipments $2019$ Data for the $50$ States (Raw Data)
$1331$ $1566$ $3890$ $553$ $810$ $314$ $143$ $342$ $2402$ $885$
$1406$ $298$ $238$ $261$ $860$ $1981$ $15866$ $847$ $581$ $1291$
$1565$ $4360$ $657$ $1313$ $4203$ $2180$ $2792$ $2716$ $3478$ $4546$
$1828$ $1074$ $1101$ $4871$ $4079$ $3649$ $7819$ $1862$ $1610$ $144$
$394$ $635$ $190$ $26$ $100$ $596$ $345$ $128$ $89$ $14$


Because we have to find the median and the five-number summary, it is better to sort the data.
Because the data is arranged in five rows and ten columns, I prefer to sort by columns (each column holds the smaller number of values: $5 \lt 10$), because I shall read the data that way.
It is my preference. Please do what you prefer.
The sorted data is:
U.S Manufactured Housing Shipments $2019$ Data for the $50$ States (Sorted Data)
$14$ $143$ $298$ $553$ $810$ $1101$ $1565$ $1981$ $3478$ $4360$
$26$ $144$ $314$ $581$ $847$ $1291$ $1566$ $2180$ $3649$ $4546$
$89$ $190$ $342$ $596$ $860$ $1313$ $1610$ $2402$ $3890$ $4871$
$100$ $238$ $345$ $635$ $885$ $1331$ $1828$ $2716$ $4079$ $7819$
$128$ $261$ $394$ $657$ $1074$ $1406$ $1862$ $2792$ $4203$ $15866$
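
For Computer Science students, the sorting step can be sketched in a few lines. This is a minimal Python illustration, not a required approach; the variable names `raw_shipments` and `sorted_shipments` are my own, and the column-by-column layout simply mirrors the sorted table above.

```python
# Raw 2019 shipment counts, copied from the raw-data table above (50 states).
raw_shipments = [
    1331, 1566, 3890, 553, 810, 314, 143, 342, 2402, 885,
    1406, 298, 238, 261, 860, 1981, 15866, 847, 581, 1291,
    1565, 4360, 657, 1313, 4203, 2180, 2792, 2716, 3478, 4546,
    1828, 1074, 1101, 4871, 4079, 3649, 7819, 1862, 1610, 144,
    394, 635, 190, 26, 100, 596, 345, 128, 89, 14,
]

# sorted() returns a new ascending list and leaves the raw data untouched.
sorted_shipments = sorted(raw_shipments)

# Print the sorted values column by column: 5 rows, 10 columns,
# so reading down each column gives ascending order.
ROWS = 5
for r in range(ROWS):
    print(" ".join(f"{value:>6}" for value in sorted_shipments[r::ROWS]))
```

Keeping the raw list unchanged makes it easy to check the sorted copy against the source data.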

Measures of Center (Mean, Median, Mode, Midrange)


To make it easier,
Draw the table with the data values and tally first
Begin from the first column (not the first row because it is easier to deal with the data values with the way it is arranged in columns)
Place tallies as you read the data values from top to bottom
Then, draw the frequency column and add the frequencies
Check to make sure the sum of the frequencies is $50$

$2019$ U.S Manufactured Housing Shipments $(x)$ Tally Frequency $(f)$ $f * x$
$14$ I $1$ $14$
$26$ I $1$ $26$
$89$ I $1$ $89$
$100$ I $1$ $100$
$128$ I $1$ $128$
$143$ I $1$ $143$
$144$ I $1$ $144$
$190$ I $1$ $190$
$238$ I $1$ $238$
$261$ I $1$ $261$
$298$ I $1$ $298$
$314$ I $1$ $314$
$342$ I $1$ $342$
$345$ I $1$ $345$
$394$ I $1$ $394$
$553$ I $1$ $553$
$581$ I $1$ $581$
$596$ I $1$ $596$
$635$ I $1$ $635$
$657$ I $1$ $657$
$810$ I $1$ $810$
$847$ I $1$ $847$
$860$ I $1$ $860$
$885$ I $1$ $885$
$1074$ I $1$ $1074$
$1101$ I $1$ $1101$
$1291$ I $1$ $1291$
$1313$ I $1$ $1313$
$1331$ I $1$ $1331$
$1406$ I $1$ $1406$
$1565$ I $1$ $1565$
$1566$ I $1$ $1566$
$1610$ I $1$ $1610$
$1828$ I $1$ $1828$
$1862$ I $1$ $1862$
$1981$ I $1$ $1981$
$2180$ I $1$ $2180$
$2402$ I $1$ $2402$
$2716$ I $1$ $2716$
$2792$ I $1$ $2792$
$3478$ I $1$ $3478$
$3649$ I $1$ $3649$
$3890$ I $1$ $3890$
$4079$ I $1$ $4079$
$4203$ I $1$ $4203$
$4360$ I $1$ $4360$
$4546$ I $1$ $4546$
$4871$ I $1$ $4871$
$7819$ I $1$ $7819$
$15866$ I $1$ $15866$
$\Sigma f = 50$ $\Sigma fx = 94229$
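
The two column totals in the table can be checked with a short program. The following Python sketch uses `collections.Counter` to build the frequencies; the names `frequency`, `sum_f`, and `sum_fx` are illustrative choices of mine, not part of the project requirements.

```python
from collections import Counter

# 2019 manufactured housing shipments for the 50 states (raw data above).
shipments = [
    1331, 1566, 3890, 553, 810, 314, 143, 342, 2402, 885,
    1406, 298, 238, 261, 860, 1981, 15866, 847, 581, 1291,
    1565, 4360, 657, 1313, 4203, 2180, 2792, 2716, 3478, 4546,
    1828, 1074, 1101, 4871, 4079, 3649, 7819, 1862, 1610, 144,
    394, 635, 190, 26, 100, 596, 345, 128, 89, 14,
]

# Counter maps each distinct data value x to its frequency f.
frequency = Counter(shipments)

# Print one table row per distinct value: x, f, and f * x.
for x in sorted(frequency):
    f = frequency[x]
    print(f"{x:>6} {f:>3} {f * x:>6}")

sum_f = sum(frequency.values())                      # sum of the frequencies
sum_fx = sum(x * f for x, f in frequency.items())    # sum of f * x

print("Sum of f  =", sum_f)    # 50
print("Sum of fx =", sum_fx)   # 94229
```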

$ \underline{Mean} \\[3ex] \bar{x} = \dfrac{\Sigma fx}{\Sigma f} \\[5ex] \bar{x} = \dfrac{94229}{50} \\[3ex] \bar{x} = 1884.58 \\[3ex] $ The average number of manufactured housing shipments per state in the United States in $2019$ is approximately one thousand, eight hundred and eighty-five shipments.
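
As a quick programmatic check of the mean, here is a minimal Python sketch. Because every frequency is $1$, dividing the plain sum of the values by the number of values gives the same result as $\dfrac{\Sigma fx}{\Sigma f}$. The variable names are my own.

```python
# 2019 manufactured housing shipments for the 50 states (raw data above).
shipments = [
    1331, 1566, 3890, 553, 810, 314, 143, 342, 2402, 885,
    1406, 298, 238, 261, 860, 1981, 15866, 847, 581, 1291,
    1565, 4360, 657, 1313, 4203, 2180, 2792, 2716, 3478, 4546,
    1828, 1074, 1101, 4871, 4079, 3649, 7819, 1862, 1610, 144,
    394, 635, 190, 26, 100, 596, 345, 128, 89, 14,
]

total = sum(shipments)   # same as the sum of f * x when every f is 1
count = len(shipments)   # same as the sum of f

mean = total / count
print(f"Mean = {total} / {count} = {mean:.2f}")  # Mean = 94229 / 50 = 1884.58
```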

$ \underline{Median} \\[3ex] \Sigma f = 50 \\[3ex] An\:\:even\:\:sample\:\:size\:\:means\:\:two\:\:middle\:\:numbers \\[3ex] They\:\:are\:\:the\:\:25th\:\:and\:\:26th\:\:numbers \\[3ex] 25th\:\:data\:\:value = 1074 \\[3ex] 26th\:\:data\:\:value = 1101 \\[3ex] Median = \dfrac{1074 + 1101}{2} = \dfrac{2175}{2} \\[5ex] Median = 1087.5 \\[3ex] $ The median number of manufactured housing shipments per state in the United States in $2019$ is approximately one thousand and eighty-eight shipments.
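
The same median steps can be sketched in Python. The code picks the two middle values by position and then cross-checks the result against `statistics.median` from the standard library; the variable names are illustrative only.

```python
import statistics

# 2019 manufactured housing shipments for the 50 states (raw data above).
shipments = [
    1331, 1566, 3890, 553, 810, 314, 143, 342, 2402, 885,
    1406, 298, 238, 261, 860, 1981, 15866, 847, 581, 1291,
    1565, 4360, 657, 1313, 4203, 2180, 2792, 2716, 3478, 4546,
    1828, 1074, 1101, 4871, 4079, 3649, 7819, 1862, 1610, 144,
    394, 635, 190, 26, 100, 596, 345, 128, 89, 14,
]

ordered = sorted(shipments)
n = len(ordered)

if n % 2 == 0:
    # Even sample size: average the two middle values.
    # Python lists are 0-indexed, so the 25th value is at index 24.
    lower_middle = ordered[n // 2 - 1]
    upper_middle = ordered[n // 2]
    median = (lower_middle + upper_middle) / 2
else:
    # Odd sample size: the single middle value is the median.
    median = ordered[n // 2]

print("Median =", median)  # Median = 1087.5

# Cross-check against the standard library.
assert median == statistics.median(shipments)
```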

$ \underline{Mode} \\[3ex] Data\:\:has\:\:no\:\:mode \\[3ex] \underline{Midrange} \\[3ex] Midrange = \dfrac{min + max}{2} \\[5ex] min = 14...Hawaii \\[3ex] max = 15866...Texas \\[3ex] Midrange = \dfrac{14 + 15866}{2} = \dfrac{15880}{2} \\[5ex] Midrange = 7940 \\[3ex] $ Hawaii had the fewest manufactured housing shipments in $2019$ (what are the possible reasons?): $14$
Texas had the greatest number of manufactured housing shipments in $2019$ (what are the possible reasons?): $15,866$
The midrange number of manufactured housing shipments in the United States in $2019$ is seven thousand, nine hundred and forty shipments.
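
A short Python sketch can confirm both results. It treats the data as having no mode when no value occurs more than once, which is the convention used in this guide; the names `modes` and `midrange` are my own.

```python
from collections import Counter

# 2019 manufactured housing shipments for the 50 states (raw data above).
shipments = [
    1331, 1566, 3890, 553, 810, 314, 143, 342, 2402, 885,
    1406, 298, 238, 261, 860, 1981, 15866, 847, 581, 1291,
    1565, 4360, 657, 1313, 4203, 2180, 2792, 2716, 3478, 4546,
    1828, 1074, 1101, 4871, 4079, 3649, 7819, 1862, 1610, 144,
    394, 635, 190, 26, 100, 596, 345, 128, 89, 14,
]

# Mode: the value(s) with the highest frequency, provided that frequency
# is greater than 1. If every value occurs once, there is no mode.
counts = Counter(shipments)
highest = max(counts.values())
modes = sorted(x for x, f in counts.items() if f == highest) if highest > 1 else []
print("Mode:", modes if modes else "no mode")  # Mode: no mode

# Midrange: the average of the minimum and the maximum.
minimum = min(shipments)
maximum = max(shipments)
midrange = (minimum + maximum) / 2
print("Midrange =", midrange)  # Midrange = 7940.0
```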

Measures of Dispersion (Range, Variance, Standard Deviation)
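
The measures of dispersion are not worked out by hand on this page, so the following Python sketch is only one way to compute them for checking purposes. Whether the $50$ states should be treated as a population (divide by $n$) or as a sample (divide by $n - 1$) is not stated here, so the code reports both and leaves that choice to you. It uses the standard library functions `statistics.pvariance`, `statistics.pstdev`, `statistics.variance`, and `statistics.stdev`.

```python
import statistics

# 2019 manufactured housing shipments for the 50 states (raw data above).
shipments = [
    1331, 1566, 3890, 553, 810, 314, 143, 342, 2402, 885,
    1406, 298, 238, 261, 860, 1981, 15866, 847, 581, 1291,
    1565, 4360, 657, 1313, 4203, 2180, 2792, 2716, 3478, 4546,
    1828, 1074, 1101, 4871, 4079, 3649, 7819, 1862, 1610, 144,
    394, 635, 190, 26, 100, 596, 345, 128, 89, 14,
]

# Range: maximum minus minimum.
data_range = max(shipments) - min(shipments)
print("Range =", data_range)  # Range = 15852

# Population versions divide the sum of squared deviations by n.
pop_var = statistics.pvariance(shipments)
pop_sd = statistics.pstdev(shipments)

# Sample versions divide the sum of squared deviations by n - 1.
sample_var = statistics.variance(shipments)
sample_sd = statistics.stdev(shipments)

print(f"Population variance = {pop_var:.2f}, standard deviation = {pop_sd:.2f}")
print(f"Sample variance     = {sample_var:.2f}, standard deviation = {sample_sd:.2f}")
```

The sample values are always slightly larger than the population values for the same data, so a small mismatch with a calculator often comes down to which version it reports.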




Final Project Samples

First Sample


Second Sample