Student-generated Problem Sets
Problem by Theresa
Ok, you just won the lottery - $20 million - and
are trying to decide where to go on vacation to celebrate (yes, you've
already decided to take a leave from school/work for an extended period
of time).
1. First, compute the average temperature of each of the following places.
2. Eliminate all places with an average temperature higher than 90 degrees or lower than 70 degrees.
3. Recode temperatures into four groups: Low 70's, High 70's, Low 80's, High 80's.
4. Combine the island temperatures and compute their mean. Combine the non-island temperatures and compute their mean.
5. After eyeballing the two means, eliminate the places in the combined group with the higher mean. Sort the remaining places by temperature.
6. Print out the remaining places, with a title stating "Vacation Options".
Hawaii,72,75,78,76,80
Bahamas,90,85,92,82,87
Rome,60,62,68,70,69
Greece,70,69,68,65,60
Paris,60,59,62,66,72
Australia,72,78,77,79,82
Argentina,90,89,92,98,95
Catalina Island,78,77,75,80,80
Cancun,88,85,84,82,85
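Steps 1-3 can be sketched as follows. This is a minimal Python illustration, not the SAS solution the course expects. The band cut points (75, 80, 85) and the names `band`, `averages`, `kept`, and `groups` are assumptions made for the example. Steps 4-5 are left out because they depend on which places you decide to count as islands.

```python
from statistics import mean

RAW = """\
Hawaii,72,75,78,76,80
Bahamas,90,85,92,82,87
Rome,60,62,68,70,69
Greece,70,69,68,65,60
Paris,60,59,62,66,72
Australia,72,78,77,79,82
Argentina,90,89,92,98,95
Catalina Island,78,77,75,80,80
Cancun,88,85,84,82,85
"""

def band(avg):
    """Recode an average in [70, 90] into one of four groups (assumed cut points)."""
    if avg < 75:
        return "Low 70's"
    if avg < 80:
        return "High 70's"
    if avg < 85:
        return "Low 80's"
    return "High 80's"

# Step 1: average temperature per place.
averages = {}
for line in RAW.strip().splitlines():
    name, *temps = line.split(",")
    averages[name] = mean(int(t) for t in temps)

# Step 2: keep only places whose average lies between 70 and 90 degrees.
kept = {name: avg for name, avg in averages.items() if 70 <= avg <= 90}

# Step 3: recode the surviving averages into the four groups.
groups = {name: band(avg) for name, avg in kept.items()}

# List the surviving places from coolest to warmest.
for name, avg in sorted(kept.items(), key=lambda item: item[1]):
    print(f"{name:16s}{avg:6.1f}  {groups[name]}")
```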
Possible Solution
Problem by Kathleen
Here is my example. It isn't cute or funny, but the data is kinda ugly. It is the lab grades from a class. There is a series of grades for experiments 1-13 and a total, then six personal labs, which are added together with a final look at the lab book to give a Portfolio total, then a grade for a lab practical, and then the final grade for the lab part of the course.
There is missing data, incomplete data, etc. Typical grades, where some students never pass anything in.
How would you clean up the data to do some stats on it, and what stats would be appropriate?
LAST 4 ID NAME E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13 TOTAL EXP PL2 PL3 PL4 PL6 PL8 PL9 PORTFOL LAB FINAL TOTAL
3341 Erin 6.5 8 9 10 10 8 9 9 10 10 8.5 10 10 115 9 8.5 8.5 8.5 8.5 8.5 74.5 14 203.5
1680 Tracy 6 7 10 10 10 9 9 10 10 10 8.5 10 10 116.5 9 8.5 9 8.5 8.5 8.5 75 14 205.5
1433 Anthony 7.5 8 9 9 10 10 9 10 10 0 10 0 90.5 9.5 10 9.5 9.5 9.5 10 64 16 170.5
O764 Michael 0 0
6924 Brian 7 7 8 9 9 8 9 8 10 10 8.5 10 10 112 9 9 9 8 8.5 8.5 78 16 206
3275 Katherine 6 7 8 9 9 8 9 8 10 10 8.5 10 10 111 9 9 9 8 8.5 8.5 78 14 203
1475 Sabrina 7.5 7 9 9 9 8 9 9 10 10 8 10 10 113.5 8.5 9 9 9 8.5 8 75 14 202.5
7581 Jeanah 8 9 10 9 9 10 10 10 10 10 10 10 10 123.5 10 9 9.5 10 9.5 10 88 16 227.5
4274 Emma 6.5 7 8 9 10 40 40
7667 Thomas 6.5 9 9 9 9 8 9 8 10 10 9.5 10 10 115.5 9.5 10 0 9.5 9 0 70 15 200.5
6298 Alisa 0 0
Barbara 10 10 10 10 10 10 10 10 79 8.5 10 9.5 10 10 10 96 20 195
total 6.8 8 9 9 9 9 9 8 10 10 7.9 10 9 112.1875 75.313 14.875 86%
Possible Solution
Problem by Willy
There is some evidence that certain types of people respond to survey items in particular patterns regardless of the item content. For example, one might answer in a generally neutral manner or in a generally extreme manner (everything is strongly agree or strongly disagree). With this in mind, your task is to utilize the oras data from earlier this semester and identify the subjects whose pattern is more generally neutral.
Create a new variable which indicates the number of items the person responded to with a "3" (neutral response). Then, for those who responded to more than 15 items with a "3", give them a label of "neutral responder."
Your goal here is just to identify those who are neutral responders. If you want to use this information to then delete people or manipulate the data further, proceed at your own risk and don't blame me for any resulting desires to bang your cranium against your monitor.
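The counting and labeling logic can be sketched in a few lines. This is a Python illustration rather than the SAS the course uses, and the `responses` data below is made up for the example; it is not the oras data, whose layout is not shown here.

```python
# Hypothetical data: one list of item ratings (1-5) per subject.
responses = {
    "s01": [3] * 16 + [4, 2, 5, 1],     # 16 neutral answers
    "s02": [3] * 15 + [1, 5, 5, 1, 2],  # exactly 15 neutral answers
    "s03": [1, 5] * 10,                 # extreme responder, no 3s
}

def count_neutral(items, neutral=3):
    """Number of items answered with the neutral value."""
    return sum(1 for rating in items if rating == neutral)

neutral_counts = {}
labels = {}
for subject, items in responses.items():
    neutral_counts[subject] = count_neutral(items)
    # "More than 15" means strictly greater than 15.
    labels[subject] = "neutral responder" if neutral_counts[subject] > 15 else ""

for subject in responses:
    print(subject, neutral_counts[subject], labels[subject])
```

Note the boundary case: a subject with exactly 15 neutral answers does not get the label.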
Possible Solution
Problem by Joan
This data file is from a survey filled out last summer by some teachers working on a technology project. The variables are:
id
location: 0=Phoenix area, 1=Tucson, 2=Flagstaff, 3=Yuma, 4=Kingman, 5=other
sex: 0=male, 1=female
subject taught: 1=elementary, 2=special, 3=science/math, 4=social studies/language, 5=library/media, 6-8=others
years of teaching experience
whether or not they had email access at the time of the survey
frequency of email use: 0=never ... 4=daily
frequency of Web use: 0=never ... 4=daily
- Find the average number of years of teaching experience for men and women separately.
- Sort participants by geographic location and subject area.
- Did people in the Phoenix area use email and the web more or less than those in outlying areas?
- Did teachers in some subject areas use this technology more than those in other areas?
Here are the first few of the 180 participants:
1,1,1,8,0,0,0
0,0,1,7,1,4,4
0,1,1,8,0,3,1
0,0,3,3,1,1,1
2,1,6,18,1,2,2
2,1,1,0,0,3,3
0,1,1,2,0,0,1
0,1,1,17,0,0,1
1,1,4,14,0,0,1
0,1,1,0,0,3,0
1,1,4,8,0,0,0
0,1,1,32,0,2,2
1,1,2,16,0,1,0
1,1,1,24,1,0,0
3,1,1,19,0,4,4
5,1,1,25,0,3,3
4,1,1,7,1,3,3
5,1,1,3,0,0,1
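The first task (average years of experience by sex) can be sketched as below, using only the first five sample rows. This is a Python illustration, not a SAS solution. One assumption to check against the full file: the sample rows have seven fields, so the sketch treats them as location, sex, subject, years, email access, email use, and Web use, with the id omitted. The column names are made up for the example.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

SAMPLE = """\
1,1,1,8,0,0,0
0,0,1,7,1,4,4
0,1,1,8,0,3,1
0,0,3,3,1,1,1
2,1,6,18,1,2,2
"""

# Assumed column order; the id variable does not appear in the sample rows.
COLUMNS = ["location", "sex", "subject", "years",
           "email_access", "email_use", "web_use"]

years_by_sex = defaultdict(list)
for row in csv.reader(io.StringIO(SAMPLE)):
    record = dict(zip(COLUMNS, map(int, row)))
    years_by_sex[record["sex"]].append(record["years"])

SEX_LABELS = {0: "male", 1: "female"}
mean_years = {SEX_LABELS[sex]: mean(years) for sex, years in years_by_sex.items()}

for label, value in sorted(mean_years.items()):
    print(f"{label:7s}{value:6.2f}")
```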
Possible Solution
Problem by Nancy
You have the birthdays of all incoming freshmen.
1. You want to throw a party, but you only want to invite those students of legal drinking age (21). Calculate the age of each student (today) as of the student's last birthday. Calculate the age of each student on the date of your orientation party last fall (September 1, 1997).
2. You believe that students born on Wednesdays are more likely to attend graduate school (because you remember the poem "Wednesday's child is full of woe..."). In anticipation of one day (not today) performing this fascinating study, you want to begin by determining on what day of the week each student was born.
Write a program to create a new file that includes each student's current age, age at the fall party, and the day of the week on which the student was born.
Sample data:
Jane, 1-1-77
Bill, 2-6-75
Tim, 5-5-80
Maria, 2-3-79
Portia, 12-7-74
Buddy, 12-27-81
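The date arithmetic can be sketched as follows. This is a Python illustration of the logic, not the SAS program the problem asks for. It assumes the two-digit years all mean 19xx, which holds for this sample because `%y` maps 69-99 to 1969-1999. The function names are made up for the example.

```python
from datetime import date, datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
PARTY = date(1997, 9, 1)  # orientation party date from the problem

def parse_birthday(text):
    """Parse a month-day-year string such as '1-1-77' (assumed to be 19xx)."""
    return datetime.strptime(text.strip(), "%m-%d-%y").date()

def age_on(born, when):
    """Age in whole years as of the last birthday on or before `when`."""
    had_birthday = (when.month, when.day) >= (born.month, born.day)
    return when.year - born.year - (0 if had_birthday else 1)

def weekday_name(day):
    """Name of the day of the week for a date."""
    return DAY_NAMES[day.weekday()]

SAMPLE = ["Jane, 1-1-77", "Bill, 2-6-75", "Tim, 5-5-80",
          "Maria, 2-3-79", "Portia, 12-7-74", "Buddy, 12-27-81"]

rows = []
for line in SAMPLE:
    name, birthday = line.split(",")
    born = parse_birthday(birthday)
    rows.append((name, age_on(born, date.today()),
                 age_on(born, PARTY), weekday_name(born)))

for name, age_now, age_party, weekday in rows:
    print(f"{name:8s}{age_now:4d}{age_party:4d}  {weekday}")
```

Comparing (month, day) tuples avoids the off-by-one error that comes from dividing a day count by 365.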
Possible Solution
Problem by Jessie
Here is a small dataset from a class I teach.
The variables are: name gender age frstgen Chicano gpa mce pass absncs act
Heather,f,26,n,n,3.65,y,y,5,27
Lisa,f,19,y,y,3.2,y,y,3,24
Anthony,28,y,y,1.4,y,y,4,19
Marcia,18,y,y,1.5,y,y,5,18
Jose,28,n,y,0.0,n,y,,13
Tom,30,n,y,3.9,y,n,3,30
Jill,23,y,y,n,3.2,y,y,2,29
Jim,22,y,y,y,2.3,y,n,1,24
Eli,f,18,y,y,2.l,y,y,3,29
Possible Solution
Problem by Chuck
Tired of dealing with all this unrealistic data like test scores, attitudes, and SES variables? I am providing indescribably important data with far-reaching implications. It comes from CBS Sportsline, and contains 1998 NBA by-team offensive statistics for teams of the Pacific Division.
Oh this is real, Baby! So, get your beer, unbutton that top button on your pants, sprawl back in your Lazy-Boy (yeah, the one in front of your computer), and let's bust some data... The small data set follows, along with some variable names and labels:
LA Lakers 75 2864-5976 457-1303 1717-2511 105.4
LA Lakers 969 2241 3210 1830 686 1150 503
Seattle 75 2792-5882 566-1411 1427-1963 101.0
Seattle 844 2047 2891 1828 726 1068 356
Phoenix 74 2814-6087 376-1088 1291-1731 98.6
Phoenix 922 2215 3137 1881 684 1119 383
LA Clippers 75 2686-6120 483-1338 1371-1903 96.3
LA Clippers 962 2082 3044 1384 574 1216 413
Portland 74 2595-5755 292-947 1505-2054 94.4
Portland 984 2273 3257 1576 514 1263 422
Sacramento 75 2706-6112 258-736 1345-1944 93.5
Sacramento 999 2094 3093 1689 547 1143 388
Golden State 75 2588-6321 169-639 1231-1741 87.7
Golden State 1198 2266 3464 1532 566 1266 428
OFFENSIVE STATISTICS (containing 2 records per team).
The first line consists of the variables:
TEAM      TEAMS OF THE PACIFIC DIVISION
GAMES     NUMBER OF GAMES PLAYED
FGM-FGA   FIELD GOALS MADE - FIELD GOALS ATTEMPTED
TPM-TPA   3 POINT FIELD GOALS MADE - 3 POINT FIELD GOALS ATTEMPTED
FTM-FTA   FREE THROWS MADE - FREE THROWS ATTEMPTED
GAMEAVG   TEAM AVERAGE POINTS SCORED PER GAME
The second line contains the following variables:
TEAM      TEAMS OF THE PACIFIC DIVISION
OFF       OFFENSIVE REBOUNDS
DEF       DEFENSIVE REBOUNDS
TRB       TOTAL REBOUNDS
AST       ASSISTS
STL       STEALS
TO        TURNOVERS
BLKS      BLOCKS
Now, here's the question:
1. Bring the data set into SAS and input the variables. You can look at the structure above to see a couple of problems ('-' within variables, 2 records, etc.).
2. Sort the data by team.
3. Separate fgm-fga, tpm-tpa, and ftm-fta each into 2 variables. For example, fgm-fga will become 2 variables, 'fgm' and 'fga', and so on. The new variables must be numeric (omitting the '-').
4. Get the average number of total rebounds, offensive rebounds, and defensive rebounds per game for each team. New variables can be created for this. Round the output to 2 decimal places.
5. Now get the percentage made for field goals (fgm), 3 point shots (tpm), and free throws (ftm). Multiply the decimal by 100 so the outputted number is expressed as a percentage. Round each to 2 decimals.
6. Finally, create a new variable that labels 'Phoenix' as "greatness", while all other teams carry the label 'they suck!'.
7. Print the data set showing all changes and include an appropriate title.
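The parsing problems named in step 1 (two records per team, team names containing spaces, and '-' inside a field) can be sketched like this. It is a Python illustration of the logic, not the SAS solution, and it uses only two of the seven teams. The names `split_pair`, `parse_team`, and the dictionary keys are made up for the example.

```python
RECORDS = [
    "Phoenix 74 2814-6087 376-1088 1291-1731 98.6",
    "Phoenix 922 2215 3137 1881 684 1119 383",
    "LA Lakers 75 2864-5976 457-1303 1717-2511 105.4",
    "LA Lakers 969 2241 3210 1830 686 1150 503",
]

def split_pair(text):
    """Turn 'made-attempted' into two integers."""
    made, attempted = text.split("-")
    return int(made), int(attempted)

def pct(made, attempted):
    """Percentage made, rounded to 2 decimals."""
    return round(100 * made / attempted, 2)

def parse_team(line1, line2):
    # Split from the right so multi-word team names stay in one piece.
    team, games, fg, tp, ft, gameavg = line1.rsplit(None, 5)
    _, off, dfn, trb, ast, stl, to, blks = line2.rsplit(None, 7)
    games = int(games)
    fgm, fga = split_pair(fg)
    tpm, tpa = split_pair(tp)
    ftm, fta = split_pair(ft)
    return {
        "team": team, "games": games,
        "fgm": fgm, "fga": fga, "tpm": tpm, "tpa": tpa, "ftm": ftm, "fta": fta,
        "off_pg": round(int(off) / games, 2),
        "def_pg": round(int(dfn) / games, 2),
        "trb_pg": round(int(trb) / games, 2),
        "fg_pct": pct(fgm, fga), "tp_pct": pct(tpm, tpa), "ft_pct": pct(ftm, fta),
        "label": "greatness" if team == "Phoenix" else "they suck!",
    }

# Pair up consecutive lines: record 1 and record 2 of each team.
teams = {}
for first, second in zip(RECORDS[0::2], RECORDS[1::2]):
    row = parse_team(first, second)
    teams[row["team"]] = row

for name in sorted(teams):  # step 2: sort by team
    row = teams[name]
    print(name, row["fg_pct"], row["trb_pg"], row["label"])
```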
Possible Solution
Problem by Rodney
Here is a "real life" problem I encountered while completing a class project for a statistics course I took last semester. I will spare you the gory and boring details with respect to the purpose of the consulting project I did for the class and get right to the point.
The project used historical data contained in the ASU data warehouse. I wanted high school math information about students enrolled in introductory (MAT 106) and intermediate algebra (MAT 117) courses at ASU. An SQL (Structured Query Language) script was written to create the data set. This script produced a data set containing 9 variables (see below for variable description).
The data was messy and needed additional work before it could be used. Several "repairs" were necessary. Specifically, student data pertaining to high school academic performance was reported across multiple records. For example, a student enrolled in MAT 106 may have taken three math courses while in high school; thus there would be three lines of information in the data set (one line for each high school math course taken) for this one student.
For the purposes of analysis, each student needed a single value representing the student's high school math GPA and a binary flag (coded either 0 or 1) for any high school math course that meets ASU's undergraduate entrance requirements.
Below is a list of the things needed to make the data usable.
/****************Cleaning the Data***************/
1) Delete all records containing an Asu Converted
grade of "I", "W","X", or "P".
2) Delete all records containing Course Id of
"MAT 114" (recall, we want only students taking MAT 106 and MAT 117).
/*************Construct the GPA*****************/
1) Create a numeric variable from each ASU Converted Grade as follows:
A,A+,A- = 4.0; B+,B,B- = 3; C+,C,C- = 2 . . . F,E,U = 0.
2) For each record, multiply the numeric ASU converted grade (created above) by the class hours. By doing this, each grade becomes weighted by class hours; thus a math course worth 1 class hour has greater weight than a math course worth only .5 class hours.
3) For each student (using the unique record identifier Affiliate Id), add the weighted scores for each math course together (created in step 2, above) and divide by the sum of the class hours. This value represents the overall high school math GPA for the student.
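The cleaning and GPA steps amount to an hours-weighted mean per student. The sketch below shows that calculation in Python rather than SAS. The records are hypothetical, not taken from the real file, and the D grades mapped to 1.0 are an assumption, since the list above elides them.

```python
from collections import defaultdict

GRADE_POINTS = {
    "A+": 4.0, "A": 4.0, "A-": 4.0,
    "B+": 3.0, "B": 3.0, "B-": 3.0,
    "C+": 2.0, "C": 2.0, "C-": 2.0,
    "D+": 1.0, "D": 1.0, "D-": 1.0,  # assumed: the source elides the D grades
    "F": 0.0, "E": 0.0, "U": 0.0,
}
DROP_GRADES = {"I", "W", "X", "P"}  # records deleted during cleaning
DROP_COURSES = {"MAT 114"}

# Hypothetical records: (affiliate_id, course_id, class_hours, asu_converted_grade)
records = [
    ("1001", "MAT 106", 1.0, "B+"),
    ("1001", "MAT 106", 0.5, "A"),
    ("1001", "MAT 106", 1.0, "W"),   # dropped: withdrawn
    ("1002", "MAT 117", 1.0, "C"),
    ("1003", "MAT 114", 1.0, "A"),   # dropped: wrong course
]

def math_gpa(rows):
    """Hours-weighted high school math GPA per Affiliate Id."""
    points = defaultdict(float)
    hours = defaultdict(float)
    for student, course, class_hours, grade in rows:
        if grade in DROP_GRADES or course in DROP_COURSES:
            continue
        points[student] += GRADE_POINTS[grade] * class_hours
        hours[student] += class_hours
    return {s: points[s] / hours[s] for s in points if hours[s] > 0}

gpa = math_gpa(records)
for student, value in sorted(gpa.items()):
    print(student, round(value, 2))
```

For student 1001 this gives (3 x 1.0 + 4 x 0.5) / 1.5, or about 3.33; the withdrawn course contributes neither points nor hours.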
/**************Make a Binary Variable************/
Create a binary flag variable where any math course
with an "M" for the variable Class Competency is coded as 1 and anything
else is coded as 0.
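A sketch of the flag in Python (not SAS), on hypothetical rows. Because the final file holds one record per student, the sketch also collapses the per-course flags by taking the maximum, so a student gets 1 if any course has an "M". That collapsing rule is an assumption; the problem does not say how to combine the flags.

```python
# Hypothetical (affiliate_id, class_competency) pairs, one per course record.
rows = [
    ("1001", "M"),
    ("1001", ""),
    ("1002", "N"),
    ("1002", ""),
]

def competency_flag(code):
    """1 if the course record is marked 'M', otherwise 0."""
    return 1 if code == "M" else 0

# Assumed collapsing rule: a student is flagged if any course is flagged.
student_flag = {}
for student, code in rows:
    student_flag[student] = max(student_flag.get(student, 0),
                                competency_flag(code))

print(student_flag)
```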
/****************Make a New Dataset***************/
Finally, output the data to an external file so that you have one record per student containing the variables Semester, Affiliate ID, Course ID, Schedule Line number, newly created high school GPA, and the binary flag for Class Competency.
For those who want to take a look at the data set, you can get a copy of the file from my AFS Instructor volume by using the following UNIX command:
cp afs/asu.edu/windows/iv/pos598/data/19968hs.csv 19968.csv
Semester, semester student taking course at ASU
Affiliate Id, student id number
Course Id, math course number
Schedule Line Number
Class Title, high school course name
Class Hours, credit hours earned for high school math class
Class Grade, grade earned for high school class
Asu Converted Grade, ASU conversion of high school grade
Class Competency Met Code, high school math course meets ASU requirements.
Possible Solution
Problem by Debbie
The data set below represents the applicant pool
for Faber University's 1999 Freshman class.
1,3.5,2.9,1030,21,2.0,3.7,1
2,2.4,2.8,1200,29,2.0,3.0,1
3,3.0,1.9,880,15,1.0,2.0,4
4,2.9,3.0,990,17,2.0,3.0,2
5,2.7,2.9,990,17,2.5,2.8,4
6,2.9,2.9,.,17,3.0,2.8,3
7,2.8,2.9,990,16,2.6,2.4,2
8,2.7,2.8,990,17,2.0,3.0,4
9,2.8,2.8,900,18,2.7,3.1,2
10,2.4,2.6,750,18,2.9,2.0,4
11,2.9,2.9,1000,17,2.0,3.2,3
12,2.4,2.0,1050,18,2.8,2.5,4
13,2.0,2.0,1100,19,2.0,2.0,2
14,2.7,2.8,1000,17,2.0,3.5,3
15,2.9,2.9,1030,19,2.0,3.7,3
Variables are, respectively: (a) ID; (b) overall gpa; (c) academic gpa; (d) Combined SAT score; (e) Composite ACT score; (f) math gpa; (g) English gpa; and (h) admission status variable, wherein 1=admit, 2=case review, 3=wait list, and 4=deny.
Write a program to:
- Calculate average overall gpa, academic gpa, Combined SAT scores, Composite ACT scores, math gpa, and English gpa for the applicant pool first, then by admission status variable.
- 15% of the applicant pool can be admitted without meeting admission criteria. Create a formula to select the 15% from the not admitted pool, including deny. Create a new admission variable called SpecAdm.
- Repeat the second part of bullet one (calculate the variable averages) by admission status variable and the new admission variable.
- Use Proc print to print out the list of Special Admits sorted by academic gpa.
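The first bullet (averages overall and by admission status) can be sketched as below. This is a Python illustration, not the SAS program the problem asks for. The "." in applicant 6's SAT field is treated as a missing value and skipped, as SAS would do. The field names are made up for the example, and the SpecAdm selection is left out because the problem leaves the selection formula up to you.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

DATA = """\
1,3.5,2.9,1030,21,2.0,3.7,1
2,2.4,2.8,1200,29,2.0,3.0,1
3,3.0,1.9,880,15,1.0,2.0,4
4,2.9,3.0,990,17,2.0,3.0,2
5,2.7,2.9,990,17,2.5,2.8,4
6,2.9,2.9,.,17,3.0,2.8,3
7,2.8,2.9,990,16,2.6,2.4,2
8,2.7,2.8,990,17,2.0,3.0,4
9,2.8,2.8,900,18,2.7,3.1,2
10,2.4,2.6,750,18,2.9,2.0,4
11,2.9,2.9,1000,17,2.0,3.2,3
12,2.4,2.0,1050,18,2.8,2.5,4
13,2.0,2.0,1100,19,2.0,2.0,2
14,2.7,2.8,1000,17,2.0,3.5,3
15,2.9,2.9,1030,19,2.0,3.7,3
"""

FIELDS = ["id", "gpa", "acad_gpa", "sat", "act", "math_gpa", "eng_gpa", "status"]

def to_number(text):
    """Convert a field to float; '.' marks a missing value."""
    return None if text == "." else float(text)

applicants = [dict(zip(FIELDS, map(to_number, row)))
              for row in csv.reader(io.StringIO(DATA))]

def overall_mean(field):
    """Mean of a field over the whole pool, skipping missing values."""
    return mean(a[field] for a in applicants if a[field] is not None)

def mean_by_status(field):
    """Mean of a field within each admission status, skipping missing values."""
    groups = defaultdict(list)
    for a in applicants:
        if a[field] is not None:
            groups[int(a["status"])].append(a[field])
    return {status: mean(values) for status, values in sorted(groups.items())}

for field in FIELDS[1:-1]:
    print(field, round(overall_mean(field), 2), mean_by_status(field))
```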
Possible Solution