Student-generated Problem Sets

Problem by Theresa

Ok, you just won the lottery - $20 million - and are trying to decide where to go on vacation to celebrate (yes, you've already decided to take a leave from school/work for an extended period of time).

1. First compute the average temperature of the
following places.

2. Eliminate all places higher than 90 degrees and lower
than 70 degrees.

3. Recode temperatures into four groups: Low 70's, High
     70's, Low 80's, High 80's

4. Combine the island temperatures. Compute average
      mean. Compute non-island temperatures. Compute
      average mean.

5. After eyeballing the two average means, eliminate the
     places in the combined group with the highest
    average \mean. Sort remaining places by temperature.

6. Print out remaining places, with a title stating
"Vacation Options"

Hawaii,72,75,78,76,80

Bahamas,90,85,92,82,87

Rome,60,62,68,70,69

Greece,70,69,68,65,60

Paris,60,59,62,66,72

Australia,72,78,77,79,82

Argentina,90,89,92,98,95

Catalina Island,78,77,75,80,80

Cancun,88,85,84,82,85

Possible Solution

Problem by Kathleen

Here is my example. It isn't cute or funny but the data is
kinda ugly. It is the lab grades from a class. There are a
series of grades which are experiments 1-13 and a total
then six personel labs which are added together with
final look at the lab book as a Portfolio total then
a grade for a lab practical and then the final grade for lab
part of the course. There ia missing data, incomplete
data, etc,. Typical grades where some students never
pass anything in.

How would you clean up the data to do some stats on it
and what stats would be appropriate?

LAST 4 ID       NAME    E1      E2      E3      E4      E5      E6      E7
E8      E9      E10     E11     E12     E13     TOTAL EXP       PL2
PL3     PL4     PL6     PL8     PL9     PORTFOL LAB FINAL       TOTAL
3341    Erin    6.5     8       9       10      10      8       9       9
10      10      8.5     10      10      115     9       8.5     8.5
8.5     8.5     8.5     74.5    14      203.5
1680    Tracy   6       7       10      10      10      9       9       10
10      10      8.5     10      10      116.5   9       8.5     9
8.5     8.5     8.5     75      14      205.5
1433    Anthony 7.5     8       9       9       10      10      9
10      10      0       10      0       90.5    9.5     10      9.5
9.5     9.5     10      64      16      170.5
O764    Michael
0                                                                       0
6924    Brian   7       7       8       9       9       8       9       8
10      10      8.5     10      10      112     9       9       9       8
8.5     8.5     78      16      206
3275    Katherine       6       7       8       9       9       8       9
8       10      10      8.5     10      10      111     9       9       9
8       8.5     8.5     78      14      203
1475    Sabrina 7.5     7       9       9       9       8       9       9
10      10      8       10      10      113.5   8.5     9       9       9
8.5     8       75      14      202.5
7581    Jeanah 8       9       10      9       9       10      10      10
10      10      10      10      10      123.5   10      9       9.5     10
9.5     10      88      16      227.5
4274    Emma    6.5     7       8       9       10
40                                                                      40
7667    Thomas 6.5     9       9       9       9       8       9       8
10      10      9.5     10      10      115.5   9.5     10      0
9.5     9       0       70      15      200.5
6298    Alisa
0                                                                       0
        Barbara                                         10      10      10
10      10      10      10      10      79      8.5     10      9.5     10
10      10      96      20      195
total           6.8     8       9       9       9       9       9       8
10      10      7.9     10      9       112.1875
75.313 14.875
86%

Possible Solution

Problem by Willy

There is some evidence that certain types of people
respond to survey items in particular patterns regardless
of the item content. For example one might answer in a
generally neutral manner or in a generally extreme
manner (everything is strongly agree or strongly
disagree). With this in mind, your task is to utilize the oras data from earlier this semester and identify the subjects who's pattern is more generally neutral.

Create a new variable which indicates the number of items the person responded to with a "3" (neutral response).

Then, for those who responded to more than 15 items with a "3" give them a label of "neutral responder."
Your goal here is just to identify those who are neutral responders, if you want to use this information to then delete people or manipulate the data further, proceed at your own risk and don't blame me for any resulting desires to bang your cranium against your monitor.

Possible Solution

Problem by Joan

This data file is from a survey filled out last summer by
some teachers working on a technology project. The
variables are: id location: 0=Phoenix area, 1=Tucson,
2=Flagstaff, 3=Yuma, 4=Kingman, 5=other sex: 0=male,
1=female subject taught:1=elementary, 2=special,
3=science/math, 4=social studies/language,
5=library/media, 6-8=others years of teaching experience
whether or not they had email access at the time of the
survey frequency of email use: 0=never ... 4=daily
frequency of Web use: 0=never ... 4=daily

Find the average number of years of teaching experience for men and women separately.
Sort participants by geographic and subject area.
Did people in the Phoenix area use email and the web more or less than those in outlying areas?
Did teachers in some subject areas use this technology more than those in other areas?

Here are the first few of the 180 participants:

1,1,1,8,0,0,0

0,0,1,7,1,4,4

0,1,1,8,0,3,1

0,0,3,3,1,1,1

2,1,6,18,1,2,2

2,1,1,0,0,3,3

0,1,1,2,0,0,1

0,1,1,17,0,0,1

1,1,4,14,0,0,1

0,1,1,0,0,3,0

1,1,4,8,0,0,0

0,1,1,32,0,2,2

1,1,2,16,0,1,0

1,1,1,24,1,0,0

3,1,1,19,0,4,4

5,1,1,25,0,3,3

4,1,1,7,1,3,3

5,1,1,3,0,0,1

Possible Solution

Problem by Nancy

You have the birthdays of all incoming freshmen.

1. You want to throw a party but you only want to invite
     those students of legal drinking age (21). Calculate
     the age of each student (today) as of the student's last
     birthday. Calculate the age of each student on the
     date of your orientation party last fall (September 1,
     1997).

2. You believe that students born on Wednesdays are
     more likely to attend graduate school (because you
     remember the poem "Wednesday's child is full of
     woe..."). In anticipation of one day (not today)
     performing this fascinating study, you want to begin
     by determining on what day of the week each student
     was born.

Write a program to create a new file that includes each student's current age, age at the fall party and the day of
the week on which the student was born.
Sample data:

Jane, 1-1-77

Bill, 2-6-75

Tim, 5-5-80

Maria, 2-3-79

Portia, 12-7-74

Buddy, 12-27-81

Possible Solution

Problem by Jessie

Here is a small dataset from a class I teach. The variables are name gender age frstgen Chicano gpa mce pass
absncs act

Heather,f,26,n,n,3.65,y,y,5,27

Lisa,f,19,y,y,3.2,y,y,3,24

Anthony,28,y,y,1.4,y,y,4,19

Marcia,18,y,y,1.5,y,y,5,18

Jose,28,n,y,0.0,n,y,,13

Tom,30,n,y,3.9,y,n,3,30

Jill,23,y,y,n,3.2,y,y,2,29

Jim,22,y,y,y,2.3,y,n,1,24

Eli,f,18,y,y,2.l,y,y,3,29

Possible Solution

Problem by Chuck

Tired of dealing with all this unrealistic data like test scores, attitudes, and SES variables? I am providing
indescribably important data with far-reaching
implications. It comes from CBS Sportsline, and contains
1998 NBA by-team offensive statistics for teams of the
Pacific Division. Oh this is real, Baby! So, get your beer,
unbutton that top button on your pants, scrawl back in
your Lazy-Boy (yeah, the one in front of your computer),
and let's bust some data...The small data set follows,
along with some variable names and labels:

LA Lakers     75 2864-5976 457-1303 1717-2511 105.4
LA Lakers       969 2241 3210   1830   686   1150   503
Seattle       75 2792-5882 566-1411 1427-1963 101.0
Seattle         844 2047 2891   1828   726   1068   356
Phoenix       74 2814-6087 376-1088 1291-1731 98.6
Phoenix         922 2215 3137   1881   684   1119   383
LA Clippers   75 2686-6120 483-1338 1371-1903 96.3
LA Clippers     962 2082 3044   1384   574   1216   413
Portland      74 2595-5755 292-947 1505-2054 94.4
Portland        984 2273 3257   1576   514   1263   422
Sacramento    75 2706-6112 258-736 1345-1944 93.5
Sacramento      999 2094 3093   1689   547   1143   388
Golden State 75 2588-6321 169-639 1231-1741 87.7
Golden State   1198 2266 3464   1532   566   1266   428

OFFENSIVE STATISTICS (containing 2 records per team).

The first line consists of the variables:

TEAM                   TEAMS OF THE PACIFIC DIVISION
GAMES                 NUMBER OF GAMES PLAYED
FGM-FGA             FIELD GOALS MADE - FIELD GOALS ATTEMPTED
TPM-TPA              3 POINT FIELD GOALS MADE - 3 PONT FIELD GOALS
ATTEMPTED
FTM-FTA              FREE THROWS MADE - FREE THROWS ATTEMPTED
GAMEAVG           TEAM AVERAGE POINTS SCORED PER GAME

The second line contains the following variables:

TEAM                   TEAMS OF THE PACIFIC DIVISION
OFF                       OFFESNSIVE REBOUNDS
DEF                       DEFENSIVE REBOUNDS
TRB                       TOTAL REBOUNDS
AST                       ASSISTS
STL                        STEALS
TO                         TURNOVERS
BLKS                    BLOCKS

Now, here's the question:

1. Bring the data set into SAS and input the variables.
     You can look at the structure above to see a couple of
     problems ('-' within variables, 2 records, etc.).

2. Sort the data by team.

3. Separate fgm-fga, tpm-tpa, and ftm-fta each into 2
     variables. For example, fgm-fga will become 2
     variables, 'fgm' and 'fga', and so on. The new variables
    must be numeric (omitting the '-').

4. Get the average number of total rebounds, offensive
     rebounds, and defensive rebounds per game for each
     team. New variables can be created for this. Round
    the output to 2 decimal places.

5. Now get the percentage made for field goals (fgm), 3
     point shots (tpm), and free throws (ftm). Multiply the
    decimal by 100 so the outputted number is expressed
    in a percentage. Round each to 2 decimals.

6. Finally, create a new variable that labels 'Phoenix' as
     "greatness", while all other teams carry the label 'they
     suck!'.

7. Print the data set showing all changes and include an
appropriate title.

LA Lakers 75 2864-5976 457-1303 1717-2511 105.4 LA Lakers 969 2241 3210 1830 686 1150 503 Seattle 75 2792-5882 566-1411 1427-1963 101.0 Seattle 844 2047 2891 1828 726 1068 356 Phoenix 74 2814-6087 376-1088 1291-1731 98.6 Phoenix 922 2215 3137 1881 684 1119 383 LA Clippers 75 2686-6120 483-1338 1371-1903 96.3 LA Clippers 962 2082 3044 1384 574 1216 413 Portland 74 2595-5755 292-947 1505-2054 94.4 Portland 984 2273 3257 1576 514 1263 422 Sacramento 75 2706-6112 258-736 1345-1944 93.5 Sacramento 999 2094 3093 1689 547 1143 388 Golden State 75 2588-6321 169-639 1231-1741 87.7 Golden State 1198 2266 3464 1532 566 1266 428 Possible Solution Problem by Rodney Here is a "real life" problem I encountered while completing a class project for a statistics course I took last semester. I will spare you the gory and boring details with respect to the purpose of the consulting project I did for the class and get right to the point.

The project used historical data contained in the ASU data warehouse---I wanted high school math information about students enrolled in introductory (MAT 106) and intermediate algebra (MAT 117) courses at ASU. An SQL (Structured Query Language) script was written to create the data set. This script produced a data set containing 9 variables (see below for variable description).

The data was messy and needed additional work before it could be used. Several "repairs" were necessary. Specifically, student data pertaining to high school academic performance was reported across multiple records. For example, a student enrolled in MAT 106 may have taken three math courses while in high school; thus would be three lines of information in the data set (one line for each high school math course taken) for this one student.

For the purposes of analysis, a single grade representing the students' for high school math GPA and a binary flag (coded either 0 or 1) for any high school math course that meets ASU's undergraduate entrance was needed for each student.

Below is a list of the things needed to make the data usable.

/****************Cleaning the Data***************/

1) Delete all records containing an Asu Converted grade of "I", "W","X", or "P".

2) Delete all records containing Course Id of "MAT 114" (recall, we want only students taking MAT 106 and MAT 117).

/*************Construct the GPA*****************/
1) Create an numeric variable where each ASU Converted Grade as follows:
A,A+,A- = 4.0; B+,B,B- = 3; C+,C,C- = 2 . . . F,E,U = 0.

2) For each record multiply the numeric ASU converted grade (created above) by the class hours (by doing this each grade becomes weighted based on class--thus a math course worth 1 class hour has greater weight than a math course worg only .5 class hours).

3)For each student (using the unique record identifier Affiliate Id) add the weighted scores for each math course together (created in step 2, above) and divide by the sum of the class hours---this value represents the overall high school math GPA for the student.

/**************Make a Binary Variable************/

Create a binary flag variable where any math course with an "M" for the variable Class Competency is coded as 1 and anything else is coded as 0.

/****************Make a New Dataset***************/

Finally, output the data to an external file so that you have one record per student containing the variables Semester, Affiliate ID, Course ID, Schedule Line number, newly created highschool GPA, and the binary flag for Class Competency

For those who want to take a look at the data set you can get a copy of the file from my AFS Instructor volume by using the following UNIX command:

cp afs/asu.edu/windows/iv/pos598/data/19968hs.csv 19968.csv

Semester, semester student taking course at ASU
Affiliate Id, student id number
Course Id, math course number
Schedule Line Number
Class Title, high school course name
Class Hours, credit hours earned for high school math class
Class Grade, grade earned for high school class
Asu Converted Grade, ASU conversion of high school grade
Class Competency Met Code, high school math course meets ASU requirements.

Possible Solution Problem by Debbie

The data set below represents the applicant pool for Faber University's 1999 Freshman class.

1,3.5,2.9,1030,21,2.0,3.7,1
2,2.4,2.8,1200,29,2.0,3.0,1
3,3.0,1.9,880,15,1.0,2.0,4
4,2.9,3.0,990,17,2.0,3.0,2
5,2.7,2.9,990,17,2.5,2.8,4
6,2.9,2.9,.,17,3.0,2.8,3
7,2.8,2.9,990,16,2.6,2.4,2
8,2.7,2.8,990,17,2.0,3.0,4
9,2.8,2.8,900,18,2.7,3.1,2
10,2.4,2.6,750,18,2.9,2.0,4
11,2.9,2.9,1000,17,2.0,3.2,3
12,2.4,2.0,1050,18,2.8,2.5,4
13,2.0,2.0,1100,19,2.0,2.0,2
14,2.7,2.8,1000,17,2.0,3.5,3
15,2.9,2.9,1030,19,2.0,3.7,3

Variables are respectively (a) ID; (b) overall gpa; (c) academic gpa; (d) Combined SAT score; (e)Composite ACT score; (f) math gpa; (g) English gpa; and (h) admission status variable wherein1=admit, 2= case review, 3=wait list, and 4=deny.

Write a program to:

Calculate average overall gpa,academic gpa, Combined SAT scores, Composite ACT scores, math gpa, and English gpa for applicant pool first, then by admission status variable.
15% of the applicant pool can be admitted without meeting admission criteria. Create a formula to select the 15% from the not admitted pool including deny. Create a new admission variable called SpecAdm.
Repeat the second part of bullet one (Calculate the variable averages by admission status variable and the new admission variable.)
Proc print to print out the list of Special Admits sorted by academic gpa.

Possible Solution