Student-generated Problem Sets 

Problem by Theresa 

Ok, you just won the lottery - $20 million - and are trying to decide where to go on vacation to celebrate (yes, you've already decided to take a leave from school/work for an extended period of time).  

1.  First compute the average temperature of the 
     following  places. 

2.  Eliminate all places higher than 90 degrees and lower  
     than 70 degrees. 

3.  Recode temperatures into four groups:  Low 70's, High  
     70's, Low 80's,  High 80's 
  
4.  Combine the island temperatures.  Compute average 
      mean.  Compute  non-island temperatures.  Compute  
      average mean.  

5.  After eyeballing the two average means, eliminate the  
     places in the combined group with the highest 
    average \mean.  Sort remaining places by  temperature. 

6.  Print out remaining places, with a title stating  
     "Vacation Options" 

    Hawaii,72,75,78,76,80 
    Bahamas,90,85,92,82,87 
    Rome,60,62,68,70,69 
    Greece,70,69,68,65,60 
    Paris,60,59,62,66,72 
    Australia,72,78,77,79,82 
    Argentina,90,89,92,98,95 
    Catalina Island,78,77,75,80,80 
    Cancun,88,85,84,82,85 
 
Possible Solution 

Problem by Kathleen 

Here is my example.  It isn't cute or funny but the data is 
kinda ugly. It is the lab grades from a class.  There are a  
series of grades which are experiments 1-13 and a total  
then six personel labs which are added together with  
final look at the lab book as a Portfolio total then  
a grade for a lab practical and then the final grade for lab  
part of the course.  There ia missing data, incomplete  
data, etc,. Typical grades where some students never 
pass  anything in. 

How would you clean up the data to do some stats on it  
and what stats would be appropriate? 

LAST 4 ID       NAME    E1      E2      E3      E4      E5      E6      E7 
E8      E9      E10     E11     E12     E13     TOTAL EXP       PL2 
PL3     PL4     PL6     PL8     PL9     PORTFOL LAB FINAL       TOTAL 
3341    Erin    6.5     8       9       10      10      8       9       9 
10      10      8.5     10      10      115     9       8.5     8.5 
8.5     8.5     8.5     74.5    14      203.5 
1680    Tracy   6       7       10      10      10      9       9       10 
10      10      8.5     10      10      116.5   9       8.5     9 
8.5     8.5     8.5     75      14      205.5 
1433    Anthony 7.5     8       9       9       10      10      9 
10      10      0       10      0       90.5    9.5     10      9.5 
9.5     9.5     10      64      16      170.5 
O764    Michael 
0                                                                       0 
6924    Brian   7       7       8       9       9       8       9       8 
10      10      8.5     10      10      112     9       9       9       8 
8.5     8.5     78      16      206 
3275    Katherine       6       7       8       9       9       8       9 
8       10      10      8.5     10      10      111     9       9       9 
8       8.5     8.5     78      14      203 
1475    Sabrina 7.5     7       9       9       9       8       9       9 
10      10      8       10      10      113.5   8.5     9       9       9 
8.5     8       75      14      202.5 
7581    Jeanah  8       9       10      9       9       10      10      10 
10      10      10      10      10      123.5   10      9       9.5     10 
9.5     10      88      16      227.5 
4274    Emma    6.5     7       8       9       10 
40                                                                      40 
7667    Thomas  6.5     9       9       9       9       8       9       8 
10      10      9.5     10      10      115.5   9.5     10      0 
9.5     9       0       70      15      200.5 
6298    Alisa 
0                                                                       0 
        Barbara                                         10      10      10 
10      10      10      10      10      79      8.5     10      9.5     10 
10      10      96      20      195 
total           6.8     8       9       9       9       9       9       8 
10      10      7.9     10      9       112.1875 
75.313  14.875   
86%  

Possible Solution

Problem by Willy 

There is some evidence that certain types of people  
respond to survey items in particular patterns regardless  
of the item content.  For example one might answer in a  
generally neutral manner or in a generally extreme  
manner (everything is strongly agree or strongly  
disagree). With this in mind, your task is to utilize the oras data from earlier this semester and identify the subjects who's pattern is more generally neutral. 

Create a new variable which indicates the number of items the person responded to with a "3" (neutral response). 

Then, for  those who responded to more than 15 items with a "3" give them a label of "neutral responder."  
Your goal here is just to identify those who are neutral responders, if you want to use this information to then delete people or manipulate the data further, proceed at your own risk and don't blame me for any resulting desires to bang your cranium against your monitor. 

Possible Solution 

Problem by Joan 

This data file is from a survey filled out last summer by  
some teachers  working on a technology project. The  
variables are: id location: 0=Phoenix area, 1=Tucson,  
2=Flagstaff, 3=Yuma, 4=Kingman, 5=other sex: 0=male,  
1=female subject taught:1=elementary, 2=special,  
3=science/math, 4=social  studies/language,  
5=library/media, 6-8=others years of teaching experience 
whether or not they had email access at the time of the  
survey frequency of email use: 0=never ... 4=daily 
frequency of Web use: 0=never ... 4=daily 
 

  • Find the average number of years of teaching experience for men and women  separately. 
  • Sort participants by geographic and subject area. 
  • Did people in the  Phoenix area use email and the web more or less than those in outlying  areas? 
  • Did teachers in some subject areas use this technology more than  those in other areas? 
Here are the first few of the 180 participants: 
 
      1,1,1,8,0,0,0 
      0,0,1,7,1,4,4 
      0,1,1,8,0,3,1 
      0,0,3,3,1,1,1 
      2,1,6,18,1,2,2 
      2,1,1,0,0,3,3 
      0,1,1,2,0,0,1 
      0,1,1,17,0,0,1 
      1,1,4,14,0,0,1 
      0,1,1,0,0,3,0 
      1,1,4,8,0,0,0 
      0,1,1,32,0,2,2 
      1,1,2,16,0,1,0 
      1,1,1,24,1,0,0 
      3,1,1,19,0,4,4 
      5,1,1,25,0,3,3 
      4,1,1,7,1,3,3 
      5,1,1,3,0,0,1 
Possible Solution 

Problem by Nancy 

You have the birthdays of all incoming freshmen.   

1.  You want to throw a party but you only want to invite 
     those students of legal drinking age (21).  Calculate  
     the age of each student (today) as of the student's last  
     birthday.  Calculate the age of each student on the  
     date of your orientation party last fall (September 1,  
     1997). 

2.  You believe that students born on Wednesdays are 
     more likely to attend graduate school (because you  
     remember the poem "Wednesday's child is full of  
     woe...").  In anticipation of one day (not today)  
     performing this fascinating study, you want to begin 
     by determining on what day of the week each student  
     was born. 

Write a program to create a new file that includes each student's current age, age at the fall party and the day of  
the week on which the student was born. 
Sample data: 

      Jane, 1-1-77 
      Bill, 2-6-75 
      Tim,  5-5-80 
      Maria, 2-3-79 
      Portia, 12-7-74 
      Buddy, 12-27-81
 
 Possible Solution 

Problem by Jessie 

Here is a small dataset from a class I teach. The variables are name gender age frstgen Chicano gpa mce pass  
absncs act 
 

      Heather,f,26,n,n,3.65,y,y,5,27 
      Lisa,f,19,y,y,3.2,y,y,3,24 
      Anthony,28,y,y,1.4,y,y,4,19 
      Marcia,18,y,y,1.5,y,y,5,18 
      Jose,28,n,y,0.0,n,y,,13 
      Tom,30,n,y,3.9,y,n,3,30 
      Jill,23,y,y,n,3.2,y,y,2,29 
      Jim,22,y,y,y,2.3,y,n,1,24 
      Eli,f,18,y,y,2.l,y,y,3,29
 
 Possible Solution 

Problem by Chuck 

Tired of dealing with all this unrealistic data like test scores, attitudes, and SES variables?  I am providing  
indescribably important data with far-reaching  
implications.  It comes from CBS Sportsline, and contains  
1998 NBA by-team offensive statistics for teams of the  
Pacific Division.  Oh this is real, Baby!  So, get your beer,  
unbutton that top button on your pants, scrawl back in  
your Lazy-Boy (yeah, the one in front of your computer),  
and let's bust some data...The small data set follows,  
along with some variable names and labels: 

LA Lakers     75  2864-5976  457-1303  1717-2511  105.4 
LA Lakers       969  2241  3210   1830   686   1150   503 
Seattle       75  2792-5882  566-1411  1427-1963  101.0 
Seattle         844  2047  2891   1828   726   1068   356 
Phoenix       74  2814-6087  376-1088  1291-1731  98.6 
Phoenix         922  2215  3137   1881   684   1119   383 
LA Clippers   75  2686-6120  483-1338  1371-1903  96.3 
LA Clippers     962  2082  3044   1384   574   1216   413 
Portland      74  2595-5755  292-947  1505-2054  94.4 
Portland        984  2273  3257   1576   514   1263   422 
Sacramento    75  2706-6112  258-736  1345-1944  93.5 
Sacramento      999  2094  3093   1689   547   1143   388 
Golden State  75  2588-6321  169-639  1231-1741  87.7 
Golden State   1198  2266  3464   1532   566   1266   428 

OFFENSIVE STATISTICS (containing 2 records per team). 

The first line consists of the variables: 

TEAM                   TEAMS OF THE PACIFIC DIVISION 
GAMES                 NUMBER OF GAMES PLAYED 
FGM-FGA             FIELD GOALS MADE - FIELD GOALS ATTEMPTED 
TPM-TPA              3 POINT FIELD GOALS MADE - 3 PONT FIELD GOALS 
ATTEMPTED 
FTM-FTA              FREE THROWS MADE - FREE THROWS ATTEMPTED 
GAMEAVG           TEAM AVERAGE POINTS SCORED PER GAME 

The second line contains the following variables: 

TEAM                   TEAMS OF THE PACIFIC DIVISION 
OFF                       OFFESNSIVE REBOUNDS 
DEF                       DEFENSIVE REBOUNDS 
TRB                       TOTAL REBOUNDS 
AST                       ASSISTS 
STL                        STEALS 
TO                         TURNOVERS 
BLKS                    BLOCKS 

Now, here's the question: 

1.  Bring the data set into SAS and input the variables.  
     You can look at the structure above to see a couple of  
     problems ('-' within variables, 2 records, etc.). 

2.  Sort the data by team. 

3.  Separate fgm-fga, tpm-tpa, and ftm-fta each into 2  
     variables.  For example, fgm-fga will become 2  
     variables, 'fgm' and 'fga', and so on. The new variables  
    must be numeric (omitting the '-'). 

4.  Get the average number of total rebounds, offensive  
     rebounds, and defensive rebounds per game for each  
     team.  New variables can be created for this.  Round  
    the output to 2 decimal places. 

5.  Now get the percentage made for field goals (fgm), 3  
     point shots (tpm), and free throws (ftm).  Multiply the  
    decimal by 100 so the outputted number is expressed  
    in a percentage.  Round each to 2 decimals. 

6.  Finally, create a new variable that labels 'Phoenix' as  
     "greatness", while all other teams carry the label 'they  
     suck!'. 

7.  Print the data set showing all changes and include an 
     appropriate title. 
 

LA Lakers     75  2864-5976  457-1303  1717-2511  105.4 
LA Lakers       969  2241  3210   1830   686   1150   503 
Seattle       75  2792-5882  566-1411  1427-1963  101.0 
Seattle         844  2047  2891   1828   726   1068   356 
Phoenix       74  2814-6087  376-1088  1291-1731  98.6 
Phoenix         922  2215  3137   1881   684   1119   383 
LA Clippers   75  2686-6120  483-1338  1371-1903  96.3 
LA Clippers     962  2082  3044   1384   574   1216   413 
Portland      74  2595-5755  292-947  1505-2054  94.4 
Portland        984  2273  3257   1576   514   1263   422 
Sacramento    75  2706-6112  258-736  1345-1944  93.5 
Sacramento      999  2094  3093   1689   547   1143   388 
Golden State  75  2588-6321  169-639  1231-1741  87.7 
Golden State   1198  2266  3464   1532   566   1266   428 
Possible Solution
Problem by Rodney 
 
Here is a "real life" problem I encountered while completing a class project for a statistics course I took last semester.  I will spare you the gory and boring details with respect to the purpose of  the consulting project I did for the class and get right to the point. 

The project used historical data contained in the ASU data warehouse---I wanted high school math information about students enrolled in introductory (MAT 106) and intermediate algebra (MAT 117) courses at ASU.  An SQL (Structured Query Language) script was written to create the data set.  This script produced a data set containing 9 variables (see below for variable description). 

The data was messy and needed additional work before it could be used. Several "repairs" were necessary. Specifically, student data pertaining to high school academic performance was reported across multiple records.  For example, a student enrolled in MAT 106 may have taken three math courses while in high school; thus would be three lines of information in the data set (one line for each high school math course taken) for this one student. 

For the purposes of analysis, a single grade representing the students' for high school math GPA and a binary flag (coded either 0 or 1) for any high school math course that meets ASU's undergraduate entrance was needed for each student. 

Below is a list of the things needed to make the data usable.  

/****************Cleaning the Data***************/ 

1) Delete all records containing an Asu Converted grade of "I", "W","X", or "P". 

2) Delete all records containing Course Id of "MAT 114" (recall, we  want only students taking MAT 106 and MAT 117). 

/*************Construct the GPA*****************/ 
1) Create an numeric variable where each ASU Converted Grade as follows: 
A,A+,A- = 4.0; B+,B,B- = 3; C+,C,C- = 2 . . . F,E,U = 0. 

2) For each record multiply the numeric ASU converted grade (created above) by the class hours (by doing this each grade becomes weighted based on class--thus a math course worth 1 class hour has greater weight than a math course worg only .5 class hours). 

3)For each student (using the unique record identifier Affiliate Id) add the weighted scores for each math course together (created in step 2, above) and divide by the sum of the class hours---this value represents the overall high school math GPA for the student. 

/**************Make a Binary Variable************/ 

Create a binary flag variable where any math course with an "M" for the variable Class Competency is coded as 1 and anything else is coded as 0. 

/****************Make a New Dataset***************/ 

Finally, output the data to an external file so that you have one record per student containing the variables Semester, Affiliate ID, Course ID, Schedule Line number, newly created highschool GPA, and the binary flag for Class Competency  
 
For those who want to take a look at the data set you can get a copy of the file from my AFS Instructor volume by using the following UNIX command: 

cp afs/asu.edu/windows/iv/pos598/data/19968hs.csv  19968.csv 

Semester,  semester student taking course at ASU 
Affiliate Id, student id number 
Course Id,  math course number 
Schedule Line Number  
Class Title, high school course name 
Class Hours, credit hours earned for high school math class 
Class Grade, grade earned for high school class 
Asu Converted Grade, ASU conversion of high school grade  
Class Competency Met Code, high school math course meets ASU requirements. 

 
   Possible Solution
Problem by Debbie 

The data set below represents the applicant pool for Faber University's 1999 Freshman class.  

1,3.5,2.9,1030,21,2.0,3.7,1 
2,2.4,2.8,1200,29,2.0,3.0,1 
3,3.0,1.9,880,15,1.0,2.0,4 
4,2.9,3.0,990,17,2.0,3.0,2 
5,2.7,2.9,990,17,2.5,2.8,4 
6,2.9,2.9,.,17,3.0,2.8,3 
7,2.8,2.9,990,16,2.6,2.4,2 
8,2.7,2.8,990,17,2.0,3.0,4 
9,2.8,2.8,900,18,2.7,3.1,2 
10,2.4,2.6,750,18,2.9,2.0,4 
11,2.9,2.9,1000,17,2.0,3.2,3 
12,2.4,2.0,1050,18,2.8,2.5,4 
13,2.0,2.0,1100,19,2.0,2.0,2 
14,2.7,2.8,1000,17,2.0,3.5,3 
15,2.9,2.9,1030,19,2.0,3.7,3 

Variables are respectively (a) ID; (b) overall gpa; (c) academic gpa; (d) Combined SAT score; (e)Composite ACT score; (f) math gpa; (g) English gpa; and (h) admission status variable wherein1=admit, 2= case review, 3=wait list, and 4=deny. 

Write a program to:   

  • Calculate average overall gpa,academic gpa,  Combined SAT scores, Composite ACT scores, math gpa, and English gpa for applicant pool first, then by admission status variable.
  • 15% of the applicant pool can be admitted without meeting admission criteria.  Create a formula to select the 15% from the not admitted pool including deny.  Create a new admission variable called SpecAdm.
  • Repeat the second part of bullet one (Calculate the variable averages by admission status variable and the new admission variable.)
  • Proc print to print out the list of Special Admits sorted by academic gpa.
Possible Solution