Class Summary March 11th

I.  Summarize 

    A.  SAS Temp vs. Perm 
      1.  Data steps to save
    B.  Functions 
      1.  INDEX 
      2.  SUBSTR 
      3.  COMPRESS
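      A minimal sketch of the three functions on one made-up string. The variable names s, p, t, and u are hypothetical and not from class:

      ```sas
      data _null_;
        s = 'a b:c d';
        p = index(s, ':');        * position of the first colon, here 4;
        t = substr(s, p+1, 3);    * 3 characters starting after the colon;
        u = compress(s);          * same string with the blanks removed;
        put p= t= u=;             * writes the three values to the log;
      run;
      ```

      INDEX returns 0 when the search string is not found, so it is worth checking p before passing it to SUBSTR.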
    C.  Unix Command  CAT 
    D.  Titles as organizers 
    E.  ARRAY statements 
    F.  Homework
II.  Questions 
    A.  Max 
    B.  z-Score 
    C.  New variables 
      1.  Must be in data step 
      2.  PROC _______ 
      3.  Get back in data 
      4.  Start new data step 
      5.  Build data before PROC statement 
      6.  Builds as you execute function 
      7.  Goes through function sequentially 
      8.  Be careful: a new variable must be created above
           (before) any statement that references it
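      The ordering rule can be sketched with a made-up data step. The data set name and the variables x, y, z, and w are hypothetical:

      ```sas
      data order;
        input x;
        y = z + 1;   * z has no value yet so y comes out missing;
        z = x * 2;   * z is built here;
        w = z + 1;   * z now has a value so w can be computed;
        datalines;
      5
      ;
      proc print data=order;
      run;
      ```

      Because the data step goes through its statements sequentially, y is missing while w is computed as expected.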
    D.  Computer rounding rules are a function of precision, which depends on the architecture of the computer's binary operations.
III.  Midterm 
    Bring notes, books, example programs . . .
IV.  Review 
    A.  SET 
      1.  Reads an existing data set; when two or more data sets are listed, it appends them, in other words adds the new data after the old.
    B.  MERGE 
      1.  Merges two data sets, but when both have a variable with the same name, values from the second set overwrite those from the first.
    C.  UPDATE 
      1.  Updates a master data set with a second (transaction) data set, matched with a BY statement; missing values in the second set do not overwrite values in the first data set.
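      A small sketch comparing the three statements. The data sets first and second and their values are made up for illustration, and both are already sorted by id:

      ```sas
      data first;
        input id score;
        datalines;
      1 10
      2 20
      ;
      data second;
        input id score;
        datalines;
      2 .
      3 30
      ;
      * SET with two data sets appends them, giving 4 observations;
      data appended;
        set first second;
      run;
      * MERGE by id: for id 2 the missing score from SECOND replaces 20;
      data merged;
        merge first second;
        by id;
      run;
      * UPDATE by id: for id 2 the missing score is ignored and 20 is kept;
      data updated;
        update first second;
        by id;
      run;
      ```

      Printing merged and updated side by side shows the difference on id 2.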
    D.  VAR 
      1.  Use when only certain variables should print on the report, for instance i1-i50. 

      ex.  PROC PRINT; 

        VAR  i1-i50; 
         
      Don't want i34 
      ex. PROC PRINT; 
        VAR i1-i33 i35-i50;
         
      Another example: 
        DATA One; INFILE 'xxxx'; 
          INPUT ID Age SES HSGRAD;
        PROC PRINT; 
          VAR ID--HSGRAD;
      A double dash selects every variable from ID through HSGRAD, based on their order in the data set (here, the order in the INPUT statement).
    E.  Important Question 
      If DIS GE 16 then 
        delete;
      else if (DIS+DE+AV) GE 34 then 
        delete;
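      This represents a case where ELSE doesn't matter: DELETE stops processing the current observation at once, so the second IF is reached only when the first condition is false, with or without ELSE.
      If the rule is not clear, make up a number and go through the procedure (test).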
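      One way to run that test, sketched with made-up values for DIS, DE, and AV. The data set names are hypothetical:

      ```sas
      * Made-up test values, not from the class data;
      data test;
        input DIS DE AV;
        datalines;
      16 5 5
      10 12 12
      10 5 5
      ;
      data with_else;
        set test;
        if DIS GE 16 then delete;
        else if (DIS+DE+AV) GE 34 then delete;
      run;
      data without_else;
        set test;
        if DIS GE 16 then delete;
        if (DIS+DE+AV) GE 34 then delete;
      run;
      proc print data=with_else;
        title 'With ELSE';
      run;
      proc print data=without_else;
        title 'Without ELSE';
      run;
      ```

      The first row fails the DIS rule, the second fails the sum rule (10+12+12 = 34), and only the third row survives in both versions.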
V.  Project 
    A.  Change z-Scores to t-Scores 
      M= 50 
      SD=10
     
    A z-Score is centered on 0 with SD=1.
    Multiplying by 10 widens the distribution so that SD=10.
    Adding 50 then shifts every value up 50 points so that the mean equals 50:
      t-Score = 50+(z-Score*10); 
    After multiplying by 10, the shift moves

        20 to 70 
        10 to 60 
          0 to 50. 
         
    Can nest many operations in one statement, for example (here (D-16)/4.3 is the z-Score of D)
      TD =  50 + ( (D-16) /4.3)*10;
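    The conversion can be sketched in a data step. This assumes, as the formula in the notes implies, that D has mean 16 and SD 4.3. The values of D and the data set name tscore are made up:

      ```sas
      * Made-up values of D for illustration only;
      data tscore;
        input D;
        z  = (D - 16) / 4.3;    * z-Score, centered on 0 with SD 1;
        TD = 50 + (z * 10);     * t-Score, centered on 50 with SD 10;
        datalines;
      16
      20.3
      11.7
      ;
      proc print data=tscore;
      run;
      ```

    A value at the mean (16) gets a t-Score of 50, and values one SD above or below land at about 60 and 40.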
    B.  Discussion of validating rules for ORAS  
      1.  Mahalanobis Distance 
        a.  2 dimensional 
        b.  how far each observation is from the mean; plot distance
      2.  Cluster Analysis 
        a.  Based on empirical location. 
        b.  Used CART (Classification and Regression Trees) to make sure cluster was meaningful 
         
    C.  In-class assignment 
      1.  Access the log file (access.log) 
      2.  Input data 
      3.  Read time as string 
      4.  Get hours out (substring) 
      5.  PROC FREQ for table of hours 
        a.  PROC FREQ; 
          tables hour;
      6.  Solution Examples 

      data hour; 
      infile 'access.log'; 
      input junk1 $ junk2 $ junk3 $ stuff $15.; 
      hour=substr (stuff,14,2); 
      * above reads only the hour information; 
      proc freq; 
      tables hour; 
      run; 
       
      data showhour  ; 
      infile 'access.log'; 
      input rec $ 1-80; 
      pos = index(rec,':')+1; 
      hour = substr(rec, pos, 2); 
      proc print  data=showhour; 
      title ' Yang Yu Class Show Hour Using Substr Function '; 
      run; 
       
      data hours; 
      *use substring function to get the hour listed on each line; 
      *We could use free field input; 
      infile 'access.log'; 
      input stuff1-stuff4 $15.; 
      hour=substr(stuff4,14,2); 
      run; 
      proc freq; 
      table hour; 
      proc print; 
      run; 

      data compacc; 
      infile 'access.log'; 
      input ip$ d1$ d2$ date $25. ; 
      hour = substr (date,14,2); 
      proc freq; 
         tables hour; 
      proc print; 
      run;
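
      A way to check the INDEX and SUBSTR approach without the real file is to read one made-up line with DATALINES. The log line below is hypothetical and assumes the time follows the first colon, as in the solutions above:

      ```sas
      data checkhour;
        infile datalines truncover;
        input rec $ 1-80;
        pos  = index(rec, ':') + 1;   * first column after the first colon;
        hour = substr(rec, pos, 2);   * two characters of the hour;
        datalines;
      host1 - - [11/Mar/1996:14:05:09 -0500] "GET /index.html" 200 1024
      ;
      proc print data=checkhour;
        var pos hour;
      run;
      ```

      For this line hour should come out as 14. If a real log line had a colon before the time stamp, this approach would pick up the wrong characters.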