Support the GITP forums on Patreon
Help support GITP's forums (and ongoing server maintenance) via Patreon
Results 1 to 5 of 5
  1. - Top - End - #1
    Troll in the Playground
     
    BardGuy

    Join Date
    Jan 2009

    Default a couple python questions

    I'm trying to teach myself Python, and one thing I'm doing is replicating some work projects (originally done in SAS) in Python.

    I got one thing mostly working, but I have two questions.

    1. Could someone explain how string subsetting is working? I have it working in the examples below, but I don't understand why yr[0:2] and yr[2:4] are giving me the answers I want.
    Example is near the top of the CODE below, under Part One.

    2. I'm using the pandas to_csv function to write out data to a CSV file. However, I need metadata in my header row instead of the column headings. I figured this out for the most part, but I'm getting a few issues.
    2a. I'd like the CSV I'm making to be replaced each time I run the program. If I don't set the mode, the custom header row fails to be written. If I set the mode to a (for append), it appends stuff instead of overwriting.
    I tried looking for the documentation, but couldn't find any details.

    2b. When the CSV is written out, there's a blank row between each row in my data frame. I'm guessing there's a hidden carriage return somewhere in the data, but I can't find it. Or is there some option that sets how Python acts between lines?

    Issues with #2 are in Part Five, at the bottom of the program.

    Spoiler: summary of what code is doing
    Show


    Part One: creating metadata needed later on and setting input/output directories

    Part Two: I'm taking a short CSV and extracting the DistrictCode (an ID number) and a GrantStatus (Yes or No), then converting that Y/N to a different format.

    Part Three: checking part two worked

    Part Four: getting more metadata

    Part Five: writing the CSV. Includes creating some kind of stupid columns that are required by the project (like a row counter and some blank columns).


    Code:
    #PART ONE
    yr="1819"
    myyear="20"+yr[0:2]+"-20"+yr[2:4] #should be 2018-2019
    
    from datetime import date
    today=date.today()
    today=today.strftime("%m%d%y")
    
    filename="SCLEASUBGRANTS170"+myyear[7:9]+"PY.csv"
    filepath="O:\various folders\Programming Conference 2020"
    #you can check any of these by typing a PRINT command like
    #print(yr)
    
    #PART TWO
    import pandas as pd
    myinput=dir+'\\N170 data.csv'
    mydata=pd.read_csv(myinput, dtype={'DistrictCode':str})  
    #dtype sets DistrictCode as character, avoiding the issue we saw in PROC IMPORT
    mydata['grant']='MVSUBG'+mydata['GrantStatus'].str.upper() #create and UPCASE grant status
    mydata=mydata.sort_values('DistrictCode')
    
    #PART THREE
    print("Quality Control Check")
    print(pd.crosstab(mydata['DistrictCode'], columns='count'))
    print(pd.crosstab(mydata['grant'], mydata['GrantStatus']))
    #instead of using CROSSTAB for DistrictCode, you could use
    #print(mydata['DistrictCode']) 
    #since only 1 row per district, but crosstab better replicates PROC FREQ
    
    #PART FOUR
    obsnum=mydata.shape[0] #note: mydata.count() does something slightly different than desired
    firstline="LEA SUBGRANT STATUS,"+str(obsnum)+","+filename+",SCDE XX "+today+","+myyear
    #print(firstline)
    
    #PART FIVE
    mydata['counter']=mydata.reset_index().index+1
    mydata['StateNumber']='45'
    mydata['StateCode']='01'
    mydata['filler1']=''
    mydata['filler2']=''
    outputThis=mydata[['counter','StateNumber','StateCode','DistrictCode','filler1','grant','filler2']].copy()
    print(outputThis)
    with open(filepath+'\\'+filename, mode='a') as file:
        file.write(firstline+"\r\n") #stuff at the end is a carriage return
        outputThis.to_csv(file, index=None, header=False)
    For isssue 2, here's an excerpt of my CSV
    Code:
    LEA SUBGRANT STATUS,9,SCLEASUBGRANTS17019PY.csv,SCDE XX 101619,2018-2019
    
    1,45,01,0160,,MVSUBGNO,
    
    2,45,01,0201,,MVSUBGNO,
    I don't want those blank lines between the header and the first row or between the data rows.

  2. - Top - End - #2
    Orc in the Playground
     
    MindFlayer

    Join Date
    Feb 2015

    Default Re: a couple python questions

    Question #1 is easy to explain. Python uses 0-based indexing, but the second number of a range is the number just beyond the range. So if s='abcd', then s[0] == 'a', s[1] == 'b', s[2] == 'c', s[3] == 'd', and s[1:3] == s[1] + s[2] == 'bc'.

  3. - Top - End - #3
    Troll in the Playground
     
    BardGuy

    Join Date
    Jan 2009

    Default Re: a couple python questions

    Quote Originally Posted by DavidSh View Post
    Question #1 is easy to explain. Python uses 0-based indexing, but the second number of a range is the number just beyond the range. So if s='abcd', then s[0] == 'a', s[1] == 'b', s[2] == 'c', s[3] == 'd', and s[1:3] == s[1] + s[2] == 'bc'.
    Thanks. That makes a lot of sense.

    I think I was confusing the end as the number of characters I wanted (more akin to how SAS' substr function works) or didn't think it could go beyond the index range for when I needed the last character.

    char='abcd'
    print(char[1:4])
    bcd

    ---

    This somehow reminded me of another, unrelated question. Does Anaconda require being online to work? I was working on a Python program using its Spyder interface, and when I disconnected from the work network and internet it just stopped functioning right.

  4. - Top - End - #4
    Firbolg in the Playground
     
    Jasdoif's Avatar

    Join Date
    Mar 2007
    Location
    Oregon, USA

    Default Re: a couple python questions

    Quote Originally Posted by JeenLeen View Post
    1. Could someone explain how string subsetting is working? I have it working in the examples below, but I don't understand why yr[0:2] and yr[2:4] are giving me the answers I want.
    Think of it this way: the indexes aren't the elements themselves, they're the spots between the elements (and at the start and end):

    Position 0 1 2 3 4
    Element "1" "8" "1" "9"

    So x[0:2] gets the elements between the 0 and 2 positions.

    Quote Originally Posted by JeenLeen View Post
    2a. I'd like the CSV I'm making to be replaced each time I run the program. If I don't set the mode, the custom header row fails to be written. If I set the mode to a (for append), it appends stuff instead of overwriting.
    I tried looking for the documentation, but couldn't find any details.
    Huh. I couldn't recreate this, if I do something like
    Code:
    with open('testFile.csv','w') as thisFile:
      thisFile.write("TEST,TEST\r\n")
      randomTestFrame.to_csv(thisFile, index=None, header=False)
    it overwrites an existing file and keeps the test line. If you were passing a path to to_csv, pandas would be opening the file itself and you'd need to pass mode="a" so it didn't overwrite the file with your header; but you're passing a file object...have you tried it with mode="w" (on the open call) since you started using a file object for to_csv?

    I also see extra characters at the end of each line in vim, which I think is....

    Quote Originally Posted by JeenLeen View Post
    2b. When the CSV is written out, there's a blank row between each row in my data frame. I'm guessing there's a hidden carriage return somewhere in the data, but I can't find it. Or is there some option that sets how Python acts between lines?
    I'm guessing you're using a file viewer that treats both line feed ("\n") and carriage return ("\r") characters as new lines, so you're getting doubled up (and double-spaced); and that you probably only want "\n".

    So, try telling pandas that's what you want, by using line_terminator="\n" on the to_csv call. (You'd also want to pull "\r" out of your manual write line, if this works)


    Also: always keep your import lines at the top of the code; this can prevent annoying confusion later.
    Feytouched Banana eldritch disciple avatar by...me!

    The Index of the Giant's Comments VI―Making Dogma from Zapped Bananas

  5. - Top - End - #5
    Troll in the Playground
     
    BardGuy

    Join Date
    Jan 2009

    Default Re: a couple python questions

    And thank you.

    This code, based on your recommendations, worked and resolves both issues.
    Code:
    with open(filepath+'\\'+filename, mode='w') as file:
        file.write(firstline+"\r") #stuff at the end is a carriage return
        outputThis.to_csv(file, index=None, header=False, line_terminator="\n")
    I'm just opening these in Notepad, so I'm surprised that it would be viewing stuff odd.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •