A Monster for Every Season: Summer 2
You can get A Monster for Every Season: Summer 2 now at Gumroad
Results 1 to 13 of 13
  1. - Top - End - #1
    Bugbear in the Playground
    Join Date
    Feb 2020

    Default Reduce the file size of SQL export to files

    So at work, we were using a pre-build SP in our database to generate csv/txt files with the output. The standard file size was usually about 30 MB for the monthly files and 240 MB for yearly file.

    However, I noticed on Friday that the output of the SP was fundamentally wrong in a few of the output, so I designed a new script that perfectly recreated the same output as the SP was meant to generate ("perfectly" in the sense of what was being output, some cell formats might be a bit off but nobody cares).

    However, after I tested my new script and validated the output against a benchmark, I tried to generate new csv/txt files with the new script. The speed is more higher and the entire output is a lot more efficient. However the files generated from my script are 145 MB for the monthly file, and a wooping 2.1Gb for the yearly file.

    Clearly, there is something that is causing the data being submitted by the new script to take a lot, lot more information but a customary comparative review of both files doesn't indicate the source of the problem (it's not like the # of decimal was fundamentally changed). And I was hoping someone would have some hints or advice about how I resolve this problem. Maybe it's a matter of the formatting? maybe it's in the script setting that I have to alter something?


    If you wonder what kind of data we are generating: it's just standard performance statistics, with fund names/codes, benchmark names/codes and rate of returns with about 5-6 decimals. I get there's a lot of it in the output we generate (like I said, the original yearly output had about 240MB worth of .txt).

  2. - Top - End - #2
    Titan in the Playground
     
    Jasdoif's Avatar

    Join Date
    Mar 2007
    Location
    Oregon, USA

    Default Re: Reduce the file size of SQL export to files

    It seems particularly strange that the yearly file grew at such a different factor than the monthly files....My first question would be whether you're getting duplicate rows; the slower speed of the original stored procedure you mentioned could be the result of a SELECT DISTINCT somewhere.
    Feytouched Banana eldritch disciple avatar by...me!

    The Index of the Giant's Comments VI―Making Dogma from Zapped Bananas

  3. - Top - End - #3
    Troll in the Playground
     
    PaladinGuy

    Join Date
    Mar 2012
    Location
    UK
    Gender
    Male

    Default Re: Reduce the file size of SQL export to files

    Agreed. Since you appear to be outputting text files there must be a hugh difference in the amount of data output.
    Are you able to run the queries to screen to compare the records returned?

    Another (worse) possiblity is an open join (can't remember technical name) - where you return every record of one table for every record of another because the join isn't actually defined.

    Another possibility (but the differences seem to large) is if one script is formatting the numebrs as text and adding in the unnecessary spaces etc.

  4. - Top - End - #4
    Bugbear in the Playground
    Join Date
    Sep 2013

    Default Re: Reduce the file size of SQL export to files

    Can you run your new report to generate an old set of results from before your change - if there's nothing wrong, then they should be exactly the same size. But I'd suspect you've got a join going wrong somewhere, and comparing the two outputs should give you an indication of where that is.

    Although don't ignore the chance that the original export was wrong and you've been missing some data.

  5. - Top - End - #5
    Colossus in the Playground
     
    BlackDragon

    Join Date
    Feb 2007
    Location
    Manchester, UK
    Gender
    Male

    Default Re: Reduce the file size of SQL export to files

    How many rows in each set of data? The physical size of the files isn't really as significant as the amount of data in them, which the number of rows will be a much better guideline toward.

  6. - Top - End - #6
    Troll in the Playground
     
    Griffon

    Join Date
    Jun 2013
    Location
    Bristol, UK

    Default Re: Reduce the file size of SQL export to files

    Is "diff" still available?
    The end of what Son? The story? There is no end. There's just the point where the storytellers stop talking.

  7. - Top - End - #7
    Bugbear in the Playground
    Join Date
    Feb 2020

    Default Re: Reduce the file size of SQL export to files

    Quote Originally Posted by Storm_Of_Snow View Post
    Can you run your new report to generate an old set of results from before your change - if there's nothing wrong, then they should be exactly the same size. But I'd suspect you've got a join going wrong somewhere, and comparing the two outputs should give you an indication of where that is.

    Although don't ignore the chance that the original export was wrong and you've been missing some data.
    Quote Originally Posted by factotum View Post
    How many rows in each set of data? The physical size of the files isn't really as significant as the amount of data in them, which the number of rows will be a much better guideline toward.
    I will compare # of rows of both extract and see if its really just more rows generated then.

  8. - Top - End - #8
    Titan in the Playground
     
    Grey_Wolf_c's Avatar

    Join Date
    Aug 2007

    Default Re: Reduce the file size of SQL export to files

    Quote Originally Posted by Cikomyr2 View Post
    I will compare # of rows of both extract and see if its really just more rows generated then.
    Definitely start there. Also check you are generating the same number of columns. And I'd also check if the old file was truncating the character fields. I've had to deal way too frequently with text fields with literally thousands of characters, row after row after row, and it might be that if you have any of those, the old process was cutting them off at a hundred or so, producing massive gains.

    But yeah, in general, your first step is to identify exactly what the difference is between old and new. Excel can be of great help - just open the old and new in two tabs, ordered by PK, and have a third tab comparing them one by one (tab1!A1=tab2!A1, drag for every column & row), then look for the "FALSE".

    Grey Wolf
    Last edited by Grey_Wolf_c; 2022-06-20 at 11:28 PM.
    Interested in MitD? Join us in MitD's thread.
    There is a world of imagination
    Deep in the corners of your mind
    Where reality is an intruder
    And myth and legend thrive
    Quote Originally Posted by The Giant View Post
    But really, the important lesson here is this: Rather than making assumptions that don't fit with the text and then complaining about the text being wrong, why not just choose different assumptions that DO fit with the text?
    Ceterum autem censeo Hilgya malefica est

  9. - Top - End - #9
    Ettin in the Playground
     
    Chimera

    Join Date
    Dec 2015

    Default Re: Reduce the file size of SQL export to files

    As others have said, the first places to check is the # of rows (in most cases, the primary driver of file size). After that, size of variable fields (who knows, maybe the new method defaults text strings to 32,000 bytes of something).

    Side question, on what platform are you running SQL, and what modifiers are you including on the export?

    Quote Originally Posted by Khedrac View Post
    Another (worse) possiblity is an open join (can't remember technical name) - where you return every record of one table for every record of another because the join isn't actually defined.
    Cross or Cartesian.

    One place that I have seen problems occur is where someone tries to replicate another bit of code, and doesn't realize that there are stealth limiters on it somehow. One of them being a left join with a where statement which effective converts the output into an inner join. For example:
    Code:
    proc sql; 
    create table xyz as select distinct
    a.var1,
    a.var2,
    a.var3,
    b.var_1,
    b.var_2,
    b.var_3
    from abc as a
    left join def as b
    where b.var_4 in ('sample', 'random', 'example');
    quit;
    Will not actually return back all the rows from table abc, despite being a left join (since output is predicated on a field derived from table def). If OP had something like this in the original version, and didn't replicate it in the new one, they may be bringing in a lot more data in the new version.

  10. - Top - End - #10
    Bugbear in the Playground
    Join Date
    Feb 2020

    Default Re: Reduce the file size of SQL export to files

    Quote Originally Posted by Willie the Duck View Post
    Side question, on what platform are you running SQL, and what modifiers are you including on the export?

    The sql scripts are ran on standard Microsoft SQL Studio.

    I dont know what "modifiers" are, as my knowledge of batch command is of the "just repeat what came before me" school of learning.

  11. - Top - End - #11
    Titan in the Playground
     
    Grey_Wolf_c's Avatar

    Join Date
    Aug 2007

    Default Re: Reduce the file size of SQL export to files

    Quote Originally Posted by Cikomyr2 View Post
    The sql scripts are ran on standard Microsoft SQL Studio.

    I dont know what "modifiers" are, as my knowledge of batch command is of the "just repeat what came before me" school of learning.
    They mean all the clauses after the select and the join - the where clause (filter) and group by clause (aggregation) both can significantly reduce the size of the output (as does the Having clause, which is a sort of combination of both).

    But we are getting into the weeds now. We can help you further once we have some idea of where the problem is, but you do have to find that out first. I don't want to lead you down a garden path of examining joins for cartesian issues until we suspect that's the problem, otherwise it's just wasting your time hunting for snipes.

    ETA: Although, speaking of, assuming we do need to help further, how free are you to post the actual SQL? If there is a typo you are missing, it might be the only way we can find it.

    Grey Wolf
    Last edited by Grey_Wolf_c; 2022-06-21 at 08:39 AM.
    Interested in MitD? Join us in MitD's thread.
    There is a world of imagination
    Deep in the corners of your mind
    Where reality is an intruder
    And myth and legend thrive
    Quote Originally Posted by The Giant View Post
    But really, the important lesson here is this: Rather than making assumptions that don't fit with the text and then complaining about the text being wrong, why not just choose different assumptions that DO fit with the text?
    Ceterum autem censeo Hilgya malefica est

  12. - Top - End - #12
    Bugbear in the Playground
    Join Date
    Feb 2020

    Default Re: Reduce the file size of SQL export to files

    Thank you y'all!

    Found the problem. I was adding 6x as many row as i expected because i didnt narrow down the scope of cross-table interactions.

    Thats what happen you program at 23:00 kids after a full work day. Dont binge program you end up doing stupid mistakes.

  13. - Top - End - #13
    Troll in the Playground
     
    PaladinGuy

    Join Date
    Mar 2012
    Location
    UK
    Gender
    Male

    Default Re: Reduce the file size of SQL export to files

    Glad to hear it was a relatively simple fix.

    PS Thank-you to Willie the Duck for providing the correct terms for cross joins (it's been well over a decade since I did any amount of serious SQL writing).
    Last edited by Khedrac; 2022-06-23 at 02:02 AM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •