Unsustainable memory consumption #472
Comments
Hi @uditsharma29, thanks for the report. Can you please provide a bit more info as to what you're doing? Are you by any chance creating CPython objects in the main loop? That's the source of another issue, so I'm wondering if it's the same here.
Hi @arshajii, thanks for the response. Here is what my code does, in outline: it processes roughly 10 million new rows of data per day, and this is supposed to happen daily for 30 days. Codon is unable to cope with the data after about 7 days, by which point memory consumption has reached 80+ GB. Here is the abstracted code to show how it works:
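A minimal hypothetical sketch of the daily batch workflow described above; every function name, file name, and field here is invented purely for illustration, not taken from the actual pipeline:

```python
# Hypothetical sketch only -- not the reporter's actual code.

def load_daily_rows(day):
    # Read one day's ~10 million input rows from disk.
    rows = []
    with open(f"data/day_{day}.csv") as f:
        for line in f:
            rows.append(line.strip())
    return rows

def process(rows):
    # Stand-in for the real per-batch processing: aggregate per-key counts.
    counts = {}
    for row in rows:
        key = row.split(",")[0]
        counts[key] = counts.get(key, 0) + 1
    return counts

for day in range(30):          # one batch of new data per day, for 30 days
    rows = load_daily_rows(day)
    result = process(rows)
    # ... update the output files from `result` ...
    del rows                   # the batch's data should be reclaimable here
    del result
```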
I am not sure what you mean by creating CPython objects in the main loop. In this case, even if the processing happens in the functions, the entire data does go through the main loop at some point, and I don't see a way to avoid that. Please let me know if you need more information to understand the issue further.
Thanks for the follow-up -- out of curiosity, can you try running with the environment variable GC_MAXIMUM_HEAP_SIZE set? By CPython object, I meant any @python functions or from python imports used in the main loop.
There is little to no improvement when using the GC_MAXIMUM_HEAP_SIZE environment variable. Just in case it matters, I specified GC_MAXIMUM_HEAP_SIZE inside an @python block, as I am not using any from python imports.
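For what it's worth, a hedged note on where that variable usually takes effect: collectors of this kind typically read the heap cap from the process environment when they initialize, so it would normally be exported before the program is launched rather than set from inside it. A minimal sketch, where the byte value and the script name are placeholders:

```python
# Hypothetical sketch. The cap would normally be set in the launching shell,
# for example (value in bytes and script name are placeholders):
#
#   GC_MAXIMUM_HEAP_SIZE=64000000000 codon run -release pipeline.py
#
# A quick plain-Python check that the variable is actually visible to the
# running process:
import os
print(os.environ.get("GC_MAXIMUM_HEAP_SIZE", "<not set>"))
```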
To clarify on the CPython objects: we are using functions with @python decorators for I/O operations, i.e. reading the files initially and writing (updating) them at the end of processing. There are no CPython objects during processing.
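To illustrate that split, here is a minimal hypothetical sketch (the file names, function names, and the trivial transform are all invented; only the use of @python at the I/O boundary reflects what is described above):

```python
# Hypothetical sketch of the described pattern: CPython only at the I/O
# boundary, native Codon code for the processing itself.

@python
def read_text(path: str) -> str:
    # Executed by CPython: read one batch's raw input.
    with open(path) as f:
        return f.read()

@python
def write_text(path: str, text: str) -> None:
    # Executed by CPython: write (update) the processed output.
    with open(path, "w") as f:
        f.write(text)

def transform(text: str) -> str:
    # Native Codon code; stand-in for the actual processing.
    lines = text.split("\n")
    return "\n".join([line.upper() for line in lines])

write_text("out/day_1.csv", transform(read_text("data/day_1.csv")))
```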
In this case, since the
Hello,
While working with real-world data of a couple hundred million rows, Python runs at about 50 GB of memory consumption according to Mac's Activity Monitor. Codon, on the other hand, uses more than 80 GB, slows down dramatically, and breaks at about the 25% mark. Similar memory consumption was observed even when I implemented further batching, which indicates that the memory consumed in earlier batches is not released after a batch finishes processing. I also tried using the del keyword and calling gc.collect() as early as possible, but there is no improvement.
How does Codon manage memory, and what should I be doing to explicitly release memory held by variables that are no longer needed?
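For concreteness, a minimal sketch of the explicit-release pattern mentioned above (del followed by gc.collect() at the end of each batch); the batch size, names, and dummy workload are placeholders, not the real pipeline:

```python
import gc

def process_batch(rows):
    # Placeholder for one day's processing.
    return len(rows)

for day in range(30):
    # Placeholder for loading one day's data.
    rows = [f"row {day},{i}" for i in range(1000000)]
    result = process_batch(rows)
    print(day, result)
    del rows            # drop the last references to the batch's data...
    gc.collect()        # ...and ask the collector to reclaim it right away
```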
Please let me know if you need anything from my end to replicate the issue.