Topic: Serious Memory Leak

Dmes
Newbie
Posts: 1
« on: October 12, 2011, 06:02:21 PM »

To the Rapid Miner development team:

There is a very serious memory leak in Version 5.1. I am reading in a large CSV file (900,000 rows). The system monitor shows memory usage slowly increasing, as expected. But when the process finishes and a new process is started, memory usage starts at the level it had reached when the first process ended, and the 2nd process then crashes due to lack of memory!

I have tested this with the Windows performance monitor as well, which confirmed that the memory was not being released when the process ended.

I am using the "Free Memory" operator, which seems to have no effect.

The only way to run the 2nd process is to restart Rapid Miner!

Please correct this error as soon as possible!

Thanks!

misanthropic789
Newbie
Posts: 8
« Reply #1 on: January 19, 2012, 11:40:42 PM »

I am encountering the same problem. Has any work been done on this?

ChrisI
Newbie
Posts: 14
« Reply #2 on: January 22, 2012, 08:09:08 AM »

Hi,

I have had something similar with v. 5.1. When running LoopAttributes containing a single GenerateAttribute operator, it runs out of memory and 'seizes up' after fewer than 200 iterations (new attributes). The dimensionality of each example vector is 28 (reals) and the total number of example vectors is 20,000, so I cannot see any cause for a lack of memory. My central memory space is 8GB, no other applications are running, and the Xms parameter for Java is set at 6GB.

ChrisI

misanthropic789
Newbie
Posts: 8
« Reply #3 on: January 22, 2012, 08:35:41 PM »

That is an interesting observation. I am also using loops that contain a Generate Attributes operator. My system has 16 GB of memory in total, with 12 GB allocated to RapidMiner. The loop is executed 5 times and the base data file is about 1 GB, so even if it loaded the file 5 times in a row, that still shouldn't fill up the memory.

Worse, it doesn't release when the job is over even with a Free Memory box as the last step of the job.  That means if I run another job immediately afterward, it will fail due to insufficient memory.  I'd be happy to provide more information if someone can tell me what is needed to troubleshoot this issue.

Uwe
Newbie
Posts: 23
« Reply #4 on: January 23, 2012, 03:09:31 PM »

I observed similar behavior. The memory filled up very quickly. The Free Memory operator "did not work".

Uwe
« Last Edit: January 23, 2012, 03:11:25 PM by Uwe »

wessel
Sr. Member
Posts: 366
« Reply #5 on: January 23, 2012, 04:51:12 PM »

If I start RapidMiner, run any process, and leave RapidMiner running, I see the memory used by javaw.exe slowly growing.

Marius
Global Moderator
Sr. Member
Posts: 370
« Reply #6 on: February 02, 2012, 05:08:58 PM »

Hi,

I think several different problems are described in this thread.

Quote
If I start RapidMiner, run any process, and leave RapidMiner running, I see the memory used by javaw.exe slowly growing.
This is probably harmless: RapidMiner just does some background calculations (e.g. updating the memory monitor), and since it does not need the memory, garbage collection is not triggered. As soon as the memory is needed, it will be cleared.


Quote
To the Rapid Miner development team:

There is a very serious memory leak in Version 5.1. I am reading in a large CSV file (900,000 rows). The system monitor shows memory usage slowly increasing, as expected. But when the process finishes and a new process is started, memory usage starts at the level it had reached when the first process ended, and the 2nd process then crashes due to lack of memory!

I have tested this with the Windows performance monitor as well, which confirmed that the memory was not being released when the process ended.
Just a guess: did you leave the results view open? If so, the data stays in memory as well.

Quote
I am using the "Free Memory" operator- which seems to have no effect.
The Free Memory operator only triggers garbage collection explicitly, which frees data that is no longer needed by anything. That could speed things up later, but it does not free any memory that would not be freed automatically anyway. Thus it won't solve any out-of-memory problems.
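
To illustrate the point about garbage collection, here is a minimal Java sketch. It is plain JVM code, not RapidMiner's; the class `GcDemo` and the list `stillReferenced` are made-up names, and it assumes an explicit collection request behaves like `System.gc()`. Data that is still referenced survives the request, and only unreferenced data can be reclaimed.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

class GcDemo {
    // Hypothetical stand-in for data that something still points to,
    // for example a result that is still displayed.
    static final List<byte[]> stillReferenced = new ArrayList<>();

    // Allocates a block, keeps a strong reference to it in the list,
    // and returns a weak reference that lets us observe collection.
    static WeakReference<byte[]> allocateAndKeep(int bytes) {
        byte[] data = new byte[bytes];
        stillReferenced.add(data);
        return new WeakReference<>(data);
    }

    public static void main(String[] args) {
        WeakReference<byte[]> probe = allocateAndKeep(16 * 1024 * 1024);

        // Explicit request while the data is still referenced: it must survive.
        System.gc();
        System.out.println("collected while referenced: " + (probe.get() == null));

        // Drop the reference, then request a collection again.
        stillReferenced.clear();
        System.gc();
        System.out.println("collected after release:    " + (probe.get() == null));
    }
}
```

The first line always prints `false`: data that is still referenced cannot be collected, no matter how often a collection is requested. The second usually prints `true` on common JVMs, but `System.gc()` is only a request, so that is not guaranteed.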

Quote
The only way to run the 2nd process is to restart Rapid Miner!
Please try to close the result tab before running the second process. If that helps, we are done; if not, we will have a look at it.


Quote
my central memory space is 8GB and no other applications are running, the Xms parameter for Java is set at 6GB.
Please check that RapidMiner can really access that much memory. If not, please try the Xmx option instead of Xms.
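
One way to check how much memory a JVM may actually use is to ask it directly. The following is a minimal standalone Java sketch, not part of RapidMiner; `HeapLimits` is a made-up name. The background is that `-Xms` only sets the initial heap size, while `-Xmx` sets the maximum, and `Runtime.maxMemory()` reports that maximum.

```java
class HeapLimits {
    // Converts a byte count to whole mebibytes for readable output.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // Upper bound the heap may grow to; this is what -Xmx controls.
        System.out.println("max heap:     " + toMiB(rt.maxMemory()) + " MiB");

        // Heap the JVM currently holds; it starts near -Xms and can grow
        // up to the maximum above.
        System.out.println("current heap: " + toMiB(rt.totalMemory()) + " MiB");
    }
}
```

Run it with the options under test, for example `java -Xmx6g HeapLimits`. If the reported maximum is far below the configured value, the setting is probably not reaching the JVM.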




@all: please let us know if your problems persist.

Best,
Marius

Please add [SOLVED] to the topic title when your problem has been solved! (do so by editing the first post in the thread and modifying the title)

ChrisI
Newbie
Posts: 14
« Reply #7 on: February 05, 2012, 12:33:34 AM »

Hi,

I have checked the RapidMiner memory use via the System Monitor in the Results screen. With Xms set to 6GB it frequently clocks 5.2 GB.

Chris.

ChrisI
Newbie
Posts: 14
« Reply #8 on: February 11, 2012, 07:01:42 PM »

Hi again,

I run a loop on an ExampleSet with 20000 vectors (Examples) each vector made up of 8 integers which I subsequently convert to reals.

The loop grinds to a halt after 218 iterations, at which point the memory usage is reading a maximum of 4.2GB. Either I am doing something stupid or there is something weird going on.

How can I get the xml data and ExampleSet to you?

Kindest Regards,

ChrisI

ChrisI
Newbie
Posts: 14
« Reply #9 on: February 11, 2012, 07:06:52 PM »

Hi again,

Referring to my posting on the looping problem, I have managed to read Marius' instructions on posting.

Here is the xml:
Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
    <process expanded="true" height="540" width="682">
      <operator activated="true" class="read_csv" compatibility="5.2.000" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="C:\Users\Chris\Documents\STRATH-WEIR\CLUSTER-Event-20k.csv"/>
        <parameter key="column_separators" value=","/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.2.000" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="X_Value"/>
        <parameter key="invert_selection" value="true"/>
      </operator>
      <operator activated="true" class="transpose" compatibility="5.2.000" expanded="true" height="76" name="Transpose" width="90" x="313" y="30"/>
      <operator activated="true" class="numerical_to_real" compatibility="5.2.000" expanded="true" height="76" name="Numerical to Real" width="90" x="45" y="165"/>
      <operator activated="true" class="loop_attributes" compatibility="5.2.000" expanded="true" height="60" name="Loop Attributes" width="90" x="179" y="165">
        <process expanded="true" height="540" width="700">
          <operator activated="true" class="generate_attributes" compatibility="5.2.000" expanded="true" height="76" name="Generate Attributes" width="90" x="45" y="30">
            <list key="function_descriptions">
              <parameter key="new-attr%{loop_attribute}" value="%{loop_attribute} * att_20001"/>
            </list>
          </operator>
          <connect from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="example set"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_example set" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Transpose" to_port="example set input"/>
      <connect from_op="Transpose" from_port="example set output" to_op="Numerical to Real" to_port="example set input"/>
      <connect from_op="Numerical to Real" from_port="example set output" to_op="Loop Attributes" to_port="example set"/>
      <connect from_op="Loop Attributes" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

wessel
Sr. Member
Posts: 366
« Reply #10 on: February 11, 2012, 10:02:14 PM »

I slightly modified the process as sent by ChrisI.
It now uses the Generate Data operator instead of Read CSV, so anyone can paste it and run it.
I kept an eye on the amount of memory RapidMiner was using:

Step                                              System memory in use
Idle (used by the system, not RapidMiner)         2.0GB
Start RapidMiner                                  2.6GB
Load and run the process                          2.8GB
Press run one more time                           2.9GB
Press run 5 more times                            3.1GB
Press run 5 more times                            3.2GB
Press run 5 more times                            3.4GB
Press run 5 more times                            3.6GB
Press run 5 more times                            3.7GB
Press run lots of times                           6.7GB
Press run lots of times                           7.4GB
Press run lots of times                           8.2GB

http://img1.uploadscreenshot.com/images/orig/2/4106595534-orig.jpg



edit: if you wish I can try to do the same thing on Ubuntu linux and on a machine with even more memory.

Best regards,

Wessel
« Last Edit: February 11, 2012, 10:04:19 PM by wessel »

ChrisI
Newbie
Posts: 14
« Reply #11 on: February 12, 2012, 03:19:32 AM »

Hi,

Tried using the GenerateData operator instead of the ReadCSV, just in case there was something confounding the issue. No change.

The machine locks up indicating 5.8GB memory used.

Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
    <process expanded="true" height="540" width="682">
      <operator activated="true" class="generate_data" compatibility="5.2.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="number_examples" value="20000"/>
        <parameter key="number_of_attributes" value="8"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.2.000" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="transpose" compatibility="5.2.000" expanded="true" height="76" name="Transpose" width="90" x="313" y="30"/>
      <operator activated="true" class="select_attributes" compatibility="5.2.000" expanded="true" height="76" name="Select Attributes (2)" width="90" x="49" y="165">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="id"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="loop_attributes" compatibility="5.2.000" expanded="true" height="60" name="Loop Attributes" width="90" x="179" y="165">
        <process expanded="true" height="540" width="700">
          <operator activated="true" class="generate_attributes" compatibility="5.2.000" expanded="true" height="76" name="Generate Attributes" width="90" x="45" y="30">
            <list key="function_descriptions">
              <parameter key="new-attr%{loop_attribute}" value="%{loop_attribute} * att_20000"/>
            </list>
          </operator>
          <connect from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="example set"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_example set" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Transpose" to_port="example set input"/>
      <connect from_op="Transpose" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Loop Attributes" to_port="example set"/>
      <connect from_op="Loop Attributes" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

ChrisI
Newbie
Posts: 14
« Reply #12 on: February 12, 2012, 03:38:33 AM »

Hi,

Using MaterializeData and FreeMemory operators inside the loop keeps the memory consumption down, BUT the execution speed is totally unacceptable.

ChrisI

Marius
Global Moderator
Sr. Member
Posts: 370
« Reply #13 on: February 13, 2012, 05:51:54 PM »

I investigated this issue, and the good news is: we don't have a memleak, the memory is just not freed.

What I found out is the following: the JVM claims a lot of system memory quite fast, and almost never frees it. Internally, however, the memory actually used (and not just claimed) by RapidMiner is cleaned up between or during process runs.
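
The difference between claimed and used memory can be observed from inside any JVM. The following is a minimal Java sketch under that assumption, not RapidMiner code; `HeapSnapshot` is a made-up name. `Runtime.totalMemory()` is the heap the JVM currently holds, which is roughly the part an operating system monitor sees, and subtracting `Runtime.freeMemory()` gives the part actually occupied by objects.

```java
class HeapSnapshot {
    final long claimedBytes; // heap the JVM currently holds
    final long usedBytes;    // part of that heap occupied by objects

    HeapSnapshot(long claimedBytes, long freeBytes) {
        this.claimedBytes = claimedBytes;
        this.usedBytes = claimedBytes - freeBytes;
    }

    // Reads the current values from the running JVM.
    static HeapSnapshot take() {
        Runtime rt = Runtime.getRuntime();
        return new HeapSnapshot(rt.totalMemory(), rt.freeMemory());
    }

    public static void main(String[] args) {
        HeapSnapshot s = take();
        long mib = 1024L * 1024L;
        System.out.println("claimed: " + s.claimedBytes / mib + " MiB");
        System.out.println("used:    " + s.usedBytes / mib + " MiB");
    }
}
```

After a large job, the used figure can drop back down once garbage is collected while the claimed figure stays high, which would match a system monitor that shows no memory being released.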

As a test process, I used the process posted above with 1000 examples.

Running the same process with 20000 examples probably does not work, since with 1000 examples it already needs about 1GB of memory (this is probably improvable, and certainly will be improved in the future). At least the memory is correctly cleaned (inside the JVM) between process runs, and RapidMiner does not run out of memory, as long as the example sets are reasonably sized.

Best, Marius

wessel
Sr. Member
Posts: 366
« Reply #14 on: February 13, 2012, 05:56:51 PM »

Maybe you guys should make a button to "try and free memory".
Judging by your description, this should work.
What I do now, if I need more memory, is simply close and restart RapidMiner.