Import a 50,000-line CSV file

A place for users and developers of Xataface to discuss and receive support.

Postby PolderBoy » Fri Apr 03, 2009 1:53 pm

Dear all,

I am trying to import a 7.5 MB CSV file with 50,000 lines, and I keep getting a timeout.
In the php.ini file on my server I have changed:
max_execution_time = 600
memory_limit = 128M
upload_max_filesize = 28M
as I had read in this forum.
But I still get a timeout. It can import a 1,000-line file, but the user has files of 50,000 lines or even more.
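One setting not mentioned above that can also matter: post_max_size caps the total POST body, so a large upload can fail even when upload_max_filesize is high enough. A php.ini fragment covering both might look like this (the 32M value is just an illustration):

```ini
upload_max_filesize = 28M
post_max_size = 32M   ; must be >= upload_max_filesize
```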

Now the thing is that when I use phpMyAdmin it imports the file in 8 seconds flat.
But I can't let the user use phpMyAdmin, because being able to drop the table would be a security risk.
Thanks for any responses or ideas.

PolderBoy
 
Posts: 72
Joined: Fri Apr 03, 2009 12:53 am

Postby shannah » Fri Apr 03, 2009 3:55 pm

When you say it is timing out, does that mean that the server is failing to respond, or do you get an error message (e.g. like "Maximum execution time exceeded")?

-Steve
shannah
 
Posts: 4457
Joined: Wed Dec 31, 1969 5:00 pm

Error description

Postby PolderBoy » Sat Apr 04, 2009 1:03 am

The error I get now is:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 47 bytes) in /home/administrator/www/xataface/Dataface/Record.php on line 869

[EDIT]
I just tried to export a csv file and get this error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 2 bytes) in /home/administrator/www/xataface/Dataface/DB.php on line 343
[/EDIT]

I have received many other errors as well, all claiming that there wasn't enough space or time available.
But those have been solved with the changes I made in the php.ini file.

The CPU usage on the server hits 100% for one processor, for about 10 seconds.

Oh, and thank you for the quick response.
I love your program.

With regards,
PolderBoy
 
Posts: 72
Joined: Fri Apr 03, 2009 12:53 am

Postby sophistry » Sat Apr 04, 2009 7:17 am

Have you tried importing directly into the database? It sounds like you're using the Xataface import feature, which may not be able to handle the size of, or the time it takes to import, that much data (possibly a limitation of web transfers).

If you have access to your database, e.g. with phpMyAdmin, that might work better.

Rory
sophistry
 
Posts: 27
Joined: Mon May 19, 2008 11:20 am

Postby PolderBoy » Sat Apr 04, 2009 9:49 am

Hello Rory,

Yes, I have used phpMyAdmin. It takes 8 seconds to upload a 50,000-line file on the same server. But I don't want the final user to have access to phpMyAdmin, because it allows dropping the table, and I don't want the user to be able to do that.

But the user has to be able to upload these files, many of them. In 3 years there will be 6.5 million records in MySQL.

Also, the server uses 100% of the processor with Xataface and only 46% with phpMyAdmin.

With regards,
PolderBoy
 
Posts: 72
Joined: Fri Apr 03, 2009 12:53 am

Postby PolderBoy » Sun Apr 05, 2009 11:55 pm

Dear All,

I have found that lowering max_execution_time back to 60 makes the memory error (Fatal error: Allowed memory size) go away.
But then trying to import my 50,000-line CSV file gives this error instead:
Fatal error: Maximum execution time of 60 seconds exceeded in /home/administrator/www/xataface/Dataface/Table.php on line 781

Also, while looking for a solution to this problem, I found that there is a bug in PHP 5:
"This PHP bug manifests in SugarCRM when many objects are created that have parent->child relationships, such as Project->Project task. Even if the parent variable is reassigned to a new object in a loop, the child still has a link to it, so the memory is not released by the garbage collection in PHP 5."
http://www.sugarcrm.com/crm/index.php?o ... a3e46cc3da


Now I know that this error was found in SugarCRM, but could it also affect Xataface?

Thanks,
PolderBoy
 
Posts: 72
Joined: Fri Apr 03, 2009 12:53 am

Postby shannah » Sun Apr 12, 2009 10:52 am

What does the import filter function look like in your app? The memory leak could either be inside your import filter (more likely), or it could be in Xataface.

You can find out which it is by returning an empty array from your import filter (but keeping all of the same logic otherwise) and seeing if it still times out. If it still runs out of memory, then the leak is inside your import filter. If not, then it could be a memory leak in Xataface's handling of it.

-Steve
shannah
 
Posts: 4457
Joined: Wed Dec 31, 1969 5:00 pm

Postby PolderBoy » Tue Apr 14, 2009 2:44 am

Dear All,

My import filter:
Code: Select all
<class>setValues($defaultValues);
          
           // Now we add the values from the CSV file.
           $record->setValues(
               array(
                    'D2D_ID'=>$D2D_ID,
                  'ID'=>$ID,
                  'AgentNr'=>$AgentNr,
                  'Voorletters'=>$Voorletters,
                  'Tusselvoegsel'=>$Tusselvoegsel,
                  'Achternaam'=>$Achternaam,
                  'Telefoonnummer'=>$Telefoonnummer,
                  'Straatnaam'=>$Straatnaam,
                  'Huisnummer'=>$Huisnummer,
                  'Toevoeging'=>$Toevoeging,
                  'Postcode'=>$Postcode,
                  'Plaats'=>$Plaats,
                  'Geslacht'=>$Geslacht,
                  'Vrij_veld_1'=>$Vrij_veld_1,
                  'Vrij_veld_2'=>$Vrij_veld_2,
                  'Reasoncode'=>$Reasoncode,
                  'PDF_File'=>$PDF_File,
                  'PDF_JN'=>$PDF_JN,
                  'VL_File'=>$VL_File,
                  'VL_JN'=>$VL_JN,
                  'Aanmaak_Datum'=>$Aanmaak_Datum,
                  'Aanmaak_Tijd'=>$Aanmaak_Tijd
                   )
               );
           $records[] = $record;
       }
   
       // Now we return the array of records to be imported.
       return $records;
   }
}
?>

If I import a file with 1,000 lines it works fine, but importing a 50,000-line file gives the error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 47 bytes) in /home/administrator/www/xataface/Dataface/Record.php on line 869

I am not saying that there is a memory leak in Xataface.
I found something on the internet and asked if it could be the case.
It is probably something in my code or in my server's settings.

But what did I do wrong?

With regards,
PolderBoy.

PolderBoy
 
Posts: 72
Joined: Fri Apr 03, 2009 12:53 am

Postby shannah » Tue Apr 14, 2009 4:49 am

Looks like phpBB is stripping some of the code out of your post. Make sure you check the "Disable HTML in this post" box when you post. Can you repost it?

-Steve
shannah
 
Posts: 4457
Joined: Wed Dec 31, 1969 5:00 pm

Postby PolderBoy » Tue Apr 14, 2009 7:21 am

Dear All,

Maybe that's my mistake?
There is no HTML in the file.
This code is in name_of_table.php in the name_of_table folder (in this case /home/administrator/www/BB/tables/D2D/D2D.php).
But here is the code:
<?php
class tables_D2D
{

function __import__csv(&$data, $defaultValues=array())
{
// build an array of Dataface_Record objects that are to be inserted based
// on the CSV file data.
$records = array();

// first split the CSV file into an array of rows.
$rows = explode("\n", $data);
foreach ( $rows as $row ){
// We iterate through the rows and parse the fields so that they can be
// stored in a Dataface_Record object.
list($D2D_ID, $ID, $AgentNr, $Voorletters, $Tusselvoegsel, $Achternaam, $Telefoonnummer, $Straatnaam, $Huisnummer, $Toevoeging, $Postcode, $Plaats, $Geslacht, $Vrij_veld_1, $Vrij_veld_2, $Reasoncode, $PDF_File, $PDF_JN, $VL_File, $VL_JN, $Aanmaak_Datum, $Aanmaak_Tijd) = explode(',', $row);

$record = new Dataface_Record('D2D', array());

// We insert the default values for the record.
$record->setValues($defaultValues);

// Now we add the values from the CSV file.
$record->setValues(
array(
'D2D_ID'=>$D2D_ID,
'ID'=>$ID,
'AgentNr'=>$AgentNr,
'Voorletters'=>$Voorletters,
'Tusselvoegsel'=>$Tusselvoegsel,
'Achternaam'=>$Achternaam,
'Telefoonnummer'=>$Telefoonnummer,
'Straatnaam'=>$Straatnaam,
'Huisnummer'=>$Huisnummer,
'Toevoeging'=>$Toevoeging,
'Postcode'=>$Postcode,
'Plaats'=>$Plaats,
'Geslacht'=>$Geslacht,
'Vrij_veld_1'=>$Vrij_veld_1,
'Vrij_veld_2'=>$Vrij_veld_2,
'Reasoncode'=>$Reasoncode,
'PDF_File'=>$PDF_File,
'PDF_JN'=>$PDF_JN,
'VL_File'=>$VL_File,
'VL_JN'=>$VL_JN,
'Aanmaak_Datum'=>$Aanmaak_Datum,
'Aanmaak_Tijd'=>$Aanmaak_Tijd
)
);
$records[] = $record;
}

// Now we return the array of records to be imported.
return $records;
}
}
?>
What did I do wrong?
Thanks,
Brian
PolderBoy
 
Posts: 72
Joined: Fri Apr 03, 2009 12:53 am

Postby shannah » Tue Apr 14, 2009 10:21 pm

As an experiment, try changing the last line where you return the $records array to just return an empty array (keep the rest the same).

e.g.
Code: Select all
return array();


Then see if it times out when you try to run the import. If it does, the leak is in the loop. If not, the leak happens after you've handed the records off to Xataface.

-Steve
shannah
 
Posts: 4457
Joined: Wed Dec 31, 1969 5:00 pm

Postby PolderBoy » Tue Apr 21, 2009 12:22 am

Thanks for the reply,

I have changed the PHP page as you asked:
//return $records;
return array();

But I get: Fatal error: Maximum execution time of 60 seconds exceeded in /home/administrator/www/BB/tables/D2D/D2D.php on line 21

So that means the error is in the loop.
Could somebody tell me what is wrong with the loop?

Code: Select all
<?php
class tables_D2D
{

   function __import__csv(&$data, $defaultValues=array())
   {
       // build an array of Dataface_Record objects that are to be inserted based
       // on the CSV file data.
       $records = array();
      
       // first split the CSV file into an array of rows.
       $rows = explode("\n", $data);
       foreach ( $rows as $row ){
           // We iterate through the rows and parse the fields so that they
           // can be stored in a Dataface_Record object.
      list($D2D_ID, $ID, $AgentNr, $Voorletters, $Tusselvoegsel, $Achternaam, $Telefoonnummer, $Straatnaam, $Huisnummer, $Toevoeging, $Postcode, $Plaats, $Geslacht, $Vrij_veld_1, $Vrij_veld_2, $Reasoncode, $PDF_File, $PDF_JN, $VL_File, $VL_JN, $Aanmaak_Datum, $Aanmaak_Tijd, $Netto) = explode(',', $row);

           $record = new Dataface_Record('D2D', array());
          
           // We insert the default values for the record.
           $record->setValues($defaultValues);
          
           // Now we add the values from the CSV file.
           $record->setValues(
               array(
                    'D2D_ID'=>$D2D_ID,
                  'ID'=>$ID,
                  'AgentNr'=>$AgentNr,
                  'Voorletters'=>$Voorletters,
                  'Tusselvoegsel'=>$Tusselvoegsel,
                  'Achternaam'=>$Achternaam,
                  'Telefoonnummer'=>$Telefoonnummer,
                  'Straatnaam'=>$Straatnaam,
                  'Huisnummer'=>$Huisnummer,
                  'Toevoeging'=>$Toevoeging,
                  'Postcode'=>$Postcode,
                  'Plaats'=>$Plaats,
                  'Geslacht'=>$Geslacht,
                  'Vrij_veld_1'=>$Vrij_veld_1,
                  'Vrij_veld_2'=>$Vrij_veld_2,
                  'Reasoncode'=>$Reasoncode,
                  'PDF_File'=>$PDF_File,
                  'PDF_JN'=>$PDF_JN,
                  'VL_File'=>$VL_File,
                  'VL_JN'=>$VL_JN,
                  'Aanmaak_Datum'=>$Aanmaak_Datum,
                  'Aanmaak_Tijd'=>$Aanmaak_Tijd,
                                                'Netto'=>$Netto
                   )
               );
           $records[] = $record;
       }
   
       // Now we return the array of records to be imported.
       //return $records;
       return array();
   }
}


I don't think there is anything wrong with it, because I can import a 1,000-line CSV file, but not a 49,734-line file.

And phpMyAdmin does it in 8 seconds.

Thanks again,
PolderBoy
 
Posts: 72
Joined: Fri Apr 03, 2009 12:53 am

Postby s_nalepa » Sat Apr 25, 2009 10:40 pm

It seems like $records might get very big. Would there be a way to rebuild the script to take the CSV file in chunks, putting each chunk into the $records array and concatenating them into the complete array before returning it? Does the exhaustion of time/memory occur before or after splitting the file into rows?
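A rough sketch of the idea (an illustrative helper, not the actual Xataface import hook): stream the file with fgetcsv() so that only one row is held in memory at a time, instead of explode()-ing the entire 7.5 MB string into an array first.

```php
<?php
// Sketch: stream a CSV with fgetcsv() rather than splitting the whole
// file in memory. stream_csv_rows() is an illustrative helper, not
// Xataface code.
function stream_csv_rows($path)
{
    $count = 0;
    $fh = fopen($path, 'r');
    while (($row = fgetcsv($fh)) !== false) {
        // Each $row (an array of fields) could be inserted or batched
        // here immediately, instead of accumulating 50,000
        // Dataface_Record objects in $records.
        $count++;
    }
    fclose($fh);
    return $count;
}

// Usage: write a tiny CSV and stream it back.
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($tmp, "1,Jan,Amsterdam\n2,Piet,Utrecht\n");
echo stream_csv_rows($tmp), "\n"; // prints 2
unlink($tmp);
?>
```

This keeps memory flat regardless of file size, though it would require restructuring the import so records are saved per row rather than returned as one big array.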

Funnily enough, phpMyAdmin uses PHP as well!
s_nalepa
 
Posts: 16
Joined: Sat Dec 29, 2007 8:43 am
Location: Lund, Sweden

Postby PolderBoy » Mon Apr 27, 2009 9:55 am

Thank you for the reply s_nalepa.

Unfortunately I am not a good enough PHP programmer to build this:
"take the csv file in chunks while putting it into the $records array"
I know that I can't ask the user to break the CSV file into 1,000-line files and import them one by one.

In reply to your second tip/question: how do you test something like that, whether it crashes before or after splitting the file into rows?
(In VB you can debug, but how do you debug in PHP?)
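(A simple substitute for a stepping debugger in PHP is to print memory_get_usage() and microtime() at checkpoints; a generic sketch, not specific to Xataface, which would show whether memory balloons before or after the explode() step:)

```php
<?php
// Generic instrumentation sketch: log memory and time at checkpoints to
// see where the exhaustion happens (no debugger needed).
function checkpoint($label)
{
    printf("%-10s mem=%9d bytes  t=%.4f\n",
           $label, memory_get_usage(), microtime(true));
}

checkpoint('start');
$data = str_repeat("a,b,c\n", 10000);  // stand-in for the uploaded CSV data
checkpoint('loaded');
$rows = explode("\n", $data);          // the step suspected of ballooning
checkpoint('exploded');
?>
```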

And yes, phpMyAdmin is written in PHP as well, but I don't have to specify the array in phpMyAdmin. Then again, I am not a professional PHP programmer.
(But I am learning.)

With regards,
PolderBoy
 
Posts: 72
Joined: Fri Apr 03, 2009 12:53 am

Postby shannah » Tue Apr 28, 2009 11:30 am

The problem is likely that phpMyAdmin uses the MySQL command LOAD DATA INFILE to load the whole CSV file into MySQL and lets MySQL do the parsing. Evidently this is much faster than parsing it in PHP with 50,000 loop iterations.

It is possible to use the same strategy in Xataface. I'll see if I can put together an example for you, later today or tomorrow.
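In the meantime, here is a minimal sketch of the idea (the helper name and column list are illustrative only, not Xataface API): build a LOAD DATA LOCAL INFILE statement and let MySQL parse the CSV itself.

```php
<?php
// Sketch: hand the CSV to MySQL instead of parsing it row by row in PHP.
// build_load_data_sql() is an illustrative helper; column names here are
// a subset of the D2D table's fields.
function build_load_data_sql($table, $csvPath, $columns)
{
    return "LOAD DATA LOCAL INFILE '" . addslashes($csvPath) . "' "
         . "INTO TABLE `" . $table . "` "
         . "FIELDS TERMINATED BY ',' "
         . "LINES TERMINATED BY '\\n' "
         . "(" . implode(', ', $columns) . ")";
}

$sql = build_load_data_sql('D2D', '/tmp/import.csv',
                           array('D2D_ID', 'ID', 'AgentNr', 'Achternaam'));
echo $sql, "\n";
// The statement would then be run with e.g. mysql_query($sql, $conn);
// note that the MySQL server must allow LOCAL INFILE (local_infile=1).
?>
```

Because MySQL does the parsing in C, this sidesteps both the PHP memory limit and the 50,000-iteration loop entirely.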


-Steve
shannah
 
Posts: 4457
Joined: Wed Dec 31, 1969 5:00 pm


Powered by Dataface
© 2005-2007 Steve Hannah All rights reserved