Pages

Pages

How to write an import filter to import records into a table or relationship

It is common to need to import records en masse into a database. This is what import filters are for. (Since 0.7)

Consider the following scenarios:

  1. You have a publications database and your users want to be able to import records from a bibtex file.
  2. You have a contacts database and your users want to be able to import their excel spreadsheet of contacts into the database.
  3. You want to allow your users to be able to upload a list of names of people, and have the database populated with these names and also associated information about these names retrieved via LDAP.

There are countless scenarios where you would want to import records into your database - and most of them are more complicated than simply loading a CSV file into a table.  You usually want to be able to validate and reformat the data to fit your database.

Xataface allows you to write import filters to enable users to import data en_masse in predefined formats.

Writing an import filter: Overview

Writing an import filter is really quite simple.  All you have to do is define a method in your table's delegate class that accepts a string blob as a parameter and returns an array of Dataface_Record objects.  That's it.  You are expected to parse the string and place the resulting data into Dataface_Record objects.  Xataface will take care of the rest.  It will provide a nice upload form for your users, give the users an opportunity to approve the information before it is imported, and finally insert the data into the database.

Example 1: Importing a CSV File

We have a table named 'People' with 3 columns Name, PhoneNumber, and Email.  We want to be able to import a CSV (Comma separated Value) file with multiple names into the table at once.  We begin, by adding a method to our People table's delegate class as follows:

function __import__csv(&$data, $defaultValues=array()){
// build an array of Dataface_Record objects that are to be inserted based
// on the CSV file data.
$records = array();

// first split the CSV file into an array of rows.
$rows = explode("\n", $data);
foreach ( $rows as $row ){
// We iterate through the rows and parse the name, phone number, and email
// addresses to that they can be stored in a Dataface_Record object.
list($name, $phone, $email) = explode(',', $row);
$record = new Dataface_Record('People', array());

// We insert the default values for the record.
$record->setValues($defaultValues);

// Now we add the values from the CSV file.
$record->setValues(
array(
'Name'=>$name,
'PhoneNumber'=>$phone,
'Email'=>$email
)
);

// Now add the record to the output array.
$records[] = $record;
}

// Now we return the array of records to be imported.
return $records;
}

So what does this method do for us?

When a user clicks on the Import Records button they will now see a CSV option in the Import Filters select list.  This is because we named our method __import__csv.  If we had called it __import__comma_separated_value, then the user would see Comma Separated Value in the import filters select list.  The user can either copy and paste a CSV file into the text area, or he can upload a CSV file using the upload widget on the form.  The contents of this file will be passed to our __import__csv method as the $data parameter.

So what our __import__csv method does is parse the user's CSV file into an array of Dataface_Record objects and returns the array for Xataface to handle the rest of the import.  Xataface will create a temporary table with the import data and ask the user to verify that the import is correct.  Then it will copy the data into the live table, if the user decides to proceed with the import.

What is the defaultValues parameter for?

The Xataface import form allows the user to specify some default values that are to be placed in every record that is imported.  They are passed to your import filter as an associative array, where the keys are the field names and the values are the associated values.  All you need to know is that, for each Dataface_Record object that you create, you should call:

$record->setValues($defaultValues)

to set the default values before you add data that is parsed from the input data.

See example 2 (below) for some screen shots of the import form within the Xataface application.

Example 2 (Advanced): Importing an Excel Spreadsheet into a Book Library Database

Our goal:

The church librarian keeps track of the the books using an excel spreadsheet that he designed himself.  It works well for his purposes, but now that we have a web library, we need a way for him to merge the two data-sources easily.  He is not all that computer savvy, so EASY is the key word.

Our Strategy:

We will create an import filter in the 'books' table delegate class to take raw data from an Excel XML spreadsheet and build an array of Dataface_Record objects with the data.  To parse the excel spreadsheet, we will use an PHP Library for parsing Excel XML files.

Details

Before we begin, here is just a quick overview of what the database looks like:

  • The database has 4 tables:books, books_categories, books_media, and users.
    books : The main table that stores all the books
    books_categories: Contains categories that are used in valuelists for the books table.
    books_media : Contains types of media (e.g. books, vhs, etc..) that are used in valuelists of the books table.
    users : Stores the users of the system.

    So this is pretty much a 1 table application apart from the supporting tables for valuelists.
  • The books table looks like:

        book_id    int(11)            No        auto_increment                                 

        title    varchar(128)    utf8_unicode_ci        No                                           

        author_or_editor    varchar(64)    utf8_unicode_ci        Yes    NULL                                       

        isbn    varchar(32)    utf8_unicode_ci        Yes    NULL                                       

        publisher    varchar(64)    utf8_unicode_ci        Yes    NULL                                       

        copyright_year    year(4)            Yes    NULL                                     

        categories    varchar(32)    utf8_unicode_ci        Yes    NULL                                       

        audience    enum('General', 'Children', 'Teen', 'Youth')    utf8_unicode_ci        No    General                                     

        media    varchar(32)    utf8_unicode_ci        Yes    NULL                                       

        reference_no    varchar(32)    utf8_unicode_ci        Yes    NULL                                       

        notes    text    utf8_unicode_ci        Yes    NULL                                    

        borrower_id    int(11)            Yes    NULL                                     

        due_date    date            Yes    NULL                                     

        date_created    datetime            Yes    NULL                                     

        date_modified    datetime            Yes    NULL                                     

        created_by    varchar(32)    utf8_unicode_ci        Yes    NULL                                       

        modified_by    varchar(32)    utf8_unicode_ci        Yes    NULL 

  • Notice that the categories and media fields are varchar() fields.  This is so that they can store multiple category ids and media ids - one per line.  They use checkbox widgets.

Step 1: Installing the excelxmlparser class

We install the Excel xml parser library somewhere in our application.  I have placed in a directory called 'include' in my application directory.

Step 2: Writing the import filter

Forgive me for jumping straight into this, but I'm just going to paste the completed filter method source so you can see it:
/**
* An import filter to import excel spreadsheets. This will add categories and users as necessary.
*
* @param string &$data The raw XML data to be imported .. needs to be parsed.
* @param $defaultValues Associative array of default values that should be placed
* in all imported records. This allows users to specify default values
* to be placed into certain fields when the import the spreadsheet.
* @returns array(Dataface_Record)
*/
function __import__excel_spreadsheet(&$data, $defaultValues=array()){
$records = array(); // the array that will hold the records to be imported.

// First let's import the excel parser and parse the data into a
// data structure so we can work with it.
import('include/excelxmlparser/ExcelXMLParser.php');
$tempdir = DATAFACE_SITE_PATH.'/templates_c';
$tmpnam = tempnam($tempdir, 'dataface-librarian-excell-import');
$handle = fopen($tmpnam,'w');
fwrite($handle,$data);
fclose($handle);

// Now that we have parsed the data, we can work with it

$ExcelXMLParser = new ExcelXMLParser();
$app =& Dataface_Application::getInstance();


$result = $ExcelXMLParser->OpenWorkbook($tmpnam,array());

if ( ExcelXMLError::isError($result) ) return array();
// If there was a parsing error, we'll just return an empty array
// meaning no records could be imported.
$worksheets = array('General List'/*,'Children List','Youth List','Teen List'*/);
// The names of the worksheets in this excel file that should be imported.

$ws =& $ExcelXMLParser->Workbook->getFirstWorksheet();
while ($ws){
// We go through each worksheet in the spreadsheet.
// In our case we will set the audience field based on the name of
// the worksheet.
$name = $ws->getName();
if ( stripos($name,'teen') !== false ) $audience='Teen';
else if (stripos($name,'youth') !== false) $audience='Youth';
else if (stripos($name,'children') !== false) $audience = 'Children';
else $audience = 'General';

$ws->Table->getFirstRow(); $ws->Table->getNextRow(); $ws->Table->getNextRow();
// Data doesn't start until the 4th row
while ( $row = $ws->Table->getNextRow() ){

//Iterate through all the rows

// Now for each row of the spreadsheet we will create a new record.
$record = new Dataface_Record('books', array());

// Start out with the default values and build from there.
$record->setValues($defaultValues);

// Since I'm using chaining - looks like this filter requires
// PHP 5 ... :)

if ( !$row->getCell(0)->getValue() ) continue;
// This is a check to make sure that there is a value in the first cell..
// if not, we skip this row.

// Now we set the values of the record with the values in this row of the
// spreadsheet
$record->setValues(
array(
'title'=>$row->getCell(0)->getValue(),
'author_or_editor'=>$row->getCell(1)->getValue().', '.$row->getCell(2)->getValue(),
'publisher'=>$row->getCell(3)->getValue(),
'copyright_year'=>$row->getCell(4)->getValue(),

'media'=>explode(',',$row->getCell(7)->getValue()),
'reference_no'=>$row->getCell(8)->getValue(),
'notes'=>$row->getCell(9)->getValue(),
'audience'=>$audience
)
);

// The spreadsheet does its categories a little differently than our
// database. Our database keeps a list of category ids in a single
// varchar field - using a checkbox widget. Whereas the spreadsheet
// has 2 columns: major category and subcategory. Meaning that each
// will only have 2 categories. This is not a problem.. We can parse
// the spreadsheet to work with our format.
$majorCategory = trim($row->getCell(5)->getValue());
$subCategory = trim($row->getCell(6)->getValue());

// Since our categories field is a repeating field, it will accept an array
// of values on input and know what to do when them when it comes time to
// save them.
$categories = array();
if ( $majorCategory ){
// If there is a major category listed, then we see if it exists in the
// books_categories table.
$catrec =df_get_record('books_categories', array('category_name'=>$majorCategory));

if (!$catrec ){
// This category doesn't exist yet, so we will add it.

$catrec = new Dataface_Record('books_categories', array());
$catrec->setValue('category_name',$majorCategory);
$catrec->save();
}

// Lets add the category id.
$categories[] = $catrec->val('category_id');

}


// Now we do the same thing for subcategories.
if ( $subCategory ){
$catrec =df_get_record('books_categories', array('category_name'=>$subCategory));
if (!$catrec ){
$catrec = new Dataface_Record('books_categories', array());
$catrec->setValue('category_name',$subCategory);
$catrec->save();
}
$categories[] = $catrec->val('category_id');

}

// Now that we have built our categories array, we can add it
// to oru record.
$record->setValue('categories',$categories);

// We need to do some work for the media field. The spreadsheet lists
// multiple media selections as comma-delimited lists.. we need them
// to be an array because the media field is a repeating field.
$media = explode(',',$row->getCell(7)->getValue());
$media_ids = array();
foreach ($media as $medium){
$medrec = df_get_record('books_media', array('medium_name'=>$medium));
if ( !$medrec ){
$medrec = new Dataface_Record('books_media', array());
$medrec->setValue('medium_name',$medium);
$medrec->save();
}
$media_ids[] = $medrec->val('medium_id');
}
$record->setValue('media', $media_ids);
$borrowerName = trim($row->getCell(10)->getValue());
$telephone = trim($row->getCell(11)->getValue());
$email = trim($row->getCell(12)->getValue());
$dueDate = trim($row->getCell(13)->getValue());

// If this book has been borrowed, we will try to find out
// some more information from the users table on the borrower.
if ( $borrowerName ){

if ( $email ){
$brec = df_get_record('users', array('email'=>$email));
} else if ( $telephone ){
$stripped_phone = preg_replace('/[^0-9]/','',$telephone);
$brec = df_get_record('users', array('phone'=>$stripped_phone));
} else {
$borrowerNames = explode(' ',$borrowerName);
$lastName = $borrowerNames[count($borrowerNames)-1];
$firstName = $borrowerNames[0];
$brec = df_get_record('users', array('first_name'=>$firstName,'last_name'=>$lastName));

} // if $email

if ( !$brec ){
$brec = new Dataface_Record('users', array());
$names = explode(' ',$borrowername);
$lastName = array_pop($names);
$firstName = implode(' ',$names);
$brec->setValues(
array('first_name'=>$firstName,
'last_name'=>$lastName,
'phone'=>$telephone,
'email'=>$email,
'password'=>str_replace('@','*',$email)
)
);
$brec->save();

} // if !$brec
$record->setValue('borrower_id', $brec->val('userid'));
$record->setValue('due_date', $dueDate);
} // if $borrowerName


// Now that we have populated our record, we will add it to the
// array of records to be inserted.
$records[] =&$record;

unset($record); // necessary to prevent PHP from writing over the last record



} // while $row = ...

unset($ws);
$ws =& $ExcelXMLParser->Workbook->getNextWorksheet();
} // foreach $worksheets ...


// Return our array of records and let Xataface handle the rest.
return $records;

}
So what is important here:
  • Notice the name of the method: __import__excel_spreadsheet().  The __import__ prefix indicates that this is defining an import filter.  If you defined a method named __import__foo, then Xataface will recognize this as an import filter named 'foo'.

Step 3: Try it out

So we open our application:
main_page.png

Notice the import records button in the upper right.  Click on this to get the import form:
import_form_1.png

If you click on the "Import File Format" select list, you'll see some import formats, including "excel spreadsheet" which is the filter that we have just created:

import_data_format.png

(Note that the books delegate class also has a method called __import__titles_list which I did not mention in the above section... that is why there is also an option for "Titles List" in the format list).

If you select "Excel Spreadsheet from the list" and upload our spreadsheet using the upload widget provided, you will get a confirmation screen:

confirm.png

If you want to see the data that will be imported, you can click on "Preview Import Data":

preview.png

If everything looks good, you can click the "Looks Good Proceed with import" button.  Xataface takes care of the rest!

Discussion

The above example was quite complex as it required extra libraries and a lot of data manipulation to make the data fit.  However, it is possible to write import filters that require only 3 or 4 lines of code. 

Import filters are very useful to allow your users to work with the data in different ways and allow them to be able to import that data into the application without having to spend hours of tedious data entry.


Powered by Xataface
(c) 2005-2024 All rights reserved