Topic: [SOLVED] Process for concatenating files? (Read 988 times)
dara
Newbie
*
Posts: 29


« on: July 10, 2013, 10:26:03 AM »

Is there any process for concatenating files?
« Last Edit: July 22, 2013, 12:24:00 PM by Marius »
Marco Boeck
Administrator
Hero Member
*****
Posts: 1015


« Reply #1 on: July 10, 2013, 11:53:38 AM »

Hi,

Once you have imported your files via one of the import operators or one of the import wizards (Files -> Import Data), you can concatenate example sets with the "Append" operator.

Regards,
Marco

dara
Newbie
*
Posts: 29


« Reply #2 on: July 10, 2013, 04:45:02 PM »

Thanks, I have already imported the files, both from a remote database and from my local desktop.

I do not need to concatenate Example Sets; I need to concatenate the actual text files, as in plain file concatenation.

Here is why:

1. We want to associate a series of files to each other e.g. HR documents for one employee
2. We want to perform text classification on the entire series of documents, not just one
3. So I thought a crude setup would be to concatenate all the files for one employee and treat it as a single file
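
Outside RapidMiner, the crude setup in step 3 could be sketched roughly like this in Python. This is only an illustration under stated assumptions: the function name `concatenate_per_employee` and the file naming scheme `<employee>_<anything>.txt` are hypothetical and do not come from this thread.

```python
from collections import defaultdict
from pathlib import Path


def concatenate_per_employee(source_dir, target_dir):
    """Merge all .txt files that share an employee prefix into one file.

    Assumes (hypothetically) that files are named '<employee>_<anything>.txt'.
    Returns a dict mapping each employee prefix to the combined file's path.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Group the files by the part of the name before the first underscore.
    groups = defaultdict(list)
    for path in sorted(source.glob("*.txt")):
        employee = path.stem.split("_", 1)[0]
        groups[employee].append(path)

    # Write one combined document per employee, separated by blank lines.
    written = {}
    for employee, paths in groups.items():
        combined = "\n\n".join(p.read_text(encoding="utf-8") for p in paths)
        out_path = target / f"{employee}.txt"
        out_path.write_text(combined, encoding="utf-8")
        written[employee] = out_path
    return written
```

Each combined file could then be treated as a single document for classification, which is the idea described in the list above.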

D
Marco Boeck
Administrator
Hero Member
*****
Posts: 1015


« Reply #3 on: July 11, 2013, 10:12:48 AM »

Hi,

To read multiple files for text classification, you can use the "Process Documents from Files" operator. There should be plenty of help available in the forums, because text mining questions are pretty common ;)

Regards,
Marco

mdc
Jr. Member
**
Posts: 60


« Reply #4 on: July 12, 2013, 02:24:41 AM »


Hi,

I quickly created a process to read files, concatenate the contents and write to another file. See if you can adapt this to your needs.

enjoy,
Matthew

Code:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.005">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:create_document" compatibility="5.3.000" expanded="true" height="60" name="Create Document" width="90" x="112" y="75">
        <parameter key="text" value="This is to initialize the content of Remember/Recall operators.&#10;&#10;"/>
      </operator>
      <operator activated="true" class="remember" compatibility="5.3.005" expanded="true" height="60" name="Remember (2)" width="90" x="246" y="75">
        <parameter key="name" value="doc"/>
        <parameter key="io_object" value="Document"/>
      </operator>
      <operator activated="true" class="loop_files" compatibility="5.3.005" expanded="true" height="60" name="Loop Files" width="90" x="380" y="165">
        <parameter key="directory" value="/Users/mdc/Texts"/>
        <process expanded="true">
          <operator activated="true" class="text:read_document" compatibility="5.3.000" expanded="true" height="60" name="Read Document (2)" width="90" x="112" y="120">
            <parameter key="file" value="%{file_path}"/>
          </operator>
          <operator activated="true" class="recall" compatibility="5.3.005" expanded="true" height="60" name="Recall" width="90" x="112" y="30">
            <parameter key="name" value="doc"/>
            <parameter key="io_object" value="Document"/>
          </operator>
          <operator activated="true" class="text:combine_documents" compatibility="5.3.000" expanded="true" height="94" name="Combine Documents (2)" width="90" x="313" y="120"/>
          <operator activated="true" class="remember" compatibility="5.3.005" expanded="true" height="60" name="Remember" width="90" x="447" y="120">
            <parameter key="name" value="doc"/>
            <parameter key="io_object" value="Document"/>
          </operator>
          <connect from_op="Read Document (2)" from_port="output" to_op="Combine Documents (2)" to_port="documents 2"/>
          <connect from_op="Recall" from_port="result" to_op="Combine Documents (2)" to_port="documents 1"/>
          <connect from_op="Combine Documents (2)" from_port="document" to_op="Remember" to_port="store"/>
          <portSpacing port="source_file object" spacing="0"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="recall" compatibility="5.3.005" expanded="true" height="60" name="Recall (2)" width="90" x="514" y="75">
        <parameter key="name" value="doc"/>
        <parameter key="io_object" value="Document"/>
      </operator>
      <operator activated="true" class="text:write_document" compatibility="5.3.000" expanded="true" height="76" name="Write Document" width="90" x="648" y="75">
        <parameter key="file" value="/Users/matthewgarong/concatenated_text.txt"/>
      </operator>
      <connect from_op="Create Document" from_port="output" to_op="Remember (2)" to_port="store"/>
      <connect from_op="Recall (2)" from_port="result" to_op="Write Document" to_port="document"/>
      <connect from_op="Write Document" from_port="document" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
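
For readers who want to follow the logic without opening RapidMiner, the process above corresponds roughly to the Python sketch below. It is not a translation of the operators' exact behavior: the function name `concatenate_directory`, the UTF-8 encoding, the sorted file order, and the newline placed after each file are all assumptions made for illustration.

```python
from pathlib import Path


def concatenate_directory(directory, output_file, header=""):
    """Append the text of every file in `directory` into one output file."""
    # Plays the role of "Create Document" + "Remember (2)": the initial content.
    combined = header

    # Plays the role of "Loop Files": visit each regular file, in name order
    # (the order used by the real operator is an assumption here).
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # "Read Document (2)", then "Recall" + "Combine Documents (2)" + "Remember".
        combined += path.read_text(encoding="utf-8") + "\n"

    # Plays the role of "Recall (2)" + "Write Document".
    Path(output_file).write_text(combined, encoding="utf-8")
    return combined
```

Keep the output file outside the input directory; otherwise a second run would read the previous result as one of the inputs.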

dara
Newbie
*
Posts: 29


« Reply #5 on: July 12, 2013, 09:37:49 AM »

Thanks, Matthew.

It works! Impressed.

Could you kindly tell me how to make this a separate process by itself, like an I/O box that I can use in other processes? I am not sure how to do this in general, i.e. building my own processes from others.

Dara
mdc
Jr. Member
**
Posts: 60


« Reply #6 on: July 12, 2013, 04:12:43 PM »



To make it a separate process: save it, then call it from your own process using the 'Execute Process' operator. I have not tried this, though.
You can also add it to your process directly: just copy and paste it (at the top level or inside a 'Subprocess' operator). Do this in the Process window, not in the XML.

Matthew
dara
Newbie
*
Posts: 29


« Reply #7 on: July 12, 2013, 11:25:02 PM »

Thanks, mdc.

Got it to work. I really appreciate everyone's help.
D