29 May 2010

beped: @create

Try @Create on the propertyConfigurator component

hermes: Character encoding

30.1.4.5. Character encoding

Sets the character encoding of submitted form data.
This filter is not installed by default and requires an entry in components.xml to enable it:

<web:character-encoding-filter encoding="UTF-16"
override-client="true"
url-pattern="*.seam"/>


• encoding — The encoding to use.
• override-client — If this is set to true, the request encoding will be set to whatever is
specified by encoding no matter whether the request already specifies an encoding or not. If
set to false, the request encoding will only be set if the request doesn't already specify an
encoding. The default setting is false.
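
The override-client rule can be restated as a small decision function. This is only a sketch of the behaviour described above, not Seam's actual filter code; the class and method names are made up for illustration.

```java
public class EncodingChoice {

    /**
     * Picks the encoding to apply to a request (hypothetical helper).
     *
     * @param requestEncoding encoding declared by the client, or null if none
     * @param configured      value of the filter's "encoding" attribute
     * @param overrideClient  value of the filter's "override-client" attribute
     */
    public static String resolve(String requestEncoding, String configured, boolean overrideClient) {
        // override-client=true: the configured encoding always wins.
        // override-client=false: it is only used when the client declared nothing.
        if (overrideClient || requestEncoding == null) {
            return configured;
        }
        return requestEncoding;
    }

    public static void main(String[] args) {
        System.out.println(resolve("ISO-8859-1", "UTF-16", true));
        System.out.println(resolve("ISO-8859-1", "UTF-16", false));
        System.out.println(resolve(null, "UTF-16", false));
    }
}
```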

07 May 2010

java parse fixed-length files (2)

http://jsapar.tigris.org/

Mission

The goal of this project is to create a java library that contains a parser of flat files and csv files. The library should be simple to use and possible to extend.

Existing features

  • Support for flat files with fixed positions.
  • Support for CSV files.
  • The schema can be expressed with xml notation or created directly within the java code.
  • The parser can either produce a Document class, representing the content of the file, or you can choose to receive events for each line that has been successfully parsed.
  • Can handle huge files without loading everything into memory.
  • The output Document class contains a list of lines which contains a list of cells.
  • The Document class can be transformed into a Java object (via reflection) if the schema is carefully written.
  • It is also possible to produce java objects directly from the parser.
  • It is possible to convert a list of java objects into a file according to a schema if the schema is carefully written.
  • The Document class can be built from an xml file (according to an internal xml schema).
  • The input and outputs are given by java.io.Reader and java.io.Writer which means that it is not necessarily files that are parsed or generated.
  • The file parsing schema contains information about how to parse each cell regarding data type and syntax.
  • Parsing errors can either be handled by exceptions thrown at first error or the errors can be collected during parsing to be able to deal with them later.
  • JUnit tests for most classes within the library.
  • Support for localisation.

Java Schema Parser

The javadoc within the package contains more comprehensive documentation regarding the classes mentioned below.

The JSaPar package is a java library that provides a parser for flat and CSV (Comma Separated Values) files. The concept is that a schema class denotes the way a file should be parsed or written. The schema class can be built by specifying an xml document or it can be constructed programmatically by using java code. The output of the parser is usually an org.jsapar.Document object that contains a list of org.jsapar.Line objects, each of which contains a list of org.jsapar.Cell objects.

Supported file formats:
  • Fixed width - Also referred to as flat file. Each cell is described only by its position within the line. The type of the line is denoted by its position within the file.
  • Fixed width control value - The same as Fixed width above except that each line type is denoted by a control value in the leading characters of each line.
  • CSV - (Comma Separated Values) Each cell is delimited by a separator character (or characters). The type of the line is denoted by its position within the file.
  • CSV control value - The same as CSV above except that each line type is denoted by a control value in the leading cell of each line.
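
To make the fixed-width idea concrete, here is a minimal plain-JDK sketch that cuts a line into cells by position. It does not use the JSaPar API; the class name, the cell names and the widths are assumptions chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FixedWidthSketch {

    // Hypothetical line layout: cell names and their widths in characters.
    static final String[] NAMES = {"id", "name", "amount"};
    static final int[] WIDTHS = {4, 10, 6};

    /** Cuts one line into named cells using only the positions above. */
    public static Map<String, String> parseLine(String line) {
        Map<String, String> cells = new LinkedHashMap<>();
        int pos = 0;
        for (int i = 0; i < NAMES.length; i++) {
            // Clamp to the line length so that short lines yield empty cells.
            int start = Math.min(pos, line.length());
            int end = Math.min(pos + WIDTHS[i], line.length());
            cells.put(NAMES[i], line.substring(start, end).trim());
            pos += WIDTHS[i];
        }
        return cells;
    }

    public static void main(String[] args) {
        String line = String.format("%-4s%-10s%-6s", "0042", "Jane Doe", "001250");
        System.out.println(parseLine(line));
    }
}
```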

Events for each line

For very large files, building the complete org.jsapar.Document in memory before further processing can be a problem. It may simply take up too much memory. In that case you may choose to get an event for each line that is parsed instead. You do that by registering a sub-class of org.jsapar.ParsingEventListener to the org.jsapar.input.Parser. That way you can process one line at a time, thus freeing memory as you go along.
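
The event approach can be sketched with the JDK alone: read one line, hand its cells to a callback, and keep nothing else in memory. This is not the JSaPar listener API; the class name, the `;` separator and the use of `Consumer` are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

public class LineEventSketch {

    /**
     * Reads the input line by line and calls the listener once per line.
     * Only the current line is held in memory. Returns the number of lines read.
     */
    public static long parse(Reader input, Consumer<String[]> listener) {
        long count = 0;
        try {
            BufferedReader reader = new BufferedReader(input);
            String line;
            while ((line = reader.readLine()) != null) {
                // The limit -1 keeps trailing empty cells.
                listener.accept(line.split(";", -1));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        long lines = parse(new StringReader("a;b\nc;d;\n"),
                cells -> System.out.println(cells.length + " cells, first = " + cells[0]));
        System.out.println(lines + " lines");
    }
}
```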

Converter

If you are only interested in converting a file of one format into another, you can use the org.jsapar.io.Converter where you specify the input and the output schema for the conversion. The converter uses the event mechanism under the hood, so it reads, converts and writes one line at a time. This means it is very lean regarding memory usage.
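
A line-at-a-time conversion from fixed width to a separated format might look like the sketch below. It uses only java.io and is not the JSaPar Converter; the class name, the widths and the `;` separator are assumptions, and no quoting of separator characters inside cells is attempted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class ConvertSketch {

    // Hypothetical input layout: three fixed-width cells.
    static final int[] WIDTHS = {4, 10, 6};

    /** Reads fixed-width lines and writes them as ';'-separated lines, one at a time. */
    public static void convert(Reader in, Writer out) {
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                StringBuilder csv = new StringBuilder();
                int pos = 0;
                for (int i = 0; i < WIDTHS.length; i++) {
                    int start = Math.min(pos, line.length());
                    int end = Math.min(pos + WIDTHS[i], line.length());
                    if (i > 0) {
                        csv.append(';');
                    }
                    csv.append(line.substring(start, end).trim());
                    pos += WIDTHS[i];
                }
                // Each converted line is written before the next one is read.
                out.write(csv.toString());
                out.write('\n');
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String input = String.format("%-4s%-10s%-6s\n", "0042", "Jane Doe", "001250");
        StringWriter output = new StringWriter();
        convert(new StringReader(input), output);
        System.out.print(output);
    }
}
```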

Building java objects

Use the method org.jsapar.Parser.buildJava() in order to build java objects for each line in a file (or input). Note that in order to be able to use this feature, the schema has to be carefully written. For instance, the line type (name) of the line within the schema has to contain the complete class name of the java class to build for each line.
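
The reason the line type must carry the complete class name becomes clearer with a reflection sketch: the class is looked up by name and each cell is pushed through a matching setter. This is a guess at the general mechanism, not JSaPar's implementation; the class names and the String-only setters are assumptions.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

public class ReflectionSketch {

    /** Hypothetical target bean; the setter names match the cell names. */
    public static class Person {
        private String name;
        private String city;
        public void setName(String name) { this.name = name; }
        public void setCity(String city) { this.city = city; }
        @Override public String toString() { return name + " (" + city + ")"; }
    }

    /** Creates an instance of className and calls setXxx(String) for every cell. */
    public static Object build(String className, Map<String, String> cells) {
        try {
            Class<?> type = Class.forName(className);
            Object target = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, String> cell : cells.entrySet()) {
                String key = cell.getKey();
                String setter = "set" + Character.toUpperCase(key.charAt(0)) + key.substring(1);
                Method method = type.getMethod(setter, String.class);
                method.invoke(target, cell.getValue());
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot build " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("name", "Ana");
        cells.put("city", "Madrid");
        System.out.println(build(Person.class.getName(), cells));
    }
}
```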

Converting java objects into a file

Use the class org.jsapar.input.JavaBuilder in order to convert java objects into an org.jsapar.Document, which can be used to produce the output file according to a schema.

Using xml as input

It is possible to build an org.jsapar.Document by using an xml document according to the XMLDocumentFormat.xsl (http://jsapar.tigris.org/XMLDocumentFormat/1.0). Use the class org.jsapar.input.XmlDocumentParser in order to convert an xml document into an org.jsapar.Document.

java parse fixed-length files

parsing a text file in java

import org.apache.commons.lang.RandomStringUtils;
import java.io.File;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.util.Random;


public class GenerateFile {

    public static void main(String[] args) throws Exception {
        File file = new File("hugefile.txt");
        PrintStream ps = new PrintStream(new FileOutputStream(file));
        Random random = new Random(10); // fixed seed, so the token lengths are repeatable

        // Write 700000 lines, each with 10 random alphanumeric tokens separated by '|'.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 700000; i++) {
            for (int j = 0; j < 10; j++) {
                // Each token is between 3 and 12 characters long.
                sb.append(RandomStringUtils.random(3 + random.nextInt(10), true, true));
                if (j < 9) {
                    sb.append("|");
                }
            }
            ps.println(sb.toString());
            sb.setLength(0); // reuse the builder for the next line
        }

        ps.close();
    }
}



import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang.time.StopWatch;
import java.io.File;
import java.io.FileReader;
import java.io.BufferedReader;



public class TokenizeFile {

    public static void main(String[] args) throws Exception {
        File file = new File("hugefile.txt");
        BufferedReader br = new BufferedReader(new FileReader(file));

        StopWatch stopWatch = new StopWatch();
        stopWatch.start();

        String line = null;
        long totalLinesProcessed = 0L;

        // Read one line at a time and split it on '|'; the tokens are discarded,
        // since only the parsing time is being measured.
        while ((line = br.readLine()) != null) {
            totalLinesProcessed++;
            StringUtils.split(line, "|");
        }

        stopWatch.stop();

        br.close();

        System.out.println("Total lines processed = " + totalLinesProcessed
                + " Time taken = " + stopWatch.getTime() + " ms");
    }
}
If you run the two programs above, you will be processing a file of about 55 MB.

A sample run results:
Total lines processed = 700000 Time taken = 4457 ms (1.6 GHz, 512 MB RAM)
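
One thing to keep in mind with the tokenizer above: StringUtils.split treats adjacent separators as a single separator, so empty fields are dropped. The generated file never contains empty fields, but real data often does. Below is a plain-JDK sketch that keeps them; the class and method names are made up for illustration, and it has not been timed against the run above.

```java
import java.util.ArrayList;
import java.util.List;

public class PipeSplitSketch {

    /** Splits on a single separator character and keeps empty tokens. */
    public static List<String> split(String line, char separator) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = line.indexOf(separator, start)) >= 0) {
            tokens.add(line.substring(start, idx));
            start = idx + 1;
        }
        // The remainder after the last separator is the final token.
        tokens.add(line.substring(start));
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("a||b|", '|'));
    }
}
```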

05 May 2010

VTD-XML: The Future of XML Processing

  • The world's most memory-efficient (1.3x~1.5x the size of an XML document) random-access XML parser.

  • The world's fastest XML parser: On a Core2 2.5 GHz laptop, VTD-XML outperforms DOM parsers by 5x~12x, delivering 90~120 MB/sec per core sustained throughput.

  • The world's fastest XPath 1.0 implementation.

  • The world's most efficient XML indexer that seamlessly integrates with your XML applications.

  • The world's only incremental-update XML parser, capable of cutting, pasting, splitting and assembling XML documents with maximum efficiency.

  • The world's only XML parser that allows you to use XPath to process 256 GB XML documents.

  • The XML technology that they don't want you to know about.

See example 17:

/**
 * This is a demonstration of how to use the extended VTD parser
 * to process a large XML file. You need a 64-bit JVM to take full
 * advantage of extended VTD.
 */
import com.ximpleware.extended.*;

public class mem_mapped_read {
    public static void main(String[] s) throws Exception {
        VTDGenHuge vg = new VTDGenHuge();
        // Parse with namespace awareness on, using a memory-mapped file.
        if (vg.parseFile("test.xml", true, VTDGenHuge.MEM_MAPPED)) {
            VTDNavHuge vnh = vg.getNav();
            AutoPilotHuge aph = new AutoPilotHuge(vnh);
            aph.selectXPath("//*"); // select every element
            int i = 0;
            while ((i = aph.evalXPath()) != -1) {
                System.out.println(" element name is " + vnh.toString(i));
            }
        }
    }
}
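
For comparison, on small documents the same `//*` query can be run with the XPath API bundled with the JDK. This is not VTD-XML: the input is parsed into an in-memory tree first, so it will not scale to the huge files that extended VTD targets. The class name and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JdkXPathSketch {

    /** Returns the names of all elements selected by the XPath expression //*. */
    public static List<String> elementNames(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//*", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate XPath on the given XML", e);
        }
    }

    public static void main(String[] args) {
        for (String name : elementNames("<root><a/><b><c/></b></root>")) {
            System.out.println(" element name is " + name);
        }
    }
}
```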

XMLBeans Support for Built-In Schema Types

Built-In Schema Type    XMLBean Type            Natural Java Type
xs:anyType              XmlObject               org.apache.xmlbeans.XmlObject
xs:anySimpleType        XmlAnySimpleType        String
xs:anyURI               XmlAnyURI               String
xs:base64Binary         XmlBase64Binary         byte[]
xs:boolean              XmlBoolean              boolean
xs:byte                 XmlByte                 byte
xs:date                 XmlDate                 java.util.Calendar
xs:dateTime             XmlDateTime             java.util.Calendar
xs:decimal              XmlDecimal              java.math.BigDecimal
xs:double               XmlDouble               double
xs:duration             XmlDuration             org.apache.xmlbeans.GDuration
xs:ENTITIES             XmlENTITIES             String
xs:ENTITY               XmlENTITY               String
xs:float                XmlFloat                float
xs:gDay                 XmlGDay                 java.util.Calendar
xs:gMonth               XmlGMonth               java.util.Calendar
xs:gMonthDay            XmlGMonthDay            java.util.Calendar
xs:gYear                XmlGYear                java.util.Calendar
xs:gYearMonth           XmlGYearMonth           java.util.Calendar
xs:hexBinary            XmlHexBinary            byte[]
xs:ID                   XmlID                   String
xs:IDREF                XmlIDREF                String
xs:IDREFS               XmlIDREFS               String
xs:int                  XmlInt                  int
xs:integer              XmlInteger              java.math.BigInteger
xs:language             XmlLanguage             String
xs:long                 XmlLong                 long
xs:Name                 XmlName                 String
xs:NCName               XmlNCName               String
xs:negativeInteger      XmlNegativeInteger      java.math.BigInteger
xs:NMTOKEN              XmlNMTOKEN              String
xs:NMTOKENS             XmlNMTOKENS             String
xs:nonNegativeInteger   XmlNonNegativeInteger   java.math.BigInteger
xs:nonPositiveInteger   XmlNonPositiveInteger   java.math.BigInteger
xs:normalizedString     XmlNormalizedString     String
xs:NOTATION             XmlNOTATION             Not supported
xs:positiveInteger      XmlPositiveInteger      java.math.BigInteger
xs:QName                XmlQName                javax.xml.namespace.QName
xs:short                XmlShort                short
xs:string               XmlString               String
xs:time                 XmlTime                 java.util.Calendar
xs:token                XmlToken                String
xs:unsignedByte         XmlUnsignedByte         short
xs:unsignedInt          XmlUnsignedInt          long
xs:unsignedLong         XmlUnsignedLong         java.math.BigInteger
xs:unsignedShort        XmlUnsignedShort        int
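
Note how the unsigned schema types map to the next wider Java type (xs:unsignedByte to short, xs:unsignedInt to long). A likely reason is that Java's primitive integer types are all signed, so the narrower type cannot hold the upper half of the unsigned range. The sketch below illustrates that with plain JDK code; it does not use XMLBeans, and the class and method names are made up.

```java
public class UnsignedWidening {

    /** Parses an xs:unsignedByte lexical value (0..255) into a short. */
    public static short unsignedByte(String lexical) {
        short value = Short.parseShort(lexical);
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException("Out of range for xs:unsignedByte: " + lexical);
        }
        return value;
    }

    /** Parses an xs:unsignedInt lexical value (0..4294967295) into a long. */
    public static long unsignedInt(String lexical) {
        long value = Long.parseLong(lexical);
        if (value < 0 || value > 4294967295L) {
            throw new IllegalArgumentException("Out of range for xs:unsignedInt: " + lexical);
        }
        return value;
    }

    public static void main(String[] args) {
        // 200 is a valid xs:unsignedByte but does not fit in a signed Java byte.
        System.out.println(unsignedByte("200") + " as short, " + (byte) 200 + " as byte");
        System.out.println(unsignedInt("4294967295") + " as long, " + (int) 4294967295L + " as int");
    }
}
```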

XMLBeans Tools

XMLBeans includes several command-line tools you might find handy as shortcuts for common tasks. You'll find these tools in the bin directory of the XMLBeans installation or source tree.

dumpxsb (XSB File Dumper)
Prints the contents of an XSB file in human-readable form.
inst2xsd (Instance to Schema Tool)
Generates XML schema from XML instance files.
scomp (Schema Compiler)
Compiles a schema into XMLBeans classes and metadata.
scopy (Schema Copier)
Copies the XML schema at the specified URL to the specified file.
sdownload (Schema Downloader)
Maintains "xsdownload.xml," an index of locally downloaded XSD files. URLs that are specified are downloaded if they aren't already cached. If no files or URLs are specified, all indexed files are relevant.
sfactor (Schema Factoring Tool)
Factors redundant definitions out of a set of schemas and uses imports instead.
svalidate (Streaming Instance Validator)
Validates a schema definition and instances within the schema.
validate (Instance Validator)
Validates an instance against a schema.
xpretty (XML Pretty Printer)
Pretty prints the specified XML to the console.
xsd2inst (Schema to Instance Tool)
Prints an XML instance from the specified global element using the specified schema.
xsdtree (Schema Type Hierarchy Printer)
Prints an inheritance hierarchy of the types defined in a schema.
xmlbean Ant task
Compiles a set of XSD and/or WSDL files into XMLBeans types.

Open Source XML Parsers in Java

For the beped application we need to find the parser that best fits our needs.
The ones that seem to fit best are:

XMLBeans


At a high level XMLBeans is an XML-Java binding tool that uses XML Schema as a basis for generating Java classes that you can use to easily access XML instance data in a natural manner in your Java programs. It was designed to provide both easy access to XML information via convenient Java classes as well as complete access to the underlying XML, combining the best of low-level APIs like SAX and DOM that provide full access with the convenience of Java binding. There are several factors that set XMLBeans apart from any other XML-Java binding alternatives:

* XML Schema Compliance - XMLBeans has achieved extremely high schema compliance and is able to compile even the most complex schemas. This is critical when adopting an XML-Java binding framework since you may receive schemas that are out of your control.
* Access to the full underlying XML Infoset - The XML Cursor API gives you lower-level, DOM-like access to the underlying XML Infoset. You can get a "cursor" at any point while using the strongly typed generated XMLBeans and begin navigating the underlying XML instance.
* Access to the schema type system - The XMLBeans schema API allows you to walk through the schema type system giving you full access to a Java object representation of the XML Schema that was compiled to generate the XMLBeans classes.
* Speed - XMLBeans is optimized for performance at many levels. For example, XMLBeans lazily constructs objects from XML, so that you do not have the performance overhead of object creation when you only access portions of an XML document. Several Fortune 500 customers have adopted XMLBeans based on speed alone.


JAXB

The Java Architecture for XML Binding (JAXB) provides a fast and convenient way to bind between XML schemas and Java representations, making it easy for Java developers to incorporate XML data and processing functions in Java applications.


JiBX: Binding XML to Java Code

JiBX is a framework for binding XML data to Java objects. It lets you work with data from XML documents using your own class structures. The JiBX framework handles all the details of converting your data to and from XML based on your instructions. JiBX is designed to perform the translation between internal data structures and XML with very high efficiency, but still allows you a high degree of control over the translation process.


VTD-XML 1.5 (out of curiosity)

VTD-XML is the next generation XML parser that goes beyond DOM and SAX in terms of performance, memory and ease of use. To XML developers, VTD-XML is simple and just works! Other innovative features include XML indexing (due to inherent persistence of VTD) and incremental update. It is also the world's fastest XML processor: On an Athlon64 3400+ PC, VTD-XML significantly (1.5x~2x) outperforms SAX parsers with NULL content handler, delivering 50~60 MB/sec sustained throughput, without sacrificing random access. Its memory usage is typically between 1.3x~1.5x the size of the XML document, with 1 being the XML itself.


Zeus

Zeus is, in a nutshell, an open source Java-to-XML Data Binding tool. It provides a means of taking an arbitrary XML document and converting that document into a Java object representing the XML. That Java object can then be used and manipulated like any other Java object in the VM (virtual machine). Then, once the object has been modified and operated upon, Zeus can be used to convert the Java object back into an XML representation.

02 May 2010

HERMES: Taylor identity for LDAP

Ldap

Using a database identity store is great for development and small applications. However, for enterprise applications you will want to use LDAP.

The following shows how to configure the Taylor Identity LDAP implementation.