just my notes...: May 2010

29 May 2010

beped: @create

Try @Create in the propertyConfigurator component

hermes: Character encoding

30.1.4.5. Character encoding

Sets the character encoding of submitted form data.
This filter is not installed by default and requires an entry in components.xml to enable it:

<web:character-encoding-filter encoding="UTF-16"
                               override-client="true"
                               url-pattern="*.seam"/>

• encoding — The encoding to use.
• override-client — If this is set to true, the request encoding will be set to whatever is
specified by encoding no matter whether the request already specifies an encoding or not. If
set to false, the request encoding will only be set if the request doesn't already specify an
encoding. The default setting is false.
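The override-client rule boils down to a small decision. The sketch below is not Seam's filter code. It is a plain-Java model of the rule described above, where EncodingDecision is a hypothetical class name and a null requestEncoding stands for a request that declares no encoding. In a real servlet filter the result would be passed to ServletRequest.setCharacterEncoding().

```java
/** Hypothetical model of the override-client rule; not Seam's implementation. */
final class EncodingDecision {

    private EncodingDecision() {}

    /**
     * Returns the encoding the request should end up with.
     *
     * @param requestEncoding encoding already declared by the client, or null if none
     * @param configured      value of the filter's encoding attribute
     * @param overrideClient  value of the filter's override-client attribute
     */
    static String effectiveEncoding(String requestEncoding, String configured, boolean overrideClient) {
        if (overrideClient || requestEncoding == null) {
            // Either the configured encoding is always forced, or the client sent none.
            return configured;
        }
        // override-client="false" and the client declared an encoding: keep the client's choice.
        return requestEncoding;
    }

    public static void main(String[] args) {
        System.out.println(effectiveEncoding("ISO-8859-1", "UTF-16", true));  // UTF-16
        System.out.println(effectiveEncoding("ISO-8859-1", "UTF-16", false)); // ISO-8859-1
        System.out.println(effectiveEncoding(null, "UTF-16", false));         // UTF-16
    }
}
```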

08 May 2010

Configuring a Seam EJB project for development with Maven and JBoss Tools

07 May 2010

java parse fixed-length files (2)

http://jsapar.tigris.org/

Mission

The goal of this project is to create a Java library that contains a parser for flat files and CSV files. The library should be simple to use and possible to extend.

Existing features

Support for flat files with fixed positions.
Support for CSV files.
The schema can be expressed in XML notation or created directly within the Java code.
The parser can either produce a Document class, representing the content of the file, or you can choose to receive events for each line that has been successfully parsed.
Can handle huge files without loading everything into memory.
The output Document class contains a list of lines, each of which contains a list of cells.
The Document class can be transformed into a Java object (via reflection) if the schema is carefully written.
It is also possible to produce Java objects directly from the parser.
It is possible to convert a list of Java objects into a file according to a schema if the schema is carefully written.
The Document class can be built from an XML file (according to an internal XML schema).
The input and output are given by java.io.Reader and java.io.Writer, which means that what is parsed or generated does not have to be a file.
The file parsing schema contains information about how to parse each cell regarding data type and syntax.
Parsing errors can either be handled by exceptions thrown at first error or the errors can be collected during parsing to be able to deal with them later.
JUnit tests for most classes within the library.
Support for localisation.

Java Schema Parser

The javadoc within the package contains more comprehensive documentation regarding the classes mentioned below.

The JSaPar package is a Java library that provides a parser for flat and CSV (Comma Separated Values) files. The concept is that a schema class describes how a file should be parsed or written. The schema class can be built from an XML document or constructed programmatically in Java code. The output of the parser is usually an org.jsapar.Document object that contains a list of org.jsapar.Line objects, each of which contains a list of org.jsapar.Cell objects.

Supported file formats:

Fixed width - Also referred to as flat file. Each cell is described only by its position within the line. The type of the line is denoted by its position within the file.
Fixed width control value - The same as Fixed width above, except that each line type is denoted by a control value in the leading characters of each line.
CSV - (Comma Separated Values) Each cell is delimited by a separator character (or characters). The type of the line is denoted by its position within the file.
CSV control value - The same as CSV above, except that each line type is denoted by a control value in the leading cell of each line.

Events for each line

For very large files it can be a problem to build the complete org.jsapar.Document in memory before further processing; it may simply take up too much memory. In that case you can choose to receive an event for each parsed line instead. You do that by registering a subclass of org.jsapar.ParsingEventListener with the org.jsapar.input.Parser. That way you can process one line at a time, freeing memory as you go along.
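A minimal sketch of the per-line event idea in plain Java rather than JSaPar: LineListener and EventParser are hypothetical names (standing in for a ParsingEventListener subclass and the Parser), and the "schema" is reduced to a single separator string. Each line is split and handed to the listener, and nothing is kept after the callback returns, so memory use does not grow with file size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical callback, standing in for a subclass of org.jsapar.ParsingEventListener. */
interface LineListener {
    void lineParsed(long lineNumber, List<String> cells);
}

/** Hypothetical event-style parser: each line is split, handed over, then forgotten. */
final class EventParser {

    private final Pattern separator;

    EventParser(String separator) {
        // Quote the separator so characters like "|" are not treated as regex syntax.
        this.separator = Pattern.compile(Pattern.quote(separator));
    }

    void parse(Reader input, LineListener listener) {
        try {
            BufferedReader br = new BufferedReader(input);
            String line;
            long lineNumber = 0;
            while ((line = br.readLine()) != null) {
                lineNumber++;
                // Limit -1 keeps trailing empty cells, e.g. "d|e|" gives [d, e, ""].
                listener.lineParsed(lineNumber, Arrays.asList(separator.split(line, -1)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader input = new StringReader("a|b|c\nd|e|\n");
        new EventParser("|").parse(input, (n, cells) -> System.out.println(n + ": " + cells));
    }
}
```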

Converter

If you are only interested in converting a file from one format into another, you can use org.jsapar.io.Converter, where you specify the input and output schemas for the conversion. The converter uses the event mechanism under the hood, so it reads, converts and writes one line at a time. This means it is very lean in terms of memory usage.
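The same one-line-at-a-time idea applied to conversion. This is not org.jsapar.io.Converter; it is a hypothetical FixedWidthToCsv helper where the input "schema" is just an array of column widths and the output is separator-delimited. For brevity it does not quote cells that contain the separator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical streaming converter: fixed-width input, separator-delimited output. */
final class FixedWidthToCsv {

    private FixedWidthToCsv() {}

    /** Converts line by line, so only the current line is held in memory. */
    static void convert(Reader in, Writer out, int[] widths, char separator) {
        try {
            BufferedReader br = new BufferedReader(in);
            String line;
            while ((line = br.readLine()) != null) {
                StringBuilder sb = new StringBuilder();
                int pos = 0;
                for (int i = 0; i < widths.length; i++) {
                    // Clamp to the line length so short lines yield empty trailing cells.
                    int start = Math.min(pos, line.length());
                    int end = Math.min(pos + widths[i], line.length());
                    if (i > 0) {
                        sb.append(separator);
                    }
                    sb.append(line.substring(start, end).trim());
                    pos += widths[i];
                }
                out.write(sb.toString());
                out.write('\n');
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        convert(new StringReader("001Anna  \n002Bob   \n"), out, new int[] {3, 6}, ';');
        System.out.print(out); // 001;Anna and 002;Bob on separate lines
    }
}
```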

Building java objects

Use the method org.jsapar.Parser.buildJava() to build a Java object for each line in a file (or other input). Note that to use this feature, the schema has to be carefully written. For instance, the line type (name) of each line in the schema has to contain the fully qualified name of the Java class to build for that line.
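To illustrate why the line type must hold a fully qualified class name, here is a reflection sketch. It is not JSaPar's buildJava(): ReflectiveLineBuilder and the nested Person class are hypothetical, cells are matched to fields by name, and only String and int fields are handled.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: build one Java object per line, choosing the class by the line type. */
final class ReflectiveLineBuilder {

    private ReflectiveLineBuilder() {}

    /** Example target class; a real schema's line type would name a class like this. */
    static class Person {
        String name;
        int age;
    }

    static Object build(String lineType, Map<String, String> cells) {
        try {
            // The line type must be a fully qualified class name for Class.forName to find it.
            Class<?> type = Class.forName(lineType);
            Object target = type.getDeclaredConstructor().newInstance(); // needs a no-arg constructor
            for (Map.Entry<String, String> cell : cells.entrySet()) {
                Field field = type.getDeclaredField(cell.getKey()); // cell name = field name
                field.setAccessible(true);
                if (field.getType() == int.class) {
                    field.setInt(target, Integer.parseInt(cell.getValue().trim()));
                } else {
                    field.set(target, cell.getValue()); // assumed to be a String field
                }
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot build " + lineType, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("name", "Anna");
        cells.put("age", " 42");
        Person p = (Person) build(Person.class.getName(), cells);
        System.out.println(p.name + " " + p.age); // Anna 42
    }
}
```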

Converting java objects into a file

Use the class org.jsapar.input.JavaBuilder to convert Java objects into an org.jsapar.Document, which can then be used to produce the output file according to a schema.
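Going the other way, a rough sketch of turning an object into a fixed-width line by reading its fields via reflection. FixedWidthWriter and its Person class are hypothetical and do not reflect JavaBuilder's API; the field names and widths stand in for a schema.

```java
import java.lang.reflect.Field;

/** Hypothetical sketch: write an object's fields as one fixed-width line. */
final class FixedWidthWriter {

    private FixedWidthWriter() {}

    /** Example source object. */
    static class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    static String toLine(Object source, String[] fieldNames, int[] widths) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fieldNames.length; i++) {
            String value;
            try {
                Field field = source.getClass().getDeclaredField(fieldNames[i]);
                field.setAccessible(true);
                Object raw = field.get(source);
                value = raw == null ? "" : raw.toString();
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot read field " + fieldNames[i], e);
            }
            // Truncate values that are too long, then pad with spaces to the column width.
            if (value.length() > widths[i]) {
                value = value.substring(0, widths[i]);
            }
            sb.append(value);
            for (int pad = value.length(); pad < widths[i]; pad++) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = toLine(new Person("Anna", 42), new String[] {"name", "age"}, new int[] {6, 3});
        System.out.println("[" + line + "]"); // [Anna  42 ]
    }
}
```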

Using xml as input

It is possible to build an org.jsapar.Document from an XML document that follows XMLDocumentFormat.xsl (http://jsapar.tigris.org/XMLDocumentFormat/1.0). Use the class org.jsapar.input.XmlDocumentParser to convert an XML document into an org.jsapar.Document.
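A sketch of reading an XML document into lines of cells using only the JDK's DOM parser. The element names line and cell are assumptions for illustration, not necessarily what XMLDocumentFormat defines, and XmlLinesReader is a hypothetical helper rather than XmlDocumentParser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: <line> elements become lines, nested <cell> elements become cells. */
final class XmlLinesReader {

    private XmlLinesReader() {}

    static List<List<String>> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<List<String>> lines = new ArrayList<>();
            NodeList lineNodes = doc.getElementsByTagName("line"); // assumed element name
            for (int i = 0; i < lineNodes.getLength(); i++) {
                Element lineElement = (Element) lineNodes.item(i);
                NodeList cellNodes = lineElement.getElementsByTagName("cell"); // assumed element name
                List<String> cells = new ArrayList<>();
                for (int j = 0; j < cellNodes.getLength(); j++) {
                    cells.add(cellNodes.item(j).getTextContent());
                }
                lines.add(cells);
            }
            return lines;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a readable XML document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<document><line><cell>a</cell><cell>b</cell></line><line><cell>c</cell></line></document>";
        System.out.println(read(xml)); // [[a, b], [c]]
    }
}
```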

java parse fixed-length files

parsing a text file in java

import org.apache.commons.lang.RandomStringUtils;
import java.io.File;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.util.Random;

/** Writes 700,000 lines, each with 10 random alphanumeric fields separated by "|". */
public class GenerateFile {

    public static void main(String[] args) throws Exception {
        File file = new File("hugefile.txt");
        Random random = new Random(10); // fixed seed for the field lengths

        // try-with-resources closes the stream even if writing fails
        try (PrintStream ps = new PrintStream(new FileOutputStream(file))) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 700000; i++) {
                for (int j = 0; j < 10; j++) {
                    // each field is 3 to 12 alphanumeric characters long
                    sb.append(RandomStringUtils.random(3 + random.nextInt(10), true, true));
                    if (j < 9) {
                        sb.append("|");
                    }
                }
                ps.println(sb);
                sb.setLength(0); // reuse the builder for the next line
            }
        }
    }
}

import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang.time.StopWatch;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

/** Reads hugefile.txt line by line, splits each line on "|", and times the whole run. */
public class TokenizeFile {

    public static void main(String[] args) throws Exception {
        File file = new File("hugefile.txt");
        StopWatch stopWatch = new StopWatch();
        long totalLinesProcessed = 0L;

        stopWatch.start();
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = br.readLine()) != null) {
                totalLinesProcessed++;
                // the tokens are discarded: only the cost of splitting is measured
                StringUtils.split(line, "|");
            }
        }
        stopWatch.stop();

        System.out.println("Total lines processed = " + totalLinesProcessed
                + " Time taken = " + stopWatch.getTime() + " ms");
    }
}

If you run the two programs above, the first generates a file of about 55 MB and the second processes it.

Results of a sample run:
Total lines processed = 700000 Time taken = 4457 ms (1.6 GHz, 512 MB RAM)

05 May 2010

VTD-XML: The Future of XML Processing


The world's most memory-efficient (1.3x~1.5x the size of an XML document) random-access XML parser.
The world's fastest XML parser: On a Core2 2.5 GHz laptop, VTD-XML outperforms DOM parsers by 5x~12x, delivering 90~120 MB/sec per core sustained throughput.
The world's fastest XPath 1.0 implementation.
The world's most efficient XML indexer that seamlessly integrates with your XML applications.
The world's only incremental-update capable XML parser capable of cutting, pasting, splitting and assembling XML documents with max efficiency.
The world's only XML parser that allows you to use XPath to process 256 GB XML documents.
The XML technology that they don't want you to know about.

See example 17:

/**
 * This is a demonstration of how to use the extended VTD parser
 * to process large XML files. You need a 64-bit JVM to take full
 * advantage of extended VTD.
 */
import com.ximpleware.extended.*;

public class mem_mapped_read {
    public static void main(String[] s) throws Exception {
        VTDGenHuge vg = new VTDGenHuge();
        // namespace-aware parse; the file is memory-mapped instead of read into the heap
        if (vg.parseFile("test.xml", true, VTDGenHuge.MEM_MAPPED)) {
            VTDNavHuge vnh = vg.getNav();
            AutoPilotHuge aph = new AutoPilotHuge(vnh);
            aph.selectXPath("//*"); // select every element in the document
            int i;
            // evalXPath() returns the index of the next match, or -1 when there are no more
            while ((i = aph.evalXPath()) != -1) {
                System.out.println(" element name is " + vnh.toString(i));
            }
        }
    }
}

XMLBeans Support for Built-In Schema Types

Built-In Schema Type	XMLBean Type	Natural Java Type
xs:anyType	XmlObject	org.apache.xmlbeans.XmlObject
xs:anySimpleType	XmlAnySimpleType	String
xs:anyURI	XmlAnyURI	String
xs:base64Binary	XmlBase64Binary	byte[]
xs:boolean	XmlBoolean	boolean
xs:byte	XmlByte	byte
xs:date	XmlDate	java.util.Calendar
xs:dateTime	XmlDateTime	java.util.Calendar
xs:decimal	XmlDecimal	java.math.BigDecimal
xs:double	XmlDouble	double
xs:duration	XmlDuration	org.apache.xmlbeans.GDuration
xs:ENTITIES	XmlENTITIES	String
xs:ENTITY	XmlENTITY	String
xs:float	XmlFloat	float
xs:gDay	XmlGDay	java.util.Calendar
xs:gMonth	XmlGMonth	java.util.Calendar
xs:gMonthDay	XmlGMonthDay	java.util.Calendar
xs:gYear	XmlGYear	java.util.Calendar
xs:gYearMonth	XmlGYearMonth	java.util.Calendar
xs:hexBinary	XmlHexBinary	byte[]
xs:ID	XmlID	String
xs:IDREF	XmlIDREF	String
xs:IDREFS	XmlIDREFS	String
xs:int	XmlInt	int
xs:integer	XmlInteger	java.math.BigInteger
xs:language	XmlLanguage	String
xs:long	XmlLong	long
xs:Name	XmlName	String
xs:NCName	XmlNCName	String
xs:negativeInteger	XmlNegativeInteger	java.math.BigInteger
xs:NMTOKEN	XmlNMTOKEN	String
xs:NMTOKENS	XmlNMTOKENS	String
xs:nonNegativeInteger	XmlNonNegativeInteger	java.math.BigInteger
xs:nonPositiveInteger	XmlNonPositiveInteger	java.math.BigInteger
xs:normalizedString	XmlNormalizedString	String
xs:NOTATION	XmlNOTATION	Not supported
xs:positiveInteger	XmlPositiveInteger	java.math.BigInteger
xs:QName	XmlQName	javax.xml.namespace.QName
xs:short	XmlShort	short
xs:string	XmlString	String
xs:time	XmlTime	java.util.Calendar
xs:token	XmlToken	String
xs:unsignedByte	XmlUnsignedByte	short
xs:unsignedInt	XmlUnsignedInt	long
xs:unsignedLong	XmlUnsignedLong	java.math.BigInteger
xs:unsignedShort	XmlUnsignedShort	int
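The unsigned rows in the table above all map to a wider signed Java type. The short check below uses the unsigned helpers added in Java 8 (not XMLBeans itself; UnsignedRanges is a hypothetical class) to show each unsigned maximum exceeding the corresponding signed Java type, which is why, for example, xs:unsignedInt becomes long.

```java
import java.math.BigInteger;

/** Shows why each unsigned XML Schema type needs the next wider signed Java type. */
final class UnsignedRanges {

    private UnsignedRanges() {}

    // xs:unsignedByte max is 255; Byte.MAX_VALUE is 127, so short is needed.
    static short maxUnsignedByte() {
        return (short) Byte.toUnsignedInt((byte) -1);
    }

    // xs:unsignedShort max is 65535; Short.MAX_VALUE is 32767, so int is needed.
    static int maxUnsignedShort() {
        return Short.toUnsignedInt((short) -1);
    }

    // xs:unsignedInt max is 4294967295; Integer.MAX_VALUE is 2147483647, so long is needed.
    static long maxUnsignedInt() {
        return Integer.toUnsignedLong(-1);
    }

    // xs:unsignedLong max is 18446744073709551615, beyond Long.MAX_VALUE, so BigInteger.
    static BigInteger maxUnsignedLong() {
        return new BigInteger(Long.toUnsignedString(-1L));
    }

    public static void main(String[] args) {
        // prints: 255 65535 4294967295 18446744073709551615
        System.out.println(maxUnsignedByte() + " " + maxUnsignedShort() + " "
                + maxUnsignedInt() + " " + maxUnsignedLong());
    }
}
```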

XMLBeans Tools

XMLBeans includes several command-line tools you might find handy as shortcuts for common tasks. You'll find these tools in the bin directory of the XMLBeans installation or source tree.

dumpxsb (XSB File Dumper): Prints the contents of an XSB file in human-readable form.

inst2xsd (Instance to Schema Tool): Generates XML schema from XML instance files.

scomp (Schema Compiler): Compiles a schema into XMLBeans classes and metadata.

scopy (Schema Copier): Copies the XML schema at the specified URL to the specified file.

sdownload (Schema Downloader): Maintains "xsdownload.xml," an index of locally downloaded XSD files. URLs that are specified are downloaded if they aren't already cached. If no files or URLs are specified, all indexed files are relevant.

sfactor (Schema Factoring Tool): Factors redundant definitions out of a set of schemas and uses imports instead.

svalidate (Streaming Instance Validator): Validates a schema definition and instances within the schema.

validate (Instance Validator): Validates an instance against a schema.

xpretty (XML Pretty Printer): Pretty prints the specified XML to the console.

xsd2inst (Schema to Instance Tool): Prints an XML instance from the specified global element using the specified schema.

xsdtree (Schema Type Hierarchy Printer): Prints an inheritance hierarchy of the types defined in a schema.

xmlbean Ant task: Compiles a set of XSD and/or WSDL files into XMLBeans types.

Open Source XML Parsers in Java

For the beped application we need to find the parser that best fits our needs.
The ones that seem to fit best are:

XMLBeans

At a high level XMLBeans is an XML-Java binding tool that uses XML Schema as a basis for generating Java classes that you can use to easily access XML instance data in a natural manner in your Java programs. It was designed to provide both easy access to XML information via convenient Java classes as well as complete access to the underlying XML, combining the best of low-level APIs like SAX and DOM that provide full access with the convenience of Java binding. There are several factors that set XMLBeans apart from any other XML-Java binding alternatives:

* XML Schema Compliance - XMLBeans has achieved extremely high schema compliance and is able to compile even the most complex schemas. This is critical when adopting an XML-Java binding framework, since you may receive schemas that are out of your control.
* Access to the full underlying XML Infoset - The XML Cursor API gives you lower-level, DOM-like access to the underlying XML Infoset. You can get a "cursor" at any point while using the strongly typed generated XMLBeans and begin navigating the underlying XML instance.
* Access to the schema type system - The XMLBeans schema API allows you to walk through the schema type system giving you full access to a Java object representation of the XML Schema that was compiled to generate the XMLBeans classes.
* Speed - XMLBeans is optimized for performance at many levels. For example, XMLBeans lazily constructs objects from XML, so that you do not have the performance overhead of object creation when you only access portions of an XML document. Several Fortune 500 customers have adopted XMLBeans based on speed alone.

JAXB

The Java Architecture for XML Binding (JAXB) provides a fast and convenient way to bind between XML schemas and Java representations, making it easy for Java developers to incorporate XML data and processing functions in Java applications.

JiBX: Binding XML to Java Code

JiBX is a framework for binding XML data to Java objects. It lets you work with data from XML documents using your own class structures. The JiBX framework handles all the details of converting your data to and from XML based on your instructions. JiBX is designed to perform the translation between internal data structures and XML with very high efficiency, but still allows you a high degree of control over the translation process.

VTD-XML 1.5 (out of curiosity)

VTD-XML is the next generation XML parser that goes beyond DOM and SAX in terms of performance, memory and ease of use. To XML developers, VTD-XML is simple and just works! Other innovative features include XML indexing (due to inherent persistence of VTD) and incremental update. It is also the world's fastest XML processor: On an Athlon64 3400+ PC, VTD-XML significantly (1.5x~2x) outperforms SAX parsers with NULL content handler, delivering 50~60 MB/sec sustained throughput, without sacrificing random access. Its memory usage is typically between 1.3x~1.5x the size of the XML document, with 1 being the XML itself.

Zeus

Zeus is, in a nutshell, an open source Java-to-XML Data Binding tool. It provides a means of taking an arbitrary XML document and converting that document into a Java object representing the XML. That Java object can then be used and manipulated like any other Java object in the VM (virtual machine). Then, once the object has been modified and operated upon, Zeus can be used to convert the Java object back into an XML representation.

02 May 2010

HERMES: Taylor identity for LDAP

Ldap

Using a database identity store is great for development and small applications. However, for enterprise applications you will want to use Ldap.

The following shows how to configure the Taylor Identity Ldap implementation.