Thursday, November 27, 2014

AB INITIO

Salient features of Ab Initio:

  • It is an important ETL (Extract, Transform, Load) tool for analyzing data for business purposes.
  • A data processing tool from Ab Initio Software Corporation (http://www.abinitio.com).
  • The Latin meaning of "ab initio" is "from the beginning".
  • Ab Initio is a general-purpose data processing platform for enterprise-class, mission-critical applications such as data warehousing, clickstream processing, data movement, data transformation, and analytics.
  • Designed to support the largest and most complex business applications.
  • A proven best-of-breed ETL solution.
  • Applications of Ab Initio:
    • ETL for data warehouses, data marts, and operational data sources.
    • Parallel data cleansing and validation.
    • Parallel data transformation and filtering.
    • High-performance analytics.
    • Real-time, parallel data capture.

ETL Introduction:

Extract, Transform, and Load (ETL) is a process that involves:
  • Extracting data from outside sources,
  • Transforming it to meet business needs, including the required quality levels,
  • Loading it into the target, i.e. the data warehouse.
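
As a rough sketch, the three steps above can be expressed in plain Python. The file names, field layout, and derived field here are hypothetical and purely for illustration; they are not part of any Ab Initio product:

```python
import csv

# Extract: read raw rows from a hypothetical flat-file source.
def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Transform: apply business rules (here, type conversions and a
# derived field, sale_amount = qty * unit_price).
def transform(rows):
    out = []
    for row in rows:
        qty = int(row["qty"])
        price = float(row["unit_price"])
        out.append({"item": row["item"], "sale_amount": qty * price})
    return out

# Load: write the cleaned rows to the target (a CSV file standing in
# for a warehouse table).
def load(rows, path):
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["item", "sale_amount"])
        writer.writeheader()
        writer.writerows(rows)
```

A real ETL tool adds parallelism, restartability, and metadata on top of this basic extract-transform-load skeleton.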


Extract
 
The first part of an ETL process is to extract the data from the source systems. Most data warehousing projects consolidate data from different source systems. Each separate system may also use a different data organization / format. Common data source formats are relational databases and flat files, but may include non-relational database structures such as IMS or other data structures such as VSAM or ISAM. Extraction converts the data into a format for transformation processing.

Transform

The transform stage applies a series of rules or functions to the data extracted from the source to derive the data to be loaded into the end target. Some data sources will require very little or even no manipulation of data. In other cases, one or more of the following transformation types may be required to meet the business and technical needs of the end target:

  • Selecting only certain columns to load (or selecting null columns not to load)
  • Translating coded values (e.g., if the source system stores 1 for male and 2 for female, but the warehouse stores M for male and F for female)
  • Encoding free-form values (e.g., mapping "Male" to "1" and "Mr." to "M")
  • Deriving a new calculated value (e.g., sale amount = qty * unit price)
  • Joining together data from multiple sources (e.g., lookup, merge, etc.)
  • Summarizing multiple rows of data (e.g., total sales for each store, and for each region)
  • Generating surrogate key values
  • Transposing or pivoting (turning multiple columns into multiple rows or vice versa)
  • Splitting a column into multiple columns (e.g., putting a comma-separated list specified as a string in one column as individual values in different columns)
  • Applying any form of simple or complex data validation; if validation fails, the data may be fully, partially, or not at all rejected, and thus none, some, or all of the data is handed over to the next step, depending on the rule design and exception handling. Many of the above transformations can themselves result in an exception, e.g. when a code translation parses an unknown code in the extracted data.
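
A few of these transformation types can be sketched in plain Python. The code table, field names, and functions below are made up for illustration; they are not Ab Initio components:

```python
# Translating coded values: the source stores 1/2, the warehouse M/F.
GENDER_CODES = {"1": "M", "2": "F"}

def translate_gender(code):
    try:
        return GENDER_CODES[code]
    except KeyError:
        # An unknown code is a validation failure; depending on the rule
        # design, the record would be rejected or routed to an exception flow.
        raise ValueError(f"unknown gender code: {code!r}")

# Deriving a new calculated value: sale amount = qty * unit price.
def sale_amount(qty, unit_price):
    return qty * unit_price

# Summarizing multiple rows of data: total sales for each store.
def total_sales_by_store(rows):
    totals = {}
    for store, amount in rows:
        totals[store] = totals.get(store, 0) + amount
    return totals
```

In a graph-based tool these would correspond to transform, rollup, and validation components rather than hand-written functions.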


Load

  • The load phase loads the data into the end target, usually the data warehouse (DW). Depending on the requirements of the organization, this process varies widely. Some data warehouses might overwrite existing information with cumulative, updated data, while other data warehouses (or even other parts of the same DW) might add new data in a historized form, e.g. hourly. The timing and scope of replacing or appending are strategic design choices that depend on the time available and the business needs. Some systems maintain a history and audit trail of all changes to the data loaded into the DW.
  • As the load phase interacts with a database, the constraints defined in the database schema as well as in triggers activated upon data load apply (e.g. uniqueness, referential integrity, mandatory fields), which also contribute to the overall data quality performance of the ETL process.
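
The overwrite-versus-append choice, and the way schema constraints participate in data quality during the load, can be sketched with Python's built-in sqlite3 module. The table and column names are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (
        store TEXT NOT NULL,
        day   TEXT NOT NULL,
        total REAL NOT NULL,
        UNIQUE (store, day)   -- constraint enforced at load time
    )
""")

def load_append(rows):
    # Append new history rows; a duplicate (store, day) pair violates
    # the UNIQUE constraint and surfaces as an IntegrityError, which an
    # ETL job would route to its exception handling.
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

def load_overwrite(rows):
    # Overwrite existing information with the latest cumulative data.
    conn.execute("DELETE FROM sales")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
```

Whether a given target table is appended to or overwritten is exactly the kind of design choice the paragraph above describes.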

ETL Advantages:
  • Removes mistakes and corrects data
  • Provides documented measures of confidence in data
  • Captures the flow of transactional data
  • Adjusts data from multiple sources to be used together
  • Structures data to be usable by BI tools
  • Enables subsequent business/analytical data processing


The Ab Initio product comes with three main components:

Graphical Development Environment (GDE)

The GDE is a graphical application for developers that is used for designing and running Ab Initio graphs. It lets developers create applications by dragging and dropping components onto a canvas, configuring them with familiar, intuitive point-and-click operations, and connecting them into executable flowcharts.
These diagrams are architectural documents that developers and managers alike can understand and use, but they are not mere pictures: the Co>Operating System executes these flowcharts directly. This means there is a seamless and solid connection between the abstract picture of the application and the concrete reality of its execution.

Co>Operating System

The Co>Operating System is the core software that unites a network of computing resources (CPUs, storage disks, programs, datasets) into a production-quality data processing system with scalable performance and mainframe reliability.
The Co>Operating System is layered on top of the native operating systems of a collection of computers. It provides a distributed model for process execution, file management, process monitoring, checkpointing, and debugging.
In a typical installation, the Co>Operating System is installed on a Unix or Windows NT server, while the GDE is installed on a Pentium PC.

Enterprise Metadata Environment (EME)

The EME is an Ab Initio repository and environment for storing and managing metadata. It provides the capability to store both business and technical metadata. EME metadata can be accessed from the Ab Initio GDE, a web browser, or the Ab Initio Co>Operating System command line. In short, the EME is the storage area.



Ab Initio Architecture





Ab Initio S/W Versions, File Extensions & Operating Systems

Ab Initio S/w Versions

– Co>Operating System version
– GDE version

File Extensions

– .mp       Stored Ab Initio graph or graph component
– .mpc      Program or custom component
– .mdc      Dataset or custom dataset component
– .dml      Data Manipulation Language file or record definition
– .xfr      Transform function file
– .dat      Data file (either serial file or multifile)

Operating systems

•       Compaq Tru64 UNIX
•       Digital UNIX
•       Hewlett-Packard HP-UX
•       IBM AIX Unix
•       NCR MP-RAS
•       Red Hat Linux
•       IBM/Sequent DYNIX/ptx
•       Siemens Pyramid Reliant UNIX
•       Silicon Graphics IRIX
•       Sun Solaris
•       Windows NT and Windows 2000



Useful Links

1. Ab Initio Sandbox
2. Ab Initio Components
3. Ab Initio Parallelism
4. Ab Initio Basic Graph Development
5. Ab Initio Multifile