MARC::Charset.3pm
Language: en
Version: 2008-05-27 (ubuntu - 07/07/09)
Section: 3 (Library functions)
NAME
MARC::Charset - convert MARC-8 encoded strings to UTF-8
SYNOPSIS
    # import the marc8_to_utf8 function
    use MARC::Charset 'marc8_to_utf8';

    # prepare STDOUT for utf8
    binmode(STDOUT, ':utf8');

    # print out some marc8 as utf8
    print marc8_to_utf8($marc8_string);
DESCRIPTION
MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8 strings. MARC-8 is a single-byte character encoding that predates Unicode and allows you to put non-Roman scripts in MARC bibliographic records.

http://www.loc.gov/marc/specifications/spechome.html
EXPORTS
ignore_errors()
Tells MARC::Charset whether or not to ignore all encoding errors, and returns the current setting. This is helpful if you have records that contain both MARC-8 and Unicode characters.

    my $ignore = MARC::Charset->ignore_errors();

    MARC::Charset->ignore_errors(1); # ignore errors
    MARC::Charset->ignore_errors(0); # DO NOT ignore errors
assume_unicode()
Tells MARC::Charset whether or not to assume Unicode when an error is encountered in ignore_errors mode, and returns the current setting. This is helpful if you have records that contain both MARC-8 and Unicode characters.

    my $setting = MARC::Charset->assume_unicode();

    MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8)
    MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode
assume_encoding()
Tells MARC::Charset whether or not to assume a specific encoding when an error is encountered in ignore_errors mode, and returns the current setting. This is helpful if you have records that contain both MARC-8 and other characters.

    my $setting = MARC::Charset->assume_encoding();

    MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
    MARC::Charset->assume_encoding('');      # DO NOT assume any encoding
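
As a rough illustration of what treating a stray byte as cp850 means, the sketch below uses Perl's core Encode module rather than MARC::Charset itself. It does not show how MARC::Charset implements its fallback; the sample byte and the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the single byte 0x82, which is "e with acute" in cp850.
my $cp850_bytes = "\x82";

# Decode the bytes as cp850 into a Perl character string ...
my $chars = decode('cp850', $cp850_bytes);

# ... then encode that character string as UTF-8 bytes.
my $utf8_bytes = encode('UTF-8', $chars);

# Show the code point and the resulting UTF-8 byte sequence in hex.
my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $utf8_bytes;
printf "U+%04X -> %s\n", ord($chars), $hex;
```

The same byte would mean something different under another legacy encoding, which is why the fallback encoding has to be named explicitly.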
marc8_to_utf8()
Converts a MARC-8 encoded string to UTF-8.

    my $utf8 = marc8_to_utf8($marc8);
If you'd like to ignore errors, pass in a true value as the second parameter or call MARC::Charset->ignore_errors() with a true value:

    my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');

or

    MARC::Charset->ignore_errors(1);
    my $utf8 = marc8_to_utf8($marc8);
utf8_to_marc8()
Attempts to translate UTF-8 into MARC-8.

    my $marc8 = utf8_to_marc8($utf8);
If you'd like to ignore errors, or characters that can't be converted to MARC-8, pass in a true value as the second parameter or call MARC::Charset->ignore_errors() with a true value:

    my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');

or

    MARC::Charset->ignore_errors(1);
    my $marc8 = utf8_to_marc8($utf8);
DEFAULT CHARACTER SETS
If you need to alter the default character sets, you can set the $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to the appropriate character set code:

    use MARC::Charset::Constants qw(:all);
    $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
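
If the changed defaults should apply to a single conversion only, one possible approach is to scope the assignment with Perl's local. The sketch below is an assumption-laden illustration: the helper name arabic_marc8_to_utf8 is hypothetical, and it assumes that marc8_to_utf8() reads the two package variables at call time.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';
use MARC::Charset::Constants qw(:all);

# Hypothetical helper: convert one string with Arabic defaults,
# leaving the package-wide defaults untouched for other callers.
sub arabic_marc8_to_utf8 {
    my ($marc8) = @_;

    # local restores the previous values when this sub returns
    # (assumes marc8_to_utf8 consults these variables on each call)
    local $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    local $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;

    return marc8_to_utf8($marc8);
}
```

Plain assignment, as in the example above this one, stays in effect for the rest of the program, so the scoped form is mainly useful when records with different default sets are processed side by side.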
SEE ALSO
- MARC::Charset::Constants
- MARC::Charset::Table
- MARC::Charset::Code
- MARC::Charset::Compiler
- MARC::Record
- MARC::XML
AUTHOR
Ed Summers (ehs@pobox.com)