About Unicode, ASCII, and ANSI encoding

What is the difference between the UTF, ASCII, and ANSI character encodings?

Answer 1)

The Basics

Letters are represented in a computer by numeric codes. Pretty much everybody agrees that, when the computer sees a code of 100 (decimal), it represents a lowercase "d". We don't all agree on what 250 represents, and therein lies the rub.
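One way to see this disagreement is to decode the same single byte under several legacy code pages. The sketch below uses Python's built-in codecs `cp1252` (Windows Western European), `cp1251` (Windows Cyrillic) and `cp437` (the original IBM PC set). These code pages were picked only for illustration, and `decode_byte` is a hypothetical helper name.

```python
# Decode one byte value under several legacy 8-bit code pages.
CODE_PAGES = ["cp1252", "cp1251", "cp437"]

def decode_byte(value: int) -> dict[str, str]:
    """Return the character each code page assigns to a single byte value."""
    return {cp: bytes([value]).decode(cp) for cp in CODE_PAGES}

print(decode_byte(100))  # 'd' in every code page
print(decode_byte(250))  # a different character in each code page
```

Byte 100 is "d" everywhere, but byte 250 is "ú" in cp1252, the Cyrillic "ъ" in cp1251, and yet another symbol in cp437.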

ASCII vs ANSI

We commonly refer to a letter's numeric code as its "ASCII value," when we often really mean its "ANSI value." Much of the time the distinction doesn't matter, but in fact ASCII on its own is pretty much obsolete.

ASCII (American Standard Code for Information Interchange) is a 7-bit standard that dates from the early 1960s (it was first published in 1963, and the version in use today is essentially the 1968 revision). It defines 128 characters, which is enough for English: upper- and lowercase letters, numerals, punctuation, and control codes (remember control-c?), including nonprinting codes such as tab, return, and backspace.
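A short Python sketch (the language is our choice; the original answer has no code) that walks the 128 ASCII codes, counts printable characters versus control codes, and prints the codes for the keys mentioned above. It relies only on the built-ins `chr`, `ord` and `str.isprintable`.

```python
# ASCII assigns codes 0 to 127; split them into printable characters and control codes.
ascii_chars = [chr(code) for code in range(128)]
printable = [c for c in ascii_chars if c.isprintable()]    # space through "~"
control = [c for c in ascii_chars if not c.isprintable()]  # 0 to 31, plus DEL (127)

print(len(printable), "printable,", len(control), "control")  # 95 printable, 33 control

# A few of the nonprinting codes mentioned above: tab, carriage return, backspace.
print(ord("\t"), ord("\r"), ord("\b"))  # 9 13 8

# Ctrl+C conventionally sends code 3 (the letter's code minus 64).
print(ord("C") - 64)  # 3

# Every ASCII code fits in 7 bits.
assert all(ord(c) < 2**7 for c in ascii_chars)
```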

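"ANSI" is the common (if technically inaccurate) name for the 8-bit Windows code pages, such as Windows-1252 for Western European languages. They keep ASCII's codes 0 to 127 unchanged and use codes 128 to 255 for accented letters and other symbols, which is why different code pages disagree about a value such as 250. ASCII and ANSI work well enough as long as you write English or a Western European language. Both mappings are severely limited: ASCII can assign a number to only 128 characters and an 8-bit code page to at most 256, so there is no room for the characters of most other writing systems.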
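To illustrate the limit, this sketch tries to encode a few characters with ASCII and with `cp1252` (Python's name for the Windows-1252 "ANSI" code page). `try_encode` is a hypothetical helper that returns `None` when a character has no code in the chosen encoding.

```python
from typing import Optional

def try_encode(text: str, encoding: str) -> Optional[bytes]:
    """Return the encoded bytes, or None if the encoding has no code for some character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

for ch in ["d", "é", "Я"]:
    print(ch, "ascii:", try_encode(ch, "ascii"), "cp1252:", try_encode(ch, "cp1252"))
```

"d" fits in both, "é" fits only in cp1252 (as byte 0xE9), and the Cyrillic "Я" fits in neither; it needs a different code page such as cp1251, which in turn cannot represent other scripts.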

Unicode

Unicode fixes the limitations of ASCII and ANSI by providing room for over a million different symbols (1,114,112 code points, from U+0000 to U+10FFFF). Like the two systems above, it gives each character a number: the Russian Я is U+042F, and the Korean won sign ₩ is U+20A9. (Unicode numbers are conventionally written in hexadecimal, meaning one counts by 16s rather than 10s; this is not a problem, as users rarely need to know the numbers anyway.) So, although not yet totally comprehensive, Unicode covers most of the world’s writing systems. Most importantly, the mapping is consistent: any user on any computer has the same character-to-number assignments as everyone else, no matter what font is being used.
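In Python, `ord` returns a character's Unicode code point, so the numbers quoted above can be checked directly. `sys.maxunicode` reports the highest code point (U+10FFFF in Python 3.3 and later).

```python
import sys

# Code points are conventionally written in hex as U+XXXX.
for ch in ["d", "Я", "₩"]:
    print(f"{ch!r}: U+{ord(ch):04X}")  # U+0064, U+042F, U+20A9

# The Unicode code space runs from U+0000 to U+10FFFF.
print(sys.maxunicode == 0x10FFFF)  # True
print(sys.maxunicode + 1)          # 1114112 code points in total
```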

So Unicode is a map, a chart of (what will one day be) all of the characters, letters, symbols, punctuation marks, etc. necessary for writing all of the world’s languages past and present.

What is the difference between UTF-8 and UTF-16?

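UTF-8 stores each Unicode code point in a variable number of bytes, depending on its range: 1 byte for U+0000 to U+007F, 2 bytes up to U+07FF, 3 bytes up to U+FFFF, and 4 bytes up to U+10FFFF. (The original design allowed sequences of up to 6 bytes, but RFC 3629 limited UTF-8 to 4.) Its basic unit is 8 bits (1 byte), hence the name "UTF-8". Because ASCII characters keep their single-byte codes, ASCII text is already valid UTF-8 and stays compact, which makes UTF-8 well suited to the Internet, networks, and applications working over slow connections; it is the dominant encoding on the web.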
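To make the byte ranges concrete, here is a minimal hand-written UTF-8 encoder in Python, checked against the built-in `str.encode("utf-8")`. It follows the 4-byte limit of RFC 3629 (1 byte up to U+007F, 2 up to U+07FF, 3 up to U+FFFF, 4 up to U+10FFFF); `utf8_encode` is an illustrative name, not a library function.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode scalar value as 1 to 4 UTF-8 bytes."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point <= 0x7F:        # 0xxxxxxx (identical to ASCII)
        return bytes([code_point])
    if code_point <= 0x7FF:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:      # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

# "d", Russian Я, the won sign, and an emoji outside the 16-bit range.
for ch in "d\u042f\u20a9\U0001F600":
    mine = utf8_encode(ord(ch))
    assert mine == ch.encode("utf-8")  # agrees with Python's own codec
    print(f"U+{ord(ch):04X} -> {len(mine)} byte(s): {mine.hex(' ')}")
```

The four sample characters need 1, 2, 3 and 4 bytes respectively. Surrogate code points (U+D800 to U+DFFF) are rejected because they are not valid on their own in UTF-8.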

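UTF-16, the Unicode (or UCS) Transformation Format, 16-bit encoding form, serializes each Unicode scalar value (code point) as one or two 16-bit code units: code points up to U+FFFF take one unit (2 bytes), and code points above U+FFFF take two units, called a surrogate pair (4 bytes). The units can be written in either big-endian or little-endian byte order, often marked by a byte order mark (BOM) at the start of the text. Because its basic unit is 16 bits (2 bytes), it is called "UTF-16". It is the internal string representation used by Windows, .NET, and Java.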
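Code points above U+FFFF do not fit in one 16-bit unit, so UTF-16 splits them into a surrogate pair. The Python sketch below computes the code units by hand, serializes them in either byte order, and checks the result against the built-in `utf-16-be` and `utf-16-le` codecs. The function names `utf16_code_units` and `utf16_bytes` are illustrative, not from any library.

```python
def utf16_code_units(code_point: int) -> list[int]:
    """Split a code point into one or two 16-bit UTF-16 code units."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point <= 0xFFFF:
        return [code_point]             # fits in a single unit
    offset = code_point - 0x10000       # 20 bits remain
    return [0xD800 | (offset >> 10),    # high surrogate: top 10 bits
            0xDC00 | (offset & 0x3FF)]  # low surrogate: bottom 10 bits

def utf16_bytes(text: str, byteorder: str) -> bytes:
    """Serialize text as UTF-16 in 'big' or 'little' byte order, without a BOM."""
    return b"".join(unit.to_bytes(2, byteorder)
                    for ch in text
                    for unit in utf16_code_units(ord(ch)))

sample = "d\u042f\u20a9\U0001F600"
assert utf16_bytes(sample, "big") == sample.encode("utf-16-be")
assert utf16_bytes(sample, "little") == sample.encode("utf-16-le")
print([f"{u:04X}" for u in utf16_code_units(0x1F600)])  # ['D83D', 'DE00']

# Size trade-off: UTF-8 is smaller for ASCII text, UTF-16 for this Chinese text.
for text in ["hello", "\u4f60\u597d"]:
    print(repr(text), len(text.encode("utf-8")), "bytes in UTF-8,",
          len(text.encode("utf-16-le")), "bytes in UTF-16")
```

"hello" takes 5 bytes in UTF-8 but 10 in UTF-16, while the two Chinese characters take 6 bytes in UTF-8 and only 4 in UTF-16, so neither encoding is smaller for every kind of text.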

Asked in: SemanticSpace Technologies | Expertise Level: Experienced

Last updated on Saturday, 23 March 2013
