Wednesday 24 September 2014

INTRODUCTION TO DATABASE-2


            In this session, we will get to know what a database is and why those are that important. I strongly recommend going through my previous post 'Introduction-to-database-1' to understand this session with ease.
             From the last session, we know what data is. Now let us consider we have a lot of data consisting of the details of the employees in an organization. As such data is really important for an organization; it needs to be stored somewhere and somehow. It is sure that maintaining hard copy of those is not only tedious but also it is not very easy to search anything from those. Hence it is clear that we need to store them as softcopy.
            Now it’s time for us to discuss on the ways of storing our data as softcopy. One can say that we can store them as they are in flat files where data is stored in a plain text file. Each line of the text file holds one record (We will get to know what a record is soon), with fields separated by delimiters, such as commas or tabs. Though maintaining data in flat files is simple in structure, they don't have multiple tables (Data is stored in the form as tables in a database). Also, data cannot be viewed or retrieved easily and quickly when the amount of data maintained grows in size over a period of time. There are many other disadvantages of 
 maintaining data in flat files such as:
1) Detection of duplicate record is not that easy.
2) Duplication of records is possible.
3) Data retrieval is almost impossible when the data grows in size.
4) Data entered may not be of same data type when entered by different persons. For example, one person may enter the salary of an employee as 5000 while some other may enter it as 5K.
5) Security of these systems is quite less.
           These are only few reasons for change in the method to store data from flat file database system to Relational database systems.

DATABASE:
           Database is an organized collection of data so that data retrieval as well as operations on the data can be done easily. Data is stored in the form of tables in a database. A database table is similar to any Microsoft excel sheet that has rows and columns. The data in a column should be of same data type (We will get to know data types in the next session).


Record:
A record is the data arranged in a row such that the data in a row gives information of particular entity.

Databases are generally represented by the following symbol, so if you come across this symbol anywhere,its nothing but a database.


Why data bases?
1) Data retrieval or updating becomes easy as the data can be queried.
2) Multiprocessing and multi-threaded are supported.
3) Quality in data as many constraints can be imposed on the data being entered.
4) Data type mismatch in the columns can be eliminated completely.
5) Reliable
6) Transactions can be handled by databases.
7) Data integrity
8) Access through networks is made possible by databases.
9)  Due to indexing, records for particular condition are fetched easily.
These are only few advantages we have with the databases.
Some of the databases and their vendors are as follows:
      1)      Oracle Corporation – Oracle Database 11g (Proprietary)
                                          MySQL (GPL or Proprietary)
      2)      Sybase –Adaptive Database Server;
                    SQL Anywhere (Proprietary)
      3)      IBM – DB2, Informix Dynamic Server (Proprietary)
      4)      FileMaker – FileMaker (Proprietary)
      5)      Microsoft – SQL Server, Access
      6)      Hewlett-Packard – HP NonStop SQL (Proprietary)
      7)       Apache – Apache Derby (Apache License)
      8)      PostgreSQL Global Development Group – PostgreSQL (PostgreSQL open source license)
NOTE: All the above databases except the Microsoft products are compatible with at least one more operating system apart from Microsoft Windows.

Monday 15 September 2014

INTRODUCTION TO DATABASE-1

In this modern world, we are using a lot of software applications. It is a blunder to think that a well written software code in any of the programming languages is responsible for the smooth and faster running of the application. There are a lot of underlying things which we need to understand which helps us for the smoother running of these software. Apart from that, we need to store the data collected by various means in one place or other in one way or other and it should be done in such a way that it can be retrieved as soon as possible. For this purpose, databases have been evolved.
To know what a database is, we need to understand certain terms like Data, Big Data, Structured and Unstructured Data, Meta data.

Data:

Data may be individual facts, statistics, or items of information. Data is a plural of datum, which is originally a Latin noun meaning “something given.”
Data never has a specific meaning on its own, but upon combining with the suitable data processing system gets a meaning to it. For example, we all know that the computer represent data in binary in most cases. If I simply say the binary data 01110101, then there is an ambiguity whether it represents the color blue in a video or the integer 117 or ASCII value of lower case U or so on. On the other hand, the same binary data clearly represents an integer while addition, lower case U in a text line etc.

Database:

Databases are used to protect, store, and retrieve data that is necessary. For now, just remember that the data is stored in a database in records.

Big Data:

Data sets, typically consisting of billions or trillions of records, which are so vast and complex that they require new and powerful computational resources to process. Data held in such large amounts that it can be difficult to process. The term big data is believed to have originated with Web search companies who needed to query very large distributed aggregations of loosely-structured data.

Structured data:

Organized data that exists in a record or a file is called structured data. Hence the data can be easily loaded or retrieved (Querying is the term used for retrieving the data).
SQL (Structured Query Language) originally developed by IBM in the early 1970s and later developed commercially by Relational Software, Inc. (now Oracle Corporation) is generally used to manage structured data. Using this programming language, we can query for data in relational database management systems.

Unstructured data:

All those data such as images,videos,graphic images, pdf files etc which cannot be structured in a record or a file is called Unstructured data. Multimedia files are good example for unstructured data.
It is estimated that 80-90% of the data in any organization is unstructured and is growing at much higher rate when compared to structured databases

Note: Eventhough unstructured data may have an internal structure, they are still considered "unstructured" because the data contained in them cannot be arranged in rows and columns in a database.

 

Semi-structured data:

As the name suggests, semi-structured data is an intermediate between the structured and the unstructured data. It can be considered as a type of structured data, but it does not have a data model structure. For example, word processing software now can include meta-data showing the author's name and the date created, with the bulk of the document just being unstructured text.

Mining Unstructured data uses Hadoop and many other techniques. As of now, Lets stick to structured data.

In the next session(session2), lets discuss on what databases are and how data is stored in them.