In today's world, we use a great many software applications. It is a mistake to think that well-written code, in whatever programming language, is by itself what makes an application run smoothly and quickly. There are many underlying factors we need to understand to keep software running well. One of them is data: the data collected by various means has to be stored somewhere, in some form, and in a way that lets it be retrieved as quickly as possible. Databases evolved to serve this purpose.
To understand what a database is, we first need to understand a few terms: data, big data, structured and unstructured data, and metadata.
Data:
Data may be individual facts, statistics, or items of information. Data is the plural of datum, originally a Latin noun meaning “something given.”
Data has no specific meaning on its own; it acquires meaning only when a suitable data processing system interprets it. For example, computers usually represent data in binary. If I simply show you the binary data 01110101, it is ambiguous whether it represents a color value (say, a shade of blue) in a video, the integer 117, the ASCII code for a lowercase u, or something else. In context, however, the same bits are unambiguous: an addition operation treats them as an integer, while a line of text treats them as a lowercase u.
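To make the ambiguity concrete, here is a minimal Python sketch that reads a single byte with the bit pattern 01110101 in several ways. The RGB-pixel reading is an illustrative assumption only; real video formats encode color in more elaborate ways.

```python
# One byte, bit pattern 01110101, read by different "processing systems".
raw = bytes([0b01110101])

as_bits = format(raw[0], "08b")                    # the raw bit pattern
as_integer = int.from_bytes(raw, byteorder="big")  # read as an unsigned integer
as_text = raw.decode("ascii")                      # read as an ASCII character
as_pixel = (0, 0, raw[0])  # hypothetical: the byte as the blue channel of an RGB pixel

print(as_bits)         # 01110101
print(as_integer)      # 117
print(as_integer + 3)  # 120: in an addition, the byte is just a number
print(as_text)         # u
print(as_pixel)        # (0, 0, 117)
```

Nothing in the byte itself says which reading is "right"; the meaning comes entirely from the code that interprets it.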
Database:
Databases are used to store, protect, and retrieve the data we need. For now, just remember that a database stores data as records.
Big Data:
Big data refers to data sets, typically consisting of billions or trillions of records, that are so vast and complex that processing them requires new and powerful computational resources. The term is believed to have originated with Web search companies, which needed to query very large, distributed aggregations of loosely structured data.
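The phrase "distributed aggregations" hints at how such data is usually processed: it is split across many machines, each machine summarizes its own share, and the partial results are combined at the end. The toy, single-process Python sketch below imitates that map-and-reduce pattern on hypothetical log lines. It is not how Hadoop itself is programmed, only an illustration of the idea.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions: in a real cluster, each would live on a different machine.
partitions = [
    ["error disk full", "login ok"],
    ["login ok", "error timeout"],
    ["error disk full"],
]

def map_partition(lines):
    """Map step: count words within one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def combine(a, b):
    """Reduce step: merge two partial counts into one."""
    return a + b

partial_counts = [map_partition(p) for p in partitions]  # could run in parallel
total = reduce(combine, partial_counts, Counter())

print(total["error"])  # 3
print(total["login"])  # 2
```

Because each partition is summarized on its own, adding more data mostly means adding more partitions (and machines), which is the property that makes this pattern attractive at big-data scale.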
Structured data:
Data organized into records with a fixed set of fields, stored in a table or a file, is called structured data. Because its layout is known in advance, the data can be easily loaded and retrieved (retrieving data is called querying).
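As a small illustration, the Python sketch below loads hypothetical structured data from CSV text using the standard `csv` module. Every record has the same fields, so retrieving data reduces to filtering on a known field name.

```python
import csv
import io

# Hypothetical structured data: every record has the same fields in the same order.
csv_text = """id,name,city
1,Asha,Chennai
2,Ravi,Mumbai
3,Meena,Chennai
"""

# Each row becomes a dict keyed by the header fields.
records = list(csv.DictReader(io.StringIO(csv_text)))

print(records[0]["name"])  # Asha

# Because the layout is fixed, retrieval is a simple filter on a known field.
chennai = [r["name"] for r in records if r["city"] == "Chennai"]
print(chennai)  # ['Asha', 'Meena']
```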
SQL (Structured Query Language), originally developed at IBM in the early 1970s and later commercialized by Relational Software, Inc. (now Oracle Corporation), is the language generally used to manage structured data. With SQL, we can query data held in relational database management systems (RDBMS).
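Here is a minimal sketch of an SQL query, assuming SQLite as the relational database (via Python's standard `sqlite3` module, so nothing needs to be installed). The `employees` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table: each row is one record with the same fixed columns.
conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (name, dept, salary) VALUES (?, ?, ?)",
    [("Asha", "Sales", 50000), ("Ravi", "IT", 65000), ("Meena", "IT", 72000)],
)

# A declarative query: we state WHAT to retrieve, and the database works out HOW.
rows = conn.execute(
    "SELECT name, salary FROM employees WHERE dept = ? ORDER BY salary DESC",
    ("IT",),
).fetchall()
print(rows)  # [('Meena', 72000), ('Ravi', 65000)]
conn.close()
```

The `?` placeholders let the database insert the values safely, which is the usual practice rather than building SQL strings by hand.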
Unstructured data:
Data such as images, videos, graphics, and PDF files, which cannot be organized into records with a fixed set of fields, is called unstructured data. Multimedia files are a good example of unstructured data.
It is estimated that 80-90% of the data in a typical organization is unstructured, and that it is growing at a much higher rate than structured data.
Note: Even though unstructured data may have an internal structure, it is still considered "unstructured" because its contents cannot be arranged in the rows and columns of a database.
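A PNG image is a good example of that internal structure. Its file starts with a fixed signature and a header chunk (IHDR) holding the width and height, yet the pixel data that follows cannot be usefully split into database columns. The Python sketch below reads the dimensions from hypothetical header bytes built in the code itself; the pixel data a real file would contain is omitted.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_dimensions(data: bytes) -> tuple:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Chunk layout: 4-byte big-endian length, then a 4-byte chunk type.
    _length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR":
        raise ValueError("IHDR chunk missing")
    # The IHDR data begins with width and height, each a 4-byte big-endian integer.
    width, height = struct.unpack(">II", data[16:24])
    return width, height

# Hypothetical header bytes for a 640x480 image (remaining IHDR fields and pixel data omitted).
header = PNG_SIGNATURE + struct.pack(">I4sII", 13, b"IHDR", 640, 480)
print(png_dimensions(header))  # (640, 480)
```

The header is structured, but it only describes the image; the actual content (what the picture shows) remains unstructured.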
Semi-structured data:
As the name suggests, semi-structured data is intermediate between structured and unstructured data. It carries some structure, such as tags or labeled fields, but it does not conform to a fixed data model like the tables of a relational database. For example, word processing software can now include metadata showing the author's name and the date created, while the bulk of the document is just unstructured text.
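JSON is a common format for semi-structured data. The Python sketch below parses two hypothetical documents: their metadata fields can be read by name, but the second document lacks a field the first one has, and the body is plain text with no fields of its own.

```python
import json

# Hypothetical documents exported as JSON. Each carries labeled metadata fields,
# but not every document has the same fields, and the body is free text.
raw = """
[
  {"author": "A. Writer", "created": "2015-03-01", "tags": ["databases"],
   "body": "Plain prose with no fields or columns of its own."},
  {"author": "B. Editor", "created": "2015-03-02",
   "body": "Another free-text body."}
]
"""
docs = json.loads(raw)

for doc in docs:
    # Metadata can be read by field name, much like a column in a table...
    tags = doc.get("tags", [])  # ...but a field may simply be missing.
    print(doc["author"], doc["created"], tags)

# The body has no internal fields; even counting words is up to the program.
print(len(docs[0]["body"].split()))  # 10
```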
Mining unstructured data relies on Hadoop and many other techniques. For now, let's stick to structured data.
In the next session (Session 2), let's discuss what databases are and how data is stored in them.