Thursday, October 31, 2019

Trilingualism In Education Essay Example | Topics and Well Written Essays - 1250 words

Acquiring a second language may result from exposure to other languages, making a person bilingual, that is, able to speak two languages (Barnes, 2006). Other people may learn a third language owing to exposure to different linguistic and cultural settings (Sagin, 2006). This can result from the parents' change of citizenship to a new country, where the children acquire a third language, making them trilingual, which is the ability to speak more than two languages. Trilingualism can be considered another type of bilingualism, and researchers have used studies on bilingualism to study trilingualism (Hammarberg, 2009). Trilingualism can be achieved in three ways: children growing up in trilingual surroundings, adults living in a trilingual or multilingual community, and fluent bilinguals acquiring a third language at school or elsewhere (Wang, 2008). This essay is a literature review of trilingualism in the classroom and the effects it has on a child's education. It starts by evaluating the circumstances that lead to trilingualism in society. Drawing on earlier studies, the essay also discusses the prevalence of trilingualism and how it affects children's education. It concludes by calling for more research on trilingualism, given how limited the current research on the topic is (Davidiak, 2010). The ability to speak more than two languages depends on several circumstances. First, children can become trilingual by being exposed to a trilingual society. Second, people who speak two or more languages can go to school to study a third language, and third, living in a trilingual or multilingual society can affect people's language. In these three circumstances, research on trilingualism has shown that there is no choice of whether or not one wants to acquire a third language; conditions force people to become trilingual. However, the biggest challenge is how people deal with three languages or cultures, because they cannot be balanced (Barron-Hauwaert, 2000). Whereas it is easy to acquire an additional language, it may prove difficult to adopt the culture. Third language acquisition can also depend on the child's age in relation to the choice among the local, the father's, and the mother's language (Lasagabaster, 2007). Older children can easily acquire a third language, especially when the local language is a third language to them, because of their exposure to it. Suzanne's research on language acquisition in children shows that children aged 2 to 3.5 years used the mother's language, children aged 3 to 4 used the father's language as their first language, and children aged 6 years and above used the country's language (Lasagabaster, 2007). Acquisition of the mother's language at a young age is possibly because the child lives with the mother and has no peer interaction in the community (Tokuhama-Espinosa, 2003). Although the reason some children acquire the father's language is not clear, Barron-Hauwaert (2000) points out that it might be fathers stepping in to expose the child to their language. Older children's use of the local language is due to exposure to the community that speaks it or to the peer group at school. Barron-Hauwaert shows that exposure to different circumstances leads people to become

Tuesday, October 29, 2019

Efficient Market Essay Example for Free

Q1. An efficient market is one in which stock prices fully reflect all information about a company, whether positive or negative. If the information from a company is positive, investors will respond favourably and the price of the company's shares will increase. Since the information is reflected in the price at once, only a normal rate of return can be obtained. Also, the price the firm receives from issuing securities is the present value, so unusually valuable financing opportunities are unavailable. Three conditions can each produce market efficiency: the rationality of investors, independent deviations from rationality, and arbitrage. Researchers divide market efficiency into three forms according to the availability of information. The first is the weak form, in which prices reflect only past stock prices. Studying past prices is the cheapest, easiest strategy for finding a pattern in stock prices, but under the random walk hypothesis future price movements are random, so this information cannot be used to generate any profit. The semistrong form holds when prices reflect all publicly available information, including historical price information. The price should rise at once when news is released, leaving no chance for profit from analysing public information. The strong form holds when prices reflect all information on the market, whether public or private; in this form even secret or insider news is useless for investors trying to earn a profit. Arbitrage generates profit from the rational purchase and sale of similar stocks in the market so that the profit is riskless, and making a rational decision includes estimating the business rationally and methodically. So in the weak form, obtaining historical stock information is enough to know how prices differ; in the semistrong form, financial statements and the economic and political situation must also be considered in order to find an arbitrage opportunity; and in the strong form, private information, such as planned purchases of resources or mergers of firms, must be understood in order to obtain arbitrage. Q2. Below are the advantages and disadvantages of different investment rules. Net Present Value (NPV) is used to calculate the net change in a company's assets from a project after considering the time value of money, so the company can base its decision on the result: a project with a positive NPV should be accepted. The advantage of NPV is that it leads to accurate decisions, since it can fairly rank different projects regardless of their size and duration. Because NPV relies on estimated cash flows and a discount rate, and both are difficult to estimate and full of uncertainty, this is the main disadvantage of NPV. The payback period rule makes a decision by checking whether the project reaches its break-even point within a predetermined cutoff. Its advantage is that the analysis is simple and direct. It is also useful for short-term projects where cash management is the first priority, since payback focuses mainly on the liquidity of the project. The disadvantage is that payback ignores the time value of money and the cash flows after the cutoff period, so some costs may be neglected and the result may be inaccurate. The discounted payback period converts the cash flows to present values and checks whether the discounted cash flows reach the break-even point within a predetermined cutoff. Since the calculation is similar to payback, the two rules share most of their advantages and disadvantages.
Since the time value of money is considered, the result can be more accurate, but a discount rate must also be specified, so the analysis becomes more complicated. The Internal Rate of Return (IRR) is the discount rate at which NPV becomes zero; the rule accepts a project whose IRR is greater than the required discount rate. It is widely used because it gives managers an easy-to-read rate of return, but it is not accurate when the project has non-normal cash flows or when evaluating mutually exclusive projects. Finally, the profitability index (PI) is used to compare profitability among different projects. It obtains present values by discounting the cash flows, and with the benefit/cost ratio formula, all positive-NPV projects can be ranked properly. It is therefore useful for a manager who must rank and select suitable projects from a list. However, the discount rate is difficult to estimate because it is very uncertain, and the profitability index breaks down when there are other resource constraints. In conclusion, there are similarities among some of the investment rules. NPV, IRR and PI consider the time value of money, while the remaining rules do not. Although computing with a discount rate is more complicated, the result is more accurate and detailed, so it is worth doing, as the comparison between the discounted payback period and the plain payback period shows. Moreover, IRR can give managers an accurate result quickly, but it may be affected by non-normal cash flows; therefore NPV may be the right rule for evaluating mutually exclusive projects. As a result, managers should choose different rules for particular situations.
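As a rough illustration of how these rules compare, the short Python sketch below (not part of the original essay) computes NPV, the undiscounted payback period, and the profitability index for a hypothetical project; the cash flows and the 10% discount rate are invented for the example.

# Minimal sketch of the investment rules discussed above.
# The project cash flows and the 10% discount rate are hypothetical.

def npv(rate, cash_flows):
    """Net Present Value: discount each period's cash flow back to period 0 and sum."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def payback_period(cash_flows):
    """Number of periods until the cumulative undiscounted cash flow turns non-negative."""
    cumulative = 0.0
    for t, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return t
    return None  # the project never pays back within the horizon

def profitability_index(rate, cash_flows):
    """PV of the future inflows divided by the initial outlay (benefit/cost ratio)."""
    pv_inflows = npv(rate, [0.0] + list(cash_flows[1:]))
    return pv_inflows / -cash_flows[0]

flows = [-1000.0, 400.0, 400.0, 400.0, 400.0]  # outlay today, then four yearly inflows
rate = 0.10

print(round(npv(rate, flows), 2))                  # about 267.95 -> positive, so accept
print(payback_period(flows))                       # 3 periods (ignores the time value of money)
print(round(profitability_index(rate, flows), 2))  # about 1.27 -> greater than 1, so accept

Note how the payback rule would still report three periods even if large cash flows arrived after the cutoff, which is exactly the weakness described above.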

Sunday, October 27, 2019

Importance Of Preservation Of Biodiversity Philosophy Essay

There are three main arguments in the book. Friedman explains the problems by breaking them down into the simple categories that the world is getting hot, flat, and crowded. He relates that the world is hot by explaining global warming and the problems it causes. Globalization is a main contributor to global warming: people consume more, so they demand that more be produced, which promotes globalization and global warming. The more we produce, the more toxic gases are released into the atmosphere, causing our air quality to diminish. Friedman takes an optimistic view of global warming by saying that it will help our economy downsize, force us into developing innovative technologies, and eventually free us from depending on oil-producing countries. Friedman also explains that the world is now flat, meaning that the rise of high-consuming middle classes all over the world is all linked together. As the economy grows, the standard of living rises, and the middle class is the group benefiting the most and causing the most problems. We want too much and waste so much. We buy and buy and buy and then just throw things away after a few uses instead of recycling and conserving. More countries like China and Russia are adopting the American way of living, and the planet just doesn't have enough resources. Eventually, all the natural resources of the world will be depleted and we won't know what to do. The last argument is that the world is crowded. The world's population increases by about a billion every thirteen years. People live longer and there is just not enough space for everyone. We are destroying our forests and jungles to build houses and factories. Pretty soon there will be no natural land preserved for farming and natural habitat. Friedman wraps up his arguments with one main thesis: America can recover from these major problems by developing new technologies and policy solutions that address the energy and environmental stresses on the planet. He predicts that because America is the major contributor to these problems, and because we have been facing them for some time now, we will be the first to invent these innovative technologies. Once we have these inventions we will be able to sell them to the rest of the world and regain some of the power we have lost. Part 2 - Analysis (15 points) (1.5 pages max for each answer) Answer any THREE (5 points each) Why is the preservation of biodiversity important in a hot, flat and crowded world? How can we preserve biodiversity? The preservation of biodiversity is important because it is what keeps life going on our planet. It keeps species from becoming extinct, it provides crucial services to poor and under-developed areas, and it is the little things that help us adapt to the ever-changing world around us. We can't live in a world where species die out regularly; there would be no cycle or continuity. We can't live in a world of cement and stainless steel. There needs to be life on our planet in order to produce the natural resources we need to survive. Rapid climate change and human development are destroying the biodiversity on our planet, and that affects the quality of our lives. If we allow the planet to keep running on this destructive path, we will kill off the small, unnoticed organisms and species that keep everything running. My old basketball coach used to say, "take care of the little things and the big things will fall into place." Friedman is basically saying the same thing.
If we keep the little things running, like insects and plants, we bigger and more developed organisms will benefit. Friedman talks about two main problems with biodiversity. He explains how poor communities look to natural resources to attain whatever they can to survive. The problem is that too many people are doing this; there are too many poor people depleting our ecosystems. The second problem is globalization. Globalization helps decrease the amount of poverty, but it causes so many more problems. Globalization demands increases in production and consumption, which results in competition to get as much as possible, as quickly as possible. This causes the extinction of all aspects of life on our planet to come much more quickly than ever before. To prevent this, government regulations and an ethic of conservation have to be set forth. Governments can put restraints on where companies can develop and can preserve ecosystems. Also, there have to be new limits on consumption: consumption of food, land, fuel and pretty much everything has to be cut in order for our planet to survive. Friedman basically explains that our ecosystems have to work in harmony in order to preserve biodiversity, and human beings are the cause of this dissonance. At one point our planet thrived and provided humans with all the natural resources they needed. We have over-consumed and destroyed too much of the planet for it to provide as much as it used to. The more we destroy, the more we need to develop artificial ways to provide those natural resources. If we just cut back on consumption and work on making the planet work as it used to, we will preserve biodiversity. What is energy poverty and what are its causes? Do you agree that ending energy poverty can help make a hot, crowded and flat world better? How? If you don't agree, explain why. Energy poverty is the fact that one out of every four people does not have access to energy. We take for granted the fact that when we walk into a room we flip a switch and a light turns on. In many places, such as much of Africa, that isn't the case. Friedman quotes Freling as saying that "energy poverty means you can't pump clean water regularly, there's no communications, no way to have adult literacy classes, and certainly no way to run computers at school or have access to connectivity." Energy poverty means you do not have access to electricity, it is more difficult to adapt to climate changes, and there is no means to use computers or cell phones, which means you are limited in global commerce, education, collaboration, and innovation. Basically, energy poverty limits your ability to do work and therefore limits your ability to thrive in today's world. It also hinders the ability to acquire basic needs that people who aren't energy poor take for granted. The causes of energy poverty, according to Friedman, are economic growth, increased population, overconsumption, high oil and natural gas prices, rationing, and droughts. There is also the problem that some countries don't have the facilities to provide electricity and don't have the funds to build them. Some of these poorer countries are not governed by anyone or anything and are engaged in constant war. I agree that solving the energy poverty problem would make the world better, but I don't think it is a cure-all. Providing energy to these poor countries would definitely give them a way to educate themselves and connect with each other, but how do we make that happen?
Friedman goes on to say that the problem with education is that there is a teacher shortage and an energy shortage. Providing energy does not necessarily solve the teacher shortage; who is going to teach the teachers? There is a healthcare issue in these poor countries, but providing energy doesn't mean doctors will want to go to these places, or that there are medicines to cure and help all of these people. Providing energy to places like Africa would be a huge leap for them, but my biggest problem is how do we do that, and where does the money come from? Providing means of education and facilities to run electricity and allowing easier communication won't solve the turmoil going on in these parts of Africa, and will not cure all the diseases and problems they have. It would be a very time-consuming and costly mission that seems like a fairytale. What is the reasoning behind Friedman's argument that Mother Nature and the Market hit the wall at the same time? The Great Recession is when Friedman says Mother Nature and the Market hit the wall. According to Friedman, our planet and our markets have been growing at a pace too quick and too destructive to keep up with. Friedman focuses on three main reasons why the Market and Mother Nature have come to a stop: unethical business and ecological values, underpricing the true costs of the risks we partake in, and privatizing gains while socializing losses. Major economies like the US and China have come out with great technologies, but at a very high price. We didn't have the means to develop these products, so we borrowed them. This is where the unethical business values and the underpricing come to light. If we spend too much money and too many resources, there is nothing left but these new technologies that last for a short period and are then discarded. Now that these technologies are thriving, we cannot return the resources and demand more. We are living beyond our means. Friedman says that instead of simply recovering from this recession we should use it as a time to change things. We need to stop living beyond our means and conserve. We cannot keep up this standard of living and pass it on to our children; something has to be given up. The economy as it is now is unsustainable. Part 3 - Critique (1 page maximum) (5 points) My impression of the book is that Friedman touches on many interesting and eye-opening topics. It really made me think about how much I really consume. America is a really wasteful country. I especially liked when Friedman touched on the fact that Americans buy ridiculous gadgets, use them twice, and then buy something else. If America focused on essentials we wouldn't consume so much. I don't usually look too far into things like global warming, but Friedman had good facts backing him up and I was really surprised at how real global warming is. I am big on things like recycling and a greener America. It is good that there are people out there trying to inform the world that changes need to happen and that they need to happen now. Friedman puts a sense of urgency on the fact that changes must be made. He describes and intertwines these problems in a very strategic and understandable way. What I don't like is that he doesn't have direct solutions to these problems. He looks heavily to the government, which gives the government more control, and that, in my opinion, isn't always a good thing. Also, Friedman's ideas seem very costly and he doesn't explain where this money will come from.
We are already in an economic crisis; there isn't any money to work with now, let alone to put into motion a whole new system of how the world works. My last argument with Friedman's ideas is that he is planning everything around the assumption that America will develop these life-changing methods of energy and fuel. This is a very optimistic attitude, but what happens if we don't? I hate to be a pessimist, but in today's world nothing is definite. You can't structure a plan around something that hasn't been developed yet. Overall, I enjoyed the book and have a different perspective on what I consume and what needs to change.

Friday, October 25, 2019

Themes and Characters in For Whom the Bell Tolls Essay

For Whom the Bell Tolls, by Ernest Hemingway, is a contemporary novel about the realities of war. The novel is wrought with themes of life and stark, direct writing. The characterization in the story is what comprises the intricacy of the underlying themes within the tale. The story itself is not complex, but the relationships of the characters with the environment and with each other, coupled with Hemingway's command of description and understanding, make the novel as a whole increasingly developed. The emotions of the story are not found in the dry narrative but rather in the characters themselves. The main character, Robert Jordan, has personality traits spanning various aspects of the heroic side of human nature. In addition, he displays ingenuity and perfectionism. His actions also show a high degree of introspection and philosophical thought. His relationship with Maria and the conflict it causes result in Robert Jordan's discovery of his personal values. He struggles to understand what defines his life and to resolve the conflict of what to live or die for. Other secondary characters within the novel are Maria, Pilar, and Pablo. Pilar and Pablo play pivotal roles in both the story and the development of Robert Jordan's character. Their personality traits come into direct conflict with each other, affecting Robert in a wide variety of ways. Pilar can best be described as an aggressive, dedicated, outspoken woman who feels comfortable leading a group or controlling a situation. Pilar demonstrates her skill at various times within the text, most notably ... Maria is healed of her sexual trauma and made a woman by Robert, and he is given true happiness by her. Indeed, the rarity of their love is apparent when one analyzes the diction and syntax describing their lovemaking: "lightly, lovingly, exultingly, innerly happy and unthinking and untired and unworried and only feeling a great delight" and he also said "my little rabbit, my darling, my sweet, my long lovely." The repetition of word structures and then sentence structures creates a catharsis. The repetition of words beginning in "l" and then "u" establishes a parallel sentence structure which creates a rhythm alluding to their own physical interaction. They fall in love, these two people, one always looking ahead and the other always looking back. Through the necessity of war and the help of Pilar, they are able to learn to live in the now, and through this learning are able to grow as characters.

Thursday, October 24, 2019

Marketing of Service - Restaurant Chain Essay

With the rise in disposable income, dining out has become a staple part of the modern world. This has been a phenomenon in most cities across the continents. The beautiful and pristine continent of Africa is no exception, with the growth of its cities and of settlements from outside countries. An interesting cradle of development on the continent is the country of South Africa. As the standard of living of most South Africans has risen over the last decade, eating out has become a popular leisure activity. According to Statistics South Africa, restaurants and coffee shops are steadily growing their businesses year on year. Take-away also did well, with businesses growing at an annual rate of around 15%. These increases came in spite of rising interest rates. In recent years, this market has grown and more restaurants have opened, offering a wide variety and an improving quality of food. So, while the market offers plenty of opportunity for a small business, it also demands quality and preferably a special or different offering. We, at Golden Restaurants, in our explorations to roll out across the seas, could not overlook this burgeoning market. Hence, going ahead with our vision of taking our flagship restaurant brand 'The Golden Bowl' to the international market, we have set our eyes on Africa, beginning with the beautiful South Africa. Having been in the Indian restaurant market for quite some time and having burnt our kitchens to serve clients from different classes, particularly the rich and the creamy layer, we would like to draw on our expertise in positioning ourselves as a class apart and as an amphitheatre for the rich and super-rich Indian South Africans. This document provides a peek into the South African Indian market and our strategy to market and promote the experience of dining in a different way to the rich Indian populace. Introduction Setting up a restaurant means first deciding what type of food to serve. South Africa has plenty of 'traditional' dishes of its own, and has long been a fertile market for cuisines from India, Italy, Greece, France, China and Japan. This exposure has grown in the last decade, and will continue as the country has become home to thousands from other African countries. Knowing our expertise in Indian cuisines, we plan to target high-earning individuals or families of the large Indian diaspora with plenty of disposable income but not much time, as well as dual-income family groups and the flux of Indian tourists to the country. Though there are a sizeable number of Indian restaurants, around 40, located in Indian strongholds like Durban, Johannesburg, Cape Town and Pretoria, catering to the different strata of the diaspora, we intend to focus on the niche rich segment and provide an enriching experience with differentiated service value addition. We plan to start with Johannesburg as our strategic location, it being a hotspot of Indian settlement and also one of the wealthiest cities of the country. We would like to offer our guests a dining experience like no other: a unique, interactive dining experience creating memorable moments with family and friends or with corporate honchos. From the time the first piece of bread is dipped until the last piece of dessert is savored, you'll be graced with the time to discover new things about people you thought you knew, and about those you're getting to know.
The emphasis would be on first impressions and the power of contrast, simplified but exhaustive dining, an engagement of the senses and a choreographed ambience. The pick of the cuisines of the four corners of India would be on offer, and the Indian exotic feel would be the main forte. Indian Diaspora in South Africa The history of the Indian diaspora in South Africa is a fascinating saga of almost a hundred and forty years. Indian South Africans are people of Indian descent living in South Africa; they mostly live in and around the city of Durban, making it "the largest 'Indian' city outside India". Many Indians in South Africa are descendants of migrants from colonial India (South Asia) during the late 19th century through the early 20th century. At other times Indians were subsumed in the broader geographical category "Asians", including persons originating in present-day Iran and parts of the small Chinese community. The modern South African Indian community is largely descended from Indians who arrived in South Africa from 1860 onwards. The first 342 of these came on board the Truro from Madras, followed by the Belvedere from Calcutta. They were transported as indentured laborers to work on the sugarcane plantations of the Natal Colony, and, in total, approximately 150,000 Indians arrived as indentured laborers over a period of five decades, later also as indentured coal miners and railway construction workers. The indentured laborers tended to speak Tamil, Telugu and Hindi, and the majority were Hindu, with Christians and Muslims among them. The remaining Indian immigration was from passenger Indians, comprising traders and others who migrated to South Africa shortly after the indentured labourers, paid for their own fares and travelled as British subjects. These immigrant Indians who became traders were from varying religious backgrounds, some being Hindu and some being Muslims from Gujarat (including Memons and Surtis), later joined by Kokanis and Urdu speakers from Uttar Pradesh. There was also a significant number of Gujarati Hindus in this group. Indian traders were sometimes referred to as "Arab traders" because of their dress, as large numbers of them were Muslim. Passenger Indians, who initially operated in Durban, expanded inland to the South African Republic (Transvaal), establishing communities in settlements on the main road between Johannesburg and Durban. Natal's Indian traders rapidly displaced small white shop owners in trade with other Indians and with black Africans, causing resentment among white businesses. Population, Regional & Linguistic Distribution The South African Indian-origin community currently numbers around 1.15 million and constitutes about 2.5% of South Africa's total population of 45.45 million. About 80% of the Indian community lives in the province of KwaZulu-Natal, about 15% in the Gauteng (previously Transvaal) area and the remaining 5% in the Cape Town area. In KwaZulu-Natal, the major concentration of the Indian population is in Durban. The largest concentrations of Indian settlement are at Chatsworth, Phoenix, Tongaat and Stanger in the Durban coastal area, which covers approximately 500,000 of the Indian-origin community. Pietermaritzburg – noted for its link with Mahatma Gandhi – has a community of approximately 200,000. Smaller inland towns in KwaZulu-Natal such as Ladysmith, Newcastle, Dundee and Glencoe make up the bulk of the remaining Indian population.
In the Gauteng area, the Indian community is largely concentrated around Lenasia outside Johannesburg and around Laudium and other suburbs outside Pretoria. There are also smaller groups in towns in the Eastern Cape and other provinces. The settlement of Indian-origin people in particular areas, as with other South African peoples, came about as a result of the Group Areas Act, which forced racial division into particular designated areas. According to the figures provided by the Department of Education and Culture, in the Province of KwaZulu-Natal the linguistic break-up of the Indian community is as follows: Tamil 51%, Hindi 30%, Gujarati 7%, Telugu 6%, Urdu 5% and others 1%. Starting a restaurant in South Africa Product is a key element in the overall market offering. Marketing-mix planning begins with formulating an offering that brings value to target customers. This offering becomes the basis upon which the company builds profitable relationships with customers. A company's market offering often includes both tangible goods and services, and each component can be a minor or a major part of the total offer. At one extreme, the offer may consist of a pure tangible good, such as soap, toothpaste, or salt, with no services accompanying the product. At the other extreme are pure services, for which the offer consists primarily of a service; examples include a doctor's exam or financial services. Between these two extremes, however, many goods-and-services combinations are possible, and one of the best examples is a restaurant. A restaurant is an ideal case of a product-meets-services story, and the success of the greater concept as a whole depends on the combined success or excellence of the entire gamut of offerings, right from the food served to the services rendered to the ambience offered. We are not just offering our core product with an elite service; we blend it with a rich dining experience, one that would linger on for quite some time. Now that we have identified the country, learnt about the population and gathered good statistical information that supports the opening of an Indian restaurant in South Africa, let's put on the thinking hat and do some brainstorming like marketers. We have the vast South African market, which is more or less a mixed kind of market with a heterogeneous culture. So, at first we need to identify our target market and position our pro-ser-exp (product served in a unique manner to give an experience of a lifetime) by the process of S.T.P. (i.e. Segmenting, Targeting and Positioning)

Wednesday, October 23, 2019

Study Plan for Masters in Surgery

ZSTU International Students Application Form (please print). The form collects the following information:

Name (Family Name / Given Name); Photo; Nationality; Gender; Passport No. and Valid Until; Date of Birth (Year / Month / Day); Marital Status; Place of Birth; Religious Belief; Physical Status; Highest Academic Degree Obtained; Major; Current Employer or College Affiliated; Occupation; Permanent Address; My Contact Information (Tel./Mobile, Fax No., E-mail); Contact on Emergencies (Name, Tel./Mobile, E-mail); Education & Work Experience; Proficiency of Chinese Language (time for Chinese learning in hours; HSK band achieved); Preferences of College of Study; Subject or Field of Study I Apply For; Duration (From Year/Month/Day To Year/Month/Day).

Categories of International Students I Apply to Be In: Bachelor's Degree Candidate / Master's Degree Candidate / Doctor's Degree Candidate / Chinese Language Student / General Scholar / Senior Scholar.

Financial Support: Scholarship / Self-supporting / Other; Name, Tel. & Address of the Guarantor Charging Your Case in China; Guarantor's Signature and Date.

Health declaration (each item must be answered "Yes" or "No"): Cholera; Venereal disease; Yellow fever; Lung tuberculosis; Heart disease; AIDS; Leprosy; Mental illness.

I hereby confirm that: All information and materials given in this form are true and correct to the best of my knowledge and belief, and I will take full responsibility for the authenticity of the above information. I shall abide by the Chinese laws and regulations during my study at Zhejiang Sci-Tech University and will not participate in any activities in China which are deemed to be adverse to the social order of China and inappropriate to my capacity as a student. If I am judged by the Chinese laws and decrees and the rules and regulations of ZSTU as having violated any of the above, I will not lodge any appeal against the decision of ZSTU on suspending my study at ZSTU or other penalties.

Applicant's Signature / Date; Advice of ZSTU Relevant Offices; Director's Signature (Seal) / Date; Remarks.

Tuesday, October 22, 2019

Plato's Poesis In Republic Essays - Platonism, Dialogues Of Plato

Plato's three main objections to poetry are that poetry is not ethical, philosophical or pragmatic. It is not ethical because it promotes undesirable passions, it is not philosophical because it does not provide true knowledge, and it is not pragmatic because it is inferior to the practical arts and therefore has no educational value. Plato then challenges poets to defend themselves against his criticisms. Ironically, it was Plato's most famous student, Aristotle, who was the first theorist to defend literature and poetry, in his Poetics. Throughout the Republic, Plato condemns art in all forms, including literature and poetry. Despite the fact that he wrote, Plato advocates the spoken word over the written word. He ranks imitation (mimetic representation) on a lower plane than narrative, even though his own works read like scripts (the Republic is written in dialogue form, with characters doing all the talking). It appears as though his reasoning is that imitation of reality is not in itself bad, but imitation without understanding and reason is. Plato felt that poetry, like all forms of art, appeals to the inferior part of the soul, the irrational, emotional, cowardly part. The reader of poetry is seduced into feeling undesirable emotions. To Plato, an appreciation of poetry is incompatible with an appreciation of reason, justice, and the search for Truth. To him, drama is the most dangerous form of literature because the author is imitating things that he or she is not. Plato seemingly feels that no words are strong enough to condemn drama. Plato felt that all the world's evils derived from one source: a faulty understanding of reality. Miscommunication, confusion and ignorance were facets of a corrupted comprehension of what Plato always strived for - Truth. Plato is, above all, a moralist. His primary objective in the Republic is to come up with the most righteous, intelligent way to live one's life and to convince others to live this way. Everything else should conform in order to achieve this perfect State. Plato considers poetry useful only as a means of achieving this State, that is, only useful if it helps one to become a better person; if it does not, it should be expelled from the community. Plato's question in Book X is the intellectual status of literature. He states that "the good poet cannot compose well unless he knows his subject, and he who has not this knowledge can never be a poet" (Adams 33). Plato says of imitative poetry and Homer, "A man is not to be reverenced more than the truth" (Adams 31). Plato says this because he believes that Homer speaks of many things of which he has no knowledge, just as the painter who paints a picture of a bed does not necessarily know how to make a bed. His point is that in order to copy or imitate correctly, one must have knowledge of the original. Plato says that imitation is three degrees removed from the truth. Stories that are untrue have no value, as no untrue story should be told in the City. He states that nothing can be learned from imitative poetry. Plato's commentary on poetry in the Republic is overwhelmingly negative. In Books II and III, Plato's main concern about poetry is that children's minds are too impressionable to be reading false tales and misrepresentations of the truth.
As stated in Book II, "For a young person cannot judge what is allegorical and what is literal; anything that he receives into his mind at that age is likely to become indelible and unalterable; and therefore it is most important that the tales which the young first hear should be models of virtuous thought" (Adams 19). He is essentially saying that children cannot tell the difference between fiction and reality, and this compromises their ability to discern right from wrong. Thus, children should not be exposed to poetry, so that later in life they will be able to seek the Truth without having a preconceived, or misrepresented, view of reality. Plato reasons that literature that portrays the gods as behaving in immoral ways should be kept away from children, so that they will not be influenced to act the same way.

Sunday, October 20, 2019

buy custom Business Decision Making essay

Among the most compelling topics were framing the null and alternative hypotheses, hypothesis testing in statistics, statement structuring, determining the alternative and null hypotheses based on confidence levels, illustrating the correlation coefficient between two numerical variables, and determining the correlation coefficient from available data. Other topics I found challenging included describing how spread out the observations are from each other by use of the standard deviation; the essence of the margin of error and of quantitative and categorical data; developing average scores of normal distributions; choosing the right sample size, including working with small samples; and, finally, examining the SWOT analysis of an organization. I have come to realize that discussion groups are very helpful for further understanding of the challenging topics, especially those on determining the averages of the data, because these topics need much practice. With a committed discussion group of active members, I have been convinced that everything needs further review to get the concept clear, because you get the chance to work through a problem and receive corrections and clarifications from the group. So far, most of the topics are almost clear, but I think some topics need to be revisited for revision purposes and for clarity. These include the null and alternative hypotheses, averages, sample types and the standard deviation, since these are the main concepts that require statistical measurement, as well as confidence intervals, probability and payoffs, graphing of the data acquired, confidence levels and the correlation coefficient. In my personal view, had the presentation been done in the lecture halls, it would have enhanced understanding, as that encourages active participation by everyone in the class. To summarize, I must admit that the coverage of these topics has been very satisfactory.
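A small illustration may help make a few of these measures concrete. The Python sketch below (not part of the original coursework) computes the average, the sample standard deviation, and the Pearson correlation coefficient for two invented, hypothetical samples using only the standard library.

# Minimal sketch of the averages, standard deviation, and correlation coefficient
# discussed above. The two paired samples are hypothetical.
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical sample of observations
y = [1, 3, 5, 4, 6, 6, 8, 10]   # hypothetical paired sample

mean_x = statistics.mean(x)     # the average score of the sample
mean_y = statistics.mean(y)
stdev_x = statistics.stdev(x)   # how spread out the observations are from each other
stdev_y = statistics.stdev(y)

# Pearson correlation coefficient: sample covariance divided by the product
# of the two sample standard deviations.
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
r = cov / (stdev_x * stdev_y)

print(mean_x, round(stdev_x, 3), round(r, 3))

A value of r near +1 indicates a strong positive association between the two variables, a value near 0 little association, and a value near -1 a strong negative association.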

Saturday, October 19, 2019

Bud Light Marketing Analysis

Anheuser-Busch Inc. is a dominant global leader in the beer industry, specifically in the United States. Its roots can be traced all the way back to 1852 and the Bavarian Brewery in St. Louis, MO, when Adolphus Busch traveled from Germany to join his father-in-law. In 1876 Budweiser was founded and rooted its brand in values, ethics, and quality. These core staples of the company carried through to 1982, when Bud Light was introduced. Today Bud Light is the best-selling beer in the U.S. and the #1 beer sold by volume in the world. Let's take a look into the marketing mix that makes this product so successful. Product. Bud Light was preceded only by Bamblinger and Miller in the "light" beer segment of the industry and is brewed at all 12 Anheuser-Busch U.S.-based breweries. It is brewed with all-natural ingredients (water, barley malt, rice, premium hops, and yeast), and the clean, crisp, smooth taste is derived from the two- and six-row malt and cereal grains used during fermentation. Each 12-ounce serving contains 110 calories and 6.6 grams of carbohydrates and is 4.2% alcohol by volume. Consumers in the beer segment are very open to trying different types of beverages, so having a unique taste and "superior drinkability" separates Bud Light's product from its competitors. Place. Bud Light (through Anheuser-Busch) has a very large and extremely effective distribution system. It starts with 12 breweries located all across the U.S., which in turn helps minimize delivery times and the associated costs. After Bud Light is brewed, a chain of over 600 wholesalers distribute the product to the suppliers, who in turn sell and deliver the beer to the locations where it is sold. Each business that sells Bud Light is provided with a secondary supplier to reduce the risk of stock-outs. Over half of the wholesalers who distribute Anheuser-Busch products deliver only its products. This vertical marketing system ensures two things: 1) Anheuser-Busch has more control over where its products are sold and over the price of the beer; 2) delivery time, which in the beer sector is crucial to quality and taste. Promotion. This area of the marketing mix is where Bud Light excels and blows the competition away! It taps a variety of advertising media. Primarily utilizing television commercials from a comical standpoint, Bud Light ensures it reaches and relates across all demographics. Effective advertising is also attained through sponsorships of special events, concerts, and sporting events. This accomplishes two goals: 1) it guarantees that only its product will be sold during the sponsored event; 2) the event will forge a connection between the "good time" and drinking Bud Light. Symbolic attributes also play an important role in promotional strategy. The seasonal association between Clydesdale horses and Anheuser-Busch is unmistakable. Also, think about the women depicted in their television ads: they are stunningly beautiful, playfully flirtatious, and seem genuinely into the target audience represented (males 18-35). Physical styling and packaging are also incorporated into Bud Light's advertising campaign. It has introduced co-branding and sports-marketing promotional packaging through affiliations with Major League Baseball and NFL teams. All these promotional techniques can be summarized into one simple strategy: Bud Light leverages people's emotional need for relationships to get them to buy its product. This type of targeted advertising coincides with Maslow's Hierarchy of Needs.
They specifically target consumers who view their ads consistently, enticing them with sexuality, attractiveness, and a sense of the "good life". With the use of pathos, repetition, and color psychology, Bud Light provokes its audience to dream of a euphoric experience. Price. Bud Light retains a position most competitors envy in this particular segment. Since Anheuser-Busch cements itself atop U.S. beer sales at a 49% clip, it positions itself according to projected earnings growth. There are commodity factors that influence this pricing, such as barley prices, gas prices, and advertising expenditures; however, Bud Light continues to price itself above its competition. Historically, Bud Light prices have increased at about half the rate of inflation. With the competition reluctant to pare its prices to match Bud Light, this strategy will remain for the foreseeable future.

Bud Light SWOT analysis:

Strengths
* Bud Light is the best-selling beer in the U.S., with a 37% market share.
* Brand strength - leading light brand in the US.
* Pricing leadership - continues to set industry benchmarks for pricing.

Weaknesses
* Many consumers consider the taste to be "watered down".
* Challenge of expanding internationally due to the stigma of an American beer.

Opportunities
* Possibility of expanding its target focus to include women.
* Continue to retain brand-loyal consumers with a "reminder" campaign.

Threats
* Fast-growing light beer segment with new competitors entering the arena.

As a beer drinker I will not contest the fact that it is almost impossible to differentiate between unlabeled beer brands. This is where brand image is enormously important. The single most important factor Bud Light has on its side is the momentum from its brand culture. In a judgmental society, each person is constantly categorized by their shape, clothes, ethnicity, religion, salary, and yes, even the beer they drink. With this awareness, Bud Light has proven that while it may offer a distinct and quality beverage, it is the marketing strategy employed that delivers success.

Friday, October 18, 2019

Learning Styles Coursework Example | Topics and Well Written Essays - 1000 words

However, learning styles have been criticized on the grounds that teaching or learning based on learning styles has very little impact on achieving goals and on learners' motivation, does not play a significant role, and might typecast the learners (Coffield et al., 2004). Using the learning styles inventory assessment questionnaire (Honey and Mumford, 1992), I learnt that my preferred way of learning, or learning style, is reflector. In the present paper, a critical reflection on the identified learning style in the context of three personal development goals is analysed, along with a critical evaluation of the internal and external factors contributing to the attainment of my personal development goals; at the end, a learning development plan is given based on SMART principles. Critical Analysis of Reflector Learning Style Kolb (1984, p.38), who developed the learning cycle, defined "learning as the process whereby knowledge is created through the transformation of experience". Based on Kolb's (1984) learning cycle, Honey and Mumford (1992) developed the 'learning styles inventory', and according to Honey and Mumford (1992, p.1) "the term learning styles is used as a description of the attitudes and behaviours which determine an individual's preferred way of learning", and no learning style is superior to another. The figure below gives the four learning styles developed by Honey and Mumford (1992), where they said that people learn either through teaching or through experience, explained four learning stages, and suggested that a person may start learning at any of the stages illustrated below. Figure 1: Honey and Mumford: Typology of Learners (Honey and Mumford, 1992). Honey and Mumford (1992) state that a reflector collects information and evaluates it before coming to a conclusion. Reflectors do not give their opinion or judgement until and unless they are sure about the conclusion, as they are very cautious and thoughtful. Reflectors favour intellectual activities and passive situations; for a participative activity they need substantial briefing beforehand, preparation time, thorough research and structured learning situations, and they do not favour pressure or time limits. They do not cope well with a lack of adequate information and well-laid instructions, spontaneous thinking, or time-bound activities (UMIST, 2003; University of Southampton, 2003). My main personal goal is to become a diplomat and a good martial arts teacher, because I have been interested in the diplomatic service and in learning martial arts for a long time and want to impart the knowledge to others and popularize the art form; other than that, I want to score good grades in the Business Management course. The analysis below examines whether my learning style helps in achieving these goals. A critical reflection on the identified learning style, reflector, in the context of three personal development goals. Based on the principles of C-SMART goal setting, which stands for goals that are Challenging, Specific, Measurable, Achievable, Realistic and Timely (Life Rocks, 2007; ECU, 2010; University of Ballarat, 2012), I have set my goals as follows. I want to become a diplomat and a Capoeira martial arts teacher (an Afro-Brazilian martial art combined with dance), but before that I want to secure good grades in the Business Management course ending May 2014. I want to secure an A+ grade in the course, and meanwhile work part time as a Capoeira martia

Literary works comment on society Essay Example | Topics and Well Written Essays - 250 words

These all show her vacillation between tradition and modernization, comfort and progress, and that Tess is unable to decide which is right for her. The new order seems to ignore emotion, but the idea of condemning the baby Sorrow to eternity in purgatory for the sake of her anti-Christian beliefs makes "her nightgown damp with perspiration" (Chapter XIV). Tess becomes the unsure frontrunner of the new, twentieth-century combination of Christian doubt and personal spirituality. Tess is personified as a "daughter of Nature" (Chapter XVIII), with religion as a function of civilization, and as such she cannot quite choose which authority to be persuaded by: tradition deems that she should follow Christian law closely, although certain allowances are made in her hometown. For example, near the start of the novel, Tess participates in Cerealia, a festival for the Goddess of the Harvest (Chapter II).

Shermine Narwani and Maysaloon Albadri Research Paper

However, Maysaloon Albadri, a critic of Narwani, uses logos and the rhetorical appeal of pathos to discredit her assertions. In this article, I will illustrate the manner in which Maysaloon and Narwani have applied logos, pathos, and rhetorical appeal to make their claims appealing to the audience. Maysaloon begins his essay by analyzing the nature of Narwani's article. He points out that any well-written and relatively neutral article that raises the slightest doubts that Assad's regime is killing its people should not be taken seriously. He explains that Narwani's article, which talks about the regime killing its people, is distributed crazily and cited as further proof that Syria is subjected to a conspiracy (Maysaloon 2). The point that he is trying to put across is that the fact that a piece is well-written does not mean it holds the truth. Essentially, Maysaloon uses this kind of argument, which is based on credible evidence, to invalidate the assertions of Narwani in a way that really appeals to the readers. This is a perfect application of logos, which increases the authenticity of the author's claims. Ideally, the reader would identify with this kind of logic, which is very appealing and convincing. Maysaloon increases the appeal of his argument by logically analyzing the happenings in the Arab League in a way that disputes Narwani's main theme. Narwani explains that there are armed groups fighting the regime which were not mentioned in the protocol. Maysaloon acknowledges this as a fact, but uses pathos to create a false sense of pity for Narwani's tendency to create fabrications of the issues regarding the international media. This aspect is brought out clearly when he says "It is curious that Miss Narwani seems to think that the conventional narrative does not mention an armed element to Syria's uprising, when it does" (Maysaloon 5). Here, Maysaloon uses the word 'curious' to create the illusion that he would not expect a person of Narwani's class to reason in such a manner, and could not identify why she misunderstood the conventional narrative. This word is used to create a condescending tone and pathos, which is a clever way of improving the appeal of one's ideas. Ideally, the pathos and logos are used to portray Narwani as incapable of writing sensible articles for media publication. The use of logos throughout the article, therefore, makes the readers want to know more about what Maysaloon is discussing and to identify Narwani's assertions as lacking credence. In her article, "Foolishly ignoring the Arab League report on Syria" (Narwani 1), Narwani says that the international media completely ignores the armed entities that are also fighting against the regime, a fact that Maysaloon disputes strongly by use of logos. As a result, the audience is easily convinced that Narwani's claims are misinformed. To support his divergent views, Maysaloon says that the conventional narrative that Narwani refers to exists only "in the mind of most ardent supporters of Assad". He adds that the media has, in fact, made many reports about groups such as the Free Syrian Army and other local groups that are attempting to protect themselves from the regime (Maysaloon 5). Maysaloon further discredits Narwani through a simple observation that she never cites any reference or source regarding the claim that there is a media conspiracy, which undermines or degrades the Arab League mission.
Maysaloon also uses logos, through reasoning and logic, in order to rule out the falsehoods and hence seek the truth. Narwani's notion regarding a conspiracy has no foundation and,

Thursday, October 17, 2019

Module 4 - SLP THREAT ANALYSIS Essay Example | Topics and Well Written Essays - 500 words

This follows from the fact that terrorist activities are often targeted at specific places and sites, and not at general areas. This development has led to curiosity and the need for research on what exactly drives terrorists' thinking and the planning of their activities. With reference to the choice of targets, it is evident that terrorists do not make random choices like those witnessed with freedom fighters or liberation movements. Rislien & Rislien (2010) have observed that "...one may add the indispensability of ideology, not only because it provides the initial dynamic for the terrorists' actions, but because it sets out the moral framework within which they operate" (p. 134). There seem to have been some drastic changes in terrorists' targeting policies; traditionally, terrorists with political ambitions would target major installations to attract media attention and cause general loss of life. For this reason, commercial centres were spared least of all. Currently, terrorists weigh various considerations; Dugdale (2005) points out that "First, they do a risk analysis" (p. 1). Terrorists carefully consider the potential benefits they stand to get from a target; this is often weighed against the amount of resources required and the potential for success. In the same way, Rislien & Rislien (2010) explain that "Academia thus claims that terrorism is rational and has a clear singleness of purpose" (p. 134), something that underpins the need for an understanding of their target choices and decisions. It is this analysis that leads particular terrorist groups to earmark places as "soft" and "hard" targets. In this analysis, a "hard" target is one that has considerable security and may make the terrorists easy to intercept in the course of their actions. On the other hand, a "soft" target is one that has few security considerations

Armani Hotel (Dubai) - Managing Customer Service Essay

Armani Hotel (Dubai) - Managing Customer Service - Essay Example The world of 21st century is an arena that promotes fast growth, tremendous development and high competition. The high level of internet connectivity in various corners of the world, along with the existence of the open economies has provided the platform for demand of products and services of international standards. The luxury sector, especially the hospitality sector always demands international standards and qualities of services mostly because of its need to maintain a homogenous quality and standard of high level of customer service at all of its location of presence. It is important to say that in the steady cycle of economic peaks and troughs that has continued to affect the business prospects of various countries in the recent times; the luxury hospitality sector has always maintained a steady level of growth all the time. The reason behind it can be attached to the fact that the luxury sector always experiences an inelastic demand mostly because of its significant choosing of its target audience, which are mostly comprised of the elite and extremely rich people. Brief on Service Marketing Services can be defined as a concept which represents intangible actions and attributes that are performed by individuals or a team of individuals for the purpose of providing superior level of value perception to the consumers in regards to their individual requirements of value of tangible or intangible nature (Rao, 2011, p. 5). Talking a little more about services, it can be said that because of its characteristics, services are a little different from the products. In case of services, the characteristics like the intangibility, homogeneity, inseparability and perishability exists. (Shanker, 2002, p. 36). While talking about marketing of services, it is important to mention that it surely includes the marketing features associated with the highly popular 4P’s concept. For the purpose of attaining success in a highly competitive environment, the value of services needs to be created, communicated, distributed and captured for the right target audience. However, it is very important to mention that there are three other variables that help in the process of providing value to the customers. The factors of people, process and physical evidence has to be mentioned without ignorance (Bhattacharya, 2006, p. 117). The existence of the three new variables is very important as it helps in a great way in the process of communication of value of the services to the consumers (Zeithaml & et.al, 2011, p. 21). It can be said that for the purpose of providing high level of services to generate superior customer satisfaction and hence increase profitability of the serv ices, analysing of the services in regards to its ability to meet customers’ expectations is very important on a regular basis. For developing a successful analysis of the services, the GAP model can be used (Lamb, Hair and McDaniel, 2008, p. 354). It is important to mention in these regards that the GAP model of service quality tries to analyze the service offerings of any enterprise from the perspectives of both the customer as well as the service provider. Source: Lamb, Hair and McDaniel, 2008, p. 354 Overview of customer relationship marketing (CRM) It has to be said that in the case of customer relationship marketing, it belongs to the division of marketing of services. 
In the context of marketing products as well as services, it has often been observed that retaining customers increases the profitability of the organization at a comparatively lower cost than aggressively acquiring new customers on a regular basis


Tuesday, October 15, 2019

Researched on magazines Essay Example for Free

Researched on magazines Essay My magazine is called Flava and it is aimed at teenagers, as when I researched on magazines I found this one was quite popular. I spent 4 weeks on my magazine and put a lot of effort into it. Before I started to produce my magazine I planned out how I was going to set it out and what type of things I was going to include in it. I chose the above features, as they are the basic things included in a teenage magazine. I used Microsoft Publisher for the majority of it but I also used Microsoft word for things such as my real life stories. I found Publisher better because you get a wide variety of different backgrounds and formats whereas in Microsoft Word it is more basic. I used a number of different formats and fonts. I did a lot of research on the Internet using Yahoo and Google. I worked with another pupil in my class, Khiley Williams, and we both came up with our own ideas. The pages I produced was the, Dear Angel problem page, the album review of Christina Aguileras Stripped, dish of the day page, the real life story of How I coped with Anorexia, the front cover, the celebrity page (all the celebs dressed in black), and the whats hot and whats not page. The page which required the most research was the real life story but I also put a lot of effort into the front cover as I wanted to make it eye catching and interesting. I used Christina Aguilera on my front cover as she is hot and sexy and catches peoples eye as they look at the magazine. She is also a role model for a lot of young people so they would want to read anything that they see her on. The front of my magazine is bright pink as this also helps to draw peoples attention to it and would hopefully be intrigued as to what is inside it once they start looking over the cover. I used the band Busteds logo on my front cover as well as there is a feature on them inside the magazine. To get the logo I went onto the Official Busted website, www. busted. com, and had to cut, copy and paste it onto publisher. I then had to fill in its original red background with pink to match the background of my front cover. Also on the front cover, I have included the price, a barcode and a logo, Girls with taste get Flava. For my barcode I used the search engine Google and typed in barcodes. I found one and cut copied and pasted again.

Monday, October 14, 2019

Data Pre-processing Tool

Data Pre-processing Tool Chapter- 2 Real life data rarely comply with the necessities of various data mining tools. It is usually inconsistent and noisy. It may contain redundant attributes, unsuitable formats etc. Hence data has to be prepared vigilantly before the data mining actually starts. It is well known fact that success of a data mining algorithm is very much dependent on the quality of data processing. Data processing is one of the most important tasks in data mining. In this context it is natural that data pre-processing is a complicated task involving large data sets. Sometimes data pre-processing take more than 50% of the total time spent in solving the data mining problem. It is crucial for data miners to choose efficient data preprocessing technique for specific data set which can not only save processing time but also retain the quality of the data for data mining process. A data pre-processing tool should help miners with many data mining activates. For example, data may be provided in different formats as discussed in previous chapter (flat files, database files etc). Data files may also have different formats of values, calculation of derived attributes, data filters, joined data sets etc. Data mining process generally starts with understanding of data. In this stage pre-processing tools may help with data exploration and data discovery tasks. Data processing includes lots of tedious works, Data pre-processing generally consists of Data Cleaning Data Integration Data Transformation And Data Reduction. In this chapter we will study all these data pre-processing activities. 2.1 Data Understanding In Data understanding phase the first task is to collect initial data and then proceed with activities in order to get well known with data, to discover data quality problems, to discover first insight into the data or to identify interesting subset to form hypothesis for hidden information. The data understanding phase according to CRISP model can be shown in following . 2.1.1 Collect Initial Data The initial collection of data includes loading of data if required for data understanding. For instance, if specific tool is applied for data understanding, it makes great sense to load your data into this tool. This attempt possibly leads to initial data preparation steps. However if data is obtained from multiple data sources then integration is an additional issue. 2.1.2 Describe data Here the gross or surface properties of the gathered data are examined. 2.1.3 Explore data This task is required to handle the data mining questions, which may be addressed using querying, visualization and reporting. These include: Sharing of key attributes, for instance the goal attribute of a prediction task Relations between pairs or small numbers of attributes Results of simple aggregations Properties of important sub-populations Simple statistical analyses. 2.1.4 Verify data quality In this step quality of data is examined. It answers questions such as: Is the data complete (does it cover all the cases required)? Is it accurate or does it contains errors and if there are errors how common are they? Are there missing values in the data? If so how are they represented, where do they occur and how common are they? 2.2 Data Preprocessing Data preprocessing phase focus on the pre-processing steps that produce the data to be mined. Data preparation or preprocessing is one most important step in data mining. 
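Before moving on, the describe, explore and verify-quality tasks of the data understanding phase above can be made concrete with a minimal sketch in Python using the pandas library; the file name and the column names (income, gender, age) are purely illustrative assumptions and not part of the CRISP model itself.

import pandas as pd

# Hypothetical customer file; the name and columns are assumptions for illustration.
df = pd.read_csv("customers.csv")

# Describe data: gross or surface properties (size, types, basic statistics).
print(df.shape)
print(df.dtypes)
print(df.describe(include="all"))

# Explore data: simple aggregations and relations between pairs of attributes.
print(df["income"].mean())
print(df.groupby("gender")["income"].mean())

# Verify data quality: completeness (missing values) and obviously impossible values.
print(df.isnull().sum())      # how common are missing values, per attribute
print(df[df["age"] < 0])      # impossible values flagged for review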
Industrial practice indicates that one data is well prepared; the mined results are much more accurate. This means this step is also a very critical fro success of data mining method. Among others, data preparation mainly involves data cleaning, data integration, data transformation, and reduction. 2.2.1 Data Cleaning Data cleaning is also known as data cleansing or scrubbing. It deals with detecting and removing inconsistencies and errors from data in order to get better quality data. While using a single data source such as flat files or databases data quality problems arises due to misspellings while data entry, missing information or other invalid data. While the data is taken from the integration of multiple data sources such as data warehouses, federated database systems or global web-based information systems, the requirement for data cleaning increases significantly. This is because the multiple sources may contain redundant data in different formats. Consolidation of different data formats abs elimination of redundant information becomes necessary in order to provide access to accurate and consistent data. Good quality data requires passing a set of quality criteria. Those criteria include: Accuracy: Accuracy is an aggregated value over the criteria of integrity, consistency and density. Integrity: Integrity is an aggregated value over the criteria of completeness and validity. Completeness: completeness is achieved by correcting data containing anomalies. Validity: Validity is approximated by the amount of data satisfying integrity constraints. Consistency: consistency concerns contradictions and syntactical anomalies in data. Uniformity: it is directly related to irregularities in data. Density: The density is the quotient of missing values in the data and the number of total values ought to be known. Uniqueness: uniqueness is related to the number of duplicates present in the data. 2.2.1.1 Terms Related to Data Cleaning Data cleaning: data cleaning is the process of detecting, diagnosing, and editing damaged data. Data editing: data editing means changing the value of data which are incorrect. Data flow: data flow is defined as passing of recorded information through succeeding information carriers. Inliers: Inliers are data values falling inside the projected range. Outlier: outliers are data value falling outside the projected range. Robust estimation: evaluation of statistical parameters, using methods that are less responsive to the effect of outliers than more conventional methods are called robust method. 2.2.1.2 Definition: Data Cleaning Data cleaning is a process used to identify imprecise, incomplete, or irrational data and then improving the quality through correction of detected errors and omissions. This process may include format checks Completeness checks Reasonableness checks Limit checks Review of the data to identify outliers or other errors Assessment of data by subject area experts (e.g. taxonomic specialists). By this process suspected records are flagged, documented and checked subsequently. And finally these suspected records can be corrected. Sometimes validation checks also involve checking for compliance against applicable standards, rules, and conventions. The general framework for data cleaning given as: Define and determine error types; Search and identify error instances; Correct the errors; Document error instances and error types; and Modify data entry procedures to reduce future errors. 
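As a rough illustration of the density, completeness and uniqueness criteria listed above, the following sketch computes simple indicators for a pandas DataFrame; the tiny example table and the function name are assumptions made only for demonstration, not a standard measure.

import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Rough indicators for the quality criteria described above."""
    total_cells = df.shape[0] * df.shape[1]
    missing_cells = int(df.isnull().sum().sum())
    return {
        # Density: quotient of missing values and the number of total values.
        "density_of_missing": missing_cells / total_cells if total_cells else 0.0,
        # Completeness: share of cells that actually hold a value.
        "completeness": 1 - missing_cells / total_cells if total_cells else 0.0,
        # Uniqueness: related to the number of duplicate records present.
        "duplicate_rows": int(df.duplicated().sum()),
    }

df = pd.DataFrame({"name": ["Ann", "Bob", "Bob"], "age": [34, None, None]})
print(quality_report(df))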
Data cleaning process is referred by different people by a number of terms. It is a matter of preference what one uses. These terms include: Error Checking, Error Detection, Data Validation, Data Cleaning, Data Cleansing, Data Scrubbing and Error Correction. We use Data Cleaning to encompass three sub-processes, viz. Data checking and error detection; Data validation; and Error correction. A fourth improvement of the error prevention processes could perhaps be added. 2.2.1.3 Problems with Data Here we just note some key problems with data Missing data : This problem occur because of two main reasons Data are absent in source where it is expected to be present. Some times data is present are not available in appropriately form Detecting missing data is usually straightforward and simpler. Erroneous data: This problem occurs when a wrong value is recorded for a real world value. Detection of erroneous data can be quite difficult. (For instance the incorrect spelling of a name) Duplicated data : This problem occur because of two reasons Repeated entry of same real world entity with some different values Some times a real world entity may have different identifications. Repeat records are regular and frequently easy to detect. The different identification of the same real world entities can be a very hard problem to identify and solve. Heterogeneities: When data from different sources are brought together in one analysis problem heterogeneity may occur. Heterogeneity could be Structural heterogeneity arises when the data structures reflect different business usage Semantic heterogeneity arises when the meaning of data is different n each system that is being combined Heterogeneities are usually very difficult to resolve since because they usually involve a lot of contextual data that is not well defined as metadata. Information dependencies in the relationship between the different sets of attribute are commonly present. Wrong cleaning mechanisms can further damage the information in the data. Various analysis tools handle these problems in different ways. Commercial offerings are available that assist the cleaning process, but these are often problem specific. Uncertainty in information systems is a well-recognized hard problem. In following a very simple examples of missing and erroneous data is shown Extensive support for data cleaning must be provided by data warehouses. Data warehouses have high probability of â€Å"dirty data† since they load and continuously refresh huge amounts of data from a variety of sources. Since these data warehouses are used for strategic decision making therefore the correctness of their data is important to avoid wrong decisions. The ETL (Extraction, Transformation, and Loading) process for building a data warehouse is illustrated in following . Data transformations are related with schema or data translation and integration, and with filtering and aggregating data to be stored in the data warehouse. All data cleaning is classically performed in a separate data performance area prior to loading the transformed data into the warehouse. A large number of tools of varying functionality are available to support these tasks, but often a significant portion of the cleaning and transformation work has to be done manually or by low-level programs that are difficult to write and maintain. 
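The duplicated-data problem described above, both exact repeats and the harder case of one real-world entity recorded under slightly different values, can be sketched as follows; the names, the city field and the deliberately crude normalisation rule are illustrative assumptions only.

import pandas as pd

df = pd.DataFrame({
    "name": ["John Smith", "J. Smith", "Mary Jones", "John Smith"],
    "city": ["Leeds", "leeds ", "York", "Leeds"],
})

# Exact repeats are easy: duplicated() flags rows identical to an earlier one.
print(df[df.duplicated()])

# The same entity under different representations needs a normalised key first.
def normalise(s: pd.Series) -> pd.Series:
    return s.str.lower().str.strip().str.replace(r"\s+", " ", regex=True)

key = normalise(df["name"].str[0]) + "|" + normalise(df["city"])
print(df[key.duplicated(keep=False)])   # candidate duplicates for manual review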
A data cleaning method should assure following: It should identify and eliminate all major errors and inconsistencies in an individual data sources and also when integrating multiple sources. Data cleaning should be supported by tools to bound manual examination and programming effort and it should be extensible so that can cover additional sources. It should be performed in association with schema related data transformations based on metadata. Data cleaning mapping functions should be specified in a declarative way and be reusable for other data sources. 2.2.1.4 Data Cleaning: Phases 1. Analysis: To identify errors and inconsistencies in the database there is a need of detailed analysis, which involves both manual inspection and automated analysis programs. This reveals where (most of) the problems are present. 2. Defining Transformation and Mapping Rules: After discovering the problems, this phase are related with defining the manner by which we are going to automate the solutions to clean the data. We will find various problems that translate to a list of activities as a result of analysis phase. Example: Remove all entries for J. Smith because they are duplicates of John Smith Find entries with `bule in colour field and change these to `blue. Find all records where the Phone number field does not match the pattern (NNNNN NNNNNN). Further steps for cleaning this data are then applied. Etc †¦ 3. Verification: In this phase we check and assess the transformation plans made in phase- 2. Without this step, we may end up making the data dirtier rather than cleaner. Since data transformation is the main step that actually changes the data itself so there is a need to be sure that the applied transformations will do it correctly. Therefore test and examine the transformation plans very carefully. Example: Let we have a very thick C++ book where it says strict in all the places where it should say struct 4. Transformation: Now if it is sure that cleaning will be done correctly, then apply the transformation verified in last step. For large database, this task is supported by a variety of tools Backflow of Cleaned Data: In a data mining the main objective is to convert and move clean data into target system. This asks for a requirement to purify legacy data. Cleansing can be a complicated process depending on the technique chosen and has to be designed carefully to achieve the objective of removal of dirty data. Some methods to accomplish the task of data cleansing of legacy system include: n Automated data cleansing n Manual data cleansing n The combined cleansing process 2.2.1.5 Missing Values Data cleaning addresses a variety of data quality problems, including noise and outliers, inconsistent data, duplicate data, and missing values. Missing values is one important problem to be addressed. Missing value problem occurs because many tuples may have no record for several attributes. For Example there is a customer sales database consisting of a whole bunch of records (lets say around 100,000) where some of the records have certain fields missing. Lets say customer income in sales data may be missing. Goal here is to find a way to predict what the missing data values should be (so that these can be filled) based on the existing data. 
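Returning to the missing-income example above, the following is a minimal sketch of two of the common fill-in strategies: the overall attribute mean and the mean of samples belonging to the same class. The segment and income columns are assumptions made for illustration.

import pandas as pd

sales = pd.DataFrame({
    "segment": ["luxury", "luxury", "budget", "budget", "budget"],
    "income":  [90000, None, 30000, 28000, None],
})

# Strategy: fill missing values with the attribute mean.
sales["income_mean_fill"] = sales["income"].fillna(sales["income"].mean())

# Strategy: fill with the mean of samples in the same class,
# which is usually more accurate than the global mean.
class_mean = sales.groupby("segment")["income"].transform("mean")
sales["income_class_fill"] = sales["income"].fillna(class_mean)

print(sales)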
Missing data may be due to following reasons Equipment malfunction Inconsistent with other recorded data and thus deleted Data not entered due to misunderstanding Certain data may not be considered important at the time of entry Not register history or changes of the data How to Handle Missing Values? Dealing with missing values is a regular question that has to do with the actual meaning of the data. There are various methods for handling missing entries 1. Ignore the data row. One solution of missing values is to just ignore the entire data row. This is generally done when the class label is not there (here we are assuming that the data mining goal is classification), or many attributes are missing from the row (not just one). But if the percentage of such rows is high we will definitely get a poor performance. 2. Use a global constant to fill in for missing values. We can fill in a global constant for missing values such as unknown, N/A or minus infinity. This is done because at times is just doesnt make sense to try and predict the missing value. For example if in customer sales database if, say, office address is missing for some, filling it in doesnt make much sense. This method is simple but is not full proof. 3. Use attribute mean. Let say if the average income of a a family is X you can use that value to replace missing income values in the customer sales database. 4. Use attribute mean for all samples belonging to the same class. Lets say you have a cars pricing DB that, among other things, classifies cars to Luxury and Low budget and youre dealing with missing values in the cost field. Replacing missing cost of a luxury car with the average cost of all luxury cars is probably more accurate then the value youd get if you factor in the low budget 5. Use data mining algorithm to predict the value. The value can be determined using regression, inference based tools using Bayesian formalism, decision trees, clustering algorithms etc. 2.2.1.6 Noisy Data Noise can be defined as a random error or variance in a measured variable. Due to randomness it is very difficult to follow a strategy for noise removal from the data. Real world data is not always faultless. It can suffer from corruption which may impact the interpretations of the data, models created from the data, and decisions made based on the data. Incorrect attribute values could be present because of following reasons Faulty data collection instruments Data entry problems Duplicate records Incomplete data: Inconsistent data Incorrect processing Data transmission problems Technology limitation. Inconsistency in naming convention Outliers How to handle Noisy Data? The methods for removing noise from data are as follows. 1. Binning: this approach first sort data and partition it into (equal-frequency) bins then one can smooth it using- Bin means, smooth using bin median, smooth using bin boundaries, etc. 2. Regression: in this method smoothing is done by fitting the data into regression functions. 3. Clustering: clustering detect and remove outliers from the data. 4. Combined computer and human inspection: in this approach computer detects suspicious values which are then checked by human experts (e.g., this approach deal with possible outliers).. Following methods are explained in detail as follows: Binning: Data preparation activity that converts continuous data to discrete data by replacing a value from a continuous range with a bin identifier, where each bin represents a range of values. 
For instance, age can be changed to bins such as 20 or under, 21-40, 41-65 and over 65. Binning methods smooth a sorted data set by consulting values around it. This is therefore called local smoothing. Let consider a binning example Binning Methods n Equal-width (distance) partitioning Divides the range into N intervals of equal size: uniform grid if A and B are the lowest and highest values of the attribute, the width of intervals will be: W = (B-A)/N. The most straightforward, but outliers may dominate presentation Skewed data is not handled well n Equal-depth (frequency) partitioning 1. It divides the range (values of a given attribute) into N intervals, each containing approximately same number of samples (elements) 2. Good data scaling 3. Managing categorical attributes can be tricky. n Smooth by bin means- Each bin value is replaced by the mean of values n Smooth by bin medians- Each bin value is replaced by the median of values n Smooth by bin boundaries Each bin value is replaced by the closest boundary value Example Let Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34 n Partition into equal-frequency (equi-depth) bins: o Bin 1: 4, 8, 9, 15 o Bin 2: 21, 21, 24, 25 o Bin 3: 26, 28, 29, 34 n Smoothing by bin means: o Bin 1: 9, 9, 9, 9 ( for example mean of 4, 8, 9, 15 is 9) o Bin 2: 23, 23, 23, 23 o Bin 3: 29, 29, 29, 29 n Smoothing by bin boundaries: o Bin 1: 4, 4, 4, 15 o Bin 2: 21, 21, 25, 25 o Bin 3: 26, 26, 26, 34 Regression: Regression is a DM technique used to fit an equation to a dataset. The simplest form of regression is linear regression which uses the formula of a straight line (y = b+ wx) and determines the suitable values for b and w to predict the value of y based upon a given value of x. Sophisticated techniques, such as multiple regression, permit the use of more than one input variable and allow for the fitting of more complex models, such as a quadratic equation. Regression is further described in subsequent chapter while discussing predictions. Clustering: clustering is a method of grouping data into different groups , so that data in each group share similar trends and patterns. Clustering constitute a major class of data mining algorithms. These algorithms automatically partitions the data space into set of regions or cluster. The goal of the process is to find all set of similar examples in data, in some optimal fashion. Following shows three clusters. Values that fall outsid e the cluster are outliers. 4. Combined computer and human inspection: These methods find the suspicious values using the computer programs and then they are verified by human experts. By this process all outliers are checked. 2.2.1.7 Data cleaning as a process Data cleaning is the process of Detecting, Diagnosing, and Editing Data. Data cleaning is a three stage method involving repeated cycle of screening, diagnosing, and editing of suspected data abnormalities. Many data errors are detected by the way during study activities. However, it is more efficient to discover inconsistencies by actively searching for them in a planned manner. It is not always right away clear whether a data point is erroneous. Many times it requires careful examination. Likewise, missing values require additional check. Therefore, predefined rules for dealing with errors and true missing and extreme values are part of good practice. One can monitor for suspect features in survey questionnaires, databases, or analysis data. 
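The equal-frequency binning example above can be reproduced with a short plain-Python sketch; no library is assumed, and the expected outputs (which match the figures given in the example) are shown as comments.

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]

# Partition the sorted values into equal-frequency (equi-depth) bins.
n_bins = 3
size = len(prices) // n_bins
bins = [prices[i * size:(i + 1) * size] for i in range(n_bins)]
print(bins)       # [[4, 8, 9, 15], [21, 21, 24, 25], [26, 28, 29, 34]]

# Smoothing by bin means: every value is replaced by its bin's mean.
by_means = [[round(sum(b) / len(b)) for _ in b] for b in bins]
print(by_means)   # [[9, 9, 9, 9], [23, 23, 23, 23], [29, 29, 29, 29]]

# Smoothing by bin boundaries: every value moves to the closest bin boundary.
by_bounds = [[min(b) if v - min(b) <= max(b) - v else max(b) for v in b] for b in bins]
print(by_bounds)  # [[4, 4, 4, 15], [21, 21, 25, 25], [26, 26, 26, 34]]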
In small studies, with the examiner intimately involved at all stages, there may be small or no difference between a database and an analysis dataset. During as well as after treatment, the diagnostic and treatment phases of cleaning need insight into the sources and types of errors at all stages of the study. Data flow concept is therefore crucial in this respect. After measurement the research data go through repeated steps of- entering into information carriers, extracted, and transferred to other carriers, edited, selected, transformed, summarized, and presented. It is essential to understand that errors can occur at any stage of the data flow, including during data cleaning itself. Most of these problems are due to human error. Inaccuracy of a single data point and measurement may be tolerable, and associated to the inherent technological error of the measurement device. Therefore the process of data clenaning mus focus on those errors that are beyond small technical variations and that form a major shift within or beyond the population distribution. In turn, it must be based on understanding of technical errors and expected ranges of normal values. Some errors are worthy of higher priority, but which ones are most significant is highly study-specific. For instance in most medical epidemiological studies, errors that need to be cleaned, at all costs, include missing gender, gender misspecification, birth date or examination date errors, duplications or merging of records, and biologically impossible results. Another example is in nutrition studies, date errors lead to age errors, which in turn lead to errors in weight-for-age scoring and, further, to misclassification of subjects as under- or overweight. Errors of sex and date are particularly important because they contaminate derived variables. Prioritization is essential if the study is under time pressures or if resources for data cleaning are limited. 2.2.2 Data Integration This is a process of taking data from one or more sources and mapping it, field by field, onto a new data structure. Idea is to combine data from multiple sources into a coherent form. Various data mining projects requires data from multiple sources because n Data may be distributed over different databases or data warehouses. (for example an epidemiological study that needs information about hospital admissions and car accidents) n Sometimes data may be required from different geographic distributions, or there may be need for historical data. (e.g. integrate historical data into a new data warehouse) n There may be a necessity of enhancement of data with additional (external) data. (for improving data mining precision) 2.2.2.1 Data Integration Issues There are number of issues in data integrations. Consider two database tables. Imagine two database tables Database Table-1 Database Table-2 In integration of there two tables there are variety of issues involved such as 1. The same attribute may have different names (for example in above tables Name and Given Name are same attributes with different names) 2. An attribute may be derived from another (for example attribute Age is derived from attribute DOB) 3. Attributes might be redundant( For example attribute PID is redundant) 4. Values in attributes might be different (for example for PID 4791 values in second and third field are different in both the tables) 5. 
Duplicate records under different keys( there is a possibility of replication of same record with different key values) Therefore schema integration and object matching can be trickier. Question here is how equivalent entities from different sources are matched? This problem is known as entity identification problem. Conflicts have to be detected and resolved. Integration becomes easier if unique entity keys are available in all the data sets (or tables) to be linked. Metadata can help in schema integration (example of metadata for each attribute includes the name, meaning, data type and range of values permitted for the attribute) 2.2.2.1 Redundancy Redundancy is another important issue in data integration. Two given attribute (such as DOB and age for instance in give table) may be redundant if one is derived form the other attribute or set of attributes. Inconsistencies in attribute or dimension naming can lead to redundancies in the given data sets. Handling Redundant Data We can handle data redundancy problems by following ways n Use correlation analysis n Different coding / representation has to be considered (e.g. metric / imperial measures) n Careful (manual) integration of the data can reduce or prevent redundancies (and inconsistencies) n De-duplication (also called internal data linkage) o If no unique entity keys are available o Analysis of values in attributes to find duplicates n Process redundant and inconsistent data (easy if values are the same) o Delete one of the values o Average values (only for numerical attributes) o Take majority values (if more than 2 duplicates and some values are the same) Correlation analysis is explained in detail here. Correlation analysis (also called Pearsons product moment coefficient): some redundancies can be detected by using correlation analysis. Given two attributes, such analysis can measure how strong one attribute implies another. For numerical attribute we can compute correlation coefficient of two attributes A and B to evaluate the correlation between them. This is given by Where n n is the number of tuples, n and are the respective means of A and B n ÏÆ'A and ÏÆ'B are the respective standard deviation of A and B n ÃŽ £(AB) is the sum of the AB cross-product. a. If -1 b. If rA, B is equal to zero it indicates A and B are independent of each other and there is no correlation between them. c. If rA, B is less than zero then A and B are negatively correlated. , where if value of one attribute increases value of another attribute decreases. This means that one attribute discourages another attribute. It is important to note that correlation does not imply causality. That is, if A and B are correlated, this does not essentially mean that A causes B or that B causes A. for example in analyzing a demographic database, we may find that attribute representing number of accidents and the number of car theft in a region are correlated. This does not mean that one is related to another. Both may be related to third attribute, namely population. For discrete data, a correlation relation between two attributes, can be discovered by a χ ²(chi-square) test. Let A has c distinct values a1,a2,†¦Ã¢â‚¬ ¦ac and B has r different values namely b1,b2,†¦Ã¢â‚¬ ¦br The data tuple described by A and B are shown as contingency table, with c values of A (making up columns) and r values of B( making up rows). Each and every (Ai, Bj) cell in table has. X^2 = sum_{i=1}^{r} sum_{j=1}^{c} {(O_{i,j} E_{i,j})^2 over E_{i,j}} . 
Here O_{i,j} is the observed frequency (i.e. the actual count) of the joint event (A_i, B_j), and E_{i,j} is the expected frequency, which can be computed as

E_{i,j} = \frac{\left(\sum_{k=1}^{c} O_{i,k}\right)\left(\sum_{k=1}^{r} O_{k,j}\right)}{N}

where N is the number of data tuples, \sum_{k} O_{i,k} is the number of tuples having value a_i for A (the row total), and \sum_{k} O_{k,j} is the number of tuples having value b_j for B (the column total). The larger the χ² value, the more likely the variables are related. The cells that contribute the most to the χ² value are those whose actual count is very different from the expected count.

Chi-Square Calculation: An Example
Suppose a group of 1,500 people were surveyed. The gender of each person was noted, and each person was asked whether their preferred type of reading material was fiction or non-fiction. The observed frequency of each joint event is summarized in the following table (numbers in parentheses are the expected frequencies):

             Male        Female        Sum (row)
Fiction      250 (90)    200 (360)     450
Non-fiction  50 (210)    1000 (840)    1,050
Sum (col.)   300         1,200         1,500

The expected frequencies follow from the formula above, e.g. E_{1,1} = count(male) × count(fiction) / N = 300 × 450 / 1500 = 90, and so on. For this table the degrees of freedom are (2-1)(2-1) = 1, since the table is 2×2. For 1 degree of freedom, the χ² value needed to reject the hypothesis of independence at the 0.001 significance level is 10.828 (taken from the table of upper percentage points of the χ² distribution, available in any statistics textbook). The computed statistic here is χ² ≈ 507.9, well above that threshold, so we can reject the hypothesis that gender and preferred reading are independent and conclude that the two attributes are strongly correlated for the given group.

Duplication must also be detected at the tuple level. The use of denormalized tables is a further source of redundancy, and redundancies may in turn lead to data inconsistencies (when some copies are updated but not others).

2.2.2.2 Detection and resolution of data value conflicts
Another significant issue in data integration is the detection and resolution of data value conflicts: for the same real-world entity, attribute values from different sources may differ. Weight, for example, may be stored in metric units in one source and in British imperial units in another. For instance, for a hotel cha
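To make the chi-square example reproducible, the following sketch recomputes the expected frequencies and the statistic directly from the observed counts in the table; it uses plain Python only, and the figures in the comments are the ones derived above.

# Observed counts from the survey table: rows = fiction / non-fiction,
# columns = male / female, N = 1500 people in total.
observed = [[250, 200],
            [50, 1000]]

row_totals = [sum(row) for row in observed]            # [450, 1050]
col_totals = [sum(col) for col in zip(*observed)]      # [300, 1200]
n = sum(row_totals)                                    # 1500

# Expected frequency e_ij = row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]
print(expected)  # [[90.0, 360.0], [210.0, 840.0]]

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
print(round(chi2, 2))  # 507.94, far above 10.828, so independence is rejected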
In this stage pre-processing tools may help with data exploration and data discovery tasks. Data processing includes lots of tedious works, Data pre-processing generally consists of Data Cleaning Data Integration Data Transformation And Data Reduction. In this chapter we will study all these data pre-processing activities. 2.1 Data Understanding In Data understanding phase the first task is to collect initial data and then proceed with activities in order to get well known with data, to discover data quality problems, to discover first insight into the data or to identify interesting subset to form hypothesis for hidden information. The data understanding phase according to CRISP model can be shown in following . 2.1.1 Collect Initial Data The initial collection of data includes loading of data if required for data understanding. For instance, if specific tool is applied for data understanding, it makes great sense to load your data into this tool. This attempt possibly leads to initial data preparation steps. However if data is obtained from multiple data sources then integration is an additional issue. 2.1.2 Describe data Here the gross or surface properties of the gathered data are examined. 2.1.3 Explore data This task is required to handle the data mining questions, which may be addressed using querying, visualization and reporting. These include: Sharing of key attributes, for instance the goal attribute of a prediction task Relations between pairs or small numbers of attributes Results of simple aggregations Properties of important sub-populations Simple statistical analyses. 2.1.4 Verify data quality In this step quality of data is examined. It answers questions such as: Is the data complete (does it cover all the cases required)? Is it accurate or does it contains errors and if there are errors how common are they? Are there missing values in the data? If so how are they represented, where do they occur and how common are they? 2.2 Data Preprocessing Data preprocessing phase focus on the pre-processing steps that produce the data to be mined. Data preparation or preprocessing is one most important step in data mining. Industrial practice indicates that one data is well prepared; the mined results are much more accurate. This means this step is also a very critical fro success of data mining method. Among others, data preparation mainly involves data cleaning, data integration, data transformation, and reduction. 2.2.1 Data Cleaning Data cleaning is also known as data cleansing or scrubbing. It deals with detecting and removing inconsistencies and errors from data in order to get better quality data. While using a single data source such as flat files or databases data quality problems arises due to misspellings while data entry, missing information or other invalid data. While the data is taken from the integration of multiple data sources such as data warehouses, federated database systems or global web-based information systems, the requirement for data cleaning increases significantly. This is because the multiple sources may contain redundant data in different formats. Consolidation of different data formats abs elimination of redundant information becomes necessary in order to provide access to accurate and consistent data. Good quality data requires passing a set of quality criteria. Those criteria include: Accuracy: Accuracy is an aggregated value over the criteria of integrity, consistency and density. 
Integrity: Integrity is an aggregated value over the criteria of completeness and validity. Completeness: completeness is achieved by correcting data containing anomalies. Validity: Validity is approximated by the amount of data satisfying integrity constraints. Consistency: consistency concerns contradictions and syntactical anomalies in data. Uniformity: it is directly related to irregularities in data. Density: The density is the quotient of missing values in the data and the number of total values ought to be known. Uniqueness: uniqueness is related to the number of duplicates present in the data. 2.2.1.1 Terms Related to Data Cleaning Data cleaning: data cleaning is the process of detecting, diagnosing, and editing damaged data. Data editing: data editing means changing the value of data which are incorrect. Data flow: data flow is defined as passing of recorded information through succeeding information carriers. Inliers: Inliers are data values falling inside the projected range. Outlier: outliers are data value falling outside the projected range. Robust estimation: evaluation of statistical parameters, using methods that are less responsive to the effect of outliers than more conventional methods are called robust method. 2.2.1.2 Definition: Data Cleaning Data cleaning is a process used to identify imprecise, incomplete, or irrational data and then improving the quality through correction of detected errors and omissions. This process may include format checks Completeness checks Reasonableness checks Limit checks Review of the data to identify outliers or other errors Assessment of data by subject area experts (e.g. taxonomic specialists). By this process suspected records are flagged, documented and checked subsequently. And finally these suspected records can be corrected. Sometimes validation checks also involve checking for compliance against applicable standards, rules, and conventions. The general framework for data cleaning given as: Define and determine error types; Search and identify error instances; Correct the errors; Document error instances and error types; and Modify data entry procedures to reduce future errors. Data cleaning process is referred by different people by a number of terms. It is a matter of preference what one uses. These terms include: Error Checking, Error Detection, Data Validation, Data Cleaning, Data Cleansing, Data Scrubbing and Error Correction. We use Data Cleaning to encompass three sub-processes, viz. Data checking and error detection; Data validation; and Error correction. A fourth improvement of the error prevention processes could perhaps be added. 2.2.1.3 Problems with Data Here we just note some key problems with data Missing data : This problem occur because of two main reasons Data are absent in source where it is expected to be present. Some times data is present are not available in appropriately form Detecting missing data is usually straightforward and simpler. Erroneous data: This problem occurs when a wrong value is recorded for a real world value. Detection of erroneous data can be quite difficult. (For instance the incorrect spelling of a name) Duplicated data : This problem occur because of two reasons Repeated entry of same real world entity with some different values Some times a real world entity may have different identifications. Repeat records are regular and frequently easy to detect. The different identification of the same real world entities can be a very hard problem to identify and solve. 
Heterogeneities: When data from different sources are brought together in one analysis problem heterogeneity may occur. Heterogeneity could be Structural heterogeneity arises when the data structures reflect different business usage Semantic heterogeneity arises when the meaning of data is different n each system that is being combined Heterogeneities are usually very difficult to resolve since because they usually involve a lot of contextual data that is not well defined as metadata. Information dependencies in the relationship between the different sets of attribute are commonly present. Wrong cleaning mechanisms can further damage the information in the data. Various analysis tools handle these problems in different ways. Commercial offerings are available that assist the cleaning process, but these are often problem specific. Uncertainty in information systems is a well-recognized hard problem. In following a very simple examples of missing and erroneous data is shown Extensive support for data cleaning must be provided by data warehouses. Data warehouses have high probability of â€Å"dirty data† since they load and continuously refresh huge amounts of data from a variety of sources. Since these data warehouses are used for strategic decision making therefore the correctness of their data is important to avoid wrong decisions. The ETL (Extraction, Transformation, and Loading) process for building a data warehouse is illustrated in following . Data transformations are related with schema or data translation and integration, and with filtering and aggregating data to be stored in the data warehouse. All data cleaning is classically performed in a separate data performance area prior to loading the transformed data into the warehouse. A large number of tools of varying functionality are available to support these tasks, but often a significant portion of the cleaning and transformation work has to be done manually or by low-level programs that are difficult to write and maintain. A data cleaning method should assure following: It should identify and eliminate all major errors and inconsistencies in an individual data sources and also when integrating multiple sources. Data cleaning should be supported by tools to bound manual examination and programming effort and it should be extensible so that can cover additional sources. It should be performed in association with schema related data transformations based on metadata. Data cleaning mapping functions should be specified in a declarative way and be reusable for other data sources. 2.2.1.4 Data Cleaning: Phases 1. Analysis: To identify errors and inconsistencies in the database there is a need of detailed analysis, which involves both manual inspection and automated analysis programs. This reveals where (most of) the problems are present. 2. Defining Transformation and Mapping Rules: After discovering the problems, this phase are related with defining the manner by which we are going to automate the solutions to clean the data. We will find various problems that translate to a list of activities as a result of analysis phase. Example: Remove all entries for J. Smith because they are duplicates of John Smith Find entries with `bule in colour field and change these to `blue. Find all records where the Phone number field does not match the pattern (NNNNN NNNNNN). Further steps for cleaning this data are then applied. Etc †¦ 3. Verification: In this phase we check and assess the transformation plans made in phase- 2. 
Without this step, we may end up making the data dirtier rather than cleaner. Since data transformation is the main step that actually changes the data itself so there is a need to be sure that the applied transformations will do it correctly. Therefore test and examine the transformation plans very carefully. Example: Let we have a very thick C++ book where it says strict in all the places where it should say struct 4. Transformation: Now if it is sure that cleaning will be done correctly, then apply the transformation verified in last step. For large database, this task is supported by a variety of tools Backflow of Cleaned Data: In a data mining the main objective is to convert and move clean data into target system. This asks for a requirement to purify legacy data. Cleansing can be a complicated process depending on the technique chosen and has to be designed carefully to achieve the objective of removal of dirty data. Some methods to accomplish the task of data cleansing of legacy system include: n Automated data cleansing n Manual data cleansing n The combined cleansing process 2.2.1.5 Missing Values Data cleaning addresses a variety of data quality problems, including noise and outliers, inconsistent data, duplicate data, and missing values. Missing values is one important problem to be addressed. Missing value problem occurs because many tuples may have no record for several attributes. For Example there is a customer sales database consisting of a whole bunch of records (lets say around 100,000) where some of the records have certain fields missing. Lets say customer income in sales data may be missing. Goal here is to find a way to predict what the missing data values should be (so that these can be filled) based on the existing data. Missing data may be due to following reasons Equipment malfunction Inconsistent with other recorded data and thus deleted Data not entered due to misunderstanding Certain data may not be considered important at the time of entry Not register history or changes of the data How to Handle Missing Values? Dealing with missing values is a regular question that has to do with the actual meaning of the data. There are various methods for handling missing entries 1. Ignore the data row. One solution of missing values is to just ignore the entire data row. This is generally done when the class label is not there (here we are assuming that the data mining goal is classification), or many attributes are missing from the row (not just one). But if the percentage of such rows is high we will definitely get a poor performance. 2. Use a global constant to fill in for missing values. We can fill in a global constant for missing values such as unknown, N/A or minus infinity. This is done because at times is just doesnt make sense to try and predict the missing value. For example if in customer sales database if, say, office address is missing for some, filling it in doesnt make much sense. This method is simple but is not full proof. 3. Use attribute mean. Let say if the average income of a a family is X you can use that value to replace missing income values in the customer sales database. 4. Use attribute mean for all samples belonging to the same class. Lets say you have a cars pricing DB that, among other things, classifies cars to Luxury and Low budget and youre dealing with missing values in the cost field. 
2.2.1.6 Noisy Data

Noise can be defined as a random error or variance in a measured variable. Because of this randomness it is very difficult to follow a single strategy for removing noise from data. Real-world data is not always faultless; it can suffer from corruption that may affect the interpretation of the data, the models created from the data, and the decisions made based on the data. Incorrect attribute values may be present for the following reasons:
- faulty data collection instruments
- data entry problems
- duplicate records
- incomplete data
- inconsistent data
- incorrect processing
- data transmission problems
- technology limitations
- inconsistency in naming conventions
- outliers

How to Handle Noisy Data?

The methods for removing noise from data are as follows:
1. Binning: this approach first sorts the data and partitions it into (equal-frequency) bins; each bin is then smoothed using the bin mean, the bin median, or the bin boundaries.
2. Regression: smoothing is done by fitting the data to regression functions.
3. Clustering: clustering detects and removes outliers from the data.
4. Combined computer and human inspection: the computer detects suspicious values, which are then checked by human experts (this approach can, for example, deal with possible outliers).

These methods are explained in more detail below.

Binning: Binning is a data preparation activity that converts continuous data to discrete data by replacing a value from a continuous range with a bin identifier, where each bin represents a range of values. For instance, age can be mapped to bins such as "20 or under", "21-40", "41-65" and "over 65". Binning methods smooth a sorted data set by consulting the values around each value; this is therefore called local smoothing. A binning example is given after the list of binning methods.

Binning Methods
- Equal-width (distance) partitioning: divides the range into N intervals of equal size (a uniform grid). If A and B are the lowest and highest values of the attribute, the width of the intervals is W = (B - A)/N. This is the most straightforward method, but outliers may dominate the result and skewed data is not handled well.
- Equal-depth (frequency) partitioning: divides the range of values of a given attribute into N intervals, each containing approximately the same number of samples (elements). It gives good data scaling, but managing categorical attributes can be tricky.
- Smoothing by bin means: each value in a bin is replaced by the mean of the bin's values.
- Smoothing by bin medians: each value in a bin is replaced by the median of the bin's values.
- Smoothing by bin boundaries: each value in a bin is replaced by the closest boundary value of the bin.

Example
Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
- Partition into equal-frequency (equi-depth) bins:
  Bin 1: 4, 8, 9, 15
  Bin 2: 21, 21, 24, 25
  Bin 3: 26, 28, 29, 34
- Smoothing by bin means:
  Bin 1: 9, 9, 9, 9 (for example, the mean of 4, 8, 9, 15 is 9)
  Bin 2: 23, 23, 23, 23
  Bin 3: 29, 29, 29, 29
- Smoothing by bin boundaries:
  Bin 1: 4, 4, 4, 15
  Bin 2: 21, 21, 25, 25
  Bin 3: 26, 26, 26, 34
A short code sketch of this binning example is given below.
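A minimal sketch (not from the source text) of equal-frequency binning and the two smoothing schemes shown above, applied to the sorted price list. It assumes the number of values divides evenly into the number of bins.

```python
prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
n_bins = 3
size = len(prices) // n_bins  # assumes an even split into equal-frequency bins

bins = [prices[i * size:(i + 1) * size] for i in range(n_bins)]

# Smoothing by bin means: every value in a bin becomes the (rounded) bin mean.
by_means = [[round(sum(b) / len(b))] * len(b) for b in bins]

# Smoothing by bin boundaries: every value becomes the closer of the bin's min/max.
by_boundaries = [
    [min(b) if v - min(b) <= max(b) - v else max(b) for v in b]
    for b in bins
]

print(bins)           # [[4, 8, 9, 15], [21, 21, 24, 25], [26, 28, 29, 34]]
print(by_means)       # [[9, 9, 9, 9], [23, 23, 23, 23], [29, 29, 29, 29]]
print(by_boundaries)  # [[4, 4, 4, 15], [21, 21, 25, 25], [26, 26, 26, 34]]
```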
Regression: Regression is a data mining technique used to fit an equation to a data set. The simplest form of regression is linear regression, which uses the formula of a straight line (y = b + wx) and determines suitable values for b and w so that y can be predicted from a given value of x. More sophisticated techniques, such as multiple regression, permit the use of more than one input variable and allow more complex models to be fitted, such as a quadratic equation. Regression is described further in a subsequent chapter in the discussion of prediction.

Clustering: Clustering is a method of grouping data into different groups so that the data in each group share similar trends and patterns. Clustering algorithms constitute a major class of data mining algorithms. These algorithms automatically partition the data space into a set of regions or clusters; the goal of the process is to find all sets of similar examples in the data in some optimal fashion. In a typical illustration the data falls into a few clusters, and the values that fall outside any cluster are the outliers.

Combined computer and human inspection: these methods find suspicious values using computer programs and then have them verified by human experts. In this way all outliers are checked.

2.2.1.7 Data cleaning as a process

Data cleaning is the process of detecting, diagnosing, and editing faulty data. It is a three-stage method involving repeated cycles of screening, diagnosing, and editing suspected data abnormalities. Many data errors are detected incidentally during study activities; however, it is more efficient to discover inconsistencies by actively searching for them in a planned manner. It is not always immediately clear whether a data point is erroneous, and many cases require careful examination. Likewise, missing values require additional checks. Predefined rules for dealing with errors and with true missing and extreme values are therefore part of good practice. One can monitor for suspect features in survey questionnaires, databases, or analysis data sets. In small studies, with the examiner intimately involved at all stages, there may be little or no difference between a database and an analysis data set.

During as well as after treatment, the diagnostic and treatment phases of cleaning need insight into the sources and types of errors at all stages of the study, so the concept of the data flow is crucial in this respect. After measurement, the research data go through repeated steps: they are entered onto information carriers, extracted and transferred to other carriers, edited, selected, transformed, summarized, and presented. It is essential to understand that errors can occur at any stage of this data flow, including during data cleaning itself. Most of these problems are due to human error.

Inaccuracy of a single data point or measurement may be tolerable and related to the inherent technical error of the measurement device. The process of data cleaning must therefore focus on those errors that are beyond small technical variations and that form a major shift within or beyond the population distribution. In turn, it must be based on an understanding of technical errors and the expected ranges of normal values.

Some errors deserve higher priority, but which ones are most significant is highly study-specific. For instance, in most medical epidemiological studies, errors that need to be cleaned at all costs include missing gender, gender misspecification, birth date or examination date errors, duplication or merging of records, and biologically impossible results. Another example comes from nutrition studies, where date errors lead to age errors, which in turn lead to errors in weight-for-age scoring and, further, to misclassification of subjects as under- or overweight. Errors of sex and date are particularly important because they contaminate derived variables. Prioritization is essential if the study is under time pressure or if resources for data cleaning are limited. A small sketch of such predefined screening rules is given below.
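A minimal sketch (not from the source text) of the screening step described above: predefined rules that flag suspect records for later diagnosis and editing. The field names, value ranges and sample records are illustrative assumptions only.

```python
from datetime import date

def screen(record):
    """Return a list of reasons why this record looks suspect (empty if none)."""
    problems = []

    # Missing or invalid gender is treated as a must-fix error.
    if record.get("gender") not in {"M", "F"}:
        problems.append("missing or invalid gender")

    # Date errors: an examination cannot precede the birth date.
    birth, exam = record.get("birth_date"), record.get("exam_date")
    if birth and exam and exam < birth:
        problems.append("examination date before birth date")

    # Biologically impossible results (assumed plausible range for body weight).
    weight = record.get("weight_kg")
    if weight is not None and not (0.5 <= weight <= 400):
        problems.append("biologically implausible weight")

    return problems

records = [
    {"id": 1, "gender": "F", "birth_date": date(2010, 5, 1),
     "exam_date": date(2009, 4, 1), "weight_kg": 32.0},
    {"id": 2, "gender": None, "birth_date": date(1985, 2, 3),
     "exam_date": date(2019, 6, 7), "weight_kg": 72.5},
]

for r in records:
    issues = screen(r)
    if issues:
        print(r["id"], issues)
```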
2.2.2 Data Integration

Data integration is the process of taking data from one or more sources and mapping it, field by field, onto a new data structure. The idea is to combine data from multiple sources into a coherent form. Many data mining projects require data from multiple sources because:
- the data may be distributed over different databases or data warehouses (for example, an epidemiological study that needs information about both hospital admissions and car accidents);
- data may be required from different geographic locations, or historical data may be needed (e.g. integrating historical data into a new data warehouse);
- the data may need to be enhanced with additional (external) data (to improve data mining precision).

2.2.2.1 Data Integration Issues

There are a number of issues in data integration. Imagine two database tables, Table-1 and Table-2, that are to be combined (the original illustration of the two tables is not reproduced here). In integrating the two tables a variety of issues are involved, such as:
1. The same attribute may have different names (for example, Name and Given Name may be the same attribute under different names).
2. An attribute may be derived from another (for example, the attribute Age may be derived from the attribute DOB).
3. Attributes might be redundant (for example, the attribute PID appears redundantly in both tables).
4. Values of the same attribute might differ (for example, for PID 4791 the values in the second and third fields differ between the two tables).
5. Records might be duplicated under different keys (the same record may be replicated with different key values).

Schema integration and object matching can therefore be tricky. The question is how equivalent entities from different sources are matched; this is known as the entity identification problem. Conflicts have to be detected and resolved. Integration becomes easier if unique entity keys are available in all the data sets (or tables) to be linked. Metadata can help in schema integration (the metadata for each attribute typically includes its name, meaning, data type, and the range of values permitted).

2.2.2.2 Redundancy

Redundancy is another important issue in data integration. Two given attributes (such as DOB and Age in the tables above) may be redundant if one can be derived from the other attribute or from a set of attributes. Inconsistencies in attribute or dimension naming can also lead to redundancies in the given data sets.

Handling Redundant Data

Data redundancy problems can be handled in the following ways (a small integration sketch follows the list):
- Use correlation analysis.
- Consider different codings / representations of the data (e.g. metric vs imperial measures).
- Careful (manual) integration of the data can reduce or prevent redundancies (and inconsistencies).
- De-duplication (also called internal data linkage): if no unique entity keys are available, analyse the values in the attributes to find duplicates.
- Process redundant and inconsistent data (easy if the values are the same): delete one of the values, average the values (only for numerical attributes), or take the majority value (if there are more than two duplicates and some of the values agree).
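A minimal sketch (not from the source text) of the integration issues listed above, using pandas. The example tables in the original text are not reproduced, so the table contents, the column names other than those mentioned in the issues (Name, Given Name, DOB, Age, PID), and the reference date used to derive age are all invented for illustration.

```python
import pandas as pd

t1 = pd.DataFrame({"PID": [4790, 4791],
                   "Name": ["Ann Lee", "Bo Chan"],
                   "DOB": ["1990-01-05", "1985-07-20"]})

t2 = pd.DataFrame({"PID": [4790, 4791],
                   "Given Name": ["Ann Lee", "B. Chan"],
                   "Age": [34, 38]})

# Issue 1: the same attribute under different names -> rename before merging.
t2 = t2.rename(columns={"Given Name": "Name"})

# Entity identification: link records on the shared key PID.
merged = t1.merge(t2, on="PID", suffixes=("_t1", "_t2"))

# Issue 4: detect value conflicts for the same entity.
conflicts = merged[merged["Name_t1"] != merged["Name_t2"]]
print(conflicts[["PID", "Name_t1", "Name_t2"]])

# Issues 2/3: Age is derivable from DOB (reference date assumed), so it is
# redundant once the tables are integrated.
merged["Age_from_DOB"] = (
    (pd.Timestamp("2024-01-01") - pd.to_datetime(merged["DOB"])).dt.days // 365
)
print(merged[["PID", "Age", "Age_from_DOB"]])
```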
Correlation analysis is explained in more detail here. Correlation analysis (based on Pearson's product-moment coefficient) can detect some redundancies: given two attributes, such analysis measures how strongly one attribute implies the other. For numerical attributes we can compute the correlation coefficient of two attributes A and B to evaluate the correlation between them:

r_{A,B} = \frac{\sum(AB) - n\,\bar{A}\,\bar{B}}{n\,\sigma_A\,\sigma_B}

where
- n is the number of tuples,
- \bar{A} and \bar{B} are the respective means of A and B,
- \sigma_A and \sigma_B are the respective standard deviations of A and B, and
- \sum(AB) is the sum of the AB cross-products.

a. If r_{A,B} is greater than zero, then A and B are positively correlated: as the value of one attribute increases, the value of the other tends to increase as well (note that -1 <= r_{A,B} <= +1).
b. If r_{A,B} is equal to zero, A and B are independent of each other and there is no correlation between them.
c. If r_{A,B} is less than zero, then A and B are negatively correlated: as the value of one attribute increases, the value of the other decreases, i.e. one attribute discourages the other.

It is important to note that correlation does not imply causality. That is, if A and B are correlated, this does not necessarily mean that A causes B or that B causes A. For example, in analysing a demographic database, we may find that the attributes representing the number of accidents and the number of car thefts in a region are correlated. This does not mean that one causes the other; both may be related to a third attribute, namely the population.

For discrete (categorical) data, a correlation relationship between two attributes A and B can be discovered by a χ² (chi-square) test. Suppose A has c distinct values a1, a2, ..., ac and B has r distinct values b1, b2, ..., br. The data tuples described by A and B can be shown as a contingency table, with the c values of A making up the columns and the r values of B making up the rows. Each cell of the table corresponds to a joint event (A = a_i, B = b_j). The χ² value is computed as

\chi^2 = \sum_{i=1}^{c} \sum_{j=1}^{r} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}

where
- o_{ij} is the observed frequency (i.e. the actual count) of the joint event (A = a_i, B = b_j), and
- e_{ij} is the expected frequency, which can be computed as

e_{ij} = \frac{count(A = a_i) \times count(B = b_j)}{N}

where N is the number of data tuples, count(A = a_i) is the number of tuples having value a_i for A, and count(B = b_j) is the number of tuples having value b_j for B.

The larger the χ² value, the more likely the variables are related. The cells that contribute the most to the χ² value are those whose actual count is very different from the expected count.
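A minimal sketch (not from the source text) of the two measures defined above, implemented directly from the formulas. The numerical input for the correlation example is invented; the contingency table passed to the chi-square function reuses the observed counts from the worked example that follows.

```python
import numpy as np

def pearson_r(a, b):
    """r_{A,B} = (sum(AB) - n*mean(A)*mean(B)) / (n * sd(A) * sd(B))."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = len(a)
    return (np.sum(a * b) - n * a.mean() * b.mean()) / (n * a.std() * b.std())

def chi_square(observed):
    """Chi-square statistic for a contingency table of observed counts."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    expected = row_totals * col_totals / observed.sum()  # e_ij = row*col / N
    return np.sum((observed - expected) ** 2 / expected)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))    # 1.0 (perfectly correlated)
print(chi_square([[250, 200], [50, 1000]]))     # ~507.93 for the table in the example below
```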
Chi-Square Calculation: An Example

Suppose a group of 1,500 people was surveyed. The gender of each person was noted, and each person was asked whether their preferred type of reading material was fiction or non-fiction. The observed frequency of each possible joint event is summarized in the following contingency table, where the numbers in parentheses are the expected frequencies. Calculate χ².

                 male        female       Sum (row)
  fiction        250 (90)    200 (360)    450
  non-fiction    50 (210)    1000 (840)   1050
  Sum (col.)     300         1200         1500

For example, the expected frequency for the cell (male, fiction) is

e_{11} = count(male) × count(fiction) / N = 300 × 450 / 1500 = 90,

and so on. For this 2 × 2 table the degrees of freedom are (2 - 1)(2 - 1) = 1. For 1 degree of freedom, the χ² value needed to reject the hypothesis of independence at the 0.001 significance level is 10.828 (taken from the table of upper percentage points of the χ² distribution, available in any statistics textbook). Since the computed value is above this threshold, we can reject the hypothesis that gender and preferred reading are independent and conclude that the two attributes are strongly correlated for the given group of people.

Duplication must also be detected at the tuple level. The use of denormalized tables is another source of redundancy, and redundancies may further lead to data inconsistencies (when some copies of a value are updated but others are not).

2.2.2.3 Detection and resolution of data value conflicts

Another significant issue in data integration is the detection and resolution of data value conflicts: for the same entity, attribute values from different sources may differ. For example, weight may be stored in metric units in one source and in British imperial units in another. For instance, for a hotel cha