{"id":115,"date":"2017-05-18T15:17:12","date_gmt":"2017-05-18T13:17:12","guid":{"rendered":"http:\/\/proekspert-ee.vserver.zonevs.eu\/?p=115"},"modified":"2023-07-19T19:59:16","modified_gmt":"2023-07-19T17:59:16","slug":"use-of-graphics-processing-units-on-the-rise","status":"publish","type":"post","link":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/blog\/bi-data-analytics\/use-of-graphics-processing-units-on-the-rise\/","title":{"rendered":"Use of Graphics Processing Units on the Rise"},"content":{"rendered":"<p>An overview of this year\u2019s <a href=\"http:\/\/www.gputechconf.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">GPU Technology Conference<\/a> (GTC) is about the world of GPU-driven deep learning and real-world applications of AI.<br \/>\n<strong>&#8211; Andr\u00e9 Karpi\u0161t\u0161enko<\/strong><\/p>\n<h3>GPUs are the present in accelerated computing for analytics and engineering.<\/h3>\n<p>Proekspert has been at the cutting edge of smart machines and software for 24 years and is actively investing in data science software and infrastructure. In the spirit of genchi genbutsu (\u201cgo to the source and see it for yourself\u201d), we visited this year\u2019s GTC. Here is a recap of the zeitgeist at the event<\/p>\n<p>While the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Central_processing_unit\" target=\"_blank\" rel=\"noopener noreferrer\">CPU<\/a> outperforms the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Graphics_processing_unit\" target=\"_blank\" rel=\"noopener noreferrer\">GPU<\/a> in latency and energy efficiency, the GPU is the way forward for high-throughput massively parallel computing (growing 1.5 times year over year), matching the pace of data growth and reducing the compute gap of the CPUs. John Hennessy from Stanford University has claimed the start of a new era for computing in 2017. 
The underlying core concept is <a href=\"https:\/\/en.wikipedia.org\/wiki\/CUDA\" target=\"_blank\" rel=\"noopener noreferrer\">CUDA<\/a> (Compute Unified Device Architecture), a decade-old parallel computing platform and programming model suited to accelerating common tensor operations (matrix multiplication and summation), for example in <a href=\"https:\/\/en.wikipedia.org\/wiki\/Deep_learning\" target=\"_blank\" rel=\"noopener noreferrer\">deep learning<\/a>. CUDA 9 adds synchronization across multiple GPUs, enabling computing at any scale and marking a step towards an operating system for accelerated computing. The GPU Open Analytics Initiative is working to move the entire data science stack onto GPUs, with the <a href=\"https:\/\/www.anaconda.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Anaconda<\/a> data science distribution, the <a href=\"https:\/\/www.h2o.ai\/\" target=\"_blank\" rel=\"noopener noreferrer\">H2O<\/a> data science platform and the <a href=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/contact\/\">MapD<\/a> database providing the basis.<\/p>\n<p>One of the fields benefiting most from this trend is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Weak_AI\" target=\"_blank\" rel=\"noopener noreferrer\">narrow AI<\/a> and its applications. At GTC, deep learning and AI made up well over half of the content.<\/p>\n<h3>Deep Learning in GPUs<\/h3>\n<p>The time when more software is written by software than by humans is no longer so distant. 
At the forefront of this direction are the five tribes of machine learning, a subfield of AI: symbolists, Bayesians, analogizers, evolutionaries, and, most prominently, connectionists, whose approach is known in the mainstream as deep learning.<\/p>\n<div id=\"attachment_116\" style=\"width: 1374px\" class=\"wp-caption alignnone\"><img decoding=\"async\" aria-describedby=\"caption-attachment-116\" class=\"size-full wp-image-116 lazyload\" data-src=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2017\/10\/deep-learning-logos.png\" alt=\"\" width=\"1364\" height=\"694\" data-srcset=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2017\/10\/deep-learning-logos.png 1364w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2017\/10\/deep-learning-logos-300x153.png 300w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2017\/10\/deep-learning-logos-1024x521.png 1024w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2017\/10\/deep-learning-logos-768x391.png 768w\" data-sizes=\"(max-width: 1364px) 100vw, 1364px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 1364px; --smush-placeholder-aspect-ratio: 1364\/694;\" \/><p id=\"caption-attachment-116\" class=\"wp-caption-text\"><strong>Deep Learning Frameworks<\/strong> supporting the most advanced GPUs<\/p><\/div>\n<h4>Deep learning frameworks<\/h4>\n<p>For high-end development of deep learning models, numerous frameworks support the most advanced data center GPUs. If you are an engineer making decisions about your technology stack, there is ample choice. 
<a href=\"https:\/\/www.microsoft.com\/en-us\/cognitive-toolkit\/\" target=\"_blank\" rel=\"noopener noreferrer\">Microsoft Cognitive Toolkit<\/a> (CNTK), which focuses on scalability and performance, Facebook\u2019s highly customizable <a href=\"http:\/\/pytorch.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">PyTorch<\/a>, the production-ready <a href=\"https:\/\/research.fb.com\/downloads\/caffe2\/\" target=\"_blank\" rel=\"noopener noreferrer\">Caffe2<\/a>, Google\u2019s popular <a href=\"https:\/\/www.tensorflow.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">TensorFlow<\/a>, academic <a href=\"http:\/\/deeplearning.net\/software\/theano\/\" target=\"_blank\" rel=\"noopener noreferrer\">Theano<\/a>, and the collaborative endeavor <a href=\"https:\/\/mxnet.incubator.apache.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">MXNet<\/a>, provide the basis for adding intelligent features related to computer vision, text, speech, images, videos, time-series and more. Symbolic loops over sequences with dynamic scheduling, turning graphs into parallel programs through mini-batching, reduced communication overhead, are but a few of the exemplary features available at production quality. For example, building a leading image classification ResNet that performs better than humans at a 3.5 percent error rate, is estimated to be a 30-minute task with the new frameworks. Deep learning has turned into a popular choice with its Lego-like building blocks that can be rearranged into specialized network architectures. There are many use cases for the method.<\/p>\n<p>As a specific example of networks inspired by game theory, generative adversarial networks are starting to find new applications. Some examples: for simulating data, working with missing data, realistic generation tasks, image-to-image translation (from day to night, for example), simulation by prediction for particle physics, learning useful embeddings in images, and others. 
Networks are strong at perceiving and learning, but weak at abstracting and reasoning. A new wave of AI for contextual adaptation addresses this by combining statistical learning with handcrafted knowledge. The need for samples is decreasing considerably, both for networks and for the new wave of models. For example, the new models can be trained with tens of labels on a handwritten-digit dataset instead of the previous 60,000.<\/p>\n<h3>Autonomous Vehicles<\/h3>\n<div id=\"attachment_117\" style=\"width: 810px\" class=\"wp-caption alignnone\"><img decoding=\"async\" aria-describedby=\"caption-attachment-117\" class=\"size-full wp-image-117 lazyload\" data-src=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2017\/10\/udacity.jpg\" alt=\"\" width=\"800\" height=\"333\" data-srcset=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2017\/10\/udacity.jpg 800w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2017\/10\/udacity-300x125.jpg 300w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2017\/10\/udacity-768x320.jpg 768w\" data-sizes=\"(max-width: 800px) 100vw, 800px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 800px; --smush-placeholder-aspect-ratio: 800\/333;\" \/><p id=\"caption-attachment-117\" class=\"wp-caption-text\">Udacity is democratizing development skills and knowledge for building autonomous cars<\/p><\/div>\n<p>Narrow AI, not limited to deep learning, is the rising professional application of GPUs. A prominent field here is autonomous cars, where custom L3\/L4 autonomy can be bought without having to build the physical infrastructure. 
<a href=\"http:\/\/www.nvidia.com\/object\/drive-px.html\" target=\"_blank\" rel=\"noopener noreferrer\">Nvidia PX2<\/a>, and modular and scalable <a href=\"https:\/\/developer.nvidia.com\/driveworks\" target=\"_blank\" rel=\"noopener noreferrer\">Driveworks SDKs<\/a>, make advanced tasks like calibration, sensor fusion, free space detection, lane detection, object detection (cars, trucks, traffic signs, cycles, pedestrians, etc.) and localization fast and easy. Developers of autonomous vehicles can focus on their applications instead of the highly complex development of the base components.<\/p>\n<h3>AR\/VR<\/h3>\n<div id=\"attachment_118\" style=\"width: 2010px\" class=\"wp-caption alignnone\"><img decoding=\"async\" aria-describedby=\"caption-attachment-118\" class=\"size-full wp-image-118 lazyload\" data-src=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2017\/10\/ar-vr.jpg\" alt=\"\" width=\"2000\" height=\"1334\" data-srcset=\"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2017\/10\/ar-vr.jpg 1000w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2017\/10\/ar-vr-300x200.jpg 300w, https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-content\/uploads\/2017\/10\/ar-vr-768x512.jpg 768w\" data-sizes=\"(max-width: 2000px) 100vw, 2000px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 2000px; --smush-placeholder-aspect-ratio: 2000\/1334;\" \/><p id=\"caption-attachment-118\" class=\"wp-caption-text\">Virtual Reality and Augmented Reality are finding its first scalable business cases<\/p><\/div>\n<p>Moving closer to the roots of GPUs, namely computer graphics, there was another maturing trend well present at GTC. The devices for AR and VR have matured considerably in the decades since their inception. 
Novel directions like AI in VR are being explored for interactive speech interfaces, visual recognition, data analysis and collaborative sharing. Corporate R&amp;D teams are working on early-stage concepts for metaverse-native generations. A step in this direction is <a href=\"https:\/\/www.theverge.com\/2017\/5\/10\/15613018\/nvidia-vr-project-holodeck-koenigsegg-car-demo\" target=\"_blank\" rel=\"noopener noreferrer\">Nvidia\u2019s Holodeck<\/a>, a photorealistic, collaborative virtual reality environment that conveys the feeling of real-world presence through sight, sound and haptics. The state of the art can handle products as complex as the new electric <a href=\"http:\/\/koenigsegg.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Koenigsegg<\/a> car design. By fitting the entire dataset into the GPU, multi-caching technologies enable interactive slice-and-dice queries and visualizations of fairly large datasets (384 GB as of May 2017) in milliseconds.<\/p>\n<h3>Looking Forward in the Deep Learning Trend of GPUs<\/h3>\n<p>Many industries are affected by the rising trend of GPUs; for example, companies focused on healthcare, materials, agriculture, maritime, retail, the elderly, mapping, localization, self-driving, graphics, analytics, games and music are discovering and inventing new ways of interacting with the new era of abundant computing power. While I\/O is still the bottleneck, we are entering a new era of craftsmanship-focused work at the intersection of art, science and engineering. This is evident from the 11-fold rise in the number of GPU developers over the past five years.<\/p>\n<p>The frontier is about finding better ways to manage the explosion of model and experiment complexity. 
For example, in 2017 <a href=\"https:\/\/en.wikipedia.org\/wiki\/Google_Neural_Machine_Translation\" target=\"_blank\" rel=\"noopener noreferrer\">Google NMT<\/a> runs at 105 <a href=\"https:\/\/en.wikipedia.org\/wiki\/Exascale_computing\" target=\"_blank\" rel=\"noopener noreferrer\">exaFLOPS<\/a> with 8.7B parameters; in 2016, <a href=\"https:\/\/arxiv.org\/pdf\/1512.02595v1.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Baidu Deep Speech 2<\/a> needed 20 exaFLOPS and 300M parameters; and in 2015, <a href=\"https:\/\/arxiv.org\/pdf\/1512.03385v1.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Microsoft ResNet<\/a> required 7 exaFLOPS with 60M parameters. One exaFLOPS is equivalent to running all the supercomputers in the world for one second (as of May 2017). Proekspert is evaluating how this trend is impacting data scientists generally, and what tools a data scientist needs to achieve and maintain high performance and productivity in the new era.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>An overview of this year\u2019s GPU Technology Conference (GTC) is about the world of GPU-driven deep learning and real-world applications of AI. &#8211; Andr\u00e9 Karpi\u0161t\u0161enko GPUs are the present in accelerated computing for analytics and engineering. Proekspert has been at the cutting edge of smart machines and software for 24 years and is actively investing in data science software and infrastructure. In the spirit of genchi genbutsu (\u201cgo to the source and see it for yourself\u201d), we visited this year\u2019s GTC. Here is a recap of the zeitgeist at the event While the CPU outperforms the GPU in latency and energy efficiency, the GPU is the way forward for high-throughput massively parallel computing (growing 1.5 times year over year), matching the pace of data growth and reducing the compute gap of the CPUs. John Hennessy from Stanford University has claimed the start of a new era for computing in 2017. 
The underlying core concept is CUDA (Compute Unified Device Architecture), a decades old parallel computing platform and programming model suitable for accelerating common tensor operations (matrix multiplication and summation), for example in deep learning. With CUDA 9, synchronizing across multiple GPUs enables any scale of computing, a step towards an<\/p>\n","protected":false},"author":10,"featured_media":118,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[3],"tags":[],"class_list":["post-115","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bi-data-analytics"],"acf":[],"_links":{"self":[{"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/posts\/115","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/comments?post=115"}],"version-history":[{"count":3,"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/posts\/115\/revisions"}],"predecessor-version":[{"id":4439,"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/posts\/115\/revisions\/4439"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/media\/118"}],"wp:attachment":[{"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/media?parent=115"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/categories?post=115"},{"t
axonomy":"post_tag","embeddable":true,"href":"https:\/\/clients.triloogia.ee\/proekspert\/wp-new\/wp-json\/wp\/v2\/tags?post=115"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}