<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[adhadse]]></title><description><![CDATA[A Machine Learning Engineer ]]></description><link>https://adhadse.com/</link><image><url>https://adhadse.com/favicon.png</url><title>adhadse</title><link>https://adhadse.com/</link></image><generator>Ghost 5.2</generator><lastBuildDate>Wed, 22 Apr 2026 07:38:12 GMT</lastBuildDate><atom:link href="https://adhadse.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[No webcam? Use your mobile as webcam on Linux.]]></title><description><![CDATA[Shitty webcam? Use your smartphone as a webcam.]]></description><link>https://adhadse.com/no-webcam-use-your-mobile-as-webcam-on-linux/</link><guid isPermaLink="false">6728193017de340307ecdf72</guid><category><![CDATA[Developer]]></category><category><![CDATA[Linux]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Mon, 04 Nov 2024 01:42:15 GMT</pubDate><media:content url="https://adhadse.com/content/images/2024/11/ernest-ojeh-cJGDRjl0TEs-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2024/11/ernest-ojeh-cJGDRjl0TEs-unsplash.jpg" alt="No webcam? Use your mobile as webcam on Linux."><p>This mini-guide is divided into two parts. 
The first part covers the nerdy way I tried first, and the second part a much simpler, more straightforward way.</p><p>The basic idea of using a mobile camera as a webcam on Linux is:</p><ol><li>We get a video stream from the mobile: a server on the smartphone streams the feed.</li><li>The streamed feed is captured by the computer and dumped into a dummy/virtual camera device.</li><li>Other applications can then use the virtual camera as if it were an actual camera.</li></ol><h2 id="part-1-manually-creating-virtual-device-on-linux-and-dumping-a-video-feed">Part 1. Manually creating a virtual device on Linux and dumping a video feed.</h2><p>Step 1, I needed an Android application that can make my smartphone act as an IP webcam. I chose the &quot;IP Webcam&quot; application.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://adhadse.com/content/images/2024/11/image.png" class="kg-image" alt="No webcam? Use your mobile as webcam on Linux." loading="lazy" width="2000" height="1332" srcset="https://adhadse.com/content/images/size/w600/2024/11/image.png 600w, https://adhadse.com/content/images/size/w1000/2024/11/image.png 1000w, https://adhadse.com/content/images/size/w1600/2024/11/image.png 1600w, https://adhadse.com/content/images/2024/11/image.png 2047w" sizes="(min-width: 720px) 720px"><figcaption>IP Webcam on Google Play Store</figcaption></figure><p>Step 2, I created a virtual device on Linux. We&apos;ll install <code><a href="https://github.com/umlaeute/v4l2loopback#run">v4l2loopback</a></code> using the distribution&apos;s package manager. </p><pre><code class="language-bash"># from RPM Fusion
# On Fedora, you&apos;ll only need to install v4l2loopback, not -utils or -dkms
sudo dnf install v4l2loopback</code></pre><p>To create a dummy virtual camera, use this set of commands:</p><pre><code class="language-bash"># remove the module if it is already loaded
sudo rmmod v4l2loopback

# create a virtual camera device at /dev/video10
sudo modprobe v4l2loopback video_nr=10 card_label=&quot;Dummy cam&quot; exclusive_caps=1

# to reload the module later, remove it and add it back (with the options you need)
sudo modprobe -r v4l2loopback
sudo modprobe v4l2loopback exclusive_caps=1

# WARNING: if loading the module fails under Secure Boot, you may need to disable it

# list video-for-linux devices 
v4l2-ctl --list-devices  # this should show you the device &apos;Dummy cam&apos;
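```shell
# a scripting aid, not required: resolve a card label to its /dev/videoN node.
# This is a sketch that assumes the usual `v4l2-ctl --list-devices` layout
# (an unindented header line per card, then indented /dev/videoN lines).
find_cam() {
  awk -v label="$1" '
    /^[^[:space:]]/ { in_block = index($0, label) > 0 }
    in_block && /\/dev\/video/ { gsub(/[[:space:]]/, ""); print; exit }'
}
# usage: v4l2-ctl --list-devices | find_cam "Dummy cam"
```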
</code></pre><p>Step 3, start the video server on the smartphone. In my case it started the server at <code>192.168.1.204:8080</code> (your IP will differ). If you visit this IP address, you&apos;ll see options like this. This application serves an RTSP stream, which we can then dump into our newly created virtual camera on Linux.</p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2024/11/image-1.png" class="kg-image" alt="No webcam? Use your mobile as webcam on Linux." loading="lazy" width="2000" height="1337" srcset="https://adhadse.com/content/images/size/w600/2024/11/image-1.png 600w, https://adhadse.com/content/images/size/w1000/2024/11/image-1.png 1000w, https://adhadse.com/content/images/size/w1600/2024/11/image-1.png 1600w, https://adhadse.com/content/images/2024/11/image-1.png 2047w" sizes="(min-width: 720px) 720px"></figure><p>Step 4, I used <code>ffmpeg</code> to dump the RTSP video stream into the virtual camera like this:</p><pre><code class="language-bash"># use ffmpeg to stream the rtsp stream to your virtual camera device
ffmpeg -fflags nobuffer \
       -flags low_delay \
       -rtsp_transport udp \
       -reorder_queue_size 0 \
       -i rtsp://192.168.1.204:8080/h264.sdp \
       -fps_mode passthrough \
       -max_delay 0 \
       -copytb 0 \
       -copyts \
       -probesize 32 \
       -analyzeduration 0 \
       -buffer_size 8192 \
       -vcodec rawvideo \
       -pix_fmt yuv420p \
       -threads 4 \
       -thread_type frame \
       -f v4l2 \
/dev/video10 	# the virtual camera device number</code></pre><p>Or use this variant, which switches to TCP transport and allows a bit more buffering:</p><pre><code class="language-bash">ffmpeg -fflags nobuffer \
       -flags low_delay \
       -rtsp_transport tcp \
       -reorder_queue_size 0 \
       -use_wallclock_as_timestamps 1 \
       -probesize 4096 \
       -analyzeduration 0 \
       -buffer_size 16384 \
       -max_delay 100000 \
       -i rtsp://192.168.1.204:8080/h264.sdp \
       -fps_mode passthrough \
       -vcodec rawvideo \
       -pix_fmt yuv420p \
       -threads 4 \
       -thread_type frame \
       -f v4l2 \
       /dev/video21
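```shell
# typing that out gets old; a reusable wrapper (a sketch -- the function name,
# default device, and port are my own choices, adjust to your setup)
phone_cam() {
  local ip="${1:?usage: phone_cam <phone-ip> [v4l2-device]}"
  local dev="${2:-/dev/video10}"
  ffmpeg -fflags nobuffer -flags low_delay -rtsp_transport tcp \
         -i "rtsp://${ip}:8080/h264.sdp" \
         -fps_mode passthrough -vcodec rawvideo -pix_fmt yuv420p \
         -f v4l2 "$dev"
}
# usage: phone_cam 192.168.1.204 /dev/video21
```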
</code></pre><h2 id="part-2-the-straight-forward-way">Part 2. The straightforward way.</h2><p>I noticed that the video stream via my nerdy way is a bit choppy and has latency.</p><p>I stumbled upon &quot;DroidCam&quot;. It&apos;s an application similar to the one in part 1, but it differs in how it&apos;s meant to be used. It integrates very easily with OBS. </p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2024/11/image-2.png" class="kg-image" alt="No webcam? Use your mobile as webcam on Linux." loading="lazy" width="2000" height="1329" srcset="https://adhadse.com/content/images/size/w600/2024/11/image-2.png 600w, https://adhadse.com/content/images/size/w1000/2024/11/image-2.png 1000w, https://adhadse.com/content/images/size/w1600/2024/11/image-2.png 1600w, https://adhadse.com/content/images/2024/11/image-2.png 2047w" sizes="(min-width: 720px) 720px"></figure><p>On Linux, OBS (Open Broadcaster Software) is very easy to install and is used by many live streamers and content creators for video-related tasks.</p><p>We can easily obtain OBS on Linux from Flatpak.</p><pre><code class="language-bash">flatpak install flathub com.obsproject.Studio</code></pre><p>Then, to work with &quot;DroidCam&quot;, we&apos;ll need to install a plugin for OBS:</p><pre><code class="language-bash">flatpak install flathub com.obsproject.Studio.Plugin.DroidCam</code></pre><p>We&apos;ll also need to check if the OBS virtual camera is listed in <code>v4l2-ctl --list-devices</code>. If it isn&apos;t, we&apos;ll need to add it again:</p><pre><code class="language-bash"># if you tried method 1, we&apos;ll need to add another virtual device at /dev/video0
sudo rmmod v4l2loopback # stop the module
sudo modprobe v4l2loopback video_nr=10,0 card_label=&quot;Dummy cam&quot;,&quot;OBS Virtual Camera&quot; exclusive_caps=1,1 # re-create both devices
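```shell
# (optional) persist both devices across reboots via the standard
# modules-load.d / modprobe.d convention -- a sketch; the helper name and the
# dest argument (for dry-running outside /etc) are my own additions
persist_v4l2() {
  local dest="${1:-/etc}"
  echo v4l2loopback > "${dest}/modules-load.d/v4l2loopback.conf"
  echo 'options v4l2loopback video_nr=10,0 card_label="Dummy cam","OBS Virtual Camera" exclusive_caps=1,1' > "${dest}/modprobe.d/v4l2loopback.conf"
}
# run as root, e.g.: sudo bash -c "$(declare -f persist_v4l2); persist_v4l2"
```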

v4l2-ctl --list-devices # should show OBS as /dev/video0 and Dummy cam as /dev/video10</code></pre><p>Restart OBS, and you should see the &quot;DroidCam OBS&quot; option in the &quot;Sources&quot; tab.</p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2024/11/image-3.png" class="kg-image" alt="No webcam? Use your mobile as webcam on Linux." loading="lazy" width="1921" height="1049" srcset="https://adhadse.com/content/images/size/w600/2024/11/image-3.png 600w, https://adhadse.com/content/images/size/w1000/2024/11/image-3.png 1000w, https://adhadse.com/content/images/size/w1600/2024/11/image-3.png 1600w, https://adhadse.com/content/images/2024/11/image-3.png 1921w" sizes="(min-width: 720px) 720px"></figure><p>Start the DroidCam application on your smartphone and add &quot;DroidCam OBS&quot; as a source.</p><p>By default OBS should pick up the IP address of the DroidCam, but in case it doesn&apos;t, you can manually enter it when adding it to sources, or edit it afterwards.</p><p>And voilà, you&apos;ve got video streamed from your smartphone into OBS.</p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2024/11/image-4.png" class="kg-image" alt="No webcam? Use your mobile as webcam on Linux." loading="lazy" width="1921" height="1049" srcset="https://adhadse.com/content/images/size/w600/2024/11/image-4.png 600w, https://adhadse.com/content/images/size/w1000/2024/11/image-4.png 1000w, https://adhadse.com/content/images/size/w1600/2024/11/image-4.png 1600w, https://adhadse.com/content/images/2024/11/image-4.png 1921w" sizes="(min-width: 720px) 720px"></figure><p>One last step in this procedure is to start the virtual camera by enabling &quot;Start Virtual Camera&quot; in the &quot;Controls&quot; panel, so that other applications can pick up this stream as a camera device. </p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2024/11/image-5.png" class="kg-image" alt="No webcam? 
Use your mobile as webcam on Linux." loading="lazy" width="1921" height="1049" srcset="https://adhadse.com/content/images/size/w600/2024/11/image-5.png 600w, https://adhadse.com/content/images/size/w1000/2024/11/image-5.png 1000w, https://adhadse.com/content/images/size/w1600/2024/11/image-5.png 1600w, https://adhadse.com/content/images/2024/11/image-5.png 1921w" sizes="(min-width: 720px) 720px"></figure><p>Start your browser and run a webcam test to check that it&apos;s working. On Linux, I&apos;ve noticed Firefox picks it up very easily, but for Chromium-based browsers you may need to disable hardware-accelerated rendering.</p><p>That&apos;s it for today. This is Anurag Dhadse, signing off.<br></p><h3 id="edit-0-may-12-2025">Edit 0. May 12, 2025</h3><p>DroidCam suffers from latency when connected via WiFi, especially with a high-quality video source. </p><p>Instead, I would recommend connecting via USB; allow the popups like these that appear on your Android device to enable USB debugging. </p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2025/05/Screenshot_20250512_114030_One-UI-Home.png" class="kg-image" alt="No webcam? Use your mobile as webcam on Linux." loading="lazy" width="1080" height="880" srcset="https://adhadse.com/content/images/size/w600/2025/05/Screenshot_20250512_114030_One-UI-Home.png 600w, https://adhadse.com/content/images/size/w1000/2025/05/Screenshot_20250512_114030_One-UI-Home.png 1000w, https://adhadse.com/content/images/2025/05/Screenshot_20250512_114030_One-UI-Home.png 1080w" sizes="(min-width: 720px) 720px"></figure><p>Open DroidCam, then on your computer, start a terminal and type this command to forward the port from your phone to the PC:</p><pre><code class="language-bash">adb forward tcp:4747 tcp:4747
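```shell
# (optional) wait until the forwarded port accepts connections before pointing
# OBS at it -- a sketch using bash's /dev/tcp; host, port, and retry count are
# assumptions, adjust as needed
wait_for_port() {
  local host="${1:-127.0.0.1}" port="${2:-4747}" tries="${3:-20}"
  while [ "$tries" -gt 0 ]; do
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null && return 0
    tries=$((tries - 1))
    sleep 0.5
  done
  return 1
}
# usage: adb forward tcp:4747 tcp:4747 && wait_for_port 127.0.0.1 4747
```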
</code></pre><p>It should output the port number that you forwarded.</p><p>Once that&apos;s done, head back to OBS, add a &quot;Browser&quot; source, and name it whatever you want. </p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2025/05/image.png" class="kg-image" alt="No webcam? Use your mobile as webcam on Linux." loading="lazy" width="1168" height="597" srcset="https://adhadse.com/content/images/size/w600/2025/05/image.png 600w, https://adhadse.com/content/images/size/w1000/2025/05/image.png 1000w, https://adhadse.com/content/images/2025/05/image.png 1168w" sizes="(min-width: 720px) 720px"></figure><p>then add in these settings and the URL, and configure your resolution:</p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2025/05/image-1.png" class="kg-image" alt="No webcam? Use your mobile as webcam on Linux." loading="lazy" width="903" height="835" srcset="https://adhadse.com/content/images/size/w600/2025/05/image-1.png 600w, https://adhadse.com/content/images/2025/05/image-1.png 903w" sizes="(min-width: 720px) 720px"></figure><p>and hit &quot;OK&quot;. If nothing pops up, try hitting refresh for the Browser source you just added:</p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2025/05/image-2.png" class="kg-image" alt="No webcam? Use your mobile as webcam on Linux." 
loading="lazy" width="1168" height="597" srcset="https://adhadse.com/content/images/size/w600/2025/05/image-2.png 600w, https://adhadse.com/content/images/size/w1000/2025/05/image-2.png 1000w, https://adhadse.com/content/images/2025/05/image-2.png 1168w" sizes="(min-width: 720px) 720px"></figure><p>Alternatively, you can add the USB DroidCam OBS source by following all the steps above and, instead of adding a new source, opening the properties of the existing DroidCam OBS source and selecting the device with &quot;USB&quot; written on it.</p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2025/06/image.png" class="kg-image" alt="No webcam? Use your mobile as webcam on Linux." loading="lazy" width="756" height="927" srcset="https://adhadse.com/content/images/size/w600/2025/06/image.png 600w, https://adhadse.com/content/images/2025/06/image.png 756w" sizes="(min-width: 720px) 720px"></figure>]]></content:encoded></item><item><title><![CDATA[Reflect on 2023]]></title><description><![CDATA[Recap of my previous year.]]></description><link>https://adhadse.com/reflect-on-2023/</link><guid isPermaLink="false">658fbd4717de340307ecdad9</guid><category><![CDATA[Life]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Sat, 30 Dec 2023 09:25:31 GMT</pubDate><media:content url="https://adhadse.com/content/images/2023/12/allec-gomes-pZY4-qP30GQ-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2023/12/allec-gomes-pZY4-qP30GQ-unsplash.jpg" alt="Reflect on 2023"><p>This year I&apos;ll start by reflecting on the previous year of my life.</p><p>Recollecting what I learned, what new habits I developed, articles I published, ..., you get it! The progress, the good things that happened to me, and the areas where I fell behind. 
</p><p>The questions I&apos;ll answer will be these three:</p><ol><li>What went well this year?</li><li>What didn&apos;t go so well this year?</li><li>What did I learn &amp; what should I work toward?</li></ol><h2 id="what-went-well-this-year">What went well this year?</h2><p><strong>Full-time opportunity at <a href="https://www.kavida.ai/">Kavida.ai</a>. </strong>I had started working as a Data Science Intern around December (2022), and this year around May I got an offer to work full-time as a Junior Data Scientist. <br><br>You won&apos;t believe how awesome it was to dive into the world of startups, especially in Artificial Intelligence and Supply Chain. It&apos;s like living the dream, seriously. I never really fancied the big tech giants; I&apos;m all about backing the guys with crazy ideas set to shake up the world.</p><p><strong>Regular exercise for a healthy life. </strong>During my college days, it was hard for me to find or even give some time to my health. It would all be back and forth between college and home. Nonetheless, after my last semester ended I began giving some time to my overall fitness. 
The goal wasn&apos;t to get jacked, but just to get in better shape overall and be filled with energy for the day.</p><figure class="kg-card kg-gallery-card kg-width-wide"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://adhadse.com/content/images/2023/12/Screenshot_20231230_131935_Samsung-Health-1.jpg" width="1080" height="2340" loading="lazy" alt="Reflect on 2023" srcset="https://adhadse.com/content/images/size/w600/2023/12/Screenshot_20231230_131935_Samsung-Health-1.jpg 600w, https://adhadse.com/content/images/size/w1000/2023/12/Screenshot_20231230_131935_Samsung-Health-1.jpg 1000w, https://adhadse.com/content/images/2023/12/Screenshot_20231230_131935_Samsung-Health-1.jpg 1080w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://adhadse.com/content/images/2023/12/Screenshot_20231230_132301_Samsung-Health.jpg" width="1080" height="2340" loading="lazy" alt="Reflect on 2023" srcset="https://adhadse.com/content/images/size/w600/2023/12/Screenshot_20231230_132301_Samsung-Health.jpg 600w, https://adhadse.com/content/images/size/w1000/2023/12/Screenshot_20231230_132301_Samsung-Health.jpg 1000w, https://adhadse.com/content/images/2023/12/Screenshot_20231230_132301_Samsung-Health.jpg 1080w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://adhadse.com/content/images/2023/12/Screenshot_20231230_132631_Samsung-Health.jpg" width="1080" height="2340" loading="lazy" alt="Reflect on 2023" srcset="https://adhadse.com/content/images/size/w600/2023/12/Screenshot_20231230_132631_Samsung-Health.jpg 600w, https://adhadse.com/content/images/size/w1000/2023/12/Screenshot_20231230_132631_Samsung-Health.jpg 1000w, https://adhadse.com/content/images/2023/12/Screenshot_20231230_132631_Samsung-Health.jpg 1080w" sizes="(min-width: 720px) 720px"></div></div></div></figure><p>Although I did not keep track of what kind of exercises I did, I should start tracking 
them as well.<br>Health is everything. Getting a good amount of sleep and doing regular exercise really improved my level of activity. </p><p><strong>I graduated.</strong> Yeah. I&apos;m happy that I no longer have to travel 50 km to and from every day. But I&apos;ll always remember my college friends.</p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2023/12/IMG-20230517-WA0007.jpg" class="kg-image" alt="Reflect on 2023" loading="lazy" width="1600" height="1200" srcset="https://adhadse.com/content/images/size/w600/2023/12/IMG-20230517-WA0007.jpg 600w, https://adhadse.com/content/images/size/w1000/2023/12/IMG-20230517-WA0007.jpg 1000w, https://adhadse.com/content/images/2023/12/IMG-20230517-WA0007.jpg 1600w" sizes="(min-width: 720px) 720px"></figure><p><strong>Adapting to analog scheduling.</strong> Every day, I would start with the schedule I had written the night before on paper. This paper would also eventually become my to-do list for the day, keeping my mind free of the clutter of tasks I had to remember. <br><br>Austin Kleon, in his book &quot;Steal like an Artist&quot;, says:</p><blockquote>&quot;The computer is really good for editing your ideas, and it&apos;s really good for getting your ideas ready for publishing out into the world, but it&apos;s not really good for generating ideas.&quot;</blockquote><p>This notebook isn&apos;t meant for generating the ideas, but it serves as the central place that ideas and tasks come back to as I go through my day. Crossing off a task came with a sense of accomplishment.</p><p>Here is what worked for me: I would separate the page into two sections; the left section would be filled with the schedule. 
On the right side, it starts with the <strong>Highlight</strong> (of the day), the major task that has to be finished by the end of the day, and below it a <strong>To-do</strong> list of anything that comes across my mind as the day goes by.</p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2023/12/20231230_142455.jpg" class="kg-image" alt="Reflect on 2023" loading="lazy" width="2000" height="1177" srcset="https://adhadse.com/content/images/size/w600/2023/12/20231230_142455.jpg 600w, https://adhadse.com/content/images/size/w1000/2023/12/20231230_142455.jpg 1000w, https://adhadse.com/content/images/size/w1600/2023/12/20231230_142455.jpg 1600w, https://adhadse.com/content/images/size/w2400/2023/12/20231230_142455.jpg 2400w" sizes="(min-width: 720px) 720px"></figure><p><strong>An upgrade to my desk.</strong> An ultrawide monitor. I always felt a need to work on an ultrawide, especially when I needed to go back and forth between a tutorial/documentation and my main code editor window.</p><p>This was a nice addition and worth the wait. I love it!</p><figure class="kg-card kg-image-card kg-width-wide"><img src="https://adhadse.com/content/images/2023/12/20231226_070910.jpg" class="kg-image" alt="Reflect on 2023" loading="lazy" width="2000" height="924" srcset="https://adhadse.com/content/images/size/w600/2023/12/20231226_070910.jpg 600w, https://adhadse.com/content/images/size/w1000/2023/12/20231226_070910.jpg 1000w, https://adhadse.com/content/images/size/w1600/2023/12/20231226_070910.jpg 1600w, https://adhadse.com/content/images/size/w2400/2023/12/20231226_070910.jpg 2400w" sizes="(min-width: 1200px) 1200px"></figure><h2 id="2-what-didnt-go-so-well-this-year">2. What didn&apos;t go so well this year?</h2><p><strong>Consistency.</strong> I failed to achieve the level of consistency I wanted to. Things improved, but not quite dramatically.</p><p>I failed to deliver an article/blog post every week. 
This was primarily due to my inability to manage my time effectively. I focused more on delivering and completing the tasks at my full-time role instead of learning things. </p><p>I also failed to post consistently on LinkedIn and share my new findings. </p><p><strong>Fewer contributions to Open Source. </strong>Working at a startup is no easy task, and finding time to contribute easily became a challenge for me. But, that&apos;s good. I need to overcome that.</p><p><strong>Read fewer books.</strong> I managed to read only three books this year:</p><ul><li>Atomic Habits, by James Clear</li><li>Steal like an Artist, by Austin Kleon</li><li>Architecture Patterns with Python, by Harry J.W. Percival &amp; Bob Gregory</li></ul><h2 id="3-what-did-i-learn-what-i-should-work-toward">3. What did I learn &amp; what should I work toward?</h2><ol><li>Focus on learning. That&apos;s what life is all about. The day I stop learning is the day I die.</li><li>Do small, fast, low-cost experiments, and then scale them up. Experiments are the key to discovery. If one tool doesn&apos;t work for you, check out the next. You&apos;ll find what you need, or you may create your own.</li><li>Bring consistency to the everyday chaos. I usually remember absolutely nothing about the little things I did in a day, but I do remember the major things that got finished. If I can bring a consistent nature to my habits every day, I&apos;ll slowly get back on track. If I do skip a day, don&apos;t let it slide for more than two.</li><li>Keep track of things. I can&apos;t improve what I don&apos;t measure. Jot down what you&apos;re up to, what you&apos;re picking up, and anything else worth remembering.</li></ol><hr><p>That&apos;s it. Wish you a Happy New Year. 
</p>]]></content:encoded></item><item><title><![CDATA[Move to a 21st century terminal in the Era of Linux]]></title><description><![CDATA[BASH stands for Bash bAsh baSh basH, it&apos;s a weird acronym.]]></description><link>https://adhadse.com/move-to-21st-century-terminal/</link><guid isPermaLink="false">64029164c8984e5debaf44a9</guid><category><![CDATA[Linux]]></category><category><![CDATA[Developer]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Sat, 04 Mar 2023 08:37:37 GMT</pubDate><media:content url="https://adhadse.com/content/images/2023/03/blackbox.png" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2023/03/blackbox.png" alt="Move to a 21st century terminal in the Era of Linux"><p>Since I <a href="https://adhadse.com/one-week-of-linux-as-a-developer/">switched to Linux last June</a> (2022), I have been particularly inclined towards using the terminal for almost every developer workflow I know: using Git, quickly renaming files, quick edits, or installing new packages/apps. </p><p>My terminal helped me wherever I went. </p><p>But I felt something was missing. My default shell (the program which lets you interact with your computer on a terminal by executing commands written by you) was Bash. It&apos;s great, but it implements key features like Tab completion, which is very useful to my workflow, in a pretty weird fashion:</p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2023/03/image.png" class="kg-image" alt="Move to a 21st century terminal in the Era of Linux" loading="lazy" width="1157" height="291" srcset="https://adhadse.com/content/images/size/w600/2023/03/image.png 600w, https://adhadse.com/content/images/size/w1000/2023/03/image.png 1000w, https://adhadse.com/content/images/2023/03/image.png 1157w" sizes="(min-width: 720px) 720px"></figure><p>Yup. Tab completion throws every possibility out in the output (stdout). 
What I wanted was something close to an IDE/text-editor-like completion menu. </p><p>And maintaining its <code>~/.bashrc</code> is a nightmare. </p><p>What I want is a clean, customizable, yet fast shell that is fun to use and improves my productivity.</p><h2 id="the-nushell">The Nushell</h2><p>Then I found <strong>Nushell</strong>, a new type of shell written in Rust. </p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/nushell/nushell"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - nushell/nushell: A new type of shell</div><div class="kg-bookmark-description">A new type of shell. Contribute to nushell/nushell development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt="Move to a 21st century terminal in the Era of Linux"><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">nushell</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/6a7b5c44a2657ef9761e88801901d012517d5538419556cdaefcd14db32cc26c/nushell/nushell" alt="Move to a 21st century terminal in the Era of Linux"></div></a></figure><figure class="kg-card kg-video-card"><div class="kg-video-container"><video src="https://adhadse.com/content/media/2023/03/nushell-autocomplete7.mp4" poster="https://img.spacergif.org/v1/1360x768/0a/spacer.png" width="1360" height="768" loop autoplay muted playsinline preload="metadata" style="background: transparent url(&apos;https://adhadse.com/content/images/2023/03/media-thumbnail-ember160.jpg&apos;) 50% 50% / cover no-repeat;"></video><div class="kg-video-overlay"><button class="kg-video-large-play-icon"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/></svg></button></div><div 
class="kg-video-player-container kg-video-hide"><div class="kg-video-player"><button class="kg-video-play-icon"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/></svg></button><button class="kg-video-pause-icon kg-video-hide"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/><rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/></svg></button><span class="kg-video-current-time">0:00</span><div class="kg-video-time">/<span class="kg-video-duration"></span></div><input type="range" class="kg-video-seek-slider" max="100" value="0"><button class="kg-video-playback-rate">1&#xD7;</button><button class="kg-video-unmute-icon"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/></svg></button><button class="kg-video-mute-icon kg-video-hide"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/></svg></button><input type="range" class="kg-video-volume-slider" max="100" value="100"></div></div></div></figure><p>From its config file, conveniently located in the <code>~/.config/nushell</code> directory, everything about Nushell is customizable: the prompt, keybindings, tables, and even the external completer.</p><p>Nushell in 
itself is a completely different shell, the idea being that it sees data as having some kind of structure to it instead of a raw stream of bytes. The fact that it&apos;s written in Rust makes it even more appealing to use; plus, it&apos;s Blazing Fast!!! </p><p>Even better, Nushell is cross-platform: it works on Linux, Windows, and Mac.</p><p>Bash is not bad, but when it comes to user experience, IMO it falls short. Nushell gave me a completely new experience and a fun-to-use terminal. Although it&apos;s not POSIX compliant, it works pretty well with external commands. Even if a Nu command clashes with an external command, like <code>ls</code>, just adding <code>^</code> in front calls the external command.</p><p>Other than that, the config for Nu, easily opened from <code>nu</code> with <code>config nu</code>, is way easier to read. </p><p>There are other options as well:</p><ol><li><a href="https://github.com/z-shell">Z-Shell</a>, good but not in Rust, requires its own plugin manager</li><li><a href="https://ohmyz.sh/">Oh My ZSH</a>, filled with plugins</li><li><a href="https://fishshell.com/">Fish</a>, not POSIX compliant</li></ol><p>Again, I might be biased, and you might want to try the above options as well.</p><h2 id="getting-the-prompt-right">Getting the Prompt Right</h2><p>The prompt is what you see as the text to the left (or sometimes even on the right) when you open a terminal. 
</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://adhadse.com/content/images/2023/03/bracketed-segments.png" width="1183" height="894" loading="lazy" alt="Move to a 21st century terminal in the Era of Linux" srcset="https://adhadse.com/content/images/size/w600/2023/03/bracketed-segments.png 600w, https://adhadse.com/content/images/size/w1000/2023/03/bracketed-segments.png 1000w, https://adhadse.com/content/images/2023/03/bracketed-segments.png 1183w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://adhadse.com/content/images/2023/03/pastel-powerline.png" width="1183" height="894" loading="lazy" alt="Move to a 21st century terminal in the Era of Linux" srcset="https://adhadse.com/content/images/size/w600/2023/03/pastel-powerline.png 600w, https://adhadse.com/content/images/size/w1000/2023/03/pastel-powerline.png 1000w, https://adhadse.com/content/images/2023/03/pastel-powerline.png 1183w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://adhadse.com/content/images/2023/03/pure-preset.png" width="1183" height="894" loading="lazy" alt="Move to a 21st century terminal in the Era of Linux" srcset="https://adhadse.com/content/images/size/w600/2023/03/pure-preset.png 600w, https://adhadse.com/content/images/size/w1000/2023/03/pure-preset.png 1000w, https://adhadse.com/content/images/2023/03/pure-preset.png 1183w" sizes="(min-width: 720px) 720px"></div></div></div><figcaption>Images from <a href="https://starship.rs/presets/pure-preset.html">Starship documentation (Presets)</a></figcaption></figure><p>I previously used a script in <code>~/.bashrc</code> to customize the prompt and display info such as the git branch and the Python virtual env or conda environment name, with no icons. 
Again, a plain minimalist look.</p><p>With the Nu setup, I found out there is an easier and yet pretty maintainable way to customize the prompt.</p><p>Introducing, <strong>Starship</strong>.</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/starship/starship"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - starship/starship: &#x2604;&#x1F30C;&#xFE0F; The minimal, blazing-fast, and infinitely customizable prompt for any shell!</div><div class="kg-bookmark-description">&#x2604;&#x1F30C;&#xFE0F; The minimal, blazing-fast, and infinitely customizable prompt for any shell! - GitHub - starship/starship: &#x2604;&#x1F30C;&#xFE0F; The minimal, blazing-fast, and infinitely customizable prompt for any shell!</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt="Move to a 21st century terminal in the Era of Linux"><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">starship</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://repository-images.githubusercontent.com/178991158/17096280-8d4d-11e9-97e9-7fed5f61d6bf" alt="Move to a 21st century terminal in the Era of Linux"></div></a></figure><p>People love to customize it to their liking, and I just wanted a clean minimal look. Starship works not just with Nu, but with almost every mainline shell available. </p><p>Plus point, again it&apos;s written in Rust! 
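</p><p>To give an idea, here is a minimal <code>~/.config/starship.toml</code> sketch. The options shown are real Starship settings, but the values are purely illustrative; check the Starship documentation for the full list:</p><pre><code class="language-toml"># ~/.config/starship.toml
# Keep the prompt compact on a single line
add_newline = false

# Trim the prompt character down to a simple arrow
[character]
success_symbol = "[&#x276F;](bold green)"
error_symbol = "[&#x276F;](bold red)"

# Disable modules you do not care about
[package]
disabled = true</code></pre><p>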
</p><p>With this I created my prompt, which looked something like this:</p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2023/03/minimalist_prompt.png" class="kg-image" alt="Move to a 21st century terminal in the Era of Linux" loading="lazy" width="1282" height="771" srcset="https://adhadse.com/content/images/size/w600/2023/03/minimalist_prompt.png 600w, https://adhadse.com/content/images/size/w1000/2023/03/minimalist_prompt.png 1000w, https://adhadse.com/content/images/2023/03/minimalist_prompt.png 1282w" sizes="(min-width: 720px) 720px"></figure><p>Pretty Minimalist I guess, hiding every feature :)</p><h2 id="external-completions">External Completions</h2><p>As of writing, Nu (0.76.0) doesn&apos;t support completions for commands outside of <code>nu</code> and requires an external completer.</p><p>There is an open issue about this, proposing to parse man pages and store the completions in <code>.nu</code> files, since man pages don&apos;t really exist on Windows. </p><p>For the time being, we can utilize external autocompleters, and this time unfortunately it&apos;s not Rust :/</p><p>But, it&apos;s Go! Hey, come on, Go is also equally good. Even though it has a garbage collector, that doesn&apos;t mean it&apos;s a dumpster fire like Java&#x1F602;.</p><p>Let&apos;s introduce <strong>Carapace.</strong></p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/rsteube/carapace-bin"><div class="kg-bookmark-content"><div class="kg-bookmark-title">GitHub - rsteube/carapace-bin: multi-shell multi-command argument completer</div><div class="kg-bookmark-description">multi-shell multi-command argument completer. 
Contribute to rsteube/carapace-bin development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt="Move to a 21st century terminal in the Era of Linux"><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">rsteube</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://repository-images.githubusercontent.com/257400448/6f43d096-057f-4d2f-9e86-dbcde8bc6980" alt="Move to a 21st century terminal in the Era of Linux"></div></a></figure><p>It supports a wide range of programs you have already worked with, and newer ones like <code>gh</code> (the GitHub CLI) and <code>rg</code> (ripgrep) are also supported. </p><p>Adding it to Nushell or any shell is also pretty straightforward and clearly documented. </p><p>If you don&apos;t find your CLI program getting autocompletion, please open a PR to the above GitHub page and contribute. Who doesn&apos;t love contributions?</p><p>And if you find any bugs or have a new feature you want to work on, strike up a conversation in the Issues tab of any of the above repositories and contribute however you&apos;d want.</p><p>If you want to use my setup for your terminal, the config files are on GitHub along with instructions and minor details. </p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/adhadse/ConfigFiles/tree/master/terminal"><div class="kg-bookmark-content"><div class="kg-bookmark-title">ConfigFiles/terminal at master &#xB7; adhadse/ConfigFiles</div><div class="kg-bookmark-description">Config Files used across my system. 
Contribute to adhadse/ConfigFiles development by creating an account on GitHub.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.com/fluidicon.png" alt="Move to a 21st century terminal in the Era of Linux"><span class="kg-bookmark-author">GitHub</span><span class="kg-bookmark-publisher">adhadse</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://opengraph.githubassets.com/48174c41ccd8084ca01b18a313ab4d77b36814e52858bd77d29dfad60a504d08/adhadse/ConfigFiles" alt="Move to a 21st century terminal in the Era of Linux"></div></a></figure><p>Have fun with your new terminal.</p><p>This is Anurag Dhadse, signing off.</p>]]></content:encoded></item><item><title><![CDATA[Async/Await, Multithreading, and Multiprocessing demystified]]></title><description><![CDATA[Let's go beyond single-threaded applications.]]></description><link>https://adhadse.com/async-await-multithreading-and-multiprocessing/</link><guid isPermaLink="false">63d4ca7fc8984e5debaf43b8</guid><category><![CDATA[Software Engineering]]></category><category><![CDATA[System Design]]></category><category><![CDATA[Developer]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Sat, 28 Jan 2023 07:34:04 GMT</pubDate><media:content url="https://adhadse.com/content/images/2023/01/mehdi-messrro-8GlNjM_5HLQ-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2023/01/mehdi-messrro-8GlNjM_5HLQ-unsplash.jpg" alt="Async/Await, Multithreading, and Multiprocessing demystified"><p>Are you also confused by these terminologies? I was too. But not anymore. </p><p>Let&apos;s understand them with the help of an example. Along the way, I&apos;ll explain in this mini-article when exactly you should use each of them, so you don&apos;t have to go through all the YouTube videos or articles on the Internet.</p><p>Say you have a GUI app, an MVP right now, so it&apos;s single-threaded. 
This thread is usually called the main thread, which is also the UI thread (all operations the user requests on the GUI are performed on the main thread).</p><p>Currently, our app is Synchronous.</p><h2 id="synchronous">Synchronous</h2><p>A single thread does its tasks one by one; if any one task takes longer, the other tasks can&#x2019;t proceed until that task is done.</p><blockquote>Synchronous/single-threaded is best for small apps, CLI apps, or any program that doesn&#x2019;t do any computationally expensive task.</blockquote><p>Say that one task, like querying a database (or any I/O operation), takes a long time waiting to retrieve the data. The UI will freeze; no buttons will work until the data is fetched.</p><h2 id="asynchronous">Asynchronous</h2><p>The whole process is still single-threaded. <strong>BUT</strong> fetching data from the database will now be done asynchronously: the work is done by the database, and the main thread just marks this task as <em>Async</em>, to be <em>await</em>ed. It passes the task a <em>callback</em> function to call back when the data is retrieved, and the task returns a <em>promise</em> (in JS) or a <em>future</em> (in Rust), which the main thread polls from time to time (in Rust, since <em>futures</em> are lazy) until it is driven to completion. Once the main thread gets a <strong>future</strong>, it goes on executing other tasks and does not wait for that task to complete. To programmers, this still looks synchronous.</p><blockquote>Other than tasks that can be delegated to external hardware, Async can also be used for tasks that are OK to be done in a single-threaded application and require some CPU processing power, or tasks which are basically <strong>waiting</strong>: ones which won&#x2019;t make the UI unresponsive (or stop other tasks from continuing for some waitable period) and are less computationally expensive but may take a little bit of time. 
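</blockquote><p>The await/future idea above can be sketched with Python&apos;s <code>asyncio</code>; the function name and the simulated delay below are made up for illustration:</p><pre><code class="language-python">import asyncio

async def fetch_user(user_id):
    # Stand-in for a database/network call; the await hands control
    # back to the event loop while this task is waiting.
    await asyncio.sleep(0.1)
    return {'id': user_id}

async def main():
    # Both fetches run concurrently on a single thread;
    # neither blocks the other while waiting.
    return await asyncio.gather(fetch_user(1), fetch_user(2))

print(asyncio.run(main()))  # [{'id': 1}, {'id': 2}]</code></pre><blockquote>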
In Python, async can also be used to delegate tasks to lower-level code (like C/C++) that can do multiprocessing itself.</blockquote><h2 id="multithreading">Multithreading</h2><p>If our GUI app is doing some task in the background, such as parsing a GIANT file or encoding/decoding a video (if done on the CPU), which takes a lot of time and can&#x2019;t be delegated to hardware (like the database/disk controller in the previous case), then the UI will freeze and stop responding.</p><p>The solution is to spawn another thread, another road to the CPU that can do tasks independent of what&apos;s going on in the thread it was spawned by. Both threads (or many) will continue doing their own assigned tasks and are going to be part of the App&apos;s process. Inside each thread, the tasks can then be done synchronously or asynchronously as required.</p><blockquote>In scenarios like this, use Multi-threading. Delegate the task that requires the processor&#x2019;s time and is ULTRA computationally expensive to another thread. Threads also share the same resources as the whole process (the whole App is a single process, which can have multiple threads in itself).</blockquote><p>Keep in mind, though, that multi-threading is expensive, since spawning a new thread takes some resources, and so does switching between threads. So, if possible, stay with async. Unless you are into coroutines, which are lightweight threads.</p><p>Now, our UI will be responsive (running on the UI thread), while in the background we do video encoding or any other expensive task (on the background thread).</p><h2 id="multiprocessing">Multiprocessing</h2><p>In this, we spin up unique <strong>processes</strong>, each of which can have many threads but is usually kept single-threaded. 
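</p><p>The idea of fanning CPU-heavy work out to several processes can be sketched with Python&apos;s <code>multiprocessing</code> module; the chunk data and the work function below are made up for illustration:</p><pre><code class="language-python">from multiprocessing import Pool

def encode_chunk(chunk):
    # Stand-in for CPU-heavy work, e.g. encoding one slice of a video.
    return sum(b * b for b in chunk)

if __name__ == '__main__':
    chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    # Each chunk is handled by a separate OS process, in parallel.
    with Pool(processes=3) as pool:
        print(pool.map(encode_chunk, chunks))  # [14, 77, 194]</code></pre><p>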
They all have their own separate address space, separate memory, and separate resources, and they communicate using inter-process communication.</p><p>Plus, you can spin them up on multiple machines separate from your local host machine.</p><p>This is best left for huge, data-intensive application programs.</p><p>Both multithreading and multiprocessing deal with <em>parallelism</em>: having machine instructions execute simultaneously. Async deals with <em>concurrency</em>: having tasks that are not using the CPU move out of the way and let other tasks proceed.</p><hr><p>That&apos;s all for today. Hope you learned something new.</p><p>This is Anurag Dhadse, signing off.</p>]]></content:encoded></item><item><title><![CDATA[Cascade Design Pattern]]></title><description><![CDATA[Waterfall pipeline of Models]]></description><link>https://adhadse.com/cascade-design-pattern/</link><guid isPermaLink="false">63cbff86c8984e5debaf4063</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Sun, 22 Jan 2023 05:30:11 GMT</pubDate><media:content url="https://adhadse.com/content/images/2023/01/m-rishal-nZnrd8s4ztM-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2023/01/m-rishal-nZnrd8s4ztM-unsplash.jpg" alt="Cascade Design Pattern"><p>Can a machine learning problem be broken into a series of ML problems?</p><p>I wonder if you ever thought of that. Sometimes you can, but how and when is going to be the theme of this article.</p><p>So, let&apos;s understand the problem first.</p><p>Say we want to train a model for anomaly detection, or for any task that requires it to predict both usual and unusual activity. Unless some preprocessing or rebalancing is done on the data, the model will not learn the unusual activity because it is rare. 
If the unusual activity is also associated with abnormal values, then trainability suffers.</p><p>Let&apos;s suppose we are trying to train a model to predict the likelihood that a customer will return an item that they have purchased. If we simply train a binary classifier model, the reseller&apos;s return behavior is hard to capture, because in comparison to the returns being made there are millions of transactions by retail buyers. We might not know, at the time of purchase, whether the purchase is made by a retail buyer or a reseller. However, from other marketplaces, we have identified when items bought from us are subsequently being resold.</p><p>One way to solve this could be to overweight the reseller instances when training the model. But then we won&apos;t be able to get the more common retail buyer use case as correct as possible, trading off accuracy on retail buyers to optimize for the reseller use case.</p><p>The best way might be to use the Cascade design pattern, breaking the whole problem into three distinct problems:</p><ol><li>Predict whether a specific transaction is by a reseller &#x2013; reseller or not?</li><li>Train one model on sales to retail buyers &#x2013; will the retail buyer return the item or not?</li><li>Train a second model on sales to resellers &#x2013; will the reseller return the item or not?</li></ol><p>Combine the output of the three separate models to predict the return likelihood for every item purchased, along with the probability that the transaction is by a reseller.</p><p>This allows for different decisions on items likely to be returned depending on the type of buyer, and ensures that the models in steps 2 and 3 are as accurate as possible.</p><p>In addition to that, in the first step, we can use rebalancing to address the imbalanced distribution of transactions from retail buyers and resellers.</p><p>But, how do we do this?</p><h2 id="solution">Solution</h2><p><strong>Any machine learning problem where the output of one model is an input to 
the following model or determines the selection of subsequent models is called a <em>cascade</em></strong>.</p><p>For example, a machine learning problem that sometimes involves unusual circumstances can be solved by treating it as a cascade of four machine learning problems:</p><ol><li>A classification model to identify the circumstance</li><li>One model trained on unusual circumstances</li><li>A separate model trained on typical circumstances</li><li>A model to combine the output of the two separate models, because the final output is a probabilistic combination of the two outputs</li></ol><p>This might look very similar to an Ensemble of models but is actually different because of the special experiment design required when doing a cascade. </p><p>Indeed, the models after step 1 are not supposed to be trained on a split of the training data made independently of the step 1 model; they are trained in union with it. The subsequent models have to be trained on data selected using the predictions of the first model in the cascade, with the ground-truth labels as guidance to the optimization function. 
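</p><p>For the reseller example, the final probabilistic combination can be as simple as the law of total probability. A sketch with made-up probabilities (not from any real model):</p><pre><code class="language-python">def combined_return_probability(p_reseller, p_return_reseller, p_return_retail):
    # Weight each sub-model's prediction by the first model's
    # probability of the buyer type (law of total probability).
    return (p_reseller * p_return_reseller
            + (1 - p_reseller) * p_return_retail)

# 10% likely a reseller; resellers return 60% of the time,
# retail buyers 5% of the time:
print(combined_return_probability(0.1, 0.6, 0.05))  # roughly 0.105</code></pre><p>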
</p><figure class="kg-card kg-gallery-card kg-width-wide"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://adhadse.com/content/images/2023/01/wrong_procedure_ml_cascade-3.png" width="1614" height="1353" loading="lazy" alt="Cascade Design Pattern" srcset="https://adhadse.com/content/images/size/w600/2023/01/wrong_procedure_ml_cascade-3.png 600w, https://adhadse.com/content/images/size/w1000/2023/01/wrong_procedure_ml_cascade-3.png 1000w, https://adhadse.com/content/images/size/w1600/2023/01/wrong_procedure_ml_cascade-3.png 1600w, https://adhadse.com/content/images/2023/01/wrong_procedure_ml_cascade-3.png 1614w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://adhadse.com/content/images/2023/01/right_procedure_ml_cascade-5.png" width="1473" height="1209" loading="lazy" alt="Cascade Design Pattern" srcset="https://adhadse.com/content/images/size/w600/2023/01/right_procedure_ml_cascade-5.png 600w, https://adhadse.com/content/images/size/w1000/2023/01/right_procedure_ml_cascade-5.png 1000w, https://adhadse.com/content/images/2023/01/right_procedure_ml_cascade-5.png 1473w" sizes="(min-width: 720px) 720px"></div></div></div></figure><p>So, the predictions of the first model are used to create the training dataset for the next models.</p><p>Also, rather than training the model individually, it is better to automate the entire workflow, by using workflow automation frameworks such as Kubeflow Pipelines, TFX, and many others.</p><h2 id="trade-offs-and-alternatives">Trade-Offs and Alternatives</h2><p>Cascade is not necessarily the best practice. It adds quite a bit of complexity and can be hard to debug in case of bad data and hard to maintain. Remember, if the data changes all models in the cascade would be required to be retrained. </p><p>Also, avoid having, as in the Cascade pattern, multiple machine learning models in the same pipeline. 
Try to limit a pipeline to a single machine learning problem. </p><h3 id="deterministic-inputs">Deterministic inputs</h3><p>Splitting an ML problem is usually a bad idea, since an ML model can/should learn combinations of multiple factors. For example:</p><ul><li>If a condition can be known deterministically from the input (the article is from a news website versus from an individual), we should just add the condition as one more input to the model.</li><li>If the condition involves extrema in just one input (some customers who live nearby versus far away, with the meaning of near/far needing to be learned from the data), we can use Mixed Input Representation to handle it.</li></ul><p><strong>The Cascade design pattern addresses an unusual scenario for which we do not have a categorical input, and for which extreme values need to be learned from multiple inputs.</strong> &#xA0;</p><h3 id="single-model">Single Model </h3><p>Problems which seem simple enough that a medium or large ML model will be sufficient should stay away from the Cascade design pattern. Such problems contain patterns and combinations which can be inferred from the data itself and learned by the model.</p><h3 id="internal-consistency">Internal Consistency</h3><p>The Cascade is needed when we need to maintain internal consistency among the predictions of multiple models. </p><p>Suppose the reason we are training a model to predict a customer&apos;s propensity to buy is to decide whether to make a discounted offer. Whether or not we make a discounted offer, and the amount of the discount, will very often depend on whether this customer is comparison shopping or not. Given this, we need internal consistency between the two models (the model for comparison shoppers and the model for propensity to buy). 
In this case, the Cascade design pattern might be needed.</p><h3 id="pre-trained-model">Pre-trained Model</h3><p>The cascade is also needed when we wish to reuse the output of a pre-trained model as an input into our model.</p><p>Say we want to train a model that can convert a page full of mathematics formulas into <a href="https://www.latex-project.org/">LaTeX</a>. We might have an OCR model that can do this, but only if given a photo of a single formula and not a page filled with formulas. </p><p>We can do a cascade and train a YOLO model to detect the individual formulas on a page and then forward this output to our OCR model. It is critical that we recognize that the YOLO model will have errors, so we should not train the OCR model with a perfect training set of photos and corresponding LaTeX formulas. Instead, we should train the OCR model on the actual output of the YOLO model.</p><p>A common scenario of using a pre-trained model as the first step of a pipeline is an object-detection model followed by a fine-grained image classification model. In that case, the Cascade is recommended so that the entire pipeline can be retrained whenever the object-detection model is updated.</p><h3 id="reframing-instead-of-cascade">Reframing instead of Cascade</h3><p>Suppose we wish to predict hourly sales amounts. Most of the time we&apos;ll serve retail buyers, but once in a while we&apos;ll have a wholesale buyer. </p><p>Reframing the regression problem to be a classification problem over a range of different sales amounts might be a better approach, instead of trying to get the retail versus wholesale classification correct.</p><h3 id="regression-in-rare-situations">Regression in rare situations</h3><p>The Cascade design pattern can be helpful when carrying out regression when some values are much more common than others. For example, suppose we want to predict the amount of rainfall from a satellite image; it might be the case that on 99% of the pixels, it doesn&apos;t rain. 
In such cases, we can:</p><ol><li>First, predict whether or not it is going to rain for each pixel.</li><li>For pixels where the model predicts rain is not likely, predict a rainfall amount of zero.</li><li>Train a regression model to predict the rainfall amount on pixels where the model predicts that rain is likely. </li></ol><p>That&apos;s all for today. Hope you learned something new.</p><p>This is Anurag Dhadse, signing off.</p>]]></content:encoded></item><item><title><![CDATA[Keyed Predictions]]></title><description><![CDATA[Scaling your ML service to millions of inputs.]]></description><link>https://adhadse.com/keyed-predictions/</link><guid isPermaLink="false">632e64c3c8984e5debaf392b</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Sat, 31 Dec 2022 07:40:00 GMT</pubDate><media:content url="https://adhadse.com/content/images/2022/09/nikita-tikhomirov-j978_9Rc9ts-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2022/09/nikita-tikhomirov-j978_9Rc9ts-unsplash.jpg" alt="Keyed Predictions"><p>Up until now, you might have only trained a single model that accepts one or a few inputs, and probably deployed it so that it sends back outputs one by one as the model service serves each request sequentially.</p><p>Now, imagine a scenario: you have a file with millions of inputs, and the service needs to respond with a file with millions of predictions. Ah, it&apos;s easy.</p><p>It isn&apos;t! Model deployment services pose scalability challenges, often requiring the ML model servers to scale horizontally instead of vertically. 
</p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2022/12/vertical_horizontal_scaling.png" class="kg-image" alt="Keyed Predictions" loading="lazy" width="1602" height="862" srcset="https://adhadse.com/content/images/size/w600/2022/12/vertical_horizontal_scaling.png 600w, https://adhadse.com/content/images/size/w1000/2022/12/vertical_horizontal_scaling.png 1000w, https://adhadse.com/content/images/size/w1600/2022/12/vertical_horizontal_scaling.png 1600w, https://adhadse.com/content/images/2022/12/vertical_horizontal_scaling.png 1602w" sizes="(min-width: 720px) 720px"></figure><p>You might think it should be obvious that the first output corresponds to the first input instance and the second output to the second input instance. But for this to happen, a server needs to process the full set of inputs serially, often requiring vertical scaling and becoming expensive in the long run.</p><p>Instead, servers, and hence ML models, are deployed in large clusters (horizontally); the requests are distributed to multiple machines, and all resulting outputs are collected and sent back. Hence, horizontal scaling can be quite cheap. But in the process, you&apos;ll get jumbled output. </p><p>Server nodes that receive only a few requests will be able to keep up, but any server node that receives a particularly large array will start to fall behind. Therefore, many online serving systems will impose a limit on the number of instances that can be sent in one request.</p><p>So, how do we solve this problem?</p><h2 id="solution">Solution</h2><p>The solution is to use pass-through keys. Have the client supply a key associated with each input to identify each input instance. These keys are not used as input to the model, hence the name pass-through: they are passed along with the request but do not pass through the ML model. </p><p>Suppose your model accepts inputs <code>a</code>, <code>b</code>, <code>c</code> to produce an output <code>d</code>. 
Then let the client also supply the key <code>k</code> along with the inputs, as <code>(k, a, b, c)</code>. The key can be as simple as an integer for a batch of requests; even <a href="https://en.wikipedia.org/wiki/Universally_unique_identifier">UUID</a> (Universally Unique Identifier) strings are great.</p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2022/12/keyed_prediction.png" class="kg-image" alt="Keyed Predictions" loading="lazy" width="1282" height="892" srcset="https://adhadse.com/content/images/size/w600/2022/12/keyed_prediction.png 600w, https://adhadse.com/content/images/size/w1000/2022/12/keyed_prediction.png 1000w, https://adhadse.com/content/images/2022/12/keyed_prediction.png 1282w" sizes="(min-width: 720px) 720px"></figure><h3 id="here-is-how-you-pass-through-keys-in-keras">Here is how you pass through keys in Keras</h3><p>To get your Keras model to pass through keys, supply a serving signature when exporting the model.</p><p>For example, in the code below, the exported model takes four inputs (<code>is_male</code>, <code>mother_age</code>, <code>plurality</code>, and <code>gestation_weeks</code>) plus a key that it passes through to the output along with the original output of the model (the <code>babyweight</code>):</p><pre><code class="language-python"># Serving function that passes through keys
@tf.function(input_signature=[{
    &apos;is_male&apos;: tf.TensorSpec([None,], dtype=tf.string, name=&apos;is_male&apos;),
    &apos;mother_age&apos;: tf.TensorSpec([None,], dtype=tf.float32, name=&apos;mother_age&apos;),
    &apos;plurality&apos;: tf.TensorSpec([None,], dtype=tf.string, name=&apos;plurality&apos;),
    &apos;gestation_weeks&apos;: tf.TensorSpec([None,], dtype=tf.float32, name=&apos;gestation_weeks&apos;),
    &apos;key&apos;: tf.TensorSpec([None,], dtype=tf.string, name=&apos;key&apos;)
    }])
def keyed_prediction(inputs):
    feats = inputs.copy()
    key = feats.pop(&apos;key&apos;) # get the key out of inputs 
    output = model(feats) # invoke model
    return {&apos;key&apos;: key, &apos;babyweight&apos;: output}</code></pre><p>This model is then saved using the Keras model Saving API:</p><pre><code class="language-python">model.save(EXPORT_PATH, 
           signatures={&apos;serving_default&apos;: keyed_prediction})</code></pre><h3 id="adding-keyed-prediction-capability-to-an-existing-model">Adding keyed prediction capability to an existing model</h3><p>To add keyed prediction capability to an already saved model, just load the Keras model, attach the serving function, and save it again.</p><p>While attaching our new keyed prediction serving function, do provide a serving function that replicates the older no-key behavior to maintain backward compatibility: </p><pre><code class="language-python"># Serving function that replicates the older no-key behavior
@tf.function(input_signature=[{
    &apos;is_male&apos;: tf.TensorSpec([None,], dtype=tf.string, name=&apos;is_male&apos;),
    &apos;mother_age&apos;: tf.TensorSpec([None,], dtype=tf.float32, name=&apos;mother_age&apos;),
    &apos;plurality&apos;: tf.TensorSpec([None,], dtype=tf.string, name=&apos;plurality&apos;),
    &apos;gestation_weeks&apos;: tf.TensorSpec([None,], dtype=tf.float32, name=&apos;gestation_weeks&apos;),
}])
def nokey_prediction(inputs):
    output = model(inputs) # invoke model
    return {&apos;babyweight&apos;: output}</code></pre><p>And then add our already defined keyed prediction serving function:</p><pre><code class="language-python">model.save(EXPORT_PATH,
           signatures={&apos;serving_default&apos;: nokey_prediction,
                       &apos;keyed_prediction&apos;: keyed_prediction
})</code></pre><h2 id="trade-offs-and-alternatives">Trade-Offs and Alternatives</h2><p>Why can&apos;t servers just assign keys to the inputs they receive? For online prediction, it is possible for servers to assign unique request IDs. For batch prediction, the problem is that the inputs need to be associated with the outputs, so the server assigning a unique ID is not enough, since it can&apos;t be joined back to the input.</p><p>What the server needs to do is assign keys to the inputs it receives before it invokes the model, use the keys to order the outputs, and then remove the keys before sending the outputs along.<strong> The problem is that ordering is computationally expensive in distributed data processing.</strong></p><h3 id="asynchronous-serving">Asynchronous Serving</h3><p>Nowadays, many production ML models are neural networks, and they involve matrix multiplication, which can be significantly more efficient if done on a hardware accelerator. </p><p>It is therefore more efficient to ensure that the matrices are within certain size ranges and/or multiples of a certain number. It can thus be helpful to accumulate requests (obviously up to a maximum latency) and handle the incoming requests in chunks. <strong>Since the chunks will consist of interleaved requests from multiple clients, the key, in this case, needs to have some sort of client identifier as well. </strong></p><h2 id="continuous-evaluation">Continuous Evaluation</h2><p>If you are doing continuous evaluation, it can be helpful to log metadata about the prediction requests so that you can monitor whether performance drops across the board or only in specific situations.</p><p>That&apos;s all for today. 
Hope you learned something new.</p><p>This is Anurag Dhadse, signing off.</p>]]></content:encoded></item><item><title><![CDATA[Two-Phase Predictions—Hybrid mode of model deployment]]></title><description><![CDATA[<p>Did you ever try to figure out how Alexa, Google Assistant, or any always-listening digital voice assistant devices are able to respond to every query of the user, without featuring complex AI hardware?</p><p>We already know because of the device constraints, models deployed on edge devices need to balance the</p>]]></description><link>https://adhadse.com/two-phase-predictions-hybrid-mode-of-model-deployment/</link><guid isPermaLink="false">632e64a9c8984e5debaf3923</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Sat, 24 Sep 2022 17:44:29 GMT</pubDate><media:content url="https://adhadse.com/content/images/2022/09/ron-whitaker-DGf7Ft-6aHk-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2022/09/ron-whitaker-DGf7Ft-6aHk-unsplash.jpg" alt="Two-Phase Predictions&#x2014;Hybrid mode of model deployment"><p>Did you ever try to figure out how Alexa, Google Assistant, or any always-listening digital voice assistant devices are able to respond to every query of the user, without featuring complex AI hardware?</p><p>We already know because of the device constraints, models deployed on edge devices need to balance the trade-off between accuracy and size, complexity, update frequency, and low latency. 
</p><ul><li>Cloud-deployed models often have high latency, causing a bad user experience for voice assistant users.</li><li>Privacy is also an issue.</li></ul><p>Problems like these are where Two-Phase Predictions can help resolve the conflict.</p><hr><p><strong>The idea is to split the use case into two phases, with the simpler phase carried out on the edge device and, when required, the complex one on the cloud.</strong></p><p>For the use case we talked about earlier,</p><ul><li>We&apos;ll have <strong>one edge-optimized model deployed on the device</strong>, <strong>listening to the surroundings for wake-up words</strong> (like &quot;Alexa&quot;, &quot;Hey, Google&quot;, etc.) to determine if the user wants to begin a conversation. </li><li>Upon successful detection, it proceeds with recording the sound, and upon detecting the conclusion of the conversation, it <strong>sends the recording to the cloud for the complex-phase predictions to determine the intention of the user</strong>.</li></ul><p>This implies the two phases are split as:</p><ol><li>A smaller, cheaper model deployed on the edge device for the simpler task.</li><li>A larger, complex model deployed on the cloud and triggered only when needed.</li></ol><hr><h2 id="lets-try-it-out">Let&apos;s try it out!</h2><h3 id="phase-1-building-the-offline-model">Phase 1: Building the offline model</h3><p>We&apos;ll need to convert a trained model to one suitable to run and store on edge devices. This can be done via a process known as <strong>quantization</strong>, where learned model weights are represented with fewer bytes. </p><p>TensorFlow, for example, uses a format called TensorFlow Lite to convert saved models into a smaller format optimized for serving at the edge.</p><p>This approach is termed <strong>post-training quantization. </strong>The goal is to find the maximum absolute weight value, \(m\), and then map the floating-point range (often <code>float32</code>) \(-m\) to \(+m\) to the fixed-point (integer) range \(-127\) to \(+127\). 
This also requires the inputs to be quantized at inference time, which TFLite automatically does for us.</p><p>That is, weights go from 32-bit floating-point values to 8-bit signed integers, reducing the model to a quarter of its original size.</p><p>To prepare the trained model for edge serving, we use TF Lite to export it in an optimized format:</p><pre><code class="language-python">import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
open(&apos;converted_model.tflite&apos;, &apos;wb&apos;).write(tflite_model)</code></pre><p>To generate a prediction on a TF Lite model, we use the TF Lite interpreter, which is optimized for low latency. On edge devices, the platform-specific libraries provide APIs to load and generate predictions/inference purposes.</p><p>For this, we create an instance of TF Lite&apos;s interpreter and get details on the input and output format it&apos;s expecting:</p><pre><code class="language-python">interpreter = tf.lite.Interpreter(model_path=&quot;converted_model.tflite&quot;)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()</code></pre><p>The <code>input_details</code> or <code>output_details</code> is a list with a single dictionary object specifying the input/output specs of the converted TF Lite model, which looks like the following:</p><pre><code class="language-text">[{&apos;name&apos;: &apos;serving_default_digits:0&apos;,
  &apos;index&apos;: 0,
  &apos;shape&apos;: array([  1, 784], dtype=int32),
  &apos;shape_signature&apos;: array([ -1, 784], dtype=int32),
  &apos;dtype&apos;: numpy.float32,
  &apos;quantization&apos;: (0.0, 0),
  &apos;quantization_parameters&apos;: {&apos;scales&apos;: array([], dtype=float32),
  &apos;zero_points&apos;: array([], dtype=int32),
  &apos;quantized_dimension&apos;: 0},
  &apos;sparsity_parameters&apos;: {}}]</code></pre><p>We&apos;ll then pass an example from our validation batch to the loaded TF Lite model to get a prediction as follows:</p><pre><code class="language-python">import numpy as np

input_data = np.array([test_batch[42]], dtype=np.float32)
interpreter.set_tensor(input_details[0][&apos;index&apos;], input_data)

interpreter.invoke()
output_data = interpreter.get_tensor(output_details[0][&apos;index&apos;])</code></pre><p>It&apos;s worth noting here that, depending on how costly it is to call your model, you can change the metric you&apos;re optimizing for when you train the on-device model. For example, precision over recall if you don&apos;t care about false negatives.</p><p>The main problem with quantization is that it loses a bit of accuracy: it is roughly equivalent to adding noise to the weights and activations. If the accuracy drop is too severe, then we may need to use <strong>quantization-aware training</strong>. This means adding fake quantization operations to the model so it can learn to ignore the quantization noise during training, making the final weights more robust to quantization.</p><h3 id="phase-2-building-the-cloud-model">Phase 2: Building the cloud model</h3><p>Our cloud model doesn&apos;t need to be bound by the constraints we faced for the edge-optimized model. We can follow a more traditional approach for training, exporting, and deploying this model. This means we can combine multiple design patterns, such as Transfer Learning, a cascade of models, or multiple different models, depending on the second-phase requirement.</p><p>After training, we can then deploy this model to a cloud AI service provider (AWS, GCP, etc.). Or we can adopt a complete pipeline-based model training and deployment setup using libraries like TFX. 
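</p><p>Tying the two phases together amounts to a simple gating function around the two models. The sketch below is illustrative only: <code>detect_wake_word</code> stands in for the on-device TF Lite interpreter call, and <code>cloud_predict</code> for a request to the cloud-hosted model; both names are hypothetical.</p><pre><code class="language-python">def two_phase_predict(audio_frame, detect_wake_word, cloud_predict, threshold=0.8):
    """Phase 1 runs on every frame; phase 2 runs only on likely wake words."""
    score = detect_wake_word(audio_frame)   # cheap, on-device model
    if score > threshold:
        return cloud_predict(audio_frame)   # expensive, cloud-hosted model
    return None                             # nothing leaves the device

# Stub usage: a frame scored at 0.9 triggers phase 2; one at 0.1 does not.
intent = two_phase_predict([0.5], lambda f: 0.9, lambda f: "play_music")</code></pre><p>The threshold is the main tuning knob here: lowering it trades extra cloud calls (latency and cost) for fewer missed wake words.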
</p><p>To demonstrate, we&apos;ll pretend a model is already trained and then deploy it on Google Cloud AI Platform.</p><p>First, we&apos;ll directly save our model to our GCP project storage bucket:</p><figure class="kg-card kg-code-card"><pre><code class="language-python">cloud_model.save(&apos;gs://your_storage_bucket/path&apos;)</code></pre><figcaption>This will export our model in TF SavedModel format and upload it to Cloud Storage bucket</figcaption></figure><p>On Google Cloud AI Platform, a model resource contains different versions of your model. Each model can have hundreds of versions. We&apos;ll create the model resource using <code>gcloud</code>, the Google Cloud CLI.</p><pre><code class="language-bash">gcloud ai-platform models create second-phase-predictor</code></pre><p>Then to deploy our model, we&apos;ll use <code>gcloud</code> and point AI Platform at the storage subdirectory that contains our saved model assets:</p><pre><code class="language-bash">gcloud ai-platform versions create v1 \
  --model second-phase-predictor \
  --origin &apos;gs://your_storage_bucket/path/model_timestamp&apos; \
  --runtime-version=2.1 \
  --framework=&apos;tensorflow&apos; \
  --python-version=3.7</code></pre><h2 id="trade-offs-and-alternatives">Trade-Offs and Alternatives</h2><p>There might be situations where our end users have very little or no internet connectivity, and thus the services of the second-phase/cloud-hosted model become impossible to access. How can we mitigate this issue? Beyond that, how are we supposed to perform continuous evaluation, checking that the metrics haven&apos;t degraded over time and that accuracy isn&apos;t suffering on the edge-deployed model?</p><h3 id="standalone-single-phase-model">Standalone single-phase model</h3><p>In situations where end users of our model may have little or no internet connectivity, instead of relying on a two-phase prediction flow, we can make our first model robust enough that it can be self-sufficient.</p><p>To do this, we can create a smaller version of our complex model, and give users the option to download this simpler, smaller model for use when they are offline. These offline models may not be quite as accurate as their larger online counterparts, but this solution is infinitely better than having no offline support at all. </p><p>To build more complex models designed for offline inference, it&apos;s best to utilize <strong>quantization-aware training</strong>, whereby we quantize the model&apos;s weights and other math operations both during and after training.</p><h3 id="offline-support-for-specific-use-cases">Offline support for specific use cases</h3><p>Another solution for making our application work for users with minimal internet connectivity is to make only certain parts of our app available offline. 
This could mean making only a few common features work offline, or caching the results of an ML model&apos;s prediction for later offline use.</p><p>This way, the app works sufficiently offline but provides full functionality when it regains connectivity.</p><h3 id="handling-many-predictions-in-near-real-time">Handling many predictions in near real time</h3><p>In some other cases, end users of our ML model may have reliable connectivity but might need to make hundreds or even thousands of predictions to our model at once. This is the case with streaming sensor data, perhaps when trying to detect some kind of anomaly. </p><p>Getting prediction responses on thousands of examples at once will take too much time due to the sheer number of requests and network bandwidth limits.</p><p>Instead of constantly sending requests over the network for anomaly detection, we can have a model deployed directly on the sensors to identify possible anomaly candidates from incoming data and then send only potential anomalies to our cloud model for verification. </p><p>The main difference here is that both the offline and cloud models perform the same prediction task but with different inputs.</p><h3 id="continuous-evaluation-for-offline-models">Continuous evaluation for offline models</h3><p>We can save a subset of predictions that are received on-device. We could then periodically evaluate our model&apos;s performance on these examples and determine if the model needs retraining. </p><p>Another option is to create a replica of our on-device model to run <em>online</em>, only for continuous evaluation purposes. This solution is preferred if our offline and cloud models are running similar prediction tasks, like in Neural Machine Translation.</p><p>That&apos;s all for today. 
Hope you learned something new.</p><p>This is Anurag Dhadse, signing off.</p>]]></content:encoded></item><item><title><![CDATA[Ensemble&#x2014;A Bundle of ML Models]]></title><description><![CDATA[It becomes harder to beat a group of weak warriors.]]></description><link>https://adhadse.com/ensembles-a-bundle-of-ml-models/</link><guid isPermaLink="false">631c2cc985e4100f9fa19a77</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Learning]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Sat, 17 Sep 2022 13:58:41 GMT</pubDate><media:content url="https://adhadse.com/content/images/2022/09/siniz-kim-Upik7lKpsAE-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2022/09/siniz-kim-Upik7lKpsAE-unsplash.jpg" alt="Ensemble&#x2014;A Bundle of ML Models"><p>Wisdom of the crowd.</p><p>Wisdom of crowds is a theory that assumes that the knowledge or collective opinion of a diverse, independent group of individuals results in better decision-making, innovation, and problem-solving than that of an individual. </p><p>In the machine learning space, when it&apos;s hard to build a single model with a substantially lower reducible error, instead of building ever larger models, we can combine several diverse ML models.</p><p>But what reducible error are we talking about?</p><p>You see, the error of an ML model can be broken down into two parts:</p><p>$$\text{Error of model} = \text{Irreducible error} + \text{Reducible error}$$</p><p>The <strong>irreducible error</strong> is the inherent error in the model <strong>resulting from noise in the dataset, bad training examples, or framing of the problem. 
</strong></p><p>And the <strong>reducible error</strong> is made up of:</p><p>$$\text{Reducible error} = \text{Bias} + \text{Variance}$$</p><p>The <strong>bias</strong> is the <strong>model&apos;s inability to learn</strong> enough <strong>about the relationships between the model&apos;s features and labels. </strong>This is<strong> due to wrong assumptions</strong>, such as assuming the data is linear when it is actually quadratic.</p><p>The <strong>variance</strong> captures the <strong>model&apos;s inability to generalize on new, unseen examples</strong> due to the <strong>model&apos;s excessive sensitivity to small variations in the training data</strong>.</p><p>A model with <strong>high bias oversimplifies the relationship</strong> and becomes <em>underfit</em>, and a model with <strong>high variance learns too much</strong> (it essentially crams the training data) and is said to <em>overfit</em>.</p><p>Our task in modeling is to lower both bias and variance; in practice, however, lowering one tends to raise the other. This is called the <em>bias-variance trade-off</em>. </p><hr><p>Ensembling is a solution for this trade-off, applied to small and medium-scale problems to reduce bias and/or variance and help improve performance. As stated above, it involves combining multiple models and aggregating their outputs to generate the final result.</p><p>The most common techniques in Ensemble Learning are:</p><ol><li>Bagging&#x2013;good for <strong>decreasing variance</strong></li><li>Boosting&#x2013;good for <strong>decreasing bias</strong></li><li>Stacking</li></ol><h3 id="bagging">Bagging</h3><p>Bagging, or bootstrap aggregating, is a type of parallel ensembling method <strong>where we use the same training algorithm for every predictor and train each one on different random subsets</strong> of the training set <strong>with replacement</strong>. 
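</p><p>The bootstrap sampling at the heart of bagging can be sketched in a few lines of NumPy. This is an illustrative sketch of the sampling step only, not how scikit-learn implements it internally:</p><pre><code class="language-python">import numpy as np

rng = np.random.default_rng(seed=42)
X = np.arange(10)  # a toy training set of 10 examples

# One bootstrap sample: same size as X, indices drawn with replacement,
# so some examples repeat and some are left out ("out-of-bag").
indices = rng.integers(0, len(X), size=len(X))
bootstrap_sample = X[indices]</code></pre><p>Each predictor in the ensemble is trained on its own such sample, and their outputs are then aggregated.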
</p><p>When sampling is performed <strong>without replacement</strong>, it is called <strong>pasting</strong>.</p><p>Aggregation is then performed on the outputs of these models&#x2013;either an average, or a majority vote in the case of classification.</p><blockquote>This works because each individual model can be off by a random amount, so<strong> when their results are averaged, these errors cancel out.</strong></blockquote><p>We can also have hard voting or soft voting when performing aggregation for classification models:</p><ul><li>If the majority-vote classifier output is selected, then it is <strong>hard voting.</strong></li><li>If the class probabilities from all classifiers are averaged and the highest average is selected as the classification output, then it is <strong>soft voting</strong>.</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://adhadse.com/content/images/2022/09/Bagging.png" class="kg-image" alt="Ensemble&#x2014;A Bundle of ML Models" loading="lazy" width="1923" height="1053" srcset="https://adhadse.com/content/images/size/w600/2022/09/Bagging.png 600w, https://adhadse.com/content/images/size/w1000/2022/09/Bagging.png 1000w, https://adhadse.com/content/images/size/w1600/2022/09/Bagging.png 1600w, https://adhadse.com/content/images/2022/09/Bagging.png 1923w" sizes="(min-width: 720px) 720px"><figcaption>Bagging is good for decreasing the variance of the resulting ensemble model</figcaption></figure><p>A very popular example of bagging is the random forest.</p><p>You can very easily create a random forest in Scikit-Learn as:</p><pre><code class="language-python">from sklearn.ensemble import RandomForestRegressor

# Create the model with 50 trees
RF_model = RandomForestRegressor(n_estimators=50,
                                 max_features=&apos;sqrt&apos;,
                                 n_jobs=-1, verbose=1)
                                 
# and fit the training data
RF_model.fit(X_train, Y_train)</code></pre><p>To perform <em>pasting</em>, just set <code>bootstrap=False</code> for <code>RandomForestRegressor</code>/<code>BaggingClassifier</code>.</p><figure class="kg-card kg-code-card"><pre><code class="language-python">from sklearn.ensemble import BaggingClassifier

# Create the ensemble with 50 base estimators 
Bc_model = BaggingClassifier(n_estimators=50,
                             max_samples=100,
                             bootstrap=False,
                             n_jobs=-1, verbose=1)
                                 
# and fit the training data
Bc_model.fit(X_train, Y_train)</code></pre><figcaption><code>BaggingClassifier</code> automatically performs soft voting if the base classifier can estimate class probabilities, i.e., it has a <code>predict_proba()</code> method</figcaption></figure><h3 id="boosting">Boosting</h3><p>Boosting refers to any Ensemble method that can combine several weak learners to produce strong learners with <em>more</em> capacity than the individual models. </p><p>Boosting <strong>iteratively improves upon a sequence of weak learners, training them sequentially, each trying to correct its predecessor.</strong></p><blockquote>Boosting works because at each iteration, the next model is fit to the residuals of the previous iteration.</blockquote><p>The most popular boosting methods are <em>AdaBoost</em> (short for Adaptive Boosting) and <em>Gradient Boosting</em>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://adhadse.com/content/images/2022/09/boosting-1.png" class="kg-image" alt="Ensemble&#x2014;A Bundle of ML Models" loading="lazy" width="2000" height="724" srcset="https://adhadse.com/content/images/size/w600/2022/09/boosting-1.png 600w, https://adhadse.com/content/images/size/w1000/2022/09/boosting-1.png 1000w, https://adhadse.com/content/images/size/w1600/2022/09/boosting-1.png 1600w, https://adhadse.com/content/images/2022/09/boosting-1.png 2163w" sizes="(min-width: 720px) 720px"><figcaption>Boosting is an effective method to reduce the bias of the resulting classifier</figcaption></figure><p>Once again, in scikit-learn we can implement it as follows:</p><pre><code class="language-python">from sklearn.ensemble import GradientBoostingRegressor

GB_model = GradientBoostingRegressor(n_estimators=1,
                                     max_depth=1,
                                     learning_rate=1,
                                     criterion=&apos;squared_error&apos;)
                                     
# fit on training data
GB_model.fit(X_train, Y_train)</code></pre><p>One important drawback of this sequential learning technique is that it cannot be parallelized. As a result, it does not scale as well as bagging or pasting.</p><h3 id="stacking">Stacking</h3><p>Stacking can be thought of as an extension of simple model averaging of <em>k</em> models trained on the complete dataset but <strong>with different types/algorithms</strong>. More generally, we could modify the averaging step to take a weighted average of all outputs.</p><p>Stacking comprises two steps:</p><ol><li>First, the initial models (typically of different types) are trained to completion on the full training dataset.</li><li>In the second step, a meta-model is trained using the initial models&apos; outputs as features; its task is to best combine the outcomes of the initial models to decrease the training error. Again, it can be any machine learning model.</li></ol><blockquote>Stacking works because it combines the best of both bagging and boosting.</blockquote><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://adhadse.com/content/images/2022/09/stacking-1.png" class="kg-image" alt="Ensemble&#x2014;A Bundle of ML Models" loading="lazy" width="2000" height="640" srcset="https://adhadse.com/content/images/size/w600/2022/09/stacking-1.png 600w, https://adhadse.com/content/images/size/w1000/2022/09/stacking-1.png 1000w, https://adhadse.com/content/images/size/w1600/2022/09/stacking-1.png 1600w, https://adhadse.com/content/images/2022/09/stacking-1.png 2073w" sizes="(min-width: 720px) 720px"><figcaption>The simplest form of model averaging averages model outputs or could be a weighted average of the outputs based on the relative accuracy of the individual models</figcaption></figure><h2 id="trade-offs-and-alternatives">Trade-Offs and Alternatives</h2><h3 id="increased-training-and-design-time">Increased training and design time</h3><p>The obvious downside to ensemble learning is increased training and design time. 
In ensemble design patterns, the complexity increases since instead of developing one single model, we are training <em>k</em> models, possibly of different types if we are using <em>Stacking</em>.</p><p>However, we should carefully consider whether the overhead of building such ensemble models is worth it, by comparing their accuracy and resource usage with simpler models. </p><h3 id="dropout-as-bagging">Dropout as bagging</h3><p>Dropout is a very popular regularization technique in neural networks where a neuron is &quot;dropped&quot; during an iteration based on its dropout probability during training. It can be considered an approximation of bagging: a bagged ensemble of exponentially many neural networks. </p><p>It&apos;s not exactly the same concept, though:</p><ul><li>In the case of bagging, the models are independent, while in the case of dropout, the parameters are shared.</li><li>In bagging, the models are trained to convergence on their respective training set, while with dropout, each ensemble member model is only trained for a single step.</li></ul><h3 id="decreased-model-interpretability">Decreased model interpretability</h3><p>For many production ML tasks, model interpretability and explainability are important. Ensembles don&apos;t fulfill this requirement.</p><h3 id="choosing-the-right-tool-for-the-problem">Choosing the right tool for the problem</h3><p>It&apos;s also important to keep in mind the problem we were trying to solve in the first place. So, keep the bias-variance trade-off in view and select the right tool for your problem. 
Use Bagging if you want to reduce variance, Boosting to reduce bias, and Stacking to combine models of different types.</p><p>Hope you learned something new.</p><p>This is Anurag Dhadse, signing off.</p>]]></content:encoded></item><item><title><![CDATA[Random splitting is Wicked.]]></title><description><![CDATA[Learn how to perform Repeatable Splitting]]></description><link>https://adhadse.com/random-splitting-is-wicked/</link><guid isPermaLink="false">631c2a9a85e4100f9fa19a6c</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Sat, 10 Sep 2022 00:27:00 GMT</pubDate><media:content url="https://adhadse.com/content/images/2022/09/tomas-mata-6nOJYIj0Vsc-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2022/09/tomas-mata-6nOJYIj0Vsc-unsplash.jpg" alt="Random splitting is Wicked."><p>You have probably seen something like this in every machine learning tutorial out there:</p><pre><code class="language-python">from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.33, 
    random_state=42)</code></pre><p>But there is a problem: it is rare that the rows are independent. </p><p>Take, for example, trying to predict the arrival delays of flights on a particular day; the instances/rows will be highly correlated. This can lead to leakage of information between the training and test datasets.</p><p>Plus, unless we set <code>random_state</code>, <code>train_test_split</code> will produce completely different splits every time it is run. This poses a problem when we care about reproducibility in our machine learning workflow.</p><p>This is where Repeatable Splitting comes in handy: splitting the data in a way that works regardless of programming language or random seeds, while also making sure that correlated rows fall into the same split. </p><hr><p>The solution is to <strong>first identify a column that captures the correlation relationship between rows</strong>. Then, <strong>we apply a hash function to that column and use the last few digits of the hash to split the data. </strong></p><p>So, as in a time series dataset, where often the rows are correlated, we can take the <code>date</code> column and pass it to the Farm Fingerprint hashing algorithm to split the available data into the required splits.</p><pre><code class="language-SQL">SELECT
  airline,
  feature_1,
  feature_2,
  feature_3,
  feature_4
FROM
  `timeseries-data`.airline_ontime_data.flights
WHERE
  ABS(MOD(FARM_FINGERPRINT(date), 10)) &lt; 8 -- 80% for TRAIN</code></pre><p>Here, we compute the hash using the <code>FARM_FINGERPRINT</code> function and then use the modulo function to find an arbitrary 80% subset of the rows. </p><p>This is now repeatable&#x2013;because the <code>FARM_FINGERPRINT</code> function returns the same value any time it is invoked on a specific timestamp, <strong>we can be sure we will get the same 80% of the data each time.</strong></p><p>But there are some considerations when choosing which column to split on:</p><ul><li><strong>Rows on the same date tend to be correlated. Correlation is the biggest factor in the selection of column(s) on which to split.</strong></li><li><code>date</code> is not an input to the model even though it is used as a criterion for splitting. <strong>We can&apos;t use an actual input as the field with which to split, because the trained model will not have seen 20% of the possible input values for that column if we use 80% of the data for training</strong> (with an 80/20 split on <code>date</code>, 20% of the possible dates would never be seen during training).</li><li>There have to be enough <code>date</code> values. 
<strong>A rule of thumb is to shoot for 3-5x the denominator for the modulo</strong>, so in this case, we want 40 or so unique dates.</li><li><strong>The label has to be well distributed among the dates.</strong> To be safe, look at the distribution graph and make sure that all three splits have a similar distribution of labels.</li></ul><h3 id="kolomogorov-smirnov-test">Kolmogorov-Smirnov Test</h3><p>To check whether the label distributions are similar across the three datasets, <strong>plot the cumulative distribution functions of the label in the three datasets and find the maximum distance between each pair.</strong></p><p><strong>The smaller the maximum distance, the better the split.</strong></p><h2 id="trade-offs-and-alternatives">Trade-Offs and Alternatives</h2><h3 id="single-query">Single Query</h3><p>We can have a single query to generate the training, validation, and test splits:</p><pre><code class="language-SQL">CREATE OR REPLACE TABLE mydataset.mytable AS
SELECT
  airline,
  feature_1,
  feature_2,
  feature_3,
  feature_4,
  CASE ABS(MOD(FARM_FINGERPRINT(date), 10))
       WHEN 9 THEN &apos;test&apos;
       WHEN 8 THEN &apos;validation&apos;
       ELSE &apos;training&apos; END AS split_col
FROM
  `timeseries-data`.airline_ontime_data.flights</code></pre><h3 id="random-split">Random split</h3><p>If the rows are not correlated, we can hash the entire row of data by converting it to a string and hashing that string:</p><figure class="kg-card kg-code-card"><pre><code class="language-SQL">SELECT
  airline,
  feature_1,
  feature_2,
  feature_3,
  feature_4
FROM
  `timeseries-data`.airline_ontime_data.flights f
WHERE
  ABS(MOD(FARM_FINGERPRINT(TO_JSON_STRING(f)), 10)) &lt; 8</code></pre><figcaption>Duplicate rows will always fall in the same split. If that&apos;s not the behavior we want, add a unique ID to the <code>SELECT</code> query.</figcaption></figure><h3 id="split-on-multiple-columns">Split on multiple columns</h3><p>It might happen that a combination of multiple columns is correlated, say <code>date</code> and <code>weather</code>. In that case, we can simply concatenate the fields (creating a feature cross) before computing the hash.</p><pre><code class="language-SQL">CREATE OR REPLACE TABLE mydataset.mytable AS
SELECT
  airline,
  feature_1,
  feature_2,
  feature_3,
  arrival_airport
FROM
  `timeseries-data`.airline_ontime_data.flights
WHERE
  ABS(MOD(FARM_FINGERPRINT(CONCAT(date, arrival_airport)), 10)) &lt; 8</code></pre><p><strong>If we split on a feature cross of multiple columns, we can use <code>arrival_airport</code> (or any other feature used in conjunction) as one of the inputs to the model, since there will be examples of any particular airport in both the training and test sets.</strong></p><h3 id="repeatable-sampling">Repeatable sampling</h3><p>If we wanted to create a smaller dataset out of a bigger one (say for local development), how would we go about doing it repeatably? Say we have a dataset of 50 million examples and we want a smaller dataset of one million flights. How would we <strong>pick 1 in 50 flights</strong>, and then<strong> 80%</strong> <strong>of those as training</strong>?</p><p><strong>What we cannot do is:</strong></p><figure class="kg-card kg-code-card"><pre><code class="language-SQL">SELECT
  airline,
  feature_1,
  feature_2,
  feature_3,
  feature_4
FROM
  `timeseries-data`.airline_ontime_data.flights f
WHERE
  ABS(MOD(FARM_FINGERPRINT(date), 50)) = 0
  AND ABS(MOD(FARM_FINGERPRINT(date), 10)) &lt; 8</code></pre><figcaption>We shouldn&apos;t do!</figcaption></figure><p>We cannot pick 1 in 50 rows and then pick 8 in 10. Those rows which are divisible by 50 are also going to be divisible by 10.</p><p>What we can do however is:</p><pre><code class="language-SQL">SELECT
  airline,
  feature_1,
  feature_2,
  feature_3,
  feature_4
FROM
  `timeseries-data`.airline_ontime_data.flights f
WHERE
  ABS(MOD(FARM_FINGERPRINT(date), 50)) = 0
  AND ABS(MOD(FARM_FINGERPRINT(date), 500)) &lt; 400</code></pre><p>In this query, the 500 is 50*10, and 400 is 50*8 (80% as training). </p><p>The first modulo picks 1 in 50 rows and the second modulo picks 8 in 10 of those rows.</p><p>For validation, you can change the query as:</p><pre><code class="language-SQL">  ABS(MOD(FARM_FINGERPRINT(date), 50)) = 0
  AND ABS(MOD(FARM_FINGERPRINT(date), 500)) BETWEEN 400 AND 449 -- (9*50)</code></pre><h3 id="sequential-split">Sequential split</h3><p>In the case of time series models, a very common approach is to use sequential splits of data. The idea is to assign blocks or intervals of the series to the various splits, preserving the correlation among the examples within each split.</p><p>A sequential split of data is also necessary in fast-moving environments such as fraud detection or spam detection, even if the goal is not to predict the future value of a time series. The goal instead is to quickly adapt to new data and predict behavior in the near future. </p><p>Another instance where a sequential split of data is needed is when there are high correlations between successive times and we need to take seasonality into account. Take weather forecasts, for example. A day&apos;s weather depends on the previous days&apos; weather and is affected by year-long seasonality.</p><p>To do a sequential split in this case, we&apos;ll put the first 20 days of every month in the training dataset, the next 5 days in the validation dataset, and the last 5 days in the testing dataset.</p><h3 id="stratified-split">Stratified split</h3><p>In the above example, the splitting needed to happen after the dataset was <em>stratified</em>. This means we need the distribution of each category/type of example to remain the same across the splits, matching the distribution in the complete, unsplit dataset.</p><p>The larger the dataset, the less concerned we have to be with stratification. Therefore, in large-scale machine learning, the need to stratify isn&apos;t very common, except in the case of skewed datasets.</p><h3 id="unstructured-data">Unstructured data</h3><p>Performing repeatable splitting in the case of structured data is quite straightforward. In the case of unstructured data, we can do the same by using metadata information. 
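</p><p>The same hash-based idea carries over outside of SQL. <code>FARM_FINGERPRINT</code> is a BigQuery function, not a Python one, so the sketch below substitutes the standard library&apos;s MD5; any stable hash works, as long as every system doing the splitting uses the same one:</p><pre><code class="language-python">import hashlib

def split_bucket(key, buckets=10):
    """Deterministically map a key (e.g. a date string) to a bucket."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets

def assign_split(key):
    bucket = split_bucket(key)
    if bucket >= 8:                      # 2 buckets out of 10: 20% of keys
        return "test" if bucket == 9 else "validation"
    return "training"                    # buckets 0 through 7: 80% of keys</code></pre><p>Because the bucket depends only on the key, the same date lands in the same split on any machine, in any language, on every run.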
</p><p>It is worth noting that many problems with poor ML performance can be addressed by designing the data split (and data collection) with potential correlations in mind. </p><p>Hope you learned something new.</p><p>This is Anurag Dhadse, signing off.</p>]]></content:encoded></item><item><title><![CDATA[Build in Public]]></title><description><![CDATA[Creating things for free. What can you expect in return?]]></description><link>https://adhadse.com/build-in-public/</link><guid isPermaLink="false">6312a9ba85e4100f9fa19678</guid><category><![CDATA[Open Source]]></category><category><![CDATA[Learning]]></category><category><![CDATA[Developer]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Sat, 03 Sep 2022 16:08:33 GMT</pubDate><media:content url="https://adhadse.com/content/images/2022/09/scott-webb-hDyO6rr3kqk-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2022/09/scott-webb-hDyO6rr3kqk-unsplash.jpg" alt="Build in Public"><p>This is the last year of my Bachelor&apos;s degree. And soon after that, I&apos;ll be joining the industry to cater to customers&apos; needs. </p><p>While earning some money in turn. But that&apos;s not the end, is it?</p><p>The last three years made me reevaluate things I do, things I wake up for, and my motivation. And this reevaluation schedule isn&apos;t something I want to end anytime soon.</p><p>People out there work for money. I want to work for myself. Sure, money does play a role.</p><p>While I work for myself, I want to sustain myself that way. Motivation comes when you take action and put effort into something. One part of that motivation came as I learned to build things in public. 
Building in Public.</p><p>And of course, we are talking about side software projects, but this can apply to a variety of different things: content creation, learning a new skill, anything.</p><p>So why should you build in public?</p><h2 id="first-off-you-are-putting-yourself-in-front-of-everyone">First off, you are putting yourself in front of everyone</h2><p>That means not only are you going to put forward the best of yourself, but you&apos;ll also grow upon that as you progress.</p><p>You&apos;ll learn the ability to showcase your identity.</p><p>Let me tell you through my example. One of the first websites I created was a portfolio website for myself, written in pure HTML/CSS. And it sucked. This was probably in the first year of my bachelor&apos;s. After that, I moved on to building a much better blog site (this site) and even my own personal <a href="https://wiki.anuragdhadse.com">wiki</a>. </p><h2 id="you-learn-from-your-mistakes">You learn from your mistakes</h2><p>As I moved on to learning various other tech stacks, Python, Django, I learned I could make websites faster if I used template CSS libraries like <a href="https://getbootstrap.com/">Bootstrap</a> or <a href="https://tailwindcss.com/">Tailwind</a>. </p><p>The result was <a href="https://github.com/adhadse/Shopiva">Shopiva</a>: an eCommerce website with a backend, a RESTful API, and a fluent design. </p><p>I subsequently created this blogging site and a wiki. Learning from my mistakes, I focused on the more important parts like functionality, adaptability, and usability, and less on design; learning what is most important to complete the project, and leaving the rest for my future self. </p><p>My wiki is basically a plain site with HTML/CSS just like my first web project, but unlike that one, I used a static site generator called <a href="https://gohugo.io/">Hugo</a> to generate the webpages right from markdown files. 
Halving my work, while I focus on the theme.</p><h2 id="you-teach-others">You teach others</h2><p>As you embark on learning something new and exciting, you always want to keep a check on your progress. In software development, ideally, you&apos;d want to document, but for side projects, even good comments across the source code work fine.</p><p>But like with any task, you are very likely to forget what you learned. Building in public might help you here. As you build things up, you&apos;d expect others to read your code, and ideally your future self too. In that case, you tend to make sure the code is readable and maintainable, and has proper comments for others to understand it without diving into the nitty-gritty. </p><p>On the other hand, if you&apos;d been building in private, by yourself, it&apos;s very likely that you&apos;ll code as your heart desires, and not for anyone to appreciate your &quot;hard work&quot;. </p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://adhadse.com/content/images/2022/09/Forgetting_curve_decline.svg" width="307" height="244" loading="lazy" alt="Build in Public"></div><div class="kg-gallery-image"><img src="https://adhadse.com/content/images/2022/09/ForgettingCurve-1.svg" width="277" height="237" loading="lazy" alt="Build in Public"></div></div></div><figcaption>Forgetting Curve. Source: <a href="https://en.wikipedia.org/wiki/Forgetting_curve">Wikipedia</a></figcaption></figure><p>Or, you can even go on and write an article, a blog post, or a YouTube video, documenting the things you tried, how to do them, and what needed fixing. These are just a few of the methods of active recall, a technique to avoid falling down the Forgetting curve.</p><p>If you don&apos;t actively recall what you&apos;ve learned and how you&apos;ve learned it. 
You&apos;ll forget.</p><p>Making sure others see your work and learn from you as you teach them brings the benefit of retaining the skill for a far longer period.</p><h2 id="others-teach-you">Others teach you</h2><p>The open source community is ever so welcoming.</p><p>You&apos;ll hardly meet a jerk who&apos;d come and say your code is shi*t. If you do, just ignore them.</p><p>There will always be someone better than you, writing better code, better in skill set. Opening up to the public can get you the attention of like-minded people. People who share the same interests and like doing these things as their profession as well as a hobby. </p><p>Public and open source projects help them and you learn about each other&apos;s way of doing things. People who are better than you will find the issues with you (or your work) and give valuable feedback.</p><p>If you are one of the most amazing developers, for sure you&apos;ll get your work used by many, probably even gaining more users than those pesky little proprietary apps.</p><p>And after all that, you&apos;ll get appreciation and recognition from the community. You don&apos;t always need to keep your work behind a paywall. 
If you can build trust, the community will even support you financially as you keep giving them your best.</p><p>I hope that inspires you to build something cool and usable for the public.</p><p>This is Anurag Dhadse, signing off.</p>]]></content:encoded></item><item><title><![CDATA[Demystifying Transfer Learning]]></title><description><![CDATA[Doing wonders while sitting on the shoulders of the giants.]]></description><link>https://adhadse.com/demystifying-transfer-learning/</link><guid isPermaLink="false">6309a4d385e4100f9fa19070</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Sun, 28 Aug 2022 06:31:31 GMT</pubDate><media:content url="https://adhadse.com/content/images/2022/08/peter-herrmann-RCFiG86u8Us-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2022/08/peter-herrmann-RCFiG86u8Us-unsplash.jpg" alt="Demystifying Transfer Learning"><p>Did you ever wonder why newborns pick up behavior, language, and expressions so much and so fast, in just a matter of a few years?</p><p>Well, the reason is the genetic transfer of DNA and characteristics from parents to offspring. This means that the offspring isn&apos;t dumb at all at the time of birth (unlike our machines and ahem..hem AI). It already knows how to catch attention when hungry or smile on seeing familiar faces. </p><p>This transfer of intelligence is driven by years of evolution and is hard to match with the current state of Artificial Intelligence.</p><p>This explains the difference between human intelligence, where a toddler can learn to differentiate between a cat and a dog after glancing at just a few images and can even learn more, whereas ML models once trained to identify a certain concept can&apos;t absorb a different concept without forgetting the previous one.</p><p>But we can learn something from nature. The transfer of knowledge. 
Albeit in a different fashion.</p><p>Introducing Transfer Learning.</p><p><strong>In Transfer Learning, we take part of a previously trained model, freeze the weights, and incorporate these non-trainable layers into a new model that solves a similar problem, but on a smaller dataset.</strong></p><p>To go along with the analogy with humans; it&apos;s kinda like taking an intellect&apos;s brain and fusing it with a toddler&apos;s so that the learning can begin where the intellect&apos;s ended. Ah, that sounds evil.</p><p>But in ML space, it&apos;s quite common and not so evil. </p><hr><p>Let&apos;s take an example. Suppose you are tasked to build an ML model to classify between cats and dogs. If you read through my previous <a href="https://adhadse.com/checkpoints-not-every-ml-model-trains-in-minutes/#why-it-works">blog post on Checkpoints</a>, we know that the model goes through 3 different phases during training:</p><ol><li>In the first phase, training focuses on learning the high-level organization of data.</li><li>In the second phase, the focus shifts to learning the details.</li><li>Finally, in the third phase, the model begins overfitting.</li></ol><p>So, even before the model can begin to objectify the concept of cats and dogs, it has to go through the first phase of absorbing the high-level organization of data. That corresponds to making sense of the pixels, their color values, edges, and shapes in the images. This is why we need a huge corpus of data to generalize the high-level concept. </p><p>Large image and text datasets like ImageNet (with over 14 million labeled examples) and GLUE can help in many ML tasks and reach high accuracy due to their immense size. 
But most organizations with specialized prediction problems don&apos;t have nearly as much data available for their domain, or it is expensive to gather, as in the medical domain where experts are required for an accurate labeling process.</p><p>We need a solution that allows us to build a custom model using only the data we have available and with the labels that we care about.</p><hr><h2 id="understanding-transfer-learning">Understanding Transfer Learning</h2><p>With Transfer Learning, we can take a model that has been trained on the same type of data for a similar task and apply it to a specialized task using our own custom data. </p><p>By <em>same type of data</em>, we mean the same data modality&#x2013;images, text, and so forth. It is also ideal to use a model that has been pretrained on the same type of images. For example, if the end model gets input of cats/dogs from a smartphone camera, use images gathered from a smartphone camera. </p><p>By <em>similar task</em>, we&apos;re referring to the problem being solved. To do transfer learning for image classification, for example, it is better to start with a model that has been trained for image classification, rather than object detection.</p><p>Let&apos;s say we are trying to determine if a given x-ray contains a broken bone or not. As this is a medical dataset, the size is small. Merely 500 images for each label: <em>broken</em> and <em>not broken</em>. This obviously isn&apos;t enough to train a model from scratch, but we can use transfer learning to help a bit. We&apos;ll need to find a model that has already been trained on a large dataset to do image classification. We&apos;ll then remove the last layer from that model, freeze the weights of the model, and continue training using our 1000 x-ray images. </p><p>Ideally, we want the base model to be trained on a dataset with similar images to x-rays. 
However, we can still utilize transfer learning if the datasets are different, so long as the prediction task is the same, which in this case is image classification.</p><p>The idea is to utilize the weights and layers from a model trained in the same domain as your prediction task. In most deep learning models, the final layer contains the classification label or output specific to your prediction task. So, we remove that layer and introduce our own final layer with the output for our specialized prediction task to continue training.</p><p>The penultimate layer of the model, <strong>the layer before the model&apos;s output layer, is chosen as the</strong> <em>bottleneck layer</em>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://adhadse.com/content/images/2022/08/transfer_learning-1.png" class="kg-image" alt="Demystifying Transfer Learning" loading="lazy" width="2000" height="1539" srcset="https://adhadse.com/content/images/size/w600/2022/08/transfer_learning-1.png 600w, https://adhadse.com/content/images/size/w1000/2022/08/transfer_learning-1.png 1000w, https://adhadse.com/content/images/size/w1600/2022/08/transfer_learning-1.png 1600w, https://adhadse.com/content/images/size/w2400/2022/08/transfer_learning-1.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>The &quot;top&quot; of the model, typically just the output layer is removed and remaining weights are frozen. The last layer of the remaining model is called the bottleneck layer.</figcaption></figure><h3 id="bottleneck-layer">Bottleneck layer</h3><p>The bottleneck layer represents the inputs in the lowest-dimensionality space.</p><p>Let&apos;s try implementing it in TensorFlow and Keras for X-ray images to <a href="https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia">detect viral Pneumonia</a>. 
We are going to use the VGG19 pretrained model available in the Keras <code>applications</code> module, with weights pretrained on the <code>imagenet</code> dataset.</p><pre><code class="language-python">vgg_model_withtop = tf.keras.applications.VGG19(
    include_top=True,
    weights=&apos;imagenet&apos;,
)</code></pre><figure class="kg-card kg-code-card"><pre><code class="language-text">Model: &quot;vgg19&quot;
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_2 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 ... more layer ...
                                                                 
 block4_pool (MaxPooling2D)  (None, 14, 14, 512)       0         
                                                                 
 block5_conv1 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv4 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0         
                                                                 
 flatten (Flatten)           (None, 25088)             0         
                                                                 
 fc1 (Dense)                 (None, 4096)              102764544 
                                                                 
 fc2 (Dense)                 (None, 4096)              16781312  
                                                                 
 predictions (Dense)         (None, 1000)              4097000   
                                                                 
=================================================================
Total params: 143,667,240
Trainable params: 0
Non-trainable params: 143,667,240
_________________________________________________________________</code></pre><figcaption>Output of <code>vgg_model_withtop.summary()</code></figcaption></figure><p>In this example, we choose the <code>block5_pool</code> layer as the bottleneck layer when we adapt this model to be trained on our Chest X-Ray Images dataset. The bottleneck layer produces a 7x7x512 dimensional array, which is a low-dimensional representation of the input image.</p><p>We hope that the information distillation will be sufficient to successfully carry out classification on our dataset.</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://adhadse.com/content/images/2022/08/vgg19_edited.png" class="kg-image" alt="Demystifying Transfer Learning" loading="lazy" width="2000" height="1044" srcset="https://adhadse.com/content/images/size/w600/2022/08/vgg19_edited.png 600w, https://adhadse.com/content/images/size/w1000/2022/08/vgg19_edited.png 1000w, https://adhadse.com/content/images/size/w1600/2022/08/vgg19_edited.png 1600w, https://adhadse.com/content/images/size/w2400/2022/08/vgg19_edited.png 2400w" sizes="(min-width: 1200px) 1200px"><figcaption>Transfer Learning using VGG19 pretrained model</figcaption></figure><p>Since the model we are going to work with accepts images as a 224x224x3 dimensional array, we either need to resize images to match this model input or change the model&apos;s input shape. Here we&apos;ll just go with resizing the input image.</p><pre><code class="language-python">vgg_model = tf.keras.applications.VGG19(
    include_top=False,
    weights=&apos;imagenet&apos;,
    input_shape=(224, 224, 3)
)

vgg_model.trainable = False</code></pre><p>By setting <code>include_top=False</code> we&apos;re specifying that the last layer of the VGG we want to load is the bottleneck layer. </p><p>Note that setting <code>include_top=False</code> is hardcoded to use <code>block5_pool</code> as the bottleneck layer, but if we wanted to customize this, we could have loaded the full model, like we previously did, and deleted additional layers.</p><figure class="kg-card kg-code-card"><pre><code class="language-text"> block5_conv2 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_conv4 (Conv2D)       (None, 14, 14, 512)       2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 7, 7, 512)         0         
                                                                 
=================================================================
Total params: 20,024,384
Trainable params: 0
Non-trainable params: 20,024,384
_________________________________________________________________
</code></pre><figcaption>Updated model with no &quot;top&quot;</figcaption></figure><p>With the <code>keras.applications</code> module, setting the <code>input_shape</code> parameter changes the layers&apos; dimensions to accommodate the new input dimension.</p><p><strong>Well, do consider that, as a general rule of thumb, the bottleneck layer is typically the last, lowest-dimensionality layer before a flattening operation.</strong></p><p>It is also worth noting that pre-trained embeddings can be used in Transfer Learning too. With embeddings, however, the purpose is to represent an input more concisely, whereas with Transfer Learning the purpose is to reuse a model trained on a similar task.</p><h3 id="implementing-transfer-learning">Implementing transfer learning</h3><p>We can implement transfer learning in Keras either by:</p><ul><li>Loading a pre-trained model, removing the layers after the bottleneck layer, and adding a new final layer with our own data and labels.</li><li>Using a pre-trained TensorFlow Hub (<a href="https://tfhub.dev">https://tfhub.dev</a>) module as the base for your transfer learning task.</li></ul><h3 id="transfer-learning-with-pre-trained-model">Transfer Learning with pre-trained model</h3><p>We have already set up our VGG model with a bottleneck layer. Let&apos;s add a few more layers to make our final model.</p><pre><code class="language-python">from tensorflow import keras

model = tf.keras.Sequential([
    vgg_model,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(2, activation=&quot;softmax&quot;)
])</code></pre><figure class="kg-card kg-code-card"><pre><code class="language-text">Model: &quot;sequential&quot;
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 vgg19 (Functional)          (None, 7, 7, 512)         20024384  
                                                                 
 global_average_pooling2d (G  (None, 512)              0         
 lobalAveragePooling2D)                                          
                                                                 
 dense (Dense)               (None, 2)                 1026      
                                                                 
=================================================================
Total params: 20,025,410
Trainable params: 1,026
Non-trainable params: 20,024,384
_________________________________________________________________</code></pre><figcaption>Our new model summary</figcaption></figure><p>As you can see, the only trainable parameters are from the last layer (after the bottleneck layer).</p><p>Had we wanted to use our own custom pre-trained model aside from what is offered in <code>keras.applications</code>, we would have done something like this:</p><figure class="kg-card kg-code-card"><pre><code class="language-python">model_A = keras.models.load_model(&quot;my_model_A.h5&quot;)
model_B_ontop_of_A = keras.models.Sequential(model_A.layers[:-1])
model_B_ontop_of_A.add(keras.layers.Dense(1, activation=&quot;sigmoid&quot;))</code></pre><figcaption><code>model_B_ontop_of_A</code> uses all layers except the last one of <code>model_A</code></figcaption></figure><p>Although, this method means <code>model_B_ontop_of_A</code> and <code>model_A</code> share some weights; hence, training <code>model_B_ontop_of_A</code> will also affect <code>model_A</code>.</p><p>To avoid that, we need to clone <code>model_A</code>&apos;s architecture with <code>clone_model()</code>, then copy its weights (since <code>clone_model()</code> does not clone the weights), and finally freeze the layers: </p><pre><code class="language-python">model_A_clone = keras.models.clone_model(model_A)
model_A_clone.set_weights(model_A.get_weights())

# initialize `model_B_ontop_of_A` on top of the clone
model_B_ontop_of_A = keras.models.Sequential(model_A_clone.layers[:-1])
model_B_ontop_of_A.add(keras.layers.Dense(1, activation=&quot;sigmoid&quot;))

# Freeze the reused weights
for layer in model_B_ontop_of_A.layers[:-1]:
    layer.trainable = False</code></pre><h3 id="pre-trained-embeddings-with-tf-hub">Pre-trained embeddings with TF Hub</h3><p>With TF Hub, we can very easily load a much larger variety of pre-trained models (called modules) as a layer, and then add our own classification layer on top.</p><pre><code class="language-python">import tensorflow_hub as hub

hub_layer = hub.KerasLayer(
    &quot;https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1&quot;,
    input_shape=[], dtype=tf.string, trainable=False
)</code></pre><p>And add additional layers on top:</p><pre><code class="language-python">model = keras.Sequential([
    hub_layer,
    keras.layers.Dense(32, activation=&apos;relu&apos;),
    keras.layers.Dense(1, activation=&apos;sigmoid&apos;)
])</code></pre><hr><h2 id="trade-offs-and-alternatives">Trade-Offs and Alternatives</h2><p>Let&apos;s discuss the methods of modifying the weights of our original model when implementing transfer learning:</p><ul><li>Feature Extraction</li><li>Fine-tuning</li></ul><h3 id="fine-tuning-vs-feature-extraction">Fine-tuning vs Feature Extraction</h3><p><em>Feature Extraction </em>describes an approach to transfer learning where you freeze the weights of all layers before the bottleneck layer and train the following layers on your own data and labels.</p><p>In contrast, with <em>fine-tuning</em> we can either update the weights of each layer in the pre-trained model, or just a few of the layers right before the bottleneck.</p><p>One recommended approach to determining how many layers to freeze is known as <em>progressive fine-tuning</em>. This involves iteratively unfreezing layers after every training run to find the ideal number of layers to fine-tune. Also, it is recommended to lower the learning rate as you begin unfreezing layers. </p><p>Typically, when you&apos;ve got a small dataset, it&apos;s best to use the pre-trained model as a feature extractor rather than fine-tuning it.</p><!--kg-card-begin: markdown--><table>
<thead>
<tr>
<th>Criterion</th>
<th>Feature extraction</th>
<th>Fine-tuning</th>
</tr>
</thead>
<tbody>
<tr>
<td>How large is the dataset?</td>
<td>Small</td>
<td>Large</td>
</tr>
<tr>
<td>Is your prediction task the same as that of the pre-trained model?</td>
<td>Different tasks</td>
<td>Same task; or similar task with same class distribution of labels</td>
</tr>
<tr>
<td>Budget for training time and computational cost</td>
<td>Low</td>
<td>High</td>
</tr>
</tbody>
</table>
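The progressive fine-tuning schedule described above can be sketched in plain Python. This is a toy illustration of the unfreezing logic only; the `Layer` class, layer names, and learning rates are stand-ins, not real Keras objects (in Keras you would flip `layer.trainable` the same way and recompile with the lower learning rate before each run):

```python
class Layer:
    """Toy stand-in for a Keras layer; only tracks the trainable flag."""
    def __init__(self, name: str):
        self.name = name
        self.trainable = False

def unfreeze_top(layers, n):
    # Freeze everything, then unfreeze only the last `n` layers.
    for layer in layers:
        layer.trainable = False
    for layer in layers[-n:]:
        layer.trainable = True
    return [layer.name for layer in layers if layer.trainable]

layers = [Layer(f"block{i}") for i in range(1, 6)]

# Progressive fine-tuning: after each training run, unfreeze a few more
# layers and lower the learning rate before the next run.
schedule = [(1, 1e-3), (2, 1e-4), (3, 1e-5)]
for n, learning_rate in schedule:
    trainable = unfreeze_top(layers, n)
    # a real loop would recompile and call model.fit(...) here,
    # keeping the checkpoint with the best validation score
print(unfreeze_top(layers, 2))  # -> ['block4', 'block5']
```

Each pass trains a slightly deeper slice of the network at a gentler learning rate, which is why the table above pairs fine-tuning with larger datasets and higher compute budgets.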
<!--kg-card-end: markdown--><h3 id="is-transfer-learning-possible-with-tabular-data">Is Transfer Learning possible with tabular data?</h3><p>Tabular data, however, covers a potentially infinite number of possible prediction tasks and data types. As such, Transfer Learning is currently not so common with tabular data.</p><p>That&apos;s all for today.</p><p>This is Anurag Dhadse, signing off.</p>]]></content:encoded></item><item><title><![CDATA[Checkpoints. Not every ML model trains in minutes.]]></title><description><![CDATA[Methods for reliable and resilient training of large Deep Learning models]]></description><link>https://adhadse.com/checkpoints-not-every-ml-model-trains-in-minutes/</link><guid isPermaLink="false">63007f3485e4100f9fa18c21</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Sat, 20 Aug 2022 15:46:53 GMT</pubDate><media:content url="https://adhadse.com/content/images/2022/08/yousef-espanioly-AWYI4-h3VnM-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2022/08/yousef-espanioly-AWYI4-h3VnM-unsplash.jpg" alt="Checkpoints. Not every ML model trains in minutes."><p>Yeah, not every ML model trains in minutes, at least not in the Deep Learning space.</p><p>The more complex the model is, the larger the dataset required to train it, because of the subsequent increase in parameters. This leads to taking longer to fit a batch, and hence longer training time.</p><p>In that case, it is good to think about measures to withstand the chances of machine failure during the training process. 
We don&apos;t want to begin from scratch when we have already done half of the work.</p><hr><p><strong>Checkpoints allow us to store the full state of the partially trained model (the architecture, weights) along with the hyperparameters/parameters required to begin training from that point, periodically during the training process.</strong></p><p>We can use this partially trained model as:</p><ul><li>A final model (in the case of <em>early stopping</em>, discussed later)</li><li>A starting point to continue training (after machine failure, or for fine-tuning)</li></ul><p>Checkpoints make sure to save the intermediate model state, as compared to <em>exporting</em>, in which only the final model parameters (weights &amp; biases) and architecture are exported. To begin retraining, more information is required than the above two. Take, for example, the optimizer that was used, with what parameters it was running, its state, how many epochs were set, how many were completed, and so on.</p><p>In Keras, we can create checkpoints using the Keras callback <code><a href="https://keras.io/api/callbacks/model_checkpoint/">ModelCheckpoint</a></code> passed to the <code>fit()</code> method.</p><pre><code class="language-python">import time

model_name = &quot;my_model&quot;
run_id = time.strftime(f&quot;{model_name}-run_%d_%m_%Y-%H_%M_%S&quot;)
checkpoint_path = f&quot;./checkpoint/{run_id}.h5&quot;
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    checkpoint_path,
    save_weights_only=False,
    verbose=1)
  
history = model.fit(
    x_train, y_train,
    batch_size=64,
    epochs=3,
    validation_data=(x_val, y_val),
    verbose=2,
    callbacks=[cp_callback])</code></pre><p><code>ModelCheckpoint</code> allows us to save checkpoints at the end of each epoch. We could checkpoint at the end of each batch instead, but the checkpoint size and I/O would add too much overhead.</p><hr><h2 id="why-it-works">Why it works</h2><p>Partially trained models offer more options than just continued training. This is because they are usually more generalizable than the models created in later iterations.</p><p>We can break the training into three phases:</p><ol><li>In the first phase, training focuses on learning high-level organization of data.</li><li>In the second phase, the focus shifts to learning the details.</li><li>Finally in the third phase, the model begins overfitting.</li></ol><p>A partially trained model from the end of phase 1 or from phase 2 becomes more advantageous because it has learned the high-level organization but still hasn&apos;t dived into the details.</p><hr><h2 id="trade-offs-and-alternatives">Trade-Offs and Alternatives</h2><h3 id="early-stopping">Early Stopping</h3><p>Usually, the longer the training continues, the lower the loss goes on the training dataset. However, at some point, the error on the validation dataset might stop decreasing. This is where overfitting begins to take place. This phenomenon is evident with the increase in the validation error.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://adhadse.com/content/images/2022/08/ovefiting_model_training--1.png" class="kg-image" alt="Checkpoints. Not every ML model trains in minutes." 
loading="lazy" width="2000" height="1394" srcset="https://adhadse.com/content/images/size/w600/2022/08/ovefiting_model_training--1.png 600w, https://adhadse.com/content/images/size/w1000/2022/08/ovefiting_model_training--1.png 1000w, https://adhadse.com/content/images/size/w1600/2022/08/ovefiting_model_training--1.png 1600w, https://adhadse.com/content/images/size/w2400/2022/08/ovefiting_model_training--1.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Once overfitting begins, the validation error starts climbing up</figcaption></figure><p>It can be helpful to look at the validation error at the end of every epoch and stop training when the validation error is more than that of the previous epoch.</p><h3 id="checkpoint-selection">Checkpoint selection</h3><p>It is not uncommon for the validation error to increase for a bit and then start to drop again. This usually happens because training initially focuses on more common scenarios (phase 1), then on rare samples (phase 2). Because rare situations may be imperfectly sampled between the training and validation datasets, occasional increases in the validation error during the training run are to be expected in phase 2.</p><p>So, <strong>we should train for longer and choose the optimal run as a postprocessing step.</strong></p><p>In our above example, we&apos;ll continue training for longer. Load the fourth checkpoint and export the final model. 
This is called <strong>checkpoint selection</strong> and in TensorFlow can be achieved using <a href="https://www.tensorflow.org/api_docs/python/tf/estimator/BestExporter"><code>BestExporter</code></a>.</p><h3 id="regularizations">Regularizations</h3><p>Instead of the above two techniques, we can try to make both the validation error and the training loss plateau by adding L2 regularization to the model.</p><p>Such a training loop is termed a <em>well-behaved</em> training loop.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://adhadse.com/content/images/2022/08/ideal_model_training.png" class="kg-image" alt="Checkpoints. Not every ML model trains in minutes." loading="lazy" width="2000" height="1394" srcset="https://adhadse.com/content/images/size/w600/2022/08/ideal_model_training.png 600w, https://adhadse.com/content/images/size/w1000/2022/08/ideal_model_training.png 1000w, https://adhadse.com/content/images/size/w1600/2022/08/ideal_model_training.png 1600w, https://adhadse.com/content/images/size/w2400/2022/08/ideal_model_training.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>In an ideal situation, the validation error and training loss should plateau.</figcaption></figure><p>However, recent studies suggest that double descent happens in a variety of machine learning problems, and therefore it is better to train longer rather than risk a suboptimal solution by stopping early.</p><p>In the <strong>experimentation phase</strong> (when we are exploring different model architectures, hyperparameter tuning, etc.), it&apos;s recommended that you <strong>turn off early stopping and train with larger models</strong>. This will ensure that the model has enough capacity to learn the predictive patterns. <strong>At the end of experimentation, you can use the evaluation dataset to diagnose how well your model does</strong> on data it has not encountered during training. 
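</p><p>To make the difference concrete, here is a minimal, framework-free sketch (the per-epoch validation errors are made up) showing how early stopping and checkpoint selection can settle on different epochs:</p><pre><code class="language-python"># Hypothetical per-epoch validation errors with a brief bump before improving again
val_errors = [0.90, 0.70, 0.60, 0.65, 0.55, 0.50, 0.52, 0.53]

# Early stopping: halt at the first epoch whose error exceeds the previous one
early_stop_epoch = len(val_errors) - 1
for i in range(1, len(val_errors)):
    if val_errors[i] > val_errors[i - 1]:
        early_stop_epoch = i - 1
        break

# Checkpoint selection: finish the whole run, then pick the best checkpoint
best_epoch = min(range(len(val_errors)), key=lambda i: val_errors[i])

print(early_stop_epoch, best_epoch)  # prints 2 5</code></pre><p>Early stopping settles for the checkpoint at epoch 2, while checkpoint selection rides out the temporary bump and finds the better checkpoint at epoch 5. 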
</p><p>When training the <strong>model to deploy in production</strong>, <strong>turn on early stopping or checkpoint selection and monitor the error metric on the evaluation dataset</strong>. </p><p>When you need to control cost, choose early stopping; when you want to prioritize model accuracy, choose checkpoint selection.</p><h3 id="fine-tuning">Fine-tuning</h3><p>Fine-tuning is a process that takes a model that has already been trained for one given task and then tunes or tweaks the model to make it perform a second similar task. Since we have checkpoints of our model, we can resume from an optimally performing checkpoint and train it further on a small amount of fresh data. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://adhadse.com/content/images/2022/08/fine_tuning.png" class="kg-image" alt="Checkpoints. Not every ML model trains in minutes." loading="lazy" width="2000" height="1394" srcset="https://adhadse.com/content/images/size/w600/2022/08/fine_tuning.png 600w, https://adhadse.com/content/images/size/w1000/2022/08/fine_tuning.png 1000w, https://adhadse.com/content/images/size/w1600/2022/08/fine_tuning.png 1600w, https://adhadse.com/content/images/size/w2400/2022/08/fine_tuning.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Resume from a checkpoint from before the training loss starts to plateau. Then train only on fresh data for subsequent iterations.</figcaption></figure><p>Starting from an earlier checkpoint tends to provide better generalization as compared to final models/checkpoints.</p><h3 id="redefining-an-epoch">Redefining an epoch</h3><p>Epochs are easy to understand: an epoch is the number of times the model has gone over the entire dataset during training. But the use of epochs can lead to bad effects in real-world ML models.</p><p>Let&apos;s take an example where we are going to train an ML model for 15 epochs using a TensorFlow Dataset with one million examples.</p><pre><code class="language-python">cp_callback = tf.keras.callbacks.ModelCheckpoint(...)
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=15,
    batch_size=128,
    callbacks=[cp_callback])</code></pre><p>The problems with this are:</p><ul><li>If the model converges after having seen 14.3 million examples (i.e., after 14.3 epochs), <strong>we might want to exit and not waste any more computational resources</strong>.</li><li><code>ModelCheckpoint</code> creates a checkpoint at the end of each epoch. <strong>For resilience, we might want to checkpoint more often</strong> instead of waiting to process 1 million examples.</li><li>Datasets grow over time. If we get 100,000 more examples, train the model, and get a higher error, is it because we need to stop early or because the data is corrupt? <strong>We can&apos;t tell, because the prior training was on 15 million examples and the new one is on 16.5 million examples</strong> (15 million + 100,000 new examples * 15 epochs).</li><li>In distributed, parameter-server training, the concept of an epoch is not clear. Because of potentially straggling workers, <strong>you can only instruct the system to train on some number of mini-batches</strong>.</li></ul><h3 id="steps-per-epoch">Steps per epoch</h3><p>Instead of training for 15 epochs, we might decide to train for 143,000 steps where <code>batch_size</code> is 100:</p><pre><code class="language-python">NUM_STEPS = 143_000
BATCH_SIZE = 100
NUM_CHECKPOINTS = 15
cp_callback = tf.keras.callbacks.ModelCheckpoint(...)

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=NUM_CHECKPOINTS,
    steps_per_epoch=NUM_STEPS // NUM_CHECKPOINTS,
    batch_size=BATCH_SIZE,
    callbacks=[cp_callback])</code></pre><p>It works as long as we make sure to repeat the <code>train_ds</code> infinitely:</p><pre><code class="language-python">train_ds = train_ds.repeat()</code></pre><p>Although this gives us much more granularity, we have to define an &quot;epoch&quot; as 1/15th of the total number of steps:</p><pre><code class="language-python">steps_per_epoch=NUM_STEPS // NUM_CHECKPOINTS</code></pre><h3 id="retraining-with-more-data">Retraining with more data</h3><p>Let&apos;s talk about the scenario where we add 100,000 more examples. Our code remains the same and processes 143,000 steps, except that about 10% of the examples it sees are newer.</p><p>If the model converges, great. If it doesn&apos;t, we know that these new data points are the issue, because we are otherwise training exactly as before.</p><p>Once we have trained for 143,000 steps, we restart the training and run it a bit longer, as long as the model continues to converge. Then, we update the number 143,000 in the code above (in reality, it will be a parameter to the code) to reflect the new number of steps.</p><p>This works fine until you begin hyperparameter tuning. Let&apos;s say you change the batch size to 50: then you&apos;ll effectively be training on only half as much data, because the number of steps stays constant (143,000) while each step now sees only half as many examples.</p><h3 id="introducing-virtual-epochs">Introducing Virtual epochs</h3><p>The <strong>solution is to keep the total number of training examples shown to the model constant</strong> and not the number of steps.</p><pre><code class="language-python">NUM_TRAINING_EXAMPLES = 1000 * 1000
STOP_POINT = 14.3
TOTAL_TRAINING_EXAMPLES = int(STOP_POINT * NUM_TRAINING_EXAMPLES)
BATCH_SIZE = 100
NUM_CHECKPOINTS = 15
steps_per_epoch = (
    TOTAL_TRAINING_EXAMPLES // (BATCH_SIZE * NUM_CHECKPOINTS)
)
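# 14_300_000 // (100 * 15) = 9_533 steps per virtual epoch; 15 of them is roughly 143_000 total steps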
cp_callback = tf.keras.callbacks.ModelCheckpoint(...)

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=NUM_CHECKPOINTS,
    steps_per_epoch=steps_per_epoch,
    batch_size=BATCH_SIZE,
    callbacks=[cp_callback]
)</code></pre><p>When we get more data, first train it with the old settings, then increase the number of examples to reflect the new data, and finally change the <code>STOP_POINT</code> to reflect the number of times you have to traverse the data to attain convergence.</p><p>This will work even when we are doing hyperparameter tuning while retaining all the advantages of keeping the number of steps constant.</p><p>Hope you learned something wonderful.</p><p>This is Anurag Dhadse, Signing off.</p>]]></content:encoded></item><item><title><![CDATA[Diving Into REST APIs]]></title><description><![CDATA[Basics of creating talkable servers.]]></description><link>https://adhadse.com/diving-into-rest-apis/</link><guid isPermaLink="false">629ec8c18413e049cdd480d8</guid><category><![CDATA[Software Engineering]]></category><category><![CDATA[System Design]]></category><category><![CDATA[API]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Sun, 14 Aug 2022 02:30:00 GMT</pubDate><media:content url="https://adhadse.com/content/images/2022/08/vasilis-chatzopoulos-6lwyzronth8-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2022/08/vasilis-chatzopoulos-6lwyzronth8-unsplash.jpg" alt="Diving Into REST APIs"><p>APIs or <strong>Application Programming Interfaces</strong> are magical creatures, I mean really; or at least to us, programmers. </p><p>Before I tell you more about APIs, it&apos;s important to briefly go through &quot;interfaces&quot; first.</p><p>Interfaces are everywhere: in your smartphone as the GUI (Graphical User Interface) a music player app offers to end users, or in the Bash shell offering a CLI (Command Line Interface) to end users and even to programmers.</p><p>So what does an interface do? Interfaces add a layer of abstraction, hiding away the intricate details of what something does and how it works underneath. 
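</p><p>As a tiny sketch (all names here are made up), an interface is just the small surface we expose while the machinery stays hidden behind it:</p><pre><code class="language-python">class MusicPlayer:
    # The interface: one simple method for the user of the class
    def play(self, song):
        data = self._decode(song)
        return self._send_to_speaker(data)

    # The hidden machinery the user never has to think about
    def _decode(self, song):
        return f"decoded({song})"

    def _send_to_speaker(self, data):
        return f"playing {data}"

print(MusicPlayer().play("song.mp3"))  # prints playing decoded(song.mp3)</code></pre><p>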
The users of interfaces don&apos;t need to know how the music player app plays music, nor does a command-line user need to know how a command gets executed on pressing a key.</p><p>That means APIs bring abstraction, but to a different kind of user: not end users as in the music player example, but developers or programmers (the &apos;P&apos; in API).</p><p>For example, if we are the programmer of the music player app, we might not need to implement code for the gesture or play/pause feature ourselves.</p><p>This can be provided by an API of an SDK (<strong>Software Development Kit</strong>) provided by the platform.</p><p>You&apos;ll also find other libraries offering their APIs to aid in your work; for example TensorFlow, a mathematical computation framework written in C++, offers its API in Python and Java as well.</p><p>But for the most part, when we talk about APIs we are talking about Web APIs, and not the API of a library or an SDK, unless stated otherwise.</p><p>So, in this article we&apos;ll go through all about APIs: what they do, how they do it, and what we need to keep in mind as developers when building our Web APIs.</p><h2 id="what-exactly-is-an-api">What exactly is an API?</h2><p>An API or a Web API is a software interface, accessible via the internet, which offers a connection between your software and the software running on a server, as a type of service.</p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2022/02/image-6.png" class="kg-image" alt="Diving Into REST APIs" loading="lazy" width="1800" height="576" srcset="https://adhadse.com/content/images/size/w600/2022/02/image-6.png 600w, https://adhadse.com/content/images/size/w1000/2022/02/image-6.png 1000w, https://adhadse.com/content/images/size/w1600/2022/02/image-6.png 1600w, https://adhadse.com/content/images/2022/02/image-6.png 1800w" sizes="(min-width: 720px) 720px"></figure><p>This allows us to perform some operations on the server, or get some data from the server, or both; 
performing an operation and getting the result back from the server.</p><p>This can be especially useful when:</p><ol><li>The client application needs some kind of service but doesn&apos;t have the capabilities/resources to do it. For example, Shazam (a popular music identification app) might need machine learning/deep learning services that the client device doesn&apos;t have the resources to run because of performance constraints.</li><li>Or the service resides on the server and a database needs to be accessed for <strong>CRUD</strong> operations (<strong>Create Read Update Delete</strong>), like storing records of a user.</li></ol><p>The API developer exposes <em>endpoints</em> relative to the domain on which the API is hosted; on passing a request to an endpoint, we&apos;ll get a response. </p><p>So, for example, an endpoint might look something like this:</p><ul><li><code>&lt;domain-name&gt;/reply</code> </li><li><code>&lt;domain-name&gt;/posts/reply</code></li><li>from Spotify <code>https://api.spotify.com/v1/albums/{id}/tracks</code></li><li>from Twitter <code>https://api.twitter.com/2/users/{id}/liked_tweets</code></li></ul><p>Making an HTTP request with a suitable <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods">HTTP method</a> to an endpoint gets the process started.</p><h3 id="a-hello-world-example-of-an-api">A &quot;Hello World&quot; example of an API</h3><p>What internally gets done depends upon your imagination. But let&apos;s create our own simple API using the <a href="https://fastapi.tiangolo.com/">FastAPI</a> framework in Python.</p><pre><code class="language-python">from fastapi import FastAPI

app = FastAPI()

@app.get(&quot;/&quot;)
def check_reply():
    return {&quot;reply&quot;: &quot;hello world&quot;}</code></pre><p>Running this app locally, when we visit <code>localhost (http://127.0.0.1/)</code>, we&apos;ll be greeted with a JSON reply.</p><figure class="kg-card kg-image-card"><img src="https://adhadse.com/content/images/2022/02/screely-1644917544393.png" class="kg-image" alt="Diving Into REST APIs" loading="lazy" width="1996" height="1083" srcset="https://adhadse.com/content/images/size/w600/2022/02/screely-1644917544393.png 600w, https://adhadse.com/content/images/size/w1000/2022/02/screely-1644917544393.png 1000w, https://adhadse.com/content/images/size/w1600/2022/02/screely-1644917544393.png 1600w, https://adhadse.com/content/images/2022/02/screely-1644917544393.png 1996w" sizes="(min-width: 720px) 720px"></figure><p>A couple of points to note here:</p><ul><li>The <code>@app.get(&quot;/&quot;)</code> decorator makes sure that the method only runs when a request with the <code>GET</code> method is made to the <code>/</code> root URL. </li><li><a href="https://www.json.org/json-en.html">JSON</a> means <strong>JavaScript Object Notation</strong>, i.e., JavaScript&apos;s way of denoting an object. It&apos;s very similar to a Python dictionary, and that&apos;s why we returned a dictionary with a key-value pair, which FastAPI makes sure to convert to JSON before sending the reply. This object can be anything you like that can be JSON-ified. </li><li>There is another format in which a reply could be sent: <a href="https://developer.mozilla.org/en-US/docs/Web/XML/XML_introduction">XML</a> (<strong>EXtensible Markup Language</strong>), which used to be popular for sending raw text data over the internet before JSON came along. 
</li></ul><figure class="kg-card kg-bookmark-card kg-card-hascaption"><a class="kg-bookmark-container" href="https://www.youtube.com/watch?v=kc8BAR7SHJI"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Discovering JavaScript Object Notation with Douglas Crockford</div><div class="kg-bookmark-description">Computer&#x2019;s multimedia editor Charles Severance captures a video interview with Douglas Crockford on the creation of JavaScript Object Notation (JSON). From C...</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.youtube.com/s/desktop/5bf1652c/img/favicon_144x144.png" alt="Diving Into REST APIs"><span class="kg-bookmark-author">YouTube</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://i.ytimg.com/vi/kc8BAR7SHJI/hqdefault.jpg" alt="Diving Into REST APIs"></div></a><figcaption>How JSON was discovered</figcaption></figure><p>When we click on a link on a website, the HTTP request method that our browser creates is the <code>GET</code> method.</p><p>And that&apos;s why we were able to get a response back. </p><p>Now what if, instead of just getting some data, we want to upload some data too? </p><p>Well, that requires us to change the HTTP method from <code>GET</code> to something like <code>POST</code>. Let&apos;s discuss HTTP methods first and what they mean to our API.</p><h3 id="http-methods">HTTP Methods</h3><p>Our API relies completely on the <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP">HTTP protocol</a> for communication. So which request needs to be dealt with in which way is all defined by HTTP. </p><p>HTTP request methods indicate the desired action to be performed for a given resource identified by an endpoint. </p><p>There are a variety of request methods, but to us these five are the most important when creating our own APIs.</p><ul><li>The <code>GET</code> method is used to <strong>read</strong> (or retrieve) a representation of a resource. 
Requests using <code>GET</code> should only retrieve data. A successful request returns a JSON/XML response with an <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Status">HTTP status code</a> of 200 (OK), or, in case of an error, 404 (Resource Not Found) or 400 (Bad Request).</li><li>The <code>POST</code> method is most often utilized to <strong>create</strong> new resources, by submitting an entity to the specified resource, often causing a change in state or side effects on the server, like creating a new tuple in a database table. On successful creation, return HTTP status 201 with a Location header containing a link to the newly created resource in a JSON/XML response.</li><li>The <code>PUT</code> method is used for <strong>updating/replacing</strong> the current representation of the target resource with the request payload. On a successful <code>PUT</code> request, return 200 (or 204 if not returning any content in the body).</li><li>The <code>PATCH</code> method applies <strong>partial modifications</strong> to a resource. This might look similar to <code>PUT</code>, but in a <code>PATCH</code> request the body contains a set of instructions describing how a resource currently residing on the server should be modified to produce a new version, rather than just the modified part of the resource.</li><li>The <code>DELETE</code> method <strong>deletes</strong> the specified resource identified by the URI. </li></ul><blockquote>A specific request should only lead to what the request method implies, nothing more, nothing less. For example, a GET request shouldn&apos;t update a resource, only fetch it.</blockquote><h2 id="rest-%E2%80%93-representational-state-transfer">REST &#x2013; Representational State Transfer</h2><p>Today most APIs are designed to conform with REST design principles, the most prominent architectural style used to design APIs. 
REST is not a protocol, but rather a design philosophy that builds upon the principles of HTTP.</p><p>The reason for this dominance is the flexibility and freedom it provides for developers over other options such as SOAP or XML-RPC.</p><p>That&apos;s why APIs designed with REST in mind are often called RESTful APIs. </p><p>The only requirement is that the API be written while abiding by these 6 architectural constraints:</p><ol><li><strong>Client-Server Architecture &#x2013; </strong>The client and server applications must be completely independent of each other. The client is only supposed to know the URI at which the resource is located and can&apos;t interact with the server application in any other form. The server application, too, can&apos;t perform any kind of interaction other than providing the requested resource.</li><li><strong>Statelessness &#x2013; </strong>The designed API should be stateless, meaning that each request needs to include all the information necessary for processing it; there are no server-side sessions.</li><li><strong>Layered System</strong> &#x2013; In REST APIs, the calls and responses go through different layers. So, the API needs to be designed so that neither the client nor the server can tell whether it communicates with the end application or an intermediary.</li><li><strong>Cacheability &#x2013; </strong>When possible, resources should be cacheable on the client or server side. This improves performance on the client side, while increasing scalability on the server side.</li><li><strong>Uniform Design</strong> &#x2013; All API requests for the same resource should look the same, no matter where the request comes from. 
So a REST API needs to ensure that the same piece of data, such as the name or email address of a user, belongs to only one Uniform Resource Identifier (URI).</li><li><strong>Code on Demand (optional)</strong> &#x2013; REST APIs usually send static resources, but responses can sometimes also contain executable code, in which case the code should only run on demand.</li></ol><h3 id="soap-vs-rest">SOAP v/s REST</h3><p>SOAP, or <strong>Simple Object Access Protocol</strong>, in contrast to REST, is an XML-based <strong>protocol</strong> for making network API requests. Although it is most commonly used over HTTP, it aims to be independent from HTTP and avoids using most HTTP features (like HTTP methods). </p><p>SOAP has a lot of rules. It comes with a sprawling and complex multitude of related standards that add various features. A SOAP web service is described using an XML-based language called the Web Services Description Language, or <a href="https://en.wikipedia.org/wiki/Web_Services_Description_Language">WSDL</a>. </p><p>WSDL is not designed to be human-readable, and as SOAP messages are often too complex to construct manually, users of SOAP rely heavily on tool support, code generation, and IDEs.</p><p>That means, even though SOAP and its various extensions offered standardization, interoperability between different vendors&apos; implementations often caused problems. For these reasons, SOAP fell out of favor and REST took over.</p><p>REST&apos;s main idea was that for each piece of data the URL should stay the same, but the operation would change depending on which method was used. For example, a GET request to &quot;https://website.com/cart&quot; will return all cart items, but a POST request to the same URL would add an item to the cart.</p><p>REST also offers a greater variety of data formats rather than just sticking to XML, while most APIs default to JSON, which offers better support for browser clients, faster parsing, and works better with data. 
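</p><p>As a quick, hedged illustration with Python&apos;s standard <code>json</code> module, a dictionary round-trips to and from JSON in a couple of lines:</p><pre><code class="language-python">import json

# A Python dict mirrors the structure of a JSON object
reply = {"reply": "hello world", "count": 3}

payload = json.dumps(reply)     # serialize to a JSON string for the wire
restored = json.loads(payload)  # parse it back into a dict

print(payload)            # prints {"reply": "hello world", "count": 3}
print(restored == reply)  # prints True</code></pre><p>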
This means superior performance, particularly through caching of information that is not altered and not dynamic.</p><p>REST services are often described using a definition format such as OpenAPI. Let&apos;s save that topic for some other day.</p><p>Hope you learned something.</p><p>This is Anurag Dhadse, signing off.</p>]]></content:encoded></item><item><title><![CDATA[Deliberately Overfitting your model]]></title><description><![CDATA[Sometimes even too much is also good.]]></description><link>https://adhadse.com/deliberately-overfitting-your-model/</link><guid isPermaLink="false">62e8b48a85e4100f9fa18809</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Sun, 07 Aug 2022 14:24:14 GMT</pubDate><media:content url="https://adhadse.com/content/images/2022/08/minseok-kwak-r0PIZn0bu9g-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2022/08/minseok-kwak-r0PIZn0bu9g-unsplash.jpg" alt="Deliberately Overfitting your model"><p>Remember those days of your life as an amateur ML enthusiast, celebrating when you trained your model on a toy dataset and received 100% accuracy!</p><p>Then you were introduced to the concept of <strong>Overfitting</strong>. </p><p>The problem occurs when a model starts to memorize the training data instead of generalizing from it to new data. What you wanted was a generalized concept within a model, but you got a rote-learned model.</p><p>But it&apos;s not always that bad. Sometimes you do intentionally want your model to overfit. </p><p>Let&apos;s learn when you want to forget about the concept of generalization and accept the fate of rote learning.</p><hr><p>The goal of almost all use case scenarios of machine learning is to generalize and learn the overall correlation of features with the label. 
If our model <em>overfits</em> the training data (the training loss keeps decreasing but the validation loss has started to increase), then the model&apos;s ability to generalize suffers and we don&apos;t get an effective model.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://adhadse.com/content/images/2022/08/regression-1.png" class="kg-image" alt="Deliberately Overfitting your model" loading="lazy" width="2000" height="1524" srcset="https://adhadse.com/content/images/size/w600/2022/08/regression-1.png 600w, https://adhadse.com/content/images/size/w1000/2022/08/regression-1.png 1000w, https://adhadse.com/content/images/size/w1600/2022/08/regression-1.png 1600w, https://adhadse.com/content/images/size/w2400/2022/08/regression-1.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Random points and a regression line</figcaption></figure><p>However, consider cases such as simulating the behavior of physical or dynamical systems like those found in climate science, computational biology, or computational finance. These systems are often described by a mathematical function or a set of <a href="https://en.wikipedia.org/wiki/Partial_differential_equation">partial differential equations (PDE)</a>. <strong>Although the equations that govern these systems can be formally expressed, they don&apos;t have a closed-form solution</strong>; a solution is said to be closed-form if it solves a given problem in terms of functions and mathematical operations from a generally accepted set.</p><p>Or in other terms, a <strong><a href="https://en.wikipedia.org/wiki/Closed-form_expression">closed-form expression</a></strong> is a <a href="https://en.wikipedia.org/wiki/Expression_(mathematics)">mathematical expression</a> that uses a <a href="https://en.wikipedia.org/wiki/Finite_set">finite</a> number of standard operations. 
It may contain <a href="https://en.wikipedia.org/wiki/Constant_(mathematics)">constants</a>, <a href="https://en.wikipedia.org/wiki/Variable_(mathematics)">variables</a>, certain well-known <a href="https://en.wikipedia.org/wiki/Operation_(mathematics)">operations</a> (e.g., + &#x2212; &#xD7; &#xF7;), and <a href="https://en.wikipedia.org/wiki/Function_(mathematics)">functions</a> (e.g., <a href="https://en.wikipedia.org/wiki/Nth_root"><em>n</em>th root</a>, <a href="https://en.wikipedia.org/wiki/Exponent">exponent</a>, <a href="https://en.wikipedia.org/wiki/Logarithm">logarithm</a>, <a href="https://en.wikipedia.org/wiki/Trigonometric_functions">trigonometric functions</a>, and <a href="https://en.wikipedia.org/wiki/Inverse_hyperbolic_functions">inverse hyperbolic functions</a>), but usually no <a href="https://en.wikipedia.org/wiki/Limit_of_a_sequence">limit</a>, <a href="https://en.wikipedia.org/wiki/Derivative">differentiation</a>, or <a href="https://en.wikipedia.org/wiki/Integral">integration</a>.</p><p>For example, the quadratic equation,</p><p>$$ax^2 + bx + c = 0$$</p><p>is tractable since its solutions can be expressed as a closed-form expression, i.e. in terms of elementary functions (no limit, differentiation, or integration):</p><p>$$x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}$$</p><p>So, these dynamic systems instead use classical numerical methods to approximate solutions. 
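</p><p>As a toy contrast, a hedged sketch: the closed-form quadratic formula yields a root in a single evaluation, while a generic numerical method (bisection here) must iterate toward it:</p><pre><code class="language-python">import math

# Solve x^2 - 5x + 6 = 0, whose roots are 2 and 3
a, b, c = 1.0, -5.0, 6.0

# Closed form: one evaluation of the quadratic formula
root_closed = (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)

# Numerical approximation: bisection on the bracket [1, 2.5]
f = lambda x: a * x * x + b * x + c
lo, hi = 1.0, 2.5
for _ in range(50):  # halve the bracket 50 times
    mid = (lo + hi) / 2
    if f(lo) * f(mid) > 0:
        lo = mid
    else:
        hi = mid
root_numeric = (lo + hi) / 2

print(root_closed, root_numeric)  # both are 2.0 (to within floating-point error)</code></pre><p>For a real dynamical system, the iteration is performed by far costlier numerical solvers. 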
Unfortunately, for many real-world applications, these methods can be too slow to be used in practice.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://adhadse.com/content/images/2022/08/overfitting_example.png" class="kg-image" alt="Deliberately Overfitting your model" loading="lazy" width="2000" height="1119" srcset="https://adhadse.com/content/images/size/w600/2022/08/overfitting_example.png 600w, https://adhadse.com/content/images/size/w1000/2022/08/overfitting_example.png 1000w, https://adhadse.com/content/images/size/w1600/2022/08/overfitting_example.png 1600w, https://adhadse.com/content/images/size/w2400/2022/08/overfitting_example.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>One such example of useful overfitting is when the entire domain of input data points and solutions is already tabulated and a physical model capable of computing the precise solution is available.</figcaption></figure><p>In such situations, ML models need to learn the precisely calculated and non-overlapping lookup table of inputs and outputs. <strong>Splitting such a dataset into the usual training-testing-validation split is also unnecessary, since we aren&apos;t looking for generalization.</strong></p><hr><p>In this scenario, there is no &quot;unseen&quot; data that needs to be generalized, since all possible inputs have been tabulated. </p><p>Here, there is some physical phenomenon that you are trying to learn that is governed by an underlying PDE or system of PDEs. Machine learning merely provides a data-driven approach to approximate the precise solution.</p><p>The dynamical system that we are talking about here is a set of equations governed by established laws&#x2013;there are no unobserved variables, no noise, and no statistical variability. For a given set of inputs, there is only one precisely calculated output. 
Also, unlike other ML problems that are probabilistic in nature (like predicting the amount of rainfall), there are no overlapping examples in the training dataset. For this reason, we don&apos;t worry about overfitting our model.</p><p>You might ask, why not use an actual lookup table instead of an ML model in these kinds of situations? </p><p>The problem is that the training dataset can be too large (terabytes or even petabytes in size). Using an actual lookup table is just not feasible in production settings. An ML model will be able to infer the approximate solution in a fraction of the time compared to a lookup table or an actual physics model.</p><h2 id="why-does-it-work">Why does it work?</h2><p>The usual ML modeling involves training on data points sampled from the population. This sample represents the actual distribution of the data that we want to conceptualize. </p><figure class="kg-card kg-gallery-card kg-width-wide"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://adhadse.com/content/images/2022/08/perfectfitting_model.png" width="2000" height="1311" loading="lazy" alt="Deliberately Overfitting your model" srcset="https://adhadse.com/content/images/size/w600/2022/08/perfectfitting_model.png 600w, https://adhadse.com/content/images/size/w1000/2022/08/perfectfitting_model.png 1000w, https://adhadse.com/content/images/size/w1600/2022/08/perfectfitting_model.png 1600w, https://adhadse.com/content/images/size/w2400/2022/08/perfectfitting_model.png 2400w" sizes="(min-width: 720px) 720px"></div><div class="kg-gallery-image"><img src="https://adhadse.com/content/images/2022/08/overfitting_model.png" width="2000" height="1311" loading="lazy" alt="Deliberately Overfitting your model" srcset="https://adhadse.com/content/images/size/w600/2022/08/overfitting_model.png 600w, https://adhadse.com/content/images/size/w1000/2022/08/overfitting_model.png 1000w, 
https://adhadse.com/content/images/size/w1600/2022/08/overfitting_model.png 1600w, https://adhadse.com/content/images/size/w2400/2022/08/overfitting_model.png 2400w" sizes="(min-width: 720px) 720px"></div></div></div></figure><p>When the observation space represents all possible data points, clearly we don&apos;t need the model to generalize. We would ideally want the model to learn as many data points as possible with no training error.</p><p>Deep learning approaches to solving differential equations or complex dynamical systems aim to represent a function defined implicitly by a differential equation, or system of equations, using a neural network.</p><p><strong>Overfitting becomes useful </strong>when these two conditions are met:</p><ul><li><strong>There is no noise</strong>, so the labels are accurate for all instances.</li><li><strong>You have the complete dataset at your disposal</strong>, so overfitting becomes interpolating the dataset.</li></ul><h2 id="alternatives-and-use-cases">Alternatives and Use cases</h2><h3 id="interpolation-and-chaos-theory">Interpolation and chaos theory</h3><p>The machine learning model we are trying to build here is essentially an approximation to a lookup table of inputs and outputs via interpolation of the given dataset. If the lookup table is small, just use the lookup table; there is no need to approximate it with a machine learning model.</p><p>Such interpolation works only if the underlying system is not chaotic. 
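</p><p>A tiny, self-contained illustration using the logistic map, a textbook chaotic system: two trajectories that start almost identically drift far apart within a few dozen steps:</p><pre><code class="language-python"># Logistic map x -> 4x(1 - x): fully deterministic, yet sensitive to initial conditions
a, b = 0.2, 0.2000001  # two nearly identical starting points
max_gap = 0.0
for _ in range(60):
    a = 4 * a * (1 - a)
    b = 4 * b * (1 - b)
    max_gap = max(max_gap, abs(a - b))

print(max_gap)  # the initial difference of 1e-7 has grown to order 1</code></pre><p>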
In chaotic systems, even when the system is deterministic, small differences in initial conditions can lead to drastically different outcomes.</p><p>In practice, however, each chaotic phenomenon has a specific resolution threshold beyond which models can forecast it over a short period of time.</p><p>So, as long as the lookup table is fine-grained enough and the limits of resolvability are understood, useful approximations via ML techniques are possible.</p><h3 id="distilling-knowledge-of-neural-network">Distilling knowledge of a neural network</h3><p>Another use case where overfitting comes in handy is knowledge distillation from a large machine learning model whose computational complexity and learning capacity might not be fully utilized. While smaller models have enough capacity to represent the knowledge, they may lack the capacity to learn it efficiently.</p><p>In such cases, the solution is to train the smaller model on a large amount of generated data labeled by the larger model. The smaller model learns the soft outputs of the larger model instead of the actual hard labels on real data. This is similar to the above discussion, where we are trying to approximate the numerical function of the larger model to match its predictions. </p><p>This second step, training the smaller model, can employ useful overfitting.</p><h3 id="overfitting-a-batch">Overfitting a batch</h3><p>In deep learning, it is often advised to start with a model complex enough to learn the dataset, i.e., one with the ability to overfit. To generalize such a large model, we then employ regularization techniques such as data augmentation, dropout, etc., to avoid overfitting. </p><p>A complex enough model <em>should</em> be able to overfit on a small enough batch of data, assuming everything is set up correctly. 
If you are not able to overfit a small batch with any model, it&apos;s worth rechecking the model code, the input and preprocessing pipeline, and the loss function for errors or bugs. This serves as a quick sanity check when starting modeling experimentation.</p><p>In Keras, you can use an instance of <code>tf.data.Dataset</code> to pull a single batch of data and try overfitting it:</p><figure class="kg-card kg-code-card"><pre><code class="language-python">BATCH_SIZE = 256
single_batch = train_ds.batch(BATCH_SIZE).take(1)

model.fit(single_batch.repeat(),
          validation_data=valid_ds,
          ...)</code></pre><figcaption>Note that we apply <code>repeat()</code> so that we won&apos;t run out of data when training on that single batch.</figcaption></figure><p>That&apos;s all for today.</p><p>This is Anurag Dhadse, signing off.</p>]]></content:encoded></item><item><title><![CDATA[Active Learning. An alternative to Lengthy Data Labeling process.]]></title><description><![CDATA[From small dataset to effective model.]]></description><link>https://adhadse.com/active-learning-an-alternative-to-lengthy-data-labeling/</link><guid isPermaLink="false">62de1d9685e4100f9fa18584</guid><category><![CDATA[Data Science]]></category><category><![CDATA[Machine Learning]]></category><dc:creator><![CDATA[Anurag Dhadse]]></dc:creator><pubDate>Sat, 30 Jul 2022 01:15:00 GMT</pubDate><media:content url="https://adhadse.com/content/images/2022/08/david-brooke-martin-lFTtQqVfx6g-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://adhadse.com/content/images/2022/08/david-brooke-martin-lFTtQqVfx6g-unsplash.jpg" alt="Active Learning. An alternative to Lengthy Data Labeling process."><p>Suppose you are a data scientist who has solved many data problems. Many of them were probably solved by creating models from readily available, free data, or from data paid for to the organization that owned it.</p><p>What if you come across a problem where the data (and labels) are not available? Depending on the specific business requirement, you will talk with the data engineering team and set up data acquisition and data labeling.</p><p>This process often becomes expensive depending on the domain (medical/industrial) and the amount of data that needs to be labeled.</p><p>In a few suitable scenarios, you can escape labeling your entire dataset and get away with labeling a few specific examples and propagating them to the rest. 
This saves money and time.</p><p>That&apos;s what Active Learning is.</p><p><strong>Active learning is the process of prioritising the data that needs to be labelled in order to have the highest impact on training a supervised model.</strong></p><hr><p>But you may ask, why specific examples? Why not choose to label a random sample from the acquired data?</p><p>The problem lies with the quality of the labeled data. </p><p>Machine learning programs are decidedly effective at spotting patterns, associations, and rare occurrences in a pool of data. With a randomly chosen sample labeled, quality suffers and it becomes difficult for ML models to learn these complex patterns.</p><p>What we want is for the model to grasp the essential complex properties of the dataset, just enough that its performance becomes modest. Our goal should be to create a dataset that includes the variations in each of our classes.</p><p>We then use predictions from this modest model to label a much larger dataset, training on which will give us an even more reliable and performant model.</p><p>So for example, if we have 10,000 new data points, containing examples for 10 different classes, and <strong>we can label only 1000 of them</strong>, that&apos;s the budget (or whatever number is suitable to create a modestly performing model). We&apos;ll create a labeled dataset of 1000 data points with the same number of examples for each class, representing each possible variation within that class. We&apos;ll then label additional data after evaluating the resulting model.</p><p>Let&apos;s go over a few common techniques of Active Learning. </p><p><strong>All active learning techniques rely on us leveraging some number of examples with ground-truth, accurate labels. 
What they differ in is the way they use these accurately labeled data points to identify and label unknown data points.</strong></p><hr><h2 id="pool-based-sampling">Pool-Based Sampling</h2><p>Pool-based sampling is probably the most common technique in active learning, despite being memory intensive.</p><p>In pool-based sampling, we identify the &quot;<strong>information usefulness</strong>&quot; of all given training examples, and select the top N examples for training our model.</p><p>So for example, if we already have 1000 perfectly labelled data points, we can train on 800 labeled examples and validate on the remaining 200. The model so generated will enable us to identify, out of the remaining 9000 unlabeled examples, those that are going to be most helpful in improving performance. These examples will have the <strong>lowest prediction confidence</strong>. </p><p>We&apos;ll select the top N examples out of these lowest-confidence examples, and label them. </p><p>These N new &quot;important&quot; data points, along with our previous 1000, will pave the path to a much more effective model.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://adhadse.com/content/images/2022/07/pool_based_sampling.png" class="kg-image" alt="Active Learning. An alternative to Lengthy Data Labeling process." loading="lazy" width="2000" height="800" srcset="https://adhadse.com/content/images/size/w600/2022/07/pool_based_sampling.png 600w, https://adhadse.com/content/images/size/w1000/2022/07/pool_based_sampling.png 1000w, https://adhadse.com/content/images/size/w1600/2022/07/pool_based_sampling.png 1600w, https://adhadse.com/content/images/size/w2400/2022/07/pool_based_sampling.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Pool-Based sampling</figcaption></figure><h2 id="stream-based-selective-sampling">Stream-Based Selective Sampling</h2><p>This is another active learning technique, and one that is intuitive to understand. 
</p><p>In this technique, as the model is training, the <strong>active learning system determines whether to query for the perfect ground truth label or assign the model-predicted label based on some threshold</strong> set by us. </p><p>Unlike pool-based sampling it&apos;s not memory intensive, but it is an exhaustive search, since each example must be examined one by one. This can easily exhaust our 1000-label budget if the model doesn&apos;t see enough &quot;important&quot; examples soon enough and keeps querying for true labels.</p><p>Let&apos;s again take an example. We have a moderately performing model trained using 1000 perfectly labeled data points, and we want to improve on it. For that, we consider labeling 1000 more examples using stream-based selective sampling. We would go through the remaining 9,000 examples in our dataset one by one, asking the model to evaluate each. If the confidence is lower than the set threshold, we ask the labeller to assign a label; otherwise we keep the model&apos;s prediction as the generated label.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://adhadse.com/content/images/2022/07/stream_based_selective_samping-2.png" class="kg-image" alt="Active Learning. An alternative to Lengthy Data Labeling process." loading="lazy" width="2000" height="757" srcset="https://adhadse.com/content/images/size/w600/2022/07/stream_based_selective_samping-2.png 600w, https://adhadse.com/content/images/size/w1000/2022/07/stream_based_selective_samping-2.png 1000w, https://adhadse.com/content/images/size/w1600/2022/07/stream_based_selective_samping-2.png 1600w, https://adhadse.com/content/images/size/w2400/2022/07/stream_based_selective_samping-2.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Stream-Based selective sampling</figcaption></figure><p>Evidently, we won&apos;t get through all 9,000 examples, since our labeling budget is capped at 1000 examples. 
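</p><p>The stream-based loop can be sketched roughly as follows. The model and the human labeler are stand-in functions here, and all names, thresholds, and counts are illustrative only:</p><figure class="kg-card kg-code-card"><pre><code class="language-python">import random

CONFIDENCE_THRESHOLD = 0.8   # set by us
LABEL_BUDGET = 1000          # how many human labels we can afford

def predict_with_confidence(example):
    # Stand-in for a real model: returns (predicted_label, confidence).
    return "class_a", random.random()

def ask_human_labeler(example):
    # Stand-in for the human labeling step; each call spends budget.
    return "true_label"

random.seed(0)
labeled, budget_left = [], LABEL_BUDGET
for example in range(9000):          # stream over the unlabeled pool
    if budget_left == 0:
        break                        # budget exhausted: stop streaming
    label, confidence = predict_with_confidence(example)
    if CONFIDENCE_THRESHOLD > confidence:
        label = ask_human_labeler(example)   # low confidence: query a human
        budget_left -= 1
    labeled.append((example, label))</code></pre><figcaption>A rough sketch of stream-based selective sampling; high-confidence predictions are kept as labels for free.</figcaption></figure><p>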
So the resulting dataset might miss some of the most important data points, compared to pool-based sampling.</p><h2 id="membership-query-synthesis">Membership Query Synthesis</h2><p>This is an active learning technique wherein we create new training examples based on already available data points. This might sound spurious, but it is actually plausible.</p><p>Analyzing trends in the data and then making careful use of regression or GANs can expand our starting training dataset. Data augmentation, a technique often used for balancing datasets and for regularization, can also be used to generate new data points. </p><p>This technique is less limiting than the two methods above, but it requires careful analysis of the dataset and of the variations possible in the examples. If some variations are missing from a subset of examples, an effective data augmentation technique will be required to fill that gap. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://adhadse.com/content/images/2022/08/augly_example.png" class="kg-image" alt="Active Learning. An alternative to Lengthy Data Labeling process." loading="lazy" width="1920" height="1080" srcset="https://adhadse.com/content/images/size/w600/2022/08/augly_example.png 600w, https://adhadse.com/content/images/size/w1000/2022/08/augly_example.png 1000w, https://adhadse.com/content/images/size/w1600/2022/08/augly_example.png 1600w, https://adhadse.com/content/images/2022/08/augly_example.png 1920w" sizes="(min-width: 720px) 720px"><figcaption><a href="https://github.com/facebookresearch/AugLy">Augly</a> &#x2013; Image augmentation library</figcaption></figure><p>For example, if we are building an image classification model for sea creatures, it is likely that lighting conditions are poor and images are often captured in low light. We can use brightness augmentation to simulate this. 
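</p><p>A minimal sketch of such a brightness augmentation, using only NumPy on a synthetic image (a real pipeline would use a library such as Augly or Keras-CV):</p><figure class="kg-card kg-code-card"><pre><code class="language-python">import numpy as np

def lower_brightness(image, factor=0.4):
    # Darken an image to mimic low-light capture; factor in (0, 1].
    darkened = image.astype(np.float32) * factor
    return np.clip(darkened, 0, 255).astype(np.uint8)

# Synthetic stand-in for a real photo.
rng = np.random.default_rng(0)
fake_photo = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
dark_photo = lower_brightness(fake_photo)
# dark_photo has the same shape but a lower mean intensity</code></pre><figcaption>Scaling pixel intensities down simulates low-light conditions for augmentation.</figcaption></figure><p>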
Or if images are sourced from screenshots in production, we can augment an image to appear as part of a fake screenshot, and so on.</p><p>Popular libraries for these kinds of augmentations include &#x2013; </p><ul><li><a href="https://github.com/facebookresearch/AugLy">Augly</a></li><li><a href="https://github.com/keras-team/keras-cv">Keras-CV</a></li></ul><p>The 1000-label budget becomes less of a concern with this technique, as no actual human labeler is required. But you can think of the budget being spent whenever we synthesize a new example. And you are, of course, free to exhaust your labeling budget with this technique.</p><hr><p>That&apos;s all for today.</p><p>This is Anurag Dhadse, signing off.</p>]]></content:encoded></item></channel></rss>