<div class="header" id="top">
	<div class="cell-logo"><img src="assets/images/logo-white.png" class="logo" (click)="navigate('')" /></div>
	<!--<div class="cell-1">Privacy Policy</div>-->
	<!--<div class="cell-2">Terms of Service</div>-->
	<!--<div class="cell-3">Style Guide</div>-->
</div>

<div class="content">
	<div class="toc">
		<div class="last-updated">Last updated<br />July 23, 2024</div>
		<div class="red"><a href="#top" (click)="navigate('blog/01')">1. Introductions</a></div>
		<div class="orange"><a href="#top" (click)="navigate('blog/02')">2. From the beginning</a></div>
		<div class="yellow"><a href="#top" (click)="navigate('blog/03')">3. The first year</a></div>
		<div class="lightgreen"><a href="#top" (click)="navigate('blog/04')">4. Initial cloud hosting</a></div>
		<div class="green"><a href="#top" (click)="navigate('blog/05')">5. UI/UX</a></div>
		<div class="cyan active"><a href="#top" (click)="navigate('blog/06')">6. Performance tuning</a></div>
		<!-- <div class="blue"><a href="#top" (click)="navigate('blog/07')">7. Mental health</a></div>
		<div class="darkblue"><a href="#top" (click)="navigate('blog/08')">8. Funding vs bootstrapping</a></div>
		<div class="violet"><a href="#top" (click)="navigate('blog/09')">9. What's next</a></div> -->
	</div>
	<div>
		<div class="banner cyan">
			<div class="title">Photonomy thoughts and insights:<br /><span>Performance tuning</span></div>
		</div>
		<div class="text-container">
			<p>These days, bootstrapping a social media app is basically unheard of. These apps must scale to
				millions, even billions of users, which is an expensive proposition. They are notorious
				for bleeding VC money until reaching an inflection point, and they have hundreds of people
				responsible for maintaining each and every part of the app, which is a luxury I don't
				have.</p>

			<p>The following outlines some of the optimizations that were required to scale:</p>

			<p class="title">Memory optimizations</p>
			<p>As described in <a href="#top" (click)="navigate('blog/04')">Initial cloud hosting</a>, the first problem
				I encountered was memory pressure from streaming all the images through the service layer. I had the
				service layer return the images because it was an easy way to control security on each photo.</p>

			<p>For uploading, I would make two calls. The first would stream the original image to the service layer,
				which would compress it and store both versions of the file in Backblaze. The second call added metadata, security,
				and file path information to the database. For downloading, I would determine whether the user had permission
				to view the photo and stream the compressed version back. Simple, but highly inefficient.</p>

			<p>The first optimization I made was to chunk the files.
				<a href="https://www.baeldung.com/java-read-lines-large-file" target="_blank">Baeldung <i class="fas fa-external-link-alt"></i></a>
				does a great job of describing various techniques for this. Instead of loading the entire image into memory,
				the idea is to handle it chunk-by-chunk, releasing memory as needed. This helped, but the gains were minimal.
			</p>
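
			<p>As a minimal sketch of the chunk-by-chunk idea (not the actual Photonomy code, and written in TypeScript
				for Node rather than whatever the service layer really runs on), the hypothetical
				<code>copyInChunks</code> function below moves a file through fixed-size buffers, so memory use stays
				near the chunk size instead of the file size. The 64KB chunk size is an assumed value:</p>

```typescript
import { createReadStream, createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";

// Copy src to dest without ever holding the whole file in memory.
// chunkSize is an illustrative default, not a value taken from the post.
async function copyInChunks(src: string, dest: string, chunkSize = 64 * 1024) {
  await pipeline(
    // highWaterMark caps how many bytes each read pulls into memory.
    createReadStream(src, { highWaterMark: chunkSize }),
    createWriteStream(dest),
  );
}
```

			<p>Because <code>pipeline</code> applies backpressure, a slow destination pauses the reads rather than
				letting chunks pile up in memory.</p>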

			<p class="title">Client-side compression</p>

			<p>The bigger problem with passing the images through the service layer was the compression. Compression works
				by finding patterns across the entire file and removing the recurrences. The compression libraries that I
				had access to required the entire file to be loaded into memory. Some claim to perform this in chunks, but
				I could not find a single one that actually succeeded at scale.</p>

			<p>Using the best library I found, compressing a 5MB file actually required 10MB of memory to complete the
				entire upload operation. If 10,000 people were all uploading 5MB files at once, this would require
				roughly 100GB of memory (10MB for each of the 10,000 uploads). At this point I had two options: implement a
				microservice dedicated to compression that dynamically scales with usage, or move the compression client-side.</p>

			<p>If I were an enterprise, I might have chosen the microservice. There are benefits to this technique, such as
				reliability from not having to rely on the user's device to handle compressing all the photos. However, sometimes
				concessions have to be made. Moving the compression to the client-side required that I send two copies of
				each photo (compressed and uncompressed) for each upload. This helped with memory management, but also
				doubled the number of calls going to the service layer.</p>

			<p class="title">Pre-signed URLs</p>
			<p>With the above changes, stress testing results were considerably better. It was clear that I could scale
				the current architecture into the thousands, possibly tens of thousands of users, using moderately
				powerful servers with minimal scaling. However, I can't afford to scale this type of setup on my own.</p>

			<p>To truly scale I would need to stop sending images through the service layer entirely. Instead, I would
				need to upload and download them directly from Backblaze using
				<a href="https://www.backblaze.com/docs/cloud-storage-s3-compatible-api#pre-signed-urls" target="_blank">pre-signed URLs <i class="fas fa-external-link-alt"></i></a>.
				The idea is that you authorize the user on the back-end, generate a token, then use the token to upload
				or download a file. The concern with this technique is security. When passing images through the service
				layer, it's easy to control access to a photo by simply not returning the stream. When using signed URLs,
				the service layer is instead responsible for generating access tokens and supplying them to the front-end.
			</p>
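
			<p>To make the authorize-then-sign flow concrete, here is a simplified sketch in TypeScript. It uses a plain
				HMAC over the photo key and an expiry time as a stand-in for the real signing scheme used by S3-compatible
				APIs, so it shows the idea rather than the Backblaze implementation. The function names, the
				<code>storage.example</code> host, and the 300 second lifetime are all assumptions made for illustration:</p>

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical permission check; a real service layer would query its database.
type CanAccess = (userId: string, photoKey: string) => boolean;

// Sign "key:expiry" with a server-side secret (simplified stand-in for S3-style signing).
function sign(secret: string, photoKey: string, expires: number): string {
  return createHmac("sha256", secret).update(`${photoKey}:${expires}`).digest("hex");
}

// Service layer: authorize the user first, then hand back a short-lived URL.
function issueUrl(secret: string, userId: string, photoKey: string,
                  canAccess: CanAccess, nowMs: number, ttlSeconds = 300): string | null {
  if (!canAccess(userId, photoKey)) return null; // never sign for unauthorized users
  const expires = Math.floor(nowMs / 1000) + ttlSeconds;
  const sig = sign(secret, photoKey, expires);
  return `https://storage.example/${encodeURIComponent(photoKey)}?expires=${expires}&sig=${sig}`;
}

// Storage side: accept the request only if the signature matches and has not expired.
function verifyUrl(secret: string, url: string, nowMs: number): boolean {
  const parsed = new URL(url);
  const photoKey = decodeURIComponent(parsed.pathname.slice(1));
  const expires = Number(parsed.searchParams.get("expires"));
  const given = Buffer.from(parsed.searchParams.get("sig") ?? "");
  const expected = Buffer.from(sign(secret, photoKey, expires));
  if (given.length !== expected.length) return false;
  return timingSafeEqual(given, expected) && expires >= Math.floor(nowMs / 1000);
}
```

			<p>The important property is that the permission check happens before anything is signed, and that a
				leaked URL stops working once it expires.</p>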

			<p>Along with compression, pre-signed URLs gave the single best gains in terms of scaling.</p>

			<p class="title">Threading</p>
			<p>Now that we don't have to send files through the service layer, we can thread the uploads on the client-side. I
				went from sending 1 file at a time to 5, which was a true 5x gain in terms of time to upload photos from
				the user's perspective.</p>
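
			<p>One way to keep five uploads in flight is a small worker pool, sketched below in TypeScript. This is an
				illustration rather than the Photonomy client code: "threads" are modeled as concurrent async calls,
				and <code>uploadOne</code> is a hypothetical stand-in for a single upload to a pre-signed URL:</p>

```typescript
// Hypothetical stand-in for one upload; it only waits briefly.
async function uploadOne(file: string) {
  await new Promise((resolve) => setTimeout(resolve, 10));
}

// Upload every file with at most `limit` uploads running at the same time.
async function uploadAll(files: string[], limit = 5, upload: typeof uploadOne = uploadOne) {
  let next = 0; // index of the next file that has not been claimed yet

  // Each worker keeps claiming the next file until none remain.
  async function worker() {
    while (next < files.length) {
      const file = files[next++];
      await upload(file);
    }
  }

  const workers = Array.from({ length: Math.min(limit, files.length) }, () => worker());
  await Promise.all(workers);
}
```

			<p>A pool like this keeps all five slots busy: as soon as one upload finishes, its worker picks up the
				next file instead of waiting for the whole batch.</p>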

			<p>The largest challenge was file handling. Performing hundreds of calls to Backblaze requires bulletproof
				error handling.
			</p>
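
			<p>A common building block for that kind of error handling is retrying with exponential backoff and
				reporting whatever still fails. The TypeScript sketch below is an assumption about how this could look,
				not the approach Photonomy actually uses. The function names, the retry count, and the delays are
				hypothetical, and the files are processed one at a time to keep the example short:</p>

```typescript
// Try one upload up to maxAttempts times, doubling the wait after each failure.
// `upload` may return a promise; it is awaited either way.
async function uploadWithRetry(file: string, upload: (file: string) => unknown,
                               maxAttempts = 3, baseDelayMs = 200) {
  for (let attempt = 1; ; attempt++) {
    try {
      await upload(file);
      return attempt; // how many attempts it took
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // out of retries: surface the error
      const delay = baseDelayMs * 2 ** (attempt - 1); // 200ms, 400ms, 800ms, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Upload each file and return the names that failed even after retrying,
// so the UI can report them or offer to try again.
async function uploadAndCollectFailures(files: string[], upload: (file: string) => unknown,
                                        maxAttempts = 3, baseDelayMs = 200) {
  const failed: string[] = [];
  for (const file of files) {
    try {
      await uploadWithRetry(file, upload, maxAttempts, baseDelayMs);
    } catch {
      failed.push(file);
    }
  }
  return failed;
}
```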

			<p class="title">Additional optimization techniques</p>
			<p><strong>CDN</strong>: A CDN caches your site's data at various geographical points. It not only speeds up the site, but also
				reduces access fees from the storage providers. Make sure your providers are part of the
				<a href="https://www.cloudflare.com/bandwidth-alliance/" target="_blank">Bandwidth Alliance <i class="fas fa-external-link-alt"></i></a>.
			</p>

			<p><strong>Optimize SQL queries</strong>: I spent multiple days optimizing my queries where possible. I
				specifically targeted sums, counts, and joins. I also removed the use of the random() query, which I was
				using to display results in a randomized order. This feature was really cool, but the query is brutal on
				large databases, which mine will eventually qualify as. I will revisit this in the future.</p>

			<p><strong>Database connection pool issues</strong>: Due to the nature of the app, Photonomy requires a lot of queries to the database to retrieve information
				about photos. SaaS database solutions can be great, but beware of connection pool limitations, especially
				when using the lower-tier options.
				Render does provide useful documentation for understanding the
				<a href="https://docs.render.com/postgresql-connection-pooling" target="_blank">connection pool <i class="fas fa-external-link-alt"></i></a>.
			</p>

			<!-- <p class="title">Moving image paths</p>
						<p>One issue I encountered was
							Requires a copy and delete.</p> -->

			<!-- <p class="title">Profile images</p>
			<p>...</p> -->

			<p class="title">Updated architecture</p>
			<p>While I started with Heroku, I switched to hosting the service layer on
				<a href="https://render.com/" target="_blank">Render <i class="fas fa-external-link-alt"></i></a>.
				The longer term goal is to migrate to a
				<a href="https://kubernetes.io/" target="_blank">Kubernetes <i class="fas fa-external-link-alt"></i></a>
				cluster, which should reduce costs. The current architecture is as follows:
			</p>
			<p><img src="/assets/images/blog/tad-02.jpg" /></p>
		</div>
	</div>
</div>
<app-footer></app-footer>